Pages

Machine check events

X86  CPUs report errors detected by the CPU as machine check events (MCEs).  These can be data corruption detected in the CPU caches, in main memory by an integrated memory controller, data transfer errors on the front side bus or CPU interconnect or other internal errors.  Possible causes can be cosmic radiation, in stable  power supplies, cooling problems, broken hardware, or bad luck.Most errors can be corrected by the CPU by internal error correction mechanisms. Uncorrected errors cause machine check exceptions which may panic the machine.When  a  corrected  error  happens  the  x86  kernel  writes  a record describing the MCE into a internal ring buffer available through the /dev/mcelog device mce log retrieves errors from /dev/mcelog, decodes them into a human readable format and prints them on the standard output or optionally into the system log.

You will see following messages when there is a mce event


Aug 20 17:59:28 hostname kernel: Machine check events logged
Aug 20 18:04:28 hostname kernel: Machine check events logged

This log message indicates that, Machine Check Events have been detected and are available for processing in /dev/mcelog. This logs are then redirected to /var/log/mcelog by the /etc/cron.hourly/mcelog.cron cronjob. The log can be also checked by running the following command

#mcelog




No comments:

Post a Comment