X86 CPUs report errors detected by the CPU as machine check events (MCEs). These can be data corruption detected in the CPU caches, in main memory by an integrated memory controller, data transfer errors on the front side bus or CPU interconnect or other internal errors. Possible causes can be cosmic radiation, in stable power supplies, cooling problems, broken hardware, or bad luck.Most errors can be corrected by the CPU by internal error correction mechanisms. Uncorrected errors cause machine check exceptions which may panic the machine.When a corrected error happens the x86 kernel writes a record describing the MCE into a internal ring buffer available through the /dev/mcelog device mce log retrieves errors from /dev/mcelog, decodes them into a human readable format and prints them on the standard output or optionally into the system log.
You will see following messages when there is a mce event
Aug 20 17:59:28 hostname kernel: Machine check events logged
Aug 20 18:04:28 hostname kernel: Machine check events logged
This log message indicates that, Machine Check Events have been detected and are available for processing in /dev/mcelog. This logs are then redirected to /var/log/mcelog by the /etc/cron.hourly/mcelog.cron cronjob. The log can be also checked by running the following command
No comments:
Post a Comment