A PowerEdge 2950 II running VMware ESXi, 6.0.0, 5050593 Image Profile (Updated) ESXi-6.0.0-20170202001-standard has been running without issue for quite some time, and the underlying hardware has had no issues for several years. Recently, an Intel 350T2V2 NIC was installed and configured for use, then a Dell SAS 6 GB HBA External Controller Card 7RJDT was installed. Neither installation had a negative impact on system stability.
Next, after replacing four (4) Crucial 4GB 240 Pin 512Mx72 DDR2 PC2-5300 CL5 ECC DIMMs with eight (8) A-TECH 8G DDR2 PC2-5300 ECC FULLY BUFFERED DIMMs, the BIOS memory check passed but seemed to proceed very slowly. ESXi started to boot but took an extraordinarily long time at the /sb.v00 and /s.v00 steps of the "Loading VMware Hypervisor" stage. Much later, the message "Relocating modules and starting up the kernel..." appeared, followed by another long wait. Then the screen went black and the following was displayed:
VMB: 398: Unexpected exception 2 @0x41800e06957e
VMB: 405: cr0 0x8001003d cr2 0x0 cr3 0x100803000 cr4 0x30
VMB: 407: error code 0x2 rip 0x41800001eee0 cs 0x8
VMB: 409: rflags 0x86 rsp 0x42800001eee0 ss 0x0
VMB: 411: rax 0x12345678 rcx 0x101ffff rdx 0xffff4c000
VMB: 413: rbx 0x0 rbp 0x0 rsi 0x1000
VMB: 415: rdi 0xffff81100004c000 r8 0x2 r9 0x23
VMB: 417: r10 0x8000000000000003 r11 0x0 r12 0xffff4c
VMB: 419: r13 0x420000045221 r14 0xd r15 0x0
VMB: 420: gs 0x10 fs 0x10
VMB: 422: FSbase:0x0 GSbase:0x417rce236200 kernelGSbase:0x0
VMB: 139: [0x42800001eee0] 0x41800e06957e
VMB: 139: [0x42800001ef00] 0x41800e06a0ad
VMB: 139: [0x42800001ef900] 0x41800e814c24
VMB: 139: [0x42800001efc0] 0x41800e000fb8
VMB: 85: Halting.
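As an aside for reading the dump: if the number printed after "Unexpected exception" is the x86 vector number (an assumption about the boot loader's output format), then vector 2 is the non-maskable interrupt (NMI), which is how platforms commonly report bus and parity errors to the CPU. The following is a minimal Python sketch, not a VMware tool, that pulls the vector and address out of the first line and names the vector. The function name `decode_exception` and the partial vector table are illustrative.

```python
import re

# Standard x86 exception vectors; only a subset is listed here.
X86_VECTORS = {
    0: "#DE divide error",
    2: "NMI (non-maskable interrupt)",
    6: "#UD invalid opcode",
    8: "#DF double fault",
    13: "#GP general protection",
    14: "#PF page fault",
    18: "#MC machine check",
}

# Matches the first line of the dump, e.g.
# "VMB: 398: Unexpected exception 2 @0x41800e06957e"
EXC_RE = re.compile(r"Unexpected exception (\d+) @(0x[0-9a-fA-F]+)")


def decode_exception(line):
    """Return (vector, name, address) for a VMB exception line, or None.

    Assumes the printed number is the x86 vector number.
    """
    m = EXC_RE.search(line)
    if m is None:
        return None
    vector = int(m.group(1))
    name = X86_VECTORS.get(vector, "unknown vector")
    return vector, name, int(m.group(2), 16)


if __name__ == "__main__":
    print(decode_exception("VMB: 398: Unexpected exception 2 @0x41800e06957e"))
```

Read this way, the halt looks like the software-side view of a hardware-signalled fault rather than an ESXi bug in its own right.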
At the moment of the halt, the PowerEdge 2950 front panel LCD switched from blue to amber and reported:
E1420 CPU BUS PERR
At this point the system is dead and must be powered off.
The RAC System Event Log shows entries like:
Entry 007 of 007
Severity: Non-Recoverable
Date and Time: Wed May 10 13:48:12 2017
Description:
CPU Bus PERR: Processor sensor, transition to
non-recoverable was asserted.
Dell forums show a flurry of PowerEdge 1950/2950 CPU Bus PERR reports in the Apr-May 2008 time frame, but no conclusive resolution was found. It seemed apparent that Dell acknowledged an issue at some point, and a related OS patch was released for RHEL. Xeon E5xxx processors were mentioned, and this system has Xeon E5345 CPUs. Several posts suggested the issue might be related to virtualization.
Various BIOS setting changes suggested in a number of Dell / VMware forum posts have been tested, to no avail.
The system successfully boots both the CentOS 7 1503 Live KDE 64-bit and CentOS 6.5 Live KDE 32-bit DVDs, though one gets the impression that the system may be running slowly.
One is led to suspect that the new DIMMs triggered this situation, but it seems overly hasty to remove a 64GB upgrade and return to a 16GB configuration, since 16GB of RAM will not support the VMs planned for this system. To this end, research continues.
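Since the CentOS live DVDs do boot, one cheap check is whether the Linux kernel actually sees the full 64GB. Below is a minimal Python sketch of that check under stated assumptions: it parses the `MemTotal` line of `/proc/meminfo`, and the 2 GiB tolerance (the kernel reports somewhat less than the installed amount) is an arbitrary illustrative choice. The function names are hypothetical.

```python
import os
import re


def mem_total_gib(meminfo_text):
    """Extract MemTotal from /proc/meminfo text and convert kB to GiB."""
    m = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if m is None:
        raise ValueError("MemTotal line not found")
    return int(m.group(1)) / (1024 * 1024)


def looks_complete(meminfo_text, expected_gib=64, tolerance_gib=2):
    """True if the reported total is within tolerance of the expected size.

    tolerance_gib is an assumed allowance for memory the kernel reserves.
    """
    return mem_total_gib(meminfo_text) >= expected_gib - tolerance_gib


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
        print("MemTotal: %.1f GiB" % mem_total_gib(text))
        print("Full 64GB visible:", looks_complete(text))
```

A total well short of 64GB would suggest that some DIMMs are not being recognized or are being mapped out, while a full total would point the investigation elsewhere.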