MCA error 2007-07-26 - By Ivan Marinkovic
Hi,
Yesterday one of our systems crashed. Judging by the logs, the problem seems to be the following:
Jul 25 17:49:15 acacia kernel: mca: CPU 0 SAL log contains MCA error record
Decoding the corresponding record in /var/log/salinfo/raw gives the following data:
[root@(protected) raw]# salinfo_decode acacia-2007-07-25-17\:46\:18-cpu0-mca0
oemdata_fd[0](6) oemdata_fd[1](5)
BEGIN HARDWARE ERROR STATE from acacia-2007-07-25-17:46:18-cpu0-mca0
Err Record ID: 3 SAL Rev: 0.02
Time: 2007-07-25 17:46:18 Severity 0

Processor Device Error Info Section
UNCORRECTED PROCESSOR ERROR: Bus Check
processor lid : 0x0000000000000000 cpu: A nasid: 0x0
processor state parameter: 0x20000000fff21120
    rz [2]=0 rendezvous request unsuccessful
    ra [3]=0 rendezvous was not attempted
    mn [5]=1 min state registered with PAL
    sy [6]=0 storage integrity not synchronized
    co [7]=0 not continuable
    ci [8]=1 machine check is isolated
    mi [12]=1 more info available
    pi [13]=0 ip logged is not precise
    pm [14]=0 min state is not precise
    dy [15]=0 processor dynamic state is not valid
    rs [17]=1 rse is valid
    cm [18]=0 fault has not been corrected
    cr [20]=1 control registers are valid
    pc [21]=1 performance counters are valid
    dr [22]=1 debug registers are valid
    tr [23]=1 translation registers are valid
    rr [24]=1 region registers are valid
    ar [25]=1 application registers are valid
    br [26]=1 branch registers are valid
    pr [27]=1 predicate registers are valid
    fp [28]=1 floating point registers are valid
    b1 [29]=1 bank one general registers are valid
    b0 [30]=1 bank zero general registers are valid
    gr [31]=1 general registers are valid
    bc [61]=1 bus check
PAL recovery status: error was isolated and contained, continuable if sw can recover
processor error map : 0x0000000001000000
    processor code id: 0
    logical thread id: 0
    processor bus level 1 error
BUS Check Info [0]
    Transaction size: 1, External Bus Error:, Type: 10 (I/O space write), Severity: 0, Hierarchy: 0, Status information: 3 (Hard fail)
    target identifier : 0x0003fffffc0f23c8
CPUID Regs: 0x49656e69756e6547 0x6c65746e 0 0x1f020104

Processor static data:
xip : 0xe00000000480b380
xfs : 0x0000000000000000
xpsr : 0x0000121008026018
    [5:0]=24 User mask
    be [1]=0 little endian
    up [2]=0 user performance monitor disabled
    ac [3]=1 alignment check enabled
    mfl [4]=1 lower (f2 .. f31) floating-point registers written
    mfh [5]=0 upper (f32 .. f127) floating-point registers not written
    [23:0]=155672 System mask
    ic [13]=1 interrupt collection enabled
    i [14]=1 interrupts enabled
    pk [15]=0 protection key disabled
    dt [17]=1 data address translation enabled
    dfl [18]=0 disabled floating-point low register not set
    dfh [19]=0 disabled floating-point high register not set
    sp [20]=0 secure performance monitor disabled
    pp [21]=0 privileged performance monitor disabled
    di [22]=0 disable instruction set transition not set
    si [23]=0 secure interval timer disabled
    db [24]=0 debug breakpoint fault disabled
    lp [25]=0 lower privilege transfer trap disabled
    tb [26]=0 taken branch trap disabled
    rt [27]=1 register stack translation enabled
    cpl [33:32]=0 current privilege level
    is [34]=0 IA64 instruction set
    mc [35]=0 machine check abort enabled
    it [36]=1 instruction address translation enabled
    id [37]=0 instruction debug fault enabled
    da [38]=0 enable data access and dirty-bit faults
    dd [39]=0 data debug fault enabled
    ss [40]=0 single step disabled
    ri [42:41]=1 restart instruction
    ed [43]=0 exception deferral disabled
    bn [44]=1 bank 1
    ia [45]=0 instruction access-bit faults enabled
iip : 0xe0000000046d21e0
iipa : 0xe0000000046d21e0
ipsr : 0x0000121008026018
    [5:0]=24 User mask
    be [1]=0 little endian
    up [2]=0 user performance monitor disabled
    ac [3]=1 alignment check enabled
    mfl [4]=1 lower (f2 .. f31) floating-point registers written
    mfh [5]=0 upper (f32 .. f127) floating-point registers not written
    [23:0]=155672 System mask
    ic [13]=1 interrupt collection enabled
    i [14]=1 interrupts enabled
    pk [15]=0 protection key disabled
    dt [17]=1 data address translation enabled
    dfl [18]=0 disabled floating-point low register not set
    dfh [19]=0 disabled floating-point high register not set
    sp [20]=0 secure performance monitor disabled
    pp [21]=0 privileged performance monitor disabled
    di [22]=0 disable instruction set transition not set
    si [23]=0 secure interval timer disabled
    db [24]=0 debug breakpoint fault disabled
    lp [25]=0 lower privilege transfer trap disabled
    tb [26]=0 taken branch trap disabled
    rt [27]=1 register stack translation enabled
    cpl [33:32]=0 current privilege level
    is [34]=0 IA64 instruction set
    mc [35]=0 machine check abort enabled
    it [36]=1 instruction address translation enabled
    id [37]=0 instruction debug fault enabled
    da [38]=0 enable data access and dirty-bit faults
    dd [39]=0 data debug fault enabled
    ss [40]=0 single step disabled
    ri [42:41]=1 restart instruction
    ed [43]=0 exception deferral disabled
    bn [44]=1 bank 1
    ia [45]=0 instruction access-bit faults enabled
isr : 0x00000a0200000000
    [15:0]=0 Code
    [23:16]=0 Vector
    w [33]=1 write exception
    ei [42:41]=1 excepting instruction
    ed [43]=1 exception deferal
pr : 0xa4009650a69a6a29
    p0, p3, p5, p9, p11, p13-14, p17, p19-20, p23, p25-26, p29, p31, p36, p38, p41-42, p44, p47, p58, p61, p63
cr0 (dcr) : 0x0000000000007e04
cr1 (itm) : 0x0000017c35918c0b
cr2 (iva) : 0xe000000004400000
cr8 (pta) : 0x1ffc0000000000c9
cr16 (ipsr) : 0x0000121008026018
cr17 (isr) : 0x00000a0200000000
cr19 (iip) : 0xe0000000046d21e0
cr20 (ifa) : 0xc003fffffc0f23c8
cr21 (itir) : 0x0000000000000660
cr22 (iipa) : 0xe0000000046d21e0
cr23 (ifs) : 0x8000000000000389
cr24 (iim) : 0x0000000000045000
cr25 (iha) : 0xbffc0000000002b0
cr64 (lid) : 0x0000000000000000
cr66 (tpr) : 0x0000000000000000
cr68 (irr0) : 0x0480000000000000
cr69 (irr1) : 0x0000000000000000
cr70 (irr2) : 0x0000000000000000
cr71 (irr3) : 0x0000800000000000
cr72 (itv) : 0x00000000000000ef
cr73 (pmv) : 0x00000000000000ee
cr74 (cmcv) : 0x000000000000001f
cr80 (lrr0) : 0x0000000000010000
cr81 (lrr1) : 0x0000000000010000
ar16 (rsc) : 0x0000000000000003
ar17 (bsp) : 0xe00000010f0893a8
ar18 (bspstore) : 0xe00000010f089150
ar19 (rnat) : 0x0000000000000000
ar32 (ccv) : 0x0000000000000001
ar36 (unat) : 0x0000000000000000
ar40 (fpsr) : 0x0009804c0270033f
ar64 (pfs) : 0x0000000000000389
ar65 (lc) : 0x0000000000000000
ar66 (ec) : 0x0000000000000000
r0 : 0x0000000000000000 0xe000000004cbbd00 0x0000000000000001 0xe000000004b90760
r4 : 0x60000fffffffaf00 0x20000000003cd378 0x20000000003cdd40 0x0000000000000000
r8 : 0x00000000000000f2 0x0000000000000fff 0xe000000004b90758 0x0000000000000000
r12: 0xe00000010f08fc30 0xe00000010f088000 0x0000000000000006 0xe000000004998060
bk0 r16: 0xc003fffffc0f23c8 0x0010000000000661 0x0000000000000010 0x0013fffffc0f2671
bk0 r20: 0x00000a0200000000 0x00001a1008026018 0x0000000000000000 0x0000000000000000
bk0 r24: 0x0000000000000000 0x0000000000000000 0xc000000000000288 0x000000000000000f
bk0 r28: 0x2000000000307180 0x00001213085a6010 0x0000000000000000 0xa4009650a69a6a29
bk1 r16: 0xe000000004b90758 0x0000000000000000 0xc003fffffc000000 0xe000000004998068
bk1 r20: 0xe00000000499c048 0xe000000004b6fac0 0xe000000004b6fa60 0xe0000000049ef990
bk1 r24: 0xe000000004d75568 0xe000000004adef80 0xe000000004b7d670 0xe000000004b89ab8
bk1 r28: 0x60000fffffffaba0 0x0000000000000001 0x60000fffffffaa90 0xffffffffffffabf7
b0 : 0xe0000000046d21d0 0x400000000031c8e0 0x4000000000129080 0x0000000000000000
b4 : 0x0000000000000000 0x0000000000000000 0xe000000004402f70 0xe00000000480b300
k0 : 0x2000000000000000 0x5eb000ebc0004000 0x0000000000000000 0x0000000000000000
k4 : 0x000000000000010f 0xe0000003fcb48000 0x000000010f088000 0x000000010f080000
rr0 : 0x00000000009eb839 0x00000000009eb839 0x00000000009eb839 0x00000000009eb839
rr4 : 0x00000000009eb839 0x00000000009eb839 0x00000000009eb839 0x00000000009eb839

Platform Specific Error Info Section
Platform Specific Error Detail

Platform PCI Bus Error Info Section
PCI Bus Error Detail
    Error Status: 0x121900
    Error Type: 0x4
    Bus ID: 0xe0
    Bus Address: 0xf4040064
    Requestor ID: 0xfed2e000
    Target ID: 0xf4040064
OEM Specific Data
0x0000 2e 12 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0010 98 00 00 00 00 00 00 00 82 44 e6 9f 2d a0 f7 4e
0x0020 ad e6 c6 63 59 62 53 99 00 00 00 00 00 00 07 00
0x0030 1c 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0050 00 00 00 00 00 00 00 00 64 00 04 f4 00 00 00 00
0x0060 50 1d 00 00 00 00 00 00 48 00 00 00 00 00 00 00
0x0070 3c 10 2e 12 46 01 b0 22 02 00 20 00 37 02 00 0f
0x0080 00 00 00 00 00 00 00 00 07 00 01 00 00 ff 13 00
0x0090 03 24 03 00 1b 3f 02 00 48 00 00 00 00 00 00 00
0x00a0 60 94 c5 3a 2e 75 aa a1 00 00 00 00 00 00 00 00
END HARDWARE ERROR STATE from acacia-2007-07-25-17:46:18-cpu0-mca0
Can someone point me in the right direction for determining the cause of this error?
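As a sanity check on the decoded flags, the processor state parameter can be re-decoded by hand. The following is only a minimal Python sketch: the bit positions are copied from the salinfo_decode output above rather than from the PAL specification, and the names `PSP_FIELDS` and `decode_psp` are made up for illustration.

```python
# Bit positions as printed by salinfo_decode in the record above.
# They are taken from that output, not from the PAL specification.
PSP_FIELDS = {
    "rz": 2, "ra": 3, "mn": 5, "sy": 6, "co": 7, "ci": 8,
    "mi": 12, "pi": 13, "pm": 14, "dy": 15, "rs": 17, "cm": 18,
    "cr": 20, "pc": 21, "dr": 22, "tr": 23, "rr": 24, "ar": 25,
    "br": 26, "pr": 27, "fp": 28, "b1": 29, "b0": 30, "gr": 31,
    "bc": 61,
}


def decode_psp(value):
    """Return a dict mapping each flag name to its bit (0 or 1)."""
    return {name: (value >> bit) & 1 for name, bit in PSP_FIELDS.items()}


if __name__ == "__main__":
    psp = 0x20000000FFF21120  # "processor state parameter" from the record
    flags = decode_psp(psp)
    for name, bit in PSP_FIELDS.items():
        print(f"{name} [{bit}]={flags[name]}")
```

For this record the sketch reproduces the flags shown in the dump, including bc=1 (bus check), co=0 (not continuable) and cm=0 (fault has not been corrected).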
Machine data: HP Integrity rx2620 Itanium 2 1.6 GHz, 16 GB RAM
Red Hat Enterprise Linux ES release 3 (Taroon Update 5)
Linux acacia 2.4.21-32.EL #1 SMP Fri Apr 15 21:02:52 EDT 2005 ia64 ia64 ia64 GNU/Linux
[root@(protected) raw]# lspci
00:01.0 USB Controller: NEC Corporation USB (rev 41)
00:01.1 USB Controller: NEC Corporation USB (rev 41)
00:01.2 USB Controller: NEC Corporation USB 2.0 (rev 02)
00:02.0 IDE interface: Silicon Image, Inc. SiI 0649 Ultra ATA/100 PCI to ATA Host Controller (rev 02)
00:1c.0 Host bridge: Hewlett-Packard Company zx1 Local Bus Adapter (rev 32)
00:1d.0 Bridge: Hewlett-Packard Company zx1 I/O Controller (rev 23)
00:1e.0 Bridge: Hewlett-Packard Company zx1 System Bus Adapter (rev 23)
20:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
20:01.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1030 PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
20:02.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
20:02.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
20:1e.0 Host bridge: Hewlett-Packard Company zx1 Local Bus Adapter (rev 32)
40:01.0 PCI bridge: IBM PCI-X to PCI-X Bridge (rev 03)
40:1e.0 Host bridge: Hewlett-Packard Company zx1 Local Bus Adapter (rev 32)
41:04.0 RAID bus controller: Compaq Computer Corporation Smart Array 64xx (rev 01)
60:1e.0 Host bridge: Hewlett-Packard Company zx1 Local Bus Adapter (rev 32)
80:01.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
80:01.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
80:1e.0 Host bridge: Hewlett-Packard Company zx1 Local Bus Adapter (rev 32)
c0:1e.0 Host bridge: Hewlett-Packard Company zx1 Local Bus Adapter (rev 32)
e0:01.0 Communication controller: Hewlett-Packard Company Auxiliary Diva Serial Port (rev 01)
e0:01.1 Serial controller: Hewlett-Packard Company Diva Serial [GSP] Multiport UART (rev 03)
e0:02.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]
e0:1e.0 Host bridge: Hewlett-Packard Company zx1 Local Bus Adapter (rev 32)
[root@(protected) raw]#
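One way to narrow things down might be to compare the "Bus ID: 0xe0" value from the PCI Bus Error Detail with the bus numbers in the lspci listing. The Python sketch below does that. It assumes the Bus ID in the SAL record is the same bus number that lspci prints, which I have not confirmed for the zx1 chipset. `devices_on_bus` and `SAMPLE` are hypothetical names, and `SAMPLE` holds only a few lines copied from the listing.

```python
# A few lines copied from the lspci listing, as sample input.
SAMPLE = [
    "20:02.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)",
    "41:04.0 RAID bus controller: Compaq Computer Corporation Smart Array 64xx (rev 01)",
    "e0:01.0 Communication controller: Hewlett-Packard Company Auxiliary Diva Serial Port (rev 01)",
    "e0:01.1 Serial controller: Hewlett-Packard Company Diva Serial [GSP] Multiport UART (rev 03)",
    "e0:02.0 VGA compatible controller: ATI Technologies Inc Radeon RV100 QY [Radeon 7000/VE]",
    "e0:1e.0 Host bridge: Hewlett-Packard Company zx1 Local Bus Adapter (rev 32)",
]


def devices_on_bus(lspci_lines, bus_id):
    """Return the lspci lines whose bus number equals bus_id.

    Accepts slots written as "bus:dev.fn" or "domain:bus:dev.fn".
    Lines that do not start with a parsable slot are skipped.
    """
    matches = []
    for line in lspci_lines:
        line = line.strip()
        if not line:
            continue
        slot = line.split(None, 1)[0]   # e.g. "e0:01.0"
        parts = slot.split(":")
        if len(parts) < 2:
            continue
        try:
            bus = int(parts[-2], 16)    # bus is the field before "dev.fn"
        except ValueError:
            continue
        if bus == bus_id:
            matches.append(line)
    return matches


if __name__ == "__main__":
    # Bus ID taken from "PCI Bus Error Detail" in the SAL record.
    for dev in devices_on_bus(SAMPLE, 0xE0):
        print(dev)
```

If the assumption holds, the candidates on bus e0 are the two Diva serial functions, the Radeon VGA controller and the host bridge. Comparing the reported Bus Address (0xf4040064) with the memory regions shown by `lspci -v` could narrow it further.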
Thanks a lot.
Ivan Marinkovic Chile.
-- Taroon-list mailing list Taroon-list@(protected) https://www.redhat.com/mailman/listinfo/taroon-list