
PSOD Blues (when drivers crash) part 2

The evening before the Boston VMUG conference, my friends and I were in our hotel winding down with some pizza and TV, discussing which breakout sessions we would attend the next day. At 10:50 PM my phone lit up with “VMware vCenter – Alarm alarm.HAhostStatus.” I immediately fired up my VPN connection and launched iLO, only to see another Purple Screen of Death.

[Image: PSOD screenshot, lpfc]

Damn! Exception 14 again. That’s two hosts within a month! I extracted the logs as described in part 1 and found the following:

2014-06-24T02:49:00.449Z cpu17:33512)@BlueScreen: #PF Exception 14 in world 33512:lpfc_do_work IP 0x4180057c62c6 addr 0x747369
PTEs:0x13e5d1027;0x13e2e9027;0x0;
2014-06-24T02:49:00.449Z cpu17:33512)Code start: 0x418005000000 VMK uptime: 60:12:38:37.042
2014-06-24T02:49:00.449Z cpu17:33512)0x41238ba1da80:[0x4180057c62c6]lpfc_sli4_bpl2sgl@#+0xb6 stack: 0x410a793b2248
2014-06-24T02:49:00.450Z cpu17:33512)0x41238ba1db50:[0x4180057cfef8]__lpfc_sli_issue_iocb_s4@#+0x80 stack: 0x412e80c5d5c0
2014-06-24T02:49:00.450Z cpu17:33512)0x41238ba1db90:[0x4180057c9c54]lpfc_sli_issue_iocb@#+0xc0 stack: 0x418000000002
2014-06-24T02:49:00.450Z cpu17:33512)0x41238ba1dc60:[0x418005768082]lpfc_els_rsp_rls_acc@#+0x142 stack: 0x41238ba1dd00
2014-06-24T02:49:00.451Z cpu17:33512)0x41238ba1ddb0:[0x4180057d45dd]lpfc_sli_handle_mb_event@#+0x129 stack: 0xc5913e00000078
2014-06-24T02:49:00.451Z cpu17:33512)0x41238ba1df30:[0x41800577c810]lpfc_work_done@#+0xbcc stack: 0x225e
2014-06-24T02:49:00.451Z cpu17:33512)0x41238ba1df80:[0x4180057821d4]lpfc_do_work_event@#+0xbc stack: 0x13

This time it wasn’t the HPSA driver but the LPFC driver, used by our HP VirtualConnects, that caused the crash: the faulting world is lpfc_do_work, and every frame in the backtrace is an lpfc function. This host was running version 10.0.575.8. VMware tells me they are working with Emulex to fix the driver and, in the meantime, recommends upgrading to LPFC version 10.2.261.7, available here, as there have been no reported PSODs with this version.
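If you’d rather not eyeball the backtrace, here is a rough sketch of two shell helpers that pull the faulting world and the function names out of a log like the one above. The function names psod_world and psod_frames are my own, and the patterns assume the log lines look exactly like this excerpt; other ESXi builds may format them differently.

```shell
#!/bin/sh
# Sketch only: both helpers read a vmkernel log excerpt on stdin.
# The helper names are hypothetical, and the line formats are assumed
# to match the excerpt in this post.

# Print the name of the world (thread) that hit the exception,
# taken from the "@BlueScreen: ... in world <id>:<name>" line.
psod_world() {
    sed -n 's/.*@BlueScreen:.* in world [0-9]*:\([^ ]*\).*/\1/p'
}

# Print the function name of each backtrace frame, in log order.
# Frames look like "...:[0x<address>]<function>@<module>+0x<offset> stack: ..."
psod_frames() {
    sed -n 's/.*:\[0x[0-9a-f]*]\([A-Za-z_][A-Za-z0-9_]*\)@.*/\1/p'
}
```

With the excerpt saved to a file (psod.log is a placeholder name), psod_world < psod.log prints the world name and psod_frames < psod.log lists the functions in the trace, which makes it quick to see whether every frame belongs to the same driver.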

Installing the new driver is identical to the process described in part 1 and is well documented on VMware’s site.

To find which version of the LPFC driver you’re running:

  • SSH to your ESXi host
  • Identify the installed HBAs and make sure you have LPFC interfaces
# esxcfg-scsidevs -a
 vmhba0 hpsa link-n/a sas.5001438026743f90 (0:3:0.0) Hewlett-Packard Company Smart Array P220i
 vmhba1 lpfc link-up fc.5001438002a30041:5001438002a30040 (0:4:0.2) ServerEngines Corporation Emulex OneConnect OCe11100 FCoE Initiator
 vmhba2 lpfc link-up fc.5001438002a30043:5001438002a30042 (0:4:0.3) ServerEngines Corporation Emulex OneConnect OCe11100 FCoE Initiator
  • Now that we’ve verified the host has two LPFC adapters, run the following to get the driver version
# vmkload_mod -s lpfc | grep Version
 Version: 10.0.575.8-1OEM.550.0.0.1198611

If you have version 10.0.575.8-1OEM.550.0.0.1198611 like the example, you’d best get patching.
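If you have more than a couple of hosts to check, the comparison can be scripted. The following is a minimal sketch, not a tested tool: the names lpfc_version, version_lt and FIXED are my own, and it assumes the Version line looks like the one above and that awk is available in the shell you run it from.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.
# Version VMware recommended upgrading to, as quoted in this post.
FIXED="10.2.261.7"

# Read "vmkload_mod -s lpfc" output on stdin and print just the dotted
# driver version, e.g. "10.0.575.8" from "Version: 10.0.575.8-1OEM...".
lpfc_version() {
    sed -n 's/^ *Version: *\([0-9][0-9.]*\).*/\1/p'
}

# Exit 0 if dotted version $1 is older than $2, otherwise exit 1.
# Fields are compared numerically, left to right; missing fields count as 0.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0   # $1 is older
            if (x[i] + 0 > y[i] + 0) exit 1   # $1 is newer
        }
        exit 1                                # versions are equal
    }'
}

# Example use on a host:
#   installed=$(vmkload_mod -s lpfc | lpfc_version)
#   version_lt "$installed" "$FIXED" && echo "lpfc $installed is older than $FIXED"
```

The comparison is done field by field rather than as a plain string so that, for example, 10.10.x would not be mistaken for something older than 10.2.x.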


Matt Bradford
