VMware ESXi Purple Screen of Death on HP Proliant ML110 or ML115

HP Proliant ML110 and ML115 owners who install the HP version of VMware ESXi, which contains the HP management agents, commonly hit a Purple Screen of Death (PSoD) about 2 minutes after the ESXi host comes up. The following video demonstrates this:

 

 

The reason this occurs is that ESXi loads the HP CIM (Common Information Model) agents, which then crash ESXi if particular hardware is not found. Needless to say, the hardware the CIM agents are looking for is not present in the Proliant 100 series of servers, which includes the ML110 and ML115 models.

With the introduction of VMware vSphere you can now view the majority of the hardware in your ESX(i) host, and its current status, via the Intelligent Platform Management Interface (IPMI) standard. You will notice, though, that this doesn’t quite extend to the onboard disk controller of the Proliant ML115 (or ML110). Similarly specified entry-level servers from the likes of Dell do show the full disk and onboard controller details, so what’s the catch? From what I can make out, HP uses its own minor variation of IPMI, which talks to the BMC, which in turn talks to the ILO to gain some of the additional component information.

 

VMware ESX Health Status - HP Proliant

 

Herein lies part of the problem – the HP Proliant 100 series of servers don’t come with the full version of the ILO. They can have the Lights Out 100 card installed, though this is the low fat/no frills equivalent of the full ILO and doesn’t provide the same level of hardware detection and monitoring information.

So how do you know if one of your hard disks connected to the onboard controller is having issues? Well, the answer is – you won’t. The official line is that these servers aren’t on VMware’s hardware compatibility list (HCL), so if you want this level of hardware monitoring you may want to consider using a PCIe based array controller such as the HP Smart Array E200 or a Dell PERC. Both of these array controllers can often be found at a semi-reasonable price on eBay.

Unfortunately, your disk controller and drive monitoring issues don’t go away if you buy an HP E200, as you will still need to load the HP Management Agents onto your ESX(i) host for the E200 to be detected and presented in the ‘Health Status’ window of ESX(i). At this point you are back to square one, as once the HP Management Agents are loaded ESX(i) will PSoD again.

To overcome this PSoD issue you will need to install the HP Management Agents and then stop all but the Smart Array controller agents from loading. Once this is complete and the ESX(i) host has been rebooted, you should see the HP Smart Array E200 controller and attached hard disks in the ‘Health Status’ window. I only have one E200 controller in my lab, though I have used this method since ESX 3.5 without any issues. I will put together a step by step guide if anyone would like to see how this is done.

 

About Simon Seagrave 706 Articles
Simon is a UK based Virtualization, Cloud & IT Technology Evangelist working as a Senior Technology Consultant and vSpecialist for EMC. He loves working in the ever changing IT industry & spends most of his time working with Virtualization, Cloud & other Enterprise IT based technologies, in particular VMware, EMC and HP products. As well as on this site, you can find him on Twitter and Google+

26 Comments

  1. Hi Si,

    Thanks for the heads up on ESXi. I am running ESX 3.5 with E200 controllers and would certainly like to see the process for getting the management agents up and running.

    Good work as ever.

    Cheers

    Leigh

  2. Hi,

    I’m having the PSoD issue on an HP ML110 as well.

    I tried to prevent the HP agents from loading by quickly going to the console and moving some files before the panic occurs, but I did not manage to do it. I still get the panic, maybe because I’m not fast enough.

    My next guess would be to boot the server with a live CD and find a way to prevent the HP agents from loading, but that would mean re-packing the files.

    Can you provide more detailed instructions?

    Regards
    Alexandre

  3. Hi Si,
    Thanks a lot for your article. I am running ESX(i) 4.0 with E200 controllers, but I am not able to check and manage things such as the disk status.

    I would like to see your video about a step by step guide.

    Thanks in advance,
    M

    • Any idea when you will find time to document the fix for the HP Management Agents? If it’s going to be several weeks I will need to purchase a different controller card. I was hoping to avoid that purchase.

      Thank you in advance!

  4. 1. Install regular (!) ESXi 4.0.0 Update 1 (208167), not the HP one.
    2. Set root password via vSphere Client
    3. Put the server in Maintenance Mode via vSphere Client
    4. Download the HP Tools [http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?swItem=MTX-25f06077ad5541f5a962dd2a69&lang=en&cc=us&idx=1&mode=4&]
    5. Install the tools via the Remote CLI: vihostupdate.pl --server ip.address.of.server --install --bundle [full-path-to]hp-esxi4.0uX-bundle-1.1.zip. User root, password as set in 2.
    6. Reboot server via vSphere Client
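For anyone scripting step 5, the command line can be assembled programmatically. This is only a sketch: the helper name, the example address and the bundle path are placeholders, and it assumes the vSphere CLI's `vihostupdate.pl` is on the PATH and accepts `--username` alongside the options quoted in the step.

```python
import shlex


def build_vihostupdate_command(server, bundle_path, username="root"):
    """Return the argument list for installing an offline bundle.

    Only builds the command; running it (for example with subprocess.run)
    is left to the caller, as is supplying the root password.
    """
    return [
        "vihostupdate.pl",
        "--server", server,
        "--username", username,
        "--install",
        "--bundle", bundle_path,
    ]


if __name__ == "__main__":
    # Placeholder host address and bundle location, for illustration only.
    argv = build_vihostupdate_command(
        "192.0.2.10", "/tmp/hp-esxi4.0uX-bundle-1.1.zip"
    )
    print(" ".join(shlex.quote(part) for part in argv))
```

Keeping the arguments as a list avoids quoting problems if the bundle path contains spaces.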

    I’m not sure but I thought this was working without PSOD since Update 1. If you still get the PSOD, check this blog post starting at header Procedure: http://forum.lettronics.com/forums/thread/1273.aspx

      • Yes, I have an ML115 G5 with an E200 controller and I did follow my steps. But – it was a few months ago – I’m not sure about the last part, as I already mentioned.

        I also tried the HP-integrated install from the HP website but that one didn’t work even with the steps from lettronics.com. But that was 4.0.0 (not Update 1).

        Currently that ML115 has gone to a customer so I can’t retry the steps to confirm the last part. You’ll know quickly ’cause the PSOD occurs within a few minutes. It happened to me once that I got the PSOD while logging in to the console (unsupported) so you have to be quick ;).

        • Egbert

          I attempted your steps tonight. It looked really promising for about 5 minutes. Then I got the PSOD. I also tried the steps in the lettronics link, but the VI part required so much deletion that I got kind of nervous and decided to just wait. Hopefully Simon will find time to share his magic with us soon.

          Thank you

  5. Hi ws2000

    Today I had to connect to the ESXi host at the customer. I couldn’t get a connection due to … a PSOD. So I had to instruct someone there to do the ‘lettronics steps’ and it is working after that.

    I can imagine the VI/deletion stuff is making you nervous, but try to get over that ;): the SA parts are almost at the end. You’ll need to do that step. Try the multi-line delete function in VI, e.g. 5dd to remove 5 lines at once. The lettronics post didn’t mention that you also have to leave the [SMX_SmartArray]-parts. I left these too.

    It also says (at the end): cp oem.tgz /bootbank/oem.tgz. I asked the fellow at the customers place to do that AND to check if the original file was overwritten (cd /bootbank && ls -la). It was NOT… So I ordered him to a) delete the original oem.tgz in /bootbank and b) to copy the modified file. Probably you also can try [cp -f oem.tgz /bootbank/oem.tgz] to force the original file to be overwritten.
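The "delete, copy, then check it really changed" routine described above can be sketched in Python. This is an illustration rather than something taken from the lettronics post: the function names are made up, and it assumes Python is available wherever both files can be reached, which is not confirmed for the ESXi console.

```python
import hashlib
import os
import shutil


def sha1_of(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replace_and_verify(source, destination):
    """Delete the old destination, copy source over it, and confirm the copy.

    Returns True only if the destination now has the same contents as
    the source, which is the check that 'ls -la' was standing in for.
    """
    if os.path.exists(destination):
        os.remove(destination)
    shutil.copyfile(source, destination)
    return sha1_of(source) == sha1_of(destination)
```

Comparing checksums is a stricter test than comparing sizes or timestamps, since a modified oem.tgz could happen to have the same size as the original.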

    I hope you can get further with this.

    If you have to do the lettronics steps on more than one system, it is probably worth investigating whether the oem.tgz file in the HP install files can be changed.

    • Egbert

      All right. You have talked me into trying it again 🙂 I really want this to work.

      It’s funny you would mention the [SMX_SmartArray] lines. That is where the uncertainty stopped me. Is there a way to uninstall the HP Tools if this still fails?

      Simon posted above “To overcome this PSoD issue you will need to install the HP Management Agents and then stop all but the Smart Array controller agents from loading.” His process sounds different than the lettronics steps. Any idea how he is making it work?

        • Simon

          The video will be great. But if you are pressed for time just a typed overview of the steps would also be greatly appreciated.

          Thanks

          • Hi,

            I haven’t forgotten about the HP agents post – honest. 🙂

            Hope to get some time to write this up – also before I forget how I did it in the first place 😉

            Cheers,

            Si

  6. Hi, I am the author of the link about installing HP tools on the ML110 G5.

    If you follow the instructions, you should end up with a reliable ESXi running on a Proliant ML110 G5 with an E200 or P400 SMART Array controller (HP do not support P400 in ML110 G5, but it works fine if you do not take the firmware beyond 4.28 from memory).

    If any of the steps are unclear, drop me a message via my site, and I’ll clarify the article – many, many pages now link back to that single source that I created after spending a lot of time trying to get some kind of disk monitoring working on the ML110.

    • Lettronics

      I started your process and then chickened out because of issues like Egbert pointed out. 1. “The lettronics post didn’t mention that you also have to leave the [SMX_SmartArray]-parts. I left these too.” 2. “It also says (at the end): cp oem.tgz /bootbank/oem.tgz. I asked the fellow at the customers place to do that AND to check if the original file was overwritten (cd /bootbank && ls -la). It was NOT… So I ordered him to a) delete the original oem.tgz in /bootbank and b) to copy the modified file. Probably you also can try [cp -f oem.tgz /bootbank/oem.tgz] to force the original file to be overwritten.”

      Maybe not a big deal to more experienced folks but enough for a newbie to be stopped in his tracks 🙂

      I’m still really hoping to get Simon to share his process so we can compare notes.

      Thanks

  7. Tried the same steps as listed above on my generic whitebox server with a P400 array controller and ESXi 4.1. However, it fails when trying to install the update. I get this:

    Please wait patch installation is in progress …
    No matching bulletin or VIB was found in the metadata.No bulletins for this platform could be found. Nothing to do.

    I can’t find a way to force the install. Any tips?

  8. I unzipped the driver package, which was a .zip file. Inside the package I found a metadata.zip file, which I unzipped as well. Inside the metadata folder I found a platforms.xml file and a vmware.xml file. I modified both of those files, replacing any 4.* or 4.0.0 with 4.1.0 (using Notepad), and resaved them. I carefully rezipped it all back together (using WinRAR), being careful of the file structure (no metadata folder inside my metadata.zip file), and bingo…driver installed!
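The edit described above can also be scripted so the archive layout is preserved automatically. The sketch below is an assumption-laden illustration: the function name is made up, it treats the two XML files as UTF-8 text, and it does a blunt string replacement, so check the result by hand before installing anything.

```python
import zipfile


def rewrite_metadata_zip(src_path, dst_path, replacements,
                         targets=("platforms.xml", "vmware.xml")):
    """Copy a zip archive, applying text replacements to selected members.

    Every entry is written back under its original name, so no extra
    folder level is introduced when the archive is rebuilt.
    """
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename in targets:
                text = data.decode("utf-8")
                for old, new in replacements:
                    text = text.replace(old, new)
                data = text.encode("utf-8")
            dst.writestr(info.filename, data)
```

A call such as `rewrite_metadata_zip("metadata.zip", "metadata-new.zip", [("4.0.0", "4.1.0"), ("4.*", "4.1.0")])` mirrors the manual find and replace; the rebuilt metadata.zip then has to go back into the outer bundle zip in place of the original.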

  9. The reply from Demo got me thinking. I unzipped the files as he did, and modified vmware.xml as follows:

    In the first VIB section, with VIBID:

    cross_oem-hp-smx-provider_410.02.07.60-235786

    remove ALL the hwPlatform references, they look like this:

    Then, remove the entire next VIB section, from its opening tag through to its closing tag. This is the ILO VIB, my server has no ILO, so no need for it.

    Then proceed with the install, following the instructions to edit the oem.tgz above.

    Make sure to pay attention that your zip files contain no folders.
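A quick way to check the "no folders" condition before attempting the install is to list any archive entries that sit inside a directory. This is a small sketch with a made-up function name, not part of the HP or VMware tooling.

```python
import zipfile


def nested_entries(zip_path):
    """Return the names of entries that are folders or live inside one.

    An empty list means every file sits at the top level of the archive.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name for name in archive.namelist()
            if "/" in name or "\\" in name
        ]
```

If the list is not empty, the archive was probably zipped from one level too high (for example a `metadata` folder wrapped around the files).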

  10. So I realized when I last posted this, somehow the web site blew away all my example code. That’s bad, since my server blew up and I had to redo it.

    So here are the instructions again, with a bit more detail. This time for version 1.1 of the CIM provider.

    Extract the HP ESXi bundle zip file. Within that zip file you’ll find metadata.zip; extract that. Edit vmware.xml. There are 2 VIB sections. One of them has a VIBID of cross_oem-vmware-esx-drivers-char-hpilo_400.8.7.1.1VMW-164009. Note the ILO part: my whitebox server doesn’t have ILO, so remove that entire VIB. That leaves one other VIB; scroll down till you find the hwplatform model sections. Remove all of those. You should be left with just the swPlatform locale entries.
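The same pruning can be done with an XML parser instead of a text editor. This sketch relies on a few assumptions: the element names `vibList`, `vib` and `vibID` are taken from the file posted further down, `hwPlatform` is the name used in the earlier comment, and the function name is made up. It has not been run against the real HP bundle.

```python
import xml.etree.ElementTree as ET


def prune_metadata(xml_text, drop_vib_containing="hpilo"):
    """Drop the ILO VIB and every hwPlatform element from vmware.xml text."""
    root = ET.fromstring(xml_text)

    # Remove any VIB whose vibID mentions the ILO driver.
    for vib_list in root.iter("vibList"):
        for vib in list(vib_list.findall("vib")):
            if drop_vib_containing in vib.findtext("vibID", default=""):
                vib_list.remove(vib)

    # Remove hardware platform restrictions wherever they appear.
    # The element list is copied first so removal is safe while looping.
    for parent in list(root.iter()):
        for hw in list(parent.findall("hwPlatform")):
            parent.remove(hw)

    return ET.tostring(root, encoding="unicode")
```

The output keeps the swPlatform entries untouched, which is what the manual edit is meant to leave behind.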

    Finally, carefully recompress all your files and make sure the paths are identical. Windows Zip tends to put extra folders in your zip files.

    I think this forum software bans right and left arrows. Here is my complete file, a little find and replace should get the arrows back.

    [metadataResponse]
    [version]2.0[/version]
    [timestamp]2011-02-03T11:36:20.864312-06:00[/timestamp]
    [bulletin]
    [id]hpq-esxi4.1uX-bundle-1.1[/id]
    [vendor]Hewlett-Packard Company[/vendor]
    [summary]HP ESXi 4.1 Bundle 1.1[/summary]
    [severity]general[/severity]
    [category]general[/category]
    [urgency/]
    [vendorProduct/]
    [releaseType]extension[/releaseType]
    [description]HP ESXi 4.1 Bundle 1.1[/description]
    [kbUrl]www.hp.com[/kbUrl]
    [contact]hpvmwesxi@hp.com[/contact]
    [releaseDate]2011-02-03T11:32:52[/releaseDate]
    [platforms]
    [softwarePlatform locale="" version="4.1.0" productLineID="embeddedEsx"/]
    [/platforms]
    [vibList]
    [vib]
    [vibVersion]1.4.5[/vibVersion]
    [vibID]cross_oem-hp-smx-provider_410.02.07.70-260247[/vibID]
    [name]oem-hp-smx-provider[/name]
    [vendor/]
    [version]410.02.07.70-260247[/version]
    [buildDate]2011-02-03T17:29:11[/buildDate]
    [vibType]cross[/vibType]
    [systemReqs]
    [swPlatform locale="" version="4.*" productLineID="esx"/]
    [swPlatform locale="" version="4.*" productLineID="embeddedEsx"/]
    [maintenanceMode install=”False”]true[/maintenanceMode]
    [/systemReqs]
    [postInstall]
    [rebootRequired]true[/rebootRequired]
    [hostdRestart]false[/hostdRestart]
    [/postInstall]
    [visorDestination]oem[/visorDestination]
    [pkgfile]oem-hp-smx-provider-410.02.07.70-260247.x86_64.rpm[/pkgfile]
    [softwareTags]
    [tag]cim[/tag]
    [/softwareTags]
    [vibFile]
    [sourceUrl/]
    [relativePath/]
    [packedSize]3299648[/packedSize]
    [checksum]
    [checksumType]sha-1[/checksumType]
    [checksum]e2afccb5301fad750640660bbcc10b255dfd8e8f[/checksum]
    [/checksum]
    [/vibFile]
    [/vib]
    [/vibList]
    [/bulletin]
    [/metadataResponse]
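The "find and replace" needed to turn the listing above back into XML can be sketched as follows. The function name is made up, and it assumes the file contains no literal square brackets of its own, which holds for the listing as posted. It also normalises any curly quotes the blog software may have introduced, then parses the result as a sanity check.

```python
import xml.etree.ElementTree as ET

# Typographic quote characters that blog software tends to substitute.
_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2033": '"'}


def restore_xml(bracketed_text):
    """Convert [tag]...[/tag] text back to <tag>...</tag> XML.

    Raises xml.etree.ElementTree.ParseError if the result is not
    well-formed, so a bad paste is caught before it reaches the host.
    """
    text = bracketed_text.strip()
    for bad, good in _QUOTES.items():
        text = text.replace(bad, good)
    text = text.replace("[", "<").replace("]", ">")
    ET.fromstring(text)  # well-formedness check only
    return text
```

Leading spaces on each line are kept as they are; only the outer whitespace is trimmed before parsing.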
