Thanks to vinf.net for giving me a dig in the ribs to finally get around to reviewing whether the HP ML110 G4/G5 (Dual Core) and ML115 G5 (Dual & Quad Core) servers are compatible with VMware vSphere’s new Fault Tolerance (FT) feature.
For those of you unfamiliar with vSphere’s Fault Tolerance here is a description from the VMware site which describes how Fault Tolerance helps you … “Maximize uptime in your datacenter and reduce downtime management costs by enabling VMware Fault Tolerance for your virtual machines. VMware Fault Tolerance, based on vLockstep technology, provides zero downtime, zero data loss continuous availability for your applications, without the cost and complexity of traditional hardware or software clustering solutions.”
So enough of the marketing speak – what does this actually mean?
As you probably already know, VMware’s High Availability (HA) functionality restarts VMs on another ESX host in a predefined cluster if the ESX host they were running on fails due to a hardware fault or similar.
This is a useful feature and minimises downtime, though there is a period, usually a couple of minutes, where the VMs from the failed ESX host will be offline whilst they start up on another working host in the cluster.
Fault Tolerance is the next natural step in the HA offering. With a couple of mouse clicks in vCenter Server a VM can be protected from ESX host failure, and it won’t experience the same period of downtime as when using HA. It achieves this by establishing and maintaining an active secondary VM (on a separate physical ESX host) which executes the same sequence of virtual guest instructions as the primary VM. If the primary VM’s ESX host fails then the secondary VM picks up where the primary left off, and a new active secondary VM is created (if possible) on another ESX host, thereby maintaining this high level of resilience.
For a more detailed look into the workings of VMware’s Fault Tolerance take a look at this VMware white paper. So as you can see, FT is great for those business critical applications that need maximum uptime and that perhaps aren’t developed to be run in a traditional Microsoft type cluster.
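To make the primary/secondary idea a little more concrete, here is a toy Python sketch. It is only an illustration of the general principle (two copies that process the same inputs deterministically stay identical, so either can take over); it is not how VMware implements vLockstep, and the names ToyVM, ToyFTPair, deliver and fail_primary are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class ToyVM:
    """A deterministic 'guest': its state depends only on the inputs it has seen."""
    name: str
    state: int = 0

    def execute(self, event: int) -> None:
        # Same starting state + same event always gives the same new state.
        self.state = (self.state * 31 + event) % 1_000_003


@dataclass
class ToyFTPair:
    """Hypothetical stand-in for a protected VM and its shadow copy."""
    primary: ToyVM
    secondary: ToyVM

    def deliver(self, event: int) -> None:
        # The primary handles the input, and the same input is passed on
        # to the secondary (the job of the FT logging link in the real thing).
        self.primary.execute(event)
        self.secondary.execute(event)

    def fail_primary(self, spare_name: str) -> None:
        # The secondary takes over where the primary left off...
        self.primary = self.secondary
        # ...and a fresh secondary is started from the survivor's state.
        self.secondary = ToyVM(spare_name, state=self.primary.state)


pair = ToyFTPair(ToyVM("primary"), ToyVM("secondary"))
for event in (7, 42, 3):
    pair.deliver(event)

pair.fail_primary("new-secondary")   # simulate losing the primary's host
pair.deliver(99)                     # work continues with no state lost
print(pair.primary.state == pair.secondary.state)
```

The point of the sketch is simply that nothing has to be "restarted" on failover: the surviving copy already holds the current state, which is what separates FT from HA.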
There is, however, an overhead in running FT: for example, a dedicated 1Gb connection between the ESX hosts in the cluster is required. For most ESX installations this would mean you’d need 4 NICs (Service Console, General VM Traffic, VMotion and Fault Tolerance logging).
Below is a good video demonstrating FT in action – worth a watch.
Key Features of Fault Tolerance
The following are some key features of VMware Fault Tolerance as provided in VMware’s FT white paper:
• Runs on standard x86 based servers, vendor neutral.
• Supports standard unmodified guest operating systems and
• Protects dozens of guest operating systems already supported by ESX, including 32- and 64-bit Windows, Linux, Solaris, and many other legacy guests.
• x86 hypervisor-based solution; integration with virtual machine technology, operating system neutral.
• Support for all emerging applications frameworks that have not yet evolved their own clustering solutions.
• Support for existing virtual machines.
• Single image management: virtual machine is installed and managed in the usual way as a single image; no need for additional operating system and software licenses.
• vLockstep guarantees: the primary and secondary execute exactly the same x86 instruction sequences.
• Transparent failover with no data or state loss in the virtual machine; all state, including storage, memory, and networking is preserved even in the face of catastrophic hardware failures.
• Potential different physical locations for primary and secondary to guard against campuswide or buildingwide failures.
• Automatic re-establishment of fault tolerance after hardware failure.
• Integration with HA and DRS that are responsible for selecting a new secondary host after a failure; no manual steps during failover or after recovery.
• Failing systems can be returned to the HA cluster after repairs without any additional FT reconfiguration.
• Component failover when combined with network teaming and storage multipathing.
• No additional installation; FT is a built-in feature of VMware
• Mixing FT and non-FT virtual
Current Fault Tolerance Limitations
There are, however, a few limitations:
Compatibility – VMware Fault Tolerance will only run with the following models of CPU and above:
This is because these CPU models have the additional physical processor extensions required by the vLockstep technology.
It’s in the family – When using FT in a cluster, all physical ESX hosts must be running CPUs from the same family. For more information on this see Eric Sloof’s informative posting here, which has an easy-to-follow table.
vCPU – FT, at this stage, will only work with VMs that have a single vCPU, though expect to see multi-vCPU compatibility in the future.
Thin Provisioning – FT is unable to protect VMs that are running thin provisioned disks.
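As a rough way of keeping these limitations straight, the Python sketch below turns them into a simple pre-flight check. It is purely illustrative and does not talk to vCenter or use any VMware API; the Host and VM classes, their fields and the ft_blockers function are all hypothetical, and you would need to fill in the values yourself (for example from the SiteSurvey output or your own inventory).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    name: str
    cpu_family: str       # free-form label taken from your own inventory
    ft_capable_cpu: bool  # True if the CPU is on VMware's FT-compatible list


@dataclass
class VM:
    name: str
    vcpus: int
    thin_provisioned: bool


def ft_blockers(vm: VM, hosts: List[Host]) -> List[str]:
    """Return the reasons (if any) why this VM could not be FT protected."""
    problems = []
    if vm.vcpus != 1:
        problems.append(f"{vm.name}: FT currently needs a single vCPU")
    if vm.thin_provisioned:
        problems.append(f"{vm.name}: thin provisioned disks are not supported")
    capable = [h for h in hosts if h.ft_capable_cpu]
    if len(capable) < 2:
        # The secondary VM has to live on a separate physical host.
        problems.append("need at least two hosts with FT-compatible CPUs")
    if len({h.cpu_family for h in hosts}) > 1:
        problems.append("hosts in the cluster mix CPU families")
    return problems


hosts = [Host("esx1", "family-a", True), Host("esx2", "family-a", True)]
for problem in ft_blockers(VM("sql01", vcpus=2, thin_provisioned=True), hosts):
    print(problem)
```

An empty list means none of the limitations above apply; it says nothing about the other requirements, such as the dedicated FT logging NIC.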
So can I run it on my HP ProLiant ML110 or ML115?
From the list of compatible CPUs provided by VMware (see details in table above) we can see that there is only one potential candidate in the existing ProLiant ML110 and ML115 line-up that will work with the new FT feature: the newest model in the range, the ML115 G5 Quad Core (1352).
To confirm this, and to demonstrate that the other models are not compatible, I used VMware’s free SiteSurvey tool. If you’re not 100% sure whether your server is compatible with FT or other advanced VMware features then download this useful utility and run it against your cluster.
As you can see, the ML115 G5 Quad Core with the 1352 CPU is the only ML110/115 model that will be capable of using FT.
Now I just need some extra spare cash to buy another one so I can test.