One of the things holding many companies back from moving their tier 1, business-critical applications across to VMware’s vSphere virtualized platform has been its inability to provide fault tolerance (FT) for virtual machines (VMs) running more than one virtual CPU (vCPU), which is a requirement for many applications, particularly those of the tier 1 variety.
Up until vSphere 6, it was only possible to provide real-time fault tolerance for single-vCPU VMs. What this meant for FT-enabled VMs, and the applications running on them, is that they could sustain a host-level failure: a shadow copy of the VM was run on a second physical ESXi host and constantly kept in sync (vCPU and vRAM state, etc.). If the primary instance of the VM failed, due to a hardware fault on the underlying ESXi host or similar, the secondary/shadow instance would automatically become the primary, providing little to no disruption to the end user. A new secondary instance would then be created on another functioning ESXi host in the cluster. This VM-level FT could be achieved within vSphere with a few clicks of the mouse, much easier (and often cheaper) than trying to configure most application- or OS-level clustering software.
Of course, this is a very high-level view of what happens in an FT-enabled instance, and it really doesn’t do justice to the phenomenal amount of architecting that has gone into ensuring that the vCPU and vRAM states are kept in sync between the primary and secondary FT instances. As you can probably imagine, keeping that sheer amount of data in check, with the timing demands surrounding it all, is no small feat!
Now… that is for a single-vCPU VM instance; imagine the fine balancing act around the timing of vCPU and vRAM state across the primary and secondary FT-enabled VM instances when you start introducing multiple vCPUs into the mix! Making this shift from single-vCPU to multi-vCPU FT, or SMP-FT as it is now called, has been the challenge facing VMware’s vSphere architects and programming teams.
Amazingly, with the announcement of VMware vSphere 6, this multi-vCPU FT functionality becomes a reality, with the ability to have VMs with up to 4 vCPUs (and 64GB of memory) protected in an FT configuration. In fact, VMware have rewritten how FT is handled, dropping the old “lock-step” approach in favour of a new “fast checkpointing” method.
This single-vCPU roadblock has now been removed, which will almost certainly see many businesses start to virtualize their heavier-lifting, tier 1 applications and VMs, bringing them a step or two closer to running a 100% server-level virtualized environment.
I should also mention that, as you can probably imagine, the amount of vCPU instruction data being passed between ESXi hosts has increased significantly, so there is now a requirement for 10Gb network connectivity between the ESXi hosts. This is a consideration, with potential financial implications, for those businesses thinking of implementing SMP-FT.
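For capacity-planning purposes, those limits can be captured in a simple pre-flight check. The sketch below is purely illustrative — the function name and parameters are my own, not part of any VMware API — but the constraints themselves (4 vCPUs, 64GB of memory, 10Gb FT networking) are the vSphere 6 SMP-FT figures discussed above:

```python
# Hypothetical SMP-FT eligibility pre-check for vSphere 6.
# Limits assumed from the vSphere 6 announcement: max 4 vCPUs,
# max 64GB memory, and 10Gb networking between ESXi hosts.

def smp_ft_eligible(num_vcpus: int, memory_gb: int, ft_nic_gbps: int) -> list:
    """Return a list of reasons a VM fails the basic vSphere 6 SMP-FT
    limits; an empty list means it passes this simple check."""
    problems = []
    if num_vcpus > 4:
        problems.append(f"{num_vcpus} vCPUs exceeds the 4 vCPU SMP-FT limit")
    if memory_gb > 64:
        problems.append(f"{memory_gb}GB exceeds the 64GB memory limit")
    if ft_nic_gbps < 10:
        problems.append("FT logging network is below the 10Gb requirement")
    return problems

# A 2 vCPU, 16GB VM on hosts with 10Gb FT networking passes...
print(smp_ft_eligible(2, 16, 10))   # → []
# ...while an 8 vCPU, 96GB VM on 1Gb networking fails on all three counts.
print(smp_ft_eligible(8, 96, 1))
```

Obviously the real compatibility checks performed by vCenter go far beyond this, but it gives a feel for where the new boundaries sit.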
How does SMP-FT affect your business, and what use cases does it open up? I’m thinking the easy clustering of vCenter Server VMs (which have a minimum requirement of 2 vCPUs) would be a good one. Leave a comment below — we’d love to hear what you think.