VMware ESXi vSwapping with SandForce SSDs

If, like me (James), your lab server is struggling along maxed out with only 8GB of RAM, disk IO can be a real problem because of guest paging and vSwapping.

I should really upgrade the box, but that means ditching the trusty ML115 – and in any case, the whole point of ESX is to do more with less.  There are also still many machines appearing with a maximum RAM capacity of 8GB, such as HP’s new MicroServer.  So I’ve looked at other options.

Memory Over-commitment

ESXi 4 achieves RAM over-commitment with page sharing, ballooning and vSwapping, and v4.1 adds memory compression.  Yet whenever my box is over-committed to any serious extent, my MediaWiki VM, for example, takes minutes to spew out a page, and any audio streaming just stops.

The disks are the issue – a 4-drive RAID-10 volume for everything, provided by a Perc 5i RAID controller.  A pretty harsh test is firing up a 3GB VM with the box already running at 90% RAM – ballooning triggers guest-level paging and the SATA array clatters away at something like 500 IOPS to service it.  With vSwapping doing more of the same, the controller queue depth of 128 commands pushes latency to a whopping 250ms, with one inevitable consequence – everything pretty much grinds to a halt.
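
For anyone wanting to see the same figures on their own box, the latency and queue numbers above are the sort of thing esxtop reports in its disk views.  A rough sketch, run from the Tech Support Mode console (or resxtop remotely); exact columns vary a little by version:

esxtop
# press 'd' for the disk adapter view, 'u' for individual devices
# DAVG/cmd  - device latency in ms (the ~250ms seen here under swap load)
# KAVG/cmd  - time commands spend queued in the VMkernel
# ACTV/QUED - commands active at, and queued behind, the controller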

SandForce SSDs

I’ve had an eye on SSDs since reading this VMware Communities article on vSwapping to SSD at the start of the year.  The concept is simple enough – use an SSD for vSwapping, which can respond 50 times quicker than a mechanical disk.  The problem is that SSDs have been pretty expensive, and many of the more affordable drives have awful 4K random write performance (slower than a mechanical disk in some cases).

Simon wrote on SSDs earlier this year, which inspired me to look again – and led me to the line of drives appearing with SandForce controllers, one of which I’ve been testing in my ML115.

Running an OCZ Vertex II in the ML115 G5

At the time of writing there is a compatibility issue between the SandForce 1200 controller and the nVidia MCP55 SATA controller used in the ML115 G5.  The drive is detected in the BIOS, but ESXi doesn’t see it, so I’ve hooked it up to the Perc 5i instead.  My box is pretty full, so I’ve made do with slotting the SSD into one of the PCI card supports with a screw – hardly ideal, but the box isn’t going anywhere.

SSDs on Dell Perc RAID Controllers

The 5i is an old design and lacks SATA NCQ.  The controller works fine with the SSD, but performance is sub-optimal (about 7,000 4K random IOPS) so I swapped out the 5i for a 6i since it’s a drop-in replacement – it’s a bit faster, uses less power and has SSD and SATA NCQ support.  Note that in the ML115 G5, the on-board SATA ports need to be disabled to get into the card’s <CTRL-R> BIOS utility.

NCQ enables the controller to pass multiple commands to the SSD, making use of the drive’s internal parallelism and so boosting throughput.  The 4K IOPS test (50% write) jumped to 14,700 on the 6i – quite simply stunning compared to mechanical storage, and just what’s needed for vSwapping!
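
For reference, the IOPS figures quoted here come from a 4K random workload at a 50% write mix.  A tool such as fio can run that sort of test with something along these lines, from a Linux VM sitting on the volume under test (the file path, size and run length are just placeholders):

fio --name=4k-randrw --filename=/mnt/testvol/fio.dat --size=4g \
    --rw=randrw --rwmixwrite=50 --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=32 --runtime=60 --time_based --group_reporting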

NCQ also gives the SATA RAID-10 array a 30% speed boost.  The 6i does struggle with sequential writes to the SandForce SSD, levelling out at about 60MB/s.  Neither OCZ nor Dell could help unfortunately, but it’s not a concern for my random-IO use anyway.

Configuring vSwapping to run on SSD

With the SSD working OK on the 6i, a datastore can be created in the usual way and then used for vSwapping (see the ‘Virtual Machine Swapfile Location’ page in the vSphere client).  VMs then need to be suspended and resumed for the change to take effect.
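
The client page does the work for you, but for reference the same thing can be set per VM with the sched.swap.dir option in the .vmx file.  The datastore and folder names below are just examples, and the VM still needs that suspend/resume (or a power cycle) to pick the change up:

# example .vmx entry pointing the vSwap file at the SSD datastore
sched.swap.dir = "/vmfs/volumes/ssd-datastore/myvm"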

vSwapping is a very blunt stick though – ballooning generates less disk IO and usually has less impact on VMs, because the VM’s own memory management is more intelligent.  It knows about areas that shouldn’t be paged and RAM that hasn’t been used recently, whereas vSwapping just chooses a bunch of pages at random and swaps them out (and stalls the VM while it does so).

So it seems to me that guest swap space should also be on the SSD: add a thick-provisioned disk on the SSD datastore to each VM, create a 64K-aligned partition on that disk inside the guest, and finally move the swap file or partition onto it (see here for Linux info).  The downside is the amount of space required – essentially twice the VM’s RAM allocation.
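
As a sketch of the Linux side, assuming the new SSD-backed disk shows up as /dev/sdb inside the guest (the device name and swap priority are just illustrative):

fdisk -u /dev/sdb        # create one primary partition starting at sector 128 (64K aligned)
mkswap /dev/sdb1
swapoff -a               # retire the old swap area
swapon /dev/sdb1
echo "/dev/sdb1  none  swap  sw,pri=10  0  0" >> /etc/fstab    # and remove or deprioritise the old swap entry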

Does the SSD Deliver?

It’s already been demonstrated that SQL Server performs quite well when swapping to SSD, but that was in an environment with much more RAM in the first place and an enterprise-grade SSD.  My testing is less scientific, but here’s what I’ve found.

My test box typically runs with about 7GB used, so generating memory over-commitment isn’t hard – just starting a vSphere lab does the trick: two ESXi VMs, vCenter Server, a Windows domain controller and a virtual router together take the RAM load to about 15GB.  Ramping that up in a short space of time is a harsh test, and without the SSD it completely stalled the box – RDP sessions were dropped, audio streaming died and web servers appeared offline.

With the vSwap and guest paging all running from the SSD, the host survives the test with some stuttering on audio streaming.  Response from web servers on a ‘first page out’ basis after the test seems to vary from about nine to 30 seconds.  An active session to a VM running Photoshop during the test was a bit patchy but mostly usable.

Once the system stabilises, however, everything performs pretty well, even with RAM initially truly maxed out.

Being a home lab server, it hasn’t got hundreds of users pounding every VM so the system hangs together pretty well.  Essentially I guess that provided the active memory is comfortably within the physical RAM, performance should hold up.

SSD Swap and RAM Compression

With RAM compression disabled, the swap rate peaked at 36MB/s, with the ongoing rate depending entirely on the load.

I was expecting RAM compression to help a bit, but was surprised by how much.  RAM compression consistently reduced the swap rate by well over half with my ‘keen’ settings, but did seem to slow things down quite a bit too:

mem.MemZipAllocPct – 50
mem.MemZipLowMemMaxSwapOut – 50
mem.MemZipBalloonXferPct – 30
mem.MemZipMaxRejectionPct – 10
mem.MemSwapSkipPct – 75

Tweaking MemZipAllocPct and MemZipLowMemMaxSwapOut to 25% seems to provide a happy balance of swap and RAM compression throughputs.  Reducing the write load on the SSD is a good thing anyway, given the limited write-cycle life of these drives.
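
These settings live under Advanced Settings > Mem in the vSphere client; for reference they can also be read and set from the Tech Support Mode console with esxcfg-advcfg, something like the following (verify the option paths on your own build):

esxcfg-advcfg -s 25 /Mem/MemZipAllocPct
esxcfg-advcfg -s 25 /Mem/MemZipLowMemMaxSwapOut
esxcfg-advcfg -g /Mem/MemZipAllocPct      # -g reads a value back to confirm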

With this configuration, creating sudden memory pressure by starting a 3GB VM seems to let the system work everything much harder – the SSD peaks at nearly 50MB/s and compression at 30MB/s.  As I write this, a WHS VM is performing de-duplicated backups, I’m installing vCenter, and audio streaming is continuing pretty well.

In the lab environment some attention is also needed to mem.IdleTax, since it is not always desirable for idle VMs to be more heavily paged.
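
Again just as an illustration, lowering the idle tax reduces how aggressively idle VMs’ memory is reclaimed (75 is the default as I understand it):

esxcfg-advcfg -g /Mem/IdleTax             # check the current value
esxcfg-advcfg -s 25 /Mem/IdleTax          # example: tax idle memory less aggressively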

In Summary

The SandForce SSDs completely change the storage dynamic for small offices and home labs – a single SSD provides three times the random (swap/database) throughput of the EqualLogic PS4000 reviewed on TechHead previously, at 1/200th of the power consumption.

Paging memory to memory is hardly new (remember EMS?), but for ESX, using SSD for swapping enables much higher RAM over-commitment (and hence VM density).  Echoing the earlier VMware Communities blog on the subject, vSwapping to SSD is something VMware should, it seems, be looking at supporting formally, for example by adding TRIM support to the vSwap configuration (to maintain SSD performance) and allowing any queue-depth throttling to be overridden for a vSwap datastore.

The quick RAM-loading test performed here proves that there is no substitute for real RAM, but with workloads more in line with the box’s capacity (each VM allocated perhaps 10% of physical RAM), everything holds up without a hitch even under some serious over-commitment.

I found the best performance with guest paging and vSwapping on the SSD and RAM compression enabled.  The balloon driver was able to ramp up recovered RAM more quickly than vSwap, and VM responsiveness improved because spinning-storage latencies were no longer affected by host-level swapping from multiple VMs.

The bottom line – adding the SSD has significantly increased the VM capacity of my ML115, but for how long remains to be seen.

_________________________________________________

About the Author

Author: James Pearce.  James is a regular guest contributor to TechHead and a Kent-based qualified accountant, currently working in information security and technical architecture, with most of his time “being spent on virtualisation and business continuity at the moment”.  Check out his virtualisation and storage blog here for more interesting and informative posts.

_________________________________________________

 
