From Virtual Bridges blog
In the world of IT, marketing folks like to come up with interesting and “clever” terms to describe things. That’s why IT admins like to play IT Buzzword Bingo at trade shows during keynote presentations or breakouts. As long as no one yells out BINGO during the session, everyone wins!
Sometimes, these colorful words make sense. Other times, not so much. But either way, they usually at least sound good. And if you weave in an acronym, that’s like double word score in Scrabble.
Today, we’re talking about a buzzword that gets tossed around a lot during VDI and desktop virtualization discussions: Boot Storm.
But, as you might have guessed, we’re not talking about the weather and we’re not talking about footwear. So what the heck is a boot storm?
Let’s start with a little background information. When you create a VDI environment, you typically go through a storage-sizing exercise based on anticipated capacity and performance requirements. The exercise aims to capture the steady-state of virtual machine operation so that you can be sure you have the hardware in place to support it. But what happens when steady-state isn’t the norm?
What happens is a boot storm, which is the situation that arises in a VDI environment when a large number of virtual desktops “boot up,” or power on, either at the same time or within seconds of one another. Hundreds or thousands of virtual desktops all powering on at, or near, the same time (say, 8:00 am Monday morning) can cause a huge drag on network throughput, storage I/O and host server performance.
But boot storms aren’t limited to the morning or the start of a day. They also happen during shift changes, or following a system-wide update (e.g. Windows Patch Tuesday or a major anti-virus definition update) that forces all machines to reboot at the same time.
Boot storms, in other words, are not isolated to a single event or a particular time of day. And whenever they occur, they can cause major I/O problems: disks run hot, which leads to queuing, which leads to longer boot times, which leads to slower virtual desktop environments, which leads to unhappy end users and a black eye for VDI in general.
Over the years, boot storms have led to the abandonment of plenty of VDI projects. And that’s unfortunate, because boot storms should not occur at all. They are typically the result of a poorly designed or under-designed VDI implementation. Choosing the right VDI solution, implementing the right architecture, and starting with the right hardware technology will allow you to take on the dreaded boot storm issue.
A Closer Look at Boot Storms
Consider a midrange laptop running Windows 7. It might generate five IOPS under regular, steady-state use, a common industry average. But during boots, anti-virus scans, logins and application launches, this number can easily increase to 50 IOPS. Multiply that by 100 or 1,000 VDI users, and the result is a disastrously large load on the network and a huge Tier 1 disk spindle requirement. Sized for that peak rather than for steady state, a conventional VDI solution would typically require five to ten times the number of spindles and bus bandwidth.
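The multiplication above can be sketched in a few lines of Python. The per-desktop figures (5 IOPS steady, 50 IOPS peak) are the ones quoted in the paragraph; the constant and function names are made up for illustration, and real workloads will vary.

```python
# Per-desktop figures quoted in the text (rules of thumb, not measurements).
STEADY_IOPS = 5
PEAK_IOPS = 50


def aggregate_iops(users, per_desktop_iops):
    """Total IOPS demanded if every desktop draws the same amount at once."""
    return users * per_desktop_iops


for users in (100, 1000):
    steady = aggregate_iops(users, STEADY_IOPS)
    peak = aggregate_iops(users, PEAK_IOPS)
    print(f"{users} users: {steady} IOPS steady, {peak} IOPS at peak")

# How much more storage performance the peak needs than steady state.
peak_to_steady = PEAK_IOPS // STEADY_IOPS
```

With these figures the peak is ten times the steady-state load, which is the top of the five-to-ten-times range mentioned above.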
Back-of-the-napkin math suggests that a NAS storage device with enough spindles to provide a solid 5,000 IOPS should be enough to handle your inbound I/O requests, right? If you’re running 100 VMs in that environment, each VM gets the 50 IOPS it needs at peak, which is more than enough for steady state. When you grow that environment to, say, 500 VMs, you’ve decreased your share to 10 IOPS per VM.
That still might be enough for steady state, but what happens when a subset of those VMs needs to reboot simultaneously (for example, if a physical server hosting 100 VMs goes down and comes back up) and lays claim to that I/O in order to boot in a timely manner? That’s no longer steady state. Instead, you have a host of machines battling for I/O. Boot times stretch out, and the booting machines take I/O away from the other running VMs, which are starved below their “normal” I/O requirements. Application performance drops, and everyone ends up suffering.
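To put rough numbers on that scenario, here is a small Python sketch. It reuses the figures from the text (a 5,000 IOPS pool, 500 VMs, 100 of them rebooting at 50 IOPS each while the rest want 5 IOPS). The assumption that the array shares its IOPS proportionally across all requesters is ours, purely for illustration; real storage systems queue and prioritize in their own ways.

```python
def storm_share(pool_iops, booting_vms, steady_vms, boot_iops=50, steady_iops=5):
    """Return (total demand, fraction of that demand the pool can satisfy).

    Assumes proportional sharing: when demand exceeds the pool, every VM
    gets the same fraction of what it asked for. This is a simplification.
    """
    demand = booting_vms * boot_iops + steady_vms * steady_iops
    fraction = min(1.0, pool_iops / demand)
    return demand, fraction


# 100 VMs reboot at once while the other 400 keep working.
demand, fraction = storm_share(pool_iops=5000, booting_vms=100, steady_vms=400)

print(f"demand: {demand} IOPS against a 5000 IOPS pool")
print(f"each VM gets about {fraction:.0%} of what it asked for")
print(f"booting VM: {50 * fraction:.1f} IOPS, steady VM: {5 * fraction:.1f} IOPS")
```

Under these assumptions the booting VMs alone ask for the entire pool, total demand comes to 7,000 IOPS, and every VM, booting or not, ends up short.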
Now that’s a boot storm!
Solving the Boot Storm Challenge
There are short-term workarounds to this issue, but they don’t fully address the problem. One way to prevent boot storms is to stagger the powering-on of virtual machines. This works well enough for a server virtualization environment where the virtual machines are static and often have 99.99% uptime anyway, and so remain powered on pretty much constantly. But you can’t typically control when your end users, or the system, will decide to power on a desktop VM.
Instead, organizations may choose to throw more spindles and bus bandwidth at the problem in the hopes of mitigating performance hits during boot storms. But given the architecture and the number of images involved, adding storage to a NAS or SAN further increases the cost of a VDI deployment.
Another solution is to use solid-state disks (SSDs) to handle the high levels of I/O requests. While that would likely solve most boot storm problems, the cost can often prove prohibitive to implementing a VDI environment. SSDs are not cheap compared with traditional mechanical disks, and their cost may outweigh the benefits of doing VDI in the first place.
You could also use third-party, add-on cache-based solid state storage. Again, while this type of solution will usually help solve the problem, the additional cost may not be justified when it addresses such a small percentage of I/O challenges or use cases.
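For the cases where you do control power-on (a planned reboot after patching, say), staggering can be sketched as a simple wave planner. This is a minimal illustration, not a feature of any particular VDI product: the function name is hypothetical, the 5 and 50 IOPS figures are the rules of thumb used earlier, and it assumes each wave finishes booting and settles to steady state before the next one starts.

```python
def plan_boot_waves(vm_count, pool_iops, running_vms, steady_iops=5, boot_iops=50):
    """Split vm_count boots into waves that fit the pool's spare IOPS.

    Returns a list with the number of VMs to power on in each wave.
    """
    waves = []
    remaining = vm_count
    while remaining > 0:
        # Spare capacity after the already-running desktops take their share.
        headroom = pool_iops - running_vms * steady_iops
        wave = min(remaining, headroom // boot_iops)
        if wave <= 0:
            raise ValueError("no spare IOPS to boot another VM")
        waves.append(wave)
        remaining -= wave
        running_vms += wave  # booted VMs now draw steady-state IOPS
    return waves


# 100 desktops to boot, 400 already running, 5,000 IOPS available.
waves = plan_boot_waves(vm_count=100, pool_iops=5000, running_vms=400)
print(waves)
```

In this example the 100 desktops come up in two waves instead of one, so the running desktops keep their steady-state I/O throughout.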
And finally, technologies like Virtual Bridges StorageOptimizer™ with CacheIO™ can address high-use issues such as boot storms and large application launches directly, without the need for additional hardware or software, by offloading most IOPS from the NAS/SAN.