More on Transient VMs
In my article ("Virtualization Adoption Lifecycle") I introduced the concept of "Transient VMs".
At around the same time, I started hearing other people at VMware talking about "Transient VMs" – so, I'm not going to take credit for coining the term (unless our product management organization regularly reads vmMBA.com).
When I discussed this with a co-worker (who recently moved from the San Francisco Bay area), he said "yes, we really need to do something about those transients". (Probably funnier when you're sitting through day-long product update presentations like we were). No, I'm not talking about homeless people.
What are Transient VMs?
Consider two types of virtual machines:
- Traditional VMs: deployed with the intent that they remain powered on and managed indefinitely
- Transient VMs: deployed for a specific purpose, with a non-permanent lifespan; may or may not remain powered on constantly during that lifespan
Traditional VMs can be managed and priced in a way that is somewhat similar to physical servers. Although there are efficiencies gained through portability, replication, snapshots, and various automation touch points, a traditional VM is a server (or desktop) that must be managed on a day-to-day basis, and consumes resources even when it is idle.
Transient VMs give us a lot more flexibility in management and allow us to optimize resource utilization. We can take advantage of transient VMs in several ways today, and there are several technologies that are on their way that will make transient VMs even more prevalent.
Here are some examples:
Example 1: Legacy Application Servers
In my days in outsourcing, I came across a lot of servers that had long outlived their usefulness, but administrators were afraid to turn them off. Development servers for applications that had reached a stable state are often not used for years. Production servers for applications that had been sunset often have regulatory requirements to remain accessible.
Legacy servers are scary. They typically run on unsupported operating systems, which do not even have driver support for new servers. They are left in place, because no one even knows how to rebuild them. If we convert them to VMs, we immediately improve availability if they are left running (VMs are hardware-independent, run on newer hardware, and have HA built-in for VI3 Enterprise). Or, we can archive them, knowing we can recover the full state (configuration, OS, application, and data – not just data). We could even leave them powered off and keep them up to date on patches with Update Manager, with minimal human intervention.
Example 2: VMware Lab Manager
Traditionally targeted to the myriad of test lab sandboxes, Lab Manager gives a subset of control over to the development, test, and application management staff, allowing them to build entire application stacks from templates and collaborate on bug fixes (while IT still maintains templates and performs system and security management). Lab Manager is also evolving into a general-purpose tool for managing Transient VMs as customers find new and unique ways to use it (for example, in training labs).
Example 3: VMware Stage Manager
Announced at VMworld Europe, Stage Manager will allow application administrators to march an application through phases such as Unit Testing, Integration Testing, QA, Staging, and User Acceptance testing, while following prescribed change management processes, approvals, and archival, as applicable. Test systems can also be created as copies of production. The intent is to minimize configuration drift while also minimizing management costs (higher quality and lower costs, wow!) – and Transient VMs are a big part of it.
Example 4: VMware Lifecycle Manager
With the introduction of Lifecycle Manager last quarter, VMware fundamentally changed the way VMs can be requested and provisioned, and gave customers an easy option for setting an "expiry date" on VMs. Although the automated provisioning features are very helpful, the fact that VMs can have defined approvals, ownership, costing, and set expiry dates means ongoing infrastructure and management costs can be reduced significantly.
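To make the "expiry date" idea concrete, here is a minimal sketch of how an inventory of VMs with owners and expiry dates could be checked for candidates to archive or remove. It does not use or represent the Lifecycle Manager API; the `VmRequest` record, the `expired_vms` function, and the sample VM names and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class VmRequest:
    """Hypothetical record of a provisioned VM (not a Lifecycle Manager type)."""
    name: str
    owner: str
    expires_on: date


def expired_vms(requests, today):
    """Return the VMs whose expiry date is earlier than `today`."""
    return [r for r in requests if r.expires_on < today]


# Made-up sample inventory, for illustration only.
inventory = [
    VmRequest("patch-test-01", "ops", date(2008, 3, 1)),
    VmRequest("uat-crm-02", "apps", date(2008, 9, 30)),
]

for vm in expired_vms(inventory, today=date(2008, 6, 1)):
    print(f"{vm.name} (owner: {vm.owner}) is past its expiry date")
```

Because each VM carries an owner alongside its expiry date, a report like this can go to the person who requested the VM rather than to a system administrator.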
Example 5: Instant Test Servers
Most of the servers deemed "test" are used very infrequently. Let me differentiate "test" from "staging", "user acceptance testing" or "QA" servers: in this case, "test" servers are used by IT staff to test configuration changes, patches, or upgrades.
Whereas production servers are used constantly, and development/QA/staging/UAT servers are used in bursts of activity, test servers are typically used only before making a change on a production server. When we use traditional VMs (or physical servers), we still "manage" the server the whole time (monitor it, keep it up to date on patches, and troubleshoot when certain things go wrong), even though it is rarely used.
Conversely, we could create a test environment from a clone of production (or an image-level backup) when needed. We could even create multiple test environments to test multiple scenarios – thus improving our test quality. The overall result should be a lower management cost and higher quality of service.
Example 6: VDI
Changing gears: in many virtual desktop scenarios, user sessions can be defined as non-persistent. If user data and profiles are stored outside of the VMs themselves, and users do not self-install applications, "permanent" VMs aren't required. In these cases, the only image that is patched and managed is the master image – the non-persistent VMs can be destroyed at logoff (and re-created from the master for the next login).
How do we design for Transient VMs?
Transient VMs require a new way of looking at the way we build architectures for the data center. When faced with some key events such as migrations, refresh projects, or new implementations, we should evaluate whether 100% of migration candidates are actually needed. Some typical candidates for transient VMs include:
- Test servers: could point-in-time test servers (clones of production) be more useful than dedicated test servers?
- Development servers: would developers be better served with a flexible self-service environment that facilitates better collaboration (and would this ease the burden on server operations staff)?
- Legacy applications: are there servers that have no active users but are kept powered on to satisfy regulatory requirements or out of fear?
- Bursty workloads: are there applications that require a set of servers for brief periods in order to satisfy cyclical or intermittent workloads, such as tax season, end-of-year processing, or peak sales periods? Web and SOA applications that are componentized are usually good candidates for a more flexible approach with transient VMs. Additional web and application server VMs can be created as needed, added to the pool, and destroyed when no longer required.
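For the bursty-workload case, the sizing logic can be sketched in a few lines. This is only an illustration under simple assumptions: load is measured in requests per second, every VM in the pool handles the same load, and the function names and numbers are hypothetical rather than taken from any VMware product.

```python
import math


def vms_needed(requests_per_sec, capacity_per_vm, baseline=1):
    """Number of pool VMs needed for the load, never fewer than `baseline`."""
    if capacity_per_vm <= 0:
        raise ValueError("capacity_per_vm must be positive")
    return max(baseline, math.ceil(requests_per_sec / capacity_per_vm))


def pool_change(current_vms, requests_per_sec, capacity_per_vm, baseline=1):
    """Positive result: VMs to create. Negative result: VMs that can be destroyed."""
    return vms_needed(requests_per_sec, capacity_per_vm, baseline) - current_vms


# Peak period: 450 requests/sec, each VM assumed to handle 100 requests/sec.
print(vms_needed(450, 100))      # VMs required at peak
# After the peak: load falls to 120 requests/sec while 5 VMs are running.
print(pool_change(5, 120, 100))  # negative, so some VMs can be destroyed
```

The `baseline` parameter keeps a minimum number of VMs in the pool so that the application stays available between bursts.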
When we go through a server list to decide upon a migration plan, build plan, or refresh plan, we should be thinking about how transient VMs could be used to reduce costs and/or improve flexibility for IT operations, application developers, or the business units themselves.
How do we account for Transient VMs?
This is the challenging part. It's complex enough to determine the fixed, variable, and semi-variable costs, along with the shared and dedicated components when all of our VMs are powered on and running 100% of the time. Transient VMs have some unique features that affect the cost model:
- Infrastructure costs are not "free" for powered-off VMs. There needs to be enough reserve capacity to handle the maximum number of transient VMs that are powered on at one time. When the use of transient VMs comes in bursts (several at a time – such as during enterprise application performance testing activities), it may be easier to assume that 100% of their capacity is required at all times
- Energy costs may be reduced if the number of powered-on hosts can be easily managed based upon changing workloads (e.g. using Distributed Power Management)
- One-time costs should be minimized: if system administrator intervention is required for each power-on and power-off (and/or archive) operation, it can hurt the value proposition for transient VMs. Automation and self-provisioning can help
- Storage costs can be better managed with an Information Lifecycle Management strategy for transient VMs to move them off to low-cost storage when not in use. This is made easier with Storage VMotion, or with storage virtualization technologies such as EMC Invista, Hitachi's USP, or IBM's SAN Volume Controller
- Monitoring costs can be lower, but there must be a streamlined way to update the monitoring tools when VMs are archived or removed (so that a planned removal does not trigger a false-positive downtime event). In many cases (e.g. Lab Manager), the VMs themselves may be unmonitored
- Management costs should be lower for transient VMs. Automation helps. Lab Manager, for example, is a self-service environment, and the VMs themselves are often "unmanaged". With Lifecycle Manager, many of the typical provisioning and configuration workflows are automated. VMware Update Manager can patch VMs even when they are offline. Even without automation, a VM that is powered on only infrequently should be easier to manage than one that is continuously powered on (as long as server operations are streamlined for transient VMs). Even better are VMs that are truly temporary and created for a specific short-term purpose (e.g. a clone of production for a temporary test activity).
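The reserve-capacity point in the first item above can be worked through with a small calculation: given the hours during which each transient VM is expected to be powered on, find the largest number running at the same time and size the reserve for that peak. The sketch below is illustrative only. The schedule, the 4 GB per VM figure, and the function names are assumptions, and a real model would also account for CPU, storage, and licensing.

```python
def peak_concurrency(intervals):
    """Largest number of VMs powered on at once.

    `intervals` is a list of (power_on_hour, power_off_hour) pairs. A VM
    powered off at hour t does not overlap one powered on at hour t.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # power-on adds a running VM
        events.append((end, -1))    # power-off removes one
    # Tuples sort by hour, then by delta, so a power-off (-1) is processed
    # before a power-on (+1) that happens at the same hour.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


def reserve_memory_gb(intervals, memory_gb_per_vm):
    """Memory to hold in reserve for the busiest moment in the schedule."""
    return peak_concurrency(intervals) * memory_gb_per_vm


# Hypothetical one-day schedule for four transient test VMs.
schedule = [(9, 12), (10, 14), (11, 13), (13, 17)]
print(peak_concurrency(schedule))      # VMs running at the busiest moment
print(reserve_memory_gb(schedule, 4))  # GB of memory to reserve
```

If usage comes in unpredictable bursts, no schedule like this exists, and assuming that every transient VM may be powered on at once is the safer simplification.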
At some point in the future, I'd like to build a cost model that addresses transient VMs. If anyone has any that they can share with me, please forward.
It may seem counter-intuitive that VMware is finding ways to reduce the number of VMs in a customer's environment. The expectation is that Transient VMs will do much more than previous waves of virtualization to transform the way infrastructure is designed, built, and managed, and thus move more servers (and desktops) over to a virtual world than is possible with standard processes.




