Some Guiding Principles for Chargeback with Server Virtualization (Part 1)


I spend a lot of time talking to outsourcers about various methods for pricing virtual servers. Some key themes have emerged over the past few years that seem to make for more effective chargeback scenarios. Success can be measured in various ways and often depends on the vantage point (customers are less interested in whether an outsourcer makes a profit, but invariably anything other than "win-win" will fall apart over time).

These principles are in no particular order, and are by no means an exhaustive list. I welcome comments on some real-world scenarios that you all may be experiencing.

 

1.    Incentives are as important as cost recovery

Many chargeback models are focused simply on cost recovery. This is a missed opportunity, and it diverges from the way products are priced in the "real world". Companies that sell goods and services of all sorts set their pricing not only to recover costs and make a decent profit, but also to affect the behavior of customers, partners, and competitors.

It should be no different when setting prices in an IT chargeback framework. If we want to encourage our customers to use standard services (instead of custom) and to adopt virtualization in a widespread way, we should set prices to encourage that behavior. Sometimes this strategy may diverge from a simple "cost plus" method in the short term, but the goal of moving a customer toward standardized, modern services is often worth the short-term sacrifice.

Chargeback methods also create incentives for the service provider. For example, a "one size fits all" VM price that assumes a certain average capacity will discourage the use of "large" VMs. The service provider will make it easy for customers to use VMs that are sized under the calculated average, but will make life difficult for anyone wanting a VM that is larger than the average.

 

2.    Separate infrastructure costs from services costs

Most successful chargeback methods do not co-mingle the cost of data center capacity (CPU, Memory, Storage, and Facilities) with the labor and tools to manage individual application and OS instances. An extreme example of the reason for this is the difference between 10 VMs with 1 GB RAM each and 1 VM with 10 GB of RAM. Each has a total of 10 GB of RAM capacity, but 10 VMs are usually much more work to manage than a single, large VM.

Thus, in the interest of fairness and to avoid adverse selection, it's best to separate the management costs from infrastructure costs in a chargeback model.
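
To make the 10-VM versus 1-VM example concrete, here is a minimal sketch in Python. The two rates are made-up numbers for illustration only, and the per-GB and per-instance split is just one plausible way to separate the charges.

```python
# Hypothetical monthly rates, chosen only for illustration.
INFRA_RATE_PER_GB = 20.0         # infrastructure charge per GB of RAM
MGMT_RATE_PER_INSTANCE = 150.0   # labor and tools charge per managed OS instance


def monthly_charge(vm_ram_gb):
    """Return (infrastructure, management) charges for a list of VM RAM sizes."""
    infrastructure = sum(vm_ram_gb) * INFRA_RATE_PER_GB
    management = len(vm_ram_gb) * MGMT_RATE_PER_INSTANCE
    return infrastructure, management


ten_small = monthly_charge([1] * 10)   # ten VMs with 1 GB of RAM each
one_large = monthly_charge([10])       # one VM with 10 GB of RAM

print("10 x 1 GB:", ten_small)
print("1 x 10 GB:", one_large)
```

Both customers pay the same infrastructure charge, but the customer with ten instances pays ten times the management charge. A single blended per-GB rate would hide that difference.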

 

3.    Consumption-based pricing isn't always best

There are several tools that are emerging that allow organizations to measure capacity utilization to a very granular level. I am not convinced that charging for utilization is appropriate for most organizations - yet. There are three potential problems with offering consumption-based pricing:

  1. If a service provider is to recover costs based on capacity utilization, it must also recover the costs of unused capacity. Unless workload requirements between different customers or business units are complementary (i.e. spikes occur at different times), unused capacity is usually burdened into the usage charge. Because the service provider
  2. Infrastructure utilization spikes can occur for many reasons, and not all of them are directly correlated with business demand. Examples include poorly written code and poorly implemented management tools, either of which can cause some serious finger-pointing when the customer receives the bill. Consumption-based pricing can only be successful if it is clear that these issues can be avoided.
  3. With the exception of true "utilities" like power and water, most core services that a business receives come at a predictable price, and changes to the price are usually well correlated with actual business demand. The switch to a "utility" based IT model with variable monthly pricing is something that not every organization is prepared for unless the business demand is well correlated with utilization (for example, E-commerce sites).
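
The first problem above, recovering the cost of unused capacity, is easy to see with a little arithmetic. This is a rough sketch with invented figures: if the provider must recover its full monthly cost from whatever is actually consumed, the rate per consumed unit rises as utilization falls.

```python
def burdened_rate(total_monthly_cost, units_consumed):
    """Cost per consumed unit when the full cost, idle capacity included, must be recovered."""
    if units_consumed <= 0:
        raise ValueError("units_consumed must be positive")
    return total_monthly_cost / units_consumed


# Hypothetical environment: 1,000 capacity units costing 100,000 per month.
TOTAL_MONTHLY_COST = 100_000.0

rate_fully_used = burdened_rate(TOTAL_MONTHLY_COST, 1_000)  # every unit consumed
rate_40_percent = burdened_rate(TOTAL_MONTHLY_COST, 400)    # only 40% consumed

print(rate_fully_used, rate_40_percent)
```

At 40% utilization, each consumed unit carries two and a half times the cost it would carry in a fully used environment.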

However, there is a lot of upside to consumption-based pricing. First, assuming the chargeback method is driven down to the business unit level (instead of the CIO level), there is probably some control over demand, and behavior can be affected by the "pay for what you use" approach. Second, as IT infrastructure moves to a shared (or cloud) model, the possibility of complementary workload profiles increases, thus opening up more opportunities for savings in a true "utility" model. For example, a business unit that is busy at month-end can be combined with another whose workloads are less time critical, reducing the excess capacity problem with utilization-based pricing. It also lends itself to an enterprise "cloud" approach where peak workload can be run at the most cost-effective location.
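
Here is a small sketch of why complementary workloads matter. The weekly demand figures are invented; the point is only that capacity sized for the combined peak can be smaller than capacity sized for each business unit's own peak.

```python
# Hypothetical demand, in capacity units, for the four weeks of a month.
month_end_unit = [20, 20, 20, 80]   # busy at month-end
flexible_unit = [60, 60, 60, 10]    # less time-critical work, scheduled earlier


def capacity_if_separate(*profiles):
    """Capacity needed when each unit gets dedicated infrastructure sized for its own peak."""
    return sum(max(profile) for profile in profiles)


def capacity_if_shared(*profiles):
    """Capacity needed when units share infrastructure sized for the combined peak."""
    combined = [sum(period) for period in zip(*profiles)]
    return max(combined)


separate = capacity_if_separate(month_end_unit, flexible_unit)
shared = capacity_if_shared(month_end_unit, flexible_unit)

print(separate, shared)
```

With these numbers the shared environment needs 90 units instead of 140, so there is less idle capacity to burden into the usage charge.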

A simpler alternative to consumption-based pricing is static capacity-based pricing. This is best implemented as "slices": the customer pays for a set amount of capacity (i.e. a "slice" of the overall infrastructure), regardless of how much of it is consumed. Slices can be aggregated (a resource pool) or granular (individual VMs). It is relatively easy to measure and bill for allocation of these slices, the cost of unused capacity can be recovered more easily, and customers are unlikely to be shocked by the bill at the end of the month.

 

4.    Unit costs change over time

Shared infrastructure is a combination of fixed costs and variable costs. As the environment grows, the proportion of fixed costs as a percentage of the whole will decrease. A good example is an enterprise storage array – once the initial hurdle of the array and controllers is overcome, adding more drives to the array is relatively inexpensive. The same applies to virtual servers – there is an entry cost that typically requires two or three hosts, access to shared storage and network fabric, and incremental capacity is less expensive for each additional VM.

Chargeback methods need to account for the effect of decreasing marginal costs. To address this, one may: (1) revise the model over time, (2) assume a long-term average steady-state environment (where margins are low initially, and increase over time), or (3) ignore the growth effect in the cost model, allowing cost reductions to improve margins over time. In an outsourcing scenario, ignoring the growth effect means that excess profits are captured ex-post (after the fact). Sales teams do not get compensated on profits realized after the deal has been signed!
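
A quick sketch of the declining unit cost, using invented figures: a fixed entry cost is spread over more VMs as the environment grows, while the variable cost per VM stays flat.

```python
def unit_cost(fixed_cost, variable_cost_per_vm, vm_count):
    """Monthly cost per VM: the fixed cost spread across all VMs, plus the per-VM variable cost."""
    if vm_count <= 0:
        raise ValueError("vm_count must be positive")
    return fixed_cost / vm_count + variable_cost_per_vm


# Hypothetical figures: 60,000 per month fixed, 50 per month variable per VM.
FIXED_COST = 60_000.0
VARIABLE_COST_PER_VM = 50.0

costs = {n: unit_cost(FIXED_COST, VARIABLE_COST_PER_VM, n) for n in (100, 200, 400)}

for vm_count, cost in costs.items():
    print(vm_count, cost)
```

If the price per VM is set from the 100-VM figure and never revised, the gap between price and cost widens as the environment grows, which is option (3) above.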

 

5.    Use tiered pricing

Chargeback for virtual infrastructure can be tiered on several dimensions. One dimension is capacity: assuming the "slice" based model above is used, the capacity tiers are either carved up into some form of small/medium/large slice, or measured in discrete units (e.g. a slice defined as 1 GHz CPU and 1 GB RAM, so that a larger VM requiring 4 GB of RAM is counted as four slices). Without tiered pricing for capacity, a "large" VM would be avoided for no reason other than that it sends the cost model underwater.
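
The discrete-unit approach can be sketched in a few lines of Python. The slice definition (1 GHz CPU and 1 GB RAM) comes from the example above; the price per slice is made up, and counting slices by whichever resource is the binding one is my assumption about how such a rule would typically work.

```python
import math

SLICE_CPU_GHZ = 1.0      # slice definition from the example above
SLICE_RAM_GB = 1.0
PRICE_PER_SLICE = 40.0   # hypothetical monthly price


def slices_required(cpu_ghz, ram_gb):
    """Count slices by the binding resource (assumed rule), with a minimum of one slice."""
    cpu_slices = math.ceil(cpu_ghz / SLICE_CPU_GHZ)
    ram_slices = math.ceil(ram_gb / SLICE_RAM_GB)
    return max(cpu_slices, ram_slices, 1)


def monthly_price(cpu_ghz, ram_gb):
    return slices_required(cpu_ghz, ram_gb) * PRICE_PER_SLICE


large_vm_slices = slices_required(cpu_ghz=1.0, ram_gb=4.0)
large_vm_price = monthly_price(cpu_ghz=1.0, ram_gb=4.0)

print(large_vm_slices, large_vm_price)
```

The VM with 4 GB of RAM counts as four slices and is priced accordingly, so the provider has no reason to steer customers away from it.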

Some other tiering options are based on availability (with or without HA), or protection (recovery time objectives and recovery point objectives). There is no reason not to combine tiering options. Technologies for self-service provisioning/management (e.g. VMware Lab Manager and Lifecycle Manager) and security (e.g. VMsafe and complementary technologies) offer additional options for tiering.

 

6.    Bundling is a necessity

In traditional IT environments, infrastructure is captive to the applications that are supported by it. Thus it is simple enough for the service provider to pass the cost of network ports, storage, server hardware, power, and floor space on to the customer as discrete components that have little correlation with the customer's actual business demand.

Virtualization changes that. First, it abstracts the applications and operating systems from the physical infrastructure. Then, it moves everything into a shared infrastructure. Sending the customer a bill that shows usage of network ports, power (kWh) and cooling usage (BTU) is not only wrong, but it is also almost impossible to properly calculate (e.g. they could be using fractions of network ports and may be using portions of de-duplicated storage).

It is better to bundle costs as much as reasonably possible. I know, I just stated that labor and infrastructure costs are best kept separate, but when it comes to infrastructure costs, it is best to roll up the core infrastructure (floor space, power, cooling, network connectivity, etc.) into a bundled infrastructure rate, since the usage of those elements tends to be correlated with actual compute usage in a virtualized world.
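
As a rough illustration of a bundled rate, the sketch below rolls several shared cost pools into a single price per slice. The cost figures and the slice count are invented, and dividing by sellable slices is only one possible allocation basis.

```python
# Hypothetical monthly cost pools for the shared environment.
core_infrastructure_costs = {
    "floor_space": 8_000.0,
    "power": 12_000.0,
    "cooling": 6_000.0,
    "network_connectivity": 4_000.0,
}

SELLABLE_SLICES = 600  # hypothetical number of slices the environment can sell


def bundled_rate(cost_pools, sellable_slices):
    """One infrastructure rate per slice instead of a line item per cost pool."""
    if sellable_slices <= 0:
        raise ValueError("sellable_slices must be positive")
    return sum(cost_pools.values()) / sellable_slices


rate_per_slice = bundled_rate(core_infrastructure_costs, SELLABLE_SLICES)

print(rate_per_slice)
```

The customer sees one number per slice, and the provider does not have to defend a calculation of fractional network ports or kWh per VM.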

7.    Be prepared for a dynamic environment

Virtualization offers customers many more options for creating, cloning, powering on, and powering off of virtual systems. Using a chargeback model that assumes the traditional lifecycle (provision, manage during useful life, de-provision) will not reflect the realities of instant test environments, capacity-on-demand through cloning or expansion, archived VMs for regulatory compliance, and on-demand DR exercises.

With that in mind, chargeback in this more dynamic environment must take into account three major themes:

  1. More self-service tools for IT infrastructure means many of the provisioning and management tasks are automated or are passed to the customer
  2. Capacity-based pricing must take into account capacity that is used on an on-demand basis (e.g. if capacity usage is calculated at the end of every month, it ignores any burst capacity that was used at the beginning of the month)
  3. For non-automated functions, labor used for turn-on, turn-off, cloning, etc. should not be ignored (hence the need for some self-service automation such as VMware Lifecycle Manager when the business requires such a dynamic environment).
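
The second theme is worth a quick sketch. Suppose, hypothetically, a customer bursts to 14 slices for the first five days of a 30-day month and then drops back to 4. A month-end snapshot bills as if the customer held 4 slices all month, while metering slice-days captures the burst.

```python
# Hypothetical daily slice allocation over a 30-day month.
daily_slices = [14] * 5 + [4] * 25


def snapshot_slice_days(daily):
    """Billing basis when only the month-end allocation is measured."""
    return daily[-1] * len(daily)


def metered_slice_days(daily):
    """Billing basis when allocation is measured every day."""
    return sum(daily)


snapshot = snapshot_slice_days(daily_slices)
metered = metered_slice_days(daily_slices)

print(snapshot, metered, metered - snapshot)
```

With these numbers the snapshot misses 50 slice-days of burst capacity that the provider had to have available.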


This is likely going to be one of a series of guiding principles (please, let's not call it "best practices") for chargeback in a virtual environment. I intend for these to be practical suggestions; implementing a complicated, variable pricing structure with aggressive incentive plans may be intriguing, but we need things we can implement in real life.

  

 [Edit 11/19/08: Replaced "moral hazard" with "adverse selection" ; not sure why I had "moral hazard" in there; must have been thinking about the financial bailout when I wrote this]

Comments

  • 10/20/2008 2:59 PM Martin wrote:
    Great post! Thanx for this information. Looking forward to the next article.
  • 10/22/2008 9:00 AM Chris Beauchamp wrote:
    Excellent Article Gerod!

    The one other component I am struggling with is understanding CPU efficiencies as they relate to both performance expectations and, potentially, costing factors.

    Translated, newer/faster/better CPUs/machines may cost less than the previous generation but they are more efficient (note this does not directly correlate to MHz/GHz). If there were some table/chart that showed all of the CPUs in certified machines and gave a relative weighting factor, we could address this in our planning and costing models.

    Thoughts?

    Cheers,

    Chris
    1. 10/22/2008 11:03 AM gerod wrote:
      Chris,

      Good point.  I think the closest proxy for normalizing the different generations, core counts, and clock speeds of CPUs is spec.org (specifically SPEC CPU2006 rate).  The challenge is that the only thing that we can see in Virtual Center or any other tool is the CPU's clock speed and core count, and there's probably no good way to extrapolate SPEC numbers within the software itself.

      It would be interesting to see just how correlated the SPEC rates are with core count and clock speed.  I have a feeling that they're actually getting more correlated, as CPU manufacturers have put more effort into jamming more cores onto a chip than other innovative ways of satisfying Moore's Law.

      I could see how it can become more challenging now that there are fewer VMotion constraints between CPU generations because of Enhanced VMotion.  Whereas before, people had separate clusters of ESX hosts of different CPU generations (and could calculate unit costs accordingly), some of those barriers have been eliminated, and it's reasonable to assume that someone could have a mixed farm of servers with different CPU generations, and could have some challenges calculating the unit cost of a CPU cycle.

      Definitely an area for more consideration...

  • 11/18/2008 5:31 PM Andrew Cooke wrote:
    Great article.
    I have been singing the same song when I am with customers. The challenge appears to be that we can say all these things and they listen, however we don’t appear to be offering any examples of software or any other methods of solving this problem. Sure there are always going to be 10 different ways of doing something, but some real life examples are always good.
    I think vKernel recognise this, but it is too early to say if they are able to help solve the problem.
    If anyone knows of any other software that fits into this space please share.
  • 3/6/2009 2:29 PM Peter Weinlein wrote:
    Really great article...keep 'em coming!
  • 10/14/2009 8:07 AM software developers wrote:
    Hey, that was interesting,

    this is a excellent and inspiring article, keep them coming...

    Anyway, thanks for the post