Shy Cohen's Blog: May 2009

Wednesday, May 27, 2009

An Economic View of Cloud Computing, and More

McKinsey & Company recently released a report in which they compare, contrast, and provide their own definition for “The Cloud”, clear up some points around its usage, and provide adoption recommendations. Their main point is that, at this point in time, the cloud is more suitable for small to medium enterprises than it is for large enterprises. Their calculations show this to be the case primarily (but not solely) for simple financial reasons – large enterprises can in most cases get better “bang for the buck” by running their own servers, while avoiding all sorts of concerns associated with running their software on someone else’s hardware (like reliability and security).

While the report makes some very good points, I’ve had some thoughts while reading it that I would like to share.

Note: Their report is focused on the Cloud as a platform-as-a-service for running one’s software on, and not on Cloud Services, which are composable components that one can utilize when building a SOA-based solution. The report is somewhat inconsistent in its use of the term Cloud though, and on several occasions uses “cloud services” to mean “cloud platform services”. Just keep that in mind as you read through it.

The report takes an operational-cost-centric approach, which I like in general. They also provide information to support their analysis, which is interesting in itself. However, they make some implicit assumptions and take some viewpoints that I believe should have been called out.

Process vs. Machine Virtualization

The analysis in the report seems to take a very VM-centric view of the cloud as a platform – the report is comparing running servers in the cloud to running them in-house (directly on the hardware, or by leveraging virtualization). This approach addresses the model that AWS is offering, but misses the mark on process-level virtualization of the type that Microsoft Azure, Force.com, and Google AppEngine have to offer.

I think that part of the reason for this is that very few people have grokked process-level virtualization at this point in time, and that measuring per-machine costs is much better understood than measuring per-process costs. Nevertheless, this is still missing from the article’s analysis and is something that needs looking into before dismissing the cloud platform as being “too expensive”.

Short-Term vs. Long-Term Investments

While operational-cost is typically measured over a long period of time (which is exactly why keeping it low is so important!), there are cases where long-term savings are not as important. In particular, I am talking about cases where the time-period over which the resource is used is short.

The report looks at the steady-state, and does not address the benefits that large enterprises can get with scale-up-on-demand. There are great cost advantages, as well as improvements to ROI and ROA, that large enterprises can introduce by avoiding (a) the delay introduced by the IT procurement cycle, (b) the cost of additional IT administration headcount, and (c) the cost and delay associated with facilities.

If an enterprise needs to spin up resources to deal with a short spike in the demand for computational or storage resources, it should definitely explore cloud computing as an option. Examples of such short-term, spike-generating events include external events that affect the business (e.g. tax day, election day, a sporting event, etc.), one-time processing of a large volume of data, a one-off large-scale simulation, etc.
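To make the short-spike argument concrete, here is a minimal sketch of the comparison. Every name and number in it is a hypothetical placeholder (none of them comes from the McKinsey report or from any vendor's price list); it simply contrasts paying by the hour for a short burst with buying hardware that then has to be kept and maintained.

```python
def cloud_cost(servers: int, hours: int, hourly_rate: float) -> float:
    """Pay-per-use cost of renting `servers` machines for `hours` hours each."""
    return servers * hours * hourly_rate


def owned_cost(servers: int, purchase_price: float,
               monthly_upkeep: float, months_owned: int) -> float:
    """Cost of buying the same machines and keeping them for `months_owned` months."""
    return servers * (purchase_price + monthly_upkeep * months_owned)


def break_even_hours(purchase_price: float, monthly_upkeep: float,
                     months_owned: int, hourly_rate: float) -> float:
    """Hours of use per server at which renting costs as much as owning."""
    return (purchase_price + monthly_upkeep * months_owned) / hourly_rate


# Hypothetical spike: 50 servers for two weeks (336 hours) at $0.50 an hour,
# versus buying $3000 servers with $100 a month of upkeep, kept for a year.
print(cloud_cost(50, 336, 0.5))             # 8400.0
print(owned_cost(50, 3000, 100, 12))        # 210000
print(break_even_hours(3000, 100, 12, 0.5)) # 8400.0
```

With these made-up inputs, each server would need to be busy for 8400 hours in the year before owning it becomes cheaper than renting it, which a two-week spike does not come close to.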

Enterprise IT vs. Departmental IT

An aspect that the report did not address, and that greatly changes the way things play out in the “real world”, is the Departmental nature of the Enterprise. As it turns out, many enterprises are run as an amalgamation of departments, where in addition to the global IT budget each department has its own separate IT budget. (In some cases this has to do with organizational politics as much as it has to do with technological and financial management.) In organizations of this nature some IT facilities are shared and financed through a company-wide IT budget, and some are not shared and are financed through departmental-level IT budgets.

One example of such an enterprise is Microsoft. Microsoft offers many shared resources like Active Directory, email servers, SharePoint servers, and the like, to the different departments and product teams in the company. In addition to those shared resources, every product team at Microsoft has its own IT budget that it can use to drive product development. Instead of having to coordinate resource usage in a central lab with the rest of the company (like time-sharing on mainframes), separate budgets and facilities make it much simpler for each team to acquire its own computing resources and manage their usage.

If a team at a company that is managed in this way needs to use large-scale resources on a sporadic basis, there is little reason for that team to build, support, and staff a lab full of hardware. For example, a team may need to run a large-scale stress test on their product for a few weeks before they ship. A team like that would most likely benefit from utilizing the pay-for-use, FTE-saving, on-demand resource availability model that the cloud can offer.

Resource Management

Up until now I mentioned examples where an Enterprise can benefit from the Cloud as a platform. There are, however, reasons why the Cloud may not yet be ready for broad Enterprise adoption. One key aspect hindering the adoption of the cloud as a platform is the Management Infrastructure, or the lack/immaturity thereof.

The ability to monitor and manage resource utilization, as well as the charges that their consumption creates, is paramount to broad adoption. The report only touches on this briefly (on slide 32), but generally speaking that aspect is a huge adoption blocker, as no CTO worth their salt would approve spending on something that they cannot measure and do not have the tools to control. There are some vendors out there today that offer such management services, but at this point in time there are no standards on how to integrate this management with the existing management infrastructures that Enterprises (of any scale) have in place.

Recommendations

At this point in time I would make the following recommendations for large enterprises who are looking to adopt the Cloud as a platform:

Look closely at the economic viability of the various cloud platforms compared to aggressive virtualization, or running your own “internal cloud”.

    Examine all virtualization models, not just virtual machines.
    Experiment and measure your actual costs of running your applications in these environments versus on your own servers.

Adopt different approaches for short-term vs. long-term investments.

    Don’t limit yourself to thinking about hardware as an asset that you need to buy and manage.
    Consider using cloud platforms for short-term investments.

Look to adopt cloud platforms at the departmental level instead of investing in your own departmental HW and the FTEs to run and manage it.

    Especially for needs of a “spiky” nature.

Pay close attention to cost, and push your management software vendor to provide you with management capabilities for your cloud platform investments.

Your thoughts and comments on this matter are most welcome!

Cheers,
Shy.

Labels: SOA Governance

# posted by Shy Cohen : 2:58 AM

Sunday, May 17, 2009

The Economics of SOA Reuse

(Re)use with Caution

We live in a world where software is a business tool and seen more and more as an asset – having the right tools can give you a “leg up” on the competition, help you be more efficient, and increase your profit margins. In a business sense, software can be (and often is) measured like other assets, in terms of initial investment, operational cost, overall gains, and ROI.

One of the key selling points of SOA is the reuse of existing software assets. When functionality is made accessible for composition in a SOA, reuse is a natural thing – it allows us to recapitalize on existing investments when building new systems or extending existing ones, and gain additional value from investments we’ve already made. Personally, I am an avid supporter of reuse, and the notion of “composing” software out of reusable building blocks. Build it once, build it right, and use it again and again - this is goodness. This makes sense not only from a business standpoint, but also from a technology standpoint.

However, it is important to recognize that reusability has downsides as well. Reuse should only take place when it makes sense, and making reuse a goal in its own right may lead to strange, unwanted results. In fact, there are cases where reducing reuse can lead to better business results. In this article I’d like to show when this is true, and why.

The Strange Effects of Wrong Motivations

Before we dive into the economics of Reuse, I’d like to briefly talk about the dangers of making Reuse a goal by itself and providing incentives for it.

The first danger lies with motivating people who are creating services based on how reusable these services are. Just to be clear, Reuse should definitely be considered when designing a service. However, if you incentivize people on how reusable their services are they might spend undue time on making the service “generic” instead of focusing on current needs and delivering immediate value. Common anti-patterns that this may lead to are Analysis Paralysis and Feature Creep. It can also lead to the service interface design being overly complicated or generic, and reduce the usability factor for many of its clients. These effects can also result in projects going over time and/or over budget.

The second danger lies with motivating people who are building clients to over-use services. If you incentivize people based on their level of asset reuse, they might use things that they don't really need, just to satisfy some corporate policy. Common anti-patterns this may lead to are Stovepipe Systems and Dependency Hell.

In a sense, we are relearning with SOA Reuse what we’ve learned in the past with code defects: You can’t create incentives around reuse for the same reasons that you cannot create incentives for developers or testers around bugs.

Finding the Balance

So how does one find the right balance and introduce the right level of reuse?

The first step involves making sure that the service is economically viable. The simple idea here is that the cost of the service should not exceed the value that it is providing – this is quite obvious. However, the way to measure cost and value is not necessarily trivial. One interesting factor is the initial investment that one would make in developing or acquiring the service. This factor alone makes the service initially a “money sink”. Over time it is expected that the service would start showing profit, and eventually allow us to regain the initial investment. Let’s look at the different investments and gains of a service to figure out how this works.

At a high-level, there are 3 kinds of investments in a service:

Service Development Cost (DC): This is the one-time cost incurred when developing or acquiring the service. The influence of reusability on this element may vary, and will depend (amongst other things) on how much time the service designer might spend factoring in requirements from multiple clients.
Service Maintenance Cost (MC): This is the ongoing cost of maintaining the service (bug fixes, adding features, etc.). The influence of reusability on this element may vary, and will depend (amongst other things) on how much time the service maintainer might spend on adding features to accommodate new clients.
Service Operational Cost (OC): This is the ongoing cost for running the service. This cost combines 2 main factors. The first is the expenses incurred for running the service, such as hardware costs, power, etc. The second comes from the compositional nature of SOA – it is the total cost of using other services that this service depends on. For example, if Service A is calling Service B 100 times a day on average, and every call to Service B costs 10 cents, then the average daily cost of that dependency is $10. This cost is where we would focus the discussion on reduction of reuse.

Offsetting the investments are 2 kinds of gains:

Service Operational Gains (OG): This is the measured business value provided by the service. In its simplest form, this value comes through reuse, and the value passed on by the service’s clients in the form of usage fees. In economic terms, this kind of gain is called Revenue. For example, Service B in the previous example charges its callers 10 cents per call. Assuming each client makes 100 calls a day on average, the revenue of the service would be $300 a month per client. Other examples for how to measure this value relate to cost savings and indirect gains, like having a lower cost compared to the alternative, or providing the ability to handle new/additional business.
Other Gains: A service can also provide value that is harder to measure in money terms. This is particularly common with “staple services” such as Communication or Utility services which provide some basic functional needs and are mostly seen as an infrastructure investment (e.g. Message Routing or Identity). However, if a service is providing value, then this value must be measurable in some way. I will elaborate on that later on, but ignore it for now.
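The per-call arithmetic behind the OC and OG examples above can be sketched as follows. The function names are made up for illustration, amounts are kept in whole cents to avoid rounding surprises, and the 30-day month is an assumption implied by the $300-a-month figure.

```python
DAYS_PER_MONTH = 30  # assumption: implied by the $300-a-month example


def daily_dependency_cost_cents(calls_per_day: int, cents_per_call: int) -> int:
    """What a caller pays per day for using another service (part of its OC)."""
    return calls_per_day * cents_per_call


def monthly_revenue_cents(clients: int, calls_per_day: int, cents_per_call: int) -> int:
    """What the called service earns per month from usage fees (its OG)."""
    return clients * calls_per_day * cents_per_call * DAYS_PER_MONTH


# Service A calls Service B 100 times a day at 10 cents per call.
print(daily_dependency_cost_cents(100, 10) / 100)  # 10.0 dollars a day for A
print(monthly_revenue_cents(1, 100, 10) / 100)     # 300.0 dollars a month for B, per client
```

The same fee shows up on both sides of the ledger: it is part of the caller's OC and part of the called service's OG.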

The Periodic Profit (PP) of a service is the OG for that period minus the OC and MC for that period. This value must be positive from a Flow of Value standpoint: PP = OG – OC – MC.

PP is important, but we must not forget about the initial investment we’ve made in developing and/or acquiring the service. The Lifetime Profit of a service at a certain point in time after it is deployed is calculated as the total OG for that period minus the total Cost for the period. The total Cost is the sum of the periodic operational and maintenance costs, plus the initial investment. Profit (time) = PP * time - DC.

As the service shows PP, that profit will start to cover the initial development investment. The time period required to recapture the initial investment (Time to Recover Investment, or TRI) will depend on the size of the initial investment, and the PP. TRI = DC / PP.
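The three formulas above (PP, lifetime Profit, and TRI) translate directly into code. This is a minimal sketch with function names of my choosing; the sample values are the monthly figures this article uses further down for the upgraded Service A.

```python
def periodic_profit(og: float, oc: float, mc: float) -> float:
    """PP = OG - OC - MC for one period."""
    return og - oc - mc


def lifetime_profit(pp: float, periods: int, dc: float) -> float:
    """Profit(time) = PP * time - DC."""
    return pp * periods - dc


def time_to_recover_investment(dc: float, pp: float) -> float:
    """TRI = DC / PP, expressed in periods; only meaningful when PP is positive."""
    if pp <= 0:
        raise ValueError("a service with non-positive PP never recovers its investment")
    return dc / pp


pp = periodic_profit(og=1400, oc=400, mc=100)
print(pp)                                               # 900
print(lifetime_profit(pp, 24, 20000))                   # 1600
print(round(time_to_recover_investment(20000, pp), 1))  # 22.2
```

With a monthly PP of $900 and a DC of $20000, the investment is recovered after a little over 22 months, and the service is $1600 ahead after 24 months.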

An Economically Viable service is a service that has a reasonable TRI – according to the criteria set by the business decision makers.

Keeping Afloat and Weighing Options

Being economically viable is not always enough to justify the building of a service, especially when you have limited resources and need to choose between multiple options. For example, would it be better to “fix an old tool” (e.g. by replacing the legacy inventory-tracking system with a brand new service) or “build a new tool” (e.g. by introducing a workflow-monitoring service)? One index that we can use to figure this out is the rate of Return on Investment for each option. This can be calculated by dividing the total Profit by the total Cost: ROI = Profit / Cost.

Let’s see how this might work using some sample numbers.

The first option we will look at involves upgrading an existing service (Service A). This service currently has an OC of $900 a month, MC of $100 a month, and OG (revenue) of $1400 a month. The OC in our example is high since Service A reuses an existing system that by itself costs $600 a month to operate. With a $20000 investment the service can be rebuilt in a way that avoids using that system and instead uses a new, cheaper implementation that will take the OC down to $400 a month. The new monthly cost would then be $500, which means that the monthly profit would be $900, which in turn means that the cost of upgrading would be recovered in under 2 years (TRI < 2 years). Once deployed, the new implementation would reduce the OC by $6000 a year by not reusing the expensive system.

But is it really worth reducing the dependency in this case, or would we be better off building a new service and providing additional value?

Let’s look at a second option, in which we will build a new service. This new service (Service B) will also have an OC of $400 a month, MC of $100 a month, and DC of $20000, but the OG would only be $1000 a month. Would it make sense to build this service, create more assets, but keep the old, expensive one in place? Let’s look at the numbers over different periods of time:

                             One Time  Monthly                         Total
Period   Option                    DC    MC    OC  Cost    OG  Profit  Months    Cost      OG  Profit    ROI
2 years  A (original)               0   100   900  1000  1400     400      24   24000   33600    9600   0.40
         A (improved)           20000   100   400   500  1400     900      24   12000   33600    1600   0.05
         B                      20000   100   400   500  1000     500      24   12000   24000   -8000  -0.25
         B + A (original)       20000   200  1300  1500  2400     900      24   36000   57600    1600   0.03
3 years  A (original)               0   100   900  1000  1400     400      36   36000   50400   14400   0.40
         A (improved)           20000   100   400   500  1400     900      36   18000   50400   12400   0.33
         B                      20000   100   400   500  1000     500      36   18000   36000   -2000  -0.05
         B + A (original)       20000   200  1300  1500  2400     900      36   54000   86400   12400   0.17
4 years  A (original)               0   100   900  1000  1400     400      48   48000   67200   19200   0.40
         A (improved)           20000   100   400   500  1400     900      48   24000   67200   23200   0.53
         B                      20000   100   400   500  1000     500      48   24000   48000    4000   0.09
         B + A (original)       20000   200  1300  1500  2400     900      48   72000  115200   23200   0.25

Note: the Total Cost column covers the monthly costs (MC + OC) only. Total Profit = Total OG – Total Cost – DC, and ROI = Total Profit / (Total Cost + DC).
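The figures in the table can be reproduced with a few lines of code. This is a sketch, not the spreadsheet itself: the class and method names are mine, and the ROI denominator (monthly costs over the period plus DC) is inferred from the values in the table.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    dc: int  # one-time development cost
    mc: int  # monthly maintenance cost
    oc: int  # monthly operational cost
    og: int  # monthly operational gains (revenue)

    def totals(self, months: int):
        """Return (total cost excluding DC, total OG, total profit, ROI)."""
        running_cost = (self.mc + self.oc) * months
        gains = self.og * months
        profit = gains - running_cost - self.dc
        roi = profit / (running_cost + self.dc)
        return running_cost, gains, profit, round(roi, 2)


OPTIONS = [
    Option("A (original)", 0, 100, 900, 1400),
    Option("A (improved)", 20000, 100, 400, 1400),
    Option("B", 20000, 100, 400, 1000),
    Option("B + A (original)", 20000, 200, 1300, 2400),
]

for months in (24, 36, 48):
    for option in OPTIONS:
        print(months, option.name, *option.totals(months))
```

Changing a single field of an `Option` (for example the OG of B) and rerunning the loop is a quick way to see how sensitive the comparison is to each input.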

As you can see, both options result in the same Profit, and recoup the initial investment in less than 2 years. However, the ROI for investing in the existing system is much higher, because building Service B means carrying the higher cost of operating and maintaining 2 services. Another financial ratio that we can apply here is the Return on Assets (ROA). ROA is a measure of how efficiently our resources are managed, and is calculated as: ROA = Profit / Assets. In option B we are creating more assets but generating the same amount of profit, which means that we are in fact less efficient.

Given these indicators, the right business decision in this case would be to upgrade the existing service and not build a new one.

Ok, I “cooked” the numbers a little bit so that the profit ends up the same in both cases, and the real result depends heavily on the real values for DC, MC, OC, and the OG. However, the fact still stands that even if the Profit for option B is somewhat higher, the lower ROI and ROA may not justify the new investment. If you want to see what may happen if the values change even slightly, download the Excel spreadsheet and plug in your own values (for example, try varying the OG for B by $50 up and down and see what happens with the Profit and ROI).
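If you don't have the spreadsheet at hand, the suggested experiment can be approximated with a short script. This is only a sketch: the function name is mine, it looks at the 4-year horizon by default, and it uses the same cost figures as the table (A original costs $1000 a month, B costs $500 a month and $20000 to build).

```python
def combined_outcome(og_b: int, months: int = 48):
    """Profit and ROI for keeping Service A (original) and adding Service B."""
    dc = 20000                  # development cost of Service B
    monthly_cost = 1000 + 500   # MC + OC of A (original) plus MC + OC of B
    monthly_gain = 1400 + og_b  # OG of A plus OG of B
    running_cost = monthly_cost * months
    profit = monthly_gain * months - running_cost - dc
    return profit, round(profit / (running_cost + dc), 2)


for og_b in (950, 1000, 1050):
    print(og_b, *combined_outcome(og_b))
# 950 20800 0.23
# 1000 23200 0.25
# 1050 25600 0.28
```

Even with the extra $50 a month, the 4-year ROI of building B (0.28) stays well below the 0.53 of upgrading Service A, although the Profit is now higher.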

Taking Care of Basic Needs

I have mentioned above that it is sometimes harder to measure the OG of “staple services” such as Communication or Utility services in money terms. These services are essential to the technical foundation of the business, but how do we measure their value?

One way in which such services are measured is as an Expense Center (EC). A service operated as an EC typically has a fixed budget to work with, and it is expected to maximize its output within the constraints of that budget. That budget is managed as part of the overall IT budget, which makes it susceptible to the effects of political and other types of negotiations. Since the users of such a service are generally not charged directly for their utilization of it, a possible side effect could be that the users will over-consume the service, leading to higher-than-planned OC and MC.

Another way to make sure such a service is viable is to charge for its usage. The charge can be per-operation or a periodic one. When utilizing this approach, proper governance and a shared interest in the success of the organization as a whole are key to ensuring that the cost of that “staple service” is at a reasonable level that can support that service without overburdening its users.
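As an illustration of the per-operation approach, one simple way to pick a charge is to find the smallest fee that covers the service's periodic costs. This is a hypothetical sketch rather than a recommended pricing policy: the function name, the expected call volume, and the choice to price exactly at break-even are all assumptions of mine.

```python
def break_even_fee_cents(monthly_oc_cents: int, monthly_mc_cents: int,
                         expected_calls_per_month: int) -> int:
    """Smallest whole-cent per-call fee whose revenue covers OC + MC (PP >= 0)."""
    if expected_calls_per_month <= 0:
        raise ValueError("expected_calls_per_month must be positive")
    total_cents = monthly_oc_cents + monthly_mc_cents
    # Integer ceiling division, so the fee never falls short of the cost.
    return -(-total_cents // expected_calls_per_month)


# Hypothetical staple service: OC of $400 and MC of $100 a month,
# with 6000 calls a month expected across all of its users.
print(break_even_fee_cents(40000, 10000, 6000))  # 9
```

At 9 cents per call the service would collect $540 a month against $500 of costs; governance then decides how far above break-even the fee is allowed to go.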

Yet another approach is to measure the value of these services by using a point system, where point values are assigned to services based on their “popularity”. This method allows you to correctly meter and justify the cost of infrastructure services on which you spend real money but without which your business cannot function efficiently.

In Summary

This article shows that although reuse is important, and a key value proposition for SOA, it should not take place at any cost. Having something available doesn’t mean that reusing it is always the right thing to do; reuse should be evaluated and decided on as a business decision, as well as a technological one. Encouraging reuse as a general policy and rewarding it can lead to undesirable results, and should generally be avoided. There are cases where reuse is the only logical choice (e.g. reuse of the user identity management system). In all other cases one should measure the flow of value to help decide whether reuse makes sense or not.

To ensure the economic viability of a service one needs to apply the right methods for valuing it. Profit estimation, ROI, and ROA are useful tools to employ when comparing investment alternatives, but there are other ways to do that as well (such as the point system mentioned above). The need to ensure the economic viability of a service exists for application services, as well as infrastructure and foundational services, and proper governance could serve as a great gatekeeper to ensure that proper due diligence takes place.

The goal of this article was to provide you with some tools and techniques to help you make the right calls on SOA reuse. Now it is up to you to go and make them. Go forth and innovate!

Technorati Tags: soa, soa-governance

# posted by Shy Cohen : 4:48 PM

Wednesday, May 13, 2009

Cloud Computing Is for Small Companies Too

Yesterday at TechEd I got to chat with Michael Stiefel about some of the reasons why Cloud computing can be compelling to small organizations (both small businesses, and semi-independent business units inside of a large organization). The talk was recorded and is posted online at http://tinyurl.com/qkfzkd.

# posted by Shy Cohen : 12:58 PM
