Why has cloud not taken off in the enterprise?

Given all the hype around cloud computing, it is surprising that there is only scant evidence that cloud has taken off in the enterprise. Sure, there is salesforce.com, but I am referring more to the infrastructure cloud, such as Amazon, or even Microsoft Azure and Google App Engine. Why has it not taken off? I have heard various theories, and I have some of my own; I list them below. Please feel free to chime in if I miss anything.

  • I am using it, but I do not want you to know. Cloud is different from previous technologies in that adoption is not top-down (e.g., mandated by the CIO), but bottom-up: engineers who are fed up with the CIO want a way around. I have heard of various companies using Amazon extensively, particularly in the life sciences and the financial sector, where there is greater demand for computation power. In one case, I have heard of a hedge fund that rents 3,000 EC2 servers every night to crunch through the numbers. Even though these teams use the cloud, they do not want to attract attention and invite questions from upper management and CIOs, because cloud is still not an approved technology, and putting any data outside of the firewall is still questionable.
  • Security. Even if the CIO wants to use the cloud, security is always a major hurdle. I have been involved in such a situation. Before the cloud could be used, it had to be officially approved by the cyber security team; however, the cyber security team has every incentive to disapprove, because moving data outside the firewall exposes them to additional risks that they would be responsible for. In the case I was involved in, the cyber security team came up with a long list of requirements that, in the end, we found some of their own internal applications did not even meet. Needless to say, the project was not a go, even with a strong push from the project team.
  • Indemnification. CIOs have to cover themselves too. Most cloud vendors’ license terms, including Amazon’s, disclaim all liability should loss or failure happen. This is very different from the traditional hosting model, where the hosting provider accepts responsibility in writing. CIOs have to be able to balance the risk. An indemnification clause in the contract is like an insurance policy, and few CIOs want to take responsibility for something they do not control.
  • Small portion of cost. Infrastructure cost is only a small portion of the IT budget, especially for capital-rich companies. I heard from one company that their infrastructure cost is less than 20% of the budget, with the bulk spent on application development and maintenance. For them, finding a way to reduce application cost is the key. For startups, cloud makes a lot of sense; for enterprises, however, cloud may not be the top priority. To make matters worse, porting applications to the cloud tends to require them to be re-architected and re-written, driving the application development cost even higher.
  • Management. CIOs need to be in control. Having every employee pull out a credit card to provision compute resources is not acceptable. Who is going to control the budget? Who will ensure data is secured properly? Who can reclaim the data when an employee leaves the company? Existing cloud management tools simply do not meet enterprises’ governance requirements.

Knowing the reasons is only half of the battle. At Accenture Labs, we are working on solutions, often in partnership with cloud vendors, to address these shortcomings. I am confident that, in a few years, the barriers to enterprise adoption will be much lower.

Google’s MapReduce patent and its impact on Hadoop and Cloud MapReduce

It has been widely covered that Google finally received its patent on MapReduce, after several rejections. Derrick argued that Google would not enforce its patent because Google would not “risk the legal and monetary consequences of losing any hypothetical lawsuit“. Regardless of that business decision (whether to risk it or not), I want to comment on the technical novelty aspects. Before I proceed, I must disclaim that I am not a lawyer, and the following does not constitute legal advice. It is purely a personal opinion based on my years of experience as a consulting expert in patent litigations.

First of all, in my view, this is an implementation patent: it covers Google’s implementation of the MapReduce programming model, but not the programming model itself. The independent claims 1 (system claim) and 9 (method claim) both describe the Google implementation in detail, including the processes used, how the operators are invoked, and how the processing is coordinated.
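To make the distinction concrete, here is a minimal sketch (all names hypothetical, not taken from the patent or the Google code) of the coordination structure the claims describe: a dedicated master process hands map tasks to worker processes and tracks their completion.

```python
import threading

# Hypothetical sketch of the claimed master/worker structure: a master
# process assigns tasks and collects results; workers only ask the master
# for work. This illustrates the coordination pattern, not Google's code.
class Master:
    def __init__(self, tasks):
        self.pending = list(tasks)
        self.done = []
        self.lock = threading.Lock()

    def next_task(self):
        # Master decides which task each worker gets next.
        with self.lock:
            return self.pending.pop() if self.pending else None

    def report_done(self, task, result):
        with self.lock:
            self.done.append((task, result))

def worker(master):
    # Workers loop: fetch a task from the master, apply the map operator.
    while (task := master.next_task()) is not None:
        master.report_done(task, len(task.split()))  # toy "map" operator

master = Master(["the quick fox", "the lazy dog"])
threads = [threading.Thread(target=worker, args=(master,)) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()
print(len(master.done))  # 2
```

The point is that the claims hinge on this centralized coordinator; a system without it falls outside the claim language, which matters later for Cloud MapReduce.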

The reason Google did not get a patent on the programming model is that the model is not novel, at least in legal terms (which is probably why the patent took so long to be granted). First, it borrows ideas from functional programming, where “map” and “reduce” have been around for a long time. As the database community pointed out, MapReduce is a step backward partly because it is “not novel at all — it represents a specific implementation of well known techniques developed nearly 25 years ago”. Second, the User Defined Function (UDF) aspect is also a well-known idea in the database community, implemented in several database products before Google’s invention.
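The functional-programming lineage is easy to see: the whole word-count example from the MapReduce paper can be written with ordinary `map` and `reduce` primitives plus a group-by step. This is only an illustrative sketch of the programming model, not any particular system.

```python
from functools import reduce
from collections import defaultdict

docs = ["the quick fox", "the lazy dog"]

# map phase: each input record emits a list of (key, value) pairs
mapped = map(lambda doc: [(w, 1) for w in doc.split()], docs)

# shuffle: group intermediate pairs by key
groups = defaultdict(list)
for pairs in mapped:
    for word, n in pairs:
        groups[word].append(n)

# reduce phase: fold each group's values into a single count
counts = {word: reduce(lambda a, b: a + b, ns) for word, ns in groups.items()}

print(counts)  # {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

Nothing here is specific to Google; the novelty, if any, lies in how a distributed system runs these phases at scale.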

Even though it is arguable whether the programming model is novel in legal terms, it is clear to me that the specific Google implementation is novel. For example, the fine-grained fault tolerance capability is clearly missing in other products. The recent debate on MapReduce vs. DBMS sheds light on which aspects of MapReduce are novel (see the CACM articles here and here), so I will not elaborate further.

Let us first talk about what the patent means for Cloud MapReduce. The answer: Cloud MapReduce does not infringe. The independent claims 1 and 9 state that “the plurality of processes including a master process, for coordinating a data processing job for processing a set of input data, and worker processes“. Since Cloud MapReduce does not have any master node, it clearly does not infringe. Cloud MapReduce uses a totally different architecture from the one Google described in its MapReduce paper; it implements the MapReduce programming model, but does not copy the implementation.

For Hadoop, my personal opinion is that it infringes the patent, because Hadoop closely copies the Google implementation as described in the Google paper. If Google enforces the patent, Hadoop can do several things. First, Hadoop can look for an invalidity argument, but I personally think that is hard. The Google patent is narrow; it only covers the specific Google implementation of MapReduce. Given how widely MapReduce is known, if there were a similar prior system, we would have heard of it by now. Second, Hadoop could change its implementation. The patent claim language includes many “wherein” clauses; if Hadoop does not meet any one of them, it is off the hook. The downside, though, is that a change in implementation could introduce a lot of inefficiency. Last, Hadoop can adopt an architecture like Cloud MapReduce‘s. Hadoop is already moving in this direction: the latest code base moved HDFS into a separate module, which is the right move to separate functions out into independent cloud services. Now if only Hadoop implemented a queue service, Cloud MapReduce could port right over :-).
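To illustrate the masterless alternative, here is a toy sketch, under my own naming, of how coordination can live entirely in queues: workers pull map tasks from a shared input queue and push intermediate pairs to a reduce queue, with no master process assigning work. (Cloud MapReduce plays this out against cloud queue services such as Amazon SQS; the in-process queues below only stand in for them.)

```python
import queue
import threading
from collections import defaultdict

# Hypothetical masterless sketch: the queues themselves coordinate the job.
input_q = queue.Queue()   # stands in for a cloud input queue
reduce_q = queue.Queue()  # stands in for a cloud intermediate queue

for doc in ["the quick fox", "the lazy dog"]:
    input_q.put(doc)

def map_worker():
    # Any number of identical workers can run this; none of them is special.
    while True:
        try:
            doc = input_q.get_nowait()
        except queue.Empty:
            return
        for word in doc.split():
            reduce_q.put((word, 1))

threads = [threading.Thread(target=map_worker) for _ in range(2)]
for t in threads: t.start()
for t in threads: t.join()

# reduce phase: drain the intermediate queue and aggregate by key
counts = defaultdict(int)
while not reduce_q.empty():
    word, n = reduce_q.get()
    counts[word] += n
```

Because no process matches the claimed “master process, for coordinating a data processing job”, an architecture of this shape sidesteps the claim language entirely.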