Cloud MapReduce — MapReduce built on a cloud operating system

We have finally finished a cool project, Cloud MapReduce, that we have been working on, on and off, for almost the whole of the past year. It is a new MapReduce implementation built on top of a cloud operating system. I have described before what a cloud operating system is. We looked hard to understand how a cloud operating system (OS) differs from a traditional OS. I think we have found the key difference: a cloud OS's scalability. Unlike a traditional OS, a cloud OS has to be much more scalable because it must manage a large infrastructure (much bigger than a PC) and it must serve many customers. By exploiting a cloud OS's scalability, Cloud MapReduce achieves three advantages over other MapReduce implementations, such as Hadoop:

Faster: Cloud MapReduce is faster than Hadoop, by up to 60 times in one case.

More scalable: There is no single bottleneck, i.e., no single master node that coordinates everything. It is a fully distributed implementation.

Simpler: Only 3,000 lines of Java code, which means it is very easy to change to suit your needs. Have you ever thought about changing Hadoop? I get a headache just thinking about the 280K lines of code in Hadoop.

I encourage you to read the Cloud MapReduce technical report to learn more about what we have done.