<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Huan Liu&#039;s Blog</title>
	<atom:link href="http://huanliu.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://huanliu.wordpress.com</link>
	<description>Cloud computing, distributed system, research results</description>
	<lastBuildDate>Fri, 02 Sep 2011 00:36:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='huanliu.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Huan Liu&#039;s Blog</title>
		<link>http://huanliu.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://huanliu.wordpress.com/osd.xml" title="Huan Liu&#039;s Blog" />
	<atom:link rel='hub' href='http://huanliu.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Launch a new site in 3.5 weeks with Amazon</title>
		<link>http://huanliu.wordpress.com/2011/09/02/launch-a-new-site-in-3-5-weeks-with-amazon/</link>
		<comments>http://huanliu.wordpress.com/2011/09/02/launch-a-new-site-in-3-5-weeks-with-amazon/#comments</comments>
		<pubDate>Fri, 02 Sep 2011 00:29:43 +0000</pubDate>
		<dc:creator>huanliu</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[technology]]></category>
		<category><![CDATA[Amazon]]></category>
		<category><![CDATA[gamification]]></category>
		<category><![CDATA[wellness]]></category>

		<guid isPermaLink="false">http://huanliu.wordpress.com/?p=159</guid>
		<description><![CDATA[Getting started quickly is one of the reasons people adopt the cloud, and that is why Amazon Web Services (AWS) is so popular. But people often overlook the fact that the retail part of Amazon is also amazing. If your project involves a supply chain, you can also leverage Amazon retail to get up and running [...]]]></description>
			<content:encoded><![CDATA[<p>Getting started quickly is one of the reasons people adopt the cloud, and that is why Amazon Web Services (AWS) is so popular. But people often overlook the fact that the retail part of Amazon is also amazing. If your project involves a supply chain, you can also leverage Amazon retail to get up and running quickly.</p>
<p>We recently launched a wellness pilot project at Accenture where we leveraged both Amazon retail and Amazon Web Services. The <a href="http://steptacular.org">Steptacular</a> pilot is designed to encourage Accenture US employees to lead a healthy lifestyle. We all make New Year&#8217;s resolutions, but we procrastinate and never exercise as much as we should. Why? Because we lack motivation and engagement. The Steptacular pilot uses a pedometer to track each participant&#8217;s physical activity, then applies concepts from gamification, social incentives (peer pressure), and monetary incentives to keep participants engaged. I will talk about the pilot and its results in detail in a future post. In this post, let me share how we were able to launch within 3.5 weeks, the key capabilities we leveraged from Amazon, and some lessons we learned from the experience.</p>
<p><strong>Supply chain side</strong></p>
<p>The Steptacular pilot requires participants to carry a pedometer to track their physical activity. This is the first step toward increasing engagement: using technology to remove the hassle of manual (and inaccurate) entry. We quickly settled on the <a href="http://www.amazon.com/Omron-HJ-720ITFFP-Pedometer-Advanced-Management/dp/B003U3HMN2/ref=sr_1_1?ie=UTF8&amp;qid=1314922442&amp;sr=8-1">Omron HJ-720</a> model because it is low cost and has a USB connector, so we can automate the step upload process.</p>
<p>We got in touch with Omron. The folks at Omron are super nice. Once they learned what we were trying to do, they immediately approved us as a reseller, which meant we could buy pedometers at the wholesale price. Unfortunately, we still had to figure out how to get the devices into our participants&#8217; hands. Accenture is a distributed organization with 42 offices in the US alone. To make matters worse, many consultants work from client sites, so distributing in person was not feasible. We seriously considered three options:</p>
<ol>
<li>Ask our participants to order directly from Amazon. This is the option we chose in the end, after connecting with the Amazon buyer in charge of the Omron pedometer and being assured that Amazon would have no problem handling the volume. It turned out that this not only saved us a significant amount of shipping hassle, but was also very cost effective for our participants.</li>
<li>Become a vendor ourselves and use Amazon for the supply chain. I did not know about it before, so I was pleasantly surprised to learn about the <a href="http://www.amazonservices.com/content/fulfillment-by-amazon.htm">Fulfillment by Amazon</a> capability. This is Amazon&#8217;s cloud for supply chain. Like a cloud, it is provided as a service: you store your merchandise in Amazon&#8217;s warehouse, and Amazon handles the inventory and shipping. Also like a cloud, it is pay per use with no long-term commitment. Although it was equally good at reducing hassle, we found that it would not save us any money. Amazon retail is so efficient and runs on such a small margin that we realized we could not compete, even with a 0% margin and even though we (supposedly) pay the same wholesale price.</li>
<li>Ship and manage everything ourselves. The only way we could be cheaper was to manage the supply chain and shipping logistics ourselves, and of course, that assumes we work for free. However, the amount of work is huge, and none of us wanted to lick envelopes for a few weeks, definitely not for free.</li>
</ol>
<p>The pilot officially launched on Mar. 29th. Besides Amazon itself, another seller, J&amp;R Music, also sold the same pedometer on Amazon&#8217;s website. Within a few minutes, our participants had completely drained J&amp;R&#8217;s stock. However, Amazon remained in stock for the whole duration; within a week, it sold roughly 3,000 pedometers. I am sure J&amp;R is still mystified by the sudden surge in demand. If you are from J&amp;R, my apologies for not giving adequate warning ahead of time, and kudos to you for not overcommitting your stock like many TouchPad vendors did recently (I am one of those burned by OnSale).</p>
<p>In addition to managing device distribution, we also had to figure out how to subsidize our participants. Our sponsors agreed to subsidize each pedometer by $10 to ease adoption, but we could not just write each participant a $10 check; that is too much work. Again, Amazon came to the rescue with two options. One was for Amazon to generate a batch of one-time-use $10 discount codes tied specifically to the pedometer product, and then bill us based on how many were redeemed. The other was for us to buy $10 gift cards in bulk and distribute them to our participants electronically. We ultimately chose the gift card option for its flexibility, and because a gift card is not considered a discount: the device still cost more than $25, so participants qualified for Super Saver Shipping. Looking back, I do regret choosing the gift card option, because managing squatters turned out to be a big hassle, but that is not Amazon&#8217;s fault; it is just human nature.</p>
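<p>The arithmetic behind that choice can be sketched as follows. This is a minimal illustration, not Amazon&#8217;s actual checkout logic: the $32.95 pedometer price is a hypothetical figure, the threshold is the "more than $25" rule described above, and the function and field names are made up for the example. The model assumes a promotional code lowers the subtotal that the threshold is checked against, while a gift card only changes how the order is paid.</p>

```python
from dataclasses import dataclass

# Threshold from the post: the item must cost "more than $25" for Super Saver Shipping.
SUPER_SAVER_THRESHOLD = 25.00


@dataclass
class Checkout:
    subtotal: float      # item price after any promotional discount
    free_shipping: bool  # whether the subtotal clears the threshold
    charged: float       # what the participant still pays after the gift card


def checkout(item_price: float, promo_discount: float = 0.0, gift_card: float = 0.0) -> Checkout:
    """Hypothetical model: promo codes reduce the subtotal, gift cards are only a payment method."""
    subtotal = round(item_price - promo_discount, 2)
    free_shipping = subtotal > SUPER_SAVER_THRESHOLD
    charged = round(max(subtotal - gift_card, 0.0), 2)
    return Checkout(subtotal, free_shipping, charged)


if __name__ == "__main__":
    price = 32.95  # hypothetical pedometer price, for illustration only
    print("discount code:", checkout(price, promo_discount=10.0))
    print("gift card:    ", checkout(price, gift_card=10.0))
```

<p>In both cases the participant pays $22.95 out of pocket, but in this model only the gift card keeps the order above the free-shipping threshold.</p>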
<p><strong>Technology platform side</strong></p>
<p>Using Amazon for its scaling capabilities is a no-brainer, especially for a short-term, quick project like ours. One key lesson from this experience is that you should only use what you need. Amazon Web Services offers a wide range of services, all designed for scale, so you will likely find one that serves your need.</p>
<p>Take the email service Amazon provides as an example. Initially, we used Gmail to send signup confirmations and email notifications. During the initial scaling trial, we soon hit Gmail&#8217;s limit on how fast we could send emails. Once we realized the problem, we quickly switched to Amazon SES (Simple Email Service). SES has an initial cap on how many emails you can send, but it only took a couple of emails to Amazon to get the limit lifted. After a couple of hours of coding and testing, we could suddenly send thousands of emails at once.</p>
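<p>Even on SES, outgoing mail still has to respect the account&#8217;s maximum send rate. A minimal pacing loop might look like the sketch below. The <code>send</code> callable is deliberately left abstract; in practice it could wrap a call such as boto3&#8217;s SES <code>send_email</code>, but that wiring, the rate value, and the function names here are assumptions for illustration, not the code Steptacular used.</p>

```python
import time
from typing import Callable, Iterable


def send_throttled(
    recipients: Iterable[str],
    send: Callable[[str], None],
    max_per_second: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() once per recipient without exceeding max_per_second.

    clock and sleep are injectable so the pacing can be tested without waiting.
    """
    interval = 1.0 / max_per_second
    next_slot = clock()  # earliest time the next message may go out
    sent = 0
    for address in recipients:
        now = clock()
        if next_slot > now:
            sleep(next_slot - now)  # wait for the next free slot
        send(address)
        sent += 1
        # Schedule the following slot one interval after this send.
        next_slot = max(next_slot, now) + interval
    return sent


if __name__ == "__main__":
    outbox = ["alice@example.com", "bob@example.com", "carol@example.com"]
    count = send_throttled(outbox, lambda addr: print("sending to", addr), max_per_second=5)
    print(count, "messages sent")
```

<p>Injecting the clock keeps the pacing logic independent of the mail transport, so the same loop works whether messages go through Gmail&#8217;s SMTP server or SES.</p>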
<p>In addition to SES, we used AWS CloudWatch to closely monitor the system and be alerted to failures. Best of all, it came for free, without any development effort on our side.</p>
<p>Even though Amazon Web Services offers a large array of services, you should only choose what you absolutely need. In other words, do not over-engineer. Take auto scaling as an example. If you host a website on Amazon, it is natural to think about adding an auto-scaling solution, just in case, to handle the unexpected. Amazon has its own auto-scaling solution, and we at Accenture Labs have even developed one called <a href="https://blogs.accenture.com/technology_labs_blog/archive/2010/06/04/webscalar-a-prebuilt-fault-tolerant-auto-scaling-web-server-farm-in-the-cloud.aspx">WebScalar</a> in the past. If you are Netflix, auto scaling makes absolute sense because your traffic is huge and fluctuates widely. But if you are smaller, you may not need to scale beyond a single instance, and auto scaling becomes extra complexity that you do not want to deal with, especially when you want to launch quickly. We estimated that we would have around 4,000 participants, and a quick profiling run showed that a standard extra-large instance on Amazon would be adequate to handle the load. Sure enough, even though the website slowed down for a short period during launch, that instance remained adequate to handle the traffic for the whole duration of the pilot.</p>
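<p>The sizing decision above is a back-of-envelope calculation, and one way to frame it is sketched below. Only the 4,000-participant figure comes from the post; the peak-activity fraction, requests per active user, per-instance capacity, and headroom factor are hypothetical placeholders that you would replace with numbers from your own profiling.</p>

```python
import math


def instances_needed(
    participants: int,
    peak_active_fraction: float,
    requests_per_user_per_minute: float,
    requests_per_second_per_instance: float,
    headroom: float = 1.5,
) -> int:
    """Estimate how many identical instances cover the peak load with some headroom."""
    peak_rps = participants * peak_active_fraction * requests_per_user_per_minute / 60.0
    return max(1, math.ceil(peak_rps * headroom / requests_per_second_per_instance))


if __name__ == "__main__":
    # Hypothetical inputs: 25% of users active at peak, 6 requests/minute each,
    # and 200 requests/second measured on one instance during profiling.
    print(instances_needed(4000, 0.25, 6, 200))   # 100 rps peak -> 1 instance
    print(instances_needed(40000, 0.25, 6, 200))  # 1000 rps peak -> 8 instances
```

<p>When the estimate comes back as 1 with comfortable headroom, a single well-sized instance may be simpler than an auto-scaling setup, which is the trade-off described above.</p>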
<p>We also learned a lesson on fault tolerance: really think through your backup solution. Steptacular survived two large-scale failures in the US East data center. We enjoyed peace of mind partly because we were lucky and partly because we had a plan. Steptacular uses an instance-store instance (instead of an EBS-backed instance). We made this choice mainly for performance reasons: we wanted to free up network bandwidth and use the local disk bandwidth instead. This turned out to save us from the <a href="http://aws.amazon.com/message/65648/">first failure in Apr.</a>, which was caused by EBS failures. Because an instance-store instance cannot count on EBS for persistence, we built our own solution. Most static content on the instance is bundled into an Amazon Machine Image (AMI). Two pieces of less static content (content that changes often) are stored on the instance: the website logic and the steps database. The website logic is stored in a Subversion repository, and the database is synced to another database running outside the US East data center. This architecture lets us get back up and running quickly: first launch our AMI, then check out the website code from the repository, and finally dump and reload the database from the mirror. Even though we never had to initiate this backup procedure, it is good to have the peace of mind of knowing your data is safe.</p>
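<p>The three-step recovery procedure can be captured as a small runbook script so that nobody has to improvise during an outage. The sketch below rests on stated assumptions: the repository URL, web root, mirror host, and database name are hypothetical, the database is assumed to be MySQL (the post does not say which database Steptacular used), and launching the instance from the AMI is left as a separate step.</p>

```python
import shlex
import subprocess
from typing import List


def recovery_commands(repo_url: str, web_root: str, mirror_host: str, db_name: str) -> List[str]:
    """Build the shell commands for steps 2 and 3 of the recovery procedure.

    Step 1, launching a new instance from the prebuilt AMI, is assumed to have
    happened already; these commands run on that new instance.
    """
    return [
        # Step 2: restore the website logic from the Subversion repository.
        f"svn checkout {shlex.quote(repo_url)} {shlex.quote(web_root)}",
        # Step 3: dump the steps database from the off-site mirror and reload it locally.
        # (MySQL is an assumption; substitute your database's dump/load tools.)
        f"mysqldump -h {shlex.quote(mirror_host)} {shlex.quote(db_name)}"
        f" | mysql {shlex.quote(db_name)}",
    ]


def run(commands: List[str], dry_run: bool = True) -> None:
    for cmd in commands:
        if dry_run:
            print(cmd)
        else:
            # bash with pipefail so a failed mysqldump is not masked by mysql's exit status.
            subprocess.run(["bash", "-o", "pipefail", "-c", cmd], check=True)


if __name__ == "__main__":
    run(recovery_commands(
        "http://svn.example.com/steptacular/trunk",  # hypothetical
        "/var/www/steptacular",                     # hypothetical
        "db-mirror.example.com",                    # hypothetical
        "steps",                                    # hypothetical
    ))
```

<p>Keeping <code>dry_run=True</code> as the default lets you rehearse the procedure and review the exact commands before an incident forces you to run them.</p>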
<p>Thanks to Amazon, both Amazon retail and Amazon Web Services, we were able to pull off the pilot in 3.5 weeks. More importantly, the pilot itself has collected some interesting results on how we can motivate people to exercise more, but I will leave that to a future post after we have had a chance to dig into the data.</p>
<p><strong>Acknowledgments</strong></p>
<p>Launching Steptacular in 3.5 weeks would not have been possible without the help of many people. We would like to especially thank the following folks:</p>
<ul>
<li>Jim Li from Omron for providing hardware, software, and logistics support</li>
<li><a href="http://www.jeff-barr.com/">Jeff Barr</a> from Amazon for connecting us with the right folks at Amazon retail</li>
<li><a href="http://perspectives.mvdirona.com/">James Hamilton</a> from Amazon for increasing our email limit on the spot</li>
<li>Charles Allen from Amazon for getting us the gift codes quickly</li>
<li>Tiffany Morley and Helen Shen from Amazon for managing the inventory so that the pedometer miraculously stayed in stock despite the huge demand</li>
</ul>
<p>Last but not least, big kudos to the Steptacular team, which includes several Stanford students who worked really hard, even through finals week, to get the pilot up and running. It is one of the best teams I have ever been proud to work with.</p>
]]></content:encoded>
			<wfw:commentRss>http://huanliu.wordpress.com/2011/09/02/launch-a-new-site-in-3-5-weeks-with-amazon/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aabcaed537a9f2df3a2010d2158d4546?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">huanliu</media:title>
		</media:content>
	</item>
		<item>
		<title>How to run MapReduce in Amazon EC2 spot market</title>
		<link>http://huanliu.wordpress.com/2011/06/22/how-to-run-mapreduce-in-amazon-ec2-spot-market/</link>
		<comments>http://huanliu.wordpress.com/2011/06/22/how-to-run-mapreduce-in-amazon-ec2-spot-market/#comments</comments>
		<pubDate>Wed, 22 Jun 2011 21:47:43 +0000</pubDate>
		<dc:creator>huanliu</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[MapReduce]]></category>
		<category><![CDATA[Amazon EC2]]></category>
		<category><![CDATA[Spot Market]]></category>

		<guid isPermaLink="false">http://huanliu.wordpress.com/?p=154</guid>
		<description><![CDATA[If you often run large-scale MapReduce/Hadoop jobs in Amazon EC2, you must have thought about using the spot market. EC2&#8217;s spot market price for an instance is typically 60+% lower than the on-demand price. For a large job, where you use many instances for many hours, a 60+% saving could be a [...]]]></description>
			<content:encoded><![CDATA[<p>If you often run large-scale MapReduce/Hadoop jobs in Amazon EC2, you must have thought about using the spot market. EC2&#8217;s spot market price for an instance is typically 60+% lower than the on-demand price. For a large job, where you use many instances for many hours, a 60+% saving could be a substantial amount.</p>
<p>Unfortunately, using the spot market has not been trivial. In exchange for the lower price, you explicitly agree that Amazon can terminate your instances at any time. This is a problem since you may lose all your work. A <a href="http://www.usenix.org/event/hotcloud10/tech/full_papers/Chohan.pdf">research paper from HotCloud</a> last year showed that even adding more spot instances (not replacing existing nodes) could be detrimental to a running MapReduce job. In other words, you add more resources to your cluster, but your running time could actually be longer.</p>
<p>Beyond lengthening your computation, the spot market could even make you lose your data. Existing MapReduce implementations, such as Google&#8217;s internal implementation or Hadoop, are already designed with failure in mind. However, the assumed scenario is hardware failure, i.e., only a small fraction of nodes may go down at any time. This assumption does not hold in the spot market, where all nodes of a cluster may fail at the same time. You can lose not only all your state (when the master nodes go down), but also all your data (when all nodes holding replicas of a piece of data go down).</p>
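<p>A simplified probability model makes the contrast concrete. Assume each piece of data has r replicas on r distinct nodes, and that under hardware failures each node fails independently with probability p during a job; r, p, and the example values are illustrative assumptions, not measurements from the post. Independent failures make losing every replica very unlikely, while a spot termination that takes out the whole cluster makes it certain:</p>

```latex
\begin{aligned}
P_{\text{hardware}}\bigl(\text{all } r \text{ replicas lost}\bigr) &= p^{r},
  \qquad \text{e.g. } p = 0.01,\; r = 3 \;\Rightarrow\; 10^{-6},\\
P_{\text{spot}}\bigl(\text{all } r \text{ replicas lost} \mid \text{all nodes terminated}\bigr) &= 1 .
\end{aligned}
```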
<p>What about bidding a really high price for your spot instances and hoping that Amazon never raises the spot price that high? Unfortunately, there is no guarantee on how high the spot price could go. There were several occasions last year when the spot price actually exceeded the on-demand price. This is likely because some users were bidding above the on-demand price, and Amazon really needed to kill those instances to free up capacity.</p>
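<p>A toy simulation shows why a high bid is no guarantee. It uses a simplified model: an instance runs in any hour where the bid is at least the spot price, is terminated otherwise, and pays the prevailing spot price (not the bid) while running. The price trace and the 8-cent on-demand price are made-up numbers, and real details such as partial hours and relaunch delays are ignored.</p>

```python
from typing import List, NamedTuple


class SpotOutcome(NamedTuple):
    running_hours: List[int]      # hours in which the instance kept running
    terminated_hours: List[int]   # hours in which the spot price exceeded the bid
    cost_cents: int               # total paid, at the spot price for each running hour


def simulate_spot(hourly_spot_cents: List[int], bid_cents: int) -> SpotOutcome:
    """Simplified model: run while bid >= spot price, pay the spot price, not the bid."""
    running, terminated = [], []
    cost = 0
    for hour, price in enumerate(hourly_spot_cents):
        if bid_cents >= price:
            running.append(hour)
            cost += price
        else:
            terminated.append(hour)
    return SpotOutcome(running, terminated, cost)


if __name__ == "__main__":
    on_demand_cents = 8                  # hypothetical on-demand price per hour
    trace = [3, 3, 4, 12, 3]             # hypothetical hourly spot prices, with one spike
    outcome = simulate_spot(trace, bid_cents=10)  # bid above the on-demand price
    print(outcome)
    print("on-demand cost for the same hours:", on_demand_cents * len(outcome.running_hours))
```

<p>Even with a bid above the on-demand price, the spike in hour 3 terminates the instance, and a MapReduce job running on it would lose whatever state it had not saved.</p>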
<p>While the naive approach of bidding at a high price may not work, I am happy to report that there is a new technique that can help you leverage the spot market to save money. We recently developed a MapReduce implementation that can tolerate large-scale node failures (e.g., when your bid price is below Amazon&#8217;s spot price). Even if all nodes in your cluster are terminated, we can guarantee that no state is lost and that you can continue to make forward progress when your cluster comes back online (e.g., when your bid price is higher than Amazon&#8217;s spot price).</p>
<p>Our implementation leverages two key observations. First, when Amazon terminates your instance, it is not a hard power-off. Instead, it is a soft OS shutdown, which gives you a couple of minutes to execute your shutdown script. We modified our shutdown script to save the current progress and generate a new task for the remaining work, so that another node can take over in the future. In other words, we use on-demand checkpointing to save state only when needed.</p>
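<p>The on-demand checkpointing idea can be sketched in a few lines. This is a hypothetical illustration, not the Spot Cloud MapReduce code: a task covers a range of input records, a stop flag is set by a SIGTERM handler (processes typically receive SIGTERM during an OS shutdown), and on seeing the flag the worker stops and enqueues a new task for the records it has not processed yet. A plain Python list stands in for the shared task queue, and all names are made up for the example.</p>

```python
import signal
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    task_id: str
    start: int  # index of the first record still to process
    end: int    # one past the index of the last record


def install_shutdown_flag() -> threading.Event:
    """Return an Event that is set when the process receives SIGTERM (main thread only)."""
    flag = threading.Event()
    signal.signal(signal.SIGTERM, lambda signum, frame: flag.set())
    return flag


def run_task(
    task: Task,
    process: Callable[[int], None],
    should_stop: Callable[[], bool],
    task_queue: List[Task],
) -> Optional[int]:
    """Process records in order; on shutdown, enqueue the remainder as a new task.

    Returns the index where processing stopped, or None if the task completed.
    """
    pos = task.start
    while pos < task.end:
        if should_stop():
            # Checkpoint on demand: only the unfinished range needs to be saved.
            task_queue.append(Task(f"{task.task_id}-rest", pos, task.end))
            return pos
        process(pos)
        pos += 1
    return None


if __name__ == "__main__":
    flag = install_shutdown_flag()
    queue: List[Task] = []
    run_task(Task("map-0", 0, 1000), lambda i: None, flag.is_set, queue)
    print("leftover tasks:", queue)
```

<p>The checkpoint here is just a small task descriptor, which is why it can be written within a short shutdown window.</p>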
<p>Second, we constantly save intermediate data to minimize the volume of state we have to save during the shutdown phase. Our solution is built on <a href="http://code.google.com/p/cloudmapreduce">Cloud MapReduce</a>, which constantly streams intermediate data out of the local node. In comparison, other MapReduce implementations, such as Hadoop, store all intermediate data locally until a task finishes, which could leave too large a dataset to save during the short shutdown window.</p>
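<p>The difference between streaming and buffering intermediate data shows up directly in how much state is still on the node when a shutdown arrives. The sketch below compares two hypothetical emitters for a word-count mapper: one ships pairs off the node in small batches (a list-appending sink stands in for the external store), the other keeps everything locally until the task ends. Neither is the actual Cloud MapReduce or Hadoop code.</p>

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, int]


class StreamingEmitter:
    """Ship intermediate pairs off the node in small batches as they are produced."""

    def __init__(self, sink: Callable[[List[Pair]], None], batch_size: int = 8):
        self.sink = sink
        self.batch_size = batch_size
        self.buffer: List[Pair] = []

    def emit(self, pair: Pair) -> None:
        self.buffer.append(pair)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []

    def pending(self) -> int:
        """Pairs that would still have to be saved if a shutdown arrived now."""
        return len(self.buffer)


class BufferingEmitter:
    """Keep every intermediate pair on the node until the task finishes."""

    def __init__(self) -> None:
        self.buffer: List[Pair] = []

    def emit(self, pair: Pair) -> None:
        self.buffer.append(pair)

    def pending(self) -> int:
        return len(self.buffer)


def word_count_map(lines: Iterable[str], emitter) -> None:
    for line in lines:
        for word in line.split():
            emitter.emit((word, 1))


if __name__ == "__main__":
    shipped: List[List[Pair]] = []
    streaming, buffering = StreamingEmitter(shipped.append), BufferingEmitter()
    lines = ["the quick brown fox"] * 100
    word_count_map(lines, streaming)
    word_count_map(lines, buffering)
    print("streaming pending:", streaming.pending(), "buffering pending:", buffering.pending())
```

<p>With streaming, the state left to save at shutdown is bounded by the batch size; with buffering, it grows with everything the task has produced so far.</p>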
<p>I will not belabor the details of our implementation, except to mention that it was published last week at the USENIX HotCloud conference. You can read the <a href="http://sites.google.com/site/huanliu/spot.pdf">Spot Cloud MapReduce paper</a> for the full details.</p>
]]></content:encoded>
			<wfw:commentRss>http://huanliu.wordpress.com/2011/06/22/how-to-run-mapreduce-in-amazon-ec2-spot-market/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aabcaed537a9f2df3a2010d2158d4546?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">huanliu</media:title>
		</media:content>
	</item>
		<item>
		<title>Terremark announces new lower price</title>
		<link>http://huanliu.wordpress.com/2011/04/05/terremark-announces-new-lower-price-2/</link>
		<comments>http://huanliu.wordpress.com/2011/04/05/terremark-announces-new-lower-price-2/#comments</comments>
		<pubDate>Tue, 05 Apr 2011 16:55:21 +0000</pubDate>
		<dc:creator>huanliu</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[cost]]></category>
		<category><![CDATA[Terremark]]></category>

		<guid isPermaLink="false">http://huanliu.wordpress.com/?p=149</guid>
		<description><![CDATA[Competition is a good thing. The continuous drop in hardware prices is a good thing. Improvement in efficiency is also a good thing. All of these should translate into lower computing costs for you, the end user, over time. That is exactly what Terremark has done today: lowering its cloud offering price, [...]]]></description>
			<content:encoded><![CDATA[<p>Competition is a good thing. The continuous drop in hardware prices is a good thing. Improvement in efficiency is also a good thing. All of these should translate into lower computing costs for you, the end user, over time. That is exactly what Terremark has done today: lowering its <a href="http://vcloudexpress.terremark.com/pricing.aspx">cloud offering price</a>, in one case by up to 42%. We <a href="http://huanliu.wordpress.com/2011/01/29/terremark-cost-comparison-with-amazon-ec2/">compared Terremark&#8217;s price to EC2</a> before; it is time to update the comparison based on the new pricing.</p>
<p>Following what we did to get EC2&#8217;s unit cost, we can run a regression analysis to estimate Terremark&#8217;s unit cost. We only consider Linux VMs, in order not to factor in software license cost. We assume Cost = c * CPU + m * RAM, where CPU is the number of VPUs and RAM is in GB (Terremark charges storage separately from the VM cost, at $0.25/GB/month). The regression determines the unit costs to be</p>
<p>c = 1.31 cents/VPU/hour<br />
m = 5.38 cents/GB/hour</p>
<p>Not surprisingly, both unit costs are lower than before (<em>c</em> used to be 2.06 cents/VPU/hour, and <em>m</em> used to be 6.46 cents/GB/hour).</p>
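<p>The regression can be reproduced with an ordinary least-squares fit that has no intercept term. The sketch below solves the two normal equations directly, using the new (green) prices from the table that follows; it is our illustration of the fit, not necessarily the tool used for the numbers above, and on this data it returns c of about 1.31 and m of about 5.38.</p>

```python
from typing import Dict, Tuple

# New Terremark Linux VM prices in cents/hour (green values), keyed by (memory_gb, vpus).
TERREMARK_PRICES: Dict[Tuple[float, int], float] = {
    (0.5, 1): 3.5, (0.5, 2): 4.0, (0.5, 4): 4.5, (0.5, 8): 4.9,
    (1, 1): 6.0, (1, 2): 7.0, (1, 4): 8.0, (1, 8): 10.0,
    (1.5, 1): 9.0, (1.5, 2): 10.5, (1.5, 4): 12.0, (1.5, 8): 13.5,
    (2, 1): 12.0, (2, 2): 14.1, (2, 4): 16.1, (2, 8): 20.0,
    (4, 1): 21.7, (4, 2): 27.1, (4, 4): 30.1, (4, 8): 35.9,
    (8, 1): 40.1, (8, 2): 48.2, (8, 4): 56.7, (8, 8): 63.4,
    (12, 1): 60.2, (12, 2): 68.6, (12, 4): 76.2, (12, 8): 82.4,
    (16, 1): 80.3, (16, 2): 84.4, (16, 4): 89.9, (16, 8): 93.2,
}


def fit_unit_costs(prices: Dict[Tuple[float, int], float]) -> Tuple[float, float]:
    """Least-squares fit of Cost = c * CPU + m * RAM with no intercept term.

    Solves the 2x2 normal equations
        [sum(v*v) sum(v*r)] [c]   [sum(v*y)]
        [sum(v*r) sum(r*r)] [m] = [sum(r*y)]
    with Cramer's rule, where v = VPUs, r = RAM in GB, y = price.
    """
    svv = svr = srr = svy = sry = 0.0
    for (ram, vpus), price in prices.items():
        svv += vpus * vpus
        svr += vpus * ram
        srr += ram * ram
        svy += vpus * price
        sry += ram * price
    det = svv * srr - svr * svr
    c = (svy * srr - svr * sry) / det
    m = (svv * sry - svr * svy) / det
    return c, m


if __name__ == "__main__":
    c, m = fit_unit_costs(TERREMARK_PRICES)
    print(f"c = {c:.2f} cents/VPU/hour, m = {m:.2f} cents/GB/hour")
    # Predicted price for 16 GB + 1 VPU, comparable to the red value in the table.
    print(f"16GB + 1VPU predicted: {c + 16 * m:.2f}")
```

<p>Dropping the intercept matches the assumed model, in which a VM with no CPU and no RAM costs nothing; the red values in the table are these fitted c and m plugged back into the model.</p>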
<p>The regression still does not fit the actual prices very well, because Terremark builds economies of scale into its pricing: it heavily discounts both CPU and RAM as you move up in configuration. The following table shows both the new reduced price (in green) and the cost predicted by the estimated parameters (in red) for the various VM configurations, in cents/hour.</p>
<table border="1">
<tbody>
<tr>
<td>memory (GB)\CPU</td>
<td>1 VPU</td>
<td>2 VPU</td>
<td>4 VPU</td>
<td>8 VPU</td>
</tr>
<tr>
<td>0.5</td>
<td><span style="color:#339966;">3.5 </span>/ <span style="color:#ff0000;">4</span></td>
<td><span style="color:#339966;">4</span> / <span style="color:#ff0000;">5.31</span></td>
<td><span style="color:#339966;">4.5</span> / <span style="color:#ff0000;">7.93</span></td>
<td><span style="color:#339966;">4.9</span>/ <span style="color:#ff0000;">13.17</span></td>
</tr>
<tr>
<td>1</td>
<td><span style="color:#339966;">6</span> / <span style="color:#ff0000;">6.69</span></td>
<td><span style="color:#339966;">7</span> / <span style="color:#ff0000;">8</span></td>
<td><span style="color:#339966;">8</span> / <span style="color:#ff0000;">10.62</span></td>
<td><span style="color:#339966;">10</span> / <span style="color:#ff0000;">15.86</span></td>
</tr>
<tr>
<td>1.5</td>
<td><span style="color:#339966;">9</span> / <span style="color:#ff0000;">9.38</span></td>
<td><span style="color:#339966;">10.5</span> / <span style="color:#ff0000;">10.69</span></td>
<td><span style="color:#339966;">12</span> / <span style="color:#ff0000;">13.31</span></td>
<td><span style="color:#339966;">13.5</span> / <span style="color:#ff0000;">18.55</span></td>
</tr>
<tr>
<td>2</td>
<td><span style="color:#339966;">12 </span>/ <span style="color:#ff0000;">12.07</span></td>
<td><span style="color:#339966;">14.1 </span>/ <span style="color:#ff0000;">13.38</span></td>
<td><span style="color:#339966;">16.1 </span>/ <span style="color:#ff0000;">16</span></td>
<td><span style="color:#339966;">20 </span>/ <span style="color:#ff0000;">21.24</span></td>
</tr>
<tr>
<td>4</td>
<td><span style="color:#339966;">21.7 </span>/ <span style="color:#ff0000;">22.82</span></td>
<td><span style="color:#339966;">27.1 </span>/ <span style="color:#ff0000;">24.13</span></td>
<td><span style="color:#339966;">30.1 </span>/ <span style="color:#ff0000;">26.75</span></td>
<td><span style="color:#339966;">35.9 </span>/ <span style="color:#ff0000;">31.99</span></td>
</tr>
<tr>
<td>8</td>
<td><span style="color:#339966;">40.1 </span>/ <span style="color:#ff0000;">44.33</span></td>
<td><span style="color:#339966;">48.2 </span>/ <span style="color:#ff0000;">45.64</span></td>
<td><span style="color:#339966;">56.7 </span>/ <span style="color:#ff0000;">48.26</span></td>
<td><span style="color:#339966;">63.4 </span>/ <span style="color:#ff0000;">53.5</span></td>
</tr>
<tr>
<td>12</td>
<td><span style="color:#339966;">60.2 </span>/ <span style="color:#ff0000;">65.85</span></td>
<td><span style="color:#339966;">68.6 </span>/ <span style="color:#ff0000;">67.16</span></td>
<td><span style="color:#339966;">76.2 </span>/ <span style="color:#ff0000;">69.78</span></td>
<td><span style="color:#339966;">82.4 </span>/ <span style="color:#ff0000;">75.02</span></td>
</tr>
<tr>
<td>16</td>
<td><span style="color:#339966;">80.3 </span>/ <span style="color:#ff0000;">87.36</span></td>
<td><span style="color:#339966;">84.4 </span>/ <span style="color:#ff0000;">88.67</span></td>
<td><span style="color:#339966;">89.9 </span>/ <span style="color:#ff0000;">91.29</span></td>
<td><span style="color:#339966;">93.2 </span>/ <span style="color:#ff0000;">96.53</span></td>
</tr>
</tbody>
</table>
<p>Again, we compare cost against a fictitious EC2 instance with exactly the same spec. For simplicity, we assume a VPU in a Terremark VM gets the full attention of a physical core. This is the more common case because Terremark uses VMware&#8217;s DRS (Distributed Resource Scheduler), which can dynamically reassign virtual cores to different physical cores to avoid contention.</p>
<p>The following table shows the EC2 equivalent cost assuming a virtual core can get the full power of the physical core.</p>
<table border="1">
<tbody>
<tr>
<td>memory (GB)</td>
<td>VPU</td>
<td>Terremark price (cents/hour)</td>
<td>Equivalent EC2 cost (cents/hour)</td>
<td>Terremark cost/EC2 cost</td>
</tr>
<tr>
<td>0.5</td>
<td>1</td>
<td>3.5</td>
<td>4.09</td>
<td>0.86</td>
</tr>
<tr>
<td>0.5</td>
<td>2</td>
<td>4</td>
<td>7.17</td>
<td>0.56</td>
</tr>
<tr>
<td>0.5</td>
<td>4</td>
<td>4.5</td>
<td>13.33</td>
<td>0.34</td>
</tr>
<tr>
<td>0.5</td>
<td>8</td>
<td>4.9</td>
<td>25.65</td>
<td>0.19</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>6</td>
<td>5.09</td>
<td>1.18</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>7</td>
<td>8.17</td>
<td>0.86</td>
</tr>
<tr>
<td>1</td>
<td>4</td>
<td>8</td>
<td>14.33</td>
<td>0.56</td>
</tr>
<tr>
<td>1</td>
<td>8</td>
<td>10</td>
<td>26.66</td>
<td>0.38</td>
</tr>
<tr>
<td>1.5</td>
<td>1</td>
<td>9</td>
<td>6.1</td>
<td>1.48</td>
</tr>
<tr>
<td>1.5</td>
<td>2</td>
<td>10.5</td>
<td>9.18</td>
<td>1.14</td>
</tr>
<tr>
<td>1.5</td>
<td>4</td>
<td>12</td>
<td>15.34</td>
<td>0.78</td>
</tr>
<tr>
<td>1.5</td>
<td>8</td>
<td>13.5</td>
<td>27.66</td>
<td>0.49</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>12</td>
<td>7.1</td>
<td>1.69</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>14.1</td>
<td>10.18</td>
<td>1.38</td>
</tr>
<tr>
<td>2</td>
<td>4</td>
<td>16.1</td>
<td>16.35</td>
<td>0.98</td>
</tr>
<tr>
<td>2</td>
<td>8</td>
<td>20</td>
<td>28.67</td>
<td>0.7</td>
</tr>
<tr>
<td>4</td>
<td>1</td>
<td>21.7</td>
<td>11.12</td>
<td>1.95</td>
</tr>
<tr>
<td>4</td>
<td>2</td>
<td>27.1</td>
<td>14.21</td>
<td>1.91</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>30.1</td>
<td>20.37</td>
<td>1.48</td>
</tr>
<tr>
<td>4</td>
<td>8</td>
<td>35.9</td>
<td>32.69</td>
<td>1.1</td>
</tr>
<tr>
<td>8</td>
<td>1</td>
<td>40.1</td>
<td>19.17</td>
<td>2.09</td>
</tr>
<tr>
<td>8</td>
<td>2</td>
<td>48.2</td>
<td>22.25</td>
<td>2.17</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>56.7</td>
<td>28.41</td>
<td>2.0</td>
</tr>
<tr>
<td>8</td>
<td>8</td>
<td>63.4</td>
<td>40.73</td>
<td>1.56</td>
</tr>
<tr>
<td>12</td>
<td>1</td>
<td>60.2</td>
<td>27.21</td>
<td>2.21</td>
</tr>
<tr>
<td>12</td>
<td>2</td>
<td>68.6</td>
<td>30.29</td>
<td>2.26</td>
</tr>
<tr>
<td>12</td>
<td>4</td>
<td>76.2</td>
<td>36.45</td>
<td>2.09</td>
</tr>
<tr>
<td>12</td>
<td>8</td>
<td>82.4</td>
<td>48.78</td>
<td>1.69</td>
</tr>
<tr>
<td>16</td>
<td>1</td>
<td>80.3</td>
<td>35.25</td>
<td>2.28</td>
</tr>
<tr>
<td>16</td>
<td>2</td>
<td>84.4</td>
<td>38.33</td>
<td>2.20</td>
</tr>
<tr>
<td>16</td>
<td>4</td>
<td>89.9</td>
<td>44.5</td>
<td>2.02</td>
</tr>
<tr>
<td>16</td>
<td>8</td>
<td>93.2</td>
<td>56.82</td>
<td>1.64</td>
</tr>
</tbody>
</table>
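<p>The ratio column is simply the Terremark price divided by the equivalent EC2 cost. The short sketch below recomputes it from the two price columns of the table and lists the configurations where Terremark comes out cheaper; the EC2 figures are taken from the table as given rather than recomputed from EC2 unit costs.</p>

```python
from typing import Dict, List, Tuple

# (memory_gb, vpus): (terremark_cents_per_hour, equivalent_ec2_cents_per_hour), from the table.
PRICES: Dict[Tuple[float, int], Tuple[float, float]] = {
    (0.5, 1): (3.5, 4.09), (0.5, 2): (4.0, 7.17), (0.5, 4): (4.5, 13.33), (0.5, 8): (4.9, 25.65),
    (1, 1): (6.0, 5.09), (1, 2): (7.0, 8.17), (1, 4): (8.0, 14.33), (1, 8): (10.0, 26.66),
    (1.5, 1): (9.0, 6.1), (1.5, 2): (10.5, 9.18), (1.5, 4): (12.0, 15.34), (1.5, 8): (13.5, 27.66),
    (2, 1): (12.0, 7.1), (2, 2): (14.1, 10.18), (2, 4): (16.1, 16.35), (2, 8): (20.0, 28.67),
    (4, 1): (21.7, 11.12), (4, 2): (27.1, 14.21), (4, 4): (30.1, 20.37), (4, 8): (35.9, 32.69),
    (8, 1): (40.1, 19.17), (8, 2): (48.2, 22.25), (8, 4): (56.7, 28.41), (8, 8): (63.4, 40.73),
    (12, 1): (60.2, 27.21), (12, 2): (68.6, 30.29), (12, 4): (76.2, 36.45), (12, 8): (82.4, 48.78),
    (16, 1): (80.3, 35.25), (16, 2): (84.4, 38.33), (16, 4): (89.9, 44.5), (16, 8): (93.2, 56.82),
}


def cost_ratios(prices) -> Dict[Tuple[float, int], float]:
    """Terremark price divided by the equivalent EC2 cost, per configuration."""
    return {config: terremark / ec2 for config, (terremark, ec2) in prices.items()}


def cheaper_than_ec2(ratios) -> List[Tuple[float, int]]:
    """Configurations where Terremark costs less than the EC2 equivalent, cheapest first."""
    return sorted((c for c, r in ratios.items() if r < 1.0), key=ratios.get)


if __name__ == "__main__":
    ratios = cost_ratios(PRICES)
    best = min(ratios, key=ratios.get)
    print("cheapest relative to EC2:", best, round(ratios[best], 2))
    print("16GB + 8VPU ratio:", round(ratios[(16, 8)], 2))
    print("cheaper than EC2:", cheaper_than_ec2(ratios))
```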
<p>As we observed before, there are several configurations where Terremark is much cheaper than EC2. The 8VPU+0.5GB configuration is still the cheapest, at 19% of the equivalent EC2 cost. What is different from before is that the larger VM configurations are getting significantly cheaper. For example, the 16GB+8VPU configuration now costs only 1.64 times its EC2 equivalent, compared to 2.83 times before. This means it is becoming more economical to run larger VMs in Terremark.</p>
<p>Let us hope the trend continues that cloud providers continue to reduce the cost of computing so that we can pay less for the same work or get more work done for the same budget.</p>
]]></content:encoded>
			<wfw:commentRss>http://huanliu.wordpress.com/2011/04/05/terremark-announces-new-lower-price-2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aabcaed537a9f2df3a2010d2158d4546?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">huanliu</media:title>
		</media:content>
	</item>
		<item>
		<title>EC2 spot pricing coming to Cluster Compute and Cluster GPU instances</title>
		<link>http://huanliu.wordpress.com/2011/02/08/ec2-spot-pricing-coming-to-cluster-compute-and-cluster-gpu-instances/</link>
		<comments>http://huanliu.wordpress.com/2011/02/08/ec2-spot-pricing-coming-to-cluster-compute-and-cluster-gpu-instances/#comments</comments>
		<pubDate>Tue, 08 Feb 2011 20:15:32 +0000</pubDate>
		<dc:creator>huanliu</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[AWS]]></category>
		<category><![CDATA[cluster]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[spot price]]></category>

		<guid isPermaLink="false">http://huanliu.wordpress.com/?p=137</guid>
		<description><![CDATA[AWS EC2 introduced the Cluster Compute and Cluster GPU instances a few months back. These instances are well suited to high-throughput HPC applications; unfortunately, they have not been available in the spot market. I think that is about to change: the Cluster Compute and Cluster GPU instances should become available through the spot market shortly. Over [...]]]></description>
			<content:encoded><![CDATA[<p>AWS EC2 introduced the <a href="http://aws.typepad.com/aws/2010/07/the-new-amazon-ec2-instance-type-the-cluster-compute-instance.html">Cluster Compute</a> and <a href="http://aws.typepad.com/aws/2010/11/new-ec2-instance-type-the-cluster-gpu-instance.html">Cluster GPU</a> instances a few months back. These instances are well suited to high-throughput HPC applications; unfortunately, they have not been available in the spot market. I think that is about to change: the Cluster Compute and Cluster GPU instances should become available through the spot market shortly.</p>
<p>Over the weekend (Feb. 6, 2011), I noticed that both cc1.4xlarge and cg1.4xlarge instances showed up in the AWS console when you queried the price history, but they disappeared the next day. I am guessing AWS was testing the feature. Starting today (Feb. 8, 2011), you can query the AWS API or use the AWS command line tools to see the price history for cc1.4xlarge instances. Furthermore, you can already bid for cc1.4xlarge (this was not possible over the weekend because the &#8220;current price&#8221; was not set for those instances).</p>
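<p>If you want to check the price history yourself, here is a minimal sketch. It assumes the modern boto3 SDK, which postdates this post. In boto3, an EC2 client's <code>describe_spot_price_history</code> call accepts <code>InstanceTypes</code> and <code>ProductDescriptions</code> filters. It returns a <code>SpotPriceHistory</code> list, plus a <code>NextToken</code> when more pages remain. The helper name <code>latest_spot_prices</code> is our own, and cc1.4xlarge is a previous-generation type that may no longer return data.</p>

```python
def latest_spot_prices(client, instance_types, product="Linux/UNIX"):
    """Return {(instance_type, availability_zone): (timestamp, price)} holding
    the most recent spot price seen for each pair.

    `client` is assumed to be a boto3 EC2 client (boto3.client("ec2")) or any
    object with a compatible describe_spot_price_history method.
    """
    latest = {}
    kwargs = {"InstanceTypes": list(instance_types),
              "ProductDescriptions": [product]}
    while True:
        page = client.describe_spot_price_history(**kwargs)
        for rec in page["SpotPriceHistory"]:
            key = (rec["InstanceType"], rec["AvailabilityZone"])
            price = float(rec["SpotPrice"])  # the API returns the price as a string
            if key not in latest or rec["Timestamp"] > latest[key][0]:
                latest[key] = (rec["Timestamp"], price)
        token = page.get("NextToken")
        if not token:  # absent or empty string means no more pages
            break
        kwargs["NextToken"] = token
    return latest
```

<p>With real credentials, a call might look like <code>latest_spot_prices(boto3.client("ec2", region_name="us-east-1"), ["cc1.4xlarge", "cg1.4xlarge"])</code>. Taking the client as a parameter keeps the paging logic testable without network access.</p>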
<p>Although there is no official announcement yet, I think it is just a matter of time. Have fun crunching through your HPC applications for cheap.</p>
]]></content:encoded>
			<wfw:commentRss>http://huanliu.wordpress.com/2011/02/08/ec2-spot-pricing-coming-to-cluster-compute-and-cluster-gpu-instances/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aabcaed537a9f2df3a2010d2158d4546?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">huanliu</media:title>
		</media:content>
	</item>
		<item>
		<title>Comparing cloud providers on VM cost</title>
		<link>http://huanliu.wordpress.com/2011/02/02/comparing-cloud-providers-on-vm-cost/</link>
		<comments>http://huanliu.wordpress.com/2011/02/02/comparing-cloud-providers-on-vm-cost/#comments</comments>
		<pubDate>Wed, 02 Feb 2011 22:04:38 +0000</pubDate>
		<dc:creator>huanliu</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[cost comparison]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[ECU]]></category>
		<category><![CDATA[Rackspace]]></category>
		<category><![CDATA[Terremark]]></category>

		<guid isPermaLink="false">http://huanliu.wordpress.com/?p=132</guid>
		<description><![CDATA[How do you compare two IaaS clouds? Is Amazon EC2&#8217;s small standard instance (1 ECU, 1.7GB RAM, 160GB storage) cheaper or is Rackspace cloud&#8217;s 256MB server (4 cores, 256MB RAM, 10GB storage) cheaper? It is obviously simpler to compare them if you focus only on one metric. For example, let us assume your application is [...]]]></description>
			<content:encoded><![CDATA[<p>How do you compare two IaaS clouds? Is Amazon EC2&#8217;s small standard instance (1 ECU, 1.7GB RAM, 160GB storage) cheaper, or is Rackspace cloud&#8217;s 256MB server (4 cores, 256MB RAM, 10GB storage)? The comparison is obviously simpler if you focus on only one metric. For example, suppose your application is CPU bound and needs very little memory. Then you should focus solely on the CPU power a cloud VM gives you. We have translated <a href="http://huanliu.wordpress.com/2011/01/25/gogrid-cost-comparison-with-amazon-ec2/">GoGrid</a>, <a href="http://huanliu.wordpress.com/2011/01/25/rackspace-cost-comparison-with-amazon-ec2/">Rackspace</a>, and <a href="http://huanliu.wordpress.com/2011/01/29/terremark-cost-comparison-with-amazon-ec2/">Terremark</a>&#8217;s VM configurations into their equivalent ECU, so you can simply take the ratio of cost to ECU rating and pick the lowest. Unfortunately, real-life applications are never that simple. They demand CPU cycles, memory, and hard disk storage. So how do you make an apples-to-apples comparison?</p>
<p><strong>The methodology</strong></p>
<p>No established methodology exists yet, so we propose one. Because the comparison results depend heavily on the methodology, we spell ours out first. That way, if you use a different methodology and reach a different result, you can trace the source of the difference. If you see ways to improve the methodology, please leave a comment. The methodology works as follows:</p>
<ol>
<li>We first break down the cost components in Amazon EC2. We assume Amazon prices its instances using a linear model, i.e., the cost equals <em>c </em>* <em>CPU </em>+ <em>m </em>* <em>Mem </em>+ <em>s </em>* <em>Storage</em>. Here <em>c</em> is the unit cost of CPU per ECU per hour, <em>m</em> is the unit cost of memory per GB per hour, and <em>s</em> is the unit cost of storage per GB per hour. Amazon offers several instance types, each with a different combination of CPU, memory, and storage. That gives us enough data points to estimate c, m, and s with regression analysis. The details are in our <a href="http://huanliu.wordpress.com/2011/01/24/the-true-cost-of-an-ecu">ECU cost breakdown analysis</a>.</li>
<li>Once we have the unit costs in EC2, we can compare EC2 with another cloud provider. We take one VM configuration from a cloud provider at a time. For each one, we compute what Amazon EC2 would charge for an instance with exactly the same specification, if EC2 offered one. To do this, we multiply the EC2 unit costs (<em>c</em>, <em>m</em>, and <em>s</em>) by the amount of CPU, RAM, and storage in the VM and add up the products. Of course, this is hypothetical, because EC2 does not offer an instance with exactly the same spec. So even if the EC2 price is lower, you cannot simply buy a substitute from Amazon. However, the result gives us a good sense of the relative cost.</li>
</ol>
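<p>The second step reduces to a few lines of arithmetic. Below is a minimal sketch, where c, m, and s are whatever unit costs the regression produced, per hour: per ECU, per GB of RAM, and per GB of storage. The function names are our own and are not part of any tool.</p>

```python
def ec2_equivalent_cost(cpu_ecu, ram_gb, storage_gb, c, m, s):
    """Hourly price EC2 would charge for this exact spec if it offered one,
    under the linear model cost = c*CPU + m*Mem + s*Storage."""
    return c * cpu_ecu + m * ram_gb + s * storage_gb


def cost_ratio(provider_price, cpu_ecu, ram_gb, storage_gb, c, m, s):
    """Provider price divided by the hypothetical EC2 price for the same spec.
    A ratio below 1.0 means the provider is cheaper than EC2 would be."""
    return provider_price / ec2_equivalent_cost(cpu_ecu, ram_gb, storage_gb, c, m, s)
```

<p>For example, take placeholder unit costs of c = 1.0, m = 2.0, and s = 0.01 cents/hour. A VM with 4 ECU, 0.5GB of RAM, and 10GB of storage then has a hypothetical EC2 price of 5.1 cents/hour. A provider charging 2.55 cents/hour for it therefore gets a ratio of 0.5.</p>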
<p>We have done the analysis with <a href="http://huanliu.wordpress.com/2011/01/25/gogrid-cost-comparison-with-amazon-ec2/">GoGrid</a>, <a href="http://huanliu.wordpress.com/2011/01/25/rackspace-cost-comparison-with-amazon-ec2/">Rackspace</a>, and <a href="http://huanliu.wordpress.com/2011/01/29/terremark-cost-comparison-with-amazon-ec2/">Terremark</a>.</p>
<p>For each cloud VM, we compute the ratio of its cost to the cost of its hypothetical equivalent in EC2. The following table lists the VMs with the lowest ratios. If you are curious about the ratios for other VM configurations, see the individual post on each provider. The ratios assume that you get the maximum CPU allowed under bursting, which is frequently the case with those cloud providers. They are also computed against EC2&#8217;s N. Virginia data center. Other EC2 data centers cost more, which would make the ratios lower.</p>
<table border="1">
<tbody>
<tr>
<td>Provider</td>
<td>RAM (GB)</td>
<td>CPU (cores)</td>
<td>storage (GB)</td>
<td>cost ratio vs. EC2 equivalent</td>
</tr>
<tr>
<td>Rackspace</td>
<td>0.25</td>
<td>4</td>
<td>10</td>
<td>0.168</td>
</tr>
<tr>
<td>Terremark</td>
<td>0.5</td>
<td>8</td>
<td>charged separately at $0.25/month/GB</td>
<td>0.19</td>
</tr>
<tr>
<td>Rackspace</td>
<td>0.5</td>
<td>4</td>
<td>20</td>
<td>0.314</td>
</tr>
<tr>
<td>Terremark</td>
<td>0.5</td>
<td>4</td>
<td>charged separately at $0.25/month/GB</td>
<td>0.338</td>
</tr>
<tr>
<td>Terremark</td>
<td>1</td>
<td>8</td>
<td>charged separately at $0.25/month/GB</td>
<td>0.375</td>
</tr>
<tr>
<td>Terremark</td>
<td>1.5</td>
<td>8</td>
<td>charged separately at $0.25/month/GB</td>
<td>0.491</td>
</tr>
</tbody>
</table>
<p><strong>How to use this data?</strong></p>
<p>Because of a limitation of this methodology (each VM is compared with a hypothetical equivalent in EC2), the ratio only makes sense if one of the cloud providers you are comparing is Amazon EC2. In other words, do not compare Rackspace with Terremark based on their ratios.</p>
<p>Our results are also of little use if you know the exact specification your server needs. In that case, find the smallest VM configuration at each provider that meets your requirements, and compare prices.</p>
<p>Our results are useful if your application is flexible. For example, instead of using one m1.small instance in EC2, you could use several Rackspace 256MB VMs and achieve dramatic cost savings. One example of a flexible application is a batch application, such as a MapReduce job, which can be split into finer-grained tasks. Another example is a web server farm, where the load balancer can divide up the work to use whatever computing capacity is provisioned on each web server.</p>
<p>Our results are also useful if you want a high-level overview. Consider an enterprise purchaser choosing a cloud platform. There are many dimensions to consider, such as features, cost, SLAs, and contract terms, and doing a deep analysis at the beginning would be overwhelming. Since Amazon is a major cloud player, it will most likely be part of the evaluation. A ratio gives a ten-thousand-foot view, so the decision maker can tell whether an alternative cloud would save money. Then, as the evaluation progresses, the purchaser can dig into a finer-grained comparison.</p>
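<p>A minimal sketch of that procedure follows. It assumes you have each provider's catalog as a list of offers; <code>VMOffer</code>, <code>cheapest_fit_per_provider</code>, and any catalog you feed it are hypothetical. For each provider, it keeps the cheapest configuration that meets every requirement, which is normally the smallest one that fits. It then ranks the providers by that price.</p>

```python
from dataclasses import dataclass


@dataclass
class VMOffer:
    provider: str
    name: str
    cores: int
    ram_gb: float
    storage_gb: float
    cents_per_hour: float


def cheapest_fit_per_provider(offers, cores, ram_gb, storage_gb):
    """Return, cheapest first, each provider's lowest-priced configuration
    that satisfies all three requirements."""
    best = {}
    for o in offers:
        if o.cores >= cores and o.ram_gb >= ram_gb and o.storage_gb >= storage_gb:
            current = best.get(o.provider, o)
            best[o.provider] = min(current, o, key=lambda x: x.cents_per_hour)
    return sorted(best.values(), key=lambda x: x.cents_per_hour)
```

<p>Providers with no configuration large enough simply drop out of the result, so an empty list means nobody in the catalog can host the server as specified.</p>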
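<p>One simple way to use that flexibility is to split work in proportion to each VM's capacity, as a weighted load balancer or batch scheduler might. The sketch below is illustrative only. <code>split_work</code> is a hypothetical helper, and capacities can be in any consistent unit, for example ECU. It uses largest-remainder rounding so that the per-VM counts always add up to the total.</p>

```python
def split_work(n_tasks, capacities):
    """Divide n_tasks among workers in proportion to their capacities,
    using largest-remainder rounding so the counts sum to n_tasks."""
    total = sum(capacities)
    exact = [n_tasks * c / total for c in capacities]
    counts = [int(x) for x in exact]  # floor, since all shares are non-negative
    leftover = n_tasks - sum(counts)
    # Give the remaining tasks to the workers with the largest fractional parts.
    order = sorted(range(len(capacities)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

<p>For instance, 100 tasks over capacities [1, 1, 2] (say, two small VMs and one twice as powerful) split as [25, 25, 50].</p>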
<p><strong>Caveats</strong>:</p>
<p>There are several caveats to using our results that we should spell out.</p>
<ul>
<li>This compares only the VM cost, including CPU, memory, and storage. It does not include other costs, such as bandwidth. Bandwidth pricing varies widely; for example, GoGrid offers free inbound traffic, which can translate into significant cost savings.</li>
<li>When we compare CPUs, we compare only their processing power, not their I/O capabilities (both disk and network I/O). In Amazon, we sometimes observe degraded I/O performance, possibly due to competing VMs on the same host. This is an unfortunate side effect of using popular cloud offerings.</li>
<li>As we mentioned, the results apply only to flexible applications that can take full advantage of the provisioned CPU, memory, and storage. For example, if you cannot use the provisioned RAM, it does not matter whether it is a good deal. You are wasting the memory, and you may be better off with a VM configuration from a different cloud provider with less provisioned RAM.</li>
<li>This is not a substitute for a feature comparison. For example, GoGrid offers a free F5 hardware load balancer; if you need a hardware load balancer, you should factor that in separately.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://huanliu.wordpress.com/2011/02/02/comparing-cloud-providers-on-vm-cost/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aabcaed537a9f2df3a2010d2158d4546?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">huanliu</media:title>
		</media:content>
	</item>
		<item>
		<title>Terremark cost comparison with Amazon EC2</title>
		<link>http://huanliu.wordpress.com/2011/01/29/terremark-cost-comparison-with-amazon-ec2/</link>
		<comments>http://huanliu.wordpress.com/2011/01/29/terremark-cost-comparison-with-amazon-ec2/#comments</comments>
		<pubDate>Sat, 29 Jan 2011 01:21:16 +0000</pubDate>
		<dc:creator>huanliu</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[cost comparison]]></category>
		<category><![CDATA[Terremark]]></category>
		<category><![CDATA[vcloud]]></category>

		<guid isPermaLink="false">http://huanliu.wordpress.com/?p=117</guid>
		<description><![CDATA[(Earlier posts in this series are: EC2 cost break down, GoGrid &#38; EC2 cost comparison, Rackspace &#38; EC2 cost comparison) In this post, let us compare the VM cost between Terremark vCloud express and Amazon EC2. Terremark is one of the first cloud providers based on VMWare technology. Unlike EC2, Rackspace and GoGrid, which use [...]]]></description>
			<content:encoded><![CDATA[<p>(Earlier posts in this series are: <a href="http://huanliu.wordpress.com/2011/01/24/the-true-cost-of-an-ecu/">EC2 cost break down</a>, <a href="http://huanliu.wordpress.com/2011/01/25/gogrid-cost-comparison-with-amazon-ec2/">GoGrid &amp; EC2 cost comparison</a>, <a href="http://huanliu.wordpress.com/2011/01/25/rackspace-cost-comparison-with-amazon-ec2/">Rackspace &amp; EC2 cost comparison</a>)</p>
<p>In this post, let us compare the VM cost of Terremark vCloud Express and Amazon EC2. Terremark is one of the first cloud providers based on VMware technology. EC2, Rackspace, and GoGrid use Xen as the hypervisor. Terremark instead uses VMware&#8217;s ESX hypervisor, which is arguably richer in functionality.</p>
<p>Following the methodology we have used so far, we first need to understand Terremark&#8217;s hardware infrastructure and its resource allocation policy. We used the same technique as in our <a href="http://huanliu.wordpress.com/2010/06/14/amazons-physical-hardware-and-ec2-compute-unit/">EC2&#8217;s hardware analysis</a>. With it, we determined that Terremark runs on a platform with two sockets of quad-core AMD Opteron 8389 processors. PassMark does not have a benchmark result for this processor, so we ran the benchmark ourselves. We used the 16GB+8VPU configuration (Terremark&#8217;s largest) to minimize interference from other VMs. We ran it multiple times late at night to make sure we were measuring the capacity of the underlying hardware. On average, the PassMark CPU Mark result is 7100, which is roughly 18 ECU.</p>
<p>Terremark uses the ESX hypervisor&#8217;s default CPU scheduling policy, i.e., each virtual core gets an equal share of the CPU, regardless of how much memory the VM has. This differs from GoGrid and Rackspace, where the CPU is shared in proportion to the amount of RAM a VM has. The scheduling policy can be verified through the GuestSDK API exposed by VMware Tools. Through this API, we found that a VM has no minimum guaranteed CPU and also no maximum burst limit. Each virtual core of a VM is assigned a CPU share of 1000, regardless of how much memory the VM is allocated. Thus, the more cores a VM has, the larger its share of the CPU (e.g., a 1VPU VM has 1000 shares, and an 8VPU VM has 8000 shares).</p>
<p>It is difficult to determine how many VMs share a physical host, and that number determines the minimum guaranteed CPU. We were told on Terremark&#8217;s forum that each physical host has 128GB of memory. That is enough for at least 8 VMs, even if each uses the largest configuration (8VPU+16GB RAM). The VMware ESX hypervisor allows memory over-commitment, so in theory there could be many more VMs on a host. When we launched a vanilla 512MB VM, the Guest API reported that our VM occupied only 148MB of RAM. Clearly, there is plenty of room to over-commit, although we see no evidence that Terremark does so. Even without over-commitment, many VMs could still be competing for the CPU. In the worst case, every VM on the host has 512MB RAM and 8VPU, a configuration that consumes the least memory but carries the maximum CPU weight. A physical host can hold 256 such VMs (128GB / 0.5GB), leaving a negligible CPU share for each. A VM with only one core then owns only 1/(8*256) of the CPU, and an 8VPU (8 virtual cores) VM owns only 1/256.</p>
<p>Following what we did to get EC2&#8217;s unit cost, we can run a regression analysis to estimate Terremark&#8217;s unit cost. We assume Cost = c * CPU + m * RAM, because Terremark charges storage separately from the VM, at $0.25/GB/month. The regression determines the unit costs to be</p>
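<p>To make the share arithmetic concrete, here is a minimal proportional-share sketch. It is a simplified model of our own, not VMware's actual scheduler. It assumes every VM on the host is busy, that CPU is divided purely by shares with no reservation or limit, and that a VM can never use more physical cores than it has virtual cores. <code>SHARES_PER_VCPU</code> and <code>cpu_entitlement_cores</code> are illustrative names.</p>

```python
SHARES_PER_VCPU = 1000  # per-virtual-core CPU share observed through the guest API


def cpu_entitlement_cores(my_vcpus, other_vcpus, host_cores):
    """Physical cores a busy VM can expect when every VM on the host is busy.

    other_vcpus lists the vCPU count of each other busy VM on the host.
    """
    mine = my_vcpus * SHARES_PER_VCPU
    total = mine + sum(v * SHARES_PER_VCPU for v in other_vcpus)
    proportional = host_cores * mine / total
    # A VM cannot run on more physical cores than it has virtual cores.
    return min(proportional, my_vcpus)
```

<p>On an otherwise idle 8-core host, a 1VPU VM gets a full core. On a host packed with 256 busy 8VPU VMs, each VM gets 8/256 = 0.03125 cores, which is 1/256 of the host. The real ESX scheduler also redistributes capacity that capped VMs leave unused, which this sketch ignores.</p>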
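<p>The worst-case arithmetic above can be written out explicitly. This assumes, as the paragraph does, that all 256 VMs are busy and that the CPU is split purely by shares. The conversion to ECU is our own, based on the measured host capacity of about 18 ECU.</p>

```latex
N_{\text{VM}} = \frac{128\,\text{GB}}{0.5\,\text{GB}} = 256, \qquad
S_{\text{total}} = 256 \times 8 \times 1000 = 2{,}048{,}000 \text{ shares}

f_{1\,\text{VPU}} \approx \frac{1000}{S_{\text{total}}} = \frac{1}{8 \times 256} = \frac{1}{2048},
\qquad 18\,\text{ECU} \times \frac{1}{2048} \approx 0.0088\,\text{ECU}

f_{8\,\text{VPU}} = \frac{8000}{S_{\text{total}}} = \frac{1}{256},
\qquad 18\,\text{ECU} \times \frac{1}{256} \approx 0.070\,\text{ECU}
```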
<p>c = 2.06 cents/VPU/hour<br />
m = 6.46 cents/GB/hour</p>
<p>The regression result does not fit the real cost very well. The following table lists, for each VM configuration, the actual price (in green) and the price predicted by the estimated parameters (in red), both in cents/hour.</p>
<table border="1">
<tbody>
<tr>
<td>memory (GB)\CPU</td>
<td>1 VPU</td>
<td>2 VPU</td>
<td>4 VPU</td>
<td>8 VPU</td>
</tr>
<tr>
<td>0.5</td>
<td><span style="color:#339966;">3.5 </span>/ <span style="color:#ff0000;">5.29</span></td>
<td><span style="color:#339966;">4</span> / <span style="color:#ff0000;">7.36</span></td>
<td><span style="color:#339966;">4.5</span> / <span style="color:#ff0000;">11.48</span></td>
<td><span style="color:#339966;">5</span> / <span style="color:#ff0000;">19.72</span></td>
</tr>
<tr>
<td>1</td>
<td><span style="color:#339966;">6</span> / <span style="color:#ff0000;">8.53</span></td>
<td><span style="color:#339966;">7</span> / <span style="color:#ff0000;">10.6</span></td>
<td><span style="color:#339966;">8</span> / <span style="color:#ff0000;">14.7</span></td>
<td><span style="color:#339966;">10</span> / <span style="color:#ff0000;">23</span></td>
</tr>
<tr>
<td>1.5</td>
<td><span style="color:#339966;">9</span> / <span style="color:#ff0000;">11.8</span></td>
<td><span style="color:#339966;">10.5</span> / <span style="color:#ff0000;">13.8</span></td>
<td><span style="color:#339966;">12</span> / <span style="color:#ff0000;">17.9</span></td>
<td><span style="color:#339966;">13.6</span> / <span style="color:#ff0000;">26.2</span></td>
</tr>
<tr>
<td>2</td>
<td><span style="color:#339966;">12 </span>/ <span style="color:#ff0000;">15</span></td>
<td><span style="color:#339966;">14.1 </span>/ <span style="color:#ff0000;">17</span></td>
<td><span style="color:#339966;">16.1 </span>/ <span style="color:#ff0000;">21.2</span></td>
<td><span style="color:#339966;">20.1 </span>/ <span style="color:#ff0000;">29.4</span></td>
</tr>
<tr>
<td>4</td>
<td><span style="color:#339966;">24.1 </span>/ <span style="color:#ff0000;">27.9</span></td>
<td><span style="color:#339966;">28.1 </span>/ <span style="color:#ff0000;">30</span></td>
<td><span style="color:#339966;">30.1 </span>/ <span style="color:#ff0000;">34.1</span></td>
<td><span style="color:#339966;">40.2 </span>/ <span style="color:#ff0000;">42.4</span></td>
</tr>
<tr>
<td>8</td>
<td><span style="color:#339966;">40.2 </span>/ <span style="color:#ff0000;">53.8</span></td>
<td><span style="color:#339966;">48.2 </span>/ <span style="color:#ff0000;">55.8</span></td>
<td><span style="color:#339966;">60.2 </span>/ <span style="color:#ff0000;">60</span></td>
<td><span style="color:#339966;">80.3 </span>/ <span style="color:#ff0000;">68.2</span></td>
</tr>
<tr>
<td>12</td>
<td><span style="color:#339966;">60.2 </span>/ <span style="color:#ff0000;">79.6</span></td>
<td><span style="color:#339966;">72.3 </span>/ <span style="color:#ff0000;">81.7</span></td>
<td><span style="color:#339966;">90.3 </span>/ <span style="color:#ff0000;">85.8</span></td>
<td><span style="color:#339966;">120.5 </span>/ <span style="color:#ff0000;">94.1</span></td>
</tr>
<tr>
<td>16</td>
<td><span style="color:#339966;">80.3 </span>/ <span style="color:#ff0000;">105.5</span></td>
<td><span style="color:#339966;">96.4 </span>/ <span style="color:#ff0000;">107.5</span></td>
<td><span style="color:#339966;">120.5 </span>/ <span style="color:#ff0000;">111.7</span></td>
<td><span style="color:#339966;">160.6 </span>/ <span style="color:#ff0000;">119.9</span></td>
</tr>
</tbody>
</table>
<p>The regression analysis does not work well here because Terremark heavily discounts both CPU and RAM as you move up in configuration, and our linear model does not capture this economy of scale. However, we can treat the linear regression as a trend line, and the trend line indicates that Terremark is likely more expensive than EC2. For example, its RAM costs 6.46 cents/GB/hour, much higher than the 2.01 cents/GB/hour at which Amazon values its RAM.</p>
<p>Another way to compare cost is to use EC2&#8217;s unit costs to figure out what an equivalent configuration would cost in EC2. The following table assumes the worst case, where you get only the minimum CPU. In that case, all other VMs are busy and the physical host is fully loaded with 8VPU+0.5GB VMs (without over-commitment). Each row shows the RAM and CPU configuration, Terremark&#8217;s price, what it would cost in EC2, and the ratio between Terremark and EC2 cost.</p>
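<p>The fit can be reproduced with a short sketch. It transcribes the green (actual) prices from the table above and solves the least-squares normal equations for a model with no intercept. This is our reading of Cost = c * CPU + m * RAM; the post does not say which tool it used. The result is c &#8776; 2.06 and m &#8776; 6.46, and the fitted model predicts about 119.9 cents/hour for 16GB+8VPU.</p>

```python
# Terremark hourly prices in cents (the green values in the table):
# PRICES[ram_gb] lists the prices for 1, 2, 4 and 8 VPU.
VPUS = [1, 2, 4, 8]
PRICES = {
    0.5: [3.5, 4, 4.5, 5],
    1: [6, 7, 8, 10],
    1.5: [9, 10.5, 12, 13.6],
    2: [12, 14.1, 16.1, 20.1],
    4: [24.1, 28.1, 30.1, 40.2],
    8: [40.2, 48.2, 60.2, 80.3],
    12: [60.2, 72.3, 90.3, 120.5],
    16: [80.3, 96.4, 120.5, 160.6],
}


def fit_no_intercept(rows):
    """Least-squares fit of cost = c*vpu + m*ram (no intercept), solving the
    2x2 normal equations in closed form. rows holds (vpu, ram, cost) tuples."""
    svv = sum(v * v for v, r, y in rows)
    srr = sum(r * r for v, r, y in rows)
    svr = sum(v * r for v, r, y in rows)
    svy = sum(v * y for v, r, y in rows)
    sry = sum(r * y for v, r, y in rows)
    det = svv * srr - svr * svr
    c = (svy * srr - svr * sry) / det
    m = (svv * sry - svr * svy) / det
    return c, m


rows = [(v, ram, y) for ram, ys in PRICES.items() for v, y in zip(VPUS, ys)]
c, m = fit_no_intercept(rows)
print(f"c = {c:.2f} cents/VPU/hour, m = {m:.2f} cents/GB/hour")
# prints: c = 2.06 cents/VPU/hour, m = 6.46 cents/GB/hour
```

<p>The residuals (actual minus predicted) show the pattern: small configurations are cheaper than the trend line, while the largest multi-VPU configurations are more expensive.</p>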
<table border="1">
<tbody>
<tr>
<td>memory (GB)</td>
<td>VPU</td>
<td>Terremark price (cents/hour)</td>
<td>Equivalent EC2 cost (cents/hour)</td>
<td>Terremark cost/EC2 cost</td>
</tr>
<tr>
<td>0.5</td>
<td>1</td>
<td>3.5</td>
<td>1.02</td>
<td>3.44</td>
</tr>
<tr>
<td>0.5</td>
<td>2</td>
<td>4</td>
<td>1.03</td>
<td>3.89</td>
</tr>
<tr>
<td>0.5</td>
<td>4</td>
<td>4.5</td>
<td>1.05</td>
<td>4.27</td>
</tr>
<tr>
<td>0.5</td>
<td>8</td>
<td>5</td>
<td>1.10</td>
<td>4.54</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>6</td>
<td>2.02</td>
<td>2.97</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>7</td>
<td>2.03</td>
<td>3.44</td>
</tr>
<tr>
<td>1</td>
<td>4</td>
<td>8</td>
<td>2.06</td>
<td>3.89</td>
</tr>
<tr>
<td>1</td>
<td>8</td>
<td>10</td>
<td>2.11</td>
<td>4.75</td>
</tr>
<tr>
<td>1.5</td>
<td>1</td>
<td>9</td>
<td>3.03</td>
<td>2.97</td>
</tr>
<tr>
<td>1.5</td>
<td>2</td>
<td>10.5</td>
<td>3.04</td>
<td>3.45</td>
</tr>
<tr>
<td>1.5</td>
<td>4</td>
<td>12</td>
<td>3.06</td>
<td>3.92</td>
</tr>
<tr>
<td>1.5</td>
<td>8</td>
<td>13.6</td>
<td>3.11</td>
<td>4.37</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>12</td>
<td>4.03</td>
<td>2.98</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>14.1</td>
<td>4.05</td>
<td>3.49</td>
</tr>
<tr>
<td>2</td>
<td>4</td>
<td>16.1</td>
<td>4.07</td>
<td>3.96</td>
</tr>
<tr>
<td>2</td>
<td>8</td>
<td>20.1</td>
<td>4.12</td>
<td>4.88</td>
</tr>
<tr>
<td>4</td>
<td>1</td>
<td>24.1</td>
<td>8.06</td>
<td>2.99</td>
</tr>
<tr>
<td>4</td>
<td>2</td>
<td>28.1</td>
<td>8.07</td>
<td>3.48</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>30.1</td>
<td>8.09</td>
<td>3.72</td>
</tr>
<tr>
<td>4</td>
<td>8</td>
<td>40.2</td>
<td>8.14</td>
<td>4.94</td>
</tr>
<tr>
<td>8</td>
<td>1</td>
<td>40.2</td>
<td>16.1</td>
<td>2.5</td>
</tr>
<tr>
<td>8</td>
<td>2</td>
<td>48.2</td>
<td>16.11</td>
<td>2.99</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>60.2</td>
<td>16.13</td>
<td>3.73</td>
</tr>
<tr>
<td>8</td>
<td>8</td>
<td>80.3</td>
<td>16.18</td>
<td>4.96</td>
</tr>
<tr>
<td>12</td>
<td>1</td>
<td>60.2</td>
<td>24.14</td>
<td>2.49</td>
</tr>
<tr>
<td>12</td>
<td>2</td>
<td>72.3</td>
<td>24.15</td>
<td>2.99</td>
</tr>
<tr>
<td>12</td>
<td>4</td>
<td>90.3</td>
<td>24.18</td>
<td>3.73</td>
</tr>
<tr>
<td>12</td>
<td>8</td>
<td>120.5</td>
<td>24.23</td>
<td>4.97</td>
</tr>
<tr>
<td>16</td>
<td>1</td>
<td>80.3</td>
<td>32.18</td>
<td>2.49</td>
</tr>
<tr>
<td>16</td>
<td>2</td>
<td>96.4</td>
<td>32.2</td>
<td>2.99</td>
</tr>
<tr>
<td>16</td>
<td>4</td>
<td>120.5</td>
<td>32.22</td>
<td>3.74</td>
</tr>
<tr>
<td>16</td>
<td>8</td>
<td>160.6</td>
<td>32.27</td>
<td>4.98</td>
</tr>
</tbody>
</table>
<p>The table shows that Terremark is 2.49 to 4.98 times as expensive as an equivalent in EC2. This is mainly due to the way Terremark shares CPUs. A 0.5GB VM in Terremark shares the CPU equally with a 16GB VM that has the same number of VPUs. Thus, in the worst case, a VM may get very little CPU. Since Terremark does not set a minimum CPU guarantee in the hypervisor, we have to assume the worst case.</p>
<p>In reality, you are unlikely to encounter the worst case, and you are very likely to get the full attention of a physical core. One reason is that most VMs have more than 0.5GB of RAM, so fewer of them fit on a host. Another is that Terremark uses VMware&#8217;s DRS (Distributed Resource Scheduler). When we drove up the load on our VMs, they were often moved (through VMotion) to a different host, presumably to avoid contention. Thus, unless the whole cluster gets really busy, your VM is unlikely to have many other busy VMs to contend with on the same host. The following table shows the EC2 equivalent cost, assuming each virtual core gets the full power of a physical core.</p>
<table border="1">
<tbody>
<tr>
<td>memory (GB)</td>
<td>VPU</td>
<td>Terremark price (cents/hour)</td>
<td>Equivalent EC2 cost (cents/hour)</td>
<td>Terremark cost/EC2 cost</td>
</tr>
<tr>
<td>0.5</td>
<td>1</td>
<td>3.5</td>
<td>4.09</td>
<td>0.86</td>
</tr>
<tr>
<td>0.5</td>
<td>2</td>
<td>4</td>
<td>7.17</td>
<td>0.56</td>
</tr>
<tr>
<td>0.5</td>
<td>4</td>
<td>4.5</td>
<td>13.33</td>
<td>0.34</td>
</tr>
<tr>
<td>0.5</td>
<td>8</td>
<td>5</td>
<td>25.65</td>
<td>0.19</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>6</td>
<td>5.09</td>
<td>1.18</td>
</tr>
<tr>
<td>1</td>
<td>2</td>
<td>7</td>
<td>8.17</td>
<td>0.86</td>
</tr>
<tr>
<td>1</td>
<td>4</td>
<td>8</td>
<td>14.33</td>
<td>0.56</td>
</tr>
<tr>
<td>1</td>
<td>8</td>
<td>10</td>
<td>26.66</td>
<td>0.38</td>
</tr>
<tr>
<td>1.5</td>
<td>1</td>
<td>9</td>
<td>6.1</td>
<td>1.48</td>
</tr>
<tr>
<td>1.5</td>
<td>2</td>
<td>10.5</td>
<td>9.18</td>
<td>1.14</td>
</tr>
<tr>
<td>1.5</td>
<td>4</td>
<td>12</td>
<td>15.34</td>
<td>0.78</td>
</tr>
<tr>
<td>1.5</td>
<td>8</td>
<td>13.6</td>
<td>27.66</td>
<td>0.49</td>
</tr>
<tr>
<td>2</td>
<td>1</td>
<td>12</td>
<td>7.1</td>
<td>1.69</td>
</tr>
<tr>
<td>2</td>
<td>2</td>
<td>14.1</td>
<td>10.18</td>
<td>1.38</td>
</tr>
<tr>
<td>2</td>
<td>4</td>
<td>16.1</td>
<td>16.35</td>
<td>0.98</td>
</tr>
<tr>
<td>2</td>
<td>8</td>
<td>20.1</td>
<td>28.67</td>
<td>0.7</td>
</tr>
<tr>
<td>4</td>
<td>1</td>
<td>24.1</td>
<td>11.12</td>
<td>2.17</td>
</tr>
<tr>
<td>4</td>
<td>2</td>
<td>28.1</td>
<td>14.21</td>
<td>1.98</td>
</tr>
<tr>
<td>4</td>
<td>4</td>
<td>30.1</td>
<td>20.37</td>
<td>1.48</td>
</tr>
<tr>
<td>4</td>
<td>8</td>
<td>40.2</td>
<td>32.69</td>
<td>1.23</td>
</tr>
<tr>
<td>8</td>
<td>1</td>
<td>40.2</td>
<td>19.17</td>
<td>2.1</td>
</tr>
<tr>
<td>8</td>
<td>2</td>
<td>48.2</td>
<td>22.25</td>
<td>2.17</td>
</tr>
<tr>
<td>8</td>
<td>4</td>
<td>60.2</td>
<td>28.41</td>
<td>2.12</td>
</tr>
<tr>
<td>8</td>
<td>8</td>
<td>80.3</td>
<td>40.73</td>
<td>1.97</td>
</tr>
<tr>
<td>12</td>
<td>1</td>
<td>60.2</td>
<td>27.21</td>
<td>2.21</td>
</tr>
<tr>
<td>12</td>
<td>2</td>
<td>72.3</td>
<td>30.29</td>
<td>2.39</td>
</tr>
<tr>
<td>12</td>
<td>4</td>
<td>90.3</td>
<td>36.45</td>
<td>2.48</td>
</tr>
<tr>
<td>12</td>
<td>8</td>
<td>120.5</td>
<td>48.78</td>
<td>2.47</td>
</tr>
<tr>
<td>16</td>
<td>1</td>
<td>80.3</td>
<td>35.25</td>
<td>2.28</td>
</tr>
<tr>
<td>16</td>
<td>2</td>
<td>96.4</td>
<td>38.33</td>
<td>2.51</td>
</tr>
<tr>
<td>16</td>
<td>4</td>
<td>120.5</td>
<td>44.5</td>
<td>2.71</td>
</tr>
<tr>
<td>16</td>
<td>8</td>
<td>160.6</td>
<td>56.82</td>
<td>2.83</td>
</tr>
</tbody>
</table>
<p>In several configurations, Terremark is much cheaper than EC2. The 8 VPU + 0.5GB configuration is the cheapest, at 19% of the equivalent EC2 cost, for two reasons. First, a VM with 8 VPUs has more scheduling weight, so it can compete for the full power of the physical host. Second, it has the smallest amount of RAM. As we have seen, Terremark values RAM more than EC2 does (m = 6.46 cents/GB/hour vs. EC2&#8217;s m = 2.01 cents/GB/hour), so the less RAM a configuration has, the lower its cost relative to EC2. The cost savings go away as you add more RAM: every configuration with 4GB of RAM or more costs more than its EC2 equivalent.</p>
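<p>The ratio column and the observations above can be checked directly from the table. The minimal Python sketch below copies the table&#8217;s two price columns (the equivalent EC2 costs are taken as given, not re-derived). It computes Terremark cost divided by EC2 cost for each configuration, then picks out the cheapest configuration and every configuration that undercuts EC2. The helper name <code>ratio</code> is illustrative.</p>

```python
# Sketch: recompute the Terremark/EC2 ratio column from the table above.
# Each row: (memory in GB, VPUs, Terremark price, equivalent EC2 cost),
# prices in cents/hour, copied verbatim from the table.
ROWS = [
    (0.5, 1, 3.5, 4.09), (0.5, 2, 4, 7.17), (0.5, 4, 4.5, 13.33), (0.5, 8, 5, 25.65),
    (1, 1, 6, 5.09), (1, 2, 7, 8.17), (1, 4, 8, 14.33), (1, 8, 10, 26.66),
    (1.5, 1, 9, 6.1), (1.5, 2, 10.5, 9.18), (1.5, 4, 12, 15.34), (1.5, 8, 13.6, 27.66),
    (2, 1, 12, 7.1), (2, 2, 14.1, 10.18), (2, 4, 16.1, 16.35), (2, 8, 20.1, 28.67),
    (4, 1, 24.1, 11.12), (4, 2, 28.1, 14.21), (4, 4, 30.1, 20.37), (4, 8, 40.2, 32.69),
    (8, 1, 40.2, 19.17), (8, 2, 48.2, 22.25), (8, 4, 60.2, 28.41), (8, 8, 80.3, 40.73),
    (12, 1, 60.2, 27.21), (12, 2, 72.3, 30.29), (12, 4, 90.3, 36.45), (12, 8, 120.5, 48.78),
    (16, 1, 80.3, 35.25), (16, 2, 96.4, 38.33), (16, 4, 120.5, 44.5), (16, 8, 160.6, 56.82),
]


def ratio(row):
    """Terremark price divided by the equivalent EC2 cost."""
    _, _, terremark, ec2 = row
    return terremark / ec2


cheapest = min(ROWS, key=ratio)
# Configurations where Terremark undercuts the equivalent EC2 cost.
cheaper = [(mem, vpu) for mem, vpu, terremark, ec2 in ROWS if terremark < ec2]

print(f"cheapest: {cheapest[0]}GB + {cheapest[1]} VPU at {ratio(cheapest):.2f} of EC2 cost")
print("cheaper than EC2:", cheaper)
```

Run as-is, it reports the 0.5GB + 8 VPU configuration at 0.19 of the EC2 cost. It also finds 11 configurations that undercut EC2, none with more than 2GB of RAM.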
]]></content:encoded>
			<wfw:commentRss>http://huanliu.wordpress.com/2011/01/29/terremark-cost-comparison-with-amazon-ec2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aabcaed537a9f2df3a2010d2158d4546?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">huanliu</media:title>
		</media:content>
	</item>
		<item>
		<title>Rackspace cost comparison with Amazon EC2</title>
		<link>http://huanliu.wordpress.com/2011/01/25/rackspace-cost-comparison-with-amazon-ec2/</link>
		<comments>http://huanliu.wordpress.com/2011/01/25/rackspace-cost-comparison-with-amazon-ec2/#comments</comments>
		<pubDate>Tue, 25 Jan 2011 22:58:40 +0000</pubDate>
		<dc:creator>huanliu</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[cost comparison]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[Rackspace]]></category>

		<guid isPermaLink="false">http://huanliu.wordpress.com/?p=113</guid>
		<description><![CDATA[(Earlier posts in this series are: EC2 cost break down, GoGrid &#38; EC2 cost comparison) We looked at Amazon EC2 and GoGrid cost earlier. Let us examine another IaaS provider &#8212; Rackspace cloud. The first step again is to unify on the same unit of measurement on the CPU power. Using the same methodology as [...]]]></description>
			<content:encoded><![CDATA[<p>(Earlier posts in this series are: <a href="http://huanliu.wordpress.com/2011/01/24/the-true-cost-of-an-ecu/">EC2 cost break down</a>, <a href="http://huanliu.wordpress.com/2011/01/25/gogrid-cost-comparison-with-amazon-ec2/">GoGrid &amp; EC2 cost comparison</a>)</p>
<p>We looked at Amazon EC2 and GoGrid costs earlier. Let us now examine another IaaS provider: Rackspace Cloud. The first step, again, is to standardize on a common unit of measurement for CPU power. Using the same methodology we used for <a href="http://huanliu.wordpress.com/2010/06/14/amazons-physical-hardware-and-ec2-compute-unit/">EC2&#8217;s hardware analysis</a>, we determine that Rackspace runs on a platform with two sockets of Quad-Core AMD Opteron 2374 HE processors. According to <a href="http://www.cpubenchmark.net/cpu_list.php">PassMark-CPU Mark results</a>, this platform has a CPU Mark score of 4642, which is roughly 12 ECU (at roughly 400 CPU Mark points per ECU). <a href="http://www.rackspacecloud.com/cloud_hosting_products/servers/faq/">Rackspace cloud&#8217;s FAQ</a> states that &#8220;<em>For Linux distributions, each Cloud Server is assigned four virtual cores and the amount of CPU cycles allocated to these cores is weighted based on the size of the Cloud Server</em>.&#8221; From talking to Rackspace support, we know that each physical host has 32GB of RAM and can host at most two 16GB (15.5GB to be precise) VMs. Therefore, a 16GB VM owns all 4 cores it is allocated, i.e., the 16GB VM has a guaranteed capacity of half the platform, which is 6 ECU. Since Rackspace states that the CPU is shared in proportion to RAM, we can derive the minimum guaranteed CPU from how many other VMs could fit on the same physical host. The following table lists the minimum CPU and the maximum CPU (assuming full bursting when all other VMs are idle). Again, we only consider Linux VMs, because they do not include license costs and therefore more accurately represent the true hardware cost.</p>
<table border="1">
<tbody>
<tr>
<td>RAM (GB)</td>
<td>Storage (GB)</td>
<td>Min CPU (ECU)</td>
<td>Max CPU (ECU)</td>
<td>Cost (cents/hour)</td>
</tr>
<tr>
<td>0.256</td>
<td>10</td>
<td>0.09375</td>
<td>6</td>
<td>1.5</td>
</tr>
<tr>
<td>0.512</td>
<td>20</td>
<td>0.1875</td>
<td>6</td>
<td>3</td>
</tr>
<tr>
<td>1</td>
<td>40</td>
<td>0.375</td>
<td>6</td>
<td>6</td>
</tr>
<tr>
<td>2</td>
<td>80</td>
<td>0.75</td>
<td>6</td>
<td>12</td>
</tr>
<tr>
<td>4</td>
<td>160</td>
<td>1.5</td>
<td>6</td>
<td>24</td>
</tr>
<tr>
<td>8</td>
<td>320</td>
<td>3</td>
<td>6</td>
<td>48</td>
</tr>
<tr>
<td>16</td>
<td>620</td>
<td>6</td>
<td>6</td>
<td>96</td>
</tr>
</tbody>
</table>
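<p>The minimum and maximum columns follow from proportional sharing. Below is a minimal Python sketch, assuming (as stated above) that a full 16GB VM is guaranteed 6 ECU and that CPU weight scales linearly with RAM. The table&#8217;s values appear to measure 256MB and 512MB against 16,384MB, so the sketch works in MB. The function name is ours, not a Rackspace API.</p>

```python
# Sketch: minimum and maximum CPU (in ECU) for a Rackspace Linux VM.
# Assumptions from the text: a 16GB VM owns 4 cores = half of a ~12 ECU host,
# i.e. 6 ECU, and each VM's CPU weight is proportional to its RAM.
FULL_VM_MB = 16 * 1024  # largest VM size, in MB
FULL_VM_ECU = 6.0       # ECU guaranteed to that largest VM


def rackspace_cpu_range(ram_mb):
    """Return (min_ecu, max_ecu) for a VM with ram_mb megabytes of RAM."""
    # Worst case: the host is fully packed and every VM is busy, so this VM
    # gets only its RAM-proportional share of the 6 ECU.
    min_ecu = FULL_VM_ECU * ram_mb / FULL_VM_MB
    # Best case: all other VMs are idle and this VM bursts to its 4 cores.
    max_ecu = FULL_VM_ECU
    return min_ecu, max_ecu


for ram_mb in (256, 512, 1024, 2048, 4096, 8192, 16384):
    lo, hi = rackspace_cpu_range(ram_mb)
    print(f"{ram_mb:>6} MB: min {lo:g} ECU, max {hi:g} ECU")
```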
<p>Like GoGrid, Rackspace charges based only on RAM, so it is not possible to determine how it values each component (i.e., CPU, RAM and storage) separately, as we have done for EC2. However, we can project what a similar configuration would cost in EC2 using the unit costs derived in the <a href="http://huanliu.wordpress.com/2011/01/24/the-true-cost-of-an-ecu/">EC2 cost breakdown</a>. The results are shown in the following table, where we assume a VM only gets its minimum guaranteed CPU. Each row corresponds to one VM configuration, denoted by its RAM size in the first column. We also show the ratio between the Rackspace cost and the projected equivalent EC2 cost.</p>
<table border="1">
<tbody>
<tr>
<td>RAM (GB)</td>
<td>Rackspace cost (cents/hour)</td>
<td>Equivalent EC2 cost (cents/hour)</td>
<td>Rackspace cost/EC2 cost</td>
</tr>
<tr>
<td>0.256</td>
<td>1.5</td>
<td>0.8</td>
<td>1.87</td>
</tr>
<tr>
<td>0.512</td>
<td>3</td>
<td>1.6</td>
<td>1.87</td>
</tr>
<tr>
<td>1</td>
<td>6</td>
<td>3.16</td>
<td>1.9</td>
</tr>
<tr>
<td>2</td>
<td>12</td>
<td>6.32</td>
<td>1.9</td>
</tr>
<tr>
<td>4</td>
<td>24</td>
<td>12.6</td>
<td>1.9</td>
</tr>
<tr>
<td>8</td>
<td>48</td>
<td>25.3</td>
<td>1.9</td>
</tr>
<tr>
<td>16</td>
<td>96</td>
<td>50.2</td>
<td>1.91</td>
</tr>
</tbody>
</table>
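<p>Both comparisons reduce to plugging a Rackspace configuration into EC2&#8217;s linear cost model, Cost = c * CPU + m * Mem + s * Storage. The unit costs come from the EC2 cost breakdown: c = 1.369 cents/ECU/hour, m = 2.01 cents/GB/hour, s = 0.0159 cents/GB/hour. The Python sketch below applies that model to both the minimum and maximum CPU cases. The configuration data is copied from the first Rackspace table, and the helper name is illustrative.</p>

```python
# Sketch: project the EC2 cost of each Rackspace configuration.
# EC2 unit costs (N. Virginia) from the EC2 cost breakdown, in cents/hour.
C_PER_ECU = 1.369   # per ECU
M_PER_GB = 2.01     # per GB of RAM
S_PER_GB = 0.0159   # per GB of local storage

# (RAM GB, storage GB, min ECU, max ECU, Rackspace price in cents/hour)
RACKSPACE = [
    (0.256, 10, 0.09375, 6, 1.5),
    (0.512, 20, 0.1875, 6, 3),
    (1, 40, 0.375, 6, 6),
    (2, 80, 0.75, 6, 12),
    (4, 160, 1.5, 6, 24),
    (8, 320, 3, 6, 48),
    (16, 620, 6, 6, 96),
]


def ec2_equivalent(ecu, ram_gb, storage_gb):
    """Linear model: cost = c * CPU + m * RAM + s * storage (cents/hour)."""
    return C_PER_ECU * ecu + M_PER_GB * ram_gb + S_PER_GB * storage_gb


for ram, disk, min_ecu, max_ecu, price in RACKSPACE:
    worst = ec2_equivalent(min_ecu, ram, disk)  # VM gets only its guaranteed share
    best = ec2_equivalent(max_ecu, ram, disk)   # VM bursts to all 4 cores
    print(f"{ram:>5} GB: EC2 {worst:6.2f} / {best:6.2f} cents/hour, "
          f"Rackspace/EC2 {price / worst:.2f} / {price / best:.2f}")
```

The two ratios printed per row should match the minimum-CPU and full-burst tables to within rounding.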
<p>Since a Rackspace VM can burst if other VMs on the same host are idle, it could potentially grab a much larger share of the CPU. The following table shows the cost comparison assuming that the VM bursts to its fullest extent.</p>
<table border="1">
<tbody>
<tr>
<td>RAM (GB)</td>
<td>Rackspace cost (cents/hour)</td>
<td>Equivalent EC2 cost (cents/hour)</td>
<td>Rackspace cost/EC2 cost</td>
</tr>
<tr>
<td>0.256</td>
<td>1.5</td>
<td>8.89</td>
<td>0.17</td>
</tr>
<tr>
<td>0.512</td>
<td>3</td>
<td>9.56</td>
<td>0.31</td>
</tr>
<tr>
<td>1</td>
<td>6</td>
<td>10.86</td>
<td>0.55</td>
</tr>
<tr>
<td>2</td>
<td>12</td>
<td>13.5</td>
<td>0.89</td>
</tr>
<tr>
<td>4</td>
<td>24</td>
<td>18.8</td>
<td>1.28</td>
</tr>
<tr>
<td>8</td>
<td>48</td>
<td>29.4</td>
<td>1.63</td>
</tr>
<tr>
<td>16</td>
<td>96</td>
<td>50.2</td>
<td>1.91</td>
</tr>
</tbody>
</table>
<p>If your VM only gets the minimum guaranteed CPU, Rackspace is about 1.9 times more expensive than an equivalent in EC2. However, in our experience, we can frequently grab a much larger share of the CPU. Assuming you can grab the full 4 cores, the 256MB, 512MB, 1GB, and 2GB VMs are a great bargain, at 17%, 31%, 55%, and 89% of the equivalent EC2 cost, respectively.</p>
]]></content:encoded>
			<wfw:commentRss>http://huanliu.wordpress.com/2011/01/25/rackspace-cost-comparison-with-amazon-ec2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aabcaed537a9f2df3a2010d2158d4546?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">huanliu</media:title>
		</media:content>
	</item>
		<item>
		<title>GoGrid cost comparison with Amazon EC2</title>
		<link>http://huanliu.wordpress.com/2011/01/25/gogrid-cost-comparison-with-amazon-ec2/</link>
		<comments>http://huanliu.wordpress.com/2011/01/25/gogrid-cost-comparison-with-amazon-ec2/#comments</comments>
		<pubDate>Tue, 25 Jan 2011 09:11:07 +0000</pubDate>
		<dc:creator>huanliu</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[cost comparison]]></category>
		<category><![CDATA[iaas]]></category>

		<guid isPermaLink="false">http://huanliu.wordpress.com/?p=109</guid>
		<description><![CDATA[updated 1/30/2011 to include our own PassMark benchmark result and include GoGrid&#8217;s prepaid plan. Then updated 2/1/2011 to include cost/ECU comparison and clarifications. (Other posts in the series are: EC2 cost break down, Rackspace &#38; EC2 cost comparison, Terremark and EC2 cost comparison). Continue on our series on cost comparison between IaaS cloud providers, we [...]]]></description>
			<content:encoded><![CDATA[<p><em>updated 1/30/2011 to include our own PassMark benchmark result and include GoGrid&#8217;s prepaid plan. Then updated 2/1/2011 to include cost/ECU comparison and clarifications.<br />
</em></p>
<p>(Other posts in the series are: <a href="http://huanliu.wordpress.com/2011/01/24/the-true-cost-of-an-ecu/">EC2 cost break down</a>, <a href="http://huanliu.wordpress.com/2011/01/25/rackspace-cost-comparison-with-amazon-ec2/">Rackspace &amp; EC2 cost comparison</a>, <a href="http://huanliu.wordpress.com/2011/01/29/terremark-cost-comparison-with-amazon-ec2/">Terremark and EC2 cost comparison</a>).</p>
<p>Continuing our series on cost comparisons between IaaS cloud providers, we look at GoGrid&#8217;s cost structure in this post. It is easy to compare RAM and storage apples-to-apples because all cloud providers standardize on the same unit, e.g., GB. To have a meaningful comparison of CPU, we must similarly standardize on a common unit of measurement. Unfortunately, the cloud providers do not make this easy, so we have to do the conversion ourselves.</p>
<p>Because Amazon is a popular cloud provider, we decided to standardize on its unit of measurement, the ECU (EC2 Compute Unit). In our <a href="http://huanliu.wordpress.com/2010/06/14/amazons-physical-hardware-and-ec2-compute-unit/">EC2 hardware analysis</a>, we concluded that an ECU is equivalent to a <a href="http://www.cpubenchmark.net/multi_cpu.html">PassMark-CPU Mark</a> score of roughly 400. We have run the benchmark on several instance types in Amazon&#8217;s N. Virginia data center to verify experimentally that the CPU Mark score does scale linearly with the instance&#8217;s advertised ECU rating.</p>
<p>All we need to do now is figure out GoGrid&#8217;s PassMark-CPU Mark number. This is easy if we know the underlying hardware. Following the same methodology we used for the EC2 hardware analysis, we find that the GoGrid infrastructure consists of two types of hardware platform: one with dual-socket Intel E5520 processors, and another with dual-socket Intel X5650 processors. According to <a href="http://www.cpubenchmark.net/cpu_list.php">PassMark-CPU Mark results</a>, the dual-socket E5520 has a score of 9,174 and the dual-socket X5650 has a score of 15,071. GoGrid enables hyperthreading, so the dual-socket E5520 platform has 16 (virtual) cores, and the dual-socket X5650 platform has 24. Hyperthreading does not really double the performance, because each physical core is still a single core shared by two hardware threads.</p>
<p>Instead of relying on PassMark&#8217;s reported results, we also ran the benchmark ourselves to get a true measure of performance. We ran it several times late at night to make sure that the result is stable and that we are getting the maximum CPU allowed by bursting. The PassMark benchmark only runs on Windows, and in Windows we can only see up to 8 cores. As a result, the 8GB (8 cores) and 16GB (8 cores visible) VMs both return a CPU Mark result of roughly 7,850, which is 19.5 ECU. The 4GB (4 cores) VM returns a CPU Mark result of roughly 3,800, which is 9.6 ECU, and the 2GB (2 cores) VM returns roughly 1,900, which is 4.8 ECU. Since there are no 1GB (1 core) or 0.5GB (1 core) Windows VMs, we project their maximum CPU power to be half that of a 2-core VM, i.e., 2.4 ECU. Lastly, since we cannot measure the 16-core performance, we use PassMark&#8217;s reported E5520 result of 9,174 as its maximum, which is 23 ECU. These numbers determine the maximum CPU at full burst. Based on <a href="http://wiki.gogrid.com/wiki/index.php/Cloud_Servers">GoGrid&#8217;s VM configuration</a>, we can then determine the minimum guaranteed CPU from the maximum CPU.</p>
<p>The translation from GoGrid&#8217;s CPU allocation to an equivalent ECU is shown in the following table. Each row corresponds to one GoGrid VM configuration and lists the amount of CPU, RAM and storage in that configuration. We also list GoGrid&#8217;s current pay-as-you-go price in the last column for reference.</p>
<table border="1">
<tbody>
<tr>
<td>Min CPU (cores)</td>
<td>Min CPU (ECU)</td>
<td>Max CPU (cores)</td>
<td>Max CPU (ECU)</td>
<td>RAM (GB)</td>
<td>Storage (GB)</td>
<td>pay-as-you-go Cost (cents/hour)</td>
</tr>
<tr>
<td>0.5</td>
<td>1.2</td>
<td>1</td>
<td>2.4</td>
<td>0.5</td>
<td>25</td>
<td>9.5</td>
</tr>
<tr>
<td>1</td>
<td>2.4</td>
<td>1</td>
<td>2.4</td>
<td>1</td>
<td>50</td>
<td>19</td>
</tr>
<tr>
<td>1</td>
<td>2.4</td>
<td>2</td>
<td>4.8</td>
<td>2</td>
<td>100</td>
<td>38</td>
</tr>
<tr>
<td>3</td>
<td>7.2</td>
<td>4</td>
<td>9.6</td>
<td>4</td>
<td>200</td>
<td>76</td>
</tr>
<tr>
<td>6</td>
<td>14.4</td>
<td>8</td>
<td>19.2</td>
<td>8</td>
<td>400</td>
<td>152</td>
</tr>
<tr>
<td>8</td>
<td>19.2</td>
<td>16</td>
<td>23</td>
<td>16</td>
<td>800</td>
<td>304</td>
</tr>
</tbody>
</table>
<p>One way to compare GoGrid and EC2 is to look purely at the cost per ECU. The following table shows the cost/ECU for GoGrid VMs, assuming all of them get the maximum possible CPU. We list two cost/ECU results: one based on the pay-as-you-go price of $0.19 per RAM hour (1GB of RAM for one hour), and another based on the Enterprise Cloud prepaid plan of $0.05 per RAM hour.</p>
<table border="1">
<tbody>
<tr>
<td>RAM (GB)</td>
<td>Max CPU (ECU)</td>
<td>pay-as-you-go cost/ECU<br />
(cents/ECU/hour)</td>
<td>prepaid cost/ECU<br />
(cents/ECU/hour)</td>
</tr>
<tr>
<td>0.5</td>
<td>2.4</td>
<td>3.96</td>
<td>1.04</td>
</tr>
<tr>
<td>1</td>
<td>2.4</td>
<td>7.91</td>
<td>2.08</td>
</tr>
<tr>
<td>2</td>
<td>4.8</td>
<td>7.91</td>
<td>2.08</td>
</tr>
<tr>
<td>4</td>
<td>9.6</td>
<td>7.91</td>
<td>2.08</td>
</tr>
<tr>
<td>8</td>
<td>19.2</td>
<td>7.91</td>
<td>2.08</td>
</tr>
<tr>
<td>16</td>
<td>23</td>
<td>13.2</td>
<td>3.48</td>
</tr>
</tbody>
</table>
<p>In comparison, the following table shows EC2 cost/ECU for the nine different types of instances in the N. Virginia data center.</p>
<table border="1">
<tbody>
<tr>
<td>instance</td>
<td>CPU (ECU)</td>
<td>RAM (GB)</td>
<td>cost/ECU (cents/ECU/hour)</td>
</tr>
<tr>
<td>m1.small</td>
<td>1</td>
<td>1.7</td>
<td>8.5</td>
</tr>
<tr>
<td>m1.large</td>
<td>4</td>
<td>7.5</td>
<td>8.5</td>
</tr>
<tr>
<td>m1.xlarge</td>
<td>8</td>
<td>15</td>
<td>8.5</td>
</tr>
<tr>
<td>t1.micro</td>
<td>0.35</td>
<td>0.613</td>
<td>5.71</td>
</tr>
<tr>
<td>m2.xlarge</td>
<td>6.5</td>
<td>17.1</td>
<td>7.69</td>
</tr>
<tr>
<td>m2.2xlarge</td>
<td>13</td>
<td>34.2</td>
<td>7.69</td>
</tr>
<tr>
<td>m2.4xlarge</td>
<td>26</td>
<td>68.4</td>
<td>7.69</td>
</tr>
<tr>
<td>c1.medium</td>
<td>5</td>
<td>1.7</td>
<td>3.4</td>
</tr>
<tr>
<td>c1.xlarge</td>
<td>20</td>
<td>7</td>
<td>3.4</td>
</tr>
</tbody>
</table>
<p>Comparing on cost/ECU only makes sense when your application is CPU-bound, i.e., when your memory requirement is always below what the instance provides.</p>
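<p>Cost per ECU is simply the hourly price divided by the ECU rating. The Python sketch below recomputes both cost/ECU tables, keeping the assumption that each GoGrid VM gets its maximum CPU. GoGrid prices are the RAM size in GB times 19 cents (pay-as-you-go) or times 5 cents (prepaid). The EC2 hourly prices are the Linux prices listed in the EC2 cost break down. Results agree with the tables to within about a hundredth, and the helper name is illustrative.</p>

```python
# Sketch: cost per ECU for GoGrid (at maximum CPU) and for EC2 instances.
PAYG = 19.0     # GoGrid pay-as-you-go, cents per RAM hour (1GB for 1 hour)
PREPAID = 5.0   # GoGrid Enterprise Cloud prepaid plan, cents per RAM hour

# GoGrid: (RAM in GB, maximum CPU in ECU when bursting fully)
GOGRID = [(0.5, 2.4), (1, 2.4), (2, 4.8), (4, 9.6), (8, 19.2), (16, 23)]

# EC2 N. Virginia Linux: instance -> (price in cents/hour, CPU in ECU)
EC2 = {
    "m1.small": (8.5, 1), "m1.large": (34, 4), "m1.xlarge": (68, 8),
    "t1.micro": (2, 0.35), "m2.xlarge": (50, 6.5), "m2.2xlarge": (100, 13),
    "m2.4xlarge": (200, 26), "c1.medium": (17, 5), "c1.xlarge": (68, 20),
}


def cost_per_ecu(price_cents, ecu):
    """Hourly price divided by CPU capacity, in cents/ECU/hour."""
    return price_cents / ecu


# RAM size -> (pay-as-you-go cost/ECU, prepaid cost/ECU)
gogrid = {ram: (cost_per_ecu(PAYG * ram, ecu), cost_per_ecu(PREPAID * ram, ecu))
          for ram, ecu in GOGRID}
ec2 = {name: cost_per_ecu(price, ecu) for name, (price, ecu) in EC2.items()}
best_ec2 = min(ec2, key=ec2.get)  # lowest EC2 cost/ECU (first one on ties)

for ram, (payg, prepaid) in gogrid.items():
    print(f"GoGrid {ram:>4} GB: {payg:5.2f} (pay-as-you-go), {prepaid:4.2f} (prepaid)")
print(f"cheapest EC2 per ECU: {best_ec2} at {ec2[best_ec2]:.2f}")
```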
<p>Here, we propose a different way to compare them, taking the CPU, RAM and storage allocations into account together. Ideally, if we could derive the unit cost of each, the comparison would be straightforward. Unfortunately, because GoGrid charges purely on RAM hours, it is not possible to figure out how it values CPU, RAM and storage separately, as we have done for Amazon EC2. If we did a regression analysis, the result would show that CPU and storage cost nothing and RAM bears all the cost.</p>
<p>Since we cannot compare unit costs, we instead take one VM configuration from GoGrid and figure out what a hypothetical instance with exactly the same specification would cost in EC2 if Amazon were to offer it. We can project what EC2 would charge for such a hypothetical instance because we know EC2&#8217;s unit costs from our <a href="http://huanliu.wordpress.com/2011/01/24/the-true-cost-of-an-ecu/">EC2 cost break down</a>.</p>
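<p>Here is a minimal Python sketch of this projection, applied to GoGrid&#8217;s pay-as-you-go price. It assumes EC2&#8217;s linear cost model, Cost = c * CPU + m * Mem + s * Storage, with the N. Virginia unit costs from the EC2 cost break down. The CPU, RAM and storage figures are copied from the GoGrid configuration table, and the helper name is ours.</p>

```python
# Sketch: what EC2 would charge for a hypothetical instance shaped exactly
# like each GoGrid VM, compared with GoGrid's pay-as-you-go price.
C_PER_ECU, M_PER_GB, S_PER_GB = 1.369, 2.01, 0.0159  # EC2 unit costs, cents/hour
PAYG_CENTS_PER_RAM_HOUR = 19  # GoGrid pay-as-you-go, cents per GB of RAM per hour

# GoGrid configurations: (RAM GB, storage GB, min ECU, max ECU)
GOGRID = [
    (0.5, 25, 1.2, 2.4),
    (1, 50, 2.4, 2.4),
    (2, 100, 2.4, 4.8),
    (4, 200, 7.2, 9.6),
    (8, 400, 14.4, 19.2),
    (16, 800, 19.2, 23),
]


def hypothetical_ec2_cost(ecu, ram_gb, storage_gb):
    """Projected EC2 price in cents/hour from the linear unit-cost model."""
    return C_PER_ECU * ecu + M_PER_GB * ram_gb + S_PER_GB * storage_gb


for ram, disk, min_ecu, max_ecu in GOGRID:
    price = PAYG_CENTS_PER_RAM_HOUR * ram
    worst = hypothetical_ec2_cost(min_ecu, ram, disk)  # minimum guaranteed CPU
    best = hypothetical_ec2_cost(max_ecu, ram, disk)   # full burst
    print(f"{ram:>4} GB: GoGrid {price:5.1f}, EC2 {worst:5.2f} / {best:5.2f}, "
          f"ratio {price / worst:.2f} / {price / best:.2f}")
```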
<p>The following table shows what a VM will cost in EC2 if the same configuration is offered there, assuming we only get the minimum guaranteed CPU. Each row of the table corresponds to one GoGrid VM configuration, where we only list the RAM size for that configuration (see the previous table for a configuration&#8217;s CPU and storage size). We also show the ratio between the GoGrid pay-as-you-go price and the projected EC2 cost.</p>
<table border="1">
<tbody>
<tr>
<td>RAM (GB)</td>
<td>GoGrid pay-as-you-go cost (cents/hour)</td>
<td>Equivalent EC2 cost (cents/hour)</td>
<td>GoGrid cost/hypothetical EC2 cost</td>
</tr>
<tr>
<td>0.5</td>
<td>9.5</td>
<td>3.05</td>
<td>3.12</td>
</tr>
<tr>
<td>1</td>
<td>19</td>
<td>6.09</td>
<td>3.12</td>
</tr>
<tr>
<td>2</td>
<td>38</td>
<td>8.9</td>
<td>4.27</td>
</tr>
<tr>
<td>4</td>
<td>76</td>
<td>21.1</td>
<td>3.6</td>
</tr>
<tr>
<td>8</td>
<td>152</td>
<td>42.2</td>
<td>3.6</td>
</tr>
<tr>
<td>16</td>
<td>304</td>
<td>71.2</td>
<td>4.27</td>
</tr>
</tbody>
</table>
<p>Unlike most EC2 instance types, the other cloud providers, including GoGrid, allow a VM to burst beyond its minimum guaranteed capacity if free cycles are available. The following table compares the cost under the optimistic scenario where you get the maximum possible CPU.</p>
<table border="1">
<tbody>
<tr>
<td>RAM (GB)</td>
<td>GoGrid pay-as-you-go cost (cents/hour)</td>
<td>Equivalent EC2 cost (cents/hour)</td>
<td>GoGrid cost/EC2 cost</td>
</tr>
<tr>
<td>0.5</td>
<td>9.5</td>
<td>4.69</td>
<td>2.03</td>
</tr>
<tr>
<td>1</td>
<td>19</td>
<td>6.1</td>
<td>3.12</td>
</tr>
<tr>
<td>2</td>
<td>38</td>
<td>12.2</td>
<td>3.12</td>
</tr>
<tr>
<td>4</td>
<td>76</td>
<td>24.4</td>
<td>3.12</td>
</tr>
<tr>
<td>8</td>
<td>152</td>
<td>48.7</td>
<td>3.12</td>
</tr>
<tr>
<td>16</td>
<td>304</td>
<td>76.4</td>
<td>3.98</td>
</tr>
</tbody>
</table>
<p>As Paul from GoGrid pointed out, GoGrid also offers a prepaid plan that is significantly cheaper than the pay-as-you-go plan. This is different from Amazon&#8217;s reserved instances, where you get a discount if you pay an up-front fee. Although cheaper, Amazon&#8217;s reserved instance pricing only applies to the instance you reserved, so when you need to scale dynamically, you cannot benefit from the lower price. GoGrid&#8217;s prepaid plan lets you apply the discount to any instance. To see the benefits of buying in bulk, we also compare EC2 cost with GoGrid&#8217;s Enterprise Cloud prepaid plan, which costs $9,999 a month but entitles you to 200,000 RAM hours, i.e., about $0.05 per RAM hour. For brevity, we do not compare with other prepaid plans, which you can easily do yourself following our methodology.</p>
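<p>The $0.05 figure is just the plan price spread over the included RAM hours. A small Python sketch of that arithmetic follows. It assumes the full 200,000 RAM hours are consumed in the month; if fewer are used, the effective rate per hour actually used is higher. It also prints the per-VM prepaid prices used in the prepaid comparison tables. Constant names are illustrative.</p>

```python
# Sketch: effective hourly rate of GoGrid's Enterprise Cloud prepaid plan.
PLAN_DOLLARS_PER_MONTH = 9999   # plan price
PLAN_RAM_HOURS = 200_000        # RAM hours (GB of RAM x hours) included
PAYG_CENTS = 19                 # pay-as-you-go price, cents per RAM hour

# Assumes the whole allotment is consumed within the month.
prepaid_cents = PLAN_DOLLARS_PER_MONTH * 100 / PLAN_RAM_HOURS
discount = 1 - prepaid_cents / PAYG_CENTS   # fraction saved vs. pay-as-you-go

print(f"effective rate: {prepaid_cents:.4f} cents per RAM hour ({discount:.0%} off)")
for ram_gb in (0.5, 1, 2, 4, 8, 16):
    # A VM consumes ram_gb RAM hours for every hour it runs.
    print(f"{ram_gb:>4} GB VM: {ram_gb * prepaid_cents:6.2f} prepaid vs "
          f"{ram_gb * PAYG_CENTS:6.1f} pay-as-you-go (cents/hour)")
```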
<p>The following table shows what a VM will cost in EC2 if the same configuration is offered there, assuming we only get the minimum guaranteed CPU.</p>
<table border="1">
<tbody>
<tr>
<td>RAM (GB)</td>
<td>GoGrid Enterprise Cloud prepaid cost (cents/hour)</td>
<td>Equivalent EC2 cost (cents/hour)</td>
<td>GoGrid cost/EC2 cost</td>
</tr>
<tr>
<td>0.5</td>
<td>2.5</td>
<td>3.05</td>
<td>0.82</td>
</tr>
<tr>
<td>1</td>
<td>5</td>
<td>6.09</td>
<td>0.82</td>
</tr>
<tr>
<td>2</td>
<td>10</td>
<td>8.9</td>
<td>1.12</td>
</tr>
<tr>
<td>4</td>
<td>20</td>
<td>21.1</td>
<td>0.95</td>
</tr>
<tr>
<td>8</td>
<td>40</td>
<td>42.2</td>
<td>0.95</td>
</tr>
<tr>
<td>16</td>
<td>80</td>
<td>71.2</td>
<td>1.12</td>
</tr>
</tbody>
</table>
<p>The following table compares the cost under the optimistic scenario where you get the maximum CPU possible.</p>
<table border="1">
<tbody>
<tr>
<td>RAM (GB)</td>
<td>GoGrid Enterprise Cloud prepaid cost (cents/hour)</td>
<td>Equivalent EC2 cost (cents/hour)</td>
<td>GoGrid cost/EC2 cost</td>
</tr>
<tr>
<td>0.5</td>
<td>2.5</td>
<td>4.69</td>
<td>0.53</td>
</tr>
<tr>
<td>1</td>
<td>5</td>
<td>6.1</td>
<td>0.82</td>
</tr>
<tr>
<td>2</td>
<td>10</td>
<td>12.2</td>
<td>0.82</td>
</tr>
<tr>
<td>4</td>
<td>20</td>
<td>24.4</td>
<td>0.82</td>
</tr>
<tr>
<td>8</td>
<td>40</td>
<td>48.7</td>
<td>0.82</td>
</tr>
<tr>
<td>16</td>
<td>80</td>
<td>76.4</td>
<td>1.05</td>
</tr>
</tbody>
</table>
<p>Under GoGrid&#8217;s pay-as-you-go plan, GoGrid is 2 to 4 times more expensive than a hypothetical EC2 instance with exactly the same specification. However, if you can buy in bulk, the cost is significantly lower: the smallest 0.5GB server can cost as little as 53% of an equivalent EC2 instance.</p>
]]></content:encoded>
			<wfw:commentRss>http://huanliu.wordpress.com/2011/01/25/gogrid-cost-comparison-with-amazon-ec2/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aabcaed537a9f2df3a2010d2158d4546?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">huanliu</media:title>
		</media:content>
	</item>
		<item>
		<title>The true cost of an ECU</title>
		<link>http://huanliu.wordpress.com/2011/01/24/the-true-cost-of-an-ecu/</link>
		<comments>http://huanliu.wordpress.com/2011/01/24/the-true-cost-of-an-ecu/#comments</comments>
		<pubDate>Mon, 24 Jan 2011 09:47:56 +0000</pubDate>
		<dc:creator>huanliu</dc:creator>
				<category><![CDATA[Cloud]]></category>
		<category><![CDATA[cost]]></category>
		<category><![CDATA[EC2]]></category>
		<category><![CDATA[ECU]]></category>

		<guid isPermaLink="false">http://huanliu.wordpress.com/?p=105</guid>
		<description><![CDATA[How do you compare the cost of two cloud or IaaS offerings? Is Amazon EC2&#8242;s small instance (1 ECU, 1.7GB RAM, 160GB storage) cheaper or is Rackspace cloud&#8217;s 256MB server (4 cores, 256MB RAM, 10GB storage) cheaper? Unfortunately, answering this question is very difficult. One reason is that cloud vendors have been offering virtual machines [...]]]></description>
			<content:encoded><![CDATA[<p>How do you compare the cost of two cloud or IaaS offerings? Is Amazon EC2&#8242;s small instance (1 ECU, 1.7GB RAM, 160GB storage) cheaper or is Rackspace cloud&#8217;s 256MB server (4 cores, 256MB RAM, 10GB storage) cheaper? Unfortunately, answering this question is very difficult. One reason is that cloud vendors have been offering virtual machines with different configurations, i.e., different combinations of CPU power, memory and storage, making is difficult to perform an apple-to-apple comparison.</p>
<p>Towards the goal of a better apple-to-apple comparison, I will break down the cost for CPU, memory and storage individually for Amazon EC2 in this post. For those not interested in understanding the methodology, the high level conclusions are as follows. In Amazon&#8217;s N. Virginia data center, the unit costs are:</p>
<ul>
<li>1 ECU costs $0.01369/hour</li>
<li>1 GB of RAM costs $0.0201/hour</li>
<li>1 GB of local storage costs $0.000159/hour</li>
<li>A 10 Gigabit Ethernet interface costs $0.41/hour</li>
<li>A GPU costs $0.52/hour</li>
</ul>
<p>Before we can break down the cost, we need to know what an instance&#8217;s cost consists of (an instance is Amazon&#8217;s term for a virtual machine). We assume the cost consists solely of its CPU, its memory, and its local storage space. This means there is no fixed cost component, for example for the hardware chassis or a static IP address. We make this assumption purely for simplicity; in practice, it makes little difference to the end result even if we assume there is a fixed cost component. We also note that the instance cost does not include the cost of the network bandwidth consumed, which is always charged separately, at least by the cloud providers we looked at.</p>
<p>Let us assume the instance cost is a linear function of the three components, i.e., Cost = c * CPU + m * Mem + s * Storage, where c, m and s are the unit costs of CPU, memory and local storage, respectively. Fortunately, Amazon EC2 offers several instance types, each with a different combination of CPU, memory and storage, which gives us a clue about what each component costs. Combining the many instance types, we can estimate the parameters c, m and s using least-squares regression. Let us first look at Amazon&#8217;s N. Virginia data center. We only use the hourly cost of Linux instances, to avoid accounting for an OS&#8217;s licensing cost. The results from the least-squares regression are:</p>
<p>s = 0.0159 cents/GB/hour<br />
m = 2.01 cents/GB/hour<br />
c = 1.369 cents/ECU/hour</p>
<p>The linear model fits the real data well. The following table shows the instances we used for the regression. The second-to-last column shows the real EC2 cost, and the last column shows the instance cost predicted by our estimated parameters. The two costs match closely, suggesting that a linear model is a good approximation. Note that we assign the Micro instance 0.35 ECU, which is the average of its ECU allocation, as we showed in our <a title="EC2 Micro instance analysis" href="http://huanliu.wordpress.com/2010/09/10/amazon-ec2-micro-instances-deeper-dive/">Micro instance analysis</a>.</p>
<table border="1">
<tbody>
<tr>
<td>Instance</td>
<td>CPU (ECU)</td>
<td>RAM (GB)</td>
<td>Storage (GB)</td>
<td>Actual cost (cents/hour)</td>
<td>Fitted cost (cents/hour)</td>
</tr>
<tr>
<td>m1.small</td>
<td>1</td>
<td>1.7</td>
<td>160</td>
<td>8.5</td>
<td>7.33</td>
</tr>
<tr>
<td>m1.large</td>
<td>4</td>
<td>7.5</td>
<td>850</td>
<td>34</td>
<td>34.07</td>
</tr>
<tr>
<td>m1.xlarge</td>
<td>8</td>
<td>15</td>
<td>1,690</td>
<td>68</td>
<td>67.97</td>
</tr>
<tr>
<td>t1.micro</td>
<td>0.35</td>
<td>0.613</td>
<td>0</td>
<td>2</td>
<td>1.71</td>
</tr>
<tr>
<td>m2.xlarge</td>
<td>6.5</td>
<td>17.1</td>
<td>420</td>
<td>50</td>
<td>49.96</td>
</tr>
<tr>
<td>m2.2xlarge</td>
<td>13</td>
<td>34.2</td>
<td>850</td>
<td>100</td>
<td>100.1</td>
</tr>
<tr>
<td>m2.4xlarge</td>
<td>26</td>
<td>68.4</td>
<td>1,690</td>
<td>200</td>
<td>200</td>
</tr>
<tr>
<td>c1.medium</td>
<td>5</td>
<td>1.7</td>
<td>350</td>
<td>17</td>
<td>15.83</td>
</tr>
<tr>
<td>c1.xlarge</td>
<td>20</td>
<td>7</td>
<td>1,690</td>
<td>68</td>
<td>68.32</td>
</tr>
</tbody>
</table>
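<p>The regression can be sketched in plain Python (no NumPy, to keep it self-contained). This is our own minimal reconstruction, assuming an ordinary least-squares fit with no intercept term (matching the no-fixed-cost assumption), solved through the normal equations; the original analysis did not publish code, and the names <code>fit_unit_costs</code> and <code>solve</code> are hypothetical. Run on the nine N. Virginia rows in the table above, it should produce coefficients close to the c, m and s reported earlier.</p>

```python
# Rows copied from the N. Virginia table: (instance, ECU, RAM GB, storage GB, cents/hour)
INSTANCES = [
    ("m1.small", 1, 1.7, 160, 8.5),
    ("m1.large", 4, 7.5, 850, 34),
    ("m1.xlarge", 8, 15, 1690, 68),
    ("t1.micro", 0.35, 0.613, 0, 2),
    ("m2.xlarge", 6.5, 17.1, 420, 50),
    ("m2.2xlarge", 13, 34.2, 850, 100),
    ("m2.4xlarge", 26, 68.4, 1690, 200),
    ("c1.medium", 5, 1.7, 350, 17),
    ("c1.xlarge", 20, 7, 1690, 68),
]


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        acc = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - acc) / aug[i][i]
    return x


def fit_unit_costs(rows):
    """Least-squares fit of cost = c*CPU + m*Mem + s*Storage with no intercept,
    via the normal equations (X^T X) beta = X^T y."""
    xs = [row[1:4] for row in rows]
    ys = [row[4] for row in rows]
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
    return solve(xtx, xty)


c, m, s = fit_unit_costs(INSTANCES)
print(f"c = {c:.3f} cent/ECU/hour, m = {m:.3f} cent/GB/hour, s = {s:.4f} cent/GB/hour")
for name, cpu, mem, storage, actual in INSTANCES:
    print(f"{name:11s} actual {actual:6.2f}  fitted {c * cpu + m * mem + s * storage:6.2f}")
```

<p>Swapping in the N. California/Ireland/Singapore prices from the second table gives the same kind of fit for those regions.</p>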
<p>Memory turns out to be a significant component of the instance cost. For example, at 2.01 cents/GB/hour, the 68.4 GB of RAM in an m2.4xlarge accounts for about 137 of its 200 cents per hour. The next time you compare two cloud offerings, make sure to compare the RAM available.</p>
<p>We did not include EC2 cluster instances and cluster GPU instances in the estimation, because they differ from the other instances: both have a 10 Gigabit Ethernet interface, and the cluster GPU instance also has a GPU. But now that we have unit costs for CPU, memory and storage, we can estimate what those extra features cost.</p>
<p>For a cluster instance, combining the cost of its CPU (33.5 ECU), memory (23 GB) and storage (1,690 GB) using our estimated parameters gives $1.19/hour. Since Amazon charges $1.60/hour, the extra charge must be for the 10 Gigabit Ethernet interface, the only feature that differs from the other instances. Subtracting the two, the 10 Gigabit interface costs $0.41/hour.</p>
<p>For a cluster GPU instance, combining the cost of its CPU (33.5 ECU), memory (22 GB) and storage (1,690 GB) gives $1.17/hour. Since Amazon charges $2.10/hour, the extra charge must be for the 10 Gigabit interface and the GPU. Subtracting the two costs and then removing the $0.41/hour for the 10 Gigabit interface, the GPU costs $0.52/hour.</p>
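<p>The subtraction above can be checked with a few lines of Python. This sketch simply plugs the rounded N. Virginia unit costs and the quoted prices ($1.60 and $2.10 per hour) into the linear model; the helper name <code>fitted_cents</code> is our own, and attributing the whole residual to the extra features is the same assumption the text makes.</p>

```python
# Unit costs fitted for N. Virginia: cents per ECU-hour, per GB-hour of RAM, per GB-hour of storage
C_ECU, M_RAM, S_DISK = 1.369, 2.01, 0.0159


def fitted_cents(ecu, ram_gb, storage_gb):
    """Hourly cost in cents predicted by the linear model from CPU, memory and storage alone."""
    return C_ECU * ecu + M_RAM * ram_gb + S_DISK * storage_gb


# Published hourly prices in cents, as quoted in the text
CLUSTER_PRICE, CLUSTER_GPU_PRICE = 160, 210

cluster = fitted_cents(33.5, 23, 1690)      # cluster instance
cluster_gpu = fitted_cents(33.5, 22, 1690)  # cluster GPU instance

# Whatever the model does not explain is attributed to the extra features.
net_premium = CLUSTER_PRICE - cluster                        # 10 Gigabit Ethernet interface
gpu_premium = CLUSTER_GPU_PRICE - cluster_gpu - net_premium  # GPU, after removing the network premium

print(f"cluster fitted:     ${cluster / 100:.2f}/hour")
print(f"cluster GPU fitted: ${cluster_gpu / 100:.2f}/hour")
print(f"10GbE premium:      ${net_premium / 100:.2f}/hour")
print(f"GPU premium:        ${gpu_premium / 100:.2f}/hour")
```

<p>It prints $1.19, $1.17, $0.41 and $0.52 per hour, matching the figures above.</p>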
<p>We can perform the same analysis for the other three Amazon data centers: N. California, Ireland and Singapore. Conveniently, their cost structures are identical, so I only need to present one result. The unit costs are:</p>
<p>s = 0.0169 cent/GB/hour<br />
m = 2.316 cent/GB/hour<br />
c = 1.575 cent/ECU/hour</p>
<p>The actual and fitted instance costs are shown in the following table. Again, they agree very well. There are no cluster or cluster GPU instances in the other data centers, so we cannot estimate the cost of the 10 Gigabit interface or the GPU there.</p>
<table border="1">
<tbody>
<tr>
<td>Instance</td>
<td>CPU (ECU)</td>
<td>RAM (GB)</td>
<td>Storage (GB)</td>
<td>Actual cost (cents/hour)</td>
<td>Fitted cost (cents/hour)</td>
</tr>
<tr>
<td>m1.small</td>
<td>1</td>
<td>1.7</td>
<td>160</td>
<td>9.5</td>
<td>8.22</td>
</tr>
<tr>
<td>m1.large</td>
<td>4</td>
<td>7.5</td>
<td>850</td>
<td>38</td>
<td>38.07</td>
</tr>
<tr>
<td>m1.xlarge</td>
<td>8</td>
<td>15</td>
<td>1,690</td>
<td>76</td>
<td>75.97</td>
</tr>
<tr>
<td>t1.micro</td>
<td>0.35</td>
<td>0.613</td>
<td>0</td>
<td>2.5</td>
<td>1.97</td>
</tr>
<tr>
<td>m2.xlarge</td>
<td>6.5</td>
<td>17.1</td>
<td>420</td>
<td>57</td>
<td>56.96</td>
</tr>
<tr>
<td>m2.2xlarge</td>
<td>13</td>
<td>34.2</td>
<td>850</td>
<td>114</td>
<td>114.1</td>
</tr>
<tr>
<td>m2.4xlarge</td>
<td>26</td>
<td>68.4</td>
<td>1,690</td>
<td>228</td>
<td>228</td>
</tr>
<tr>
<td>c1.medium</td>
<td>5</td>
<td>1.7</td>
<td>350</td>
<td>19</td>
<td>17.74</td>
</tr>
<tr>
<td>c1.xlarge</td>
<td>20</td>
<td>7</td>
<td>1,690</td>
<td>76</td>
<td>76.34</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://huanliu.wordpress.com/2011/01/24/the-true-cost-of-an-ecu/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aabcaed537a9f2df3a2010d2158d4546?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">huanliu</media:title>
		</media:content>
	</item>
		<item>
		<title>Dimensions to use to compare NoSQL data stores</title>
		<link>http://huanliu.wordpress.com/2011/01/21/dimensions-to-use-to-compare-nosql-data-stores/</link>
		<comments>http://huanliu.wordpress.com/2011/01/21/dimensions-to-use-to-compare-nosql-data-stores/#comments</comments>
		<pubDate>Fri, 21 Jan 2011 17:56:50 +0000</pubDate>
		<dc:creator>huanliu</dc:creator>
				<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Cassandra]]></category>
		<category><![CDATA[MongoDB]]></category>

		<guid isPermaLink="false">http://huanliu.wordpress.com/?p=102</guid>
		<description><![CDATA[You have decided to use a NoSQL data store in favor of a DBMS store, possibly due to scaling reasons. But, there are so many NoSQL stores out there, which one should you choose? Part of the NoSQL movement is the acknowledgment that there are tradeoffs, and the various NoSQL projects have pursued different tradeoff [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=huanliu.wordpress.com&amp;blog=6318270&amp;post=102&amp;subd=huanliu&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>You have decided to use a NoSQL data store instead of a traditional relational DBMS, possibly for scaling reasons. But with so many NoSQL stores out there, which one should you choose? Part of the NoSQL movement is the acknowledgment that there are tradeoffs, and the various NoSQL projects have pursued different tradeoff points in the design space. Understanding the tradeoffs they have made and figuring out which one fits your application best is a major undertaking.</p>
<p>Choosing the right data store is obviously a much bigger topic than can be covered in a single blog post. There are also many resources comparing the various NoSQL data stores, e.g., <a href="http://wiki.toadforcloud.com/index.php/Survey_distributed_databases">here</a>, so there is no point repeating them. Instead, in this post, I will highlight the dimensions you should use when you compare the various data stores.</p>
<p><strong>Data model</strong></p>
<p>Obviously, you must choose a data model that matches your application. In the SQL world there is only one, the relational model, so you have to fit your application into it. Luckily, in the NoSQL world you have a number of choices, which fall roughly into four categories: key-value blob stores, column-oriented data stores (e.g., BigTable-like stores), document-based data stores, and graph data stores. Graph data stores fit, well&#8230;, graph problems (obviously) very well. We find that column-oriented and document-based data stores have roughly the same expressive power, and a wide variety of applications fit them well. In comparison, key-value blob stores have a much simpler data model, which limits the range of applications that fit.</p>
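<p>To make the four categories concrete, here is one hypothetical user record expressed in each model, using plain Python dictionaries and lists as stand-ins. These are not any particular store&#8217;s API; the sketch only shows what the store itself can see and traverse in each case.</p>

```python
import json

# One hypothetical user record in each of the four NoSQL data models.

# 1. Key-value blob: the store sees only an opaque value; the application (de)serializes it.
kv_store = {"user:42": json.dumps({"name": "Alice", "city": "Paris", "follows": [7, 9]}).encode()}

# 2. Column-oriented (BigTable-like): row key -> column family -> column -> value.
column_store = {
    "user:42": {
        "profile": {"name": "Alice", "city": "Paris"},
        "follows": {"user:7": "", "user:9": ""},  # column names themselves carry data
    }
}

# 3. Document store: a nested, self-describing document the store can query into.
document_store = {
    "users": [{"_id": 42, "name": "Alice", "address": {"city": "Paris"}, "follows": [7, 9]}]
}

# 4. Graph store: nodes plus explicit, traversable edges.
graph_nodes = {42: {"name": "Alice"}, 7: {"name": "Bob"}, 9: {"name": "Carol"}}
graph_edges = [(42, "FOLLOWS", 7), (42, "FOLLOWS", 9)]

# "Whom does Alice follow?" looks different in each model:
blob = json.loads(kv_store["user:42"])  # fetch and decode the whole value
follows_cols = sorted(column_store["user:42"]["follows"])
follows_doc = document_store["users"][0]["follows"]
follows_graph = [dst for src, rel, dst in graph_edges if src == 42 and rel == "FOLLOWS"]
print(blob["follows"], follows_cols, follows_doc, follows_graph)
```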
<p><strong>Consistency</strong></p>
<p>Amazon popularized the concept of &#8220;eventual consistency&#8221;, which gives up strong consistency in favor of higher scalability. The application has to work around the limitations of the eventual consistency model, since only the application understands the semantics of the data. One example is Amazon&#8217;s shopping cart application. With their Dynamo backend, an item in the shopping cart may reappear after you have deleted it. That happens because, when the application reconciles inconsistent versions of the cart, it chooses to keep the item.</p>
<p>Under a weak consistency model, it is also important to compare data stores on how they reconcile inconsistencies. Some data stores, such as MongoDB and Cassandra, use timestamps to reconcile, i.e., the last writer wins. The downside of this approach is that clocks must be accurately synchronized, which becomes very difficult at finer resolutions. To make matters worse, Cassandra uses the client&#8217;s timestamp, so you have to make sure your clients&#8217; clocks (not the storage nodes&#8217;) are properly synchronized. Other data stores, such as Riak, use vector clocks to reconcile. The downside of this approach is that reconciliation has to happen in the application, because you need to understand the data semantics in order to reconcile.</p>
<p>If you cannot tolerate a weaker consistency model, or if handling reconciliation is too cumbersome, you may want to consider a data store that supports a stronger consistency model, such as HBase or MongoDB. Cassandra supports tunable consistency levels, so you can use Cassandra and raise the consistency level. Alternatively, you can use a BigTable clone, such as HBase or Hypertable, which supports a strong consistency model. This was cited as one of the reasons <a href="http://www.facebook.com/note.php?note_id=454991608919">Facebook recently chose HBase</a> over Cassandra.</p>
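<p>The reappearing item can be reproduced with a toy model. This sketch assumes the cart reconciliation simply takes the union of the divergent versions, which preserves every addition; Amazon&#8217;s actual merge logic is not shown in the text, and the item names and the <code>reconcile</code> helper are hypothetical.</p>

```python
# Hypothetical sketch: two replicas of a shopping cart diverge during a partition,
# and the application reconciles them by taking the union of the items.

base = {"book", "lamp"}

replica_a = set(base)
replica_a.discard("lamp")  # the user deletes the lamp; this write reaches replica A only

replica_b = set(base)
replica_b.add("pen")       # a concurrent "add pen" lands on replica B only


def reconcile(*versions):
    """Merge divergent cart versions by union: additions are never lost,
    but a deletion seen by only one version is undone."""
    merged = set()
    for v in versions:
        merged |= v
    return merged


cart = reconcile(replica_a, replica_b)
print(sorted(cart))  # ['book', 'lamp', 'pen']: the deleted lamp is back
```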
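<p>The two reconciliation strategies can be contrasted in a few lines. This is a toy illustration, not Cassandra&#8217;s or Riak&#8217;s code: the timestamps, node names and the 10-second clock skew are made up. With last-writer-wins, a client whose clock runs slow silently loses a genuinely later write; vector clocks instead detect that two versions are concurrent and hand both to the application.</p>

```python
# --- Last writer wins, using client-supplied timestamps ---
def lww_merge(versions):
    """Keep the value with the largest timestamp (ties broken by value)."""
    return max(versions)[1]


# Client X writes "v1" at true time 100. Client Y writes "v2" later, at true time 105,
# but Y's clock runs 10 seconds slow, so it stamps its write with 95.
writes = [(100, "v1"), (95, "v2")]
print(lww_merge(writes))  # v1: the later write by Y is silently discarded


# --- Vector clocks: detect concurrency instead of guessing ---
def descends(a, b):
    """True if vector clock a has seen every event that b has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a is newer"
    if descends(b, a):
        return "b is newer"
    return "concurrent"  # siblings: the application must reconcile them


print(compare({"n1": 2, "n2": 1}, {"n1": 1, "n2": 1}))  # a is newer
print(compare({"n1": 2, "n2": 0}, {"n1": 1, "n2": 1}))  # concurrent
```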
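<p>Tunable consistency of the kind Cassandra offers is usually reasoned about with the quorum rule: with N replicas, a read that contacts R replicas is guaranteed to see a write acknowledged by W replicas when R + W is greater than N, because the two replica sets must overlap. The sketch below states the rule and checks it by brute force; the replica count is hypothetical, and the level names ONE, QUORUM and ALL are used only as labels for R and W.</p>

```python
from itertools import combinations


def is_strongly_consistent(n, r, w):
    """Quorum rule: every read set of size r overlaps every write set of size w
    exactly when r + w > n."""
    return r + w > n


def every_read_sees_write(n, r, w):
    """Brute-force check: does every possible read set intersect every write set?"""
    replicas = range(n)
    return all(
        not set(read).isdisjoint(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


N = 3
QUORUM = N // 2 + 1  # 2 when N = 3
for name, r, w in [("ONE/ONE", 1, 1), ("QUORUM/QUORUM", QUORUM, QUORUM), ("ONE/ALL", 1, N)]:
    print(name, is_strongly_consistent(N, r, w), every_read_sees_write(N, r, w))
```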
<p><strong>Atomic test-and-set</strong></p>
<p>In CPUs, atomic test-and-set is a required instruction, and it is the building-block primitive for eliminating race conditions in multiprocessor environments. Suppose you want to increase a counter by 1. You have to read the counter&#8217;s current value first, increment it by 1, then write back the result. If someone else reads the counter after you read it, but writes back their result before you write yours, then your write overwrites theirs and one of the two increments is lost. Atomic test-and-set guarantees that no one can come in between your read and your write.</p>
<p>Unfortunately, in NoSQL data stores, this is not a mandatory feature. There are a couple of ways to get around it. First, thanks to flexible schema support, it is common practice to aggressively create new columns on the fly and avoid writing over old data. This works well if writes are infrequent, but if you constantly write new data (e.g., increment the counter every second), you will end up with lots of garbage data that needs to be cleaned up later. Second, you can avoid the problem by making sure that only one agent updates the data. This gets harder to manage when you have many agents.</p>
<p>If you cannot use either workaround in your application, you need to look for a data store that supports atomic test-and-set. Amazon SimpleDB, Yahoo PNUTS, Google BigTable and MongoDB all support some flavor of test-and-set. Unfortunately, other popular data stores, such as Cassandra, do not support atomic test-and-set.</p>
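<p>The lost update and its fix can be replayed deterministically. This sketch uses a hypothetical in-memory <code>Store</code> class, not any real NoSQL API, and interleaves two clients by hand; in a real store, <code>compare_and_set</code> would have to be atomic on the server, which this single-threaded toy only simulates.</p>

```python
class Store:
    """Toy key-value store with a conditional write (hypothetical API)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key, 0)

    def put(self, key, value):
        self.data[key] = value

    def compare_and_set(self, key, expected, new):
        """Write new only if the current value still equals expected.
        A real store must perform this check-and-write atomically."""
        if self.get(key) != expected:
            return False
        self.data[key] = new
        return True


# Lost update: both clients read 5 before either writes back.
s = Store()
s.put("counter", 5)
a = s.get("counter")     # client A reads 5
b = s.get("counter")     # client B reads 5
s.put("counter", b + 1)  # B writes 6
s.put("counter", a + 1)  # A writes 6, overwriting B's increment
lost = s.get("counter")
print(lost)              # 6, not 7

# With test-and-set: A's stale write is rejected, so A re-reads and retries.
s.put("counter", 5)
a = s.get("counter")
b = s.get("counter")
b_ok = s.compare_and_set("counter", b, b + 1)  # B succeeds: 6
a_ok = s.compare_and_set("counter", a, a + 1)  # fails: A expected 5, found 6
while not a_ok:                                # A's retry loop
    cur = s.get("counter")
    a_ok = s.compare_and_set("counter", cur, cur + 1)
print(s.get("counter"))  # 7
```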
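<p>The first workaround, writing each update into a new column instead of overwriting, can be sketched with a dictionary standing in for one row. The column naming scheme (writer name plus a process-local sequence number) and the helper names are hypothetical; a real deployment would typically use a UUID or timestamp in the column name. The sketch also shows the garbage that accumulates and a compaction step, which itself must be run by a single agent to avoid racing with writers.</p>

```python
import itertools

# Hypothetical sketch of the "new column per update" workaround for counters.
_seq = itertools.count()


def increment(row, amount=1, writer="client-a"):
    # A fresh, unique column name per write: concurrent writers never overwrite each other.
    row[f"{writer}:{next(_seq)}"] = amount


def read_counter(row):
    # The counter's value is the sum of all delta columns.
    return sum(row.values())


def compact(row):
    # Periodic cleanup: fold the deltas into a single column.
    # Must not run concurrently with writers, or deltas written in between are lost.
    total = read_counter(row)
    row.clear()
    row["base"] = total


row = {}
for _ in range(3):
    increment(row, writer="client-a")
increment(row, writer="client-b")
print(len(row), read_counter(row))  # 4 4: four columns of garbage for one number
compact(row)
print(len(row), read_counter(row))  # 1 4
```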
<p><strong>Secondary index</strong></p>
<p>None of the NoSQL data stores offers join capability. To support richer data relationships and faster lookup and retrieval of certain data items, you may need secondary index support. MongoDB supports secondary indexes, and both HBase and Cassandra have some early-stage support for them. Although not a secondary index, Riak supports links, which connect one item to another, so that you can build richer relationships.</p>
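<p>Where a store lacks secondary indexes, the application can maintain one itself. This is a minimal sketch with hypothetical names (<code>put_user</code>, <code>users_in</code>) over plain dictionaries; note that the primary write and the index update are two separate writes, so without transactions or test-and-set the index can drift from the data after a crash.</p>

```python
from collections import defaultdict

# Hypothetical sketch: an application-maintained secondary index on top of a
# plain key -> record store.
primary = {}                # user_id -> record
by_city = defaultdict(set)  # secondary index: city -> set of user_ids


def put_user(user_id, record):
    old = primary.get(user_id)
    if old is not None:
        by_city[old["city"]].discard(user_id)  # drop the stale index entry
    primary[user_id] = record
    by_city[record["city"]].add(user_id)


def users_in(city):
    # One index lookup instead of scanning every record.
    return sorted(primary[uid]["name"] for uid in by_city.get(city, ()))


put_user(1, {"name": "Alice", "city": "Paris"})
put_user(2, {"name": "Bob", "city": "Paris"})
put_user(1, {"name": "Alice", "city": "Oslo"})  # Alice moves; index must follow
print(users_in("Paris"), users_in("Oslo"))      # ['Bob'] ['Alice']
```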
<p><strong>Manageability</strong></p>
<p>Each data store has its own tools to help you automate management, but its architecture dictates how much automation can be achieved. A symmetric architecture is a lot easier to manage and to reason about. Data stores such as Cassandra and Riak have only one type of node, and all nodes perform the same function. Other data stores have a master/slave architecture, which is a little harder to manage because you have to manage two types of nodes. If there are more than two types of nodes, it is even harder. For example, MongoDB has two types of nodes, routing nodes and data serving nodes, but a data serving node can be either a primary or a secondary. The primary is the only node that can accept writes in a replica set, while a secondary may be able to serve read requests if a weaker consistency model is acceptable. You have to keep track of which node is the primary and which are secondaries in order to reason about the system&#8217;s behavior.</p>
<p><strong>Latency vs. durability</strong></p>
<p>There is a tradeoff between latency and durability. A write can return very quickly if it is only committed to memory, but a crash or power failure can easily lose your data. Alternatively, you can wait for the data to be written to a local disk before returning. The latency is a little longer, but the data is more durable. Or, you can wait for the data to be written to several disks across several nodes. The latency is definitely longer, but the data is a lot more durable: even if a single hard disk or node fails, your data is still stored somewhere else.</p>
<p>MongoDB favors low latency. When writing, it returns to the caller without even waiting for the data to be synced to disk. Although the application developer can override this behavior by sending a &#8220;sync&#8221; command right after the write, this workaround can really kill performance. HBase also favors low latency: it does not sync log updates to disk, so it can return to clients quickly. Cassandra is tunable: a client can specify on a per-call basis whether the write should be persisted. PNUTS is at the other extreme, where it always syncs log data to disk.</p>
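<p>The single-node end of this tradeoff can be felt with an append-only log that lets each write choose how far to push the data. The <code>Log</code> class and its three level names are hypothetical; the calls it relies on (<code>flush</code> and <code>os.fsync</code>) are standard Python, and how much <code>fsync</code> actually guarantees still depends on the OS and the drive&#8217;s write cache. Timings vary by machine, so no output is shown.</p>

```python
import os
import tempfile
import time


class Log:
    """Append-only log with a per-write durability choice (hypothetical API)."""

    def __init__(self, path):
        self.f = open(path, "ab")

    def append(self, data: bytes, durability: str = "memory") -> None:
        self.f.write(data)             # 'memory': may sit in this process's buffer
        if durability in ("os", "disk"):
            self.f.flush()             # 'os': handed to the kernel page cache
        if durability == "disk":
            os.fsync(self.f.fileno())  # 'disk': wait until the device reports it stored

    def close(self):
        self.f.close()


path = os.path.join(tempfile.mkdtemp(), "wal.log")
log = Log(path)
for mode in ("memory", "os", "disk"):
    start = time.perf_counter()
    for _ in range(100):
        log.append(b"x" * 100 + b"\n", mode)
    print(f"{mode:6s} {time.perf_counter() - start:.4f}s for 100 writes")
log.close()
```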
<p><strong>Read vs. write performance</strong></p>
<p>There is also a tradeoff between read and write performance. You can write sequentially to disk, which optimizes write latency, because a spinning hard disk is very good at sequential writes. The price you pay is in read performance. Since data is laid out in the order it was written rather than in index order, reading it may require scanning through several data files to find the latest copy. Alternatively, you can pay the price at write time, making sure the data is written in the correct place or is indexed. Writes are slower, but reads are faster because each is a simple lookup. HBase and Cassandra both optimize for writes, whereas PNUTS is optimized for reads. Amazon SimpleDB also optimizes for reads, as is evident from its low write throughput (roughly 30 writes/second in our measurements) and high read throughput.</p>
<p>Optimizing for reads has a side effect. Because some data has to be written in place (either the index or the data itself), there is a possibility of corruption, which may make the latter half of a file unreadable. You have to look carefully into the design to make sure no corner-case failures can cause this to happen, or come up with a good backup and recovery plan.</p>
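<p>The two layouts can be modeled in memory to show where the cost lands. This is a simplified sketch with hypothetical class names: lists stand in for sequentially written data files (&#8220;segments&#8221;), and the <code>probes</code> count stands in for how many files a read has to touch. It is not how HBase, Cassandra or PNUTS are actually implemented.</p>

```python
class AppendOnlyStore:
    """Write-optimized: every write is appended to the current segment; full segments
    are sealed. A read may have to check several segments, newest first."""

    def __init__(self, segment_size=2):
        self.segments = [[]]  # each segment is an append-only list of (key, value)
        self.segment_size = segment_size

    def put(self, key, value):
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])            # seal the full segment, start a new one
        self.segments[-1].append((key, value))  # sequential write, no lookup needed

    def get(self, key):
        probes = 0
        for segment in reversed(self.segments):  # newest segment first
            probes += 1
            for k, v in reversed(segment):       # newest entry in the segment first
                if k == key:
                    return v, probes
        return None, probes


class InPlaceStore:
    """Read-optimized: the write locates the key's slot and updates it in place."""

    def __init__(self):
        self.index = {}

    def put(self, key, value):
        self.index[key] = value        # pay on write: locate and overwrite

    def get(self, key):
        return self.index.get(key), 1  # a single lookup


aos = AppendOnlyStore()
for k, v in [("a", 1), ("b", 1), ("c", 1), ("d", 1), ("e", 1), ("a", 2)]:
    aos.put(k, v)
print(aos.get("a"))  # (2, 1): newest copy found in the newest segment
print(aos.get("b"))  # (1, 3): had to look through three segments

ips = InPlaceStore()
ips.put("a", 1)
ips.put("a", 2)
print(ips.get("a"))  # (2, 1)
```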
<p><strong>Dynamic scaling</strong></p>
<p>This is a key requirement in NoSQL data stores. You want to be able to grow and shrink your cluster size and its capacity on the fly by simply adding or removing nodes. Fortunately, most NoSQL stores we looked at support this capability, so the decision is easy.</p>
<p><strong>Auto failover</strong></p>
<p>If dynamic scaling is implemented robustly, auto failover comes for free because a node failure should be indistinguishable from decommissioning a node. Unfortunately, some data stores require you to explicitly decommission a node. A node failure, i.e., an unplanned decommissioning, could take some time to recover.</p>
<p><strong>Auto load balancing</strong></p>
<p>The load a machine experiences, both in the amount of data stored and in the number of read/write requests, may differ widely among the machines in the storage cluster. The load may also fluctuate wildly over time. A single overloaded node may severely disrupt the cluster, even if the other nodes are lightly loaded. HBase, MongoDB and PNUTS all support automatic load balancing, while Riak only rebalances when nodes join and leave. If the data store does not support automatic load balancing, you have to make sure the data is distributed evenly yourself. This may involve profiling your data and/or tuning the configuration. For example, in Cassandra, you can choose the RandomPartitioner, which hashes keys and tends to even out the load.</p>
<p>Another aspect of load balancing concerns failure scenarios. If a node fails, how many other nodes take over its workload? You want to spread the load as evenly as possible, so that you do not overload another node and trigger a domino effect. This was cited as one of the reasons <a href="http://www.theregister.co.uk/2010/12/17/facebook_messages_tech/">Facebook chose HBase</a>: HBase spreads shards out across machines.</p>
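<p>Why hashing evens out the load can be seen with time-ordered keys, a common source of skew. This is a simplified sketch: it hashes keys with MD5 (as Cassandra&#8217;s RandomPartitioner does) but then maps the hash to a node with a plain modulo rather than a token ring, and the node count, range boundaries and keys are all hypothetical.</p>

```python
import bisect
import hashlib
from collections import Counter

NODES = 4
SPLITS = ["2011-04", "2011-07", "2011-10"]  # hypothetical order-preserving range boundaries


def random_partition(key):
    # Hash the key so that adjacent keys scatter across nodes (simplified: modulo, not a ring).
    token = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return token % NODES


def ordered_partition(key):
    # Order-preserving: each node owns a contiguous key range.
    return bisect.bisect_right(SPLITS, key)


# A month of time-stamped keys, all falling in January.
keys = [f"2011-01-{day:02d}:{i}" for day in range(1, 32) for i in range(10)]

ordered = Counter(ordered_partition(k) for k in keys)
hashed = Counter(random_partition(k) for k in keys)
print(ordered)  # Counter({0: 310}): one node takes every key
print(hashed)   # roughly 310 / 4 keys per node
```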
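<p>The domino-effect concern can be quantified with a back-of-the-envelope model. In this hypothetical sketch, every node holds the same number of equally loaded shards; under the &#8220;successor&#8221; policy a failed node&#8217;s shards all move to one neighbor, while under the &#8220;spread&#8221; policy they are dealt out round-robin to every survivor. It illustrates the principle only and is not HBase&#8217;s actual region-assignment algorithm.</p>

```python
def load_after_failure(nodes, shards_per_node, failed, strategy):
    """Return each surviving node's shard count after node `failed` dies.
    'successor': all of the failed node's shards move to the next node.
    'spread':    they are handed out round-robin to all survivors."""
    survivors = [n for n in range(nodes) if n != failed]
    load = {n: shards_per_node for n in survivors}
    if strategy == "successor":
        load[(failed + 1) % nodes] += shards_per_node
    else:
        for i in range(shards_per_node):
            load[survivors[i % len(survivors)]] += 1
    return load


for strategy in ("successor", "spread"):
    load = load_after_failure(nodes=5, shards_per_node=20, failed=2, strategy=strategy)
    print(strategy, max(load.values()))
# successor 40  (one node's load doubles)
# spread 25     (each survivor takes 25% more)
```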
<p><strong>Compression support</strong></p>
<p>Storing data in a compressed format saves disk space. Because I/O is often the limiting factor in today&#8217;s computer systems, it is generally a good idea to trade CPU cycles for a reduction in storage space. HBase supports compression, but unfortunately many others, including Cassandra and MongoDB, do not (yet) support it.</p>
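<p>The CPU-for-I/O trade can be measured on repetitive records using Python&#8217;s standard <code>zlib</code> module. The records below are hypothetical, and real ratios depend heavily on the data and on the codec a given store uses, so the sketch prints its own numbers rather than claiming any.</p>

```python
import json
import time
import zlib

# Hypothetical rows: repetitive, structured records like many stores hold.
rows = [
    {"user_id": i, "country": "US", "status": "active", "plan": "free"}
    for i in range(10_000)
]
raw = "\n".join(json.dumps(r) for r in rows).encode()

start = time.perf_counter()
packed = zlib.compress(raw, 6)  # spend CPU once at write time...
elapsed = time.perf_counter() - start

print(f"raw {len(raw):,} bytes, compressed {len(packed):,} bytes "
      f"({len(raw) / len(packed):.1f}x) in {elapsed * 1000:.1f} ms")

restored = zlib.decompress(packed)  # ...and every later read moves fewer bytes
```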
<p><strong>Range scan</strong></p>
<p>Many applications need to read out a chunk of sequential data based on a predefined order (typically the index order). It is convenient to specify a range and get all keys within it, because you do not even need to know which keys exist in order to look them up. In addition, a range scan can perform much better than looking up each key individually (even assuming you know all the keys in the range).</p>
<p>BigTable stores data in lexicographical order; hence, it can easily support range scans. As a BigTable clone, HBase supports range scans. Even though it models only the BigTable data model, Cassandra also supports range scans with its OrderPreservingPartitioner. On the other hand, key-value stores such as Riak do not support range scans.</p>
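<p>Keeping keys in lexicographic order is what makes a range scan cheap: one binary search to find the start, then a sequential walk. This hypothetical <code>SortedKV</code> class uses Python&#8217;s standard <code>bisect</code> module to illustrate the idea on a single node; a hash-partitioned store has no such order to exploit and would have to examine every key.</p>

```python
import bisect


class SortedKV:
    """Toy store that keeps keys in lexicographic order (hypothetical API)."""

    def __init__(self):
        self.keys = []
        self.values = {}

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)  # maintain sorted order on write
        self.values[key] = value

    def scan(self, start, end):
        """Yield (key, value) for start <= key < end, without knowing the keys in advance."""
        i = bisect.bisect_left(self.keys, start)
        while i < len(self.keys) and self.keys[i] < end:
            yield self.keys[i], self.values[self.keys[i]]
            i += 1


store = SortedKV()
for day in ("2011-01-21", "2011-01-24", "2011-02-03", "2010-12-31"):
    store.put(f"post:{day}", day)
print(list(store.scan("post:2011-01", "post:2011-02")))
# [('post:2011-01-21', '2011-01-21'), ('post:2011-01-24', '2011-01-24')]
```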
<p><strong>Failure scenarios</strong></p>
<p>Which failure scenarios are you willing to tolerate? Many data stores are implemented with a master/slave architecture, and if the master goes down, the failure could be quite dramatic. For example, Hypertable currently has only a single master (although there are plans to change this in the future), which is a single point of failure. Not only is there a single master, but there is also only a single Chubby-like lock service node, so the master&#8217;s failure could be catastrophic. Other master/slave implementations have better plans to protect the master, and there are often ways to recover the master gracefully. However, the cluster could be unavailable for an extended period while the master node is recovered. Fully distributed implementations, such as Riak and Cassandra, tolerate failures much more gracefully. Because they are symmetric, a node failure typically means degraded service rather than a total failure.</p>
<p>Another aspect of failure handling to look into is recovery time. Besides the master node, when a data node goes down, it could take some time to recover. For example, BigTable has a single tablet server per range. If a tablet server goes down, its tablets have to be reconstructed from the DFS, which could take some time.</p>
<p>I have highlighted some dimensions you need to think about when comparing the various NoSQL data stores. The list is by no means exhaustive, but hopefully it is a good one to get you started.</p>
]]></content:encoded>
			<wfw:commentRss>http://huanliu.wordpress.com/2011/01/21/dimensions-to-use-to-compare-nosql-data-stores/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/aabcaed537a9f2df3a2010d2158d4546?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">huanliu</media:title>
		</media:content>
	</item>
	</channel>
</rss>
