Amazon EC2 grows 62% in 2 years

I estimated Amazon's data center size about two years ago using a unique probing technique that I came up with. Since then, I have been tracking their growth (the US East data center monthly, all data centers less frequently). Now it is time to give you all an update.

Physical server

I will not cover the technique again here, since you can refer to the original post. But I want to stress that it measures the number of physical server racks in their data centers, and hence deduces the number of physical servers. There are other approaches, such as Netcraft's, which counts web-facing virtual servers. However, Netcraft only measures virtual servers (and only the subset that is web facing), and a virtual server could be a tiny Micro instance, a very small slice of a physical server. If you want to know how big EC2 is physically, this is the definitive research.

The following figure shows the growth of the US East data center.


Number of server racks in EC2 US East data center

Growth in the US East data center slowed down in late 2012 and 2013, but it has picked up quite a bit recently. US East added only 1,362 racks between Mar. 12, 2012 and Dec. 29, 2013, whereas it had been adding 1,000 racks per year on average between 2007 and 2013. Then, all of a sudden, it added 431 racks in the last month and a half. The other EC2 data centers, however, have enjoyed tremendous growth over the two-year period. The following table shows how many racks I can observe in each data center today, at the end of last year, and two years ago.

| data center | # of server racks on 3/12/2012 | # of server racks on 12/29/2013 | % growth 3/12/2012 to 12/29/2013 | # of server racks on 2/18/2014 | % growth 3/12/2012 to 2/18/2014 |
|---|---|---|---|---|---|
| US East (Virginia) | 5,030 | 6,382 | 26.9% | 6,813 | 35.4% |
| US West (Oregon) | 41 | 619 | 1,410% | 904 | 2,105% |
| US West (N. California) | 630 | 847 | 34.4% | 950 | 50.8% |
| EU West (Ireland) | 814 | 1,340 | 64.6% | 1,556 | 91.2% |
| AP Northeast (Japan) | 314 | 589 | 87.6% | 719 | 129% |
| AP Southeast (Singapore) | 246 | 371 | 50.8% | 432 | 75.6% |
| SA East (Sao Paulo) | 25 | 83 | 232% | 122 | 388% |
| Total | 7,100 | 10,231 | 44.1% | 11,496 | 61.9% |

There are a few observations:

1. The overall growth rate shows no sign of slowing down. From Jan. 2007 to Mar. 2012, EC2 grew from almost zero servers to 7,100 racks of servers, roughly 1,420 racks per year. From Mar. 2012 to Feb. 2014, it grew from 7,100 racks to 11,496 racks, roughly 2,198 racks per year.

2. Most of the growth is not from the US East data center. The Oregon data center grew the most, at 2,105%, followed by Sao Paulo at 388%.

3. There is a huge spike within the last 1.5 months. The number of racks increased from 10,231 to 11,496, adding 1,265 racks of servers.

The overall growth in the last two years is 62%, which is quite impressive. However, others have estimated that AWS revenue has been growing at a faster rate of more than 50% per year. The discrepancy could be due to the fact that AWS revenue includes many other AWS services, including new ones introduced in recent years, so EC2 is just one component of it.

Virtual server growth

Another way to look at EC2's growth is to look at how many virtual servers are running. Since customers pay for each running virtual server, the virtual server trend is also a good predictor of EC2 revenue.

As part of our probing technique, we enumerate all virtual servers, regardless of whether they host a web server or not. If a virtual server is running, the EC2 DNS server will have an entry translating its external IP address to its internal IP address. By counting the number of DNS entries, we arrive at an upper bound on the number of virtual servers running (it is an upper bound because when a virtual server is terminated, its DNS entry is not deleted right away).
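As an illustration, here is a minimal sketch of that counting idea. It assumes it runs from an instance inside the US East region, that public_ips holds the public IP addresses AWS publishes for that region, and that a name resolving to a 10.x.x.x address indicates a running instance.

```python
# Minimal sketch: count active DNS entries (an upper bound on running VMs).
# Assumes this runs from inside EC2 US East and that public_ips is the list
# of public IP addresses AWS publishes for that region.
import socket

def public_dns_name(public_ip):
    # e.g. 50.17.204.150 -> ec2-50-17-204-150.compute-1.amazonaws.com
    return "ec2-" + public_ip.replace(".", "-") + ".compute-1.amazonaws.com"

def count_active_instances(public_ips):
    active = 0
    for ip in public_ips:
        try:
            internal = socket.gethostbyname(public_dns_name(ip))
        except socket.gaierror:
            continue  # name does not resolve at all
        # From inside EC2, the resolver returns the 10.x.x.x internal address
        # only when an instance currently holds this public IP.
        if internal.startswith("10."):
            active += 1
    return active
```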

The following figure shows the number of running virtual servers (active DNS entries) in the US East data center in orange. AWS also periodically publishes the public IP address ranges that are available, and we have been tracking that over time. The blue points show how many IP addresses are available to assign to virtual servers. AWS has been consistently adding IP address allocations ahead of the expected growth.


EC2 number of running virtual servers

The green dots show the total available IP addresses across all data centers. It is an upper bound on the maximum number of virtual servers EC2 can run. On Dec. 29, 2013, our data shows up to 2.97 million active virtual machines. You can plug in an assumption about the average price AWS charges per instance to roughly estimate EC2 revenue.
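For illustration only, here is that back-of-the-envelope calculation; the average hourly price is a pure assumption, not a measured number.

```python
# Back-of-the-envelope EC2 revenue estimate from the VM count above.
# The blended price per instance-hour is an assumption, not a measured value.
active_vms = 2.97e6                  # upper bound observed on Dec. 29, 2013
assumed_price_per_hour = 0.10        # USD, hypothetical average across types
hours_per_year = 24 * 365

annual_revenue = active_vms * assumed_price_per_hour * hours_per_year
print(f"~${annual_revenue / 1e9:.1f}B per year")   # ~$2.6B with these inputs
```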

Density

From our data, we can also derive the density: the average number of virtual servers running per server rack, and hence per physical server. On Mar. 12, 2012, there were 120 virtual servers running on each server rack. By Dec. 29, 2013, the density had increased to 245 virtual servers per rack. Either the Micro instance is gaining popularity, or AWS has been doing a better job of consolidating its load to increase the profit margin.

Parting comment

I have not been blogging much in the last two years. You may be wondering what I have been doing. Well, I have been working on a startup; today we finally came out of stealth mode, and we are officially launching at the Launch Festival. It is an iPhone app, called Jamo, that brings dance games from the Wii and Xbox to the iPhone. If this research has been helpful to you, please help me by downloading the app and giving us a 5-star rating. You can read more about the app in a previous post.


Amazon DynamoDB use cases

In-memory computing is clearly hot. It is reported that SAP HANA has been “one of SAP’s more successful new products — and perhaps the fastest growing new product the company ever launched”. Similarly, I have heard that Amazon DynamoDB is also a rapidly growing product for AWS. Part of the reason is that the price of in-memory technology has dropped significantly, both for SSD flash memory and for traditional RAM, as shown in the following graph (excerpted from Hasso Plattner and Alexander Zeier’s book, page 15).

In-memory technology offers both higher throughput and lower latency, so it could potentially serve a range of latency-hungry or bandwidth-hungry applications. To understand DynamoDB’s sweet spots, we looked into many areas where DynamoDB could be used, and we concluded that DynamoDB does not make sense for applications that primarily need higher throughput, but it does make sense for a portion of the applications that need lower latency. This post lays out our reasoning when investigating DynamoDB; I hope it helps those of you who are considering adopting the technology.

Let us start by examining a couple of broad classes of applications and see which might be a good fit for DynamoDB.

Batch applications

Batch applications are those with a large volume of data that needs to be analyzed. Typically, there is a less stringent latency requirement. Many batch applications can run overnight or for even longer before the report is needed. However, there is a strong requirement for high throughput due to the volume of data. Hadoop, a framework for batch applications, is a good example. It cannot guarantee low latency, but it can sustain a high throughput through horizontal scaling.

For data-intensive applications, such as those targeted by the Hadoop platform, it is easy to scale the bandwidth. Because there is an embarrassing amount of parallelism, you can simply add more servers to the cluster to scale out the throughput. Given that it is feasible to get high bandwidth both through in-memory technology and through disk-based technology using horizontal scaling, it comes down to a price comparison.

The RAMCloud project has made the argument that in-memory technology is actually cheaper in certain cases. As the RAMCloud paper notes, even though hard drive prices have also fallen over the years, the IO bandwidth of a hard disk has not improved much. If you want to access each data item more frequently, you simply cannot fill up the disk; otherwise, you will choke the disk IO interface. For example, the RAMCloud paper calculates that you can access any piece of data only about 6 times a year on average if you fill up a modern disk (assuming random access to 1 KB blocks). Since you can only use a small portion of a hard disk if you need high IO throughput, your effective cost per bit goes up. At some point, it is more expensive than an in-memory solution. The following figure from the RAMCloud paper shows in which region a particular technology becomes the cheapest solution. As the graph shows, when the data set is relatively small and the IO requirement is high, in-memory technology is the winner.
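To make that access-rate ceiling concrete, here is a rough re-derivation; the disk capacity and random-IOPS figures are assumptions chosen to land in the same ballpark as the paper, not numbers taken from it.

```python
# Rough re-derivation of the access-rate ceiling for a completely full disk.
# Capacity and IOPS are illustrative assumptions, not the paper's exact inputs.
disk_capacity_bytes = 1e12        # assume a 1 TB disk
block_size_bytes = 1024           # random access to 1 KB blocks
random_iops = 200                 # assumed sustained random reads per second

blocks_on_disk = disk_capacity_bytes / block_size_bytes      # ~1e9 blocks
reads_per_year = random_iops * 365 * 24 * 3600               # ~6.3e9 reads
print(reads_per_year / blocks_on_disk)   # ~6 accesses per block per year
```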

The key to RAMCloud’s argument is that you cannot fill up a disk, so the effective cost is higher. However, this argument does not apply in the cloud. You pay AWS for the actual storage space you use, and you do not care that a large portion of the disk is empty. In effect, you count on getting a higher access rate to your data at the expense of other people’s data getting a lower access rate (this is certainly true for some of my data in S3, which I have not accessed even once since I started using AWS in 2006). In our own tests, we get a very high throughput rate from both S3 and SimpleDB (by spreading the data over many domains). Although there is no guarantee on access rate, S3 costs roughly 1/8, and SimpleDB roughly 1/4, of what DynamoDB costs, making both attractive alternatives for batch applications.

In summary, if you are deploying in house, where you pay for the infrastructure, it may make sense economically to use in-memory technology for your batch applications. However, in a hosted cloud environment where you only pay for the actual storage you use, in-memory technology such as DynamoDB is a less likely candidate for batch applications.

Web applications

We have argued that bandwidth-hungry applications are not a good fit for DynamoDB because there is a cheaper disk-based alternative that leverages shared bandwidth in the cloud. But let us look at another type of application, web applications, which may value the lower latency offered by DynamoDB.

Interactive web applications

First, let us consider an interactive web application, where users may create data on your website and then query that data in many different forms. Our work around Gamification typically involves this kind of application. For example, in Steptacular (our previous Gamification work on health care/wellness), users need to upload their walking history, then query their history in many different formats and look at their friends’ actions.

For our current Gamification project, we seriously considered using DynamoDB, but in the end, we concluded that it is not a good fit for two reasons.

1. Immaturity of ORM tools

Many web applications are developed using an ORM (Object Relational Mapping) tool. This is because an ORM tool shields you from the complexity of the underlying data store, allowing developers to be more productive. Ruby’s ActiveRecord is the best I have seen: you just define your data model in one place. Unlike earlier ORM tools, such as Hibernate for Java, you do not even have to explicitly define a mapping in an XML file; all the mapping is done automatically.

Even though the Amazon SDK comes with an ORM layer, its feature set is far behind other mature ORM tools. People are developing more complete ORM tools, but the missing features in DynamoDB (e.g., no auto-increment ID field support) and the wide ground to cover for each programming language mean that it could be a while before this field matures.

2. Lack of secondary index

The lack of secondary index support makes it a no-go for the majority of interactive web applications. These applications need to present data along many different dimensions, and each dimension needs an index for efficient queries.

AWS recommends that you duplicate data in different tables so that you can use the primary index to query efficiently. Unfortunately, this is not really practical. It requires multiple writes on data input, which is not only a performance killer but also creates a coherence management nightmare. The coherence management problem is difficult to get around. Consider a failure scenario where you successfully write the first copy but then fail while updating the second table with a different index structure. What do you do in that case? You cannot simply roll back the last update because, like many other NoSQL data stores, DynamoDB does not support transactions. So you end up with an inconsistent state.
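Here is a minimal sketch of that duplicate-table pattern using today's boto3 SDK; the table layout and attribute names are hypothetical, and the comment marks the failure window described above.

```python
# Hypothetical duplicate-table write with boto3 (table and attribute names are
# made up). A crash between the two writes leaves the copies permanently out
# of sync, because there is no transaction to roll back.
import boto3

dynamodb = boto3.resource("dynamodb")
steps_by_user = dynamodb.Table("steps_by_user")   # keyed by user_id
steps_by_date = dynamodb.Table("steps_by_date")   # same data, keyed by date

def record_steps(user_id, date, steps):
    steps_by_user.put_item(Item={"user_id": user_id, "date": date, "steps": steps})
    # <-- if the process dies here, the second copy is never written
    steps_by_date.put_item(Item={"date": date, "user_id": user_id, "steps": steps})
```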

Hybrid web/batch applications

Next, let us consider a different type of web application, which I refer to as the google-search-type web application. This type of application has little or no data input from the web front end, or if it takes data from the web front end, the data is not going to be queried over more than one dimension. In other words, this type of application is mostly read-only. The data it queries may come from a different source, such as web crawling, and there is a batch process which loads the data, possibly into many tables with different indexes. The consistency problem is not an issue here because the batch process can simply retry without worrying about data getting out of sync, since there are no other concurrent writes. The beauty of this type of application is that it can easily get around DynamoDB’s feature limitations and yet benefit from the much reduced latency to improve interactivity.

Many applications fall into this category, including BI (Business Intelligence) applications and many visualization applications. Part of the reason SAP HANA is taking off is the demand from BI applications for faster, interactive queries. I think the same need is probably driving the demand for DynamoDB.

What type of applications are you deploying in DynamoDB? If you are deploying an interactive web application or a batch application, I would really like to hear from you to understand the rationale.

Amazon data center size

(Edit 3/16/2012: I am surprised that this post is picked up by a lot of media outlets. Given the strong interest, I want to emphasize what is measured and what is derived. The # of server racks in EC2 is what I am directly observing. By assuming 64 physical servers in a rack, I can derive the rough server count. But remember this is an *assumption*. Check the comments below that some think that AWS uses 1U server, others think that AWS is less dense. Obviously, using a different assumption, the estimated server number would be different. For example, if a credible source tells you that AWS uses 36 1U servers in each rack, the number of servers would be 255,600. An additional note: please visit my disclaimer page. This is a personal blog, only represents my personal opinion, not my employer’s.)

Similar to the EC2 CPU utilization rate, another piece of secret information Amazon will never share with you is the size of its data centers. But it is really informative if we can get a glimpse, because Amazon is clearly a leader in this space, and their growth rate would be a great indicator of how well the cloud industry is doing.

Although Amazon would never tell you, I have figured out a way to probe for its size. There have been early guesstimates of how big Amazon's cloud is, and there are even tricks to figure out how many virtual machines are started in EC2, but this is the first time anyone can estimate the real size of Amazon EC2.

The methodology is fully documented below for those inquisitive minds. If you are one of them, read it through and feel free to point out if there are any flaws in the methodology. But for those of you who just want to know the numbers: Amazon has a pretty impressive infrastructure. The following table shows the number of server racks and physical servers each of Amazon’s data centers has, as of Mar. 12, 2012. The column on server racks is what I directly probed (see the methodology below), and the column on number of servers is derived by assuming there are 64 blade servers in each rack.

| data center | # of server racks | # of blade servers |
|---|---|---|
| US East (Virginia) | 5,030 | 321,920 |
| US West (Oregon) | 41 | 2,624 |
| US West (N. California) | 630 | 40,320 |
| EU West (Ireland) | 814 | 52,096 |
| AP Northeast (Japan) | 314 | 20,096 |
| AP Southeast (Singapore) | 246 | 15,744 |
| SA East (Sao Paulo) | 25 | 1,600 |
| Total | 7,100 | 454,400 |

The first key observation is that Amazon now has close to half a million servers, which is quite impressive. The other observation is that the US east data center, being the first data center, is much bigger. What it means is that it is hard to compete with Amazon on scale in the US, but in other regions, the entry barrier is lower. For example, Sao Paulo has only 25 racks of servers.

I also show the growth rate of Amazon’s infrastructure for the past 6 months below. I only collected data for the US east data center because it is the largest, and the most popular data center. The Y axis shows the number of server racks in the US east data center.

EC2 US east data center growth in the number of server racks

Besides their size, the growth rate is also pretty impressive. The US east data center has been adding roughly 110 racks of servers each month. The growth rate looks roughly linear, although recently it is showing signs of slowing down.

Probing methodology

Figuring out EC2’s size is not trivial. Part of the reason is that EC2 provides you with virtual machines, and it is difficult to know how many virtual machines are active on a physical host. Thus, even if we can determine how many virtual machines there are, we still cannot figure out the number of physical servers. Instead of focusing on how many servers there are, our methodology probes for the number of server racks out there.

It may sound harder to probe for the number of server racks. Luckily, EC2 uses a regular pattern of IP address assignment, which can be exploited to correlate with server racks. I noticed the pattern by looking at a lot of instances I launched over time and running traceroutes between my instances.  The pattern is as follows:

  • Each EC2 instance is assigned an internal IP address in the form of 10.x.x.x.
  • Each server rack is assigned a 10.x.x.x/22 IP address range, i.e., all virtual machines running on that server rack will have the same 22-bit IP prefix.
  • A 10.x.x.x/22 IP address range has 1,024 IP addresses, but the first 256 are reserved for DOM0 virtual machines (the system management virtual machines in Xen), and only the last 768 are used for customers’ instances.
  • Within the first 256 addresses, two, at 10.x.x.2 and 10.x.x.3, are reserved for routers on the rack. These two routers are arranged in a load-balanced and fault-tolerant configuration to route traffic in and out of the rack. I verified that the uplink capacity from 10.x.x.2 and 10.x.x.3 is roughly 2 Gbps total, further suggesting that they are routers, each with a 1 Gbps uplink.

Understanding the pattern allows us to deduce how many racks are there. In particular, if we know a virtual machine at a certain internal IP address (e.g., 10.2.13.243), then we know there is a rack using the /22 address range (e.g., a rack at 10.2.12.x/22). If we take this to the extreme where we know the IP address of at least one virtual machine on each rack, then we can see all racks in EC2.

So how can we know the IP addresses of a large number of virtual machines? You can certainly launch a large number of virtual machines and record the internal IP addresses that you get, but that is going to be costly. Unless you are RightScale, where a large number of instances are launched through your service, you may not be able to take this approach. Another approach is to scan the whole IP address space and watch when an instance responds to a ping. There are several problems with this approach. First, it may be considered port scanning, which is a violation of AWS’s policy. Second, not all live instances respond to ping, especially with AWS’s security groups blocking all ports by default. Lastly, the whole IP address space in 10.x.x.x is huge, and would take a considerable amount of time to scan.

While you may be discouraged at this point, it turns out there is another way. In addition to the internal IP address we talked about, each AWS instance also has an external IP address. Although we cannot scan the external IP addresses either (so as not to violate the port scanning policy), we can leverage DNS translation to figure out the internal IP addresses. If you query DNS for an EC2 instance’s public DNS name from inside EC2, the DNS server will return its internal IP address (if you query it from outside of EC2, you will get the external IP instead). So all that is left to do is to get a large number of EC2 instances’ public DNS names. Luckily, we can easily derive the list of public DNS names, because EC2 instances’ public DNS names are directly tied to their external IP addresses. An instance at external IP address x.y.z.w (e.g., 50.17.204.150) will have a public DNS name that embeds that address (e.g., ec2-50-17-204-150.compute-1.amazonaws.com in the US East data center). To enumerate all public DNS names, we just have to find out all the public IP addresses. Again, this is easy to do because EC2 publishes all the public IP addresses it uses here.

Once we have determined the number of server racks, we just multiply it by the number of physical servers per rack. Unfortunately, we do not know how many physical servers are in each rack, so we have to make an assumption. I assume Amazon has dense racks, where each rack has four 10U chassis and each chassis holds 16 blades, for a total of 64 blades per rack.

Let us recap how we can find all server racks.

  • Enumerate all public IP addresses EC2 uses
  • Translate a public IP address to its public DNS name (e.g., ec2-50-17-204-150.compute-1.amazonaws.com)
  • Run a DNS query inside EC2 to get its internal IP address (e.g., 10.2.13.243).
  • Derive the rack’s IP range from the internal IP address (e.g., 10.2.12.x/22).
  • Count how many unique racks we have seen, then multiply that by the number of physical servers in a rack (I assume 64 servers/rack).
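The following is a condensed sketch of these steps. It assumes it runs from inside the US East region and that us_east_public_ips holds the public IP addresses AWS publishes for that region; the 64-servers-per-rack figure is the same assumption discussed above.

```python
# Condensed sketch of the rack-counting recap above. Assumes it runs from
# inside EC2 US East and that us_east_public_ips is the published IP list.
import socket
from ipaddress import ip_network

SERVERS_PER_RACK = 64   # assumption, see the caveats below

def internal_ip(public_ip):
    name = "ec2-" + public_ip.replace(".", "-") + ".compute-1.amazonaws.com"
    try:
        return socket.gethostbyname(name)   # 10.x.x.x when resolved inside EC2
    except socket.gaierror:
        return None

def estimate_racks(us_east_public_ips):
    racks = set()
    for public_ip in us_east_public_ips:
        internal = internal_ip(public_ip)
        if internal and internal.startswith("10."):
            # e.g. 10.2.13.243 -> 10.2.12.0/22, the rack's address range
            racks.add(ip_network(internal + "/22", strict=False))
    return len(racks), len(racks) * SERVERS_PER_RACK
```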

Caveats

Even though my methodology can provide insights that were never possible before, it has its shortcomings, which could lead to inaccurate results. The limitations are:

  • The methodology requires an active instance on a rack for the rack to be observed. If the rack has no instances running on it, we cannot count it.
  • We cannot know how many physical servers are in a rack. I assume Amazon has dense racks, each rack has 4 10U chassis, and each chassis holds 16 blades.
  • My methodology cannot tell whether the racks I observe are used for EC2 only. It is possible that other AWS services (such as S3, SQS, SimpleDB) run on virtual servers on the same set of racks. It is also possible that they run on dedicated racks, in which case AWS is bigger than what I can observe. So, what I am observing is only a lower bound on the size of AWS.

Launch a new site in 3.5 weeks with Amazon

Getting started quickly is one of the reasons people adopt the cloud, and that is why Amazon Web Services (AWS) is so popular. But people often overlook the fact that the retail part of Amazon is also amazing. If your project involves a supply chain, you can also leverage Amazon retail to get up and running quickly.

We recently launched a wellness pilot project at Accenture where we leveraged both Amazon retail and Amazon Web Services. The Steptacular pilot is designed to encourage Accenture US employees to lead a healthy lifestyle. We all had our new year resolutions, but we always procrastinate, and we never exercise as much as we should. Why? Because there is a lack of motivation and engagement. The Steptacular pilot uses a pedometer to track a participant’s physical activity, then leverages concepts from Gamification, using social incentives (peer pressure) and monetary incentives to constantly engage participants. I will talk about the pilot and its results in detail in a future post, but here let me share how we were able to launch within 3.5 weeks, the key capabilities we leveraged from Amazon, and some lessons we learned from this experience.

Supply chain side

The Steptacular pilot requires participants to carry a pedometer to track their physical activity. This is the first step of increasing engagement — using technology to alleviate the hassle of manual (and inaccurate) entry. We quickly locked in on the Omron HJ-720 model because it is low cost and it has a USB connector, so we can automate the step upload process.

We got in touch with Omron. The guys at Omron are super nice. Once they learned what we were trying to do, they immediately approved us as a reseller. That meant we could buy pedometers at the wholesale price. Unfortunately, we still had to figure out how to get the devices into our participants’ hands. Accenture is a distributed organization with 42 offices in the US alone. To make matters worse, many consultants work from client sites, so it is not feasible to distribute the devices in person. We seriously considered three options:

  1. Ask our participants to order directly from Amazon. This is the solution we chose in the end, after connecting with the Amazon buyer in charge of the Omron pedometer and being assured that they would have no problem handling the volume. It turns out that this not only saved us a significant amount of shipping hassle, but it was also very cost effective for our participants.
  2. Be a vendor ourselves and use Amazon for the supply chain. Although I did not know about it before, I was pleasantly surprised to learn about the Fulfillment by Amazon capability. This is Amazon’s cloud for supply chain. Like a cloud, it is provided as a service — you store your merchandise in Amazon’s warehouse, and they handle the inventory and shipping. Also, like a cloud, it is pay per use with no long-term commitment. Although equally good at reducing hassle for us, we did not find that we could save cost. Amazon retail is so efficient and has such a small margin that we realized we could not compete, even though we were happy with a 0% margin and even though we (supposedly) paid the same wholesale price.
  3. Ship and manage the devices ourselves. The only way we could be cheaper is if we managed the supply chain and shipping logistics ourselves, and of course, this assumes that we work for free. However, the amount of work is huge, and none of us wanted to lick envelopes for a few weeks, definitely not for free.

The pilot officially launched on Mar. 29th. Besides Amazon itself, another Amazon affiliate, J&R Music, also sells the same pedometer on Amazon’s website. Within a few minutes, our participants managed to completely drain J&R’s stock. However, Amazon remained in stock for the whole duration. Within a week, they sold roughly 3,000 pedometers. I am sure J&R is still mystified by the sudden surge in demand. If you are from J&R, my apologies for not giving adequate warning ahead of time, and kudos to you for not overcommitting your stock like many TouchPad vendors did recently (I am one of those burned by OnSale).

In addition to managing device distribution, we also had to worry about how to subsidize our participants. Our sponsors agreed to subsidize each pedometer by $10 to ease adoption, but we could not just write each participant a $10 check — that is too much work. Again, Amazon came to the rescue. There were two options. One was that Amazon could generate a batch of one-time-use $10 discount codes tied specifically to the pedometer product, and then, based on how many were redeemed, bill us for the total cost. The other option was that we could buy a batch of $10 gift cards in bulk and distribute them to our participants electronically. We ultimately chose the gift card option for its flexibility, and also because a gift card is not considered a discount, so the device would still cost more than $25 and our participants would qualify for super saver shipping. Looking back, I do regret choosing the gift card option, because managing squatters turned out to be a big hassle, but that is not Amazon’s fault; it is just human nature.

Technology platform side

It is a no-brainer to use Amazon to leverage its scaling capabilities, especially for a short-term, quick project like ours. One key thing we learned from this experience is that you should only use what you need. Amazon Web Services offers a wide range of services, all designed for scale, so it is likely that you will find a service that serves your needs.

Take, for example, the email service Amazon provides. Initially, we used Gmail for sending out signup confirmations and email notifications. During the initial scaling trial, we soon hit Gmail’s limit on how fast we could send emails. Once we realized the problem, we quickly switched to Amazon SES (Simple Email Service). There is an initial cap on how many emails we could send, but it only took a couple of emails to get the limit lifted. With a couple of hours of coding and testing, we could all of a sudden send thousands of emails at once.
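For reference, this is roughly what the switch boiled down to; the sketch below uses today's boto3 SDK (at the time we used the then-current AWS tooling), and the addresses and message content are placeholders.

```python
# Minimal sketch of sending a notification through Amazon SES with boto3.
# Addresses and content are placeholders; the Source address must be verified
# in SES, and new accounts start with a sending quota that has to be raised.
import boto3

ses = boto3.client("ses", region_name="us-east-1")
ses.send_email(
    Source="noreply@example.com",
    Destination={"ToAddresses": ["participant@example.com"]},
    Message={
        "Subject": {"Data": "Steptacular signup confirmation"},
        "Body": {"Text": {"Data": "Welcome to the Steptacular pilot!"}},
    },
)
```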

In addition to SES, we also leveraged AWS’s CloudWatch service, which lets us closely monitor the system and be alerted of failures. Best of all, it comes essentially for free, without any development effort on our side.

Even though Amazon Web Services offers a large array of services, you should only choose what you absolutely need. In other words, do not over-engineer. Let us take auto scaling as an example. If you host a website in Amazon, it is natural to think about putting in an auto-scaling solution, just in case, to handle the unexpected. Amazon has its own auto scaling solution, and we at Accenture Labs have even developed an auto-scaling solution called WebScalar in the past. If you are Netflix, it makes absolute sense to auto scale because your traffic is huge and it fluctuates widely. But if you are smaller, you may not need to scale beyond a single instance, and if you do not need it, it is extra complexity that you do not want to deal with, especially when you want to launch quickly. We estimated that we would have around 4,000 participants, and a quick profiling exercise suggested that a standard extra-large instance in Amazon would be adequate to handle the load. Sure enough, even though the website experienced a slowdown for a short period during launch, it remained adequate to handle the traffic for the whole duration of the pilot.

We also learned a lesson on fault tolerance: really think through your backup solution. Steptacular survived two large-scale failures in the US East data center. We enjoyed peace of mind partly because we were lucky, and partly because we had a plan. Steptacular uses an instance-store instance (instead of an EBS-backed instance). We made the choice mainly for performance reasons: we wanted to free up network bandwidth and leverage the local hard disk bandwidth. This turned out to save us from the first failure in April, which was caused by EBS block failures. Even though we cannot count on EBS for persistence, we built our own solution. Most static content on the instance is bundled into an Amazon Machine Image (AMI). Two pieces of less static content (content that changes often) are stored on the instance: the website logic and the steps database. The website logic is stored in a Subversion repository, and the database is synced to another database running outside the US East data center. This architecture allows us to get back up and running quickly: first launch our AMI, then check out the website code from the repository, and lastly dump and reload the database from the mirror. Even though we did not have to initiate this backup procedure, it is good to have the peace of mind of knowing your data is safe.
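As a rough illustration, the recovery procedure amounts to something like the sketch below; the AMI ID, repository URL, and database/host names are placeholders, and in practice the checkout and reload steps would run on the newly launched instance itself.

```python
# Hypothetical sketch of the recovery procedure described above. The AMI ID,
# Subversion URL, and database/host names are placeholders.
import subprocess
import boto3

ec2 = boto3.client("ec2", region_name="us-west-1")   # launch outside US East if needed

# 1. Launch a fresh copy of the baked AMI that holds the static content.
ec2.run_instances(ImageId="ami-00000000", InstanceType="m1.xlarge",
                  MinCount=1, MaxCount=1)

# 2. Check out the latest website logic from the Subversion repository.
subprocess.run(["svn", "checkout", "https://svn.example.com/steptacular/trunk",
                "/var/www/steptacular"], check=True)

# 3. Reload the steps database from the mirror kept outside US East.
subprocess.run("mysqldump -h mirror.example.com steps | mysql steps",
               shell=True, check=True)
```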

Thanks to Amazon, both Amazon retail and Amazon Web Services, we were able to pull off the pilot in 3.5 weeks. More importantly, the pilot itself has collected some interesting results on how we can motivate people to exercise more. But I will leave that to a future post, after we have had a chance to dig deep into the data.

Acknowledgments

Launching Steptacular in 3.5 weeks would not have been possible without the help of many people. We would like to especially thank the following folks:

  • Jim Li from Omron for providing hardware, software, and logistics support
  • Jeff Barr from Amazon for connecting us with the right folks at Amazon retail
  • James Hamilton from Amazon for increasing our email limit on the spot
  • Charles Allen from Amazon for getting us the gift codes quickly
  • Tiffany Morley and Helen Shen from Amazon for managing the inventory so that the pedometer miraculously stayed in stock despite the huge demand

Last but not least, big kudos to the Steptacular team, which includes several Stanford students, who worked really hard, even through finals week, to get the pilot up and running. They are one of the best teams I have ever had the pleasure of working with.

Amazon’s physical hardware and EC2 compute unit

Ever wonder what hardware is running behind Amazon’s EC2? Why would you even care? Well, there are at least a couple of reasons.

  1. Side-by-side comparisons. Amazon expresses its machine power in terms of EC2 compute units (ECU), while other cloud providers just express it as a number of cores. Either way, it is vague, and you cannot make an economic comparison between different cloud offerings, or with an own-your-own-hardware approach. Knowing how much an EC2 compute unit is in terms of raw hardware power allows you to perform an apples-to-apples comparison.
  2. Physical isolation. In many enterprise clients’ minds, security is the number one concern. Even though hypervisor isolation is robust, they feel more comfortable if there is physical separation, i.e., they do not want their VM to sit on the same physical hardware right next to a hacker’s VM. Knowing the hardware’s computing power and the largest VM’s computing power, one can determine whether there is enough room left to host a hacker’s VM.

The observations below are based only on what we see in the N. Virginia data center, and the underlying hardware may very well be different in other data centers (i.e., Ireland, N. California and Singapore). If you are curious, feel free to use the methodology we describe to see what is going on in other data centers.

Our observation is based on a combination of “hints” from several different tools and methods, including the following:

CPUID

The “cpuid” instruction is supported by all x86 CPU manufacturers, and it is designed to report the capabilities of the CPU. This instruction is non-trapping, meaning that you can execute it in user mode without triggering a protection trap. Under the Xen paravirtualized hypervisor (what Amazon uses), this means that the hypervisor cannot intercept the instruction and change the result it returns. Therefore, the output from “cpuid” is the real output from the physical CPU.

We look at several fields in the “cpuid” output. First and foremost, we look at the branding string, which identifies the CPU’s model number. We also look at the “local APIC physical ID” in (1/ebx). The APIC ID is unique to a physical core, so by enumerating all APIC IDs, we know how many physical cores there are. Lastly, we look at “Logical CPU cores” in (0x80000008/ecx), which is supposed to show how many hyper-threaded cores are on a physical core.
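For a quick look at similar fields without issuing cpuid yourself, /proc/cpuinfo on a Linux guest exposes the branding string ("model name") and a per-core "apicid" field. The sketch below only reports what the guest is allowed to see, so it is a convenience check, not a substitute for the raw, non-trapping cpuid probing described above.

```python
# Collect the CPU branding string and the distinct APIC IDs visible to this
# Linux guest from /proc/cpuinfo. Note this reflects only the guest's view,
# unlike the non-trapping cpuid instruction discussed above.
def cpu_summary(path="/proc/cpuinfo"):
    models, apic_ids = set(), set()
    with open(path) as f:
        for line in f:
            if ":" not in line:
                continue
            key, value = (part.strip() for part in line.split(":", 1))
            if key == "model name":
                models.add(value)
            elif key == "apicid":
                apic_ids.add(value)
    return models, sorted(apic_ids, key=int)

models, apic_ids = cpu_summary()
print(models)     # branding string(s), e.g. the Xeon model number
print(apic_ids)   # distinct APIC IDs seen by this guest
```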

Intel processor specifications

With the model numbers reported by “cpuid”, we can look up the data sheets to determine the exact specification of a processor, including how many cores per socket, how many sockets per system, its cache size, etc.

/sys/devices/system/cpu/cpu?/cache/index?/shared_cpu_map

This is a file in the Linux file system. It lists the cache hierarchy, including which cores share a particular cache. We use it as validation against the cache hierarchy specified in the CPU data sheet, but not to reach any conclusion, as we have seen it report wrong information in some cases.

Performance benchmark

We use PassMark-CPU Mark — a performance benchmark — to compare the CPU performance with other systems with the same CPU configuration. A matching performance number would confirm our observation.

System statistics

A variety of tools, such as “mpstat” and “top”, can report the system’s performance statistics, including CPU and memory usage. In particular, on a Xen hypervisor, a VM can see its steal cycles — time that is stolen from the VM to run other things, including other VMs. The documentation states that steal cycles count the amount of time that your VM is ready to run but cannot run because others are competing for the CPU. Thus, if you keep your VM busy, you will see all the CPU cycles stolen from you. For example, on an m1.small VM, you will see the steal cycles at roughly 60%, and you can keep your CPU busy at most up to 40%. This is a hard cap Amazon puts in place to limit you to one EC2 compute unit.
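If you want to reproduce the steal measurement without mpstat, the kernel exposes the same counter in /proc/stat; a small sketch:

```python
# Measure the steal percentage over an interval from /proc/stat. On the
# aggregate "cpu" line, the eighth field is steal time; mpstat and top
# report the same counter.
import time

def cpu_times():
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]    # aggregate "cpu" line
    return [int(x) for x in fields]

def steal_percent(interval=5):
    before = cpu_times()
    time.sleep(interval)
    after = cpu_times()
    delta = [b - a for a, b in zip(before, after)]
    return 100.0 * delta[7] / sum(delta)     # index 7 == steal

print(f"steal: {steal_percent():.1f}%")
```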

Now that the methodology is clear, we can dive into the observations. The Amazon infrastructure runs on three distinct sets of hardware.

High-memory instances

The high-memory instances (m2.xlarge, m2.2xlarge, m2.4xlarge) run on systems with dual-socket Intel Xeon X5550 (Nehalem) 2.66GHz processors. The Intel Xeon X5550 processor has 4 cores, and each core is capable of hyper-threading, i.e., there could be 8 cores from the software’s point of view. However, Amazon disables hyper-threading, because “cpuid” 0x80000008/ecx reports that there is only one logical core. Further, the APIC IDs are 1, 3, 5, 7, 17, 19, 21, 23. The missing IDs (9, 11, 13, 15) are probably reserved for the hyper-threading cores, and they are not used. The m2.4xlarge instance occupies the whole physical machine. An m2.4xlarge instance’s PassMark-CPU Mark is 10,052.6, on par with other dual-socket X5550 systems (the average is 10,853). Furthermore, we never observe steal cycles beyond 1 or 2%.

High-CPU instances

The high-CPU instances (c1.medium, c1.xlarge) run on systems with dual-socket Intel Xeon E5410 2.33GHz processors. It is dual-socket because we see APIC IDs 0 to 7, and the E5410 only has 4 cores. A c1.xlarge instance takes up almost the whole physical machine. However, we frequently observe steal cycles on a c1.xlarge instance, ranging from 0% to 25% with an average of about 10%. That amount of steal cycle is not enough to host another, smaller VM, i.e., a c1.medium. Maybe those steal cycles are used to run Amazon’s software firewall (security groups). On PassMark-CPU Mark, a c1.xlarge machine achieves 7,962.6, actually higher than what an average dual-socket E5410 system achieves (6,903).

Standard instances

The standard instances (m1.small, m1.large, m1.xlarge) run on systems with a single-socket Intel Xeon E5430 4-core 2.66GHz processor. An m1.small instance may occasionally run on a system with an AMD Dual-Core Opteron 2218 HE processor, but that system is rare (<10%), so we will not focus on it here. The Xeon E5430 platform is single-socket because we only see APIC IDs 0, 1, 2, 3.

By simple deduction, we can reason that an m1.xlarge instance does not take up the whole physical machine. Since a c1.xlarge instance is 20 ECUs (8 cores at 2.5 ECU each), we can reason that an E5410 processor is at least 10 ECU. Thus an E5430 would have roughly 11.4 ECU since its clock frequency is a little higher than that of an E5410 (2.66GHz vs. 2.33GHz). Since an m1.xlarge instance has only 8 ECU (4 cores at 2 ECU each), there is room for at least 3 more m1.small instances. This is an example where knowing the physical hardware configuration helps us reason about the CPU power allocated. In addition to reasoning by hardware configuration, we also observe large steal cycles on an m1.xlarge instance, which ranges from 0% to 75% with an average of about 30%.

An m1.xlarge instance achieves a PassMark-CPU Mark score of 3,627.8. We cannot find other single-socket E5430 systems in PassMark’s database, but the score is less than half of what a c1.xlarge instance achieves, which again confirms a large steal cycle.

In conclusion, we believe that c1.xlarge and m2.4xlarge instances occupy their own physical hardware. People who are security conscious should choose those instances to avoid co-location attacks. In addition, an Intel Xeon X5550 has about 13 ECU, an Intel Xeon E5430 has about 11 ECU, and an Intel Xeon E5410 has about 10 ECU, where an ECU is roughly equivalent to a PassMark-CPU Mark score of 400. Using this information, you can perform an economic comparison between the cloud and your favorite alternative approach.
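That rule of thumb is easy to apply; here is a small helper that maps the PassMark scores quoted above to approximate ECU counts.

```python
# Rule-of-thumb conversion suggested above: one ECU is roughly 400 PassMark
# CPU Mark points, so a benchmark score maps to an approximate ECU count.
def passmark_to_ecu(score, points_per_ecu=400):
    return score / points_per_ecu

print(passmark_to_ecu(10052.6))   # m2.4xlarge host: ~25 ECU of raw hardware (2 x X5550)
print(passmark_to_ecu(7962.6))    # c1.xlarge host: ~20 ECU, matching 8 cores x 2.5 ECU
print(passmark_to_ecu(3627.8))    # m1.xlarge instance: ~9 ECU vs. the 8 ECU advertised
```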

How to choose a load balancer for the cloud

If you are hosting a scalable application (e.g., a web application) in the cloud, you will have to choose a load balancing solution so that you can spread your workload across many cloud machines. Even though there are dedicated solutions out there already, how to choose one is still far from obvious. You will have to evaluate a potential solution from both the cost and performance perspectives. We illustrate these considerations with two examples.

First, let us take Amazon’s Elastic Load Balancing (ELB) offering and evaluate its cost implications. Assume you have an application that sends/receives 25 Mbps of traffic on average. That is roughly 11 GB per hour, so ELB’s data processing fee alone costs you $0.008/GB * 11 GB/hour ≈ $0.09/hour, already more than the cost of a small EC2 Linux instance in N. Virginia. The cost makes it unsuitable for many applications. If your application does not have a lot of traffic, ELB makes sense economically, but for that small amount of traffic (< 25 Mbps), you most likely do not need a load balancer. We have run performance studies based on SpecWeb, a suite of benchmarks designed to simulate realistic web applications. Even for the most computation-intensive benchmark in the suite (the banking benchmark), a small EC2 instance can handle 60 Mbps of traffic, and a larger c1.xlarge instance is able to process 310 Mbps. This means that even if your application is 10 times more CPU intensive per unit of traffic, you can still comfortably host it on a c1.xlarge instance. If your application has a larger amount of traffic (> 25 Mbps), it is more economical to roll your own load balancer. In our test, a small EC2 instance is able to forward 400 Mbps of traffic, even for a chatty application with a lot of small user sessions. Based on the current pricing scheme, ELB only makes sense if your application is very CPU intensive, or if the expected traffic fluctuates widely. You can refer to our benchmarking results (CloudCom paper, section 2) and calculate the tradeoff based on your own application’s profile.
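For convenience, here is the same cost arithmetic in a few lines, using the per-GB data processing price quoted above (ELB's fixed hourly charge is left out, which only strengthens the comparison):

```python
# Reproduction of the ELB cost arithmetic above, using the per-GB data
# processing price quoted in the text (ELB's fixed hourly charge is ignored,
# which only makes ELB look cheaper than it is).
elb_price_per_gb = 0.008      # USD per GB processed by ELB
traffic_mbps = 25             # average application traffic

gb_per_hour = traffic_mbps / 8 / 1024 * 3600          # Mbps -> GB per hour
print(f"{gb_per_hour:.1f} GB/hour")                   # ~11.0 GB/hour
print(f"${elb_price_per_gb * gb_per_hour:.3f}/hour")  # ~$0.088/hour for ELB
# A small EC2 Linux instance, which our tests show can forward ~400 Mbps,
# cost less than this per hour at the time.
```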

Second, we have to look at the performance a load balancing solution can deliver. You cannot simply assume a solution will meet your performance requirements until you test it out. For example, Google App Engine (GAE) promises unlimited scalability, where you can simply drop in your web application and Google handles the automatic scaling. Alternatively, you can run a load balancer in Google App Engine and load balance an unlimited amount of traffic. Even though it sounds promising on paper, our test shows that it cannot support more than 100 simultaneous SpecWeb sessions (< 5 Mbps) due to its burst quota. To put this into perspective, we are able to run tests that support 1,000 simultaneous sessions even on a small Amazon EC2 instance. We worked with the GAE team for a while trying to resolve the limitation, but we were never able to get it working. Others have noticed its performance limitation as well. Note that this happened between Feb. and Apr. of 2009, so the limit may have improved since then.

The two examples illustrate that you have to do your homework to understand both the cost and performance implications. You have to understand your application’s profile and conduct performance studies for each potential solution. Although setting up performance testing is time consuming, fortunately, we have done some leg work already for the common solutions. You can leverage our performance report (section 2 of our CloudCom paper). We have set up a fully automated performance testing harness, so if you have a scenario not covered, we will be happy to help you test it out.

The two examples also illustrate that you cannot simply rely on a cloud provider’s solution. In many cases, you still need to roll your own load balancing solution, for example, by running a software load balancer inside a cloud VM. The existing software load balancers differ in design, and hence in their performance characteristics. In the following, we discuss some tradeoffs in choosing the right software load balancer.

A load balancer can forward traffic at either layer 4 or layer 7. At layer 4 (the TCP layer), the load balancer only sees packets. It inspects the header of each packet and then decides where to forward it. The load balancer does not need to terminate TCP sessions with the users and originate TCP sessions with the backend web servers; therefore, it can be implemented efficiently. Note that not all layer 4 load balancers work in the Amazon cloud. Amazon disallows source IP spoofing, so if a load balancer just forwards the incoming packet as-is (i.e., keeping the source IP address intact), the packet will be dropped by Amazon because the source IP does not match the load balancer’s IP address. At layer 7 (the application layer), a load balancer has to terminate a TCP connection, receive the HTTP content, and then relay the content to the web servers. For each incoming TCP session from a user, the load balancer not only has to open a socket to terminate the incoming TCP session, it also has to create a new TCP session to one of the web servers to relay the content. Because of the extra state, a layer 7 load balancer is less efficient. This is especially bad if SSL is enabled, because the load balancer has to terminate the incoming SSL connection, and possibly originate a new SSL connection to the web servers, which is a very CPU-intensive operation.

Now that the general theory is behind us, let us look at some free load balancers out there and talk a little about their performance tradeoffs.

HaProxy

HaProxy can operate in either layer 4 or layer 7 mode. However, if you want session persistence (the same user always load balanced to the same backend server), you have to operate it at layer 7. This is because HaProxy uses cookies to remember session persistence, and to manipulate cookies, you have to operate at layer 7. Using cookies alleviates the need to keep local session state. Probably at least partly for this reason, HaProxy performs really well in our tests. It has almost the same efficiency as the layer 4 load balancers for non-SSL traffic.

One drawback of HaProxy is that it does not support SSL termination. Therefore, you have to run a front end (e.g., an Apache web server) to terminate the SSL first. If the front end is hosted on the same server, it impacts how much traffic can be load balanced; in fact, SSL termination and origination (to the backend web servers) can significantly drain the CPU capacity. If it is hosted on a different server, the traffic between the SSL terminator and the load balancer is in the clear, making it easy to eavesdrop on.

Nginx

Nginx operates at layer 7. It can run either as a web server or as a load balancer. In our performance tests, we see Nginx consume roughly twice the CPU cycles of the layer 4 load balancers. The overhead is much greater when SSL termination is enabled.

Unlike HaProxy, Nginx natively supports SSL termination. Unfortunately, the backend traffic from the load balancer to the web servers is in the clear. Depending on how much eavesdropping you believe could happen in a cloud’s internal network, this may or may not be acceptable to you.

Rock Load Balancer

Rock Load Balancer operates at layer 4. Among the three load balancers we have evaluated, it has the highest performance. In particular, it seems that it can forward SSL traffic without terminating and re-originating connections, which saves a lot of CPU cycles for SSL traffic. Unfortunately, Rock Load Balancer still has an open bug where it cannot effectively utilize all cores on a multi-core machine. Thus, it is not suitable for very high-bandwidth (>400 Mbps) web applications that require multi-core CPUs in the load balancer.

I have quickly summarized the key pros and cons of the software load balancers we have evaluated. I hope it is useful in helping you decide which load balancer to choose. If you have a good estimate of your application’s profile, please feel free to ping me, and we would be happy to help.