Amazon EC2 grows 62% in 2 years

I estimated Amazon's data center size about two years ago using a unique probing technique that I came up with. Since then, I have been tracking their growth (the US East data center monthly; all data centers less frequently). Now it is time to give you all an update.

Physical servers

I will not cover the technique again here, since you can refer to the original post. But I want to stress that it measures the number of physical server racks in Amazon's data centers, and hence deduces the number of physical servers. There are other approaches, such as Netcraft's, which counts web-facing virtual servers. However, Netcraft only measures virtual servers (and only the web-facing subset), and a virtual server could be a tiny Micro instance, a very small slice of a physical server. If you want to know how big EC2 is physically, this is the definitive research.

The following figure shows the growth of the US East data center.

[Figure: Number of server racks in the EC2 US East data center]

Growth in the US East data center slowed down in late 2012 and 2013, but it has picked up quite a bit recently. US East added only 1,352 racks between Mar. 12, 2012 and Dec. 29, 2013, whereas it had been adding roughly 1,000 racks per year between 2007 and 2012. Then, all of a sudden, it added 431 racks in the last month and a half. Meanwhile, the other EC2 data centers have enjoyed tremendous growth over the two-year period. The following table shows how many racks I can observe today and at the end of last year, versus two years ago, for each data center.

| Data center | Racks on 3/12/2012 | Racks on 12/29/2013 | Growth, 3/12/2012 to 12/29/2013 | Racks on 2/18/2014 | Growth, 3/12/2012 to 2/18/2014 |
|---|---|---|---|---|---|
| US East (Virginia) | 5,030 | 6,382 | 26.9% | 6,813 | 35.4% |
| US West (Oregon) | 41 | 619 | 1,410% | 904 | 2,105% |
| US West (N. California) | 630 | 847 | 34.4% | 950 | 50.8% |
| EU West (Ireland) | 814 | 1,340 | 64.6% | 1,556 | 91.2% |
| AP Northeast (Japan) | 314 | 589 | 87.6% | 719 | 129% |
| AP Southeast (Singapore) | 246 | 371 | 50.8% | 432 | 75.6% |
| SA East (Sao Paulo) | 25 | 83 | 232% | 122 | 488% |
| Total | 7,100 | 10,231 | 44.1% | 11,496 | 61.9% |

There are a few observations:

1. The overall growth rate shows no sign of slowing down. From Jan. 2007 to Mar. 2012, EC2 grew from almost zero servers to 7,100 racks of servers, roughly 1,420 racks per year. From Mar. 2012 to Feb. 2014, EC2 grew from 7,100 racks to 11,496 racks, roughly 2,198 racks per year.

2. Most of the growth is not from the US East data center. The Oregon data center grew the most, at 2,105%, followed by Sao Paulo at 488%.

3. There is a huge spike within the last month and a half: the number of racks increased from 10,231 to 11,496, adding 1,265 racks of servers.

The overall growth in the last two years is 62%, which is quite impressive, though it works out to only about 27% per year. Others have estimated that AWS revenue has been growing faster, at more than 50% per year. The discrepancy could be because AWS revenue includes many other AWS services, including new ones introduced in recent years, so EC2 is just a smaller component of it.

Virtual server growth

Another way to look at EC2's growth is to count how many virtual servers are running. Since customers pay per running virtual server, the virtual server trend is also a good predictor of EC2 revenue.

As part of our probing technique, we enumerate all virtual servers, regardless of whether they host a web server. If a virtual server is running, the EC2 DNS server will have an entry translating its external IP address to its internal IP address. By counting the number of DNS entries, we arrive at an upper bound on the number of running virtual servers (an upper bound because, when a virtual server is terminated, its DNS entry is not deleted right away).
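To make the counting step concrete, here is a minimal sketch, assuming the us-east-1 DNS naming scheme and an arbitrary illustrative address range; it is not the exact tooling behind this study.

```python
# Minimal sketch of the DNS-enumeration idea. EC2 gives every running
# instance a public DNS name derived from its public IP (for example,
# ec2-54-224-0-10.compute-1.amazonaws.com in us-east-1). When an instance
# is running, that name has an A record (which, resolved from inside EC2,
# maps to the instance's internal 10.x address). Counting the names that
# resolve gives an upper bound on the number of running virtual servers.
import ipaddress
import dns.resolver  # pip install dnspython

resolver = dns.resolver.Resolver()

def count_active_entries(cidr):
    """Count EC2 public DNS entries that exist in one public IP range."""
    active = 0
    for ip in ipaddress.ip_network(cidr):
        name = "ec2-%s.compute-1.amazonaws.com" % str(ip).replace(".", "-")
        try:
            resolver.resolve(name, "A")
            active += 1  # entry exists: an instance holds this IP
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            pass  # no entry: no running instance on this public IP
    return active

# Example: probe a small slice of a published EC2 address range
# (the /28 below is an arbitrary range chosen for illustration).
print(count_active_entries("54.224.0.0/28"))
```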

The following figure shows, in orange, the number of running virtual servers (active DNS entries) in the US East data center. AWS also periodically publishes the number of available IP addresses, and we have been tracking that over time. The blue points show how many IP addresses are available to assign to virtual servers. AWS has constantly added IP address allocations ahead of the expected growth.

[Figure: EC2 number of running virtual servers]

The green dots show the total available IP addresses across all data centers, which is an upper bound on the maximum number of virtual servers EC2 can run. On Dec. 29, 2013, our data shows up to 2.97 million active virtual machines. You can plug in an assumption about the average price AWS charges per instance to roughly estimate EC2 revenue.
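For example, here is a back-of-the-envelope version of that estimate; the $0.10/hour blended price is purely an assumption for illustration, since the real mix of instance types and reserved-instance discounts is unknown.

```python
# Rough EC2 revenue estimate from the instance count above.
active_vms = 2.97e6        # upper bound on running instances, Dec. 29, 2013
avg_price_per_hour = 0.10  # assumed blended $/instance-hour (illustrative)
annual_revenue = active_vms * avg_price_per_hour * 24 * 365
print("roughly $%.1f billion/year" % (annual_revenue / 1e9))  # ~$2.6B/year
```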

Density

From our data, we can also derive the density, that is, the average number of virtual servers running on each rack of physical servers. On Mar. 12, 2012, there were 120 virtual servers running on each server rack; by Dec. 29, 2013, the density had increased to 245 virtual servers per rack. Either the Micro instance is gaining popularity, or AWS has been doing a better job of consolidating its load to increase the profit margin.

Parting comment

I have not been blogging much in the last two years, and you may be wondering what I have been doing. Well, I have been working on a startup, and today we are finally coming out of stealth mode, officially launching at the Launch Festival. It is an iPhone app called Jamo that brings dance games from the Wii and Xbox to the iPhone. If this research has been helpful to you, please help me by downloading the app and giving it a 5-star rating. You can read more about the app in a previous post.

EC2 spot pricing coming to Cluster Compute and Cluster GPU instances

AWS EC2 introduced the Cluster Compute and Cluster GPU instances a few months back. Those instances are very good for high-throughput HPC applications; unfortunately, they have not been available in the spot market. I think that is about to change: the Cluster Compute and Cluster GPU instances should be available through the spot market shortly.

Over the weekend (Feb. 6, 2011), I noticed that both cc1.4xlarge and cg1.4xlarge instances showed up on AWS's console when querying the price history, but that went away the next day; I am guessing they were testing the feature. Starting today (Feb. 8, 2011), you can query the AWS API or use the AWS command line tool to see the price history for cc1.4xlarge instances. Furthermore, you can already start bidding for cc1.4xlarge (that was not possible over the weekend because the "current price" was not set for those instances).
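If you want to check for yourself, here is a sketch of that query written with today's boto3 SDK; boto3 postdates this post (the 2011-era EC2 API tools played this role back then), so treat it as a modern equivalent rather than what I actually ran.

```python
# Check whether spot price history exists for the cluster instance types.
# If entries come back, the spot market for them is live (or being staged).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_spot_price_history(
    InstanceTypes=["cc1.4xlarge", "cg1.4xlarge"],
    ProductDescriptions=["Linux/UNIX"],
    MaxResults=10,
)
for price in resp["SpotPriceHistory"]:
    print(price["Timestamp"], price["InstanceType"], price["SpotPrice"])
```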

Although there is no official announcement yet, I think it is just a matter of time. Have fun crunching through your HPC applications on the cheap.

Why has cloud not taken off in the enterprise?

Given all the hype around cloud computing, it is surprising that there is only scant evidence of cloud taking off in the enterprise. Sure, there is salesforce.com, but I am referring more to the infrastructure cloud, such as Amazon, or even Microsoft Azure and Google App Engine. Why has it not taken off? I have heard various theories, and I have some of my own. I list them below. Please feel free to chime in if I missed anything.

  • I am using it, but I do not want you to know. Cloud is different from previous technologies in that adoption is not top-down (e.g., mandated by the CIO), but bottom-up, driven by engineers who are fed up with the CIO and want a way around IT. I have heard of various companies using Amazon extensively, particularly in the life sciences and financial sectors, where there is greater demand for computation power. In one case, I heard of a hedge fund that rents 3,000 EC2 servers every night to crunch through the numbers. Even though they use the cloud, they do not want to draw unnecessary attention or invite questions from upper management and CIOs, because cloud is still not an approved technology and putting any data outside the firewall is still questionable.
  • Security. Even if the CIO wants to use the cloud, security is always a major hurdle to get past. I have been involved in such a situation. Before the cloud could be used, it had to be officially approved by the cyber security team; however, the cyber security team has every incentive to disapprove it, because moving data outside the firewall exposes them to additional risks that they would be responsible for. In the case I was involved in, the cyber security team came up with a long list of requirements that, we found in the end, some of their own internal applications did not even meet. Needless to say, the project I was involved in was not a go, even with a strong push from the project team.
  • Indemnification. CIOs have to cover themselves too. Most cloud vendors' license terms, including Amazon's, disclaim all liability should loss or failure happen. This is very different from the traditional hosting model, where the hosting provider accepts responsibility in writing. CIOs have to be able to balance the risk. An indemnification clause in the contract is like an insurance policy, and few CIOs want to take responsibility for something they do not control.
  • Small portion of cost. Infrastructure cost is only a small portion of the IT cost, especially for capital-rich companies. I heard from one company that their infrastructure cost is less than 20% of the budget, with the bulk spent on application development and maintenance. For them, finding a way to reduce application cost is the key. For startups, cloud makes a lot of sense; for enterprises, it may not be the top priority. To make matters worse, porting applications to the cloud tends to require re-architecting and rewriting them, driving application development cost even higher.
  • Management. CIOs need to be in control. Having every employee pull out a credit card to provision compute resources is not acceptable. Who is going to control the budget? Who will ensure data is secured properly? Who can reclaim the data when an employee leaves the company? Existing cloud management tools simply do not meet enterprises' governance requirements.

Knowing the reasons is only half of the battle. At Accenture Labs, we are working on solutions, often in partnership with cloud vendors, to address these shortcomings. I am confident that, in a few years, the barriers to enterprise adoption will be much lower.

Does the Amazon cloud have an "infinite" capacity?

One of the value propositions of the cloud is that it has an "infinite" capacity, but how big is "infinite"? It was recently estimated that Amazon may have 40,000 servers. Since each physical server can run 8 m1.small instances, Amazon could potentially support 320,000 m1.small instances at the same time. Although that is a lot of capacity, the real question is: how much capacity is there when you need it? Recently, as part of the scalability testing we did for Cloud MapReduce, we got some first-hand experience of how big Amazon EC2 is.

We performed many tests with 100 or 200 m1.small instances, both during the day and at night. We observed no difference: all servers launched successfully. One interesting observation is that there is no prorated usage for EC2; you are always charged at hourly granularity. In the past, I had heard that, starting from the second hour, you are charged on a prorated basis, but it appears I was charged $10 more when I turned off 100 instances just minutes past the hour mark.

We ran a couple of tests with 500 m1.small instances. In both cases, we requested all 500 in the same web services call, i.e., specifying both the upper and lower limits as 500. The first test ran on a Saturday from 9-10pm. Of the 500 requested, only 370 launched successfully; the other 130 terminated right after launch, showing "Internal Error" as the reason for termination. The second test ran on a Sunday from 9-10am. Of the 500 requested, 461 launched successfully; the rest showed "Internal Error" again. We do not know why the failure rate was so high, but as we learned later, launching more than 100 servers at a time is strongly discouraged. One interesting note is that, even though we asked for 500 servers, we were only charged for the servers that launched successfully (i.e., $37 and $46.10 per hour, respectively).

We also ran a couple of tests with 1,000 m1.small instances. Before running these tests, we had to ask Amazon to raise our instance limit. One thing we were advised is to launch in 100-instance increments, because it is not desirable to take up a lot of the headroom available in a data center in one shot; spreading out the requests allows them to balance the load more evenly. The first test ran on a Wednesday from 10-11am, the second on a Thursday night from 10-11pm. Even though we launched in 100-instance increments, all servers ended up in the same availability zone (us-east-1d). So it appears that there is at least 1,000 servers' worth of headroom in an availability zone.
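For illustration, here is a hedged sketch of that batched-launch pattern, written with today's boto3 SDK (our tests predate it); the AMI ID is a placeholder, and setting MinCount equal to MaxCount makes each batch all-or-nothing, the same way we specified both the upper and lower limits earlier.

```python
# Request a large fleet in 100-instance RunInstances calls instead of
# one big request, as Amazon advised, to avoid draining a data center's
# headroom in one shot.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def launch_in_batches(total, batch=100, ami="ami-12345678"):
    """Request `total` instances in `batch`-sized RunInstances calls."""
    instance_ids = []
    for _ in range(total // batch):
        resp = ec2.run_instances(
            ImageId=ami,             # placeholder AMI ID
            InstanceType="m1.small",
            MinCount=batch,          # lower limit of this request
            MaxCount=batch,          # upper limit of this request
        )
        instance_ids += [i["InstanceId"] for i in resp["Instances"]]
    return instance_ids

# e.g., launch_in_batches(1000) issues ten 100-instance requests.
```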

Unfortunately, we cannot afford to run a larger-scale test. For the month, we incurred $1,140 in AWS charges, a record for us.

In summary, for those of you requiring fewer than 1,000 servers, Amazon does have an "infinite" capacity. For those of you requiring more, there is a good chance they can accommodate you if you spread your load across availability zones (e.g., 1,000 instances from each zone). Test it and report back!