Amazon EC2 grows 62% in 2 years

I estimated Amazon data center size about two years ago using a unique probing technique that I came up with. Since then, I have been tracking their growth (US East data center monthly, but less frequently for all data centers). Now is the time to give you all an update.

Physical server

I will not cover the technique again here, since you can refer to the original post. But I want to stress that it measures the number of physical server racks in their data centers, from which the number of physical servers is deduced. There are other approaches, such as Netcraft's, which measures web-facing virtual servers. However, Netcraft only counts virtual servers (and only the subset that is web facing), where a virtual server could be a tiny Micro instance, a very small slice of a physical server. If you want to know how big EC2 is physically, this is the definitive research.

The following figure shows the growth of the US East data center.


Number of server racks in EC2 US East data center

Growth in the US East data center slowed in late 2012 and 2013, but it has picked up considerably since. It added only 1,352 racks between Mar. 12, 2012 and Dec. 29th, 2013, whereas it had been adding about 1,000 racks per year on average between 2007 and 2013. Then, all of a sudden, it added 431 racks in the last month and a half. The other EC2 data centers, however, have grown tremendously over the two-year period. The following table shows, for each data center, how many racks I can observe today and at the end of last year, compared with two years ago.

data center | # of server racks on 3/12/2012 | # of server racks on 12/29/2013 | % growth 3/12/2012 to 12/29/2013 | # of server racks on 2/18/2014 | % growth 3/12/2012 to 2/18/2014
US East (Virginia) | 5,030 | 6,382 | 26.9% | 6,813 | 35.4%
US West (Oregon) | 41 | 619 | 1410% | 904 | 2105%
US West (N. California) | 630 | 847 | 34.4% | 950 | 50.8%
EU West (Ireland) | 814 | 1,340 | 64.6% | 1,556 | 91.2%
AP Northeast (Japan) | 314 | 589 | 87.6% | 719 | 129%
AP Southeast (Singapore) | 246 | 371 | 50.8% | 432 | 75.6%
SA East (Sao Paulo) | 25 | 83 | 232% | 122 | 388%
Total | 7,100 | 10,231 | 44.1% | 11,496 | 61.9%
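The growth columns can be recomputed from the rack counts alone. The following is a minimal sketch, assuming growth is defined as (end - start) / start; the dictionary and function names are my own, and only the rack counts come from the table.

```python
# Rack counts per data center on (3/12/2012, 12/29/2013, 2/18/2014),
# copied from the table above.
RACKS = {
    "US East (Virginia)": (5030, 6382, 6813),
    "US West (Oregon)": (41, 619, 904),
    "US West (N. California)": (630, 847, 950),
    "EU West (Ireland)": (814, 1340, 1556),
    "AP Northeast (Japan)": (314, 589, 719),
    "AP Southeast (Singapore)": (246, 371, 432),
    "SA East (Sao Paulo)": (25, 83, 122),
}


def growth_pct(start, end):
    """Percentage growth from start to end, relative to start."""
    return (end - start) / start * 100.0


# Column-wise sums give the "Total" row.
totals = tuple(sum(column) for column in zip(*RACKS.values()))

for name, (mar12, dec13, feb14) in list(RACKS.items()) + [("Total", totals)]:
    print(f"{name:26s} {growth_pct(mar12, dec13):8.1f}% {growth_pct(mar12, feb14):8.1f}%")
```

Running it is a quick way to check any single cell, for example `growth_pct(41, 904)` for Oregon.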

There are a few observations:

1. The overall growth rate shows no sign of slowing down. From Jan. 2007 to Mar. 2012, EC2 grew from almost 0 servers to 7,100 racks of servers, roughly 1,420 racks per year. From Mar. 2012 to Feb. 2014, EC2 grew from 7,100 racks to 11,496 racks, which is 2,198 racks per year.

2. Most of the growth is not from the US East data center. The Oregon data center grew the most at 2105%, followed by Sao Paulo at 388%.

3. There is a huge spike within the last 1.5 months. The number of racks increased from 10,231 to 11,496, adding 1,265 racks of servers.

The overall growth in the last two years is 62%, which is quite impressive. However, others have estimated that AWS revenue has been growing at a faster rate of more than 50% per year. The discrepancy could be due to the fact that AWS revenue includes many other AWS services, including some new ones introduced in recent years, and EC2 is just one component of it.

Virtual server growth

Another way to look at EC2’s growth is to look at how many virtual servers are running. Since a customer is paying for a virtual server, looking at the virtual server trend is also a good predictor of EC2 revenue.

As part of our probing technique, we enumerate all virtual servers, regardless of whether they host a web server or not. If a virtual server is running, the EC2 DNS server will have an entry translating its external IP address to its internal IP address. By counting the number of DNS entries, we arrive at an upper bound of the number of virtual servers running (it is an upper bound because when a virtual server is terminated, the DNS entry is not deleted right away).

The following figure shows the number of running virtual servers (active DNS entries) in the US East data center in orange. AWS also periodically publishes the number of IP addresses that are available, and we have been tracking that over time. The blue points show how many IP addresses are available to assign to virtual servers. AWS has been constantly adding more IP address allocation ahead of the expected growth.

AWS number of running virtual servers

EC2 number of running virtual servers

The green dots show the total available IP addresses across all data centers. It is an upper bound on the maximum number of virtual servers EC2 can run. On Dec. 29th, 2013, our data shows there were up to 2.97 million active virtual machines. You can plug in an assumption of the average price AWS charges for an instance to roughly estimate EC2 revenue.

Density

From our data, we can also derive the density: the average number of virtual servers running per rack of physical servers. On Mar. 12, 2012, there were 120 virtual servers running on each server rack. By Dec. 29th, 2013, this density had increased to 245 virtual servers per rack. Either the Micro instance is gaining popularity, or AWS has been doing a better job of consolidating their load to increase the profit margin.

Parting comment

I have not been blogging much in the last two years. You may be wondering what I have been doing. Well, I have been working on a startup. Today we finally come out of stealth mode, and we are officially launching at the Launch Festival. It is an iPhone app, called Jamo, that brings dance games from Wii and Xbox to the iPhone. If this research has been helpful to you, please help me by downloading the app and giving us a 5* rating. You can read more about the app in a previous post.


Amazon data center size

(Edit 3/16/2012: I am surprised that this post is picked up by a lot of media outlets. Given the strong interest, I want to emphasize what is measured and what is derived. The # of server racks in EC2 is what I am directly observing. By assuming 64 physical servers in a rack, I can derive the rough server count. But remember this is an *assumption*. Check the comments below that some think that AWS uses 1U server, others think that AWS is less dense. Obviously, using a different assumption, the estimated server number would be different. For example, if a credible source tells you that AWS uses 36 1U servers in each rack, the number of servers would be 255,600. An additional note: please visit my disclaimer page. This is a personal blog, only represents my personal opinion, not my employer’s.)

Similar to the EC2 CPU utilization rate, another piece of secret information Amazon will never share with you is the size of their data center. But it is really informative if we can get a glimpse, because Amazon is clearly a leader in this space, and their growth rate would be a great indicator of how well the cloud industry is doing.

Although Amazon would never tell you, I have figured out a way to probe for its size. There have been early guesstimates on how big Amazon cloud is, and there are even tricks to figure out how many virtual machines are started in EC2, but this is the first time anyone can estimate the real size of Amazon EC2.

The methodology is fully documented below for those inquisitive minds. If you are one of them, read it through and feel free to point out if there are any flaws in the methodology. But for those of you who just want to know the numbers: Amazon has a pretty impressive infrastructure. The following table shows the number of server racks and physical servers each of Amazon’s data centers has, as of Mar. 12, 2012. The column on server racks is what I directly probed (see the methodology below), and the column on number of servers is derived by assuming there are 64 blade servers in each rack.

data center | # of server racks | # of blade servers
US East (Virginia) | 5,030 | 321,920
US West (Oregon) | 41 | 2,624
US West (N. California) | 630 | 40,320
EU West (Ireland) | 814 | 52,096
AP Northeast (Japan) | 314 | 20,096
AP Southeast (Singapore) | 246 | 15,744
SA East (Sao Paulo) | 25 | 1,600
Total | 7,100 | 454,400
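Because the server column is derived rather than measured, it is easy to redo under a different density assumption. A minimal sketch follows; the function name and the `servers_per_rack` parameter are illustrative, while the rack counts, the 64 blades per rack, and the 36 1U servers per rack alternative all come from this post.

```python
# Observed rack counts as of Mar. 12, 2012 (from the table above).
RACKS_MAR_2012 = {
    "US East (Virginia)": 5030,
    "US West (Oregon)": 41,
    "US West (N. California)": 630,
    "EU West (Ireland)": 814,
    "AP Northeast (Japan)": 314,
    "AP Southeast (Singapore)": 246,
    "SA East (Sao Paulo)": 25,
}


def estimated_servers(racks, servers_per_rack=64):
    """Derive a server count from a rack count; servers_per_rack is an assumption."""
    return racks * servers_per_rack


total_racks = sum(RACKS_MAR_2012.values())

# 64 blades per rack is the assumption used in the table.
print(total_racks, estimated_servers(total_racks))
# Alternative assumption mentioned in the edit note: 36 1U servers per rack.
print(estimated_servers(total_racks, servers_per_rack=36))
```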

The first key observation is that Amazon now has close to half a million servers, which is quite impressive. The other observation is that the US east data center, being the first data center, is much bigger. What it means is that it is hard to compete with Amazon on scale in the US, but in other regions, the entry barrier is lower. For example, Sao Paulo has only 25 racks of servers.

I also show the growth rate of Amazon’s infrastructure for the past 6 months below. I only collected data for the US east data center because it is the largest, and the most popular data center. The Y axis shows the number of server racks in the US east data center.

EC2 US east data center growth in the number of server racks

Besides their size, the growth rate is also pretty impressive. The US east data center has been adding roughly 110 racks of servers each month. The growth rate looks roughly linear, although recently it is showing signs of slowing down.

Probing methodology

Figuring out EC2's size is not trivial. Part of the reason is that EC2 provides you with virtual machines and it is difficult to know how many virtual machines are active on a physical host. Thus, even if we can determine how many virtual machines are there, we still cannot figure out the number of physical servers. Instead of focusing on how many servers are there, our methodology probes for the number of server racks out there.

It may sound harder to probe for the number of server racks. Luckily, EC2 uses a regular pattern of IP address assignment, which can be exploited to correlate with server racks. I noticed the pattern by looking at a lot of instances I launched over time and running traceroutes between my instances.  The pattern is as follows:

  • Each EC2 instance is assigned an internal IP address in the form of 10.x.x.x.
  • Each server rack is assigned a 10.x.x.x/22 IP address range, i.e., all virtual machines running on that server rack will have the same 22-bit IP prefix.
  • A 10.x.x.x/22 IP address range has 1024 IP addresses, but the first 256 are reserved for DOM0 virtual machines (system management virtual machine in XEN), and only the last 768 are used for customers’ instances.
  • Within the first 256 addresses, two at address 10.x.x.2 and 10.x.x.3 are reserved for routers on the rack. These two routers are arranged in a load balanced and fault-tolerant configuration to route traffic in and out of the rack. I verified that the uplink capacity from 10.x.x.2 and 10.x.x.3 is roughly 2 Gbps total, further suggesting that they are routers each with a 1 Gbps uplink.
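The pattern above can be expressed as a small sketch using Python's standard `ipaddress` module. The function names and the role labels ("router", "dom0", "customer") are my own; the /22 range, the reserved first 256 addresses, and the router offsets 2 and 3 are taken from the list above.

```python
import ipaddress


def rack_network(internal_ip):
    """Return the /22 network that contains internal_ip (one /22 per rack)."""
    return ipaddress.ip_network(f"{internal_ip}/22", strict=False)


def address_role(internal_ip):
    """Classify an internal address by its offset inside the rack's /22 range."""
    net = rack_network(internal_ip)
    offset = int(ipaddress.ip_address(internal_ip)) - int(net.network_address)
    if offset in (2, 3):
        return "router"    # the two routers on the rack
    if offset < 256:
        return "dom0"      # first 256 addresses are reserved for management VMs
    return "customer"      # last 768 addresses are used for customers' instances


print(rack_network("10.2.13.243"), address_role("10.2.13.243"))
```

Using `strict=False` lets `ip_network` accept a host address and mask off the host bits, which is exactly the "same 22-bit prefix" rule.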

Understanding the pattern allows us to deduce how many racks are there. In particular, if we know a virtual machine at a certain internal IP address (e.g., 10.2.13.243), then we know there is a rack using the /22 address range (e.g., a rack at 10.2.12.x/22). If we take this to the extreme where we know the IP address of at least one virtual machine on each rack, then we can see all racks in EC2.

So how can we know the IP addresses of a large number of virtual machines? You can certainly launch a large number of virtual machines and record the internal IP addresses that you get, but that is going to be costly. Unless you are RightScale, where a large number of instances are launched through your service, you may not be able to take this approach. Another approach is to scan the whole IP address space and watch when an instance responds back to a ping. There are several problems with this approach. First, it may be considered port scanning, which is a violation of AWS's policy. Second, not all live instances respond to ping, especially with AWS's security group blocking all ports by default. Lastly, the whole IP address space in 10.x.x.x is huge, which would take a considerable amount of time to scan.

While you may be discouraged at this point, it turns out there is another way. In addition to the internal IP address we talked about, each AWS instance also has an external IP address. Although we cannot scan the external IP addresses either (so as not to violate the port scanning policy), we can leverage DNS translation to figure out the internal IP addresses. If you query DNS for an EC2 instance's public DNS name from inside EC2, the DNS server will return its internal IP address (if you query it from outside of EC2, you will get the external IP instead). So, all we are left to do is to get a large number of EC2 instances' public DNS names. Luckily, we can easily derive the list of public DNS names, because EC2 instances' public DNS names are directly tied to their external IP addresses. An instance at external IP address x.y.z.w (e.g., 50.17.204.150) will have a public DNS name ec2-x-y-z-w…..amazonaws.com (e.g., ec2-50-17-204-150.compute-1.amazonaws.com if in US east data center). To enumerate all public DNS names, we just have to find out all public IP addresses. Again, this is easy to do because EC2 publishes all public IP addresses they use here.
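To make the naming rule concrete, here is a minimal sketch that turns external addresses into candidate public DNS names. The function names are hypothetical, the default suffix is the US east example given above (other data centers use a different suffix that must be supplied), and the /30 block in the example is made up purely for illustration.

```python
import ipaddress


def public_dns_name(external_ip, region_suffix="compute-1.amazonaws.com"):
    """Build the public DNS name tied to an external IP address."""
    return "ec2-" + str(external_ip).replace(".", "-") + "." + region_suffix


def candidate_names(cidr_block, region_suffix="compute-1.amazonaws.com"):
    """Yield one candidate public DNS name per address in a published range."""
    for ip in ipaddress.ip_network(cidr_block):
        yield public_dns_name(ip, region_suffix)


print(public_dns_name("50.17.204.150"))
# Illustrative range only, not a real published EC2 block.
for name in candidate_names("50.17.204.148/30"):
    print(name)
```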

Once we determined the number of server racks, we just multiply it by the number of physical servers on the rack. Unfortunately, we do not know how many physical servers are on each rack, so we have to make assumptions. I assume Amazon has dense racks, each rack has 4 10U chassis, and each chassis holds 16 blades for a total of 64 blades/rack.

Let us recap how we can find all server racks.

  • Enumerate all public IP addresses EC2 uses
  • Translate a public IP address to its public DNS name (e.g., ec2-50-17-204-150.compute-1.amazonaws.com)
  • Run a DNS query inside EC2 to get its internal IP address (e.g., 10.2.13.243).
  • Derive the rack’s IP range from the internal IP address (e.g., 10.2.12.x/22).
  • Count how many unique racks we have seen, then multiply it by the number of physical servers in a rack (I assume it is 64 servers/rack).
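The recap can be sketched end to end as follows. This is an illustration under stated assumptions rather than the exact script I ran: `count_racks` and its parameters are names invented here, the resolver is injectable so the example runs without network access, and the demo mapping is fabricated test data. With the default resolver (`socket.gethostbyname`) the code would only return internal addresses when run from inside EC2.

```python
import ipaddress
import socket

EC2_INTERNAL = ipaddress.ip_network("10.0.0.0/8")


def count_racks(public_cidrs, resolve=socket.gethostbyname,
                region_suffix="compute-1.amazonaws.com"):
    """Return the set of /22 rack ranges seen behind the given public ranges."""
    racks = set()
    for cidr in public_cidrs:
        for ip in ipaddress.ip_network(cidr):
            name = "ec2-" + str(ip).replace(".", "-") + "." + region_suffix
            try:
                internal = ipaddress.ip_address(resolve(name))
            except (OSError, ValueError):
                continue  # no DNS entry: no instance observed at this address
            if internal in EC2_INTERNAL:
                # Every instance on a rack shares the same /22 prefix.
                racks.add(ipaddress.ip_network(f"{internal}/22", strict=False))
    return racks


# Demo with a fake resolver and made-up data, so no DNS query is issued.
_FAKE_DNS = {
    "ec2-50-17-204-148.compute-1.amazonaws.com": "10.2.13.243",
    "ec2-50-17-204-149.compute-1.amazonaws.com": "10.2.12.17",
    "ec2-50-17-204-150.compute-1.amazonaws.com": "10.9.40.5",
}


def fake_resolve(name):
    if name not in _FAKE_DNS:
        raise OSError("no such host")
    return _FAKE_DNS[name]


demo_racks = count_racks(["50.17.204.148/30"], resolve=fake_resolve)
print(len(demo_racks), "racks,", len(demo_racks) * 64, "servers at 64 servers/rack")
```

Keeping the racks in a set makes the count robust to seeing many instances on the same rack, which is the common case.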

Caveat

Even though my methodology could provide insights that were never possible before, it has its shortcomings, which could lead to inaccurate results. The limitations are:

  • The methodology requires an active instance on a rack for the rack to be observed. If the rack has no instances running on it, we cannot count it.
  • We cannot know how many physical servers are in a rack. I assume Amazon has dense racks, each rack has 4 10U chassis, and each chassis holds 16 blades.
  • My methodology cannot tell whether the racks I observe are for EC2 only. It could be possible that other AWS services (such as S3, SQS, SimpleDB) run on virtual servers on the same set of racks. It is also possible that they run on dedicated racks, in which case AWS is bigger than what I can observe. So, what I am observing is only a lower bound on the size of AWS.

Comparing cloud providers on VM cost

How do you compare two IaaS clouds? Is Amazon EC2's small standard instance (1 ECU, 1.7GB RAM, 160GB storage) cheaper or is Rackspace cloud's 256MB server (4 cores, 256MB RAM, 10GB storage) cheaper? It is obviously simpler to compare them if you focus only on one metric. For example, let us assume your application is CPU bound and it does not require much memory at all. Then you should focus solely on the CPU power a cloud VM gives you. We have translated GoGrid, Rackspace, and Terremark's VM configurations into their equivalent ECU, so you can simply take a ratio between the cost and the ECU rating and pick the lowest ratio. Unfortunately, real-life applications are never that simple. They demand CPU cycles, memory, as well as hard disk storage capacity. So, how do you compare apple-to-apple?

The methodology

Since no methodology exists yet, we will propose one. Since the comparison results depend highly on the methodology chosen, we first will spell out the methodology we use so that if you have a different one and you come up with a different result, you can trace the source of the difference. If you see areas where we can improve the methodology, please do leave a comment. The methodology works as follows:

  1. We first break down the cost components in Amazon EC2. We assume Amazon has priced their instances using a linear model, i.e., the cost is equal to c * CPU + m * Mem + s * Storage, where c is the unit cost of CPU per ECU per hour, m is the unit cost of memory per GB per hour, and s is the unit cost of storage per GB per hour. Amazon provides several types of instances, each with a different combination of CPU, memory and storage, which is enough of a hint for us to use regression analysis to estimate c, m and s. The details are in our ECU cost breakdown analysis.
  2. Once we have the unit cost in EC2, we can compare it with another cloud provider. We take one VM configuration from a cloud provider at a time, and compute what Amazon EC2 would charge for an instance with the exact same specification if EC2 were to offer it. This can be easily done by multiplying the EC2 unit costs (c, m, and s) with the amount of CPU, RAM, and storage in the VM, and adding them up. Of course, this is hypothetical, because EC2 does not offer an instance with the exact same spec. So even if the EC2 price is lower, you cannot just buy a substitute from Amazon. However, this gives us a good sense of the relative cost.
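The two steps boil down to a few lines of arithmetic. Here is a minimal sketch, using the N. Virginia unit costs from the ECU cost breakdown analysis (1.369, 2.01 and 0.0159 cents per hour for CPU, memory and storage); the function names are illustrative, and the example VM is the Rackspace 256MB server at its full 6 ECU burst, priced at 1.5 cents/hour.

```python
# EC2 N. Virginia unit costs in cents per hour (from the ECU cost breakdown).
C_PER_ECU = 1.369        # c: cost of 1 ECU
M_PER_GB_RAM = 2.01      # m: cost of 1 GB of memory
S_PER_GB_DISK = 0.0159   # s: cost of 1 GB of local storage


def equivalent_ec2_cost(ecu, ram_gb, storage_gb):
    """Hypothetical EC2 price (cents/hour) for a VM with the given specification."""
    return C_PER_ECU * ecu + M_PER_GB_RAM * ram_gb + S_PER_GB_DISK * storage_gb


def cost_ratio(provider_cost, ecu, ram_gb, storage_gb):
    """Provider price divided by the hypothetical EC2 price; below 1 means cheaper."""
    return provider_cost / equivalent_ec2_cost(ecu, ram_gb, storage_gb)


# Rackspace 256MB server: 1.5 cents/hour, 0.256 GB RAM, 10 GB storage, 6 ECU when bursting.
ec2_cost = equivalent_ec2_cost(ecu=6, ram_gb=0.256, storage_gb=10)
ratio = cost_ratio(1.5, ecu=6, ram_gb=0.256, storage_gb=10)
print(f"equivalent EC2 cost: {ec2_cost:.2f} cents/hour, ratio: {ratio:.2f}")
```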

We have done the analysis with GoGrid, Rackspace, and Terremark.

We can compute a ratio between a cloud VM’s cost with its hypothetical equivalent in EC2. The following lists the top few VMs that have the lowest ratio. If you are curious about the ratio for other VM configurations, feel free to dig into the individual posts on each provider. The ratio listed is assuming that you will get the maximum CPU allowed under bursting, which is frequently the case in those cloud providers. Further, the ratio listed is comparing with EC2 N. Virginia data center. Other EC2 data centers have a higher cost.

Provider | RAM (GB) | CPU (cores) | storage (GB) | cost ratio with an equivalent in EC2
Rackspace | 0.25 | 4 | 10 | 0.168
Terremark | 0.5 | 8 | charged separately at $0.25/month/GB | 0.19
Rackspace | 0.5 | 4 | 20 | 0.314
Terremark | 0.5 | 4 | charged separately at $0.25/month/GB | 0.338
Terremark | 1 | 8 | charged separately at $0.25/month/GB | 0.375
Terremark | 1.5 | 8 | charged separately at $0.25/month/GB | 0.491

 

How to use this data?

Due to the limitations of this methodology (comparing with a hypothetical equivalent in EC2), it only makes sense if one of the cloud providers you are comparing is Amazon EC2. In other words, do not compare Rackspace with Terremark based on the ratio.

It also makes no sense to use our results if you know the exact specification for your server. In that case, you should find a minimum VM configuration that is just barely bigger than your requirement and compare price.

Our results are useful if your application is flexible. For example, instead of using one m1.small instance in EC2, you could use several Rackspace 256MB VMs to achieve a dramatic cost savings. Examples of a flexible application include a batch application, such as a MapReduce job, which could be chopped down to a finer granularity. Another example could be web servers in a web server farm, where the load balancer can divide up the work to take advantage of whatever computation capacity provisioned on the web server.

Our results are also useful if you want to get a high level overview. Consider an enterprise purchaser who wants to choose a cloud platform. There are many dimensions he has to consider, e.g., features, cost, SLA, contract terms….. Doing a deep analysis at the beginning is just going to be overwhelming. Since Amazon is a big player in cloud, it most likely will be part of the evaluation. Having a ratio would give a ten-thousand-feet view such that the decision maker would know whether an alternative cloud would save him money. Then, as the evaluation progresses, he can dig deeper into a finer comparison.

Caveats:

There are many caveats in using our results that we should spell out.

  • This is only comparing a VM cost, including its CPU, memory and storage. But, it does not include other costs, such as bandwidth transfers. The bandwidth cost varies wildly, for example, GoGrid offers free inbound traffic, which can translate into a significant cost saving.
  • When we compare CPUs, we are only comparing their processing power, not their IO capabilities (both disk and network IO). In Amazon, we sometimes observe degraded IO performance, possibly due to competing VMs on the same host. It is a sad side effect of using popular cloud offerings.
  • As we mentioned, this only applies to fungible applications that can take full advantage of provisioned CPU, memory and storage resources. For example, if you cannot take advantage of the provisioned RAM, it does not matter if it is a good deal. You are wasting the memory, and you may be better off with a VM configuration from a different cloud provider with a smaller provisioned RAM.
  • This is not a substitute for feature comparisons. For example, GoGrid offers free F5 hardware load balancer. If you need a hardware load balancer, you should consider that separately.

Rackspace cost comparison with Amazon EC2

(Earlier posts in this series are: EC2 cost break down, GoGrid & EC2 cost comparison)

We looked at Amazon EC2 and GoGrid cost earlier. Let us examine another IaaS provider — Rackspace cloud. The first step again is to unify on the same unit of measurement on the CPU power. Using the same methodology as we used for EC2’s hardware analysis, we determine that Rackspace runs on a platform with two sockets of Quad-Core AMD Opteron 2374 HE processor. According to PassMark-CPU Mark results, this platform has a CPU mark score of 4642, which is roughly 12 ECU. Rackspace cloud’s FAQ states that “For Linux distributions, each Cloud Server is assigned four virtual cores and the amount of CPU cycles allocated to these cores is weighted based on the size of the Cloud Server.” From talking to Rackspace support, we know that each physical host has 32GB of RAM, and it can host at most 2 16GB (15.5GB to be precise) VMs. Therefore, a 16GB VM would own the complete 4 cores it is allocated, i.e., the 16GB VM has a guaranteed capacity of half of the platform, which is 6 ECU. Since Rackspace states that the CPU is proportionally shared based on the RAM, we can derive the minimum guaranteed CPU based on how many other VMs could fit on the same physical host. The following table lists the minimum CPU and the maximum CPU (assuming full bursting when all other VMs are idle). Again, we are only concerned about Linux VMs, as they do not include license costs, so they more accurately represent the true hardware cost.

RAM (GB) | Storage (GB) | Min CPU (ECU) | Max CPU (ECU) | Cost (cents/hour)
0.256 | 10 | 0.09375 | 6 | 1.5
0.512 | 20 | 0.1875 | 6 | 3
1 | 40 | 0.375 | 6 | 6
2 | 80 | 0.75 | 6 | 12
4 | 160 | 1.5 | 6 | 24
8 | 320 | 3 | 6 | 48
16 | 620 | 6 | 6 | 96
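The Min CPU column follows directly from proportional sharing. A minimal sketch, assuming the host figures stated above (12 ECU and 32GB of RAM per physical host) and treating the 256MB and 512MB servers as 1/4 GB and 1/2 GB, which is what reproduces the table; the constant and function names are my own.

```python
HOST_ECU = 12.0      # two quad-core sockets, roughly 12 ECU in total
HOST_RAM_GB = 32.0   # RAM per physical host
MAX_ECU = 6.0        # four virtual cores, i.e. half of the platform


def min_guaranteed_ecu(ram_mb):
    """CPU share guaranteed to a VM when the host is fully packed, weighted by RAM."""
    return HOST_ECU * (ram_mb / 1024.0) / HOST_RAM_GB


for ram_mb in (256, 512, 1024, 2048, 4096, 8192, 16384):
    print(f"{ram_mb:6d} MB  min {min_guaranteed_ecu(ram_mb):.5f} ECU  max {MAX_ECU} ECU")
```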

Similar to GoGrid, Rackspace only charges based on the RAM, so it is not possible to determine how it values each component (i.e., CPU, RAM and storage) separately, as we have done for EC2. However, it is possible to project what a similar configuration would cost in EC2 using the unit cost we have derived from the EC2 cost breakdown. The results are shown in the following table where we assume a VM only gets its minimum guaranteed CPU. Each row corresponds to one VM configuration, which is denoted by its RAM size in the first column. We also show the ratio between the Rackspace cost and the projected equivalent EC2 cost.

RAM (GB) | Rackspace cost (cents/hour) | Equivalent EC2 cost (cents/hour) | Rackspace cost/EC2 cost
0.256 | 1.5 | 0.8 | 1.87
0.512 | 3 | 1.6 | 1.87
1 | 6 | 3.16 | 1.9
2 | 12 | 6.32 | 1.9
4 | 24 | 12.6 | 1.9
8 | 48 | 25.3 | 1.9
16 | 96 | 50.2 | 1.91

Since a Rackspace VM can burst if other VMs on the same host are idle, it could potentially grab a much larger share of the CPU. The following table shows the cost comparison assuming that the VM bursts to its fullest extent.

RAM (GB) | Rackspace cost (cents/hour) | Equivalent EC2 cost (cents/hour) | Rackspace cost/EC2 cost
0.256 | 1.5 | 8.89 | 0.17
0.512 | 3 | 9.56 | 0.31
1 | 6 | 10.86 | 0.55
2 | 12 | 13.5 | 0.89
4 | 24 | 18.8 | 1.28
8 | 48 | 29.4 | 1.63
16 | 96 | 50.2 | 1.91

If your VM is only getting the minimum guaranteed CPU, Rackspace is about 1.9 times more expensive than an equivalent in EC2. However, in our experience, we can frequently grab a much larger share of the CPU. Assuming you can grab the full 4 cores, the 256MB, 512MB, 1GB, and 2GB VMs are a great bargain, which are 17%, 31%, 55%, and 89% of the equivalent EC2 cost respectively.

The true cost of an ECU

How do you compare the cost of two cloud or IaaS offerings? Is Amazon EC2's small instance (1 ECU, 1.7GB RAM, 160GB storage) cheaper or is Rackspace cloud's 256MB server (4 cores, 256MB RAM, 10GB storage) cheaper? Unfortunately, answering this question is very difficult. One reason is that cloud vendors have been offering virtual machines with different configurations, i.e., different combinations of CPU power, memory and storage, making it difficult to perform an apple-to-apple comparison.

Towards the goal of a better apple-to-apple comparison, I will break down the cost for CPU, memory and storage individually for Amazon EC2 in this post. For those not interested in understanding the methodology, the high level conclusions are as follows. In Amazon’s N. Virginia data center, the unit costs are:

  • 1 ECU costs $0.01369/hour
  • 1 GB of RAM costs $0.0201/hour
  • 1 GB of local storage costs $0.000159/hour
  • A 10 Gbps network interface costs $0.41/hour
  • A GPU costs $0.52/hour

Before we can break down the cost, we have to know what an instance’s (Amazon’s term for a virtual machine) cost consists of. We assume the cost includes solely the cost of its CPU, its memory, and its local storage space. This means that there is no fixed cost component, for example, to account for the hardware chassis, or to account for the static IP address. We make this assumption purely for simplicity. In practice, it makes little difference to the end result even if we assume there is a fixed cost component. We also note that the instance cost does not include the cost for the network bandwidth consumed, which is always charged separately, at least in the cloud providers we looked at.

Let us assume the instance cost is a linear function of the three components, i.e., Cost = c * CPU + m * Mem + s * Storage, where c, m and s are the unit cost of CPU, memory and local storage respectively. It is fortunate that Amazon EC2 offers several types of instances, each type of instance has a different combination of CPU, memory and storage, which offers us a clue of what each component costs. Combining the many types of instances, we can estimate the parameters c, m and s by using a least-square regression analysis. Let us first look at Amazon’s N. Virginia data center. We only use Linux instances’ hourly cost as the instance cost to avoid accounting for an OS’s licensing cost. The results from least-square regression are:

s = 0.0159 cent/GB/hour
m = 2.01 cent/GB/hour
c = 1.369  cent/ECU/hour

The linear model and the estimation actually match the real data really well. The following table shows the instances we used for regression. The last column shows the instance cost as predicted by our estimated parameters, and the second-to-last column shows the real EC2 cost. As you can see, the two costs actually match fairly well, suggesting that a linear model is a good approximation. We should note that we mark the Micro instance to have 0.35 ECU. This is an average of its ECU allocation as we have shown in our Micro instance analysis.

instance | CPU (in ECU) | RAM (in GB) | Storage (in GB) | Instance cost per hour (in cents) | Fitted instance cost per hour (in cents)
m1.small | 1 | 1.7 | 160 | 8.5 | 7.33
m1.large | 4 | 7.5 | 850 | 34 | 34.07
m1.xlarge | 8 | 15 | 1,690 | 68 | 67.97
t1.micro | 0.35 | 0.613 | 0 | 2 | 1.71
m2.xlarge | 6.5 | 17.1 | 420 | 50 | 49.96
m2.2xlarge | 13 | 34.2 | 850 | 100 | 100.1
m2.4xlarge | 26 | 68.4 | 1,690 | 200 | 200
c1.medium | 5 | 1.7 | 350 | 17 | 15.83
c1.xlarge | 20 | 7 | 1,690 | 68 | 68.32
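For readers who want to rerun the fit, here is a minimal sketch of a least-squares regression on the table above, written without external libraries. It assumes an unweighted fit with no intercept, solved through the normal equations; treat that exact setup, and all function names, as assumptions of this sketch. The published parameters are included only to compare the sum of squared errors.

```python
# (instance, ECU, RAM in GB, storage in GB, actual cents/hour), from the table above.
INSTANCES = [
    ("m1.small", 1, 1.7, 160, 8.5),
    ("m1.large", 4, 7.5, 850, 34),
    ("m1.xlarge", 8, 15, 1690, 68),
    ("t1.micro", 0.35, 0.613, 0, 2),
    ("m2.xlarge", 6.5, 17.1, 420, 50),
    ("m2.2xlarge", 13, 34.2, 850, 100),
    ("m2.4xlarge", 26, 68.4, 1690, 200),
    ("c1.medium", 5, 1.7, 350, 17),
    ("c1.xlarge", 20, 7, 1690, 68),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_unit_costs(instances):
    """Least-squares estimate of (c, m, s) in cost = c*CPU + m*Mem + s*Storage."""
    xs = [row[1:4] for row in instances]
    ys = [row[4] for row in instances]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
    return solve(xtx, xty)


def sse(params, instances):
    """Sum of squared errors between the linear model and the actual prices."""
    c, m, s = params
    return sum((c * cpu + m * ram + s * disk - price) ** 2
               for _, cpu, ram, disk, price in instances)


c, m, s = fit_unit_costs(INSTANCES)
print(f"c = {c:.3f}, m = {m:.3f}, s = {s:.4f} (cents/hour)")
print("SSE of this fit:", sse((c, m, s), INSTANCES))
print("SSE of published parameters:", sse((1.369, 2.01, 0.0159), INSTANCES))
```

With only three unknowns the normal equations are small enough to solve by hand-rolled elimination; a numerical library would be the more usual choice for larger problems.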

It should come as no surprise that the memory is actually a significant component of the instance cost. Next time when you compare two cloud offerings, make sure to compare the RAM available.

In the estimation, we did not include EC2 cluster instances and cluster GPU instances, because they are different from other instances (both have a 10 Gbps network interface and one has a GPU). But, now that we have a unit cost for CPU, memory and storage, we can estimate what those extra features cost.

For a cluster instance, combining the cost of CPU (33.5 ECU), memory (23GB), and storage (1690 GB) using our estimated parameters, the cost comes out to be $1.19/hour. Since Amazon charges $1.60/hour, the extra charge must be for the 10 Gbps interface, which is the only feature that is different from other instances. Subtracting the two, the 10 Gbps interface costs $0.41/hour.

For a cluster GPU instance, combining the cost of CPU (33.5 ECU), memory (22GB), and storage (1690 GB), the cost comes out to be $1.17/hour. Since Amazon charges $2.10/hour, the extra charge must be for the 10 Gbps interface and the GPU. Subtracting the two costs and taking out the 10 Gbps interface cost, we know the GPU costs $0.52/hour.
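The subtraction in the last two paragraphs can be written out as a short sketch. The unit costs, instance specifications and prices are the ones quoted above; the variable and function names are illustrative only.

```python
# N. Virginia unit costs in cents per hour, as estimated above.
C, M, S = 1.369, 2.01, 0.0159


def base_cost_dollars(ecu, ram_gb, storage_gb):
    """Cost of CPU + memory + storage alone, in dollars per hour."""
    return (C * ecu + M * ram_gb + S * storage_gb) / 100.0


# Cluster instance: 33.5 ECU, 23 GB RAM, 1690 GB storage, priced at $1.60/hour.
cluster_base = base_cost_dollars(33.5, 23, 1690)
network_cost = 1.60 - cluster_base

# Cluster GPU instance: 33.5 ECU, 22 GB RAM, 1690 GB storage, priced at $2.10/hour.
gpu_base = base_cost_dollars(33.5, 22, 1690)
gpu_cost = 2.10 - gpu_base - network_cost

print(f"cluster base ${cluster_base:.2f}, network interface ${network_cost:.2f}")
print(f"GPU base ${gpu_base:.2f}, GPU ${gpu_cost:.2f}")
```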

We can perform the same analysis for the other 3 Amazon data centers: N. California, Ireland and Singapore. Luckily, their cost structures are the same, so I only need to present one result. The unit costs are as follows:

s = 0.0169 cent/GB/hour
m = 2.316 cent/GB/hour
c = 1.575 cent/ECU/hour

The actual instance cost and the projected instance cost are shown in the following table. Again, they agree very well. There are no cluster and cluster GPU instances in other data centers, so no cost for the 10 Gbps interface and the GPU is shown.

instance | CPU (in ECU) | RAM (in GB) | Storage (in GB) | Instance cost per hour (in cents) | Fitted instance cost per hour (in cents)
m1.small | 1 | 1.7 | 160 | 9.5 | 8.22
m1.large | 4 | 7.5 | 850 | 38 | 38.07
m1.xlarge | 8 | 15 | 1,690 | 76 | 75.97
t1.micro | 0.35 | 0.613 | 0 | 2.5 | 1.97
m2.xlarge | 6.5 | 17.1 | 420 | 57 | 56.96
m2.2xlarge | 13 | 34.2 | 850 | 114 | 114.1
m2.4xlarge | 26 | 68.4 | 1,690 | 228 | 228
c1.medium | 5 | 1.7 | 350 | 19 | 17.74
c1.xlarge | 20 | 7 | 1,690 | 76 | 76.34

Amazon cloud has an “infinite” capacity?

One of the value propositions of a cloud is that it has an “infinite” capacity, but how big is “infinite”? It was recently estimated that Amazon may have 40,000 servers. Since each physical server can run 8 m1.small instances, Amazon could potentially support 320,000 m1.small instances at the same time. Although that is a lot of capacity, the real question is: how much capacity is there when you need it? Recently, as part of the scalability test we did for Cloud MapReduce, we had some first-hand experience on how big Amazon EC2 is.

We performed many tests with 100 or 200 m1.small instances, both during the day and at night. There is no difference that we can observe. All servers launched successfully. One interesting observation is that there is no prorated usage for EC2: you are always charged at hourly granularity. In the past, I had heard that, starting from the second hour, you are charged on a prorated basis, but it appears that I am charged $10 more when I turn off 100 instances just minutes past the hour mark.

We ran a couple of tests with 500 m1.small instances. In both cases, we launched all 500 in the same web services call, i.e., specifying both the upper and lower limits as 500. The first test was run on a Saturday from 9-10pm. Of the 500 requested, only 370 were successfully launched. The other 130 terminated right after launch, showing “Internal Error” as the reason for termination. The second test was run on a Sunday from 9-10am. Of the 500 requested, 461 were successfully launched; the other 39 showed “Internal Error” again. We do not know why there is such a big failure rate, but as we learned later, we are strongly advised against launching more than 100 servers at a time. One interesting note is that, even though we specified 500 servers to launch, we were only charged for the servers that successfully launched (i.e., $37 and $46.1 /hour respectively).

We also ran a couple of tests with 1,000 m1.small instances. Before running these tests, we had to ask Amazon to raise our instance limit. One thing we were advised is that we should launch in increments of 100 instances, because it is not desirable to take up a lot of the head room available in a data center in one shot. Spreading out the requests allows them to balance the load more evenly. The first test was run on a Wed. from 10-11am, and the second test was run on a Thurs. night from 10-11pm. Even though we were launching in increments of 100, all servers ended up in the same availability zone (us-east-1d). So it appears that there is head room for at least 1,000 servers in an availability zone.
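Launching in increments is simple to script. The sketch below only shows the batching logic: `launch` is a hypothetical callable standing in for whatever EC2 API call you use to start instances, assumed to return how many instances actually started, and the fake launcher in the demo is made up so the example runs without an AWS account.

```python
def launch_in_batches(total, launch, batch_size=100):
    """Request `total` instances in chunks of at most `batch_size`.

    `launch(n)` is a placeholder for the real launch call and must return
    the number of instances that started successfully.
    """
    launched = 0
    remaining = total
    while remaining > 0:
        n = min(batch_size, remaining)
        launched += launch(n)
        remaining -= n
    return launched


# Demo with a fake launcher that records the requested batch sizes.
requested = []


def fake_launch(n):
    requested.append(n)
    return n  # pretend every instance starts successfully


started = launch_in_batches(1000, fake_launch)
print(started, "instances started in", len(requested), "requests")
```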

Unfortunately, we cannot afford to run a larger scale test. For the month, we incurred $1140 AWS charges, a record for us.

In summary, for those of you requiring fewer than 1,000 servers, Amazon does have an “infinite” capacity. For those of you requiring more, there is a high chance that they can accommodate you if you spread your load across availability zones (e.g., 1,000 instances from each zone). Test it and report back!