March 13, 2012 97 Comments
(Edit 3/16/2012: I am surprised that this post is picked up by a lot of media outlets. Given the strong interest, I want to emphasize what is measured and what is derived. The # of server racks in EC2 is what I am directly observing. By assuming 64 physical servers in a rack, I can derive the rough server count. But remember this is an *assumption*. Check the comments below that some think that AWS uses 1U server, others think that AWS is less dense. Obviously, using a different assumption, the estimated server number would be different. For example, if a credible source tells you that AWS uses 36 1U servers in each rack, the number of servers would be 255,600. An additional note: please visit my disclaimer page. This is a personal blog, only represents my personal opinion, not my employer’s.)
Similar to the EC2 CPU utilization rate, another piece of secret information Amazon will never share with you is the size of their data center. But it is really informative if we can get a glimpse, because Amazon is clearly a leader in this space, and their growth rate would be a great indicator of how well the cloud industry is doing.
Although Amazon would never tell you, I have figured out a way to probe for its size. There have been early guesstimates on how big Amazon cloud is, and there are even tricks to figure out how many virtual machines are started in EC2, but this is the first time anyone can estimate the real size of Amazon EC2.
The methodology is fully documented below for those inquisitive minds. If you are one of them, read it through and feel free to point out if there are any flaws in the methodology. But for those of you who just want to know the numbers: Amazon has a pretty impressive infrastructure. The following table shows the number of server racks and physical servers each of Amazon’s data centers has, as of Mar. 12, 2012. The column on server racks is what I directly probed (see the methodology below), and the column on number of servers is derived by assuming there are 64 blade servers in each rack.
|data center\size||# of server racks||# of blade servers|
|US East (Virginia)||5,030||321,920|
|US West (Oregon)||41||2,624|
|US West (N. California)||630||40,320|
|EU West (Ireland)||814||52,096|
|AP Northeast (Japan)||314||20,096|
|AP Southeast (Singapore)||246||15,744|
|SA East (Sao Paulo)||25||1,600|
The first key observation is that Amazon now has close to half a million servers, which is quite impressive. The other observation is that the US east data center, being the first data center, is much bigger. What it means is that it is hard to compete with Amazon on scale in the US, but in other regions, the entry barrier is lower. For example, Sao Paulo has only 25 racks of servers.
I also show the growth rate of Amazon’s infrastructure for the past 6 months below. I only collected data for the US east data center because it is the largest, and the most popular data center. The Y axis shows the number of server racks in the US east data center.
Besides their size, the growth rate is also pretty impressive. The US east data center has been adding roughly 110 racks of servers each month. The growth rate looks roughly linear, although recently it is showing signs of slowing down.
Figuring out EC2′ size is not trivial. Part of the reason is that EC2 provides you with virtual machines and it is difficult to know how many virtual machines are active on a physical host. Thus, even if we can determine how many virtual machines are there, we still cannot figure out the number of physical servers. Instead of focusing on how many servers are there, our methodology probes for the number of server racks out there.
It may sound harder to probe for the number of server racks. Luckily, EC2 uses a regular pattern of IP address assignment, which can be exploited to correlate with server racks. I noticed the pattern by looking at a lot of instances I launched over time and running traceroutes between my instances. The pattern is as follows:
- Each EC2 instance is assigned an internal IP address in the form of 10.x.x.x.
- Each server rack is assigned a 10.x.x.x/22 IP address range, i.e., all virtual machines running on that server rack will have the same 22 bits IP prefix.
- A 10.x.x.x/22 IP address range has 1024 IP addresses, but the first 256 are reserved for DOM0 virtual machines (system management virtual machine in XEN), and only the last 768 are used for customers’ instances.
- Within the first 256 addresses, two at address 10.x.x.2 and 10.x.x.3 are reserved for routers on the rack. These two routers are arranged in a load balanced and fault-tolerant configuration to route traffic in and out of the rack. I verified that the uplink capacity from 10.x.x.2 and 10.x.x.3 are roughly 2 Gbps total, further suggesting that they are routers each with a 1Gbps uplink.
Understanding the pattern allows us to deduce how many racks are there. In particular, if we know a virtual machine at a certain internal IP address (e.g., 10.2.13.243), then we know there is a rack using the /22 address range (e.g., a rack at 10.2.12.x/22). If we take this to the extreme where we know the IP address of at least one virtual machine on each rack, then we can see all racks in EC2.
So how can we know the IP addresses of a large number of virtual machines? You can certainly launch a large number of virtual machines and record the internal IP addresses that you get, but that is going to be costly. If you are RightScale, where a large number of instances are launched through your service, you may not be able to take this approach. Another approach is to scan the whole IP address space and watch when an instance responds back to a ping. There are several problems with this approach. First, it may be considered port scanning, which is a violation of AWS’s policy. Second, not all live instances respond to ping, especially with AWS’ security group blocking all ports by default. Lastly, the whole IP address space in 10.x.x.x is huge, which would take a considerable amount of time to scan.
While you may be discouraged at this point, it turns out there is another way. In addition to the internal IP address we talked about, each AWS instance also has an external IP address. Although we cannot scan the external IP addresses either (so as not to violate the port scanning policy), we can leverage DNS translation to figure out the internal IP addresses. If you query DNS for an EC2 instance’s public DNS name from inside EC2, the DNS server will return its internal IP address (if you query it from outside of EC2, you will get the external IP instead). So, all we are left to do is to get a large number of EC2 instances’ public DNS names. Luckily, we can easily derive the list of public DNS names, because EC2 instances’ public DNS names are directly tied to their external IP addresses. An instance at external IP address x.y.z.w (e.g., 126.96.36.199) will have a public DNS name ec2-x-y-z-w…..amazonaws.com (e.g., ec2-50-17-204-150.compute-1.amazonaws.com if in US east data center). To enumerate all public DNS names, we just have to find out all public IP addresses. Again, this is easy to do because EC2 publishs all public IP addresses they use here.
Once we determined the number of server racks, we just multiply it by the number of physical servers on the rack. Unfortunately, we do not know how many physical servers are on each rack, so we have to make assumptions. I assume Amazon has dense racks, each rack has 4 10U chassis, and each chassis holds 16 blades for a total of 64 blades/rack.
Let us recap how we can find all server racks.
- Enumerate all public IP addresses EC2 uses
- Translate a public IP address to its public DNS name (e.g., ec2-50-17-204-150.compute-1.amazonaws.com)
- Run a DNS query inside EC2 to get its internal IP address (e.g., 10.2.13.243).
- Derive the rack’s IP range from the internal IP address (e.g., 10.2.12.x/22).
- Count how many unique racks we have seen, then multiple it by the number of physical servers in a rack (I assume it is 64 servers/rack).
Even though my methodology could provide insights that are never possible before, it has its shortcomings, which could lead to inaccurate results. The limitations are:
- The methodology requires an active instance on a rack for the rack to be observed. If the rack has no instances running on it, we cannot count it.
- We cannot know how many physical servers are in a rack. I assume Amazon has dense racks, each rack has 4 10U chassis, and each chassis holds 16 blades.
- My methodology cannot tell whether the racks I observe are for EC2 only. It could be possible that other AWS services (such as S3, SQS, SimpleDB) run on virtual servers on the same set of racks. It it also possible that they run on dedicated racks, in which case, AWS is bigger than what I can observe. So, what I am observing is only a lower bound on the size of AWS.