Amazon’s physical hardware and EC2 compute unit

June 14, 2010

Ever wonder what hardware is running behind Amazon’s EC2? Why would you even care? Well, there are at least a couple of reasons.

Side-by-side comparisons. Amazon expresses its machine power in terms of EC2 compute units (ECU), while other cloud providers just express it in terms of the number of cores. In either case the measure is vague, and you cannot perform an economic comparison between different cloud offerings, or against the owning-your-own-hardware approach. Knowing how much raw hardware power an EC2 compute unit represents allows you to perform an apples-to-apples comparison.
Physical isolation. In many enterprise clients’ minds, security is the number one concern. Even though hypervisor isolation is robust, they feel more comfortable if there is a physical separation, i.e., they do not want their VM to sit on the same physical hardware right next to a hacker’s VM. Knowing the hardware’s computing power and the largest VM’s computing power, one can determine whether there is enough room left to host a hacker’s VM.

The observations below are based only on what we see in the N. Virginia data center, and the underlying hardware may well be quite different in other data centers (i.e., Ireland, N. California and Singapore). If you are curious, feel free to use the methodology described here to see what is going on in other data centers.

Our observation is based on a combination of “hints” from several different tools and methods, including the following:

CPUID

The “cpuid” instruction is supported by all x86 CPU manufacturers, and it is designed to report the capabilities of the CPU. This instruction is non-trapping, meaning that you can execute it in user mode without triggering a protection trap. Under the paravirtualized Xen hypervisor (what Amazon uses), this means that the hypervisor cannot intercept the instruction and change the result that it returns. Therefore, the output from “cpuid” is the real output from the physical CPU.

We look at several fields in the “cpuid” output. First and foremost, we look at the branding string, which identifies the CPU’s model number. We also look at the “local APIC physical ID” in (1/ebx), i.e., “cpuid” leaf 1, register EBX. The “APIC ID” is unique for a physical core, so by enumerating all “APIC ID”s we know how many physical cores there are. Lastly, we look at “Logical CPU cores” in (0x80000008/ecx), which is supposed to show how many hyper-thread cores are on a physical core.
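As a rough illustration of the APIC ID step, the sketch below decodes the ID from a raw EBX value of “cpuid” leaf 1, where the initial APIC ID occupies bits 31 to 24. It does not execute “cpuid” itself. The function names are ours, and the sample EBX values are hypothetical: they are built from the APIC IDs reported later in this post for the high-memory instances, with made-up filler in the lower bits.

```python
def initial_apic_id(ebx: int) -> int:
    """Return bits 31..24 of CPUID leaf 1 EBX, the initial APIC ID."""
    return (ebx >> 24) & 0xFF


def count_distinct_apic_ids(ebx_values) -> int:
    """Count distinct APIC IDs among EBX values sampled on each visible CPU."""
    return len({initial_apic_id(ebx) for ebx in ebx_values})


# Hypothetical raw EBX values: the APIC IDs come from this post, while the
# low 24 bits (0x100800) are made-up filler, not measured data.
reported_ids = [1, 3, 5, 7, 17, 19, 21, 23]
sample_ebx = [(apic_id << 24) | 0x100800 for apic_id in reported_ids]

decoded = sorted(initial_apic_id(ebx) for ebx in sample_ebx)
print("APIC IDs:", decoded)
print("distinct IDs:", count_distinct_apic_ids(sample_ebx))
```

In practice the EBX values would have to be collected by running “cpuid” once on each virtual CPU, for example by pinning the process to one CPU at a time.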

Intel processor specifications

With the model numbers reported by “cpuid”, we can look up the data sheets to determine the exact specification of a processor, including how many cores it has per socket, how many sockets a system can have, and its cache size.

/sys/devices/system/cpu/cpu?/cache/index?/shared_cpu_map

This is a file in the Linux file system. It lists the cache hierarchy including which cores share a particular cache. This is used as a validation to match against the cache hierarchy specified in the CPU data sheet. However, this is not used to reach any conclusion, as we have seen it reporting wrong information in some cases.
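For readers who want to repeat this check, here is a minimal sketch of decoding those files. It assumes the usual sysfs format, a hexadecimal bitmask that may be split into comma-separated groups, with bit N set when CPU N shares the cache. The function names are ours.

```python
import glob


def parse_cpu_mask(text: str) -> list:
    """Turn a sysfs CPU bitmask such as '00000000,0000000f' into CPU indices."""
    mask = int(text.strip().replace(",", ""), 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def read_cache_sharing(
    pattern="/sys/devices/system/cpu/cpu*/cache/index*/shared_cpu_map",
):
    """Map each shared_cpu_map path to the list of CPUs sharing that cache."""
    sharing = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            sharing[path] = parse_cpu_mask(handle.read())
    return sharing


if __name__ == "__main__":
    for path, cpus in read_cache_sharing().items():
        print(path, cpus)
```

As noted above, treat the result only as a cross-check against the data sheet, since the file can report wrong information inside a VM.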

Performance benchmark

We use PassMark-CPU Mark, a performance benchmark, to compare the CPU performance with that of other systems with the same CPU configuration. A matching performance number would confirm our observation.

System statistics

A variety of tools, such as “mpstat” and “top”, can report on the system’s performance statistics, including the CPU and memory usage. In particular, on a Xen hypervisor, a VM can get the steal cycle statistics, i.e., the time that is stolen from the VM to run other things, including other VMs. The documentation states that the steal cycle counts the amount of time that your VM is ready to run but could not because others were competing for the CPU. Thus, if you keep your VM busy, you will see all the CPU cycles that are stolen from you. For example, on an m1.small VM, you will see the steal cycle at roughly 60%, and you can keep your CPU busy at most 40% of the time. This is a hard cap Amazon puts in place to limit you to one EC2 compute unit.
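The same number that “mpstat” and “top” show can be derived by hand from two samples of the aggregate “cpu” line in /proc/stat on Linux, where the eighth counter is steal time. The sketch below is one way to do that; the function names are ours and the two sample lines are made up to mimic the roughly 60% figure mentioned above.

```python
def cpu_counters(stat_line: str) -> list:
    """Return the first eight counters (user .. steal) of a /proc/stat cpu line."""
    fields = stat_line.split()
    if not fields or not fields[0].startswith("cpu"):
        raise ValueError("not a cpu line from /proc/stat")
    counters = [int(value) for value in fields[1:9]]
    # Very old kernels expose fewer columns; treat the missing ones as zero.
    return counters + [0] * (8 - len(counters))


def steal_percent(before: str, after: str) -> float:
    """Percentage of elapsed CPU time reported as steal between two samples."""
    deltas = [new - old for old, new in zip(cpu_counters(before), cpu_counters(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total


# Made-up samples: 40 ticks of user time and 60 ticks of steal elapsed.
sample_before = "cpu 100 0 50 800 10 0 0 40 0 0"
sample_after = "cpu 140 0 50 800 10 0 0 100 0 0"
print(steal_percent(sample_before, sample_after))
```

Only the first eight counters are summed because guest time, when present, is already included in user time.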

Now that the methodology is clear, we can dive into the observations. Amazon’s infrastructure runs on three distinct sets of hardware.

High-memory instances

The high-memory instances (m2.xlarge, m2.2xlarge, m2.4xlarge) run on systems with dual-socket Intel Xeon X5550 (Nehalem) 2.66GHz processors. The Intel Xeon X5550 processor has 4 cores, and each core is capable of hyper-threading, i.e., there could be 8 cores per processor from the software’s point of view. However, Amazon appears to have disabled hyper-threading, because “cpuid” 0x80000008/ecx reports that there is only one logical core. Further, the APIC IDs are 1, 3, 5, 7, 17, 19, 21, 23. The missing IDs (9, 11, 13, 15) are probably reserved for the hyper-threading cores and they are not used. The m2.4xlarge instance occupies the whole physical hardware. An m2.4xlarge instance’s PassMark-CPU Mark score is 10,052.6, on par with other dual-socket X5550 systems (the average is 10,853). Furthermore, we never observe a steal cycle beyond 1 or 2%.

High-CPU instances

The high-CPU instances (c1.medium, c1.xlarge) run on systems with dual-socket Intel Xeon E5410 2.33GHz processors. We know it is dual-socket because we see APIC IDs 0 to 7, and an E5410 has only 4 cores. A c1.xlarge instance takes up almost the whole physical machine. However, we frequently observe a steal cycle on a c1.xlarge instance, ranging from 0% to 25% with an average of about 10%. That amount of steal cycle is not enough to host another, smaller VM, i.e., a c1.medium. Maybe those steal cycles are used to run Amazon’s software firewall (security group). On PassMark-CPU Mark, a c1.xlarge machine scores 7,962.6, which is actually higher than what an average dual-socket E5410 system achieves (the average is 6,903).

Standard instances

The standard instances (m1.small, m1.large, m1.xlarge) run on systems with a single-socket Intel Xeon E5430 4-core 2.66GHz processor. An m1.small instance may occasionally run on a system with an AMD Dual-Core Opteron 2218 HE processor, but that system is rare (<10%), so we do not focus on it here. We know the Xeon E5430 platform is single-socket because we only see APIC IDs 0, 1, 2, 3.

By simple deduction, we can reason that an m1.xlarge instance does not take up the whole physical machine. Since a c1.xlarge instance is 20 ECUs (8 cores at 2.5 ECU each), we can reason that an E5410 processor is at least 10 ECU. Thus an E5430 would have roughly 11.4 ECU since its clock frequency is a little higher than that of an E5410 (2.66GHz vs. 2.33GHz). Since an m1.xlarge instance has only 8 ECU (4 cores at 2 ECU each), there is room for at least 3 more m1.small instances. This is an example where knowing the physical hardware configuration helps us reason about the CPU power allocated. In addition to reasoning by hardware configuration, we also observe large steal cycles on an m1.xlarge instance, which ranges from 0% to 75% with an average of about 30%.
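The deduction above is plain arithmetic, and it can be written out as a short script. All figures are the ones quoted in this post; the variable names are ours, and scaling ECU linearly with clock frequency is the same simplifying assumption the text makes.

```python
import math

# c1.xlarge: 8 cores at 2.5 ECU each, spread over two E5410 sockets.
c1_xlarge_ecu = 8 * 2.5
e5410_ecu = c1_xlarge_ecu / 2  # lower bound per E5410 processor

# Scale by clock frequency (assumption: ECU grows linearly with GHz).
e5430_ecu = e5410_ecu * 2.66 / 2.33

# m1.xlarge: 4 cores at 2 ECU each, on a single E5430 socket.
m1_xlarge_ecu = 4 * 2
spare_ecu = e5430_ecu - m1_xlarge_ecu

# An m1.small is capped at one ECU, so count whole ECUs left over.
extra_m1_small = math.floor(spare_ecu / 1.0)

print(f"E5410 >= {e5410_ecu:.1f} ECU")
print(f"E5430 ~= {e5430_ecu:.1f} ECU")
print(f"spare on an m1.xlarge host ~= {spare_ecu:.1f} ECU")
print(f"room for {extra_m1_small} more m1.small instances")
```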

An m1.xlarge instance achieves a PassMark-CPU Mark score of 3,627.8. We cannot find other single-socket E5430 systems in PassMark’s database, but the score is less than half of what a c1.xlarge instance is able to achieve. This again confirms a large steal cycle.

In conclusion, we believe that c1.xlarge and m2.4xlarge instances each occupy their own physical hardware. Security-conscious users should choose those instances to avoid co-location hacking. In addition, an Intel Xeon X5550 has 13 ECU, an Intel Xeon E5430 has about 11 ECU, and an Intel Xeon E5410 has 10 ECU, where an ECU is roughly equivalent to a PassMark-CPU Mark score of 400. Using this information, you can perform an economic comparison between the cloud and your favorite alternative approach.
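As a quick sanity check on the “roughly 400 per ECU” figure, the sketch below divides the PassMark scores quoted in this post by each instance’s ECU rating. The m2.4xlarge rating of 26 ECU is taken as two sockets at 13 ECU each; the dictionary layout and names are ours.

```python
# (PassMark-CPU Mark score, ECU rating) per instance type, from this post.
measurements = {
    "c1.xlarge": (7962.6, 20),    # 8 cores at 2.5 ECU
    "m2.4xlarge": (10052.6, 26),  # 2 sockets at 13 ECU
    "m1.xlarge": (3627.8, 8),     # 4 cores at 2 ECU
}

score_per_ecu = {
    name: score / ecu for name, (score, ecu) in measurements.items()
}

for name, value in score_per_ecu.items():
    print(f"{name}: {value:.0f} PassMark points per ECU")
```

The two instance types that own their hardware land a little under 400, while the m1.xlarge figure comes out higher.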

Filed under Cloud, technology Tagged with Amazon EC2, cloud hardware, ECU

14 Responses to Amazon’s physical hardware and EC2 compute unit

Shlomo Swidler says:

June 14, 2010 at 8:18 pm

A comparison of the ECU vs. other cloud providers’ CPU performance is here:
http://blog.cloudharmony.com/2010/05/what-is-ecu-cpu-benchmarking-in-cloud.html

ophirk says:

June 16, 2010 at 11:47 am

Very interesting read and technological analysis.
Have you considered running vMARK from VMWARE ?
Would be interesting to see the PassMark memory results as well.

- huanliu says:
  
  June 16, 2010 at 2:57 pm
  
  The reason I used PassMark is that it provides an online database of other comparable systems’ PassMark results. vMARK is interesting, but I have no access to comparable systems myself, so it is hard to compare. I could not find an online database of PassMark memory results, but I do find one for hard disk. It will be on my todo list for a future I/O performance benchmark post.
  
Gautam says:

August 27, 2010 at 2:02 am

We’re trying to run a proprietary vendor program that does its licensing by “CPU sockets”. So, if this software supports 2 CPU sockets and each socket had, say, 4 cores, then the program could run on 2*4=8 cores. Unfortunately, on a c1.medium instance, it’s reading (I think) the “2 cores” as 2 separate CPU sockets, instead of a single CPU with 2 cores. (I haven’t experimented yet with c1.xlarge). So, is there any way of getting the Amazon Linux OS (specifically CentOS) to “show” any program running on it that it’s truly just using 1 physical socket?

- huanliu says:
  
  August 27, 2010 at 4:52 am
  
  I am guessing the software is running “cpuid” to determine the number of sockets. Unfortunately, “cpuid” does not trap in the Intel architecture, so the software is reading the real physical hardware information. Since there are really two sockets in the physical hardware, your software determines that you have two sockets, even though you only have access to 2 cores. Since the high-memory instances also run on dual-socket hardware, your only bet is to use the standard instances. An extra-large standard instance has more horsepower than a c1.medium, so hopefully it is good enough for you.
  
  Alternatively, you can try a vCloud Express provider. The VMware hypervisor uses a different technique (code rewriting) to virtualize, which can trap these non-privileged instructions so that it can overwrite the raw result returned by the hardware. You may get lucky, and your software would see one socket even though the hardware has two. But there is no guarantee, as it really depends on the algorithm your software uses.
  
Mark says:

October 7, 2012 at 6:51 pm

Amazon states that 1 ECU is on average equal to 1.1 GHz. The “High-CPU” server has 8 cores running at 2.33 GHz, but the “High-CPU Extra Large Instance” has 8 cores with 2.5 ECU each. The “High-Memory Quadruple Extra Large Instance” has 8 cores with 3.25 ECU each and is occupying the whole system, which has 8 cores running at 2.66 GHz. So then 1 ECU is less than 1 GHz?

I must be missing something? An explanation on this would help me a lot as I’m trying to put together a model for different cloud providers.

Beniamin Dziurdza says:

February 26, 2021 at 11:41 am

Are there more recent reports?
