How to choose a load balancer for the cloud

If you are hosting a scalable application (e.g., a web application) in the cloud, you have to choose a load balancing solution so that you can spread your workload across many cloud machines. Even though dedicated solutions already exist, choosing one is far from obvious: you have to evaluate each candidate from both the cost and the performance perspectives. We illustrate these considerations with two examples.

First, let us take Amazon's Elastic Load Balancing (ELB) offering and evaluate its cost implications. Assume you have an application that sends/receives 25Mbps of traffic on average. ELB will cost you $0.008/GB * 25Mbps * 3600 sec/hour = $0.09/hour, already more than the cost of a small EC2 Linux instance in N. Virginia. This cost makes ELB unsuitable for most applications. If your application does not have a lot of traffic, ELB makes sense economically; but at that small amount of traffic (< 25Mbps), you most likely do not need a load balancer at all. We have run performance studies based on the SPECweb benchmark, a suite of benchmarks designed to simulate realistic web applications. Even for the most computation-intensive benchmark in the suite (the banking benchmark), a small EC2 instance can handle 60Mbps of traffic, and a larger c1.xlarge instance can process 310Mbps. This means that even if your application is 10 times more CPU intensive per unit of traffic, you can still comfortably host it on a c1.xlarge instance.

If your application has a larger amount of traffic (> 25Mbps), it is more economical to roll your own load balancer: in our test, a small EC2 instance can forward 400Mbps of traffic even for a chatty application with many small user sessions. Under the current pricing scheme, ELB only makes sense if your application is very CPU intensive, or if the expected traffic fluctuates widely. You can refer to our benchmarking results (CloudCom paper, section 2) and calculate the tradeoff based on your own application's profile.
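
To sanity-check the arithmetic, here is a back-of-the-envelope sketch in Python. It uses the 2009-era figures quoted above (ELB's data-processing charge only) plus an assumed small-instance price of about $0.085/hour; treat all the constants as placeholders to be updated with current pricing.

    # Back-of-the-envelope ELB cost check, using the 2009-era prices
    # quoted above. The small-instance price is an assumption; update
    # both constants for current pricing.

    ELB_PER_GB = 0.008             # USD per GB of traffic processed by ELB
    SMALL_INSTANCE_HOURLY = 0.085  # assumed USD/hour, small EC2 Linux instance

    def elb_hourly_cost(avg_mbps: float) -> float:
        """Hourly ELB data-processing cost at a given average traffic rate."""
        gb_per_hour = avg_mbps / 8 / 1024 * 3600  # Mbps -> GB per hour
        return gb_per_hour * ELB_PER_GB

    if __name__ == "__main__":
        for mbps in (5, 25, 100, 400):
            print(f"{mbps:>4} Mbps: ELB ~ ${elb_hourly_cost(mbps):.3f}/hour "
                  f"vs. a small instance at ${SMALL_INSTANCE_HOURLY:.3f}/hour")

At 25Mbps the script prints roughly $0.088/hour, matching the $0.09/hour figure above; it is this crossover with the instance price that makes rolling your own attractive beyond that rate.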

Second, we have to look at the performance a load balancing solution can deliver. You cannot simply assume a solution will meet your performance requirements until you test it out. For example, Google App Engine (GAE) promises unlimited scalability: you can simply drop in your web application and Google handles the automatic scaling. Alternatively, you can run a load balancer in Google App Engine and, in theory, load balance an unlimited amount of traffic. Even though it sounds promising on paper, our test shows that it cannot support more than 100 simultaneous SPECweb sessions (< 5Mbps) due to its burst quota. To put this into perspective, we are able to run tests that support 1,000 simultaneous sessions even on a small Amazon EC2 instance. We worked with the GAE team for a while trying to resolve the limitation, but we were never able to get it working. Others have noticed its performance limitation as well. Note that this happened between Feb. and Apr. of 2009, so the limit may have improved since then.

The two examples illustrate that you have to do your homework to understand both the cost and the performance implications. You have to understand your application's profile and conduct performance studies for each potential solution. Setting up performance testing is time consuming; fortunately, we have already done some of the legwork for the common solutions, and you can leverage our performance report (section 2 of our CloudCom paper). We have also set up a fully automated performance testing harness, so if you have a scenario that is not covered, we will be happy to help you test it out.

The two examples also illustrate that you cannot blindly rely on a cloud provider's solution. In many cases, you still need to roll your own load balancing solution, for example, by running a software load balancer inside a cloud VM. The existing software load balancers differ in design, and hence in their performance characteristics. In the following, we discuss some tradeoffs in choosing the right software load balancer.

A load balancer can forward traffic either at layer 4 or at layer 7. At layer 4 (the TCP layer), the load balancer only sees packets: it inspects each packet's header and decides where to forward the packet. It does not need to terminate TCP sessions with the users or originate TCP sessions with the backend web servers, so it can be implemented very efficiently. Note, however, that not all layer 4 load balancers work in the Amazon cloud. Amazon disallows source IP spoofing, so if a load balancer forwards an incoming packet as-is (i.e., keeping the user's source IP address intact), the packet is dropped because its source IP does not match the load balancer's IP address.

At layer 7 (the application layer), a load balancer has to terminate the TCP connection, receive the HTTP content, and then relay that content to the web servers. For each incoming TCP session from a user, the load balancer not only has to open a socket to terminate the incoming session, it also has to originate a new TCP session to one of the web servers to relay the content. Because of this extra state, a layer 7 load balancer is less efficient. The penalty is especially bad when SSL is enabled, because the load balancer has to terminate the incoming SSL connection, and possibly originate a new SSL connection to the web servers, which is a very CPU-intensive operation.
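
To make the layer 7 overhead concrete, here is a minimal sketch of the terminate-and-relay pattern in plain Python sockets. It is an illustration, not a usable load balancer: it relays raw bytes instead of parsing HTTP, and the backend addresses are placeholders. The point to notice is that every user session costs the balancer two sockets' worth of state plus a copy of every byte.

    import itertools
    import socket
    import threading

    # Placeholder backend addresses; a real deployment would use your
    # web servers' private IPs.
    BACKENDS = [("10.0.0.1", 80), ("10.0.0.2", 80)]
    _next_backend = itertools.cycle(BACKENDS)  # naive round-robin

    def pipe(src: socket.socket, dst: socket.socket) -> None:
        """Copy bytes from src to dst until EOF. This per-byte copying,
        absent in a layer 4 forwarder, is part of the layer 7 overhead."""
        try:
            while chunk := src.recv(4096):
                dst.sendall(chunk)
        finally:
            dst.close()

    def handle(client: socket.socket) -> None:
        # Terminate the user's TCP session and originate a second one
        # to a backend: two TCP state machines per user session.
        backend = socket.create_connection(next(_next_backend))
        threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
        threading.Thread(target=pipe, args=(backend, client), daemon=True).start()

    def main() -> None:
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(("0.0.0.0", 8080))
        listener.listen(128)
        while True:
            client, _addr = listener.accept()
            handle(client)

    if __name__ == "__main__":
        main()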

Now that the general theory is behind us, let us look at some free load balancers out there and discuss their performance tradeoffs.

HAProxy

HAProxy can operate in either layer 4 or layer 7 mode. However, if you want session persistence (the same user is always load balanced to the same backend server), you have to operate it at layer 7. This is because HAProxy implements persistence with cookies, and to manipulate cookies, it has to operate at layer 7. Using cookies alleviates the need to keep local session state. Probably at least partly for this reason, HAProxy performs very well in our tests: for non-SSL traffic, it is almost as efficient as the layer 4 load balancers.
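
For illustration, here is a minimal haproxy.cfg sketch of the cookie-based persistence described above; the frontend/backend names, server names, and addresses are all placeholders.

    # Hypothetical haproxy.cfg fragment: layer 7 (HTTP) mode with
    # cookie-based session persistence. Names/addresses are placeholders.
    frontend http-in
        bind *:80
        mode http
        default_backend webfarm

    backend webfarm
        mode http
        balance roundrobin
        # Insert a SERVERID cookie so a returning user is load balanced
        # to the same backend server, without local session state.
        cookie SERVERID insert indirect nocache
        server web1 10.0.0.1:80 check cookie web1
        server web2 10.0.0.2:80 check cookie web2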

One drawback of HAProxy is that it does not support SSL termination. You therefore have to run a front end (e.g., an Apache web server) to terminate the SSL connection first. If the front end is hosted on the same server, it reduces how much traffic can be load balanced; in fact, SSL termination and origination (to the backend web servers) can drain a significant share of the CPU capacity. If the front end is hosted on a different server, the traffic between the SSL terminator and the load balancer travels in the clear, making eavesdropping easy.

Nginx

Nginx operates at layer 7. It can run either as a web server or as a load balancer. In our performance tests, Nginx consumes roughly twice the CPU cycles of the layer 4 load balancers, and the overhead is much greater when SSL termination is enabled.

Unlike HAProxy, Nginx natively supports SSL termination. Unfortunately, the backend traffic from the load balancer to the web servers is in the clear. Depending on how much eavesdropping you believe could happen in a cloud's internal network, that may or may not be acceptable to you.
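
For reference, here is a minimal nginx.conf sketch of this setup (to be placed inside the http block); the certificate paths, upstream name, and backend addresses are placeholders.

    # Hypothetical nginx.conf fragment: terminate SSL at Nginx and load
    # balance across backends. Paths and addresses are placeholders.
    upstream webfarm {
        server 10.0.0.1:80;
        server 10.0.0.2:80;
    }

    server {
        listen 443 ssl;
        ssl_certificate     /etc/nginx/ssl/server.crt;
        ssl_certificate_key /etc/nginx/ssl/server.key;

        location / {
            # Backend traffic is plain HTTP, i.e., in the clear on the
            # cloud's internal network.
            proxy_pass http://webfarm;
        }
    }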

Rock Load Balancer

Rock Load Balancer operates at layer 4. Among the three load balancers we evaluated, it has the highest performance. In particular, it appears to forward SSL traffic without terminating and re-originating the connections, which saves a lot of CPU cycles for SSL traffic. Unfortunately, Rock Load Balancer still has an open bug that prevents it from effectively utilizing all cores in a multi-core machine, so it is not suitable for very high-bandwidth (> 400Mbps) web applications that require multiple cores in the load balancer.

I have summarized the key pros and cons of the software load balancers we have evaluated, and I hope it helps you decide which load balancer to choose. If you have a good estimate of your application's profile, feel free to ping me and we will be happy to help.
