Eventual consistency, its manifestation in Amazon Cloud and how to design your applications around it

A cloud is such a large-scale system that its implementation requires tradeoffs among its design goals. The Amazon cloud has adopted a weaker consistency model called eventual consistency. This has implications for how we architect and design applications to work around its limitations. This post talks about the problems this weaker consistency model exposes and how we designed our applications around it.

Eric Brewer, professor at UC Berkeley and founder of Inktomi, conjectured that a large system such as a cloud can achieve only two out of three properties: data consistency, system availability, and tolerance to network partitions. Being a public infrastructure, a cloud cannot sacrifice system availability. Due to its scale and performance requirements, a cloud has to rely on a distributed implementation, so it cannot sacrifice its tolerance to network partitions either. Therefore, a cloud has to trade off data consistency, and that is exactly what the Amazon cloud has done.

Eventual consistency means that when you write to a cloud and then read from it immediately, the returned result may not be what you expect. However, the system guarantees that it will “eventually” return what you expect, although the time this takes is nondeterministic.
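To make the write-then-read behavior concrete, here is a toy sketch in Python. It is not how Amazon implements anything; the class name, the two-replica layout, and the explicit sync() step are all assumptions made purely for illustration.

```python
class EventuallyConsistentStore:
    """Toy model (hypothetical): a write lands on one replica and
    reaches the other replica only when sync() runs."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}

    def write(self, key, value):
        # The write is acknowledged before it has been replicated.
        self.primary[key] = value

    def read(self, key):
        # Reads are served by the replica that may lag behind.
        return self.secondary.get(key)

    def sync(self):
        # Stands in for background replication, which in a real
        # system happens after an unpredictable delay.
        self.secondary.update(self.primary)


store = EventuallyConsistentStore()
store.write("greeting", "hello")
stale = store.read("greeting")   # read immediately after the write
store.sync()                     # replication catches up
fresh = store.read("greeting")   # read again later
```

In this sketch the first read misses the value and the second one sees it. The only thing the caller cannot know in a real system is when the equivalent of sync() has happened.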

In the Amazon implementation, the effects of eventual consistency are noticeable on only two fronts, both of them in the SQS (Simple Queue Service) implementation.

  • When two clients ask SQS to create the same queue at the same time, both could be notified that the queue has been created successfully. Normally, if you create the same queue twice at different times, the second attempt gets an error message stating that the queue already exists. This could be a problem if your application relies on the fact that a queue can only be created once. In one of the applications we developed, we had to completely re-architect to get around the problem. Since the solution is application specific, I will not elaborate on how we solved it.
  • When a client reads from an SQS queue, especially when the queue is close to being empty, the queue may respond that it is empty even though there are still messages in it. This could be a problem if your application thinks the queue is empty when it is not. The Amazon documentation says that, in most cases, you only need to wait “a few seconds”, but it is unclear how many are “a few”, since it depends on many factors, such as SQS load, data distribution, and degree of replication.

We have devised a solution for the second problem. The solution is more widely applicable; hence, we describe it here in the hope that it is helpful to someone.

The idea is simple. A message producer could count how many messages it has generated for a particular queue so that the consumer for that queue would know how many messages to expect. In the solution we used for an application, we used Amazon SimpleDB to host this count data. Since we have many producers for each queue, we require each producer to write a separate row in SimpleDB, and all counts for a queue fall under the same column. A sample SimpleDB table is shown below:

             queue1    queue2    queue3    ….
producer1    c_11      c_12      c_13
producer2    c_21      c_22      c_23
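The producer side of this layout can be sketched as follows. This is a minimal in-memory stand-in, not SimpleDB client code: the CountTable class and its record_sent method are hypothetical names, and a real producer would write the count to its own SimpleDB item instead of a Python dict.

```python
from collections import defaultdict


class CountTable:
    """In-memory stand-in (hypothetical) for the SimpleDB domain:
    one row per producer, one column per queue."""

    def __init__(self):
        self.rows = defaultdict(dict)

    def record_sent(self, producer, queue, n=1):
        # Each producer touches only its own row, so two producers
        # never update the same cell.
        row = self.rows[producer]
        row[queue] = row.get(queue, 0) + n


table = CountTable()
table.record_sent("producer1", "queue1", 3)  # c_11
table.record_sent("producer1", "queue2", 1)  # c_12
table.record_sent("producer2", "queue1", 2)  # c_21
```

Keeping one row per producer means that a producer only ever updates counts that it alone owns, which is presumably why the per-producer rows are convenient when many producers feed the same queue.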

When the consumers start processing (they may need a separate status flag to know that all producers have already stopped), they can run a query against SimpleDB, such as

select queue1 from domain_name

The query returns a list of items. The consumers just need to iterate through the list and sum all the numbers under the same column (e.g., queue1). That sum is the number of messages the consumers should expect. If the number of messages dequeued is less than the sum, the consumers should continue polling SQS. This solution avoids setting an arbitrary wait time, which results in either idle processing (if the wait is too long) or missed messages (if it is too short).
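The consumer side can be sketched like this. Everything here is a simulation under stated assumptions: FlakyQueue is a hypothetical stand-in for SQS that randomly reports "empty" while messages remain, the rows dict stands in for the SimpleDB query result, and max_polls is a safety cap added for the sketch, not part of the original solution.

```python
import random


class FlakyQueue:
    """Simulated queue (hypothetical) whose receive() sometimes
    reports empty even though messages remain."""

    def __init__(self, messages, miss_rate=0.5, seed=0):
        self._messages = list(messages)
        self._rng = random.Random(seed)
        self._miss_rate = miss_rate

    def receive(self):
        if self._messages and self._rng.random() >= self._miss_rate:
            return self._messages.pop(0)
        return None  # truly empty, or a false "empty" answer


def expected_total(rows, queue):
    """Sum one queue's column across all producer rows."""
    return sum(row.get(queue, 0) for row in rows.values())


def drain(queue, expected, max_polls=10_000):
    """Keep polling until the expected number of messages arrived.
    An empty response is never treated as proof that we are done."""
    received = []
    polls = 0
    while len(received) < expected and polls < max_polls:
        polls += 1
        message = queue.receive()
        if message is not None:
            received.append(message)
    return received


# Stand-in for the result of "select queue1 from domain_name".
rows = {
    "producer1": {"queue1": 3, "queue2": 1},
    "producer2": {"queue1": 2},
}
queue1 = FlakyQueue(["m1", "m2", "m3", "m4", "m5"])
total = expected_total(rows, "queue1")
messages = drain(queue1, total)
```

The loop stops because the count has been reached, not because the queue said it was empty. A real consumer would probably pause briefly between empty responses rather than poll in a tight loop.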

Hope this solution is helpful. Let me know if there are other symptoms of eventual consistency that I have missed.

Cloud is time sharing?

Last year I was talking to Jeanne Harris, the author of the book “Competing on Analytics”. I was trying to explain what Cloud Computing is and what its various benefits are. It took a while for her to peel through the marketing hype, until finally she said: “I got it, it is time sharing!”

For those not familiar, time sharing is a concept introduced in 1957, and it was the prominent model of computing in the 1970s. Computers, such as mainframes, were expensive back then. To share the expensive machine more efficiently, programmers used remote terminals whose accesses were multiplexed onto a single computer.

Since the advent of PCs in the 80’s, computers have gotten cheaper over time and the prominent model of computing has shifted toward the client side. The main drivers are bandwidth and latency. Can you imagine running an interactive, UI-rich application over a remote terminal on a mainframe? Running it locally is a good tradeoff, even at the cost of lower machine utilization, because the machine is cheap anyway.

Things have been changing again over the last decade. With the advances in search, social networking, business intelligence, etc., we are all of a sudden inundated with data to analyze. The problems we try to solve, and hence our computation capacity needs, have gotten dramatically bigger. Economics is again in favor of a time-sharing model where many people share the expensive Cloud. Because of its scale, only a handful of companies, such as Amazon, Google, Yahoo, and Microsoft, can afford to build such a Cloud.

History does come around, doesn't it?