Eventual consistency, its manifestation in Amazon Cloud and how to design your applications around it
June 23, 2009 2 Comments
A cloud is such a large-scale system that its implementation requires tradeoffs in its design goals. The Amazon cloud has adopted a weaker consistency model called eventual consistency. It has implications on how we can architect and design applications to overcome its limitations. This post talks about the problems this weaker consistency model exposes and how we designed our applications around it.
Eric Brewer, professor at UC Berkeley and founder of Inktomi, conjectured that a large system such as a cloud can only achieve two out of three properties: data consistency, system availability and tolerance to network partition. Being a public infrastructure, a cloud cannot sacrifice on system availability. Due to its scale and performance requirements, a cloud has to rely on a distributed implementation. Thus, it cannot sacrifice on its tolerance on network partitioning either. Therefore, a cloud has to trade off against data consistency and that is exactly what Amazon Cloud has done.
Eventual consistency states that when you write to and then read from a cloud immediately, the return result may not be exactly what you are expecting. However, the system guarantees that, “eventually” it will return what you expect it to return, but the time it takes is undeterministic.
In the Amazon implementation, the effects of eventual consistency is only noticeable on two fronts, both are in the SQS (Simple Queue Service) implementation.
- When two clients are asking SQS to create the same queue at the same time, both could be notified that the queue has been created successfully. Normally if you create the same queue twice at different times, the second attempt will get an error message stating that the queue exists already. This could be a problem if your application relies on the fact that a queue can only be created once. In one of the applications we developed, we have to completely re-architect in order to get around the problem. Since the solution is application specific, I would not elaborate on how we solved the problem.
- When a client reads from a SQS queue, especially when the queue is getting close to being empty, the queue may respond to say that it is empty even though there are still messages in it. This could be a problem for your application if your application thinks the queue is empty when it is not. Amazon documentation says, in most cases, you only need to wait for “a few seconds”, but it is unclear how many are “a few” since it depends on many factors, such as SQS load, data distribution and degree of replication etc.
We have devised a solution for the second problem. The solution is more widely applicable; hence, we describe it here in the hope that it is helpful to someone.
The idea is simple. A message producer could count how many messages it has generated for a particular queue so that the consumer for that queue would know how many messages to expect. In the solution we used for an application, we used Amazon SimpleDB to host this count data. Since we have many producers for each queue, we require each producer to write a separate row in SimpleDB, and all counts for a queue fall under the same column. A sample SimpleDB table is shown below:
queue1 queue2 queue3 ….
producer1 c_11 c_12 c_13
producer2 c_21 c_22 c_23
When the consumers start to process (may need a separate status to know all producers have stopped already), they can run a query against SimpleDB, such as
select queue1 from domain_name
The query will return a list of items. The consumers just need to iterate through the list and sum up all numbers under the same column (e.g., queue1). The sum is what the consumers should check. If the number of messages dequeued is less than the sum, the consumers should continue polling SQS. This solution avoids setting an arbitrary wait time, which either results in idle processing, or results in missed messages.
Hope this solution is helpful. Let me know if there are other symptoms of eventual consistency that I have missed.