The cost of eventual consistency

Amazon recently announced a set of new features (consistent read, conditional put and delete). You can read Werner’s and Jeff Barr’s post to learn more about the details of the features. On the surface, it seems like just another set of incremental features, but it is actually quite significant from a system design perspective. The CAP theorem states that a system can only attain two out of three properties: Consistency, Availability and Partition tolerance. In the past, Amazon has been preaching the eventual consistency model, where they give up consistency (C in CAP) in favor of availability (A in CAP ) and partition tolerance  (P in CAP). The new features are actually on a different design point than all their offerings in the past, i.e., giving up availability in favor of consistency and partition tolerance.

Since the features are at a totally different design point, they may have a very different performance characteristic. Werner, in his post, states that Consistent Read has “Higher read latency” and “Lower read throughput”. To see how different, we run a set of experiments in EC2. The results: there are no differences, both in terms of latency and throughput, that we could perceive. For details on our experiments, read on.


Single thread

We run several experiments to understand the latency differences. First, we run GetAttribute (both with and without consistency read) and PutAttribute (both with and without conditional put) on a single item and a single attribute sequentially in a single thread for 1,000 times, and then measure the average latency. For conditional PutAttribute, we first read the previous value and use it as a condition, then randomly generate a new value for the put request. The time reported for PutAttribute is only for the PutAttribute API call, which does not include the time to read the previous value. The average latency is:

GetAttribute                                                22ms
GetAttribute with ConsistentRead     23ms
PutAttribute                                                110ms
PutAttribute Conditional                       118ms

The time difference between the two versions of GetAttribute and PutAttribute are small enough that they are essentially the same. Over many runs of the experiment, we have seen many cases where a consistent read GetAttribute or a conditional PutAttribute takes few milli-seconds longer.

Multiple threads

Accessing a single attribute sequentially, as in the last experiment, is far from stressing SimpleDB. We also run a set of experiments to see what happens when there are multiple parallel GetAttribute and PutAttribute. We use multiple threads on a m1.small EC2 instance and we vary the number of threads to increase the level of parallelism.

For PutAttribute, we run 10,000 requests and measure the overall time it takes for all threads to finish all requests. The results are shown in the following figure. There are 4 curves. The “Single item, PutAttribute” curve is writing a single attribute to a single item repeatedly. The “Single item, PutAttribute Conditional” curve is similar except that it invokes the conditional PutAttribute, where the condition is checking a fictitious attribute to be non-existent. We could not use a condition based on the attribute’s previous value, because with multiple threads, the PutAttribute call would almost always fail because other threads would have written over the previous value between the GetAttribute and PutAttribute call. The two “1000 items” curves are similar to their “Single item” cousin, except that we randomly choose 1 out of 1000 items to write to for each request. The goal is to spread out the workload.

SimpleDB PutAttribute consistent performance test

There are two observations from the figure. First, the conditional version is almost identical to the normal eventual consistent version. The small differences between the two versions can be all attributed to the normal variation in accessing SimpleDB. Second, the single item PutAttribute takes longer to process, presumably because the load on the single item is much higher when all threads are accessing the same item at the same time. However, as we increase the parallelism, the longer latency could be effectively hidden by the multiple threads. At 100 threads, the “Single item” and “1000 items” results are almost identical.

If we increase the number of threads beyond 100, the overhead of thread scheduling starts to dominate. Even though not shown, we actually see the overall processing time increases as we increase the number of threads further.

For GetAttribute, we also run 10,000 requests and measure the overall time. However, to avoid any caching effect, we have 5 threads running in the background constantly writing new values to the items. The following figure shows the results for GetAttributes.

SimpleDB GetAttribute consistency performance test

From the figure, we can arrive at the two similar observations as the PutAttribute case. First, the ConsistentRead version is almost identical to the normal eventual consistent read. Again, the differences can all be attributed to the normal SimpleDB fluctuation. Second, GetAttribute on a single item takes longer because of the higher stress. Again, the latency could be effectively hidden with more threads.


There is not an easier way to measure the sustainable throughput than jamming the system with a large number of requests, and that is exactly what we did. We used Cloud MapReduce to write the test cases, so that we can easily scale up to a large number of nodes. SimpleDB has a much smaller write throughput than a read throughput, so to jam SimpleDB, we run 100 threads of GetAttribute and 100 threads of PutAttribute simultaneously on each map task, where each map task runs on a separate m1.small EC2 instance. With 4 m1.small EC2 instances running in parallel, we can consistently overload SimpleDB with the following error message.

Response Status Code: 503
<Response><Errors><Error><Code>ServiceUnavailable</Code><Message>Service AmazonSimpleDB is currently unavailable. Please try again later</Message></Error></Errors><RequestID>613df115-aee4-f86f-7a22-9a94f9b1633c</RequestID></Response>

With 2 or 3 EC2 instances, we occasionally trigger the error message. It seems to depend on the load other people are putting on SimpleDB as well. Regardless, we are not able to perceive a difference between the conditional PutAttribute, ConsistentRead GetAttribute and their normal eventual consistent counterpart.


During the 3 days we performed our test, we are not able to observe a total system failure. There may be single node failures in the Amazon cluster, but we are not able to observe externally. From what we see, the strong consistent Get and Put behave exactly the same as an eventual consistent version, both in terms of latency and throughput. So, according to CAP theorem, did we just get consistency for free without sacrificing availability?

5 Responses to The cost of eventual consistency

  1. Mocky says:

    Your testing shows the same results as my testing. When SimpleDB is healthy, conditional writes and consistent reads have no extra latency or throughput. However availability doesn’t measure well with a short sample size.

    I believe that if exactly one of my domain replicas becomes slow to respond, that normal operations will be unaffected, but that consistent operations will reflect this slowdown. My conclusion is that the latency and throughput penalties will only be observable during the times when SimpleDB experiences internal slowdowns.

    • huanliu says:

      I totally agree with you in theory. During the 3 days, we tried to continuously run our test, hoping to spot any failures behind the scene, but we were not able to observe anything. I would not be surprised that SimpleDB is able to repair failures quickly, so that it is not visible to the clients at all.

  2. Ashley Tate says:

    Nice write-up. According to the folks at Amazon, consistent reads are reliable 99.99% of the time, and only unreliable during certain, rare network failures in their data center. In practice I suppose this means an application depending on consistent reads would fail horribly for a brief time under those conditions, when it would otherwise have continued plugging away with little or no interruption using eventually consistent reads. So, no, CAP is not violated; you still must decide how important that last “9” of uptime is to your business model before choosing to depend all or in part on consistent reads.

    • huanliu says:

      Do you have a reference for that 99.99% reliability? As far as I know, there is no SLA on SimpleDB. Even S3 (the eventual consistent version) only has a 99.9% SLA.

      • Ashley Tate says:

        The 99.99% figure is unofficial–from an email conversation with folks at Amazon in response to my questions about the architecture of consistent reads.

        (Sorry, for the delayed response–I was never notified of your comment.)

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: