#88: Eventual Consistency and the CAP Theorem

Have you ever come across a website or mobile app that occasionally takes time to reflect changes?

This may be due to Eventual Consistency: a side effect of decisions that technologists make about the CAP Theorem.

These are important decisions that the business should be aware of, but often isn't.


Published: Wed, 09 Jun 2021 15:44:53 GMT

Transcript

Hello and welcome back to the Better ROI from Software Development podcast.

In this week's episode, I'm going to look at quite a technical subject, probably more technical than anything I've covered in previous episodes. So I'd really appreciate any feedback on whether I've hit the nail on the head and explained it well, or whether I'm missing something in how I've covered it.

Again, this is probably slightly more technical than previous episodes, but I think it matters. It is a technical concept, but how it is applied, and the decisions made around it, have a real impact on the business. That is why business owners and business leaders need to understand when it's being used.

Let's start by talking about how this might feel as a customer.

If you're using a website or a mobile application, you make changes, but sometimes it feels as if they take a while to take effect.

For example, maybe you're using Amazon and you add a product, only to find it doesn't appear in your basket straight away.

Or you've just watched the next episode of your favourite TV series on Netflix, yet when you return to the home screen it still prompts you to watch it.

This could be an example of something called eventual consistency.

To illustrate eventual consistency, I'm going to use an example.

Congratulations, you are the proud owner of the world's largest and most fashionable sock empire.

Your unique pattern took the fashion world by storm.

It's desired by celebrities, social media influencers, even royalty. Everybody wants your unique, one-of-a-kind pattern.

And to provide for demand, you run three factories, one in New York, one in Paris and one in Hong Kong.

And you've been riding the wave of success for some time now - so much so that you want to change it up. You want to invent a new pattern to take the world by storm and carry on with the success.

But a bit like the Colonel and his secret special recipe of herbs and spices, you're the only one that really knows how the pattern works. As such, you need to go out and train and set up the factories one by one, taking a week at a time to swap over to the new pattern.

So let's say you've done New York. New York is ready and you're about to start on Paris.

So you take Paris offline for the week it takes you to update the machinery and train the staff. During that week, you receive a huge number of orders.

To fulfil those orders, you need to produce socks at both your New York and your Hong Kong factories. This means that, depending on where an order goes, the customer could receive the new pattern (because it went to the New York factory) or the old pattern (because it went to the Hong Kong factory).

To meet the demand, we need to use both factories. But we can't be confident which pattern the customer will get.

And this is a demonstration of eventual consistency.

Before you made the change, all three factories would have been producing the same original pattern.

After you've updated all the factories, they'd all be producing the new pattern.

While you're updating the factories a customer could get either pattern.
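For readers who want to see the factory example as code, here is a minimal Python sketch that makes some simplifying assumptions. Each factory is treated as a copy (a replica) of one piece of data, the current pattern. One factory at a time goes offline for a week while it switches over. The function name `rollout` and this factory model are illustrative inventions, not part of any real system.

```python
def rollout(factories, old, new):
    """Yield (week, patterns a customer might receive) for a rollout that
    takes one factory offline per week while it switches to the new pattern."""
    pattern = {name: old for name in factories}
    for week, updating in enumerate(factories, start=1):
        # The factory being retooled is offline, so orders go to the others.
        producing = {pattern[f] for f in factories if f != updating}
        yield week, producing
        pattern[updating] = new  # retooling is finished at the end of the week
    # Once every factory has switched, they all agree again: the data has converged.
    yield len(factories) + 1, set(pattern.values())


for week, patterns in rollout(["New York", "Paris", "Hong Kong"], "original", "new"):
    print(week, sorted(patterns))
# 1 ['original']
# 2 ['new', 'original']
# 3 ['new']
# 4 ['new']
```

Week 2 is the situation described above. Paris is offline, so a customer could get either pattern. By week 4, every factory agrees again.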

Eventual consistency is a side effect of the trade-off described by what is known as the CAP Theorem.

Wikipedia describes the CAP Theorem as:

"In theoretical computer science, the CAP theorem states that it is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees:

    * Consistency: Every read receives the most recent write or an error
    * Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write
    * Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

When a network partition failure happens should we decide to

    * Cancel the operation and thus decrease the availability but ensure consistency
    * Proceed with the operation and thus provide availability but risk inconsistency"
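The two choices at the end of that quote can be sketched in Python. This is a toy model that assumes a single replica which either can or cannot reach its peers. The class names, the `mode` flag and the "CP"/"AP" labels are hypothetical shorthand for "consistency first" and "availability first", not a real database API.

```python
class PartitionError(Exception):
    """Raised when a consistency-first replica refuses to answer."""


class Replica:
    """A toy copy of some data that can be cut off from its peers.

    mode="CP": during a partition, cancel the operation (consistent, less available).
    mode="AP": during a partition, answer from local data (available, possibly stale).
    """

    def __init__(self, mode, value):
        self.mode = mode
        self.value = value         # the last value this replica has seen
        self.partitioned = False   # True while it cannot reach its peers

    def read(self):
        if self.partitioned and self.mode == "CP":
            # It cannot confirm this is the most recent write, so return an error.
            raise PartitionError("cannot confirm this is the latest value")
        # Otherwise answer with what it has, which may be out of date.
        return self.value


cp = Replica("CP", "original pattern")
ap = Replica("AP", "original pattern")
cp.partitioned = ap.partitioned = True

print(ap.read())  # original pattern (may be stale)
try:
    cp.read()
except PartitionError as err:
    print("CP replica refused:", err)
```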

Don't worry, I'm not expecting you to have understood all of that. We'll go back to our sock factory example in a moment. The important thing to understand about the CAP Theorem is that it's a trade-off between options.

Now, you may be familiar with the iron triangle from project management, where you trade off the scope of work, the cost and the time.

You're often told "good, fast, cheap: choose two". It's about making a trade-off between competing characteristics.

And this is exactly the same with the CAP Theorem.

So let's look at the CAP Theorem through the example of our sock factories.

We need multiple factories to handle the volume of orders, and that makes us a distributed system. Being able to keep operating when one factory is cut off is the "P" of CAP: partition tolerance.

We need that so we can keep delivering orders. In this example, partition tolerance means having multiple factories. That helps us if one factory is unavailable because of staff shortages, a power outage, a local dispute or any other reason. We want multiple factories not just to handle the volume of work, but also to cope when one of them is unavailable.

In the example I gave before, we could still supply socks from the Hong Kong and New York factories. That is the "A" of CAP: availability.

We're making sure that we can still fulfil orders while we're midway through changing the pattern. We've focused on making sure we can still send orders out.

The downside is that we lose out on consistency, the "C" of CAP, because not all of our factories are producing the same pattern.

If we decided that consistency mattered more than availability, all of our factories would have to produce the same pattern. We would have to hold all orders until every factory was ready.

We'd have to stop production, and stop sending socks to our customers, for the three weeks it takes us to update: New York in the first week, Paris in the second and Hong Kong in the third.

We wouldn't be producing socks during that period. But once we do start producing socks at the end of it, we know that we will have consistency.

That's where we favour consistency of the pattern over the availability of being able to produce socks during that three-week period.

And ultimately, that's what we're talking about here.

We're talking about consistency versus availability, and this is why I say it's very much a business decision.

We often put this sort of decision into the hands of technologists without the business understanding or being aware of the consequences.

While my sock factory example is a little contrived, it should help you see that you have an active choice to make between availability and consistency.

And the same is true of most software systems, especially once they reach a certain scale.

You have to decide what's right for your customers and your business when you're operating at large scale: when you're dealing with more than one server, and perhaps with multiple locations.

What's more important about the data that you, your customer or your partners are seeing?

Is it more important that they have the data available, or is it more important that the data is consistent?

It's critical that the business understands those trade-offs and knows what is right for the business. It can be very difficult, from a technical perspective, to change a system that was built one way (say, to be highly available) when it suddenly needs to be strongly consistent.

That would take some real technical changes under the hood.

And more importantly, as a business owner, you need to be thinking about what it means to you, your customers and your partners.

Let's take an example of strong consistency.

This is where you prioritise consistency over availability. In the sock factory example, it means making sure every pattern is the same, rather than making sure you can keep producing socks during that three-week period.

The most obvious example of where you want strong consistency is a banking application - somewhere where you need to make sure the balance is correct.

What you definitely don't want as a bank is this: somebody with a balance of £100 withdraws £100 at one ATM, then walks to another ATM 100 yards down the road and makes a second withdrawal, because the system is working on eventual consistency.

If there is a delay in updating the balance, they could draw that £100 out of several ATMs before everything catches up. At that point, they've overdrawn their account.

So for something like a banking application, it's critical that you prioritise strong consistency over availability.
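The ATM problem can be shown with a deliberately simplified Python sketch. It assumes that, in the eventually consistent version, each ATM approves withdrawals against its own snapshot of the balance and forwards them to the bank later. Real banking systems are far more involved, and the classes `Bank` and `CachingAtm` are hypothetical.

```python
class Bank:
    """Holds the single authoritative balance (strongly consistent)."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # Check and update the one real balance, so a second withdrawal
        # always sees the effect of the first.
        if amount > self.balance:
            return False
        self.balance -= amount
        return True


class CachingAtm:
    """Approves withdrawals from a local snapshot of the balance and only
    passes them to the bank later (eventually consistent)."""

    def __init__(self, bank):
        self.bank = bank
        self.cached_balance = bank.balance  # a snapshot that can go stale
        self.pending = []                   # withdrawals not yet sent to the bank

    def withdraw(self, amount):
        if amount > self.cached_balance:
            return False
        self.cached_balance -= amount
        self.pending.append(amount)
        return True

    def sync(self):
        # Apply queued withdrawals to the real balance, even if it goes negative.
        for amount in self.pending:
            self.bank.balance -= amount
        self.pending.clear()


# Strongly consistent: the second £100 withdrawal is refused.
strong = Bank(100)
first, second = strong.withdraw(100), strong.withdraw(100)
print(first, second)  # True False

# Eventually consistent: two ATMs each approve £100 from their own snapshot.
eventual = Bank(100)
atm_a, atm_b = CachingAtm(eventual), CachingAtm(eventual)
print(atm_a.withdraw(100), atm_b.withdraw(100))  # True True
atm_a.sync()
atm_b.sync()
print(eventual.balance)  # -100: overdrawn once the ATMs catch up
```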

On the face of it, strong consistency may seem like the most desirable option every time.

But remember the hold on sock orders. While the pattern was being updated, you couldn't get any socks out of the door. You had to wait until every factory had the new pattern before you could start sending them out. That's the price of strong consistency.

Now, with bank balances, you're talking about seconds rather than weeks, which makes it much more practical. But even then, you sometimes still need to prioritise availability over consistency.

Take any website with very read-heavy content, where people read far more than they write: news articles, blogs, Facebook, Twitter. For anything like that, you want to provide availability over consistency.

Readers don't expect to visit your news site and then wait to read articles because somebody at the back end is adding a new one.

They expect to see articles there. They expect to be able to see what is available at the time.

This is where you'd favour availability, because you show them the latest content available to them right now.

Depending on how your servers are set up, you may have several of them. A new article may not have reached the server a reader is using yet. In that case, you show them the latest article that server has, even though something newer may be rolling out in the background.
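Here is a minimal Python sketch of that situation. It assumes a single primary server where articles are published, plus read-only replicas that receive new articles later in a background step (asynchronous replication). The `Primary` and `ReadReplica` classes are illustrative, not any particular product's API.

```python
class ReadReplica:
    """Serves readers immediately from whatever it has received so far."""

    def __init__(self):
        self.articles = []

    def latest(self):
        return self.articles[-1] if self.articles else None


class Primary:
    """Where new articles are written; copies reach the replicas later."""

    def __init__(self, replicas):
        self.articles = []
        self.replicas = replicas
        self.outbox = []  # published but not yet copied to the replicas

    def publish(self, title):
        self.articles.append(title)
        self.outbox.append(title)  # replication happens later, not now

    def replicate(self):
        # Background step: push everything waiting in the outbox to every replica.
        for replica in self.replicas:
            replica.articles.extend(self.outbox)
        self.outbox.clear()


replica = ReadReplica()
primary = Primary([replica])

primary.publish("Yesterday's story")
primary.replicate()
primary.publish("Breaking story")  # written, but not yet on the replica

print(replica.latest())  # Yesterday's story: stale, but served instantly
primary.replicate()
print(replica.latest())  # Breaking story: the replica has caught up
```

The reader is never kept waiting. The cost is that, for a while, different servers can show different "latest" articles.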

This is why, as I said at the very beginning of this episode, you may see this on Amazon, where an item takes a while to appear in your basket, or on Netflix, where your home page doesn't update quickly. It happens because that's not the important bit.

The important bit is giving you something to work with. Suppose you went back to the Netflix home page and, for whatever reason, it was slow to record the episode you'd just watched. You'd rather see the available episodes, TV shows and films, even if the list isn't 100 percent correct, than wait perhaps 10, 15 or 20 seconds for that latest record to update.

Now, when it comes to strong consistency versus high availability, there isn't a perfect answer.

There's unlikely to be one, except perhaps for the banking application.

Both options involve trade-offs. To be honest, it's probably a sliding scale, with strong consistency at one end, high availability at the other and shades of grey in between. It needs to be thought through carefully.
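Some distributed data stores expose that sliding scale through quorum settings. Dynamo-style databases such as Apache Cassandra are one example. Each record is kept on N copies, a write waits for W copies to confirm it, and a read asks R copies. The Python sketch below illustrates only that idea, in simplified form. The function `describe` is hypothetical, and it ignores complications such as concurrent writes and the replacement of failed copies.

```python
def describe(n, w, r):
    """Summarise one point on the consistency/availability scale for a store
    that keeps n copies of each record, waits for w copies to confirm each
    write and asks r copies on each read (simplified quorum model)."""
    return {
        # If every read group must overlap every write group, a read always
        # includes at least one copy that holds the latest confirmed write.
        "reads_see_latest_write": r + w > n,
        # How many copies can be unreachable before a write or read fails.
        "writes_tolerate_down": n - w,
        "reads_tolerate_down": n - r,
    }


# From most consistent to most available, with three copies of each record.
for w, r in [(3, 3), (2, 2), (1, 1)]:
    print(f"W={w} R={r}", describe(3, w, r))
```

In this simple model, with three copies, W=2 and R=2 still guarantee overlap while tolerating one unreachable copy. W=1 and R=1 keep working with two copies down, but reads may be stale.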

This should really be a conversation between the business, who own the relationship with their users, customers and partners, and the technical staff.

You can't do it in one silo, it needs to be a joined up conversation.

I'd certainly advise caution when you go into these conversations, because it's quite common to go in with a bias. Depending on what sort of person you are and how you feel about risk, you've probably formed a strong opinion while listening to this podcast. It might be "well, strong consistency makes sense, I'd always go for strong consistency" or "oh no, I definitely want high availability, I definitely think that's the best option."

Be careful of going into any conversation assuming one option is the right fit. We all have biases, and we're probably drawn straight away to one option or the other. Look carefully at each application and product you're building, and weigh up the options to decide which is correct for that situation.

In this episode, I've talked about eventual consistency and why you might see it in the use of websites and mobile applications.

I've tied it back to the CAP Theorem, which describes the trade-offs between consistency, availability and partition tolerance.

I've talked about how eventual consistency is a side effect you may see in a system that prioritises high availability.

I've also talked about the other end of the spectrum: strongly consistent systems, such as bank accounts.

And I've talked about the trade-off between availability and consistency. It is an important business decision, and one that is often left to technologists because it isn't easy to understand.

So I urge you, whenever you're starting a new project or system, to sit down with your team and make sure you're getting that balance right.

Thank you for taking the time to listen to this podcast. As I said in the intro, this is a more technical subject than usual, so I'd really appreciate any feedback on how well I did and whether there's anything I could improve.

I look forward to speaking to you again next week.