ACID semantics require a lot of effort and communication. That effort is worth it when the answer must be correct. But in many real-world situations, it isn't: we can tolerate some staleness and inconsistency.
In class, I gave the example of the "quantity available" an online store displays for an item via its online catalog and shopping cart. If you are buying one or two items, does it matter whether it says 450 or 452 are available? In this case, a slightly stale value doesn't matter, right? When does it matter? When you check out. If they take your money, you'd better get your goods. Have you ever gone to check out and gotten an error, "Oops. Your item sold!"? Did it really sell that very moment? Or had it been sold the whole time? You don't know. What you do know is that it doesn't matter -- you can't have it. This is a case where most lookups can be a little stale -- only at the very end do we need the "correct" answer.
The idea that we can trade off correctness for time, effort, and availability is a good one, as is the observation that favoring complete correctness and consistency at any cost may be an unnecessary extreme -- depending upon the application.
BASE (Like the Opposite of ACID, Get It?)
You'll occasionally hear or read of the acronym BASE. This acronym captures one way of thinking about "good enough":
- Basically available means that small failures don't cause large outages. It is the same idea as what we call "soft failure" vs. "hard failure", but with the added emphasis that a few failures in a large-scale system shouldn't really be noticeable.
- Soft state usually refers to state that can be regenerated or refreshed on demand, rather than necessarily being stored as "hard state". But in this case, it is being used to convey that values, even after being written, will continue to change without any explicit user request. Specifically, they'll propagate out slowly.
- Eventual consistency conveys the idea that, although the system might be inconsistent for some time after an update, it will eventually converge to consistency. Without this property, or an approximation thereof, what good would the system be?
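Eventual convergence can be sketched with a toy last-writer-wins replica model (this is my own illustrative construction, not a protocol from the lecture; the class and method names are assumptions):

```python
import time

class Replica:
    """A toy replica using last-writer-wins timestamps (illustrative only)."""
    def __init__(self):
        self.value = None
        self.stamp = 0.0

    def write(self, value):
        self.value = value
        self.stamp = time.time()

    def merge(self, other):
        # Background "anti-entropy": adopt the other replica's value if newer.
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp

a, b = Replica(), Replica()
a.write("v1")               # the update lands at replica a only: inconsistent
assert b.value is None      # a reader at b still sees the stale (empty) state
b.merge(a)                  # eventually, the sync propagates the write
assert b.value == "v1"      # ...and the replicas converge
```

Between the write and the merge, the system is soft-state and only basically consistent -- exactly the window BASE accepts.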
BASE is often contrasted with ACID. The idea is that traditional ACID semantics are very pessimistic and do a lot of work, assuming that any inconsistency would be noticed and result in disaster. BASE, by contrast, is very optimistic and assumes that inconsistencies are unlikely to result in disaster before they are eventually fixed.
As noted above, trading off correctness for time, effort, and availability can be a good idea, depending upon the application... even if the acronym is, well, a little bit of a stretch.
Agreement In Light of Failure Isn't Easy: Two Classic Problems
Today we discuss two classic problems, the Two Army Problem and the Byzantine Generals Problem. These problems illustrate communication failures and processor failures, respectively. They help us understand a little more about the limits of our ability to manage each kind of failure.
The Two Army Problem
Consider two warring armies, the Red Army and the Blue Army. The Blue Army is camping in a mountain valley. The Red Army, while larger and more powerful, is divided into two groups hiding in the surrounding mountains.
If the two Red Army platoons attack the Blue Army together and at exactly the right time, they will prevail. The longer they wait, the more surprised the Blue Army will be by the attack. But if they wait too long, they will run out of supplies, grow weak, and starve. The timing is critical. And if they are not coordinated, they will surely lose. They must attack at exactly the same time.
When the time is right, R1 will send a messenger to R2 that says, "Attack at dawn!" But R1 may become concerned that R2 did not get the message and consequently will not attack. If that happens, R1 will be defeated. Alternately, R2 may become concerned that R1 will become concerned, so R2 may not attack, leaving R1 to be defeated in solitude.
So what if they agree that the recipient of the message, R2, will return an ACK to the sender, R1? This just adds one level of indirection. R2 may become concerned that the ACK was lost and not attack. Or R1 may become concerned that R2 became concerned, and not attack. This problem can't be solved by ACK-ACK, or even ACK-ACK-ACK-ACK -- more ACKs just add more levels of indirection; the same problem remains.
Another issue might be fake messages. What if the Blue Army sent an imposter to deliver a message to R2 telling them to attack too early? They would be defeated if they followed it. But if they ignored messages for fear that they were fraudulent, R1 would be defeated when it attacked alone, even after advising R2. This fear might also prevent an army from acting upon a perfectly valid message.
The moral of this story is that there is no solution to this problem if the communications medium is unreliable. Please note that I said medium, not protocol. This is an important distinction. A reliable protocol above an unreliable medium can guarantee that a message will eventually be delivered, provided of course that the recipient eventually becomes ready and accessible. But no protocol can guarantee that a message will be delivered within a finite amount of time -- error conditions may persist for long and indeterminate amounts of time.
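The medium-vs.-protocol distinction can be sketched as a retransmit-until-ACK loop over a lossy link. This is a toy simulation of my own (the loss model, function name, and parameters are assumptions, not anything from the lecture):

```python
import random

def send_with_retry(loss_rate, rng, max_attempts=1_000_000):
    """A reliable protocol over an unreliable medium: retransmit until an
    ACK arrives. Delivery is eventual -- but the number of attempts, and
    hence the delay, has no finite bound in principle."""
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        delivered = rng.random() >= loss_rate            # message got through?
        acked = delivered and rng.random() >= loss_rate  # ACK got back?
        if acked:
            return attempts
    raise RuntimeError("link never recovered within the attempt budget")

# On a 50%-lossy link, a round trip often takes several attempts.
print(send_with_retry(0.5, random.Random(42)))
```

Each retry raises the probability of eventual delivery toward 1, but for any fixed deadline there is a nonzero chance the deadline is missed -- which is exactly why the two armies cannot agree in bounded time.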
The Byzantine Generals Problem
One day, many moons ago, the Turkish Sultan led an army to invade the Byzantine Empire, Byzantium. The Emperor had several smaller armies to defend the Empire. The leaders of these individual armies needed to be carefully coordinated in order to defend the city against the Turks. They needed to receive frequent updates about the strength of the other armies in order to act properly.
They were aware of the communications problems that led to the defeat of the Red Army, so they used careful encryption and error correction to exchange messages. But they had a new problem -- traitors. The Byzantines suspected that one of their generals was a traitor who would lie about the strength of his army in order to undermine their defense.
In order to combat this problem and detect the traitor, they established a protocol for exchanging messages:
1. Each general should transmit the strength of his army to every other general using a reliable messenger.
2. Once each general has collected this information, he should send the full collection to every other general. Once this happens, each general knows what every other general thinks about the strength of each army.
3. Each general should then determine the strength of each army by considering each general's report of that army's strength as a vote in favor of an army size. If a majority of the generals agree about the size of a particular army, that size should be believed. Otherwise, the size of the army should be considered unknown, and the general of the unknown-sized army should be suspected of treachery.
Note: It is important that only the reports relayed by the other generals in step 2 be counted. Counting the information from step 1 as well would double-count it.
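The majority-vote step can be sketched as follows. This is a simplified model of my own (one relayed vector per general, with made-up army strengths); it is not a full implementation of the exchange:

```python
from collections import Counter

def decide(relayed):
    """relayed[g] is the vector of army strengths that general g passed
    along in the relay round. Returns the agreed strength of each army,
    or None where no strict majority of generals agrees."""
    n_armies = len(next(iter(relayed.values())))
    decisions = []
    for army in range(n_armies):
        votes = Counter(vec[army] for vec in relayed.values())
        value, count = votes.most_common(1)[0]
        # Believe a size only if a strict majority of generals voted for it.
        decisions.append(value if count > len(relayed) // 2 else None)
    return decisions

# Three loyal generals relay what they heard; G4 is the traitor, who told
# each loyal general a different strength and relays a fabricated vector.
relayed = {
    "G1": [4, 8, 15, 80],   # loyal
    "G2": [4, 8, 15, 90],   # loyal
    "G3": [4, 8, 15, 70],   # loyal
    "G4": [9, 9, 9, 100],   # traitor
}
print(decide(relayed))      # the loyal armies' sizes are agreed upon
```

The loyal generals agree on strengths 4, 8, and 15; the fourth army's size gets no majority, so it comes back as None and G4 falls under suspicion -- just as the protocol prescribes.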
How Many Traitors can the Byzantines Tolerate?
It is a theoretical finding that if there are T traitors, there must be (2T + 1) loyal generals for the loyal generals to determine the sizes of the loyal armies and to identify the traitors.
We won't prove this property in this class. The intuition behind it is this: normally, we'd expect to need (T + 1) loyal generals to outvote the T traitors. But, upon a more careful look, this isn't actually correct. Take a careful look at the vectors. Notice that, for each disloyal general, we've got two different types of corruption -- the entire vector provided by that general, as well as that general's entry in each and every other vector.
As a result, we need to outvote both the T broken vectors and the T broken entries within the other vectors: hence (2T + 1).
Does This Always Work?
Well, of course not. This wouldn't be Distributed Systems if we actually gave you a comprehensive solution to a problem -- these are hard problems, after all.
If the disloyal general tells the same lie to all of the generals, they will each agree on the wrong value. For this reason, and based on this pedagogical story, undetectable errors (faulty hardware, not faulty communications) are known to computer scientists as Byzantine Errors.
So, what would it take?
If each general is forced to sign his message, and all messages are repeated to all generals -- including the signatures -- this problem can be solved with only one loyal general. (If they are all disloyal, who is checking, anyway?) In that case, it would be known which general had signed the inconsistent messages -- though uncovering a consistent lie would still require a scouting mission. (Additionally, it is required that the authenticity of signatures can be readily verified and that forgeries can be detected.)
What is a digital signature? Think back to prior courses and discussions about PGP and other public-private key algorithms. A signature is produced by using the private key to encrypt the message (or a digest of it). The sender is authenticated if the corresponding public key decrypts the message.
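The sign-then-verify structure can be sketched in a few lines. Since public-key signing isn't in the Python standard library, this sketch substitutes an HMAC tag for the signature (my own stand-in; a real deployment would use actual public-key signatures, where signing and verification use different keys):

```python
import hmac
import hashlib

def sign(key, message):
    # Stand-in for a digital signature: an unforgeable tag over the message.
    # (A true signature uses a private key here and a public key to verify.)
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(key, message, tag):
    # Constant-time comparison, so attackers can't probe the check itself.
    return hmac.compare_digest(sign(key, message), tag)

key = b"general-4-signing-key"          # hypothetical key material
msg = b"My army has 100 soldiers"
tag = sign(key, msg)

assert verify(key, msg, tag)                             # authentic: accepted
assert not verify(key, b"My army has 5 soldiers", tag)   # altered: rejected
```

The point for the Byzantine setting is that a relayed message carries its signature along, so a general who relays a tampered claim is caught when the signature fails to verify.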
What's the Moral of the Story?
Failures can be expensive to detect -- sometimes impossible. This means that distributed agreement may be expensive -- or impossible. Sometimes we have to pay the price; sometimes we don't. The successful design and implementation of distributed systems depends on knowing what we can do, what we can't do, and what we should do.
The CAP/Brewer Conjecture
It is commonly desirable for distributed systems to exhibit Consistency, Availability, and Partition tolerance.
- By consistency we mean that all participating systems share the same view of the data. For example, if one system observes the value five, all systems would, if they looked, observe the value five at that time. None, for example, would be more stale or more fresh than others.
- By Availability we mean that the system is able to respond quickly enough for the user's needs. For example, if a Web page times out, or users abort before seeing the results, it is not available.
- By Partition tolerance we mean that, in the event of the failure or isolation of some participants, the other participants can continue to do whatever they can. For example, the loss of certain nodes might necessitate disconnecting certain clients or the inability to return certain results -- but should not unduly interfere with the ability of the functioning nodes to service clients and/or return results.
The CAP Conjecture, attributable to Eric Brewer of UC Berkeley, is that we can build systems that guarantee up to two of these properties -- but not necessarily all three. Let's consider why.
Imagine a system whose request queue has grown too long. We can fix that by adding more systems and dividing the work among them, as seen below:
The picture above is nice, because we have availability. And, in the event of a partitioning, the reachable nodes can still respond, so we have partition tolerance. The problem, though, is that we've lost consistency. Each host is operating independently, so the values can diverge if updates differ. This is the "AP"/"PA" case.
To add back consistency, we'll need to have communications among the hosts such that they can sync values:
But, notice what has happened. We gained consistency through communication. If we break that communication, we're back where we started. So, we now have consistency and availability, but not also partition tolerance. This is the "CA"/"AC" case.
What about the "PC"/"CP" case? How can we have consistency and partition tolerance without availability, at least in any meaningful way? One answer, which I think is a good example, is that we enable reads, but not updates. Now we have sacrificed the availability of writes in order to maintain consistency in light of a partitioning.
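The three cases can be compressed into a toy model: two replicas, a partition flag, and a write path that picks one of the trade-offs. This is my own illustrative sketch, not any real system's replication logic:

```python
class Replica:
    def __init__(self):
        self.value = None

def write(replicas, partitioned, value, mode):
    """Toy CAP trade-off.
    mode="AP": accept the write even when partitioned (replicas diverge).
    mode="CP": refuse writes during a partition (lose write availability)."""
    if partitioned:
        if mode == "CP":
            raise RuntimeError("partition: rejecting write to stay consistent")
        replicas[0].value = value      # AP: only the reachable replica updates
    else:
        for r in replicas:             # healthy network: update every replica
            r.value = value

a, b = Replica(), Replica()
write([a, b], partitioned=False, value=1, mode="AP")
assert a.value == b.value == 1         # connected: consistent and available
write([a, b], partitioned=True, value=2, mode="AP")
assert a.value != b.value              # AP: still available, no longer consistent
try:
    write([a, b], partitioned=True, value=3, mode="CP")
except RuntimeError:
    pass                               # CP: consistent, but writes are unavailable
```

The sync loop in the `else` branch is the inter-host communication that buys consistency; the `partitioned` flag is exactly what takes it away.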
Note: You'll often see what I've called the "CAP Conjecture" described as the "CAP Theorem". I do not believe this to be correct. The conjecture is, I think, important because it allows us to frame the trade-offs we're likely to see in real-world systems. Attempts to "prove" the conjecture into a theorem have introduced constraints and formalisms that, I think, separate it from the generally applicable cases. As it turns out, there is a natural tension at work here: one can only prove something that is precisely defined, but fuzzy edges make things more applicable and give them a broader reach.
Today's discussion, and the Two Army Problem, in particular, follow that of Tanenbaum very closely. When I first prepared this lecture, the citation was as follows: