Consistency Models
A Quick Overview on Consistency Models used in Traditional Databases.
“ Consensus is one of the most important and fundamental problems in distributed computing. On the surface, it seems simple: informally, the goal is simply to get several nodes to agree on something.” ― Martin Kleppmann
✅ Definitions
An unit of Data is said to be consistent if all the stakeholders of that unit of data agrees upon it’s state. In a Distributed System, the programmers and users, when adding, updating or retrieving, are to abide by certain rules to achieve a predictable outcome. Simply put, we want to know ahead of the time what would be the effect on all other users subscribed to same information if we perform an action on it. These rules collectively known as a Consistency Model and dictates the guideline for programmers using them. Use cases for such models spans across File Systems, Databases, Web Caching, Messaging etc.
📝 Broad Classification of Consistency Models
At the core, all Consistency Models can be classified into two groups - Strong and Weak Consistency Models.
🛡️ Strong Consistency Model
Strong Consistency models are bound to provide programmers and users with utmost latest state of the system before a WRITE can take place preventing a Stale Write. Now there is variability in how this constraint gets ensured and we will discuss few of them below:
-
Strict Consistency - The Strict Consistency dictates every WRITE operation is propagated to every processing node almost instantly making Stale Writes impossible. It must be noted that this is a theoretical model and can never be implemented as no physical device can transmit messages instantly.
-
Sequential Consistency - This model does not constrain the transmission of information into an instant action, rather it focuses on Isolation and ensures an order of execution of WRITEs among all the processing nodes. This model is implemented with two inherent requirements - Program Order and Write Atomicity that makes it possible to process request in such a manner that it appears as “instantly”. Despite being a Strong Consistent model, only weaker than Strict Consistency, it can produce non-deterministic results as the order of the processing could be arbitrary.
-
Casual Consistency - This is an improvement over Sequential Consistency in order to deter dependent processing and increase overall throughput. Casual Consistency model classifies all operations into two buckets - Casually Related and Others, and applies Sequential Consistency constraints only to those in Casually Related bucket.
🧵 Weak Consistency Model
Weak Consistency Models offload the responsibility of creating state to the programmers who are using that systems. This is done to increase the throughput to an insane level, allowing high-traffic applications to be built on top of it. Some of these consistency models are listed as below:
-
Weak Ordering Consistency - This model is based on the hypothesis that not all operation in a system is critical in nature and only those that are critical in nature requires synchronization. Thus it classifies all operations into two categories - Data Operations and Synchronization Operations. Program order and atomicity is maintained only on synchronization operations and not on all reads and writes.
-
Release Consistency - This model bifurcates WRITE operation into two parts. During entry to the critical section, a lock is “Acquired” and all writes are done on local memory of the node; during exit, the lock is “Released” and writes are propagated to all other nodes.
-
Entry Consistency - Entry Consistency narrows the scope of lock compared to Release Consistency by defining Shared Variables and assigning local variable to them accordingly. This way, only when the acquire is to variable
x
, all operations related tox
need to be completed with respect to that processor. This allows concurrent operations of different critical sections of different shared variables to occur.
⏳ Eventual Consistency Model
Eventual Consistency is a Weak Consistency model in a system with the lack of simultaneous updates. It defines that if no update takes a very long time, all replicas eventually become consistent. In such model goes through a Divergence and a Converged state each time an operation takes place. The state of a node in between two Converged state often termed as Soft State. With multiple soft states in their corresponding nodes, each nodes must share their version of the state (often known as Anti-entropy) and then compute and execute a merge to arrive to the final state (often called Reconciliation).
This model is very effective when the volume of transactions is high but the turn-around time for each operation is very small making the system Highly Available. However, the value READ in between Converged State of a node is non-deterministic and often can be unsafe for usage.
🥽 Strong Eventual Consistency
Whereas eventual consistency is only a Liveness Guarantee (updates will be observed eventually), strong eventual consistency (SEC) adds the Safety Guarantee that any two nodes that have received the same (unordered) set of updates will be in the same state. If, furthermore, the system is monotonic, the application will never suffer rollbacks.
🧪 How it applies to Databases - ACID vs BASE
The term ACID which is an acronym for Atomicity, Consistency, Isolation and Durability, was coined as a concept to understand Database Transactions in Traditional Databases, namely SQL Databases. BASE on the other hand used to describe Transactions in NoSQL Databases as a sequence of Basically Available, Soft State and Eventually Consistent.
Atomicity guarantees failure of a Transaction under Strong Consistency model in case of at least one failure within the block of the transaction. This ensures that the Database never goes into an inconsistent state for each and every situation, including power failures, errors, and crashes. This gets followed up by Isolation guarantee that executes multiple transactions in parallel to produce end state such that it appears as if they have been executed sequentially. And Durability simply means that the resultant state gets backed-up into no-volatile memory to be preserved.
Databases that follows ACID are widely adopted across the industry due the constraints it follows and hence produces only clean state of the application. However, due to the same constraints, the database at scale obtains limits to it’s Availability and Concurrency. To overcome this burden, some of this constraints are loosen up. It introduces Soft States between WRITE updates that can converge at a given time making the system Eventually Consistent. This means at any point in time, READ requests might not be able to provide the state post latest WRITE requests, and WRITE request might well be overwritten or reformed against latest state of the Database.
Most of the modern decentralized databases are a mix of both qualities as they provide both Availability and Liveness Guarantees as well as Safety Guarantee. Hence they are called SALT Databases, i.e., Sequential, Agreed, Ledgered and Tamper-resistant.
Now that we are well versed with Consistency Models and how they are adopted inside Databases, let us take short tour on Web based Editors before we understand how Operational Transformation fits into all these.