Ddia-Transactions
Transactions
Important refresher on single node transactions before moving into distributed zone. In increasing order of guarantees:
Read committed
This is the default in Postgres and many others. It simply guarantees that there are no dirty writes and no dirty reads. To handle concurrent reads in the face of a concurrent write, the implementation can be as simple as keeping a copy of an old record throughout a transaction that is writing to that record; any concurrent reads will just read the old value until the writing transaction has committed.
But, one read-only transaction might query some data X and later some correlated data Y; if a separate transaction commits a write in between those two reads, the data Y might be inconsistent with X, and yet it would not be a "dirty" read since both X and Y were clean reads, off committed transactions.
Snapshot isolation (Repeatable read)
And so we motivate the next stricter guarantee, which essentially assumes that all reads see data in "snapshots", i.e. once a read transaction starts, it will only see data that was committed at the time of its start. The philosophy here is that
- readers don't block writers and
- writers don't block readers
To implement this, instead of only maintaining two copies of data (old, new), the database maintains many versions, indexed by transaction IDs: this is known as MVCC: multi-version concurrency control.
Preventing lost updates
What about concurrent writes? There are a few approcahes:
- Atomic write operations: the best option when feasible is to just use atomic db writes, e.g.
SET value = value + 1
. Effort should be made to do this where possible, for example diving into specific json functions offered by Postgres. Of course, sometimes it is just infeasible and we need to read some value out of the db first, and then write it. - Explicit locking e.g.
SELECT * FROM table WHERE id = '1' FOR UPDATE
and then, later,UPDATE table
andCOMMIT
. - Automatic detection of lost updates: the two approaches above force sequential read-modify-write cycles. We could also just let them happen in parallel, and rollback if a lost update is detected. In fact PostgreSQL's repeatable read does automatically detect and abort.
- Compare-and-set: is often found in databases without transactions. E.g.
UPDATE table SET col = new_value WHERE id = '1' and col = old_value
to ensure the update only happens if the data hasn't been written to since read. However, this check means very little if snapshot isolation is in place! Test before using.
Replication futher complicates things. A common approach in this setting is to simply let conflicts happen, but have application code or bespoke data structures to resolve them as necessary. Atomic operations can work well in this context, especially if they are commutative. Unfortunately LWW (Last Write Wins) is the default in many replicated databases, and very prone to lost updates.
Write Skew
Consider a database representing on call doctors; for any given shift there must be at least one doctor on call. Suppose there are three,
|name|on_call|
|----|-------|
|alice| true|
|bob|true|
|carol|true|
but both Alice and Bob feel ill, and go to update their shifts simultaneously. Each queries the current COUNT(*) WHERE on_call
, sees that there is 3, and so proceeds to UPDATE
their value to false. They both see a count of 3 because of snapshot isolation. And, this is not a dirty write because the two transactions are updating different records. Also, clearly it's not a lost update since both updates go through.
This anomaly is reffered to as write skew and is a generalization of a dirty write. Namely, it's a different outcome than if the two transactions ran serially, and was caused by two transactions reading the same objects and then updating some of them. (When they update the same object, it's a dirty write or lost update).
The methods above are not strict enough:
- Atomic operations don't help since multiple objects are involved.
- Automatic detection requires serializable isolation.
- Possibly: check constraints in Postgres
- Best bet: explicit lock
FOR UPDATE
for the transaction. - Safest bet: use serializable isolation level.
Serializable Isolation
The strongest guarantee: the result after concurrent transactions must be the same as if they happened serially, one after the other. Most databases use the following three techniques to achieve this.
Actual Serial Execution
Use a single thread and handle transactions one after the other. This only happened late in the game, once RAM became cheap and OLTP was recognized to have mostly short, small read/write transactions. However, this approach often requires transactions in the form of stored procedures, written in esoteric per-database languages that are awkward in today's development environment.
Two-Phase Locking (2PL)
This was the most widely used algorithm for serializability for three decades.
Instead of snapshot isolation's philosphy that readers don't block writers and writers don't block readers, we go stricter: readers do block writers and vice versa. Further, to avoid phantoms and problems like the doctors on call above, more than just per-object locks are needed: predicate locks are issued for rows matching an arbitrary condition (and potentially rows that don't even exist at the time of locking). Or, index-range locks are used, which offer a more granular approach, and may lock more rows than necessary, but the locking has much less overhead, usually leading to increased performance.
This isn't really widely used anymore because of its performance impact:
- The restrictions reduce concurrency opportunities and there is a lot of locking overhead.
- Deadlocks can happen more often with 2PL than with Read Committed.
- In particular, high percentile latencies can be unpredictably long when there are many contenders for transactions.
Serializable Snapshot Isolation (SSI)
First described in 2008, SSI offers much better performance, and as of Postgres 9.1 is the default algorithm for serializable isolation. The main difference is that SSI is an optimistic concurrency technique; it operates like snapshot isolation, but then runs an algorithm before commits to determine which transactions to abort to maintain serializability.