Distributed Transactions
Concurrency control
Paul Krzyzanowski
March 24, 2021
Goal: Allow multiple transactions to run concurrently, but control access to resources so that the results are identical to those the transactions would produce if they ran in some serial sequence. This preserves the “Isolated” guarantee of ACID transaction semantics.
A schedule is a sequence of transactions. A serial schedule is one where transactions are executed sequentially: a new transaction starts when a previous one commits (or aborts). This is inefficient, since we are forcing transactions to run one at a time. However, it ensures isolation (the I in ACID).
The goal of concurrency control is to allow multiple transactions to run concurrently (which is great for performance) while ensuring that data access is controlled such that the net effect is the same as if the transactions all ran in some serial order. That is, the net result cannot be one in which a transaction reads the interim, uncommitted state of another transaction’s data.
There are two categories of concurrency control: pessimistic and optimistic. Pessimistic concurrency control assumes that there is a high likelihood that two or more transactions will try to access the same resources. A transaction will lock access to the resources it needs to keep others from modifying them. Optimistic concurrency control assumes that it is not likely that multiple transactions will access the same resource. Optimistic techniques will not lock access to resources. Instead, they will check for conflicts at commit time. If it turns out that multiple transactions did indeed access resources concurrently, one or more of them will have to abort.
Two-phase locking
Exclusive locks, via a lock manager, help us accomplish this. A transaction can grab locks for the resources it needs, which ensures mutual exclusion. However, if a transaction T1 holds a lock only for the duration that it needs to access a resource, another transaction T2 may acquire the lock on that modified resource once it is released and then go on to access another resource that T1 will later lock and modify. T2 would then see a mixture of T1’s old and new state, which violates isolation and does not yield serially-equivalent execution.
To ensure serializability, it is important that a transaction does not acquire any new locks after it has released any lock on a resource. This technique is known as two-phase locking (2PL). The first phase is the growing phase, during which locks are acquired. The second phase is the shrinking phase, in which locks are released.
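The growing/shrinking rule is easy to capture in code. Below is a minimal, illustrative Python sketch (the `LockManager` and `Transaction` classes and their method names are hypothetical, not from any particular system): a transaction tracks whether it has started releasing locks and refuses to acquire any new ones afterward.

```python
import threading

class LockManager:
    """Maps each resource name to one exclusive lock."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, resource):
        with self._guard:
            return self._locks.setdefault(resource, threading.Lock())

class Transaction:
    def __init__(self, manager):
        self.manager = manager
        self.held = {}          # resource -> lock acquired during the growing phase
        self.shrinking = False  # becomes True once any lock is released

    def acquire(self, resource):
        if self.shrinking:
            raise RuntimeError("2PL violation: acquiring a lock after releasing one")
        lock = self.manager.lock_for(resource)
        lock.acquire()
        self.held[resource] = lock

    def release(self, resource):
        # Releasing any lock moves the transaction into its shrinking phase.
        self.shrinking = True
        self.held.pop(resource).release()
```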
Strong strict two-phase locking
The problem with two-phase locking is that, if a transaction that released some locks aborts, there is a chance that other transactions have already used data that was modified by the transaction. In that case, those transactions (and all transactions that depend on them) have to abort as well. This condition is called cascading aborts.
Strong strict two-phase locking (SS2PL) avoids this problem by requiring all locks to be held until the end of the transaction. The shrinking phase, in effect, is an atomic operation that occurs at the very end of the transaction. You lose concurrency this way but avoid having to process cascading aborts.
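In code, the only difference from the sketch above is that the shrinking phase collapses into commit (or abort). A minimal sketch, reusing the hypothetical LockManager from before:

```python
class StrictTransaction:
    """Holds every lock it acquires until commit or abort (SS2PL)."""
    def __init__(self, manager):
        self.manager = manager
        self.held = []

    def acquire(self, resource):
        lock = self.manager.lock_for(resource)
        lock.acquire()
        self.held.append(lock)

    def commit(self):
        self._release_all()   # the entire shrinking phase happens here

    def abort(self):
        self._release_all()   # no other transaction saw our data, so no cascading aborts

    def _release_all(self):
        while self.held:
            self.held.pop().release()
```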
Exclusive and shared locks
Exclusive locking for every resource access is a bit aggressive. Consider a transaction that just reads data. It needs to lock that data to ensure that no other transaction modifies it but nothing would go wrong if another transaction also wanted to read that same data. We can achieve greater concurrency by distinguishing read locks from write locks. Read locks (called shared locks) need not be exclusive: any number of them can be granted. However, if there are any read locks on an object, a write lock cannot be granted and the transaction must wait. Similarly, if there is a write lock (called an exclusive lock) on an object then any read lock requests or write lock requests must wait.
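To make the shared/exclusive distinction concrete, here is a minimal reader-writer lock sketch. Python’s standard library does not provide one, so this hypothetical RWLock is built from a condition variable: any number of readers may hold it, but a writer must wait until there are no readers and no other writer.

```python
import threading

class RWLock:
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:              # wait out any exclusive holder
                self._cond.wait()
            self._readers += 1               # shared locks can stack up

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()      # a waiting writer may now proceed

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()            # exclusive: no readers, no writer
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```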
Optimistic concurrency control
Concurrency control techniques that rely on locking force locks to be held while the transaction is using the resource, or, in some cases, until the end of the transaction. This restricts the maximum amount of concurrency that may be achieved in the system. Optimistic concurrency control techniques assume that transactions are more likely to complete than not. As such, it is better to put more effort into rectifying errors from having used data from aborted transactions than to lock access to the data. A fully optimistic system uses no locks and checks for conflicts at commit time. Optimistic concurrency control has three phases of operation:
Working phase. The transaction reads and writes data. A private workspace is often, but not always, used to keep non-committed results isolated.
Validation phase. When the transaction is ready to commit, a check is made to see if there is any conflict with other transactions that took place during this time. For example, if some transaction A modified data and committed but transaction B read data before A modified that data, then transaction B cannot be allowed to commit because that would violate serializability. A sketch of such a validation check follows this list.
Update phase. Tentative changes are made permanent and the transaction commits.
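The following is a minimal sketch of the validation check (this is backward validation; the read_set and write_set fields are assumed bookkeeping from the working phase, not part of any particular system):

```python
def validate(txn, committed_during_txn):
    """Return True if txn may commit, False if it must abort.

    committed_during_txn: transactions that committed while txn was running.
    Each transaction is assumed to carry read_set and write_set, the sets of
    object identifiers it read or wrote during its working phase.
    """
    for other in committed_during_txn:
        if txn.read_set & other.write_set:
            # txn read an object that a committed transaction overwrote,
            # so allowing txn to commit would not be serializable.
            return False
    return True
```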
Two-version-based concurrency control
A way of increasing concurrency even beyond read/write locks is through two-version-based concurrency control. In this case, one transaction writes tentative versions (private versions) while other transactions read existing, previously committed versions of the data. This allows transactions to request read locks even if another transaction has a write lock on the object. When the transaction that has a write lock is ready to commit, it converts its write locks to commit locks, waits until any transactions that hold a read lock on that object complete, and then makes its modified version permanent. While a commit lock is held, no other transaction can grab any lock on that object. This allows increased concurrency, but transactions that write data risk waiting when they attempt to commit and make their data permanent.
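Here is a minimal sketch of this idea for a single object and a single writer (the class and field names are made up for illustration): readers see the last committed version, the writer prepares a tentative version, and commit blocks new readers while waiting for current ones to finish.

```python
import threading

class TwoVersionObject:
    def __init__(self, value):
        self._cond = threading.Condition()
        self.committed = value      # version visible to readers
        self.tentative = None       # private version held by the writer
        self.readers = 0
        self.commit_locked = False

    def read(self):
        with self._cond:
            while self.commit_locked:    # a commit lock blocks new lock requests
                self._cond.wait()
            self.readers += 1
            return self.committed        # readers never see tentative data

    def done_reading(self):
        with self._cond:
            self.readers -= 1
            self._cond.notify_all()

    def write(self, value):
        with self._cond:
            self.tentative = value       # readers still see the old version

    def commit(self):
        with self._cond:
            self.commit_locked = True    # no new locks may be granted
            while self.readers > 0:      # wait for existing readers to drain
                self._cond.wait()
            self.committed = self.tentative
            self.tentative = None
            self.commit_locked = False
            self._cond.notify_all()
```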
Timestamp ordering
Timestamp ordering allows lock-free concurrency control by keeping track of two timestamps per object: the timestamp of the last committed transaction that read the object and the timestamp of the last committed transaction that wrote the object. If a transaction wants to write an object, it compares its own timestamp with the object’s write timestamp. If the object’s write timestamp is older, then we have good ordering and the transaction can proceed. Otherwise, the transaction aborts and is restarted.
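A minimal sketch of the write rule just described, with hypothetical object fields (value, read_ts, write_ts). The check against the object’s read timestamp applies the same reasoning to the second timestamp the object keeps, and an abort is represented by raising an exception so the caller can restart the transaction with a new timestamp.

```python
class AbortTransaction(Exception):
    """Raised when timestamp ordering cannot be preserved."""

def timestamped_write(txn_ts, obj, value):
    # A write from an "older" transaction arriving after a newer transaction
    # has already read or written the object would break the serial order.
    if txn_ts < obj.write_ts or txn_ts < obj.read_ts:
        raise AbortTransaction("restart the transaction with a new timestamp")
    obj.value = value
    obj.write_ts = txn_ts
```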
Multiversion concurrency control
We can extend two-version-based concurrency control to support multiple versions of an object and apply the concepts of timestamp ordering to create multiversion concurrency control (MVCC).
In multiversion concurrency control, no locks are used. The system can maintain multiple versions of an object (e.g., a field in a database table). Each transaction has a timestamp associated with it that marks the start of the transaction. Each version of an object has two timestamps: the read timestamp, which records the time the object was read, and the write timestamp, which records the time the object was modified.
With MVCC, a transaction will never have to wait to read an object; it simply reads the latest version of the object that has a write timestamp earlier than that of the transaction. A write cannot take place if there are any other uncommitted transactions that read the object and have a timestamp earlier than our transaction’s. This is because an earlier transaction depends on a previous value of the object, and we need to consider the possibility that the earlier transaction may modify that object. In this case, the transaction will abort and restart in the hope that the conflict will not arise again.
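A minimal sketch of the lock-free read, assuming versions is a list of hypothetical version records, each carrying value, read_ts, and write_ts fields:

```python
def mvcc_read(txn_ts, versions):
    """Read the newest version written before this transaction started."""
    visible = [v for v in versions if v.write_ts < txn_ts]
    if not visible:
        return None                                  # nothing committed before txn began
    latest = max(visible, key=lambda v: v.write_ts)
    latest.read_ts = max(latest.read_ts, txn_ts)     # remember the newest reader
    return latest.value
```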