Transactions

[This topic is presented in pages 271-288 in Tanenbaum - Van Steen.]

What is a Transaction

Transactions are mainly studied in the context of [distributed] databases, but the concept is a model for how to implement reliable [distributed] applications. Often people refer to the desired properties of transactions using the ACID acronym:

  1. Atomicity: either all the operations of the transaction are carried out or none is.
  2. Consistency: the transaction takes the system from a consistent state to a consistent state.
  3. Isolation: concurrent transactions do not interfere with each other; each appears to execute alone.
  4. Durability: once a transaction commits, its effects survive subsequent failures.

As we will see below, isolation is achieved by using serializable schedules (usually, two-phase locking); durability is achieved by saving results in persistent store; consistency is required of the people that develop the transactions, at the application, not at the tool level [a money transfer transaction should worry about preserving accounting rules, not the system - however, if the invariant is formally specified, the system will fail transactions that violate the invariant]; and atomicity is achieved in two steps, first in single systems (usually with logs), then in distributed systems (usually with the two-phase commit protocol).

Transactions are easy to use. A Transaction Server will support operations to Start a Transaction and to End a Transaction. The termination of a transaction can take two forms: success, called commit, and failure, called abort. Abort gives up on the current transaction. There are many reasons for aborting a transaction. They range from the internal logic of the transaction (like making reservations for a multi-leg trip and then giving up in the middle when a particular leg cannot be scheduled) to crashes occurring in one of the systems taking part in a distributed transaction. When a transaction commits or aborts, many housekeeping actions will be required to complete ("close the books" on) the transaction. The Transaction Server will worry about how to guarantee transactionality for the actions performed between the Start and End delimiters. It is a "service" provided to us without need for us to program anything: it is both valuable and free.

We have from the beginning of the course worried about the concurrent execution of activities, their possible interleavings (or schedules), about mutual exclusion, and critical sections. Here we examine a number of undesirable behaviors caused by interleavings that violate the ACID properties. These interleavings lead to transaction aborts.

If transactions are executed one after the other [this is called a serial schedule], we avoid the aberrations that are caused by their concurrent execution on shared data. But by doing so the response time in executing the transactions becomes excessive, more than users will tolerate. Fortunately not all is lost: to avoid aberrations it is not necessary to have serial schedules; serializable schedules will suffice. A serializable schedule is a schedule that is equivalent to a serial schedule on the basis of Bernstein's Conditions. This requires some thought and explanation.

Serializability of Transactions

Let's use the notation W(a) to indicate a write operation on object a and R(b) to indicate a read operation on object b. Let S denote the start of a transaction, A an abort, and C a commit. In describing transaction Ti, let's append an i to its S, A, C, R and W operations, say, Si, Ai, Ci, Wi(a), and Ri(b). For the time being let's postpone consideration of S, C, and A operations, focusing only on R and W. Two operations that operate on the same object are said to be conflicting operations if they are not both read operations. This reflects what we know from Bernstein's conditions.

Now let's consider some transactions:

   T1:  R1(a) R1(b) W1(a)
   T2:  R2(b) R2(a) W2(b)
and the schedule ("schedule" is another name for "interleaving") H1:
   H1:  R1(a) R1(b) R2(b) W1(a) R2(a) W2(b)
Since R2(b) and W1(a) operate on different objects, their order could be changed without affecting the result when executing H1. Thus we would get the schedule H2:
   H2:  R1(a) R1(b) W1(a) R2(b) R2(a) W2(b)
which is just the serial execution of T1 followed by T2. In this case we say that the schedule H1 is serializable. Of course not all schedules are serializable. For example the schedule H3:
   H3: R1(a) R1(b) R2(b) R2(a) W1(a) W2(b)
is not serializable: R2(a) W1(a) cannot be reordered as W1(a) R2(a), since the value of a read by T2 differs in the two cases. [This is basically what was said with Bernstein's conditions.]

We have a general way to determine if a schedule is serializable. Suppose that we have a schedule of transactions T1, T2, .., Tn. We create a serialization graph with nodes T1, T2, .., Tn. We then put an arc from node Ti to node Tj (and we say Ti precedes Tj) if we find in the schedule an Ri(x) before a Wj(x), or a Wi(x) before an Rj(x) or a Wj(x). Notice that these operations will have to be protected by read or write locks, and Ti will have to release the lock on x before Tj can acquire the lock on x. [Our discussion of conflicting operations and serializability is a simplification of reality. The order of read and write operations on the same object by different transactions has to be preserved - for example R1(x)..W2(x) must be kept in this order, as does W1(x)..R2(x). But when considering write operations to one object without intervening reads, only the last write matters. For example, W1(x)W2(x)W1(x) is equivalent to W2(x)W1(x)W1(x) since the effect of these operations depends only on the last W1(x).]

Serializability Theorem: A schedule is serializable if and only if its serialization graph is acyclic.

But this theorem is not easy to apply: it is not reasonable to expect that transactions will be executed while checking at all times that no loop occurs in their serialization graph.
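Still, for small schedules the test is easy to mechanize. Here is a minimal Python sketch (for illustration, not a production algorithm) that builds the serialization graph of a schedule and checks it for cycles, using the schedules H1 and H3 from above as examples:

```python
def serialization_graph(schedule):
    """Build the serialization graph of a schedule.

    The schedule is a list of (transaction, op, obj) triples, e.g.
    ("T1", "R", "a").  An arc Ti -> Tj is added when an operation of Ti
    precedes a conflicting operation of Tj (same object, not both reads).
    """
    edges = set()
    for i, (ti, op1, x) in enumerate(schedule):
        for tj, op2, y in schedule[i + 1:]:
            if ti != tj and x == y and (op1 == "W" or op2 == "W"):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Detect a cycle in the graph with a depth-first search."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    def dfs(node, path):
        if node in path:
            return True
        return any(dfs(nxt, path | {node}) for nxt in graph.get(node, []))
    return any(dfs(u, set()) for u in graph)

# H1 from the text: serializable (the only arc is T1 -> T2).
H1 = [("T1","R","a"), ("T1","R","b"), ("T2","R","b"),
      ("T1","W","a"), ("T2","R","a"), ("T2","W","b")]
# H3 from the text: not serializable (arcs T1 -> T2 and T2 -> T1).
H3 = [("T1","R","a"), ("T1","R","b"), ("T2","R","b"),
      ("T2","R","a"), ("T1","W","a"), ("T2","W","b")]
```

Running the test on H1 finds an acyclic graph, and on H3 a cycle, matching the discussion above.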

Two-Phase Locking

We use locks (read locks and write locks as necessary) to prevent undesired schedules. We denote a locking operation with L and unlocking with U. We assume that the context makes it clear what are read locks and what are write locks. We assume that no read or write operation takes place on an object without the appropriate locks being in place on that object.
[Things can be made more complicated: we may have a system where we may first read lock an item, and later on raise that lock to a write lock. Alternatively, we may start with a write lock on an item, and then demote that to a read lock for that item. Raises will have to take place during the locking phase, and demotions during the unlocking phase. ]

We will say that a transaction follows a Two-phase Locking [2PL] Policy if it never applies a lock after having released another lock. The first phase is when we apply locks, the second phase is when we release them.

Theorem: If transactions follow the two-phase locking policy then they will be serializable [that is, all their feasible schedules, i.e. schedules that respect the locks, are serializable].

In fact, suppose by contradiction that transactions following the two-phase locking policy gave rise to a schedule that is not serializable. The serialization graph for that schedule would then be cyclic: we would have a cycle from transaction T1 to T2, .., to Tm, and back to T1. Each of these transactions would have to release a lock before the next could acquire it. Thus T1 would first release a lock and later acquire one, contradicting the fact that T1 is two-phase.

Two problems remain when using two-phase locking:

  1. Deadlock: two transactions may each hold a lock that the other needs, so neither can proceed.
  2. Cascading aborts: if a transaction releases a lock before it commits and then aborts, transactions that saw its updates must also abort. This is avoided by Strict Two-Phase Locking [S2PL], where all locks are held until commit.

Lock management is usually not a programmer's concern. Locks can be inserted automatically by the system to enforce a desired policy. For example, if we have transaction T:

   S R(x) R(y) W(x) W(y) R(z) W(u) C
then locks can be inserted as follows [using 2PL]:
   S L(x) R(x) L(y) R(y) W(x) L(z) L(u) U(x) W(y) U(y) R(z) U(z) W(u) U(u) C
or, better, [using S2PL]:
   S L(x) R(x) L(y) R(y) W(x) W(y) L(z) R(z) L(u) W(u) C U(z) U(x) U(y) U(u)
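The two-phase rule itself is easy to check mechanically. Below is a minimal Python sketch (the class and exception names are illustrative) that tracks a transaction's phase and rejects any lock request arriving after the first unlock:

```python
class TwoPhaseLockingViolation(Exception):
    pass

class Transaction:
    """Tracks the 2PL rule: no lock may be acquired after any release."""
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False        # True once the unlocking phase begins

    def lock(self, obj):
        if self.shrinking:
            raise TwoPhaseLockingViolation(
                "%s tried to lock %s after releasing a lock" % (self.name, obj))
        self.locks.add(obj)

    def unlock(self, obj):
        self.shrinking = True         # the growing phase is over
        self.locks.discard(obj)

# The 2PL schedule from the text: every lock precedes the first unlock.
t = Transaction("T")
t.lock("x"); t.lock("y"); t.lock("z"); t.lock("u")
t.unlock("x"); t.unlock("y"); t.unlock("z"); t.unlock("u")
```

Calling `t.lock(...)` again after the unlocks above would raise `TwoPhaseLockingViolation`, since the transaction is already in its second phase.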

There is an alternative to the use of locks (and to the consequent possibility of deadlock): timestamps. Suppose that each object is timestamped with the start time of the last transaction to access that object. Then a transaction T will abort if it accesses an object that has a timestamp later than the start time of T. This is bad if such aborts occur frequently, but if they are rare the technique works well. This way of executing transactions, where transactions access shared objects in the same order in which the transactions were started, is called the Strict Execution of Transactions policy.
This method has a problem: how to guarantee that the timestamps really reflect the true ordering of transaction start times. This will be discussed when we study clocks.
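The timestamp rule can be sketched as follows (a toy Python illustration; the global counter standing in for a clock, and the class names, are assumptions of this sketch, and real systems face exactly the clock-ordering problem just mentioned):

```python
import itertools

_clock = itertools.count(1)        # stand-in for a global clock

class AbortTransaction(Exception):
    pass

class Item:
    def __init__(self, value):
        self.value = value
        self.timestamp = 0         # start time of the last accessing transaction

class Transaction:
    def __init__(self):
        self.start = next(_clock)  # this transaction's start time

    def access(self, item):
        # A later-started transaction already touched the item: abort.
        if item.timestamp > self.start:
            raise AbortTransaction()
        item.timestamp = self.start
        return item.value
```

For example, if T1 starts before T2 but T2 reaches a shared object first, T1's later access finds a newer timestamp on the object and aborts.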

Atomic Transactions

We are left with a fundamental problem of transactions: atomicity. We need to examine how to deal with failure, aborts, commits.

Atomic transactions on a single system

We consider first the case where a single computer is involved in a transaction. Usually the data is maintained in a database (we could also have flat files) under the control of a database manager (DM) (in this context it is often called just a resource manager). Since reading from disk is slow and synchronous writes to disk are very slow, all sorts of buffering schemes and caches are used. Thus write operations to persistent storage will have to be done in the correct order and, unless one is careful, data will be lost in case of system crashes.

A number of implementation strategies are possible. Among them:

  1. WriteAhead Log - Redo-Only, also called Deferred-Update
    We assume we have in stable storage a writeahead log (also called a Redo Log, a Database Log, or an Intentions List); it is just a sequential file. Initially all the operations are done acquiring locks without modifying the data base, just writing to the log for each transaction a start record and, for each update command, the command and the data that has to be written to complete the operation. [Note the record will include the identifier of the transaction and it will be an idempotent write operation: if we had 100 in an account and we withdraw 10, the logged write operation says to write 90 to the account.]
    When all this information has been logged (this phase is called the precommit phase) a commit record is written to the log. If a failure occurs before this commit, anything the transaction did is discarded and we are back as if before the beginning of the transaction. After the commit, the transaction will be completed on the real data, not the log, because we have in the log all the information needed to carry it out. This method of implementing the transaction is called redo-only because when we commit we need to redo each (update) operation of the transaction (if we give up we need not do anything to the database). The order in which log records are written is important since they must be done over in the same order (at least, conflicting operations must be done in the same order as indicated in the log) in the redo phase. The locks that were acquired in the precommit phase are held throughout the redo phase.

  2. WriteAhead Log - Undo-Only, also called Update-in-Place
    An alternative is to have a precommit phase where we apply locks and, for each update command, we save to the WriteAhead Log (now called the Undo Log) the data that will be modified (so that we can undo the command) and execute the command on the real database. We then write a commit record to the log. If there is a crash before the transaction is completed, we roll back to the old values by reading them from the log from earliest to latest (but only one per variable). This method of implementing the transaction is called undo-only since in case of failure before termination we need to undo all the updates done to the database. In case of crash, on recovery one looks in the log at all the active transactions [those transactions with a start record but no abort or commit record]. All these transactions are aborted, in the way indicated above.

  3. Private Workspace
    This is an interesting idea already encountered in Copy-On-Write (COW) fork operations and in Write-Once file systems. The idea is to give to each user its own copy of the data being updated, but this copy is just a directory of the blocks affected (the blocks may come from many files). Then when the update is made by a transaction to a block, it is written to a new block (called a shadow block) and the directory of that transaction is updated to point to this new block. When all updates are carried out and the new directory is copied to a log in stable storage together with a commit record, the transaction is ready to be completed by updating atomically the real directories of the database. Notice that in case of failure before commit we do not need to do anything to the database, just update lists of free blocks since blocks allocated to the aborted transactions can be recovered.
All logs are kept in stable storage. For each transaction we will have a start transaction record and an end transaction record. Space for the log can be recovered by compaction after freeing the space associated with completed transactions, or by treating the log as a circular tape where we restart from the beginning after reaching the end. Notice that logs, being sequential files, allow fast processing: information is logged at much greater speed than if it were actually recorded in a large store.
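The deferred-update (redo-only) scheme of strategy 1 can be illustrated with a toy Python sketch (class and record names are made up for illustration; "stable storage" is simulated by a plain list):

```python
class RedoLogDB:
    """Minimal deferred-update (redo-only) sketch.

    Writes go only to an in-memory log; the database itself is touched
    solely during the redo phase that follows the commit record.
    """
    def __init__(self):
        self.db = {}
        self.log = []            # simulated stable storage

    def begin(self, txid):
        self.log.append(("start", txid))

    def write(self, txid, key, value):
        # Idempotent record: the absolute value to install, not a delta.
        self.log.append(("update", txid, key, value))

    def commit(self, txid):
        self.log.append(("commit", txid))   # the commit point
        self.redo(txid)

    def redo(self, txid):
        # Replay the updates in log order; this same routine would be
        # run at recovery for every committed transaction.
        for rec in self.log:
            if rec[0] == "update" and rec[1] == txid:
                _, _, key, value = rec
                self.db[key] = value
```

For the withdrawal example above: start with 100 in an account, log an update installing the absolute value 90, and commit; a transaction that crashes before its commit record leaves the database untouched, so aborting it costs nothing.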

Restart is the operation required after a system failure. The restart operation will take care of the transactions that were not completely done at the time of the crash. Aborted transactions are undone, committed transactions are completed, and uncommitted transactions are undone and restarted. Checkpoints (i.e. the saving of intermediate values of objects during a transaction) can be used to reduce the amount of work to be done during restart.

Atomic Transactions in a Distributed System

The problem is the following: A number of systems are involved in executing a transaction. One is the transaction manager or coordinator, say it is system 1, the system where the transaction was started. All other systems have a resource manager that operates as indicated above for transactions on single systems. A resource manager is responsible for carrying out the local operations keeping a local log and, in addition, for communicating with the (transaction) manager. Each system has available stable storage to record reliably its progress and decisions across temporary local failures and recoveries. We assume that communication is reliable and we accept the fact that any system may fail and recover. We want to reach the situation where all systems complete in agreement that the transaction is committed (and they all are in the new state) or that it is aborted (and they all are in the old state). Of course the agreement should be faithful to the truth: there is commitment iff all systems will complete their tasks. A solution is provided by the Two-Phase Commit Protocol.
   Part 1:
      The transaction (a program in the transaction) starts by informing 
      the transaction manager (or coordinator), which will write to 
      stable storage a StartTransaction record and inform all participating 
      resource managers (also called participants).
      Now the transaction is Active.
      Carry out the operations of the transaction at each system
      as we did in the pre-commit phase in the single system transaction.

    Agreement Protocol:
      At this point all the operations of the transaction have been
      carried out in a "non-committed" fashion. A program in the transaction
      issues the "commit" request (only one program can do that). 
      The manager writes a pre-commit record to the log (stable storage)
      and sends to all participating resource managers the request that 
      they report their decision to commit or abort (REQUEST-TO-PREPARE):
      Round 1:
        All participants write a pre-commit or abort record to
        the log (stable storage) and report to the coordinator
        their local decision to commit or abort. A participant
        will send a commit reply (PREPARED) only if it has saved to stable
        storage all its state and the pre-commit record. Otherwise
        it replies NO.
        The coordinator collects the answers [if any is late, it is
        assumed to be an abort] then decides to commit iff
        all replies were to commit. The decision is recorded
        in stable storage at the coordinator with a commit record.
        At this point, if commit was chosen, the transaction is 
        "committed": it must be brought to a successful conclusion.
      Round 2:
        The coordinator informs all participants of the decision with
        a COMMIT or ABORT message. Now the participants must
        carry out that decision. [The coordinator 
        will keep sending the decision until it 
        is recorded in stable storage at all participants.]
        A participant will acknowledge with a DONE message a received commit
        or abort message so that the coordinator can stop overseeing this
        participant and release resources it may be holding.

    Part 2:
      The actions are finalized at all systems. This can be accomplished
      reliably since what to do is recorded in stable storage
      at each site. The participants record the completion of the
      transaction in their log. Then the transaction becomes Inactive.
Note that people think of the two phases as referring to Part 1 and Part 2, while in reality the two phases refer to the two rounds of the agreement protocol. The agreement protocol requires 4(n-1) messages, where n is the number of systems. A transaction can be in one of the following states:
  1. Executing: From the start of the transaction to the start of the two-phase protocol.
  2. Uncertain: During the two-phase protocol.
  3. Committed or Aborted: After the two-phase commit protocol and before the completion of the transaction.
  4. Inactive: All participants have finalized the transaction.

    The transaction is active when in any state but the inactive state.
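The agreement protocol can be sketched from the coordinator's side in Python (a sketch only: the `prepare`/`decide` methods are illustrative stand-ins for the REQUEST-TO-PREPARE, PREPARED/NO, and COMMIT/ABORT messages, and the logs are plain lists standing in for stable storage):

```python
class Participant:
    """A resource manager's side of the agreement protocol."""
    def __init__(self, vote):
        self.vote = vote
        self.log = []                       # simulated stable storage

    def prepare(self):
        # Round 1: write pre-commit or abort record, then report the vote.
        self.log.append("pre-commit" if self.vote else "abort")
        return self.vote

    def decide(self, outcome):
        # Round 2: record the coordinator's decision.
        self.log.append(outcome)

def two_phase_commit(coordinator_log, participants):
    """One run of the agreement protocol, seen from the coordinator."""
    coordinator_log.append("pre-commit")
    votes = []
    for p in participants:                  # Round 1: collect the votes
        try:
            votes.append(p.prepare())
        except Exception:                   # a late/lost reply counts as abort
            votes.append(False)
    decision = "commit" if all(votes) else "abort"
    coordinator_log.append(decision)        # the decision is now stable
    for p in participants:                  # Round 2: broadcast the decision
        p.decide(decision)
    return decision
```

With all participants voting yes the outcome is commit; a single no vote (or a missing reply) forces abort, exactly as the "commit iff all replies were to commit" rule requires.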

The following two pictures display respectively the interactions between the coordinator and a participant during a transaction, and what they do in the case of failures.

The two-phase commit protocol is imperfect since it assumes that no system will fail permanently.

  1. Consider the case where a participant j fails permanently after round 1, when all had agreed to commit. The coordinator decides to commit and informs the others in round 2, but the transaction cannot complete since system j is not there to accept the message from the coordinator and acknowledge it, and the coordinator cannot abort because by now some systems may have finalized their action. The transaction remains in the uncertain state and becomes blocked. Some extraordinary action must be taken, usually in the form of a compensating transaction that undoes or somehow fixes the previous transaction.
  2. It is also a problem if the coordinator fails permanently at the end of round one, just before recording its decision. The participants that had voted to abort know what to do [abort, thus re-establishing the previous state], but those that had voted to commit are in trouble [they are left hanging on how to finalize the transaction, i.e. they are blocked] and the transaction remains in the uncertain state. These blocked systems will attempt to reach resolution by sending queries to the coordinator requesting disposition information for the incomplete transaction.
So some additional work, not part of the two-phase commit protocol, needs to be done to handle these problems. Basically, we need to have some way of knowing when we can close the books on a transaction and say that it is done. A three-phase commit protocol has been suggested to deal with the weaknesses of the two-phase protocol.

Impossibility of Guaranteeing Consensus in Asynchronous Systems

The problem encountered in the Two-Phase Commit Protocol (the blocking of the transaction) is an example of a general problem: our inability to guarantee reaching consensus in an asynchronous system where processes may fail [this is the Fischer-Lynch-Paterson impossibility result]. Because of this impossibility, one aims not to guarantee consensus, but to minimize the probability of failing to achieve consensus. Also, one aims to establish emergency procedures for handling failures.

Transaction Servers

One takes advantage of the services provided by the Transaction Server and need not worry about concurrency control and recovery. The system (transaction server) will take care of that, locking and unlocking as required. Thus transaction processing represents a "higher level" form of programming. Of course, somebody has to implement such a transaction server and support its functionality. It should not be surprising to find out that transaction servers are complex and that transactions have a high overhead. People evaluate the performance of many kinds of distributed systems on the basis of the number of transactions they can support in a unit of time.

Nested Transactions

In some systems it is possible to define nested transactions, that is transactions defined within other transactions. An abort in a nested transaction terminates the nested transaction only, not the containing transaction. For example a transaction could be:
	A0: Start
	A1: Something
	A2: Start
	A3: Something
	A4: Abort
	A5: Start
	A6: Something
	A7: End
	A8: Something
	A9: End
This is equivalent to just A0, A1, A5, A6, A7, A8, A9.
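One common way to implement this behavior is with savepoints: each inner Start records the current position, and an inner Abort rolls back only to the matching Start. A hypothetical Python sketch (the class and method names are made up for illustration):

```python
class NestedTransaction:
    """Nested transactions via a stack of savepoints: an inner abort
    discards only the operations done since the matching start."""
    def __init__(self):
        self.ops = []            # operations performed so far
        self.savepoints = []     # positions in self.ops at each Start

    def start(self):
        self.savepoints.append(len(self.ops))

    def do(self, op):
        self.ops.append(op)

    def abort(self):
        # Roll back to the most recent savepoint only.
        self.ops = self.ops[:self.savepoints.pop()]

    def end(self):
        self.savepoints.pop()    # commit into the enclosing transaction

# The example transaction A0..A9 from the text:
t = NestedTransaction()
t.start()                 # A0
t.do("A1")                # A1
t.start()                 # A2
t.do("A3")                # A3
t.abort()                 # A4: discards A3 only
t.start()                 # A5
t.do("A6")                # A6
t.end()                   # A7
t.do("A8")                # A8
t.end()                   # A9
```

After this run only the operations A1, A6, and A8 survive, matching the equivalence stated above.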