Transactions:
The primary tool for handling concurrency in enterprise applications is the transaction. The word "transaction" often brings to mind an exchange of money or goods. Walking up to an ATM machine, entering your PIN, and withdrawing cash is a transaction. Paying the $3 toll at the
Looking at typical financial dealings such as these provides a good definition for the term. First, a transaction is a bounded sequence of work, with both start and end points well defined. An ATM transaction begins when the card is inserted and ends when cash is delivered or an insufficient balance is discovered. Second, all participating resources are in a consistent state both when the transaction begins and when the transaction ends.
In addition, each transaction must complete on an all-or-nothing basis. The bank can't subtract from an account holder's balance unless the ATM actually delivers the cash.
A transaction is a complete unit of work. It may comprise many computational tasks, which may include user interface, data retrieval, and communications. Completion of a transaction means either commitment or rollback, and either outcome leaves the system in a consistent state.
Transaction Properties:
ACID
Software transactions are often described in terms of the ACID properties:
· Atomicity: Each step in the sequence of actions performed within the boundaries of a transaction must complete successfully or all work must roll back. Partial completion is not a transactional concept.
· Consistency: A system's resources must be in a consistent, non-corrupt state at both the start and the completion of a transaction.
· Isolation: The result of an individual transaction must not be visible to any other open transactions until that transaction commits successfully.
· Durability: Any result of a committed transaction must be made permanent. This translates to "Must survive a crash of any sort."
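As a minimal sketch of atomicity, the following uses Python's standard `sqlite3` module with an in-memory database. The `account` table, the `transfer` helper, and the balances are illustrative assumptions, not part of the original text. The point is only that a failed transfer leaves no partial debit behind.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
            if balance < 0:
                # Abort: the debit above is undone by the rollback.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and commits
failed = transfer(conn, "alice", "bob", 500)  # fails and rolls back
```

After both calls the balances reflect only the successful transfer; the failed one leaves the state consistent, as the atomicity and consistency properties require.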
Transaction Concurrency Problems:
If locking is not available and several users access a database concurrently, problems may occur if their transactions use the same data at the same time. These concurrency problems fall into two groups, lost updates and inconsistent reads (dirty reads, non-repeatable reads, and phantom reads):
- Lost or buried updates.
- Uncommitted dependency (dirty read).
- Inconsistent analysis (non-repeatable read).
- Phantom reads.
Lost Updates:
Lost updates occur when two or more transactions select the same row and then update the row based on the value originally selected. Each transaction is unaware of other transactions. The last update overwrites updates made by the other transactions, which results in lost data.
For example, two editors make an electronic copy of the same document. Each editor changes the copy independently and then saves the changed copy, thereby overwriting the original document. The editor who saves the changed copy last overwrites changes made by the first editor. This problem could be avoided if the second editor could not make changes until the first editor had finished.
Tx1: -----t1: update ------t3: commit.
Tx2: -------t2: update -------- t4: commit/rollback.
Data updated by Tx1 will be lost.
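The interleaving above can be imitated without a database. In this sketch a plain dictionary stands in for a table and two sets of local variables stand in for Tx1 and Tx2; the `stock` row and the quantities are made up for illustration.

```python
# A dict standing in for a table with a single row (hypothetical data).
table = {"stock": 10}

# Both transactions select the same row before either one updates it.
tx1_seen = table["stock"]
tx2_seen = table["stock"]

# t1: Tx1 removes 3 units, computing the new value from what it read.
table["stock"] = tx1_seen - 3

# t2: Tx2 removes 4 units, also computing from its own stale read.
table["stock"] = tx2_seen - 4

lost_update_result = table["stock"]  # Tx1's change has been overwritten
serial_result = 10 - 3 - 4           # what one-after-another execution gives
```

The final value reflects only Tx2's change, whereas running the two transactions one after another would have applied both.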
Uncommitted Dependency (Dirty Read):
Uncommitted dependency occurs when a second transaction selects a row that is being updated by another transaction. The second transaction is reading data that has not been committed yet and may be changed by the transaction updating the row.
For example, an editor is making changes to an electronic document. During the changes, a second editor takes a copy of the document that includes all the changes made so far, and distributes the document to the intended audience. The first editor then decides the changes made so far are wrong and removes the edits and saves the document. The distributed document contains edits that no longer exist, and should be treated as if they never existed. This problem could be avoided if no one could read the changed document until the first editor determined that the changes were final.
Tx1: -----t1: updating---------- t3: (rollback).
Tx2: ----------t2: select---------------- t4: (commit).
Tx2 is working on data that no longer exists.
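A toy model can make the timeline concrete. The `Row` class below is a hypothetical stand-in for a database row that keeps one committed value and one pending (uncommitted) write; the `allow_dirty` flag is an assumption of this sketch that mimics a reader willing to see uncommitted data.

```python
class Row:
    """One committed value plus at most one pending (uncommitted) write."""

    def __init__(self, value):
        self.committed = value
        self.pending = None

    def write(self, value):
        self.pending = value      # visible only to dirty readers

    def commit(self):
        if self.pending is not None:
            self.committed, self.pending = self.pending, None

    def rollback(self):
        self.pending = None       # the uncommitted change never happened

    def read(self, allow_dirty):
        if allow_dirty and self.pending is not None:
            return self.pending
        return self.committed

row = Row("draft v1")
row.write("draft v2 (unreviewed)")         # t1: Tx1 updates the row
dirty = row.read(allow_dirty=True)         # t2: Tx2 reads uncommitted data
clean = row.read(allow_dirty=False)        # what a committed-only read sees
row.rollback()                             # t3: Tx1 rolls back
after_rollback = row.read(allow_dirty=True)
```

Here `dirty` holds a value that was never committed, which is exactly the state Tx2 ends up working with in the timeline.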
Inconsistent Analysis (Non-repeatable Read):
Inconsistent analysis occurs when a transaction reads the same row several times and sees different data each time. It is similar to uncommitted dependency in that another transaction is changing the data being read. However, in inconsistent analysis the changed data was committed by the transaction that made the change. Because the anomaly involves two or more reads of the same row, with another transaction changing the row between reads, it is also called a non-repeatable read.
For example, an editor reads the same document twice, but between each reading, the writer rewrites the document. When the editor reads the document for the second time, it has changed. The original read was not repeatable. This problem could be avoided if the editor could read the document only after the writer has finished writing it.
Tx1: ---------t2: updating--- t3: (commit).
Tx2: -----t1: select------------------ t4: select---
Tx2 will read different data on second select.
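The sketch below replays this timeline with a dictionary as the committed store. The second half shows one possible remedy, reading from a private snapshot; that is an assumption chosen for brevity (real systems may use read locks instead), and the `price` row is invented for the example.

```python
committed = {"price": 100}   # hypothetical committed data

def read_latest(key):
    """Each call sees whatever is committed at that moment."""
    return committed[key]

# Tx2 reads the latest committed value on every select.
first = read_latest("price")        # t1: Tx2 selects
committed["price"] = 120            # t2-t3: Tx1 updates and commits
second = read_latest("price")       # t4: Tx2 selects again, sees a new value

# Same timeline, but Tx2 reads from a snapshot taken when it started.
committed["price"] = 100
snapshot = dict(committed)          # Tx2's private view
first_repeatable = snapshot["price"]
committed["price"] = 120            # Tx1 updates and commits
second_repeatable = snapshot["price"]
```

In the first run the two reads disagree; in the second, Tx2's reads are repeatable because Tx1's commit does not touch the snapshot.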
Phantom Reads:
Phantom reads occur when an insert or delete action is performed against a row that belongs to a range of rows being read by a transaction. The transaction's first read of the range of rows shows a row that no longer exists in the second or succeeding read, as a result of a deletion by a different transaction. Similarly, as the result of an insert by a different transaction, the transaction’s second or succeeding read shows a row that did not exist in the original read.
For example, an editor makes changes to a document submitted by a writer, but when the changes are incorporated into the master copy of the document by the production department, they find that new unedited material has been added to the document by the author. This problem could be avoided if no one could add new material to the document until the editor and production department finish working with the original document.
Tx1: ----------t2: insert/delete --- t3: (commit) ------
Tx2: ----t1: select---------- t4: select----------------
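To illustrate, the following sketch runs the same range query twice over a list of rows while another "transaction" inserts and deletes rows in between. The row layout and the `select_long` helper are hypothetical and exist only for this example.

```python
# Hypothetical table: each row is a document with a page count.
rows = [
    {"id": 1, "pages": 12},
    {"id": 2, "pages": 30},
    {"id": 3, "pages": 45},
]

def select_long(table, min_pages):
    """Range query: ids of all documents with at least `min_pages` pages."""
    return sorted(r["id"] for r in table if r["pages"] >= min_pages)

first = select_long(rows, 25)                  # t1: Tx2 selects the range

# t2-t3: Tx1 inserts one matching row, deletes another, and commits.
rows.append({"id": 4, "pages": 60})
rows[:] = [r for r in rows if r["id"] != 2]

second = select_long(rows, 25)                 # t4: Tx2 repeats the query

phantoms = sorted(set(second) - set(first))    # rows that appeared
vanished = sorted(set(first) - set(second))    # rows that disappeared
```

No individual row that Tx2 read was modified; what changed is the set of rows matching the query, which is why row-level protection alone does not prevent this anomaly.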
Solutions to Transaction Concurrency Problems:
Isolation and Immutability:
The problems of concurrency have been around for a while, and software people have come up with various solutions. For enterprise applications two solutions are particularly important: isolation and immutability.
Transaction Concurrency Control:
Optimistic Concurrency
Optimistic concurrency control works on the assumption that resource conflicts between multiple users are unlikely (but not impossible), and allows transactions to execute without locking any resources. Only when a transaction attempts to change data are resources checked to determine whether a conflict has occurred. If a conflict is detected, the application must re-read the data and attempt the change again.
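One common way to implement this check is a version column that every successful update increments. The sketch below uses Python's standard `sqlite3` module; the `doc` table and the `read_doc` and `save_doc` helpers are assumptions made for illustration, not an API from any particular framework.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT, version INTEGER NOT NULL)")
conn.execute("INSERT INTO doc VALUES (1, 'original', 1)")
conn.commit()

def read_doc(doc_id):
    """Return (body, version) without taking any lock."""
    return conn.execute(
        "SELECT body, version FROM doc WHERE id = ?", (doc_id,)).fetchone()

def save_doc(doc_id, new_body, expected_version):
    """Write only if the version is still the one we read; report success."""
    with conn:
        cur = conn.execute(
            "UPDATE doc SET body = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_body, doc_id, expected_version))
    return cur.rowcount == 1   # zero rows updated means someone else got there first

# Two editors read the same version of the document.
_, version_a = read_doc(1)
_, version_b = read_doc(1)

first_save = save_doc(1, "text from A", version_a)    # no conflict
second_save = save_doc(1, "text from B", version_b)   # conflict detected

# Editor B re-reads the current state and retries on top of it.
body, version_b = read_doc(1)
retry_save = save_doc(1, body + " + text from B", version_b)
final_body, final_version = read_doc(1)
```

The conflicting save is rejected rather than silently overwriting editor A's work, so the lost update is detected and the application decides how to merge or retry.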
Pessimistic Concurrency
Pessimistic concurrency control locks resources as they are required, for the duration of a transaction. Unless deadlocks occur, a transaction is assured of successful completion.
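The same idea can be sketched in memory, with a `threading.Lock` standing in for a database lock (an assumption of this example; a DBMS would lock rows, pages, or tables). The `LockedCounter` class is hypothetical. Each read-modify-write holds the lock for its whole duration, so no other thread can interleave with it.

```python
import threading

class LockedCounter:
    """A value whose read-modify-write cycle is protected by a lock."""

    def __init__(self, value=0):
        self.value = value
        self._lock = threading.Lock()

    def add(self, amount):
        # Acquire the lock before reading and hold it until the write is done,
        # so conflicting updates are prevented rather than detected afterwards.
        with self._lock:
            current = self.value
            self.value = current + amount

counter = LockedCounter()

def worker():
    for _ in range(10_000):
        counter.add(1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = counter.value
```

Every increment is applied, at the cost of the threads taking turns: while one holds the lock, the others wait.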
A good way of thinking about this is that an optimistic lock is about conflict detection while a pessimistic lock is about conflict prevention.
Both approaches have their pros and cons. The problem with the pessimistic lock is that it reduces concurrency. Optimistic locks allow people to make much better progress, because the lock is only held during the commit. The problem with them is what happens when you get a conflict.
The essence of the choice between optimistic and pessimistic locks is the frequency and severity of conflicts. If conflicts are sufficiently rare, or if the consequences are no big deal, you should usually pick optimistic locks because they give you better concurrency and are usually easier to implement. However, if the results of a conflict are painful for users, you'll need to use a pessimistic technique instead.
ANSI transaction isolation levels: [2]
1. Read uncommitted: A system that permits dirty reads is said to operate in read uncommitted isolation. One transaction may not write to a row if another uncommitted transaction has already written to it. Any transaction may read any row, however. This isolation level may be implemented in the database-management system with exclusive write locks.
2. Read committed: A system that permits non-repeatable reads but not dirty reads is said to implement read committed transaction isolation. This may be achieved by using shared read locks and exclusive write locks.
3. Repeatable read: A system operating in repeatable read isolation mode permits neither non-repeatable reads nor dirty reads. Phantom reads may occur.
4. Serializable: Serializable provides the strictest transaction isolation. This isolation level emulates serial transaction execution, as if transactions were executed one after another, serially, rather than concurrently. Serializability cannot be implemented using only row-level locks. There must instead be some other mechanism that prevents a newly inserted row from becoming visible to a transaction that has already executed a query that would return the row.
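The table below summarizes which problems each isolation level permits (Yes means the problem may occur):
Isolation level  | Dirty read | Non-repeatable read | Phantom read |
Read uncommitted | Yes        | Yes                 | Yes          |
Read committed   | No         | Yes                 | Yes          |
Repeatable read  | No         | No                  | Yes          |
Serializable     | No         | No                  | No           |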
How exactly the locking system is implemented in a DBMS varies significantly; each vendor has a different strategy. You should study the documentation of your DBMS to find out more about the locking system, how locks are escalated (from row-level, to pages, to whole tables, for example), and what impact each isolation level has on the performance and scalability of your system.
References:
[1] Martin Fowler, David Rice, Matthew Foemmel, Edward Hieatt, Robert Mee, and Randy Stafford. Patterns of Enterprise Application Architecture. Addison-Wesley.
[2] Christian Bauer and Gavin King. Java Persistence with Hibernate.