Database Transaction
- A transaction is a logical unit of work that must be either:
- Entirely completed
- or aborted
- no intermediate form is acceptable
- e.g. An account transfer transaction consists of two parts:
- $500 withdrawal from savings account
- $500 deposited into a checking account
- If either part-1 is unsuccessful or part-2 is unsuccessful,
we cannot let the transaction go to completion
- A transaction may have multiple components,
each components may be classified as a transaction in the database sense, e.g.
- e.g. in a retail point-of-sale system, a customer may wish to buy several items.
- Each item can be thought of a logical component in a sales transaction
- There are all kinds of reasons to cause this whole transaction
not to go to completion, (such as insufficient stock in one of the items)
- But if business policy allows, a sales transaction can still be completed
without one of the sales items
- Each component can be completed or aborted
- A component can not be partially completed, e.g. update the sales but not the inventory
Transaction Properties
- Atomicity -- all parts of a transaction must be completed or not at all
- Durability -- the consistent state of the database
when a transaction is completed must not be lost even in the presence of system failure
- Serializability -- the result of the concurrent execution of
several transactions is treated as if they were executed in serial order in a multi-processing environment
- Isolation -- data used during the execution of a transaction cannot be used by a
second transaction until the first one is completed
Transaction Management
What has to be done to ensure correct transaction
Single-user databases
- It has no problems of serializability and isolation
since it is single-tasking (i.e. one transaction at a time)
- It has to guarantee durability through back-up and recovery facilities
- database is backup totally and incrementally periodically
- Total database recovery is performed if the whole database is ruined
(e.g. head crash and a new disk is loaded with back-up data from the total backup)
- Recovery from the incremental backup may be needed to bring the database back
to the point before the last transaction
- Checkpoints are markers on the log written after periodic
incremental backup is performed
- Savepoints are similar but performed at the transaction level
- To guarantee atomicity, COMMIT and ROLLBACK operators are used
Multi-user databases
- Multiple and concurrent transactions are executed
- Serializability must be ensured by means of concurrency control techniques
- Isolation may be ensured also by means of concurrency control
Transaction Commitment
To ensure atomicity of transaction
Commitment is the act of making the effect of the transaction permanent,
prior to which may be work applied to temporary database only
Rollback reverts the database state to the a previous state, the point
before the first SQL statement or the explicit marker of BEGIN TRAN
However, it cannot cancel a transaction once it has been committed
When RDBMS encounters these SQL statements: COMMIT and ROLLBACK, it takes the appropriate actions
If the end of a program is successfully reach, then all changes are permanently records (commit)
If the program is abnormally terminated,
the database is rolled back to its previous state and all changes are cancelled
SQL Server has savepoints which are markers in the midst of transactions
so that the database can be rolled back to those points
instead of the beginning of the transaction: e.g.
- BEGIN TRAN name_change
- UPDATE member
- SET lastname = 'Smith' where member_no = 1
- SAVE TRAN savepoint_name
- DELETE adult where member_no = 1
- ROLLBACK TRAN savepoint_name
- " other SQL statements "
- COMMIT TRAN
Transaction Log
It is the mechanism to keep track of all transactions
that update the database and therefore support recovery
When the DBMS executes transactions that modify the database,
it updates the log automatically
The before and after data about the database and any of the
tables, rows and attribute values, the beginning and ending of the transaction are recorded
The log can be used to roll-back as well as roll-forward any committed transactions
that are not written to the physical database
Log writing increases the transaction overhead
but worthwhile as database can be recovered
CONCURRENCY CONTROL
Coordination of simultaneous execution of multiple transactions
- the 3 concurrency problems:
- The Lost Update Problem -- e.g. the following with time t4>t3>t2>t1
Transaction A Time Transaction B
Fetch R t1 --
-- t2 Fetch R
Update R t3 --
-- t4 Update R
- After Transaction A updates R at time t3,
- Transaction B updates R based on R at earlier time t2
- The effect of A's update is wiped out by B
- The Uncommitted Dependency Problem
Transaction A Time Transaction B
-- t1 Update R
Fetch R t2 --
-- t3 RollBack
- A is allowed to retrieve (or worse, update) a row that has
been updated by another transaction B which has not been committed
- When B rolls back, and if the transaction B never commits,
then A has seen some data that now no longer exists ( and in a sense "never existed)
- The Inconsistent Analysis Problem
Transaction A Time Transaction B
Fetch Acct1 (40) t1 --
sum = 40 t2 --
Fetch Acct2 (50) t3 --
sum = 90 t4 --
-- t5 Fetch Acct3 (30)
-- t6 Update Acct3 (30 to 20)
-- t7 Fetch Acct1 (40)
-- t8 Update Acct1 (40 to 50)
-- t9 COMMIT
Fetch Acct3 (20) t10 --
sum = 110 [NOT 120!]
- The result produced by A is incorrect, and if written back to the database,
will leave the database in an inconsistent state
- A is dependent on an uncommitted change since B commits all its updates before A sees Acct3
Concurrency Control Based on Locking
- A transaction ensures an object it is interested in (e.g. a database row, and/or an index)
will not change in some unpredicatable manner, it acquires a lock on that object
- The effect of the lock is to guarantee exclusive use and therefore
block other transactions out of the object and thereby prevent them from changing it
Lock Granularity
- One can guarantee exclusive use by locking the whole database
and results in a single user data so that no one else can use it during that time
- Cannot share the data among multiple users even if they use different tables
- Locking can be applied at different levels:
- Database level --
- good for batch processing, total backup
- Table level
- No one else can access the locked table
- Others can access other tables.
- Application is still single-user
as another user of the same application cannot access other rows of the same table
- Page level
- A disk-page which is equivalent to a disk-block
- A page has a fixed size such as 2K, 4K,... (SQL Server uses 2K)
- A table can span several pages
- A page can span several rows of one or more tables
- A page can form part of a large row
- It is the most commonly used level
- Field level
- Object size down to an attribute
- Others can access the same row except different attributes
- High overhead, less efficient
Lock Types
- Binary Locks
- It has only two states, an object is either locked or unlocked
- When a transaction needs an object, the object must not be locked
- It locks the object before using it
- It unlocks the object after use
- It eliminates the lost update problem
- Performance is not so good as too restrictive,
two read operations cannot access the same object
when obvious none of the operatins will cause any problems
- Shared/Exclusive Locks
- To enable read only operations to share a resource, shared lock is introduced
- when a transaction wants to read an object: issue a shared-lock if no exclusive-lock is on
- when a transaction wants to update an object: issue an exclusive-lock when no lock is applied
- therefore a lock has 3 states: unlocked, shared, and exclusive
- two transactions conflict only when at least one of them is an update (write) transaction
- more overhead for the scheduler as:
- type of lock must be known
- 3 lock operations
- enhanced to have lock upgrade and lock downgrade
In SQL-SERVER, locks are automatically applied, you can specify under (optimizer_hints):
- NOLOCK -- Allows "dirty reads," which means that no shared locks are issued
and no exclusive locks are honored. This can result in higher concurrency,
but at the cost of lower consistency. If this option is specified,
it is possible to read an uncommitted transaction
or to read a set of pages that are rolled back in the middle of the read,
so errors might result
- HOLDLOCK -- Makes a shared lock more restrictive by holding it
until the completion of a transaction (instead of releasing the shared lock
as soon as the required table or data page is no longer needed,
whether or not the transaction has been completed).
The HOLDLOCK option applies only to the table or view for which it is specified and only for the duration of
the transaction defined by the statement in which it is used
- UPDLOCK -- Takes update locks instead of shared locks while reading the table and holds them until the end-of-command or end-of-transaction.
- TABLOCK -- Takes a shared lock on the table that is held until the end-of-command. If HOLDLOCK is also specified, the shared lock is held until the end of the transaction.
- PAGLOCK -- Takes shared page locks where a single shared table lock would normally be taken.
- TABLOCKX -- Takes an exclusive lock on the table that is held until the end-of-command or end-of-transaction.
Two-Phase Locking to Ensure Serializability
- Locks prevent serious data inconsistencies, it still has two major problems,
one of which is serializability
- Two-phase locking protocol defines how transactions acquire and relinquish locks
- A Growing phase when a transaction aquires all the required locks without unlocking any data. At this locked point, the transaction can execute
- A Shrinking phase when a transaction releases all locks and cannot obtain any new locks
- Two transactions cannot have conflicting locks
- No unlock operation can precede a lock operation in the same transaction
- No data are affected until all locks are obtained
Deadlock Problem
- Deadlock exists when two transactions T1 and T2 are in the following situation:
- T1 has a lock on resource A and wishes to obtain B exclusively
- T2 has a lock on resource B and wishes to obtain A exclusively
- Each transaction will wait for the other to unlock and end up in a deadly embrace situation
- More than two transactions can be involved in a dead lock (the dependency graph has/is a circle)
3 Techniques to Control Deadlocks
- Deadlock Prevention -- A transaction requesting a new lock is aborted if there is the possibility that a deadlock can occur
(more rollbacks and redoes)
- Deadlock Detection -- DBMS periodically checks for deadlock, if found, aborts one the transactions (the victim)
- Deadlock Avoidance -- Transaction must obtain all the locks it needs before it can be executed
(Increases response time)
Concurrency Control with Time Stamping
- A global unique time stamp is assigned to each transaction
which is the order transactions are submitted to the DBMS
- Unique time stamps means no dupliate time stamp can exist
- Monotonicity of time stamps means the time stamps values are always increasing
- All READ and WRITE for the same transaction use the same time stamp
- The DBMS executes the conflicting transactions in time stamp order to ensure serializability of transactions
- If two transactions are conflicting, (later) one is aborted and rescheduled with a new time stamp
- No locking is needed, but each value stored in the database must have 2 time stamp fields, one for last-read, another for last-update
- Processing overhead increases with additional time stamp management and aborting and rescheduling of transactions
Concurrency Control with Optimistic Methods
Assumes majority of database operations do not conflict
- No locking or time stamping uses
- Transaction executed without restriction until committment time
- Each transaction goes through two or three phases:
- Read Phase -- database read and updates recorded to a private copy not accessible by others
- Validation Phase -- transaction is validated to assure of integrity and consistency. If positve,
transaction goes to next phase,
else transaction is aborted and restarted
- Write Phase -- apply the recorded changes on the private copy to the permanent database
- It is acceptable for mostly read or query databse systems that has few update transactions.