|
Techforum
Understanding Transaction Isolation
This is a topic related to SQL Server. However, the concept
is applicable to any RDBMS. While auditing many applications, I have found that
incomplete or no understanding of ‘Isolation Level’ leads to a lot
of real life problems. These include performance degradation, blocking, locking
as well as major deadlocks.
This
article provides an easy-to-understand view of what ‘Isolation Levels’
really mean and when to use which level.
‘Isolation Level’ is a setting that decides how
data which is a part of an ongoing transaction is made visible to other transactions.
Transactions are well understood. So what is the problem?
As all of us know, it is a unit of work. The work may contain
many steps but either all steps happen or all steps don’t happen. This
is fairly well known. Nothing new here. But consider how transactions occur
in real life.
- I am updating one row in a Begin Tran and Commit
section. I have issued Begin Tran as well as the Update statement. Commit
is not yet issued – because I want to perform some more actions in the
same transactions on some other tables. If some other user wants to read this
row that I have updated but not yet committed, what should happen?
- Consider another scenario. I start a transaction.
I calculate a total of a field based upon all rows in a table. Now, I need
to add a new record in another table which contains this total. Now, can the
original table be changed by some other user after I calculate the total?
In which case, there could be a mismatch. Do you want to take such chances?
- Another scenario. I am working on some transaction
table between a range of keys within a transaction—say 10 and 20. There
were only 5 records when I read the range – 10, 12, 14, 16, 20. Now
I am working on other things in the transaction. Before I could commit the
transaction, someone added another row with a key value of 11. Now, my base
assumption about what records I read between 10 and 20 and further work upon
them itself is wrong. Problem!!! Is it not?
- I start reading a long table. It is not a transaction
at all. But other users want to refer to that table for updating some fields
in specific rows. The query takes 20 minutes to read all the rows. What happens
to other users who are trying to update the rows? Do they wait for 20 minutes?
Or they are allowed to update the rows even when the rows are being read in
a large query? What if the query was used to generate a summary report containing
grand totals? The total would be wrong because after the summation started,
some rows have changed. Some of these rows could have changed after the summation
occurred. What’s to be done now?
As you can see all these situations are confusing and prone
to inaccuracies. To avoid such problems we have a feature called “Isolation
Levels”.
Isolation Levels are applicable to transactions. These decide
the visibility of information which is a part of an ongoing transaction.
In very simple terms Isolation Levels decide “What
happens when some data being referred to (for reading or writing) within an
incomplete transaction is also being referred to (for reading or writing) from
another connection (or user – actually it is called another ‘transaction
lock space’)?”
To make things sound technical, all these problems have been
given nice and complex sounding names.
Data visibility problems that can occur during a transaction
Let us understand some jargon.
Each transaction performs certain operations (select / insert
/ update / delete) on one or more rows of one or more tables. Transaction starts
with Begin Tran command and ends with Commit. If unsuccessful, it ends with
Rollback command. Now, after Begin Tran is issues and before Commit is issued,
multiple tables may participate in the transaction related commands. Specific
parts of these tables need to be locked during this phase to ensure that other
users do not interfere with this transaction. This is called Transaction Lock
Space.
- Uncommitted (dirty) data:
Consider this code snippet (Blue text is code; green indicates
comments):
1. Begin Transaction
—we want to change the customer status from active
= “yes” to active = “no”
2. Update customer Set active = “No”
where CustomerID = 2324
—some more commandss
3. Select * from customer where CustomerID = 2324
4. Commit Transaction
Now, if the transaction commits, the value will be “No”.
If it does not commit (for whatever reason), the value will remain “Yes”.
Consider that the value has already been changed to “No”.
But there are more commands to be executed before the entire transaction commits.
These commands are time consuming and take, say, 3 minutes. During these three
minutes, if some user outside the transaction lock space reads the value of
Active field in the Customer table for ID 2324, what should they see? “Yes”
or “No”? The answer is simple. The value is changed but not committed.
Therefore, external queries should still show “Yes”.
Now what would happen if a Select active from Customer where
CustomerID – 2324 returns “No” to another user? What happens
if the transaction rolls back? This is called Uncommitted Data. Ideally this
should not be visible outside the transaction.
However, consider the same command executed within the transaction
(line 4). It should – and it will return “No”.
Now let us consider various problems that can occur. The
problems can be of three types:
1. Dirty Read
2. Non-repeatable read
3. Phantom rows
- Dirty reads
This is when connections outside the transaction space can
read the uncommitted data. You don’t want this to happen in most cases.
The only reason when you may want to allow this is when you are creating a report
which assumes a certain amount of inaccuracy. You think this situation is rare?
Not really. Suppose the report compares the current month sale with last month
sale and returns a percentage difference. Consider that the report takes 5 minutes
to generate. During these 5 minutes, more transactions may be getting added
to the sales data. Typically 5 transactions would get added to the sales table.
The average transaction value is Rs. 1000. The total sale for the month is typically
30-40 lacs. In such cases, you really don’t need to worry about absolute
accuracy. You can intentionally allow dirty reads because the impact is really
not significant from a business perspective.
- Non-repeatable read
Consider the above code. We change the value of Active to
“No” in line 2. Now in line 4 you expect the value to be “No”
because we are querying the row ‘within’ the transaction.
Now what would happen if some other transaction was allowed
to change the same value to, say, “Maybe”? In line 4, we would expect
value of “No” but actually get value of “Maybe”. This
problem is called non-repeatable read. The idea is that within a transaction,
the same data read any number of times should yield consistent results. If it
is not same then the reads are ‘non-repeatable’.
- Phantom Rows
This was explained in the introduction (range 10 to 20 example).
The concept is simple. If you have already read a range of data based upon a
key value, another transaction inserts a new row which happens to have a value
between this range. The original range which was referred to would now become
invalid. This is because a “Phantom row” was added. This is also
a major problem.
Important learning
Although the above three issues are listed as problems, they
may not always be considered as problems! Paradoxical? Not really. As explained
for Dirty Read, whether it is a problem or not depends entirely upon the business
context. Technology exists to allow these problems to occurs as well as prevent
them. The choice is yours. I know this complicates matters. But if these matters
were not complicated, what are IT professionals paid for!
Now let us see how to prevent these problems? The answer
is “using Isolation Levels”.
Types of Isolation Levels
| To solve the problem of … |
The Isolation Level should be… |
| Dirty Read |
Read Committed (Default of SQL Server)
|
| Dirty Read and |
Repeatable Read Non-Repeatable Read
|
| Dirty Read and Non-Repeatable
Read and Phantom Rows
|
Serializable |
| To retain all three problems |
Read Uncommitted |
Before we discuss each isolation level, let us understand
how to set or change it in T-SQL.
Syntax
SET TRANSACTION ISOLATION LEVEL
{ READ COMMITTED
| READ UNCOMMITTED
| REPEATABLE READ
| SERIALIZABLE
}
Remember:
- This setting applies to all transactions which are happening
within a single connection.
- Multiple connections can have different Isolation Levels.
- Within a single connection, you can change the Isolation
Level for specific transactions.
- You must know the kind of transaction you are performing
and its business requirements before you can decide the right isolation level.
- Isolation level does not just affect the current transactions
in a given connection. It also affects how other transactions issued within
other connections will behave.
- Therefore, you should have a good idea of what kind of
concurrent transactions occur in your application when the system is live and
many users are active.
This is the single biggest problem which leads to low performance,
locking and blocking, as well as deadlocks
- Most developers think of individual transactions as though
they are occurring in a single user system.
- It is difficult to envisage all permutations of possible
transactions which can occur together. But it is very much possible to plan
for it in a structured and informed manner. This requires additional effort,
monitoring live systems and tweaking of specific transactions and so on.
- Unfortunately, this additional effort is rarely a part
of system deployment! Such problems surface only after the system load increases
in such a way that small, unnoticed issues become amplified due to large volume
of data and / or large number of concurrent users.
- Only one of the options can be set at a time.
- It remains in effect till you explicitly change the option
by issuing another command. This is an important aspect to consider.
Best Practice : When you change the Isolation Level for a
transaction, remember to set it back to the original level after the transaction
is completed.
Isolation levels
This is as good (or bad) as not having any isolation. All
data which is uncommitted is readable from any connection. This should not be
used unless you have a very good reason to do so.
This prevents dirty reads. This does not prevent phantoms
or non-repeatable reads. This is the default. Although it is the default, it
does not mean that this isolation level is ideal for all transactions. One of
the regular tendencies amongst techies is to simply use default values without
giving it a second thought! I cannot imagine the number of phantom and non-repeatable
reads that must be occurring in the world because someone simply used the ‘default’
value. It is a scary thought to say the least.
This level is obviously more restrictive than the ‘Read
Uncommitted’ level.
This prevents dirty reads as well as non-repeatable reads.
It does not prevent phantom rows. This is more restrictive than Read Committed
level. When I say restrictive what does it mean? It means that the chances of
other transactions having to wait for this one to finish are INCREASED. Another
way of saying this is – Repeatable Read reduces concurrency compared to
Read Committed level.
This is the most restrictive of the options. This should
never be used as the default level. If you use this one as the default, it will
most probably create a single user system!
This prevents all three problems.
How do Isolation Levels work?
You will notice that we have not yet addressed this issue
at all in this article. The answer to this question is another Pandora’s
box. The answer is simple. It uses the appropriate types of locks to achieve
the desired effects. We will see what locks are and how they are implemented
using Isolation Levels in the next article.
In the meantime, please go through your code and see if you
are simply using the default level or are there some transactions which merit
a different isolation level. If you feel like changing a level, never do it
in production system. It can create havoc. Try it out in a test environment
first, satisfy yourself that it has no side effects and then implement it in
production system.
Don’t stop there. Monitor the production system. Some
problems are never detected in test systems. They may occur in production. Handle
them as required.
If you have a packaged product which handles large amount
of data and many concurrent users, you must analyse the appropriate usage of
Isolation Levels.
 |
About
the Author:Dr Nitin Paranjape is the Chairman and MD of Maestros (Mediline).
He is a consultant with many organisations, covering appropriate technology
utilisation, business application of relevant technology, application architecture
and audit as well as knowledge transfer. He has authored more than 650 articles
on various technology-related subjects. He can be contacted at nitin@mediline.co.in |
|