Issue dated - 10th February 2003

-


CURRENT ISSUE
INDIA NEWS
INDIA TRENDS
STOCK FILE
OPINION
FOCUS
E-BUSINESS
COMPANY WATCH
TECHSPACE
TECHNOLOGY
PRODUCTS
EVENTS
COLUMNS
TECH FORUM

THE C# COLUMN

BETWEEN THE BYTES
TECHNOLOGY
SPECIALS <NEW>
HMA BANKBIZ
EC SERVICES
ARCHIVES/SEARCH
IT APPOINTMENTS
WRITE TO US
SUBSCRIBE/RENEW
CUSTOMER SERVICE
ADVERTISE
ABOUT US

 Network Sites
  IT People
  Network Magazine
  Business Traveller
  Exp. Hotelier & Caterer
  Exp. Travel & Tourism
  Exp. Backwaters
  Exp. Pharma Pulse
  Exp. Healthcare Mgmt.
  Express Textile
 Group Sites
  ExpressIndia
  Indian Express
  Financial Express

 
Front Page > TechSpace > Story Print this Page|  Email this page

Learning from (others) mistakes : Query Blocking (cont...)

Tech Forum - Dr. Nitin Paranjape

Recap
The purpose of “Learning from (others) mistakes” series is to make sure that we prevent similar mistakes from occurring rather than repeating them. TechForum will present the causes, and suggest corrective and preventive steps for commonly observed mistakes.

In the last article, I described a scenario of business discontinuity due to a very harmless and unintentional error on the part of a developer.

The entire system stopped responding to 1000+ users simply because one of the important transactional table was BLOCKED by a simple SELECT type of query.

In this article, we will explore the causes and remedies for blocking of tables / rows.

What is blocking?
Blocking simply means a state where one query is waiting for another query to release a resource which it has locked. When the running query finishes execution, the locked resource is released. The waiting query can now lock the resource and execute.

One of the most important things to understand here is that BLOCKING IS NORMAL. SQL commands could originate from various sources like individual user connections, SQL engine processes, scheduled jobs, components, ASP pages and so on. SQL Server works on requests for each connection in the order in which they arrive. Therefore, in a site with lots of users working in parallel, it is very difficult to predict which resources will be locked in which order by which connection.

Locking of a resource while updating data, ensures that data can be updated only by one connection, and no other connection can interfere with this process.

Granularity of locking
While updating data, the entire table need not be locked. Only the relevant part of the table needs to be blocked. The idea is to lock the smallest possible area of a table and to lock it for the shortest possible time. Various regions of tables which can be locked are:

Locking granularity is decided at execution time by query optimiser. The decision is taken depending upon very sophisticated analysis of the query. The optimiser calculates the number and distribution of rows which is required to be locked. Based on these parameters, the cheapest (with least overhead) type of lock is chosen. Even after initially deciding and using a particular type of lock, the query optimizer keeps analysing the actual number of locks required to lock the affected entity (a page, for example).

There is an internal setting called Escalation Threshold. This is the total number of locks which are permitted for the currently selected granularity. If the actual number of locks exceeds the threshold, the query optimiser automatically chooses the next higher (broader) granularity available. This way, the overheads are reduced, query execution time is improved and the most appropriate locking level is achieved.

Lock timeout
When a resource is locked, other transactions (connections) can not update it. Others have to wait, in a queue, for the lock to be released. If the original transaction takes too long, other waiting connections (and their end users) will perceive a degradation in performance and may even complain that the machine / application appears to be hanging.

The only way to know whether a resource is currently locked is to try to lock it.

There is no islocked() function! Now, if you want to know if something is locked, and you try to lock it, you are taking the risk of getting blocked, if the original lock is not released quickly. This waiting is ENDLESS. Because the default value of Lock Timeout is -1, that means no timeout period—wait endlessly.

If you put some value for lock timeout property, the blocked connection will be released after that period elapses. The blocked connection is now free to do something else.

How will the application which is handling the connection know that the transaction is actually aborted due to lock timeout?

To inform the client application about a forcible cancellation of blocking, a special error number is returned.

1222 “Lock request time-out period exceeded”

This error needs to be trapped and acted upon. The current transaction will NOT be aborted when locking timeout occurs. The application developer needs to decide whether to wait, retry or rollback. Therefore, blindly putting a timeout value will not necessarily solve all blocking problems. It may even introduce more problems due to unhandled errors and unpredictable transaction management.

Reasons for blocking
Something which leads to a critical, often used resource being locked for a long time will lead to blocking of other connections.

The underlying cause is immaterial. What is important is the blockade of a shared, critical resource. Here are some common causes:
1. Long running queries
2. Canceling uncommitted queries
3. Partial processing of query results by application
4. Distributed deadlock

When do you call a transaction LONG RUNNING?
Unfortunately, there is no fixed time limit to call a transaction long running. The first thing to understand is that whether a transaction is long running depends upon whether it was expected or unexpected. It has nothing to do with the actual time taken to execute the query.

Let me explain. Consider a monthly batch process which churns lots of data and generates complex reports. This process is scheduled to run at midnight of the last day of every month. This process contains a very complex query which takes 30 minutes to execute. The fact that it takes 30 minutes is known and is acceptable from a business perspective. This query is expected to take that much time. Therefore, it is NOT a long running query.

Now let us take another scenario. Consider a Web-based application supporting live transactions from 1,000 connected customers. Customer entered data is updated in the database using a stored procedure. This usually takes 100 milliseconds. This performance is adequate enough to support the peak load. Now, if this query takes 10 seconds, IT IS a long running query. Because, it was not expected to take much time.

Very often the query text is generated at runtime by concatenating multiple string pieces. This type of requirement is often present when the query content changes based on various parameters which are known or controllable only at run-time, not at design-time. This type of query generation is potentially dangerous. Generally the query should generate some SQL statement like:

Select <some fields> from a , b where <join condition> <filter condition>

Now, under some rare conditions, which were not thought of during testing (do we think during testing ?!!) it could generate a statement like:

Select <some fields> from a , b

This still does not sound dangerous enough. If earlier query resulted in 10 records, how many records do you think will the second query return? It depends on the total number of records in both tables. So if both a and b tables had substantial number of rows (say 1 lac and 10000) the returned number of rows would be 1000000000.
That would be a realllly long running query!

Prevention and cure

Set Query Timeout

Put a query timeout to avoid long running queries. This should be done at least for queries which are likely to be long running.

Set lock timeout

This is also a good setting which prevents endless waiting for locks to be released. Remember to handle the error and rollback the transaction or retry.

Keep transactions as short as possible

This is easier said than done. Testing with data which is of real-life-like peak size and connection load is very important. Most often testing is done in development scenarios with few rows and no load. This almost always leads to query optimization using TABLE level locks. When the size grows, different type of locks may be required. There is no other way of predicting behavior and performance. You must do stress test with peak data and user load.

‘Use Query Governer....’ setting

Set ‘Query governer timelimit’ in SQL Server

This prevents ANY query which exceeds the execution cost mentioned from running at all. The query timeout setting kills the query AFTER it has been detected as long running. Query governor setting simply DOES NOT EXECUTE the query. It detects the problem BEFORE it occurs.

This is a very useful setting. However, please ensure that the value entered is tested and proven to high enough based upon the typical cost of longest possible queries. Otherwise, genuine queries may also be prevented from running due to this setting. This setting can be modified using the Server Properties dialog in Enterprise Manager.

Read more about the usage before you use the setting on production server.

Processing all results

If the application does not process all results returning from a query, the locks are retained. Ensure that all query results are processed completely.

Rollback explicitly after query cancellation.

This is a commonly unknown fact. Query cancellation is done through ODBC. However it DOES NOT implicitly mean rollback. You need to issue a ROLLBACK in addition to canceling the query.

Find out the right setting of Isolation level

The higher the isolation level, the more stringent is the locking. Therefore, understand what isolation levels are available and decide the most appropriate one for your application. In addition, understand the impact of the isolation level on the locking behavior for various types of SQL commands by ACTUALLY TRYING OUT EACH TYPE OF COMMAND AND CHECKING FOR LOCKS PRODUCED.

Never have user activity between BEGIN TRAN and COMMIT

Begin Tran and Commit must be issued after all information required to process transaction is available. This may sound obvious and too simple a thing to mention here. However, I have myself seen BEGIN TRAN being issued on a dynamic recordset in the FORM - LOAD Event and COMMIT in the FORM - UNLOAD event!

So don’t take anything for granted. Check it out.

Tip of the week for the IT Industry

Please press F1

Sounds weird? Not really. All the content I have written in this and the current article did not come from some costly book or a secret web site or some great genius. It was , sorry, IT IS available in Books Online of the SQL Server product!

It is really pitiable that most of the so called great IT Professionals want more salaries, better perks, higher positions, overseas opportunities but forget to press F1. Expecting the best in the world is fine. But for that you need to be (or at least attempt to be) the best. Fortunately, being the best in IT is simpler than you thought. Just press F1 often enough and utilize that knowledge effectively in your work. That’s it. Sorry That’s IT.

Feedback
Your feedback, suggestions, requests for covering specific topics or issues are welcome. Please send feedback to techforum@mediline.co.in

About the Author Dr Nitin Paranjape is the Chairman and MD of Maestros (Mediline). He is a consultant with many organisations, covering appropriate technology utilisation, business application of relevant technology, application architecture and audit as well as knowledge transfer. He has authored more than 650 articles on various technology-related subjects. He can be contacted at nitin@mediline.co.in
<Back to top>


© Copyright 2000: Indian Express Group (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in
Mumbai by The Business Publications Division of the Indian Express Group of Newspapers.
Please contact our Webmaster for any queries on this site.