Issue dated - 9th December 2002

-


CURRENT ISSUE
INDIA NEWS
INDIA TRENDS
STOCK FILE
OPINION
NEWS ANALYSIS
E-BUSINESS
COMPANY WATCH
INDIA COMPUTES
TECHNOLOGY
TECHSPACE
REVIEWS
PRODUCTS
EVENTS
COLUMNS <NEW>
TECH FORUM

THE C# COLUMN

BETWEEN THE BYTES
TECHNOLOGY
EC SERVICES
ARCHIVES/SEARCH
IT APPOINTMENTS
WRITE TO US
SUBSCRIBE/RENEW
CUSTOMER SERVICE
ADVERTISE
ABOUT US

 Network Sites
  IT People
  Network Magazine
  Business Traveller
  Exp. Hotelier & Caterer
  Exp. Travel & Tourism
  Exp. Backwaters
  Exp. Pharma Pulse
  Exp. Healthcare Mgmt.
  Express Textile
 Group Sites
  ExpressIndia
  Indian Express
  Financial Express

 
Front Page > TechSpace > Story Print this Page|  Email this page

Utility for detecting duplicate indexes in SQL Server

Tech Forum - Dr Nitin Paranjape

I have been involved in the database optimisation and tuning of a large number of applications. One of the causes of slow performance, especially for updates, is the presence of a large number of complex indexes.

If there are too many indexes on a particular table, the SELECT query performance may improve. However, during update of data (insert, update, and delete) additional work needs to be performed for updating the index entries. Thus, the amount of time required updating data increases substantially based upon the number and complexity (number of columns) in an index.

One peculiar problem that I have often observed is the presence of duplicate indexes. Having duplicate (or overlapping) indexes is pure wastage of the processor time and the additional work that needs to be put in by the SQL Server database engine.

What’s a duplicate index?
On the face of it, you may think it would be impossible to have duplicate indexes on a given table. However, in reality, it is very much possible. Let us first understand what a duplicate index is.

Consider a table containing 8 columns A, B, C, D, E, F, G, H. The following indexes have been created on the table:

  • B
  • AB
  • BCG
  • ABC

If you notice, the index on B is used when you are searching/filtering using the column B. The index BCG starts with column B. Therefore, queries which need to use index for locating B can still use index BCG. The same is the case with AB and ABC.

Thus, indexes B and AB are unnecessary and redundant.

Having indexes that are overlapping is poor data design. These indexes have the following adverse effects:

  • The storage space required increases.
  • The update performance is slowed down.
  • Backup time increases.
  • Query tuning becomes more cumbersome.
  • Blindly creating the same indexes as original tables while migrating databases from xBase to SQL Server.

How do we end up creating such duplicate indexes? Nobody wants to purposely design a poorly performing database. The reason why this happens is due to ad hoc database level changes. While designing the original database structure, appropriate indexes would have been created. Later, as a part of performance tuning activity, someone else may add more indexes that are more complex (contain more columns). This may make some existing indexes redundant.

Another common reason is using an ad hoc combination of manual creation of indexes and the index tuning wizard. If you are using some CASE tool that has been configured for creating certain types of indexes on its own, the situation becomes more complicated.

Detection of duplication
How do we detect the presence of duplicate indexes? The answer is very simple. Just open each table from enterprise manager by choosing the Manage indexes option and look for duplication/overlap manually. This is a good idea if you have few tables to work on.

However, most production systems contain literally hundreds of tables and indexes. In most organisations, this job would be assigned to the Database administrator (DBA). And most DBAs would do this manually, however cumbersome and time consuming the task may be. Many administrators still have some kind of mental block, which induces them to distinguish themselves from developers. They will typically shy away from writing any code, even if it could reduce their work.

I have always advocated that this mental divide between administrators and developers should be abolished. One nice example of how a typical administrative task can greatly benefit by the use of some smart coding is the solution to the above problem. I have written a small tool that programmatically analyses all tables and their indexes and generates a list of duplicate indexes. The tool is written using apparently unrelated tools. But I will tell you why these tools were used and why the usage is appropriate.

The utility
This utility (let us call it Index Redundancy Detector - IRD) is written in two steps.

  1. In this step a delimited text file containing all the tables, indexes and columns in each index is created.
  2. In this step, this information is analysed to detect duplication.

Now comes the real fun. The code for Step 1 is written in MS Word and the code for Step 2 is written in Visual FoxPro .NET! Why on earth would anyone want to use MS Word (VBA)? Well, because it is the simplest and fastest way of writing standard VB code that does not really need to compile into an EXE or DLL. Of course it could have been written in any other language. But I have this code ready. You can adapt it to your taste.

Why Visual FoxPro? You will see exactly why a little later. At this stage I will only say that the situation demanded complex string handling and sorting functionality. This was possible with minimal amount of coding in VFP. Of course, both the above steps can be coded in any other suitable tool. I am giving below the code I used. You are free to understand the concept and implement it in language of your choice.

Step 1: Making a list of index columns
What we need is a simple data structure that contains the following information:

Field Information
1 Tablename
2 Index Name
3 Index columns (comma delimited list)

I preferred to download this list using CSV format for ease of upload into Step 2 (VFP)

I could have created this data using a complicated SQL Stored Procedure. However, because it was a well-defined, iterative task, I preferred the DMO object library, which is available with SQL Server 2000. DMO means Distributed Management Objects. In brief, it is a very rich object model that represents most of the entities and actions which you normally handle using standard SQL Syntax.

For example, in SQL Syntax you would create a new table by using the command:

Create table <field definitions>

In case of DMO you would do so by using the object hierarchy:

Database.tables.add (<object to add>)

Here ‘tables’ is a collection within the database object. Database is an item in the databases collection which in turn is a child of SQLServer object itself. In order to highlight this hierarchical object model of DMO, here is how a table is represented.

Here is the code which dumps the index details to a separated text file where individual fields are separated with the * character. Why not comma delimited? Because the commas between the individual column names for an index would clash with commas which delimit the fields. To make the SQL DMO object model available while coding, you need to set a reference to the “Microsoft SQL DMO Object Library” from Tools - References in MS Word VBA editor.

Sub DumpIndexInfo
Dim fnum
fnum = FreeFile
Open “c:\sample.txt” For Output As #fnum
Dim str As String
Dim sq As New SQLDMO.SQLServer
Dim db As New SQLDMO.Database
Dim tb As SQLDMO.Table
Dim ind As SQLDMO.Index
Dim ol As SQLDMO.SQLObjectList
Dim co As SQLDMO.Column
sq.Connect <databasename>, <userid>, <password>
Set db = sq.Databases(“<databasename>”)
For Each tb In db.Tables

If tb.Indexes.Count > 1 Then
For Each ind In tb.Indexes
Set ol = ind.ListIndexedColumns
str = “”
For Each co In ol
str = str + co.Name + “,”
Next ‘column
str = Left(str, Len(str) - 1) ‘remove last comma
Print #fnum, tb.Name & “*” & ind.Name & “*” & str
‘Debug.Print tb.Name & “*” & ind.Name & “*” & str
Next ‘index
End If
Next ‘table
Close #fnum
End Sub

This code produces a file called sample.txt containing information about all tables, indexes and the list of columns for each index.

Please note, I have not excluded system tables here. However, you can easily put that filter.

Here is a typical record for a sample table say, “invoice”

invoice*myindex1*sampleid,date,transactiontype
invoice*myindex3*sampleid
invoice*myindex2*locationid,sampleid

(In this sample, the index myindex3 on sampleid is redundant.)

Step 2: Detecting redundant indexes

Now that we have the * delimited text file, we need to make some sense out of the data in it and detect for
duplication programmatically. At this stage, believe me, VFP offers great benefits. Let us see how.

Fieldname Datatype Comment
TBL CHAR(50) Name of the table
IND CHAR(50) Name of the index
CL CHAR(254) Max column width permissible in VFP Comma delimited list of the Cindex columns
CLCNT INTEGER Number of columns in the index
SCANNED LOGICAL Flag to indicate that this record has been scanned for duplicate check
DUPLICATE LOGICAL Flag to indicate that this record (the index named in this record) is duplicate

To start with we need to put the text file into a proper VFP table for better data handling. To do so, we first
need to create an empty VFP DBF file containing the following structure. Let us call this table WORK table.

Now we need to upload information from the text file to this table. The command is simple.

CLOSE TABLES
USE WORK
APPEND FROM c:\sample.txt TYPE DELIMITED WITH CHARACTER “*”

We need to count the number of columns in each index and put the value in the CLCNT field. This field will be used to sort the indexes in ascending order in order to check for duplicates. The command is simple:

REPLACE ALL clcnt WITH OCCURS( ‘,’, cl)

Now, we need to create some index on this table.

INDEX ON tbl + STR(clcnt) TAG cntbased
SET ORDER TO cntbased in WORK

This would sort the table first on table name and within the table name on number of columns in the index.
In order to loop through one table at a time, we need to create a simple table containing one record for each table name. This is a simple DISTINCT query.

SELECT dist tbl FROM WORK INTO TABLE master

Now, we loop through the master table, filter the WORK table on each table, look at the index columns and detect duplicates.

Now you have the MASTER and WORK tables open. We are now ready to do the actual duplicate check. Here is the requisite code.

* CHECKDUPL.PRG
SELECT master
SCAN
nam = ALLTRIM(tbl)
SELECT work
SET FILTER TO ALLTRIM(tbl)== nam
GO TOP
SCAN FOR (NOT scanned) AND (not duplicate)
REPLACE scanned WITH .t.
str = ALLTRIM(cl)
strlen = LEN(str)
recno = RECNO()
COUNT FOR LEFT(cl, strlen) == str TO cnt rest
if cnt > 1
GOTO recno
REPLACE duplicate WITH .t.
Endif
GO top
ENDSCAN
ENDSCAN

That is all there is to it. Now all records marked with DUPLICATE = .T. are the redundant indexes.

If you want further sophistication, we can even create a simple script file that contains all the required DROP INDEX commands!

Here is the VFP code required to do this. This code could have been inserted in the duplicate checking code itself. However, for clarity, it is given separately below.

SELECT WORK
SET TEXTMERGE TO sample.sql
SET TEXTMERGE ON
SCAN FOR DUPLICATE = .T.
\DROP INDEX <<ALLTRIM (TBL) + ‘.’ + ALLTRIM ( IND )>>
* create the SQL command “DROP INDEX <tablename.indexname>”
ENDSCAN
SET TEXTMERGE TO

Now the file sample.sql contains the correct script for deleting the duplicate indexes.

Lessons to learn

  1. Be careful with index creation. Such duplication could lead to a lot of performance degradation and yet go unnoticed.
  2. Administrative tasks can often be simplified using smart programming.
  3. Remove the mental barrier that separates developers and administrators.
  4. Use the appropriate tool for the appropriate purpose. Here is a summary:
Tool / Technology Why is this appropriate?
DMO Very easy to navigate object structure and obtain desired information about indexes. Would have been very cumbersome using traditional SQL Syntax
VFP As you can now appreciate, VFP offers very powerful string handling features with very little coding. I didn’t use it just because I like it. You can try to write this logic in other languages of your choice. However, the code will definitely be lengthier as well as more complex in other cases.
Script Final output was given in SAMPLE.SQL. It was very easy to delete the indexes found using DMO itself. However, an SQL Script file makes more sense here because this can be run on remote sites / multiple locations. DMO would not be feasible in such scenarios. Moreover, a script file is run explicitly. DMO would run it within the code, which may be undesirable in production systems.

Feedback - Your feedback, suggestions, requests for covering specific topics or issues are welcome. Please send feedback to techforum@mediline.co.in

About the Author Dr Nitin Paranjape is the Chairman and MD of Maestros (Mediline). He is a consultant with many organisations, covering appropriate technology utilisation, business application of relevant technology, application architecture and audit as well as knowledge transfer. He has authored more than 650 articles on various technology-related subjects. He can be contacted at nitin@mediline.co.in
<Back to top>


© Copyright 2000: Indian Express Group (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in
Mumbai by The Business Publications Division of the Indian Express Group of Newspapers.
Please contact our Webmaster for any queries on this site.