|
I
have been involved in the database optimisation and tuning
of a large number of applications. One of the causes of slow
performance, especially for updates, is the presence of a
large number of complex indexes.
If there are too many indexes on a particular table, the SELECT
query performance may improve. However, during update of data
(insert, update, and delete) additional work needs to be performed
for updating the index entries. Thus, the amount of time required
updating data increases substantially based upon the number
and complexity (number of columns) in an index.
One peculiar problem that I have often observed is the presence
of duplicate indexes. Having duplicate (or overlapping) indexes
is pure wastage of the processor time and the additional work
that needs to be put in by the SQL Server database engine.
Whats a duplicate index?
On the face of it, you may think it would be impossible to
have duplicate indexes on a given table. However, in reality,
it is very much possible. Let us first understand what a duplicate
index is.
Consider a table containing 8 columns A, B, C, D, E, F, G,
H. The following indexes have been created on the table:
If you notice, the index on B is used when you are searching/filtering
using the column B. The index BCG starts with column B. Therefore,
queries which need to use index for locating B can still use
index BCG. The same is the case with AB and ABC.
Thus, indexes B and AB are unnecessary and redundant.
Having indexes that are overlapping is poor data design. These
indexes have the following adverse effects:
-
The storage space required increases.
-
The update performance is slowed down.
-
Backup time increases.
-
Query tuning becomes more cumbersome.
-
Blindly creating the same indexes as original tables while
migrating databases from xBase to SQL Server.
How do we end up creating such duplicate indexes? Nobody wants
to purposely design a poorly performing database. The reason
why this happens is due to ad hoc database level changes.
While designing the original database structure, appropriate
indexes would have been created. Later, as a part of performance
tuning activity, someone else may add more indexes that are
more complex (contain more columns). This may make some existing
indexes redundant.
Another common reason is using an ad hoc combination of manual
creation of indexes and the index tuning wizard. If you are
using some CASE tool that has been configured for creating
certain types of indexes on its own, the situation becomes
more complicated.
Detection of duplication
How do we detect the presence of duplicate indexes? The answer
is very simple. Just open each table from enterprise manager
by choosing the Manage indexes option and look for duplication/overlap
manually. This is a good idea if you have few tables to work
on.
However, most production systems contain literally hundreds
of tables and indexes. In most organisations, this job would
be assigned to the Database administrator (DBA). And most
DBAs would do this manually, however cumbersome and time consuming
the task may be. Many administrators still have some kind
of mental block, which induces them to distinguish themselves
from developers. They will typically shy away from writing
any code, even if it could reduce their work.
I have always advocated that this mental divide between administrators
and developers should be abolished. One nice example of how
a typical administrative task can greatly benefit by the use
of some smart coding is the solution to the above problem.
I have written a small tool that programmatically analyses
all tables and their indexes and generates a list of duplicate
indexes. The tool is written using apparently unrelated tools.
But I will tell you why these tools were used and why the
usage is appropriate.
The utility
This utility (let us call it Index Redundancy Detector - IRD)
is written in two steps.
-
In this step a delimited text file containing all the tables,
indexes and columns in each index is created.
- In
this step, this information is analysed to detect duplication.
Now comes the real fun. The code for Step 1 is written in
MS Word and the code for Step 2 is written in Visual FoxPro
.NET! Why on earth would anyone want to use MS Word (VBA)?
Well, because it is the simplest and fastest way of writing
standard VB code that does not really need to compile into
an EXE or DLL. Of course it could have been written in any
other language. But I have this code ready. You can adapt
it to your taste.
Why Visual FoxPro? You will see exactly why a little later.
At this stage I will only say that the situation demanded
complex string handling and sorting functionality. This was
possible with minimal amount of coding in VFP. Of course,
both the above steps can be coded in any other suitable tool.
I am giving below the code I used. You are free to understand
the concept and implement it in language of your choice.
Step 1: Making a list of index columns
What we need is a simple data structure that contains the
following information:
| Field |
Information |
| 1 |
Tablename |
| 2 |
Index
Name |
| 3 |
Index
columns (comma delimited list) |
I preferred to download this list using CSV format for ease
of upload into Step 2 (VFP)
I could have created this data using a complicated SQL Stored
Procedure. However, because it was a well-defined, iterative
task, I preferred the DMO object library, which is available
with SQL Server 2000. DMO means Distributed Management Objects.
In brief, it is a very rich object model that represents most
of the entities and actions which you normally handle using
standard SQL Syntax.
For example, in SQL Syntax you would create a new table by
using the command:
Create table <field definitions>
In case of DMO you would do so by using the object hierarchy:
Database.tables.add (<object to add>)
Here
tables is a collection within the database object.
Database is an item in the databases collection which in turn
is a child of SQLServer object itself. In order to highlight
this hierarchical object model of DMO, here is how a table
is represented.
Here is the code which dumps the index details to a separated
text file where individual fields are separated with the *
character. Why not comma delimited? Because the commas between
the individual column names for an index would clash with
commas which delimit the fields. To make the SQL DMO object
model available while coding, you need to set a reference
to the Microsoft SQL DMO Object Library from Tools
- References in MS Word VBA editor.
Sub
DumpIndexInfo
Dim fnum
fnum = FreeFile
Open c:\sample.txt For Output As #fnum
Dim str As String
Dim sq As New SQLDMO.SQLServer
Dim db As New SQLDMO.Database
Dim tb As SQLDMO.Table
Dim ind As SQLDMO.Index
Dim ol As SQLDMO.SQLObjectList
Dim co As SQLDMO.Column
sq.Connect <databasename>, <userid>, <password>
Set db = sq.Databases(<databasename>)
For Each tb In db.Tables
If tb.Indexes.Count > 1 Then
For Each ind In tb.Indexes
Set ol = ind.ListIndexedColumns
str =
For Each co In ol
str = str + co.Name + ,
Next column
str = Left(str, Len(str) - 1) remove last comma
Print #fnum, tb.Name & * & ind.Name &
* & str
Debug.Print tb.Name & * & ind.Name
& * & str
Next index
End If
Next table
Close #fnum
End Sub
This code produces a file called sample.txt containing information
about all tables, indexes and the list of columns for each
index.
Please note, I have not excluded system tables here. However,
you can easily put that filter.
Here is a typical record for a sample table say, invoice
invoice*myindex1*sampleid,date,transactiontype
invoice*myindex3*sampleid
invoice*myindex2*locationid,sampleid
(In this sample, the index myindex3 on sampleid is redundant.)
Step
2: Detecting redundant indexes
Now that we have the * delimited text file, we need to make
some sense out of the data in it and detect for
duplication programmatically. At this stage, believe me, VFP
offers great benefits. Let us see how.
|
Fieldname |
Datatype |
Comment |
|
TBL |
CHAR(50) |
Name of the table |
|
IND |
CHAR(50) |
Name of the index |
|
CL |
CHAR(254) Max column width permissible in VFP |
Comma delimited list of the Cindex columns |
|
CLCNT |
INTEGER |
Number of columns in the index |
|
SCANNED |
LOGICAL |
Flag to indicate that this record has been scanned for
duplicate check |
|
DUPLICATE |
LOGICAL |
Flag to indicate that this record (the index named in
this record) is duplicate |
To
start with we need to put the text file into a proper VFP
table for better data handling. To do so, we first
need to create an empty VFP DBF file containing the following
structure. Let us call this table WORK table.
Now
we need to upload information from the text file to this table.
The command is simple.
CLOSE TABLES
USE WORK
APPEND FROM c:\sample.txt TYPE DELIMITED WITH CHARACTER *
We need to count the number of columns in each index and put
the value in the CLCNT field. This field will be used to sort
the indexes in ascending order in order to check for duplicates.
The command is simple:
REPLACE ALL clcnt WITH OCCURS( ,,
cl)
Now,
we need to create some index on this table.
INDEX ON tbl + STR(clcnt) TAG cntbased
SET ORDER TO cntbased in WORK
This would sort the table first on table name and within the
table name on number of columns in the index.
In order to loop through one table at a time, we need to create
a simple table containing one record for each table name.
This is a simple DISTINCT query.
SELECT dist tbl FROM WORK INTO TABLE master
Now, we loop through the master table, filter the WORK table
on each table, look at the index columns and detect duplicates.
Now you have the MASTER and WORK tables open. We are now ready
to do the actual duplicate check. Here is the requisite code.
* CHECKDUPL.PRG
SELECT master
SCAN
nam = ALLTRIM(tbl)
SELECT work
SET FILTER TO ALLTRIM(tbl)== nam
GO TOP
SCAN FOR (NOT scanned) AND (not duplicate)
REPLACE scanned WITH .t.
str = ALLTRIM(cl)
strlen = LEN(str)
recno = RECNO()
COUNT FOR LEFT(cl, strlen) == str TO cnt rest
if cnt > 1
GOTO recno
REPLACE duplicate WITH .t.
Endif
GO top
ENDSCAN
ENDSCAN
That is all there is to it. Now all records marked with DUPLICATE
= .T. are the redundant indexes.
If you want further sophistication, we can even create a simple
script file that contains all the required DROP INDEX commands!
Here is the VFP code required to do this. This code could
have been inserted in the duplicate checking code itself.
However, for clarity, it is given separately below.
SELECT WORK
SET TEXTMERGE TO sample.sql
SET TEXTMERGE ON
SCAN FOR DUPLICATE = .T.
\DROP INDEX <<ALLTRIM (TBL) + . + ALLTRIM
( IND )>>
* create the SQL command DROP INDEX <tablename.indexname>
ENDSCAN
SET TEXTMERGE TO
Now the file sample.sql contains the correct script for deleting
the duplicate indexes.
Lessons to learn
-
Be careful with index creation. Such duplication could lead
to a lot of performance degradation and yet go unnoticed.
-
Administrative tasks can often be simplified using smart
programming.
-
Remove the mental barrier that separates developers and
administrators.
-
Use the appropriate tool for the appropriate purpose. Here
is a summary:
| Tool
/ Technology |
Why
is this appropriate? |
| DMO |
Very
easy to navigate object structure and obtain desired information
about indexes. Would have been very cumbersome using traditional
SQL Syntax |
| VFP |
As
you can now appreciate, VFP offers very powerful string
handling features with very little coding. I didnt
use it just because I like it. You can try to write this
logic in other languages of your choice. However, the
code will definitely be lengthier as well as more complex
in other cases. |
| Script |
Final
output was given in SAMPLE.SQL. It was very easy to delete
the indexes found using DMO itself. However, an SQL Script
file makes more sense here because this can be run on
remote sites / multiple locations. DMO would not be feasible
in such scenarios. Moreover, a script file is run explicitly.
DMO would run it within the code, which may be undesirable
in production systems. |
Feedback - Your feedback, suggestions, requests for covering
specific topics or issues are welcome. Please send feedback
to techforum@mediline.co.in
 |
About
the Author Dr Nitin Paranjape is the Chairman and MD of
Maestros (Mediline). He is a consultant with many organisations,
covering appropriate technology utilisation, business
application of relevant technology, application architecture
and audit as well as knowledge transfer. He has authored
more than 650 articles on various technology-related subjects.
He can be contacted at nitin@mediline.co.in |
|