Issue dated - 05th January 2004

-


Previous Issues

CURRENT ISSUE
NEWS ANALYSIS
INDIA NEWS
COLUMNS
TECH FORUM

THE C# COLUMN

BETWEEN THE BYTES
TECHNOLOGY
SPECIALS <NEW>
Symantec Report
Security Headquarters
JobsDB
MINDPRINTS
HMA BANKBIZ
EC SERVICES
ARCHIVES/SEARCH
IT APPOINTMENTS
Openings At Jobstreet.com
WRITE TO US
SUBSCRIBE/RENEW
CUSTOMER SERVICE
ADVERTISE
ABOUT US

 Network Sites
  IT People
  Network Magazine
  Business Traveller
  Exp. Hotelier & Caterer
  Exp. Travel & Tourism
  Exp. Pharma Pulse
  Exp. Healthcare Mgmt.
  Express Textile
 Group Sites
  ExpressIndia
  Indian Express
  Financial Express

 
Front Page > TechSpace > Story Print this Page|  Email this page

Tech form

Data querying without query language

I wrote the article “How to eliminate duplicate records in Excel” some time ago (issue dated 20 Nov 2003. Article available at http://www.expresscomputeronline.com/20031020/techspace01.shtml)

I received very good response about this article from readers. However, I also received a mail from a reader, Mike Musterd, who showed me a much simpler way of achieving same result–with just four clicks in Excel– no ODBC, no SQL DISTINCT clause. Just pure Excel.

As I have always been harping on – you never stop learning. I really am thankful to Mike for highlighting this simple but powerful feature.

I started exploring the feature further. Of course it eliminated duplicates with few clicks. But I continued studying it further—and then at the end of an hour, I had explored a feature that is available in just a small dialog with three text boxes, two option buttons and one checkbox. But the power and sophisticated functionality it packs is really mind blowing.

So here is the great feature for all of you to notice, explore, use and exploit.

Recap of the previous article

For those of you who have not read the previous article, here is a quick recap.

1. Consider you have some data entered / dumped in Excel

2. You want to find out and eliminate duplicate rows, if any.

3. ODBC provides a driver for Excel data.

4. The Excel file is considered as the Database and named ranges are considered as tables.

5. So name a range, create an ODBC DSN for the Excel sheet first.

6. Now, use the New database query option of Excel to get all data from the named range table which could contain duplicates.

7. Now edit the query and use the DISTINCT clause with the SELECT command.

8. That’s it.

That is a fair number of steps.

Problems with this approach

This approach is perfectly usable, technically correct and innovative. The only problem with it is that a regular user would not be in a position to take all these steps easily.

Editing queries and ODBC DSN creation is not something that appeals to simple users. So now we are ready to see the 4-click solution to achieve the same result.

The simpler, yet more sophisticated feature

Let us consider the same base data we had used last time.

Now follow these steps.

1. Click anywhere within the data. No need to highlight the block of data

2. Open Data menu – Filter – Advanced Filter. The dialog appears with the List Range of data block is automatically selected.

3. Now select the check box ‘Unique Records Only’. Observe that the option “Filter the list, in-place’ is already selected by default.

4. Now click OK.

That’s it. Just 4 steps instead of many more!

This is almost magical. Imagine the amount of user focus and clever coding which goes into putting such features in products. That is why I admire the way Microsoft has inculcated the ability to add finesse to day-to-day needs – year on year – version by version.

Obviously, this functionality is within the reach of even novice users.

Liked it? Use it next time you want to filter data and eliminate duplicates. More importantly, show this to your colleagures / friends whom you think can benefit from it.

Even more importantly, try to explore features yourself while you are working. It is a really rewarding experience.

The magic has just begun

After showing the simpler method, the article snippet should have ended. But it did not. Look at the dialog again carefully.

We just used the ‘List range’ in our example. There are other textboxes called ‘Criteria Range’ and ‘Copy to’.

Copy To option

‘Copy to’ is the simpler one to understand. When you select the Copy to another location option, the Copy To textbox is enabled. It simply allows you to put the filter results into another empty area of Excel worksheet rather than doing the filtering in-place. This is to be used when you want the original data to be retained as it is.

One important thought: When you think you have completely understood the feature, stop and explore further. In majority of cases you will find something more! And soon you will stop convincing yourself that you have completely understood a feature! When you think you have understood something fully, you stop thinking further. And effective technology utilisation is all about thinking further. So don’t fall into the trap called – ‘Oh, I know this now’

Why so much gyan here suddenly? Because, I found another feature related to the apparently simple Copy To feature.

Consider that you have eliminated the duplicated rows and want to make a copy of the unique rows. That’s fine. But you don’t want all the columns from the source data. You want only few of them. Let’s say only the Qty column in our case.

How do you manage that? Copying all and then deleting is valid but cumbersome.

Now again someone has thought of this need and created a nice feature for this. Follow these steps:

1. Before you use the advanced filter, identify the area where you want to copy the filtered data.

2. Now, just put the labels of the columns that you want in the final copy in that region.

3. Click somewhere in the original data and choose Advanced Filter.

4. Choose the range for Copy To in such a way that all the column name header(s) you wrote in the target area are selected.

5. Now choose OK in the Advanced Filter dialog.

6. That’s it. The data will be filtered and only the specified columns will be copied to the target range.

Is that not clever?

Soon you will start realising that all this is a very simplistic but highly effective implementation of the SELECT syntax. But more about this later.

What is wrong with In-place filter and AutoFilter?

If you use the Filter Data – In Place option, the data is simply filtered by hiding the unwanted rows – same way as AutoFilter does it.

This has some disadvantages. Firstly, when you hide rows of this block of data, other unrelated data on the same row elsewhere on the worksheet is also hidden. This may not be desirable.

In fact, the biggest problem with a spreadsheet is that a regular spreadsheet row does not mean anything. It is the blocks of data that are important. Since you can hide a row or column at will, it is important to ensure that you enter data blocks in such a way that they do not have any common rows or columns.

This is a simple conclusion based upon common sense. But often, while designing or entering data in spreadsheets, we often forget to take this precaution.

Ideally, multiple data blocks should be entered as follows in Excel. This is a representative diagram. But it will convey the idea.

Each color represents different blocks of data. Of course there will be formulas, totals and all that—but each with non-overlapping row and column blocks.

By arranging information in this manner, you never have to use the ‘Shift cells up/down’ option. You can easily delete/hide rows or columns for a given data block without worrying whether it would delete some other data far away on the same row/column.

Come to think of this – it is so simple. But do you know what the real problem is? We are so busy and savvy that we never end up thinking about such simple things. And believe me, this thought may be simple but the impact of this way of data layout management is far from simple. It is profound and highly productive.

Anyway let us come back to the Advanced Filter. Now the option left is this simple looking textbox called Criteria Range.

However, the functionality that this tiny text box encompasses is so much that I will cover it in the next article.

About the Author:Dr Nitin Paranjape is the Chairman and MD of Maestros (Mediline). He is a consultant with many organisations, covering appropriate technology utilisation, business application of relevant technology, application architecture and audit as well as knowledge transfer. He has authored more than 650 articles on various technology-related subjects. He can be contacted at nitin@mediline.co.in

 

<Back to top>


© Copyright 2003: Indian Express Group (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in
Mumbai by The Business Publications Division of the Indian Express Group of Newspapers.
Please contact our Webmaster for any queries on this site.