How important is clustered index for a table

Advertisements

I am not sure, how to name this topic. May be, start by following questions.

How important and effective is clustered index for a table?

How to solve the heap fragmentation?

Why my query is running slow most of the time for this particular table?

Are the primary key and clustered index are same?

There are developers still they understand that, we must need a primary key to create a clustered index. No, that’s not. Both are different used for different purpose.

The primary key is to enforce the business logic. I.e. enforce the uniqueness. By default it will create a clustered index, if we are not changing manually to non-clustered.

You can create a clustered index without a primary key. In two ways, create [unique] clustered index. The unique is an optional keyword.

A heap table is “without a clustered index is called heap”

Coming into the picture.

Recently a developer came to me for an application slowness. The query is in more environment (QA, DEV & Prod) each run differently. DBAs know where to start analysis the query.

I cannot show all the codes. But it’s a simple search character code like ‘’ with more conditions.

The table has massive fragmentation. Since, it’s a heap, but small. A 50K data table, with a simple query runs more than 5 minutes to return the data.

After a de-fragmentation only of all non-clustered indexes, it still runs 1 minute to return the data.
After a clustered index creation, it runs boom, less than a second. This is a best practice and the table has clustered index will be good and it enforce the logical ordering and every record in a non-clustered index has to link to the clustering index key.

Think about the heap it an unordered data, so it needs travel all the pages to get data. Probably, a full scan. But clustered index will be ordered logically with matching of index columns and it’s easy to get the data by clustering key.

From 2008 onwards, the same result we can get by rebuilding the table ALTER TABLE REBUILD. But internally it changes the heap records and ask all non-clustered indexes to rebuild it to match with rebuilt heap, for a big table it is worse.

SELECT OBJECT_NAME(F.OBJECT_ID) obj,i.name ind,
f.avg_fragmentation_in_percent,
f.page_count,table_schema,
avg_page_space_used_in_percent, forwarded_record_count
FROM SYS.DM_DB_INDEX_PHYSICAL_STATS (DB_ID(‘Regression’),NULL,NULL,NULL,null) F
JOIN SYS.INDEXES I
ON(F.OBJECT_ID=I.OBJECT_ID)AND i.index_id=f.index_id
JOIN INFORMATION_SCHEMA.TABLES S
ON (s.table_name=OBJECT_NAME(F.OBJECT_ID))
WHERE f.avg_fragmentation_in_percent>15 and f.page_count>=1000
AND f.database_id=DB_ID(‘Regression’)
AND OBJECTPROPERTY(I.OBJECT_ID,’ISSYSTEMTABLE’)=0
order by f.avg_fragmentation_in_percent desc

Before clustered index creation: IX_mtProduct_ProductID & heap has 2372493 pages

Table	Index	Fragmentation	Page count
mtProduct	IX_mtProduct_Description	0.342466	292
mtProduct	IX_mtProduct_Mat_Seg_Desc	97.4359	819
mtProduct	IX_mtProduct_MaterialType	96.81529	471
mtProduct	IX_mtProduct_MCG	98.17987	934
mtProduct	IX_mtProduct_ProdType	92.49423	866
mtProduct	IX_mtProduct_ProductFamily	96.52568	662
mtProduct	IX_mtProduct_ProductID	99.07121	646
mtProduct	IX_mtProduct_ProductLine	98.75	880
mtProduct	NULL	11.40322	2372493

After clustered index creation: CX_mtProduct_ProductID Clustered index has 1954 pages

Table	Index	Fragmentation	Page count
mtProduct	IX_mtProduct_Description	0.869565217	345
mtProduct	IX_mtProduct_Mat_Seg_Desc	1.604278075	187
mtProduct	IX_mtProduct_MaterialType	0.591715976	169
mtProduct	IX_mtProduct_MCG	1.754385965	171
mtProduct	IX_mtProduct_ProdType	2.580645161	310
mtProduct	IX_mtProduct_ProductFamily	1.435406699	209
mtProduct	CX_mtProduct_ProductID	0.204708291	1954
mtProduct	IX_mtProduct_ProductLine	1.369863014	219

Now, see the number pages for heap- 2372493 and clustered index – 1954 and it can change over a time, when there is a modification but, not like a heap.

It’s always good to have a clustered index to search the record effectively.

Hope now you have understood the effectiveness of the clustered index and we should create a clustered index, when there is a candidate column available or sometimes create a dumpy column for the very big table. Generally a Surrogate key – Ex: IDENTITY.

There are already many posts available to choose the cluster index.

http://www.sqlskills.com/blogs/kimberly/ever-increasing-clustering-key-the-clustered-index-debate-again/

https://www.simple-talk.com/sql/learn-sql-server/effective-clustered-indexes/

How important is clustered index for a table

Muthukkumaran

One Comment

Pingback:

Leave a Reply Cancel reply