Indexes

How important is clustered index for a table

Advertisements

 

I am not sure, how to name this topic. May be, start by following questions.

 

How important and effective is clustered index for a table?

How to solve the heap fragmentation?

Why my query is running slow most of the time for this particular table?

Are the primary key and clustered index are same?

 

There are developers still they understand that, we must need a primary key to create a clustered index. No, that’s not. Both are different used for different purpose.

The primary key is to enforce the business logic. I.e. enforce the uniqueness. By default it will create a clustered index, if we are not changing manually to non-clustered.

You can create a clustered index without a primary key. In two ways, create [unique] clustered index. The unique is an optional keyword.

A heap table is “without a clustered index is called heap”

 

Coming into the picture.

Recently a developer came to me for an application slowness. The query is in more environment (QA, DEV & Prod) each run differently. DBAs know where to start analysis the query.

 

I cannot show all the codes. But it’s a simple search character code like ‘’ with more conditions.

The table has massive fragmentation. Since, it’s a heap, but small. A 50K data table, with a simple query runs more than 5 minutes to return the data.

  1. After a de-fragmentation only of all non-clustered indexes, it still runs 1 minute to return the data.

  2. After a clustered index creation, it runs boom, less than a second. This is a best practice and the table has clustered index will be good and it enforce the logical ordering and every record in a non-clustered index has to link to the clustering index key.

Think about the heap it an unordered data, so it needs travel all the pages to get data. Probably, a full scan. But clustered index will be ordered logically with matching of index columns and it’s easy to get the data by clustering key.

From 2008 onwards, the same result we can get by rebuilding the table ALTER TABLE REBUILD. But internally it changes the heap records and ask all non-clustered indexes to rebuild it to match with rebuilt heap, for a big table it is worse.

SELECT OBJECT_NAME(F.OBJECT_ID) obj,i.name ind,
f.avg_fragmentation_in_percent,
f.page_count,table_schema,
avg_page_space_used_in_percent, forwarded_record_count
FROM SYS.DM_DB_INDEX_PHYSICAL_STATS (DB_ID(‘Regression’),NULL,NULL,NULL,null) F
JOIN SYS.INDEXES I
ON(F.OBJECT_ID=I.OBJECT_ID)AND i.index_id=f.index_id
JOIN INFORMATION_SCHEMA.TABLES S
ON (s.table_name=OBJECT_NAME(F.OBJECT_ID))
WHERE f.avg_fragmentation_in_percent>15 and f.page_count>=1000
AND f.database_id=DB_ID(‘Regression’)
AND OBJECTPROPERTY(I.OBJECT_ID,’ISSYSTEMTABLE’)=0
order by f.avg_fragmentation_in_percent desc

 

Before clustered index creation: IX_mtProduct_ProductID & heap has 2372493 pages

Table Index Fragmentation Page count
mtProduct IX_mtProduct_Description

0.342466

292

mtProduct IX_mtProduct_Mat_Seg_Desc

97.4359

819

mtProduct IX_mtProduct_MaterialType

96.81529

471

mtProduct IX_mtProduct_MCG

98.17987

934

mtProduct IX_mtProduct_ProdType

92.49423

866

mtProduct IX_mtProduct_ProductFamily

96.52568

662

mtProduct IX_mtProduct_ProductID

99.07121

646

mtProduct IX_mtProduct_ProductLine

98.75

880

mtProduct NULL

11.40322

2372493

 

After clustered index creation: CX_mtProduct_ProductID Clustered index has 1954 pages

Table Index Fragmentation Page count
mtProduct IX_mtProduct_Description

0.869565217

345

mtProduct IX_mtProduct_Mat_Seg_Desc

1.604278075

187

mtProduct IX_mtProduct_MaterialType

0.591715976

169

mtProduct IX_mtProduct_MCG

1.754385965

171

mtProduct IX_mtProduct_ProdType

2.580645161

310

mtProduct IX_mtProduct_ProductFamily

1.435406699

209

mtProduct CX_mtProduct_ProductID

0.204708291

1954

mtProduct IX_mtProduct_ProductLine

1.369863014

219

 

Now, see the number pages for heap- 2372493 and clustered index – 1954 and it can change over a time, when there is a modification but, not like a heap.

It’s always good to have a clustered index to search the record effectively.

Hope now you have understood the effectiveness of the clustered index and we should create a clustered index, when there is a candidate column available or sometimes create a dumpy column for the very big table. Generally a Surrogate key – Ex: IDENTITY.

 

There are already many posts available to choose the cluster index.

http://www.sqlskills.com/blogs/kimberly/ever-increasing-clustering-key-the-clustered-index-debate-again/

https://www.simple-talk.com/sql/learn-sql-server/effective-clustered-indexes/

 

 

 

 

 

 

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

+ 70 = 79