I am not sure, how to name this topic. May be, start by following questions.
How important and effective is clustered index for a table?
How to solve the heap fragmentation?
Why my query is running slow most of the time for this particular table?
Are the primary key and clustered index are same?
There are developers still they understand that, we must need a primary key to create a clustered index. No, that’s not. Both are different used for different purpose.
The primary key is to enforce the business logic. I.e. enforce the uniqueness. By default it will create a clustered index, if we are not changing manually to non-clustered.
You can create a clustered index without a primary key. In two ways, create [unique] clustered index. The unique is an optional keyword.
A heap table is “without a clustered index is called heap”
Coming into the picture.
Recently a developer came to me for an application slowness. The query is in more environment (QA, DEV & Prod) each run differently. DBAs know where to start analysis the query.
I cannot show all the codes. But it’s a simple search character code like ‘’ with more conditions.
The table has massive fragmentation. Since, it’s a heap, but small. A 50K data table, with a simple query runs more than 5 minutes to return the data.
- After a de-fragmentation only of all non-clustered indexes, it still runs 1 minute to return the data.
-
After a clustered index creation, it runs boom, less than a second. This is a best practice and the table has clustered index will be good and it enforce the logical ordering and every record in a non-clustered index has to link to the clustering index key.
Think about the heap it an unordered data, so it needs travel all the pages to get data. Probably, a full scan. But clustered index will be ordered logically with matching of index columns and it’s easy to get the data by clustering key.
From 2008 onwards, the same result we can get by rebuilding the table ALTER TABLE REBUILD. But internally it changes the heap records and ask all non-clustered indexes to rebuild it to match with rebuilt heap, for a big table it is worse.
SELECT OBJECT_NAME(F.OBJECT_ID) obj,i.name ind,
f.avg_fragmentation_in_percent,
f.page_count,table_schema,
avg_page_space_used_in_percent, forwarded_record_count
FROM SYS.DM_DB_INDEX_PHYSICAL_STATS (DB_ID(‘Regression’),NULL,NULL,NULL,null) F
JOIN SYS.INDEXES I
ON(F.OBJECT_ID=I.OBJECT_ID)AND i.index_id=f.index_id
JOIN INFORMATION_SCHEMA.TABLES S
ON (s.table_name=OBJECT_NAME(F.OBJECT_ID))
WHERE f.avg_fragmentation_in_percent>15 and f.page_count>=1000
AND f.database_id=DB_ID(‘Regression’)
AND OBJECTPROPERTY(I.OBJECT_ID,’ISSYSTEMTABLE’)=0
order by f.avg_fragmentation_in_percent desc
Before clustered index creation: IX_mtProduct_ProductID & heap has 2372493 pages
Table | Index | Fragmentation | Page count |
mtProduct | IX_mtProduct_Description |
0.342466 |
292 |
mtProduct | IX_mtProduct_Mat_Seg_Desc |
97.4359 |
819 |
mtProduct | IX_mtProduct_MaterialType |
96.81529 |
471 |
mtProduct | IX_mtProduct_MCG |
98.17987 |
934 |
mtProduct | IX_mtProduct_ProdType |
92.49423 |
866 |
mtProduct | IX_mtProduct_ProductFamily |
96.52568 |
662 |
mtProduct | IX_mtProduct_ProductID |
99.07121 |
646 |
mtProduct | IX_mtProduct_ProductLine |
98.75 |
880 |
mtProduct | NULL |
11.40322 |
2372493 |
After clustered index creation: CX_mtProduct_ProductID Clustered index has 1954 pages
Table | Index | Fragmentation | Page count |
mtProduct | IX_mtProduct_Description |
0.869565217 |
345 |
mtProduct | IX_mtProduct_Mat_Seg_Desc |
1.604278075 |
187 |
mtProduct | IX_mtProduct_MaterialType |
0.591715976 |
169 |
mtProduct | IX_mtProduct_MCG |
1.754385965 |
171 |
mtProduct | IX_mtProduct_ProdType |
2.580645161 |
310 |
mtProduct | IX_mtProduct_ProductFamily |
1.435406699 |
209 |
mtProduct | CX_mtProduct_ProductID |
0.204708291 |
1954 |
mtProduct | IX_mtProduct_ProductLine |
1.369863014 |
219 |
Now, see the number pages for heap- 2372493 and clustered index – 1954 and it can change over a time, when there is a modification but, not like a heap.
It’s always good to have a clustered index to search the record effectively.
Hope now you have understood the effectiveness of the clustered index and we should create a clustered index, when there is a candidate column available or sometimes create a dumpy column for the very big table. Generally a Surrogate key – Ex: IDENTITY.
There are already many posts available to choose the cluster index.
https://www.simple-talk.com/sql/learn-sql-server/effective-clustered-indexes/
One Comment
Pingback: