DBA,  Performance

Use of SQL server statistics and how to make a query to use them

Advertisements

Statistics are very important in the query optimization. This post will give you some basic idea of stats and how they are created and used. This post is to link with my performance tuning parts.

What is statistics:

Statistics store information about the distribution of data in the column/columns for a table. It has Header, Densities and Histogram. The Histogram stores only 200 steps, and for very large tables VLT, it is hard to hold much data only can hold up-to 200 steps, you have to create manually for the VLT. It is a lightweight and use very low storage.

The only way to know the underlying tables from the Query Optimizer is by using statistics, to estimate how many rows will be returned to the query plan.

How they are creating:

-- step 1
use master
alter database stats_check set single_user with rollback immediate
drop database stats_check

create database stats_check
-- 

USE master
GO
ALTER DATABASE [stats_check] SET AUTO_CREATE_STATISTICS ON WITH NO_WAIT
GO
-- step 2
USE stats_check
create table tbl_stats_check (id int, name varchar(20), dept varchar(20), age int,join_date datetime)

insert into tbl_stats_check values (1,'a','IT',30,'2018-10-25')
insert into tbl_stats_check values (2,'b','Civil',35,'2017-10-25')
insert into tbl_stats_check values (12,'x','Eng',45,'2013-10-25')

-- step 3
-- run this to know the -- stats info

SELECT s.name, s.stats_id,
    s.auto_created,
    s.user_created,
    c.name as colname,
    s.no_recompute,'dbcc show_statistics (''dbo.tbl_stats_check'','''+s.name+''')' AS Tablename_show_statistics
FROM sys.stats AS s
JOIN sys.stats_columns as sc 
    on s.stats_id=sc.stats_id
    and s.object_id=sc.object_id
JOIN sys.columns as c on 
    sc.object_id=c.object_id
    and sc.column_id=c.column_id
WHERE s.object_id=OBJECT_ID('dbo.tbl_stats_check')
    --and s.auto_created = 1
ORDER BY sc.stats_column_id;
GO

-- step 4
select * from tbl_stats_check where id =1
select * from tbl_stats_check where dept ='it' and age =30

-- run the stats info select again (step 3)
-- drop the database and run stpe 1 skip the step 4

-- run step 5 and check the stats info select again (step 3)
create clustered index cx_tbl_stats_check_id on tbl_stats_check(id)
create index cx_tbl_stats_check_dept on tbl_stats_check(dept,age)

dbcc show_statistics ('dbo.tbl_stats_check','cx_tbl_stats_check_dept')
dbcc show_statistics ('dbo.tbl_stats_check','_WA_Sys_00000004_0EA330E9')
dbcc show_statistics ('dbo.tbl_stats_check','_WA_Sys_00000003_0EA330E9')
-- The stats created for auto is creating single stats for each column, from the index it is a combination of columns distribution

UPDATE statistics Stats_Check.dbo.tbl_stats_check (cx_tbl_stats_check_dept) with fullscan--,NORECOMPUTE

 

Creating an index – statistics are created automatically created for column/columns. For more than one column of composite index, statistics are only created for first column in the histogram.

By Default AUTO_CREATE_STATISTICS is ON for database – When query is running by where clause or join clause it gets created.

We can manually create it. https://docs.microsoft.com/en-us/sql/t-sql/statements/create-statistics-transact-sql?view=sql-server-2017

Query example: You can run and see how it is created by where clause (By sampling) and index (By Full scan).

How the query using them:

Dbcc_showstatistics

You can see three different categories in the results.

  1. Head – will have the information of the statistics. When it is last updated, rows and rows sampled – if this differs the stats are not with full scan. Steps – number of steps etc.
  2. Density – Shows the density values for the column/columns.

SELECT 1.0 / COUNT(DISTINCT Column) FROM Table;

  1. Histogram – The table data that divide further as histogram

The histogram will only store 200 steps, this is another problem for very large tables.

This is where Query Optimizer will calculate how many rows will be returned for a query, based on that it will create a plan. To create a most accurate plan, we need to have a updated stats.

Take a look on this: https://jbswiki.com/2017/01/31/what-dbcc-show_statistics-tells-me/

Important points to note:

Update statistics regularly

SQL automatically update stats, When 20% + 500 are changed in a table.

The rebuild index will update stats on index column with full scan not for non-index columns.

We can use UPDATE STATISTICS command or can schedule GUI, I prefer to use OLA script.

sp_updatestats – Generally, we will not use this, since it is replaces a full scan stats with random sampling.

Some larger tables, we can use manual statistics update by using NORECOMPUTE option.

Create filtered statistics when we have option to use.

Update statistics schedule will be based on the data changes and application mostly weekly once, some application needs a daily stats update.

I’m currently working as a SQL server DBA in one of the top MNC. I’m passionate about SQL Server And I’m specialized in Administration and Performance tuning. I’m an active member of SQL server Central and MSDN forum. I also write articles in SQL server Central. For more Click here

Leave a Reply

Your email address will not be published. Required fields are marked *

+ 49 = 53