dimanche 28 juin 2015

Case of using filtered statistics

I was going through filtered stats in below link.

http://ift.tt/1vBiF0Y

Data is Skewed heavily,one region is having 0 rows,rest all are from diferent regions. Below is the entire code to reproduce the issue

create table Region(id int, name nvarchar(100)) 
go 
create table Sales(id int, detail int) 
go 
create clustered index d1 on Region(id) 
go 
create index ix_Region_name on Region(name) 
go 
create statistics ix_Region_id_name on Region(id, name) 
go 
create clustered index ix_Sales_id_detail on Sales(id, detail) 
go

-- only two values in this table as lookup or dim table 
insert Region values(0, 'Dallas') 
insert Region values(1, 'New York') 
go

set nocount on 
-- Sales is skewed 
insert Sales values(0, 0) 
declare @i int 
set @i = 1 
while @i <= 1000 begin 
insert Sales  values (1, @i) 
set @i = @i + 1 
end 
go

update statistics Region with fullscan 
update statistics Sales with fullscan 
go

set statistics profile on 
go 
--note that this query will over estimate 
-- it estimate there will be 500.5 rows 
select detail from Region join Sales on Region.id = Sales.id where name='Dallas' option (recompile) 
--this query will under estimate 
-- this query will also estimate 500.5 rows in fact 1000 rows returned 
select detail from Region join Sales on Region.id = Sales.id where name='New York' option (recompile) 
go

set statistics profile off 
go

create statistics Region_stats_id on Region (id) 
where name = 'Dallas' 
go 
create statistics  Region_stats_id2 on Region (id) 
where name = 'New York' 
go

set statistics profile on 
go 
--now the estimate becomes accurate (1 row) because 
select detail from Region join Sales on Region.id = Sales.id where name='Dallas' option (recompile)

--the estimate becomes accurate (1000 rows) because stats Region_stats_id2 is used to evaluate 
select detail from Region join Sales on Region.id = Sales.id where name='New York' option (recompile) 
go

set statistics profile off

My question is we have below stats available on both tables

sp_helpstats 'region','all'
sp_helpstats 'sales','all'

Table region:

statistics_name   statistics_keys
d1                    id
ix_Region_id_name     id, name
ix_Region_name        name

Table sales:

statistics_name    statistics_keys
ix_Sales_id_detail     id, detail

1.Why the estimation went wrong for thse below queries

select detail from Region join Sales on Region.id = Sales.id where name='Dallas' option (recompile)

--the estimate becomes accurate (1000 rows) because stats Region_stats_id2 is used to evaluate 
select detail from Region join Sales on Region.id = Sales.id where name='New York' option (recompile) 

2.When i created filtered stat as per author,i could see estimates correctly,but why we need to create filtered stats,how can i say i need filtered stats for my queries since even when i created simple stats,i got same result .

Best i came across so far 1.Kimberely tripp skewed stats video
2.Technet stats whitepaper

But still not able to understand why filtered stats made a difference here

thanks in advance.

Aucun commentaire:

Enregistrer un commentaire