When the size of a database is large or number of rows in a table become more, the built-in auto stats update usually can’t keep the statistics up-to-date. Out dated statistics can’t serve the query optimizer to generate better query planes. Consequentially, performance gets impacted negatively. Maintaining statistcs in a database is a one of the biggests challenge for DBAs. Because they need to know when the statistics updated last time and when they should perform stats update again. However, that’s not sufficient. They also need to know how many accumulated changes have not been reflected in the stats. It’s a simple task back to SQL Server 2000. They can just query column rowmodctr in sys.sysindexes. Unfortunately, this column is broken after SQL Server version 2000. It’s no longer accurate. Even worse, SQL Server 2005 and 2008 do not expose such information at all even though they have it internally.
The answer of the question I asked in my last post is Statistics. Query Optimizer is a cost(and rule) based optimizer. It calculates the costs for each operator based on its formulars behind and get the total estimated cost of query. If there are more alternatives to implement the same logic, SQL Server will know the cost of each alternative, then it can pickup a most efficient one to run. However, databases nowadays are usually complicated. Very frequently, implementing a logic for data accessing can have millions of alternatives. Getting cost for each and find the cheapest best one is just so time consuming. It doesn’t make sense to take a day to find a best plan to execute where the returning of the query can be done in 10 minutes by using the wrost plan. There are definitely some rules behind. We will come back to the rule in the future. Now let’s see how SQL Server gets estimated number of rows on the data it maniputes on.
At the end of last post, I gave you a puzzle and also mentioned to give the answer of it.
use AdventureWorks2008R2 --set statistics io on -- this is the hint select CustomerID, SalesOrderID from Sales.SalesOrderHeader with (index=[IX_SalesOrderHeader_CustomerID]) where CustomerID = 29974 and SalesOrderID = 45785 select CustomerID, SalesOrderID from Sales.SalesOrderHeader where CustomerID = 29974 and SalesOrderID = 45785
Look at the query plan, the cost of both are the same. Does it mean the performance of those 2 are the same? If not, Which statement generates better plan?