Query plans are merely set of steps that will be running in specific order to achieve user defined operation(s). Such ideas are widely presented in all known and unknown areas. In order to light up a fire, someone might find a piece of stick, a rock and hays, scratch the stick on the rock, make the temporature of the stick high enough to light itself or the hays. But you may just go to grocery store to buy matches or lighters and use them to burn the hays. Mapping those 2 approaches in SQL Server, those are 2 query plans.
After buying a lighter from a grocery store, you will first make a flame, put hays on top of the flame, then move the burning hays to a bigger pile. You will have to go in that order. Using different ways will in crease the time or make things difficult.
SQL Server integrates many operators for data accessing and processing. One or more operators combined together in specific order creates a plan. Getting the same result, there are definitely more than one adoptable approaches(combination of operators). Some of them working batter in some scenarios but not others, some might work for both group of scenarios with sub-optimal performance, and definitely there are chances that you can get one plan for all although it’s unlikely. 🙂
How does SQL Server know which query plan is the best? Through the cost. Cost means time in milliseconds. Each operator has one or more cost formulas. Combining the costs from all involved operators can get the total cost of the execution. For instance, drinking one bottle of beer need to take 6 minutes. So drinking 100 bottle will need 600 minutes, that’s 10 hours of work. Whether or not I have the capacity to drink that much is the second question(dump them to tempdb 🙂 ). 10 hour is just an estimate.
Back to SQL Server, scanning records from a table, SQL Server estimate that it will take 0.0011 milliseconds CPU time per row. If a table has a million records, SQL Server will estimate to use 1.1 seconds CPU time to finish the operation. How does SQL Server know how many rows in the table where a condition is true? It uses statistics. I will come to that in my later posts. Based on estimate and stats, SQL Server can quantify the cost of each query plan. Comparing the tatal cost of each plan can get the best plan. There are still challenge. If you need to join 3 table together, you can join them in different ways, ((a,b),c), ((a,c),b), ((b,a),c)…This is still countable, 3! = 6. What if you have 10 tables joining together? To find the best plan, cost of 10!= 3,628,800 potential plans are need to be compared! SQL Server will only generate an optimal plan for a query rather than the best since it’s just not feasible. But SQL Server does allow you to affect the process of query plan generation, such as plan guides, hints, timeout( generating and finding a optimal plans is time consuming, you can setup extend the timeout by setting trace flag 2301). Then you will need to know how to infuence the default behavior of SQL Server.
The next, how would you know what the best plan? You will have to know the meaning of each operator, cost of each operator, proper use of each operator, optimal order of operators. Let’s get into those topics in future posts.