Merge is a physical operator that joins 2 sets into one. Like Nested Loop, it can implement all logical join operations, such as outer join and inner join. Unlike Nested Loop, Merge needs 2 input sets that are both sorted on the joining keys. For instance, imagine 2 piles of paper. The first pile holds customers’ basic information. Each page holds exactly one customer’s information, and the pile is sorted by customer ID. The second pile holds customers’ purchase information. Every customer might have 0 to many purchase orders, and this pile is also sorted by customer ID. When the merge takes place, the operator takes one page from the first pile, Customer1 for instance, and compares it with the top page of the second pile. If they match, it returns the combined information, then takes the next page from the second pile and compares again. When no more pages in the second pile match the current page from the first pile, which also means there are no more pages for Customer1 in the second pile, the operator takes the next page from the first pile and repeats the operation until all the pages in the first pile have been processed. This is a very efficient operation because each pile is read only once.
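The two-pile walk-through can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not how SQL Server implements the operator. It assumes an inner join, a first input with exactly one row per customer, and both inputs already sorted by customer ID. The function name merge_join and the sample data are made up for the example.

```python
def merge_join(customers, orders):
    """Inner merge join of two inputs sorted by customer ID (illustrative only).

    customers: list of (customer_id, name), one row per customer, sorted by ID.
    orders:    list of (customer_id, order_id), sorted by customer ID.
    """
    result = []
    j = 0  # current position in the second pile
    for cust_id, name in customers:
        # Skip orders that belong to customers smaller than the current one.
        while j < len(orders) and orders[j][0] < cust_id:
            j += 1
        # Return every order that matches the current customer.
        while j < len(orders) and orders[j][0] == cust_id:
            result.append((cust_id, name, orders[j][1]))
            j += 1
    return result


# Hypothetical sample data: customer 2 has no orders.
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(1, 101), (1, 102), (3, 103)]
rows = merge_join(customers, orders)
```

The position j only ever moves forward, so each pile is read once from top to bottom. That is where the efficiency comes from.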
John H
Query Plan (08) – Nested Loop Cont.
As we know, Nested Loop is a physical operator that can perform different kinds of logical joins between 2 sets, SetA and SetB for instance. Is SetA joining SetB the same as SetB joining SetA from a performance perspective (assuming both sets have enough indexes)? It depends.
Produce CLR_MONITOR Wait Type
CLR_MONITOR is one of the wait types in SQL Server 2008. According to BOL, it occurs when a task is currently performing CLR execution and is waiting to obtain a lock on the monitor. When it shows up in sys.dm_exec_requests, it means the session is running CLR code that is waiting for a lock on an object to be granted through the Monitor class.
Statistics
The answer to the question I asked in my last post is Statistics. Query Optimizer is a cost (and rule) based optimizer. It calculates the cost of each operator based on the formulas behind it and gets the total estimated cost of the query. If there are several alternatives that implement the same logic, SQL Server knows the cost of each alternative and can pick the most efficient one to run. However, databases nowadays are usually complicated. Very frequently, a piece of data access logic can be implemented in millions of alternative ways. Getting the cost of each and finding the cheapest one is just too time consuming. It doesn’t make sense to take a day to find the best plan when the query can return in 10 minutes using the worst plan. There are definitely some rules behind this. We will come back to the rules in the future. Now let’s see how SQL Server gets the estimated number of rows for the data it manipulates.
Vancouver Tech Fest 2012 is Coming
Vancouver Tech Fest 2012 is coming on April 28, 2012. It will be a great place to learn different programming technologies. I will be presenting Locking and Concurrency Considerations in DB Design. Concurrency is one of the most important things to consider seriously while designing a complex database system. Fully understanding different level of …
Query Plan (07) – Nested Loop
Relationships among multiple sets can always be interpreted as multiple relationships between 2 sets, that is, joins. Conceptually, you can have 2 types of joins, Inner Join and Outer Join. An inner join returns the conjunction of two sets, such as Inner Join, Cross Apply, and Intersect in SQL Server. An outer join returns the full set from one or both of the inputs, with or without relations between each other, such as Left Outer Join, Right Outer Join, Full Outer Join, Cross Join, and Outer Apply. In SQL Server, if rows are returned from only one set and not the other, it’s called a Semi Join, such as Exists and IN (subset). If the returned rows do not exist in the other set, it’s called an Anti Semi Join, such as Not Exists, Except, and Not In (subset). These are all conceptual. Nested Loop is a PHYSICAL operator, and it supports any of the join types described above.
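To show how one physical loop can serve several logical joins, here is a small Python sketch. It is an assumption-laden illustration, not SQL Server's implementation. The function name nested_loop_join, the kind labels, and the sample data are all invented for the example, and only four join kinds are covered.

```python
def nested_loop_join(outer, inner, match, kind="inner"):
    """Nested loop over two inputs; `kind` selects the logical join (illustrative)."""
    kinds = ("inner", "left_outer", "left_semi", "left_anti_semi")
    if kind not in kinds:
        raise ValueError("unsupported join kind: " + kind)
    result = []
    for o in outer:                 # one pass over the outer input
        matched = False
        for i in inner:             # the inner input is walked once per outer row
            if match(o, i):
                matched = True
                if kind in ("inner", "left_outer"):
                    result.append((o, i))
                else:
                    break           # semi and anti semi only need to know a match exists
        if kind == "left_outer" and not matched:
            result.append((o, None))    # keep the outer row without a partner
        elif kind == "left_semi" and matched:
            result.append(o)            # like Exists
        elif kind == "left_anti_semi" and not matched:
            result.append(o)            # like Not Exists
    return result


# Hypothetical sample data: customer IDs and (customer_id, order_id) rows.
customer_ids = [1, 2, 3]
orders = [(1, 101), (1, 102), (3, 103)]
same_customer = lambda c, o: c == o[0]

inner_rows = nested_loop_join(customer_ids, orders, same_customer, "inner")
outer_rows = nested_loop_join(customer_ids, orders, same_customer, "left_outer")
semi_rows = nested_loop_join(customer_ids, orders, same_customer, "left_semi")
anti_rows = nested_loop_join(customer_ids, orders, same_customer, "left_anti_semi")
```

The loop structure is the same in every case. Only the decision about what to output for each outer row changes with the logical join.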
Size of The Index
At the end of the last post, I gave you a puzzle and promised to give the answer.
use AdventureWorks2008R2
--set statistics io on -- this is the hint
select CustomerID, SalesOrderID
from Sales.SalesOrderHeader with (index=[IX_SalesOrderHeader_CustomerID])
where CustomerID = 29974 and SalesOrderID = 45785
select CustomerID, SalesOrderID
from Sales.SalesOrderHeader
where CustomerID = 29974 and SalesOrderID = 45785
Look at the query plans: the cost of both is the same. Does that mean the performance of those 2 statements is the same? If not, which statement generates the better plan?
Query Plan (06) – Seek
The Seek operator is both a physical and a logical operator. It can only be applied to an index, either a clustered index or a nonclustered index. It’s the most efficient operation for reaching a record in an index by key. As we know, indexes are organized as a B-Tree. Each record in the non-leaf levels of the B-Tree includes the first key(s) of a page in the next level and the pointer (File:Page:Slot) to that page.
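The navigation described above can be sketched in Python. This is a simplified model under stated assumptions, not SQL Server's page format. Pages are plain dicts, a direct Python reference stands in for the File:Page:Slot pointer, keys are single integers, and the names seek, root and leaf1 to leaf3 are invented for the example.

```python
from bisect import bisect_right


def seek(page, key):
    """Walk from the root page down to a leaf and return the row for `key`, or None."""
    while not page["leaf"]:
        # Each non-leaf entry is (first key of the child page, pointer to the child page).
        first_keys = [first_key for first_key, _ in page["entries"]]
        # Follow the last entry whose first key is <= the search key.
        pos = max(bisect_right(first_keys, key) - 1, 0)
        page = page["entries"][pos][1]
    # Leaf entries are (key, row).
    for k, row in page["entries"]:
        if k == key:
            return row
    return None


# Hypothetical two-level index with three leaf pages.
leaf1 = {"leaf": True, "entries": [(1, "a"), (3, "b")]}
leaf2 = {"leaf": True, "entries": [(5, "c"), (7, "d")]}
leaf3 = {"leaf": True, "entries": [(9, "e"), (11, "f")]}
root = {"leaf": False, "entries": [(1, leaf1), (5, leaf2), (9, leaf3)]}

found = seek(root, 7)
missing = seek(root, 4)
```

A seek touches one page per level of the tree, so the number of pages read depends on the depth of the index and not on the number of rows in it.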
Query Plan(05) – Scan
A few factors can cause SQL Server to scan a table.
- A hint has been used, such as the table hint ForceScan, or the wrong index is specified for a table.
- No index can be utilized for seeking.
- The query selects everything, with no where clause.
- Predicates are not selective.
Query Plan(04) – Basic Properties
Every operator has its own properties. Reading those properties helps you understand the logic behind them. Let’s take a look at the query plan of a simple query.
create table #t(Value int)
insert into #t (Value) values(1)
select * from #t