Query Plan (11) – Hash (1)

In my last post, I described Hash Joins which is showing as Hash Match physical operator in the graphical execution plan. Another place hash match is used is for aggregation. When columns in group by clause do have have indexes or SQL Server cannot determine whether the rows are sorted or not, SQL Server will perform a Hash match to get aggregates. If distinct keywork appears on the select list, but there is no indexes on the selected columns, a hash match will be used as well. This operation is called hash aggregate.

Continue reading “Query Plan (11) – Hash (1)”

Query Plan (10) – Hash

Hash is used in 2 most commonly seen physical operators, Hash Join and Hash Aggregate. Those 2 happen when there are no any other alternatives (merge, nested loop, or stream aggregate) which can be used for more efficient operation. For instance, when SQL Server joins 2 tables together but none of them has an index. SQL Server has no idea whether the joining keys are sorted or not.  In most of the case for such scenarios, hash join will take place.  As its name, hash join uses hash algorithm to encode the joining keys from both side, compares the hashed values, and produce the result. This sounds very complex — yes, it is a very heavy operator.
Continue reading “Query Plan (10) – Hash”

Proud of Being A MVP

This morning, as a routine of my regular day, I checked my emails and found a very special one which is from MVP Program. It says

Dear John Huang
Congratulations! We are pleased to present you with the 2012 Microsoft® MVP Award! This award is given to exceptional technical community leaders who actively share their high quality, real world expertise with others. We appreciate your outstanding contributions in SQL Server technical communities during the past year.

I am proud of being a MVP and I appreciate this recognition of all my work done in our SQL Server Community. I would like take this opportunity to thank all professionals and MVPs here who brought me into the community, provided opportunities to let me grow, nominated me, and also ones who attended my events, read my blog, and asked me questions from various locations. Thanks for your continuous support.

Modification Count of Statistics

When the size of a database is large or number of rows in a table become more, the built-in auto stats update usually can’t keep the statistics up-to-date. Out dated statistics can’t serve the query optimizer to generate better query planes. Consequentially, performance gets impacted negatively. Maintaining statistcs in a database is a one of the biggests challenge for DBAs. Because they need to know when the statistics updated last time and when they should perform stats update again. However, that’s not sufficient. They also need to know how many accumulated changes have not been reflected in the stats. It’s a simple task back to SQL Server 2000. They can just query column rowmodctr in sys.sysindexes. Unfortunately, this column is broken after SQL Server version 2000. It’s no longer accurate. Even worse, SQL Server 2005 and 2008 do not expose such information at all even though they have it internally.

Continue reading “Modification Count of Statistics”

Query Plan (09) – Merge

Merge is a physical operator joining 2 sets together into one. Similar to Nested Loop, it can implement all logical join operations, such as outer join and inner join. Different from Nested Loop, Merge needs 2 input sets which are sorted on the joining keys. For instance, there are 2 piles of papers. The first pile includes customers’ basic information. Each paper has and only has one customers information. The first pile is sorted by customers’ ID. The second pile includes customers’ purchase information. Every customer might have 0 to many purchase orders. The second pile is sorted by customer’s ID. While merging taking place, operator takes one page from the first pile, Customer1 for instance, to compare the page from the second pile. If matched, return the combined information, then take the next page from the second pile and compare again. Until no more pages on the second pile can mach the current page from the first pile, which also means no more pages for Customer1 in the second pile, then the operator takes the second page from the first pile, and repeat this operation again and again until all the pages in the first pile get processed. This is a very effecient operation.

Continue reading “Query Plan (09) – Merge”

Query Plan (08) – Nested Loop Cont.

As we know that Nested Loop is a physical operator which can perform different kind of logical joins between 2 sets, SetA and SetB for instance. Does SetA joining SetB equal to SetB joining SetA from perfromance perspective(assume 2 sets have enough indexes)? It depends?

Continue reading “Query Plan (08) – Nested Loop Cont.”

Produce CLR_MONITOR Wait Type

CLR_MONITOR is one of the wait types in SQL Server 2008. It occurs when a task is currently performing CLR execution and is waiting to obtain a lock on the monitor according to BOL. When this shows up in sys.dm_exec_requests, it means the session is running a CLR which is waiting for lock being granted to an object by using Monitor class.

Continue reading “Produce CLR_MONITOR Wait Type”

Statistics

The answer of the question I asked in my last post is Statistics. Query Optimizer is a cost(and rule) based optimizer. It calculates the costs for each operator based on its formulars behind and get the total estimated cost of query. If there are more alternatives to implement the same logic, SQL Server will know the cost of each alternative, then it can pickup a most efficient one to run. However, databases nowadays are usually complicated. Very frequently, implementing a logic for data accessing can have millions of alternatives. Getting cost for each and find the cheapest best one is just so time consuming. It doesn’t make sense to take a day to find a best plan to execute where the returning of the query can be done in 10 minutes by using the wrost plan. There are definitely some rules behind. We will come back to the rule in the future. Now let’s see how SQL Server gets estimated number of rows on the data it maniputes on.

Continue reading “Statistics”

Vancouver Tech Fest 2012 is Coming

Vancouver Tech Fest 2012 is Coming – April 28, 2012

It will be a great place to learn different programming technologies. I will be presenting Locking and Concurrency Considerations in DB Design.

Concurrency is one of the most important things to be seriously considered while designing a complex database system. Fully understanding different level of concurrency control is the key to make your design success. Most of SQL Server internal concurrency controls are implemented through locks. We can utilized the same mechanism to customize the concurrency control for your processes. I am going to cover all of these during the session. At the end, we will talk about concurrency control with combination of different components of SQL Server.

Query Plan (07) – Nested Loop

Relationships amoung multiple sets can always be interpreted into multiple relationships of 2 sets – joins. Conceptually, you can have 2 types of joins, Inner Join and Outer Join. Inner joining returns the conjunction of two sets, such as Inner Join, Cross Apply, and Intersect in SQL Server. Outer join will return full set from one or both of the inputs with or without relations between each other such as Left Outer join, Right Outer Join, Full Outer Join, Cross Join, and Outer Apply. In SQL Server, if rows are only returned from one set not the other, it’s called Semi Join, such as Exists, IN (subset). If returning rows do not exist in another set, it’s called Anti Semi Join, such as NOT Exists, Except, Not In (subset). They are all conceptual. Nested loop is a PHYSICAL operator, it supports any of types of joins described above.

Continue reading “Query Plan (07) – Nested Loop”