Apr 092012
 

Stream Aggregate is the most efficient physical operator for value aggregation and generating distict rows. Stream here refers to record stream. Give you a typical example here. When you use SqlDataReader to retrieve reuslt from a query, the rows are read one by one by calling SqlDataReader.Read() method. You can say that you are streaming records from SQL Server to your client. If you perform aggregates in the stream, for instance, you are asked to count number of rows in the stream. In this example, you only need to increase the value with 1 to a variable in your application whenever Read() is involked with true returned. This is called Stream Aggregate.

In most of the cases, stream aggregate needs the input stream sorted. For instance, there is a pile of customer orders with customer name. When you need to get the grand total for each customer, you can take the first order from the pile and write down the customer’s name and total; then take the second order, compare the customer name with the name written previously, if they are matched, then add the written total with the total on your hand, then you discard this order and get another one. If the one you take shows different customer name, you know that all orders for prevous customer has been aggregated. Then you hand out the grand total for previous customer and then repeat such process until all the orders are processed.

use tempdb
set statistics xml off
create table test(a int, b int)
go
insert into test values(RAND()*100000, rand()* 100000)
go 2000
set statistics xml on
select top 1 a from test order by a desc
select max(a) from test 

select distinct a from test
set statistics xml off
create index a on test(a)
set statistics xml on
go
select top 1 a from test order by a desc
select max(a) from test 

select distinct a from test
set statistics xml off

The first 3 queries running on the table without any indexes. Getting max value from the table, the first query reads all rows, sort it, the select top 1, which is costly. The cost of this is same as the cost of getting distinct value from column A. While using stream aggregate, the query only need to read all rows then produce the maximum value.

The second 3 queries are identical from the first 3 but with index created. The TOP operation in this case becomes to the most costless operation because it utilizes index. The second one showing above is the same as the first one, but I believe the fact is it slightly costly than the first one, it’s a stream aggregate over top operation. In this case the stream aggregate only process one row. The cost of it can be ignored. The last one has no choice, it has to read all rows and get distinct values. With indexes, it still 5 time faster than the last query in the first 3.

 Leave a Reply

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="">

(required)

(required)

C# | HTML | Plain Text | SQL | XHTML | XML | XSLT |

This site uses Akismet to reduce spam. Learn how your comment data is processed.