How are aggregate functions implemented in most SQL engines?

In the book Database Fundamentals by Silberschatz, it is explained that aggregate functions can be calculated on the fly.

This makes sense. It means that to calculate the maximum, the average, or a count of the items in a set, you don’t need to pass a copy of the set to the aggregate procedures; you only process each record as you traverse the set.

A naive implementation would keep one variable per desired aggregate. For example, SELECT sum(a_field), count(a_field), max(a_field) FROM a_set could be implemented as:

sum_ = 0
count_ = 0
max_ = float('-inf')

for record in a_set:
    sum_ += record.a_field
    count_ += 1
    max_ = max(max_, record.a_field)

return (sum_, count_, max_)

Of course, this is undesirable: the loop over the set shouldn’t be so tightly coupled to the aggregate computation. I suppose the loop delegates the aggregation to something like a coroutine.

Supposing a coroutine is a kind of object with two methods:

  • feed: where you can pass a value to the coroutine
  • get: which gives you the result of a computation

The loop would be something like:

# Given a set C of aggregation coroutines
for record in a_set:
    for c in C:
        c.feed(record.a_field)

return (c.get() for c in C)

In this case, I imagine a coroutine like max as:

max_ = -INF
while item = consume():
    max_ = max(max_, item)
yield max_

Here, I’m supposing that when the coroutine invokes consume, it waits until somebody calls its feed method. And when it calls yield, that value is collected later by whoever invokes its get method.

Just for fun, let’s implement the sum:

sum_ = 0
while item = consume():
    sum_ = sum_ + item
yield sum_
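The feed/get idea above can be made runnable in plain Python using generators and .send(). This is just a sketch of the question’s own pseudocode, not any engine’s implementation; the Aggregate wrapper class and the None-as-end-of-input convention are my own illustrative choices:

```python
def max_agg():
    result = float('-inf')
    while True:
        item = yield          # waits here until someone sends a value
        if item is None:      # None signals end of input
            break
        result = max(result, item)
    yield result              # the final yield delivers the answer

def sum_agg():
    result = 0
    while True:
        item = yield
        if item is None:
            break
        result += item
    yield result

class Aggregate:
    """Wraps a generator to expose the feed()/get() interface from the text."""
    def __init__(self, gen):
        self.gen = gen
        next(self.gen)        # advance to the first yield
    def feed(self, value):
        self.gen.send(value)
    def get(self):
        return self.gen.send(None)

C = [Aggregate(sum_agg()), Aggregate(max_agg())]
for value in [3, 1, 4, 1, 5]:
    for c in C:
        c.feed(value)
print([c.get() for c in C])   # [14, 5]
```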

So this is broadly what I imagine happens behind the scenes, but I can’t be sure, so:

  1. How is this process actually implemented in most SQL engines?
  2. What happens with an aggregation that seems to require two or more passes over the dataset, such as the standard deviation?

Note: the pseudocode is a kind of pseudo-Python.


I believe most of the “modern” RDBMS implementations are based on the Cascades optimization framework.

I shall talk about how Microsoft SQL Server handles this, as that is the DBMS with which I am most familiar. SQL Server is an implementation of the Cascades optimization framework, so its workings and those of other “modern” RDBMSs should be similar.

The SQL received by the server is converted by the optimiser into a tree of physical operators. The physical operators initialize, collect data, and close. Specifically, a physical operator can answer the following three method calls:

Init(): The Init() method causes a physical operator to initialize itself and set up any required data structures. The physical operator may receive many Init() calls, though typically a physical operator receives only one.

GetNext(): The GetNext() method causes a physical operator to get the first, or subsequent row of data. The physical operator may receive zero or many GetNext() calls.

Close(): The Close() method causes a physical operator to perform some clean-up operations and shut itself down. A physical operator only receives one Close() call.
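This three-method interface (often called the Volcano or iterator model) can be sketched in Python. The operator names Scan and StreamAggregate and their internals here are illustrative only, not SQL Server’s actual code:

```python
class Scan:
    """Leaf operator: returns rows from an in-memory 'table', one per GetNext()."""
    def __init__(self, rows):
        self.rows = rows
    def Init(self):
        self.pos = 0
    def GetNext(self):
        if self.pos >= len(self.rows):
            return None            # "no more data"
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def Close(self):
        self.rows = None

class StreamAggregate:
    """Drains its child and keeps running registers for SUM and COUNT."""
    def __init__(self, child):
        self.child = child         # could be any operator with the same interface
    def Init(self):
        self.child.Init()
        self.done = False
    def GetNext(self):
        if self.done:
            return None
        total, count = 0, 0
        row = self.child.GetNext()     # the "loop" lives inside the aggregate
        while row is not None:
            total += row
            count += 1
            row = self.child.GetNext()
        self.done = True
        return (total, count)          # one output row with the aggregates
    def Close(self):
        self.child.Close()

plan = StreamAggregate(Scan([10, 20, 30]))
plan.Init()
print(plan.GetNext())   # (60, 3)
plan.Close()
```

Note that StreamAggregate never inspects what kind of operator its child is; it only calls Init()/GetNext()/Close(), which is exactly the decoupling described below.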

The execution plan for this query

from dbo.t1

may look like this:

[Image: example aggregating query plan]

I have omitted some operators for simplicity.

Execution proceeds from the top-level operator (the left-most in the diagram). Once everything is initialized (by a chain of Init() calls), SELECT calls GetNext() on Stream Aggregate and waits for a response. Stream Aggregate issues a GetNext() to the Clustered Index Seek operator, whose implementation returns one row from persistent storage in response. Stream Aggregate then adds that row’s values into its internal registers, one for each of the aggregate values it is tracking (sum, count, average, or whatever). Its internal implementation lets it hold each of the required values within itself.

Stream Aggregate does not respond to SELECT’s GetNext() immediately. Rather, its implementation is to keep calling its child’s GetNext(). Note that Stream Aggregate does not care what operator its child is. It happens to be a Clustered Index Seek in this example but could be a table scan, a join, a constant, or anything else. It doesn’t matter, since all operators implement the same three-method interface and respond externally the same way to these three calls. In this way the “aggregation” function is isolated from the “reading” function, and the “looping” function is part of Stream Aggregate’s implementation. The optimiser is free to substitute different implementations for each function as it sees fit, e.g. an index lookup or a table scan for the reading.

Eventually Clustered Index Seek responds to GetNext() with “no more data”. At this point Stream Aggregate can do the maths to produce the required values. For something simple like COUNT(), the corresponding internal register is passed back as-is. For complex values like STDEV(), Stream Aggregate actually holds internally the equivalent “simple” running values, which can be maintained in a single pass.

[Image: Stream Aggregate’s internal registers]

It only needs to do the final calculation of the standard deviation once it is sure there is no further data.
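To answer the question’s second point concretely: a common single-pass formulation keeps just a count, a running sum, and a running sum of squares, so no second traversal is needed. This is a sketch of the idea, not SQL Server’s actual internals, and this textbook formula can be numerically unstable for large values (Welford’s algorithm is the robust alternative):

```python
import math

# One pass over the data maintains three "simple" registers; the
# standard deviation is computed only at the end, when the child
# operator reports "no more data".
count, total, total_sq = 0, 0.0, 0.0
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    count += 1
    total += x
    total_sq += x * x

variance = (total_sq - total * total / count) / count   # population variance
print(math.sqrt(variance))   # 2.0
```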

