Columnstore Indexes – part 111 (“Row Group Elimination – Pain Points”)

Continuation from the previous 110 parts, the whole series can be found at http://www.nikoport.com/columnstore/.

The topic of the Segment Elimination (aka Row Group Elimination) is something that I keep on blogging on through all the years, but given that Joe Obbish has written an incredibly valuable blog post on this matter lately, I decided to touch only on the points that I feel that need some expansion – and right now, I can think of 2 of them – the OR condition and the Inter-column Search, that are true pain in the neck in the world of the Columnstore Indexes Row Group Elimination.

The Setup

Once again the test database shall be the free test database from Microsoft – ContosoRetailDW, which I shall restore from the C:\Install location:

Using my favourite table FactOnlineSales, this time I shall create a limited copy of it, calling it FactOnlineSales_CCI, creating a Clustered Columnstore Index on it and optimising the Row Group Elimination for the column number 4 – the [ProductKey]:

To verify all the details of all Segments for this column, let’s issue the following query against the sys.column_store_segments DMV:

On the picture on the right side, you can see the distribution of the values between different Segments between 1 and 2517 for the [ProductId], between my 9 different Segments. You can also determine that there are no overlaps with the some certain values being located on the interconnections between the Segments.
We are ready to advance now and start looking at the most serious Segment Elimination Problems for the Columnstore Indexes.

The Painful OR condition

At the moment of the writing of this article, I do not remember of any more painful limitation for the Row Group elimination, other than the usage of the OR condition.

First of all let’s take for the test the overall usage of the OR criteria with a simple query, but even before that let’s execute the simplest search query with one equality predicate:

As expected for a simple predicate that we have specified, the Query Optimiser processed just 1 Segment, because other Segments do not contain comparable results. Everything is fine here, right? 🙂

Now, let’s use 2 equality searches and see what happens – the expectation here is that only 2 Row Groups will be processed, since we have aligned all segments and that Segment with ID = 3 is the one that will correspond to this equality search.

This query produces the following IO results: we read 3 Row Groups, while skipping 6 – the Row Group Elimination works in a … unexpected way!

Do dive deeper, lets install and start the Extended Events session from the CISL library – cstore_XE_RowGroupElimination:
Watching the elimination results will give you the confirmation that as a matter of a fact, 3 Segments with IDs 3,4,5 were processed, but this raises a big question – what did this happen ?

Let me share with you my guess: “The Segment Elimination in this case works as a range scan between the minimum (1000) and the maximum values (2000)“, because the Segments 3, 4 & 5 are the ones which are overlapping with this scale of values.
Following this theory I will execute a simple variation of the query, but this time setting the second parameter search equals to the 0 – thus expecting that it will scan the Segments between 0 and 5:

and (drumroll) – voilá!:


As expected, unfortunately, it seems that the OR will scan all possible values in the range, instead of doing an intelligent comparison (Insert a very big frown here – :().

To solve this, you can re-write your OR condition into something similar, where every single search functions separately (yes, there will be few cases where it will be doable):

The somewhat risky results are presented below. They are risky, because if other conditions will require a full scan, this way will instantly become a very unproductive:

Still this gives us processing of just 1 Segment with 253 lob logical reads vs 6 Segments and 2669 lob logical reads in the original query.

But wait, that’s not all! It can get significantly worse!
Let’s use the inequality search with the OR condition, to see what happens – should be safe, right ?

Yes, you have seen it right, instead of reading just 4 Segments, according to the value distribution, this query will scan all of the segments without a need.
This is a really bad situation, as far as I am concerned, and I am filing this bug to the Connect during this week.

The one positive thing to add is of course if we have a better AND predicate, it will be helping to eliminate Row Groups, for example the following query will deliver no results while reading no Row Groups, because there are no such rows with SalesAmount > 10000.:

Inter-column Search

The comparison/search between the columns of the same table is something where Columnstore Indexes have a serious problem, because it seems to me being not implemented.

Let us run a simple query with 3 aggregates against our FactOnlineSales_CCI table, comparing the [StoreKey] (integer) with [ProductKey] columns.


The execution plan above is very self-explainatory, the Filter iterator does all the job of doing the search, and the Columnstore Index Scan gets no Predicate Pushdown, nor can it do Row Group elimination.

For the RowStore tables, this part functions pretty simple with a predicate being pushed down to the Storage Engine:


While naturally we have a different execution plan here with a Stream Aggregate, the story does not need further study, the number of rows coming out of our 12.6 Million Rows table is very small, because it is being filtered within the scan operation.

We can execute here not only the equality search (=), but also the non-equality one (< or >, for example), but the result will stay the same – the Columnstore Index Scan will not push the predicate into the engine and hence the Row Group elimination will not function.

Thoughts

There are a lot of limitations in every technology or solution, but those 2 are the biggest ones that I know off and I hope that I was able to share them with the reader.

to be continued with Columnstore Indexes – part 112 (“Linked Servers”)

Leave a Reply

Your email address will not be published. Required fields are marked *