Clustered Columnstore Indexes – part 34 (“Deleted Segments Elimination”)

Continuation from the previous 33 parts, starting from http://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

Segment Elimination for Deleted Row Groups

After some very interesting and inspiring conversations during SQLBits, I came to an idea of testing a Segment Elimination while working with a fully deleted Segment:

Imagine that you are regularly updating your Columnstore Table and it reaches a certain moment when all data inside a compressed Segment is deleted.
What happens to the Segment elimination process executed by Query Optimiser ?
Will it check on the meta-information level if each of the present rows inside a particular Segment are eliminated and thus reading this segment does not make any sense at all ?

Read this blog post to find it out:

Let us kick of with the a simple setup script, that will create a simple table with just 1 column, which shall be defined as an identity by default.

Now we can check the situation with our Row Groups, and for this purpose I shall use the following script:

Row Groups after initial Load As you can see, we have a perfect column clustering on our C1 column, with numbers sequentially occupying their place between 2 compressed Segments.

Now we can test our Table by invoking a test Query for determining if Segment Elimination is working as expected in a situation without any deleted rows.

Now we can test our Table by invoking a test Query for determining if Segment Elimination is working as expected in a situation without any deleted rows:
ErrorLog with Segment Elimination
Everything looks fine our second Row Group (id =3) was eliminated from the final result, this is exactly what we have expected.

Lets Delete half of the available rows, all those which do belong to the first of our 2 Segments:

We need to determine if anything has changed in our meta-information, and so let us run a query to check on the compressed Segments:

We need to determine if anything has changed in our meta-information, and so let us run a query to check on the compressed Segments:
Row Groups after deleting first Segment data Quite visible that in our DeletedBitmap for the first of our Segments we have the very same amount of rows that are located in our first Segment – 1048567. This is means that our first Segment has become practically useless, because it does not contain any data at all (all rows are now deleted).
The first question naturally will be what happens with this Segment – and the answer is Nothing. It will stay this way until we shall rebuild our table or partition containing the respective Segment.
I will get back to this topic in the end of this blog post.

For now let us invoke our test query again with the same predicate id (10) again, to see if we can get some Segment Elimination out of Query Optimiser:

Error Log with no success at Deleted Segment Elimination Oh well, we are not very lucky, Query Optimiser has ignored only the second segment while reading the first one, where the value 10 should be contained, but only in this case as we have seen, each value of the Segment has already been marked as deleted in Deleted Bitmap.

Final Thoughts

I imagine that it might be considered quite a waste of time to check deleted bitmaps information and if a Segment has become useless, even though I could argue that such checks by default for DataWarehousing queries are not that expensive, given the nature and specificity of the long running queries, especially because in such environments there are not a big number of them.

As an alternative, we need some kind of mechanism which would allow us to delete those Segments without invoking REBUILD functionality, because at the moment we would automatically loose our well prepared Segment Clustering and because it might not make any sense at all.

I would be more than happy to have some kind of hint for Tuple Mover command, such as:
ALTER INDEX REORGANIZE with (REMOVE_ALL_DELETED_SEGMENTS = ON);
which would analyse such Segments and remove them completely.
I hope that this blog post shall receive an update soon … 🙂

Update on 28th of July 2014: A Connect item has been filed suggesting some improvements into the maintenance procedures for Columnstore Indexes.

to be continued with Clustered Columnstore Indexes – part 35 (“Trace Flags & Query Optimiser Rules”)

Leave a Reply

Your email address will not be published. Required fields are marked *