Clustered Columnstore Indexes – part 22 (“Invisible Row Groups”)

Continuation from the previous 21 parts, starting from http://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

Last week on an amazing 2013 edition of PASSCamp in Germany one of the attendees has called my attention to the fact that during hands-on labs he has managed to get an invisible Row Group. I was in the middle of demos and did not get the message during the first call – I thought that it was an unexpected Row Group that has managed to see, but as a matter of a fact – it was something different.

Indeed it was something, that is called Invisible Row Group. I can only say that I was extremely excited about it since I have never managed to see one in my tests and demos. I am referring to the very special moment of life of any Delta Store, when it is right to become a compressed Segment. If you are running tests during the period when Tuple Mover kicks in, you have great chances to find it at CTP2 for sure.

When I had a first couple of minutes free I have managed to imitate the conditions, which I will describe below:
Lets kick off by creating an empty table with a Clustered Columnstore Index:

I will insert 2 new rows and compress this Row Group:

Let us insert 10.000 rows and update them immediately:

Before we advance to the next step, please use the following script to monitorize the state of the Row Groups in our table:

Invisible

Its time to insert an extra Row Group – 1.045.678 rows:

You should immediately kick in Tuple Mover manually, while checking the status of the Row Groups using the script above:

This is a screenshot of the result that I have managed to capture:
Screen Shot 2013-12-10 at 00.07.35

There are 4 Row Groups visible, with 1 of them having status equal to 0 (Zero), and status_description of “Invisible”. This is clearly a moment when we are compressing our closed Delta-Store which is identified with Row_Group_id = 1 (state = “closed”).

How can I be sure ? – Very easily: should we keep executing our script for checking on Row Groups, we should be able to find our Row Group with id = 2 gone:
Screen Shot 2013-12-10 at 00.43.11

A very clear show of how things are functioning in Clustered Columnstore 🙂

This raises another important issue for all those future scripts consulting sizes of Clustered Columnstore Indexes – a predicate with state > 0 should be included in order to avoid double counting the number of rows inside the Clustered Columnstore Index.

to be continued with Clustered Columnstore Indexes – part 23 (“Data Loading”)

4 thoughts on “Clustered Columnstore Indexes – part 22 (“Invisible Row Groups”)

  1. Michael Zilberstein

    Thanks for this great series! I’ve been exploring the topic myself and also stumbled upon Invisible row-groups. I’ve submitted a Connect item about it:
    https://connect.microsoft.com/SQLServer/feedback/details/810518/invisible-state-isnt-documented-in-bol-for-sys-column-store-row-groups-dmv

    Microsoft answered that this type wasn’t supposed to be seen by users, but since it is evidently can be seen, they’ve documented it now. So now this “invisible” status is “visible” 🙂
    http://msdn.microsoft.com/en-us/library/dn223749(v=sql.120).aspx

  2. Suman

    Any idea how to find the number of rows going into delta store. I am running a load from SSIS package using “Fast load” to OLEDB Destination, and the row store state changes from OPEN,CLOSED,COMPRESSED. But when load is done using “INSERT INTO SELECT * FROM”, row group state moves from INVISIBLE to COMPRESSED, and there is a lot of performance difference between the 2 methods, in which, the later is much significantly faster…

    1. Niko Neugebauer Post author

      Hi Suman,

      it seems to be a bug, I have submitted a Connect Item – https://connect.microsoft.com/SQLServer/feedback/details/1234426
      In order to get to use BULK LOAD API in SSIS DataFlow, please use “SQL Server Destination” for your table. It will support the Bulk Load API for Dataflows inserting more than 102.400 rows, as explained here – http://www.nikoport.com/2014/06/20/clustered-columnstore-indexes-part-30-bulk-load-api-magic-number/.
      I actually consider that it might be by design, since SQLNCLI version was not changed from SQL Server 2012 to 2014.
      I have seen enough references from the Columnstore Developers that OLE DB Destination should work in Bulk LOAD API, but I hope to find out a definitive answer soon.

      Best regards,
      Niko Neugebauer

Leave a Reply

Your email address will not be published. Required fields are marked *