Clustered Columnstore Indexes – part 2 (“Internals”)

This is the second post in the series dedicated to Clustered Columnstore Indexes.
The first part can be found here – https://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

Let's explore some of the more complicated things, which will be everyday matters for anyone working with Clustered Columnstore Indexes on a daily basis.

Tuple Mover:
While we are inserting new rows into a table (and hence into a Delta Store), at a certain point (1,000,000 rows for a regular insert, 100,000 rows for a Bulk Insert) the Delta Store becomes full and is closed to any additional data insertion.
SQL Server has an automatic background process which checks for closed Delta Stores (state = 2) and, upon finding one, creates a Segment out of it by encoding and compressing the data. This background process is called the Tuple Mover, and by default it runs every 5 minutes. The Tuple Mover is designed not to block any read scans of the data, but concurrent deletes & updates have to wait until the compression process completes. I do not think this should be a major blocker, since we are not talking about OLTP systems.
Note: an already compressed Segment (Row Group) can actually receive additional rows while the table is being rebuilt, if it has not yet reached the maximum number of rows (around 1 million). More on this in the upcoming posts.
Note: one can always invoke the Tuple Mover manually by executing alter index ... reorganize (see the sketch below).
The following statements rebuild the whole table and hence do not invoke the Tuple Mover:
alter table ... rebuild or
alter index ... rebuild.
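
To make this concrete, here is a minimal sketch (the table dbo.FactSales and the index cci_FactSales are hypothetical names) that lists the Row Groups waiting for the Tuple Mover and then compresses the closed Delta Stores manually:

-- state 1 = open Delta Store, 2 = closed Delta Store, 3 = compressed Segment
select row_group_id, state_description, total_rows
from sys.column_store_row_groups
where object_id = object_id('dbo.FactSales');

-- compress the closed Delta Stores right away, without waiting for the Tuple Mover
alter index cci_FactSales on dbo.FactSales reorganize;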

Bulk Insert:
I am very happy to see that Microsoft has implemented good support for Bulk Insert operations, which are absolutely essential for any Data Warehousing environment.
Bulk insert operations are treated differently based on the number of rows being inserted, with 100,000 rows as the threshold: if we insert, say, 85,000 rows, they will be added to an existing or a new Delta Store, but if we insert 120,000 rows, they will be converted directly into a Segment (Row Group). Ideally one would insert over 1 million rows at a time, since that automatically produces a full, non-updatable Segment.
The operation itself is very efficient: all inserted rows are kept in memory, where encoding and compression are applied, and only after the process completes is the data written to disk. The natural requirement for this operation to execute successfully is to have enough memory :)
Note: the insert into ... select from ... statement is optimized in SQL Server 2014 to use bulk insertion with parallel batch mode execution.
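
To illustrate, here is a minimal sketch of such a load (the table dbo.FactSales and the file path are hypothetical):

-- each batch above the 100,000-row threshold is encoded and compressed directly into a Segment,
-- while a smaller batchsize (say 85,000 rows) would land in a Delta Store instead
bulk insert dbo.FactSales
from 'C:\load\FactSales.csv'
with (fieldterminator = ',', rowterminator = '\n', batchsize = 1048576, tablock);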

Isolation Levels:
According to Microsoft, all isolation levels except SNAPSHOT are currently implemented and supported for use with Clustered Columnstore Indexes. This means that the following isolation levels are supported: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ & SERIALIZABLE.
An additional note goes to READ_COMMITTED_SNAPSHOT, which is supported as well.
It is very interesting to think about the reasons for implementing almost all of them while leaving SNAPSHOT behind, yet supporting READ_COMMITTED_SNAPSHOT, which is definitely not the most popular isolation level for Data Warehouses. Does this actually mean that SNAPSHOT will not be implemented / supported in the future? I don't know…
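
Here is a minimal sketch of both behaviors (the database SalesDW and the table dbo.FactSales are hypothetical names):

-- READ_COMMITTED_SNAPSHOT works against a table with a Clustered Columnstore Index
alter database SalesDW set read_committed_snapshot on;

-- SNAPSHOT itself does not: the select below fails with the error
-- "SNAPSHOT isolation level is not supported on a table which has a clustered columnstore index."
alter database SalesDW set allow_snapshot_isolation on;
set transaction isolation level snapshot;
begin transaction;
select count(*) from dbo.FactSales;
rollback;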

Locking:
I have heard that locking for Clustered Columnstore Indexes is done at the page level, but I have to disagree: in my tests I have seen that even a simple operation acquires an IX (Intent Exclusive) or X (Exclusive) lock on the respective Row Group (this applies to the Delta Store as well, for inserts), which consequently prevents any reading of the data from it.
Maybe it is a CTP1 artifact – I do not know, but I will definitely be publishing all my experiments in the upcoming posts.
I have found some very interesting error messages in the undocumented functionality which directly suggest that some of the locking support is still to be implemented. ;)
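
Anyone wishing to repeat the experiment can use a minimal sketch like this one (table and column names are hypothetical):

begin transaction;
delete from dbo.FactSales where SaleID = 1;

-- inspect the granularity of the locks held by this transaction
select resource_type, request_mode, request_status, resource_description
from sys.dm_tran_locks
where request_session_id = @@spid;

rollback;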

to be continued with Clustered Columnstore Indexes – part 3 (“More Internals”)

10 thoughts on “Clustered Columnstore Indexes – part 2 (“Internals”)”

  1. Andreas Wolter

    Hi Niko,

    nice detailed article.
    One question though: Are you certain, that Read_Committed_Snapshot is supported?
    That would surprise me, but I’d love to know for sure.

    Thanx,

    Andreas

  2. Niko Neugebauer Post author

    Hi Andreas,

    yeap – I got this confirmation at TechEd 2013 Europe, and I have already run basic tests successfully.
    I cannot confirm how good this support is, since I am not ready to publish that article yet, but soon enough (in the next couple of days) it will come online.

    In the case of the SNAPSHOT isolation the situation is very clear, here is the error message that I receive when I try to run the transaction:
    SNAPSHOT isolation level is not supported on a table which has a clustered columnstore index.

  3. Andreas Wolter

    Hi Niko.
    Thanks for testing this.
    I think that’s quite strange, since the underlying architecture, the version chain and so on, is not so different.
    This is certainly really interesting. I wonder why and how this is implemented while SNAPSHOT itself is not.
    Andreas


  4. Rakesh

    Hi – Is there a limitation on the number of columns that a clustered columnstore index can have in SQL 2014 Enterprise?

  5. Martin

    Hi, I am currently working on SQL Server 2014. I have a table with over 500 million records. When looking at sys.column_store_row_groups I can see that I have a number of segments with only a few hundred or a few thousand rows, as opposed to the maximum of 1,048,576. Why would this be? Also, you mentioned that there can be more than one Delta Store – why does this happen? (Apologies if this is covered in one of your blogs already.)

    1. Niko Neugebauer Post author

      Hi Martin,

      there are Memory (http://www.nikoport.com/2014/06/21/clustered-columnstore-indexes-part-31-memory-pressure-and-row-group-sizes/) and Dictionary (http://www.nikoport.com/2015/02/18/clustered-columnstore-indexes-part-48-improving-dictionary-pressure/) pressures to consider. They cause the Row Groups to get trimmed.
      Different Delta Stores can be created through rollbacks and multiple parallel inserts, or they can simply live in different partitions.
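
      A quick sketch for spotting the trimmed Row Groups (the table name is just an example):

      select partition_number, row_group_id, state_description, total_rows, size_in_bytes
      from sys.column_store_row_groups
      where object_id = object_id('dbo.FactSales')
      and state = 3 -- compressed Row Groups only
      and total_rows < 1048576
      order by total_rows;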

      Best regards,
      Niko
