Columnstore Indexes – part 116 (“Partitioning Specifics”)

Continuation from the previous 115 parts, the whole series can be found at

After delaying for a couple of years (yes, strangely there is other priority in my life, besides this blog post series) – I am finally putting a couple of the notes on the partitioning for the Columnstore Indexes. Even though this blog series is not a stranger to the Partitioning subject with 4 blog posts referring Columnstore Indexes – part 14 (“Partitioning”), Columnstore Indexes – part 15 (“Partitioning Advanced”), Columnstore Indexes – part 94 (“Use Partitioning Wisely”) & Columnstore Indexes – part 103 (“Partitioning 2016 vs Partitioning 2014”), but the practical (and not the performance) aspects & specifics of working with the Columnstore Indexes were always on my todo list until this very blog post.
In this blogpost I will want to focus on 2 of the most common operations over the Partitioned tables, the MERGE & the SPLIT, because there are some little specifics that will affect your DWH, once you migrate it to Columnstore Indexes over from the Rowstore ones.

The Setup

To test the partitioning scripts, please download the free ContosoRetailDW database, at Microsoft’s page.
My Data & Log files are configured to be restored to the following path on this test instance: c:\Data\SQL16\, but feel free to change them according to your test server:

Let us add 5 new file groups (1 for the pre-2007 data, and then for each of the years 1 data file (2007,2008,2009,2010).

The partitioning function & scheme, shall be defined as specified below:

Now we are ready to drop the Primary Clustered Key on the FactOnlineSales table and after creating the obligatory partitioned Clustered Rowstore Index, we can finally create partitioned Clustered Columnstore Index:

To check if everything is correct, I suggest using the following queries, displaying the number of rows per partition and the respective minimal year as the maximum year, stored within each of the partitions:

As per picture below, you can see that we have 4, ~3.6 & 5 million rows in each of the partitions for the respective years 2007, 2008 & 2009, while the archive (pre-2007) and the year 2010 partition remains empty:

Merging Partitions

Let’s start with a simple test of merging the 2007 partition with the year 2008, by issuing the following command:

It might ready you a reasonably huge surprise, but this command will fail, if you are using the Columnstore Indexes.

The very same command will function without any problem, if we would simply avoid creating Clustered Columnstore Index …
The reason behind this limitation has to do with the fact that Columnstore Indexes do not sort or control the boundaries of the data, and this is biting the total implementation in such operations.

But let us not give up and try to merge the archive years (empty) and the year 2007, by issuing MERGE command to the date 2007-01-01:

Well, it won’t take a second to find out that this operation will give an error as well, but hey – this will be a different error:

It happens that we are using different file groups for each of the partitions, and if we would use the same file group for the 2 involved partitions, everything would function without any problem – to test this, let us create a new partitioning functioning [pfOnlineSalesDatePrimary] and a scheme [ColumstorePartitioningPrimary] that will be storing all partitions in the Primary Filegroup:

Now if we try to execute the very same operation of joining the archive and the year 2007, the operation shall not fail, because all data files are to be found within the same filegroup:

I know that you might say, that merging on the left side of the right-ranged partitioning function is something that never ever ever happened to you in production, BUT there are people who already faced this very situation. Really. And so I thought that you better be aware before it will be a little bit late, because splitting partitions is not the easiest thing to do. 🙂

Splitting Partitions

Splitting partitions – an easy task that you shall never face at the very last moment, when some team forgot to do their job at the end of the year, for example.
For the purpose of the demo, if you have merged the year 2009, please re-run the setup section before proceeding.
Let’s try to split the year 2009 into 2 halfs, separated by the 6 calendar months or simply by the 1st of July and for that purpose, lets mark the filegroup [Columnstore2009] as the next used one and issue split command


We can not split a non-empty partition, when working with Columnstore Indexes !

Yes, the non-empty partition-splitting operation is not something that you should take on lightly, but it simply (and surely slowly) works with the rowstore indexes! Here is a script to test it out:

There are workarounds, such as taking out the data out of a partition, splitting it and then reloading data into the respective partitions, but this does not sound like a fun procedure, when we could simply issue 1 command to split the range, but this is something that everyone working with Columnstore Indexes should be aware.

Final Thoughts

Keep a good empty partition free from any data. Set it to unrealistic dates/values if needed, so that you are playing safe.
If you are a consultant and landing on the migration project from Rowstore to Columnstore, keep in mind that it might require a different amount of work, if the partitioning practices of having empty partitions is not well kept. Rowstore is different to Columnstore and the internals do matter, making the learning curve a very interesting/challenging for everyone.

Partitioning is available in every single edition (yes, including Standard Edition), starting with SQL Server 2016 Service Pack 1, meaning that this feature will be finding an increased usage in the nearest months, and if you are working with Data Warehousing and your fact tables are in the 100s of millions of rows, there are few reasons why your tables should not be partitioned.

We can not have every feature in Columnstore Indexes that we have had in the Rowstore, simply because the technologies are sufficiently different and a huge effort of implementing a feature that is not a makeIt/BreakIt for the most SQL Server shops is something that will stay for a long time like this. This has been this way since the original SQL Server 2012 release and 4 versions later I do not remember people crying about it in the way that this would be considered life-changing.

Well, if you still have doubts, here is a Connect Item on one of these topics, that was closed as Won’t Fix: (Given that Connect is shutting down, this link will probably die in a couple of days)
Here is a screenshot with the essential commentary from 2014 from Microsoft:

to be continued with Columnstore Indexes – part 117 (“Clustered vs Nonclustered”)

2 thoughts on “Columnstore Indexes – part 116 (“Partitioning Specifics”)

  1. Steve K

    Hi Niko,

    I don’t understand this part:
    “The reason behind this limitation [can’t merge non-empty partitions] has to do with the fact that Columnstore Indexes do not sort or control the boundaries of the data, and this is biting the total implementation in such operations.”
    Surely it’s the point of the partition function/scheme to control the boundaries. Why would the columnstore index itself have to be involved in the MERGE at all? Rowgroups don’t span partitions, do they?
    Couldn’t SQL Server just move the rowgroups from the merged partition into the destination partition? I realize that may result in suboptimal compression but it would be as good compression as having the two separate rowgroups in two separate partitions.

    1. Niko Neugebauer Post author

      Hi Steve,

      I am not a developer and do not have access to the source code, so all my thoughts and comments are guesses in the best case. 🙂

      What you have described is absolutely correct, but consider the fact that this code will need to support queries that are reading information (avoid loosing rows, avoid scanning them for the second time), and allowing other data modification queries (inserts, deletes & updates) – this will look quite a challenging task, given that the framework should have it own limitations.
      I believe that partitioning code is largely written for SQL Server 2012 and they were quite in a hurry to wrap it up before release.
      Even though they were improvements, the largest part should have stayed the same (totally guessing here) …

      Best regards,

Leave a Reply

Your email address will not be published. Required fields are marked *