Columnstore Indexes – part 103 (“Partitioning 2016 vs Partitioning 2014”)

Continuation from the previous 94 parts, the whole series can be found at http://www.nikoport.com/columnstore/.

I have been wanting to write on this topic for quite some time, but with all the exciting themes that I have on my todo list and as drafts on this blog post, prevented me until today – when I finally decided to put a couple of words and a rather simple test for the Columnstore Indexes partitioning, especially since one of the next blog posts will be dedicated to this topic in even bigger details.

Partitioning – the unsung hero of the Columnstore Indexes, one of the most essential elements for the success when using Columnstore Indexes. Rebuilding multiple billion rows table is a daunting task that even the bravest of the DBAs will not (or SHOULD NOT) take easy.
With so many incredible changes, improvements and additions in SQL Server 2016, was there something under the hood, hiding from the public eye? Something that nobody thought was worth mentioning? Something that can improve performance of the huge tables ?
Anything ?
Anything at all ?

Let’s do a quick test between 2 equal instances of SQL Server 2014 and SQL Server 2016. I will use my own generated copy of the TPCH database (1GB version), that I have done with the help of the HammerDB (free software).

Notice that to do a proper comparison, set the compatibility level to 120, even when working with SQL Server 2016:

To make results more visible, we shall need to go a little bit “crazy” with partitioning, creating daily partitioning for the 6 Million Rows table with just 2375 rows per partition on average, like I did it in Columnstore Indexes – part 94 (“Use Partitioning Wisely”):

Now let’s add a copy of the original table lineitem, by building a Clustered Columnstore Index on a new table lineitem_cci (notice that I am running the same script on the 2014 & 2016 SQL Server versions):

After that we shall need to build a partitioned table lineitem_cc_parti on both SQL Server 2014 & SQL Server 2016:

To make sure that we are dealing with only compressed segments, let’s force the closure and compression of all Delta-Stores, by executing the following statement:

As the next step, I will use my CISL open source library to show the details on both of the partitioned tables, in order to ensure their equality:


On my instances their look equal like I presented on the picture above with 0.12GB of space occupied and the 2528 different Row Groups 🙂

First let’s run the test query, that I have built, against the basis table containing Clustered Columnstore Index, but that was not partitioned:

I have observed some incredible results here, and here they are:

One of the lines corresponds to the SQL Server 2014, while the other corresponds to the SQL Server 2016. The numbers on my VM were pretty much constant, with some not so positive variations in the favour of SQL Server 2014 – I say because of the simpler structure.
Well, 186ms was the average result on SQL Server 2016 running under the 2014 compatibility level while 175ms of the total elapsed time was for the SQL Server 2014!

Now, let’s re-run the query, but this time against both SQL Server 2014 & SQL Server 2016 with their fully partitioned tables:

What about now ? Which result belongs to which SQL Server?

266ms was the partitioned table under SQL Server 2016 (compatibility level 120) while 353ms of the total elapsed time was obtained on SQL Server 2014! This represents a solid 25% improvement

All execution plans will have the same iterators, but will differ on the overall estimated cost (the non-partitioned queries will be way lower than the partitioned ones), as well as the distribution of the estimated costs within the execution plan, but as for the rest – it will be quite similar, like the one shown on the image below:

One more thing – what about Compatibility Level 130 on SQL Server 2016?

Making sure that there is no mistake, I wanted to see how the query are performing when we enable the 2016 compatibility level (130):

Running the very same queries, obtains better results, as you can see on the images below:

One of the reasons that we shall instantly notice is that the actual execution plan for the queries will suffer some changes, including TOP SORT(N) iterator running in the batch execution mode, for example:

From this point you can take is that 2016 really runs faster, especially if you are using it’s newest features 🙂

Oh, but there is one more thing: let’s change a bit the query (that was written in the way to prevent some of the more advanced features of SQL Server 2016) and run it again:

The actual execution plan is presented below:

and the total elapsed time are:
126 ms for the partitioned table
16 ms for the non-partitioned table.
Yes, the difference is that big and that is a common thing if you take advantage of the Aggregate Predicate Pushdown in SQL Server 2016!

Just runs faster

There have been some smaller but important improvements in the way the Columnstore Indexes partition access is being processed in SQL Server 2016 and if you migrate from SQL Server 2014 to SQL Server 2016, and you are using partitioning very heavily, you should be able to notice the positive difference. I can’t estimate if it will be 5% or 25% or 55%, but if you are working with Partitioning (and you SHOULD) – and there are a lot of them, then you might see some measurable improvements.

According to Sunil Agarwal who was the Program Manager responsible for the SQL Server 2016,
the switching to eliminated partitions during scan has been removed, because in SQL Server 2014 the switch to every partition took place, regardless if it would be scanned or not.

This is one little, but an important improvement for the SQL Server 2016 that was not hailed before to my knowledge, and I thought that showing it might help someone to make the right decision about SQL Server 2016 migration.

to be continued with Columnstore Indexes – part 104 (“Batch Mode Adaptive Joins”)

5 thoughts on “Columnstore Indexes – part 103 (“Partitioning 2016 vs Partitioning 2014”)

  1. BhavyaArora

    Please here I have 1 query: If partitioning enabled, will deltastore be in each partition for partitioned columnstore Index

  2. Paul

    Hi Niko, I read your blogs constantly as they are excellent and this one is a great start to a good subject. I’ve utilized your tips with temp tables heavily and I appreciate the knowledge you’ve given us. Per the this article from Microsoft, https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-partition, “When creating partitions on clustered columnstore tables, it is important to consider how many rows belong to each partition. For optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and partition is needed.” With this being said, I’m not sure the test that you’ve done above really represents columnstore properly as they were created for much larger environments. Also, the execution plans tend to vary as the quantities and sizes increase. In case it happens to hit a merge join, then we have the following related bug in at least SQL Server 2014 and 2016. This bug in my opinion renders SQL Server 2014 and 2016 both still beta version in the year 2018. It involves merge joins that exist in conjunction with parallel plans. This was discovered in SS 2016 and exists in at least 2014 also, although they don’t admit to 2014. It is a terrible bug that can bring the execution to its knees… dropping a simple query to hours or days, while the DBAs scratch their heads trying to figure what the problem is. They originally repaired it, which caused another bug and then to repair that bug, they disabled it in their latest CU (7). Now we are stuck putting hash joins everywhere which totally messes up optimization. I have no doubt this also exists in 2017 but I haven’t tested it yet. They are not taking this bug seriously enough. Warning to all, if you are having serious performance problems, check to see if you have merge joins in your execution plan that involves parallelism in any way, and either change your query to remove it or force hash and loop joins. This issue may go all the way back to 2008. They also have data corruption bugs with cs indexes.

    The fact is we’ve been sadly let down by Microsoft’s implementation of columnstore indexes (and in memory tables) in SQL Server (and some outrageous bugs) for quite some time (six years now). Their marketing department has pushed the product to a high plateau, however CIs should have never been released in 2012 and I still consider the whole concept beta-ish in 2016 as they are hardly usable considering the heavy costs of using them in a data warehouse (which is where they are supposed to be advantageous). I myself, am tired of telling employers how much they will benefit by the next version of SQL Server, only to be mostly let down. Partitioning and parallelism are perhaps the best things they’ve done for the warehouse. I deal with tables that have billions of rows in them and the fact is that columnstore indexes have no place in these environments unless you have several days to wait for the indexes to be built. If you try to insert or update a table this large, the time to update the cs indexes is ridiculous. Once they repair this issue and the bugs mentioned above, SQL Server should be a great product. I can tell you our organization will not even consider 2017 or further until these bugs are repaired. I really don’t like to be this negative, but its many years of being disappointed by this product. Here’s an example why. Our department gets dinged on bonuses if a warehouse load goes over a specific time. Microsoft bugs have caused the entire department to lose bonus money. This is not a very popular thing with these people and they look to remove causes of these types of things, including product replacement, such as with Oracle or freeware versions. Our paid support from Microsoft has yielded nothing better than acknowledgement that we are experiencing the bugs and unacceptable options to do workarounds.
    Keep up the great work and warmest regards Niko

    1. Niko Neugebauer Post author

      Hi Paul,

      Did you tried to speak with Product Managers from Microsoft ?
      I am sure they will be interested to fix some of the stuff you mentioned. Some of the problems you are mentioning are definitely known, but do you have some Support Tickets, Connect Items (or now User Voice Items) that you can share.
      I would like to help out by bridging out contacts to the people I know within Product Group. I believe that they are interested making product work for you.

      Best regards,
      Niko

Leave a Reply

Your email address will not be published. Required fields are marked *