Clustered Columnstore Indexes – part 11 (“Clustered vs Nonclustered compression basics”)

Continuation from the previous 10 parts, starting from

This post is all about the the difference in the compression algorithms between Clustered Columnstore and Nonclustered Columnstore Indexes. As in the previous post, I have decided to continue using Contoso BI Database. In this test I will ignore the compression ratio from the original tables, since I just want to compare the results of the compression between Nonclustered Columnstore and Clustered Columnstore Indexes.

I have choosen to investigate 4 fact tables: FactStrategyPlan, FactSales, FactInventory & FactSalesQuota – they all have a different amount of data and their number of columns is not prohibitive for playing with Nonclustered Columnstore Indexes.

Lets drop all the keys (first Foreign and then Primary) for our 4 tables:

Now, game on and let us create Clustered Columnstore Indexes to see how much space they actually take:

Lets check on the occupied space by executing the following commands:

My results were: dbo.FactStrategyPlan – 30.840 KB, dbo.FactSales – 46.864 KB, dbo.FactInventory – 88.672 KB, dbo.FactSalesQuota – 110.616 KB respectively.

For now, lets just drop the Clustered Columnstore Indexes and try to create Nonclustered Columnstore Indexes on the HEAP:

The results for the Nonclustered Columnstore Indexes were following: dbo.FactStrategyPlan – 36.520 KB, dbo.FactSales – 54.952 KB, dbo.FactInventory – 101.168 KB, dbo.FactSalesQuota – 116.456 KB.
Note: Of course I am ignoring the fact that the Noncslutered Columnstore Index is not the only storage required for the table, but because in the post I am just focussing on the compression itself, I will be ignoring the rest of the required space.

Screen Shot 2013-07-18 at 15.33.47Lets compare the results now: it is quite visible that in the case of all 4 tables tested in this blog post, Clustered Columnstore has beaten the Nonclustered Columnstore Index by applying a better compression as it was expected.

Clustered Columnstore vs HEAP NonClustered Columnstore:
Screen Shot 2013-07-18 at 15.34.03

The average improvement was rounding about 10%, but once again I would like to stress that this is a very small test data with just 4 cases. This case serves though to test and to prove that the Clustered Columnstore Index has a different algorithm then the Nonclustered Columnstore Index, and that the Clustered Columnstore Indexes besides other improvement (updateable, supported data types, etc) offers a significantly better compression ratio using the default compression.

Update: Now let us consider the situation when the Nonclustered Columnstore Index is being built on the top of a Unique Clustered Index:

The amounts of data inside of the tables is less even for the Clustered Columnstore is because we are rebuilding on the top of the existing ordered traditional clustered index, this way enabling a better compression.
Now, lets take a look at the results:
Screen Shot 2013-07-18 at 23.29.39

Screen Shot 2013-07-18 at 23.29.01

It looks like there are some other external factors that potentially influence the size of the created tables, and that the variation can go either side in a couple of percent.
The difference between the results is very insignificant, which actually shows that the compression itself is extremely close or really the same. I have heard before and truly believed that the difference would be quite significant, but these tests has shown me so far that the difference is absolutely minimal.

This test proves that the Clustered Columnstore Indexes are making the Noncslutered Columnstore Indexes obsolete (especially when not using a traditional Clustered Index for Nonclustered Columnstore) in a major number of scenarios, and that only some very specific cases will be deserving the Nonclustered Columnstore Indexes usage.

to be continued with Clustered Columnstore Indexes – part 12 (“Compression Dive”)

3 thoughts on “Clustered Columnstore Indexes – part 11 (“Clustered vs Nonclustered compression basics”)

  1. Pingback: Clustered Columnstore Indexes – part 1 (“Intro”) | Nikoport

  2. manishkumar1980

    Hi Experts,

    Can we deduce that both the algorithms are the same.

    Here Heap+ NCCI should also be taken into consideration.

Leave a Reply

Your email address will not be published. Required fields are marked *