Clustered Columnstore Indexes – part 23 (“Data Loading”)

Continuation from the previous 22 parts, starting from http://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

Like for any technology, for Clustered Columnstore Indexes it is important to understand what are the best ways to run data loading processes.
I have decided to test different ways of loading data to analyze the pros and cons of each of the available methods:

We should try to load 3 Million Rows
1. Load data into a HEAP and then create Clustered Columnstore Index on that table
2. Load data into a table with Clustered Columnstore Index
3. Load data into a table with Clustered Columnstore Index using Bulk process
4. Load data into a HEAP using Bulk process and then create Clustered Columnstore Index
5. Load data into a HEAP using parallel insert and then create Clustered Columnstore Index on that table

I have used the following setup:
A VM with 4 GB of RAM, 2 Intel Core i7-3667U CPU Cores @ 2GHz, 256 GB SSD based on Samsung S4LNo53X01-8030 controller.
As in a lot of previous posts, I will keep on working with Contoso BI database which represents a great starting point.

To prepare everything and to make our original table FactOnlineSales work fast, I have dropped all the constraints and created a clustered Columnstore on it, by executing the following scripts:

I have also expanded and prepared TempDB as well as ContosoRetailDW database in order not to make my Computer spending any time on autogrowing data files or log files.

So, that is enough for the preparations, let’s kick off by creating our staging table:

Now lets load 3 Million rows into our HEAP:

This process took around 14 seconds time, while at the same time spending 11.5 seconds of CPU.
Now we shall need to build our Clustered Columnstore Index:

This process took 18 seconds of execution time and around 24 seconds of CPU time. All in all, those 2 phases took us 32 seconds = 14 + 18, which seems to be reasonable, but it is still to early to arrive to any conclusions, so let us carry on.
Lets create a table with a Clustered Columnstore Index and lets load 3 million row into it:

This time it took us around 19.5 seconds of total spent time and almost 19 seconds of CPUT time.

Now I will export the same 3 million rows into a file and will load it through the BULK process:

This time it took us 27 seconds of time with 26.5 seconds of CPU time to load all the data into our Database. (I understand this test is quite biased since I am loading data on the same drive).

I decided to advance and load data into a Heap using the same bulk process just for comparison:

It took 26 seconds to Bulk Load the data while spending 21 seconds of the CPU time, after that spending additional 23 seconds to build Clustered Columnstore Index while spending 20.5 seconds of the CPU time.

And so here we are in front of the last option for this test – to load data into a HEAP using parallel insert and then create Clustered Columnstore Index on that table:

This time the process took amazing 9.5 seconds with almost 11 seconds of CPU time. You might be wondering what is going on here, but it is actually quite simple – SQL Server development team has developed a very important upgrade to the execution plans – Table Insert operator which is being executed in parallel mode.
This is quite remarkable, but now it’s time to create our Clustered Columnstore Index:

And so we have additional 18 seconds of time spent on the Clustered Columnstore Index creation with 24 seconds of CPU time being spent.

Now, lets take a look at what we have got here:
Screen Shot 2013-12-15 at 02.38.51
Screen Shot 2013-12-15 at 02.37.53

These results are quite sound in my opinion – it is very clear that should our priority be loading data as fast as possible (based on the requirements such as a small data loading window, a lot of data, etc), we should advance and choose building Clustered Columnstore Indexes on the table where we loading our data to.
From the other side, don’t forget those amazing 9.5 seconds spent on the parallel insert operation into a HEAP table – this might be a very important change should you be able to load data under such specific conditions.

I will be honest, since I was a bit shocked by seeing these results a couple of weeks ago and so I thought – maybe I should try running those queries on different system just to make a reality check. And so I went to Azure and created a reasonable VM. 🙂 Here are my results (note that I have not even tried to run HEAP + Bulk there.)
Screen Shot 2013-12-15 at 02.37.25

I guess that one of the “secret sauces” of the Clustered Columnstore is in its compression – we simply greatly optimize storage and spending less time on IO operation would simply mean spending less time in total, especially because storage are still the slowest elements in computers by far.

Stay tuned for the second part of this data loading article, since there are some other interesting aspects that we still need to look into – Compression results & created Dictionaries for example.

to be continued with Clustered Columnstore Indexes – part 24 (“Data Loading continued”)

2 thoughts on “Clustered Columnstore Indexes – part 23 (“Data Loading”)

Leave a Reply

Your email address will not be published. Required fields are marked *