Clustered Columnstore Indexes – part 30 (“Bulk Load API Magic Number”)

Continuation from the previous 29 parts, starting from http://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

It has been a long time (almost a year) since Microsoft revealed the public preview version of SQL Server with Clustered Columnstore Indexes available, and ever since there were hundreds of presentation talking about the Bulk loading and the magic number around 100.000 rows which separates the final result as an open Delta-Store vs a compressed Segment.
I am guilty in this imprecise information as well – ever since the first presentation on Clustered Columnstore in July of 2013 I was telling everyone about “around 100.000 rows”.
Now it is the time to find out the exact number.

I have already shown this demos last weekend at SQLSaturday Edinburgh 2014, and so it is time to publish it on the web.

I will kick of by creating my test table:

Now let us load a full Segment (1045678 rows) and update it with some random data and then invoke Tuple Mover (since it will keep on staying as an open Delta-store even when it reaches the maximum number of rows):

Now we can verify if what kind of structures we have in our Clustered Columnstore Table:

Screen Shot 2014-06-20 at 21.41.38

We have enough data to play, so let us export 100K & 105K Rows into a file and load them back into our table using BULK API to see what happens:

I ran again my query to analyse what is going on with our table on the Row Group level and as you can see the 100.000 Rows BULK Load API generated a Delta-Store, while 105.000 Rows Build Load API created a compressed Segment.
Screen Shot 2014-06-20 at 21.48.53

Let us keep on and create 102.500 Rows Segment and see how it goes:

Here is our result:
Screen Shot 2014-06-20 at 21.59.03
This basic means that the Tipping Point of Bulk Load API is to be found between 100.000 & 102.500 Rows. Lets find the exact number!

Actually there is one interesting number that might be connected to this story – let us try something containing 1024, such as 102.400 🙂 :

Screen Shot 2014-06-20 at 22.05.25
Bingo, we have our winner – 102400 Rows is the magic number when Bulk Load API is switching from the Delta-Store usage into loading data into a fully operation & compressed Segment. Our 102.3909 File was added to an open Delta-Store, while we have a full segment with 102.400 Rows.

Wait a second, but I was actually testing rather short table, and what about much wider table ? I will create a table with Clustered Columnstore with almost the maximum width of the columns allowed – 8K+:

— Check on the Row Groups status

And here we go:
102400 is the magic number for the Clustered Columnstore Bulk Load API
102400 is the magic number for the Clustered Columnstore Bulk Load API

to be continued …

2 thoughts on “Clustered Columnstore Indexes – part 30 (“Bulk Load API Magic Number”)

Leave a Reply

Your email address will not be published. Required fields are marked *