Clustered Columnstore Indexes – part 31 (“Memory Pressure and Row Group Sizes”)

Continuation from the previous 30 parts, starting from http://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

This blog post is focusing on one of the most important real world issues – Memory Pressure. While testing on your own Computer or on some development server in a controlled environment, most of the time, if not always, you are not hitting any limits. There are no hundreds of simultaneous connections bombing your server with requests and there are no queries taking away all of your available buffer pool memory.

I have executed a couple of tests recently and while experimenting with the maximum memory settings to show the Spill process, I totally forgot to reset it to the original value and so while trying to rebuild a Clustered Columnstore Index, I was surprised with quite distinct values as the maximum size of the Row Groups that I was getting.
This blog post is about diving into some of details that I have discovered since.

I am running those demos on a VM with 4 Cores and 8GB of RAM which is using an SSD drive, and so please take in count that your times and final result might vary significantly from mine.

As most of the times in this Clustered Columnstore Index blog post series, let us use Contoso Retail DW database which is freely available for download.

For the start, we need to remove the Unique Constraints (PK,FKs) from our test table – FactOnlineSales:

Next step would be creating a Clustered Columnstore Index on our test table:

Now we can control the amount of memory available to the SQL Server (through controlling “max server memory” setting) and check what kind of effect it will produce on rebuilding a table with Clustered Columnstore Index:

First, let us try it out with 8GB of RAM available to SQL Server.

Our memory grant was 1GB, since we have enough memory available. Using the following query, we are able to find out that there are 13 different Row Groups created for this 12.6 Million Rows table. A very good and reasonable number in my opinion.

Screen Shot 2014-06-21 at 02.29.25

Now let us limit the available memory to 2GB and see what happens:

This time our memory grant was around 768 MB and we have 14 Row Groups, with the maximum number of rows per each of the Row Groups is still capped at the maximum – 1048576:
Screen Shot 2014-06-21 at 02.37.08
I consider a Row Group with just 2 Rows to be an artefact which happens by some builds. I genuinely expect future builds of SQL Server to improve this situation, because there is absolutely no sense in having such a small Row Groups.
Notice: I have rebuild the Clustered Columnstore Index multiple times and every time I did it there was a lonely Row Group with under 10 rows. This has to do with some residual inefficiencies of the parallelism and as far as that Row Group is just one, it has less impact, but if we are using partitioning, than we are actually risking to have a high number of such Row Groups.

Lets decrease the max server memory even further to 1 GB:

The memory grant dropped even lower to 182 MB but the number of the Row Groups as well as the maximum number of rows per Row Group stayed the same.

Let’s set it on 512MB:

This time we are down to 91 MB for the memory grant,
AND we actually have 29 Row Groups which host up to 524.289 Rows at maximum.
Screen Shot 2014-06-21 at 02.49.00

Here is the rest of the script that I ran on my Virtual Machine in order to extract more information out of the Clustered Columnstore Index under Memory pressure (I went down straight to just 256 MB):

Here you have a complete table with the results that I have obtained in my environment:
Screen Shot 2014-06-21 at 16.37.47
And here we have the visualisation of what I have managed to capture:
Columnstore Row Groups per Available Memory
It is quite visible that once we have less than 1 GB, the number of Row groups starts to raise drastically – which can only mean that the maximum number of rows per Row Groups should go down:
Max Row Number in a Row Group per Available Memory

And hence we should have a clear picture of the Memory Grants getting lower and lower (which naturally happens under Memory Pressure in a real environment):
Memory Grants per Available Memory

For a better analysis here are 2 more images which are showing some of the dependencies between Memory Grants and the maximum number of Row Groups as well as the relation between the Memory Grants and the number of the Row Groups created by the Columnstore Index build process:

Memory Grants vs Max Rows in a Row GroupMemory Grants vs Row Group Number

You can clearly see that for this table, on this hardware once we go below 1GB on Max Server Memory, and hence the Memory Grant goes below the needed 180 MB – Clustered Columnstore Index build process starts to create more Row Groups with less Rows inside. Lack of the memory provokes the necessity to split the job between multiple Row Groups, which means less desirable result as we shall see in the next blog post. 🙂

Rest to say that the actual execution time for any of those operations was very similar it went from 57 to 66 seconds, and while of course I do understand that I did not measure the CPU time and that it should vary, in the cases when you have enough CPU – this should not be your biggest issue.

Final Thoughts:

Clustered Columnstore Indexes are very memory hungry feature of SQL Server and not giving them enough memory is simply not an option.
Besides using Partitioning for better Segment Elimination, Maintenance & Management purposes, one other key feature of SQL Server to be used for the Maintenance purposes is the Resource Governor. Giving Clustered Columnstore Maintenance procedures enough space and memory is a key for getting the best result.
One other key limitation of the SQL Server is that in the default resource pool the maximum amount of memory that is granted to a query is 25%, and should you have really wide tables with a lot of columns, than you might need to give an extra edge to your build/rebuild processes.
That is why in a real production environment you might consider creating your own resource pool for the Clustered Columnstore Indexes management.

to be continued with Clustered Columnstore Indexes – part 32 (“Size Does Matter, but how ?”)

Leave a Reply

Your email address will not be published. Required fields are marked *