Columnstore Indexes – part 89 (“Memory-Optimised Columnstore Limitations 2016”)

This is a continuation of the previous 88 parts; the whole series can be found at http://www.nikoport.com/columnstore/.

I would like to dedicate this blog post to the Memory-Optimised (also known and LOVED as Hekaton) Columnstore Indexes and their limitations in SQL Server 2016.
Disclaimer: the Memory-Optimised technology is a ground-breaking development which will be truly appreciated only in the next couple of years, and it has incredible use cases (maybe I will be blogging more about this space in the next couple of months). People need to understand, however, that mapping InMemory Columnstore Indexes to disk-based Columnstore Indexes 1:1 is a very wrong idea, and because the InMemory technology is significantly younger and less stable than Columnstore Indexes, there are some very significant hidden pitfalls.

I have already blogged twice on this topic: on the internals (Columnstore Indexes – part 72 (“InMemory Operational Analytics”)) and on the Columnstore Index addition (Columnstore Indexes – part 81 (“Adding Columnstore Index to InMemory Tables”)). Overall, though, I feel that this topic is still very much unexplored, which is fairly normal for something this new, since most people do not want to risk diving into unknown space.

Construction Limitations

If you have tried or worked with Memory-Optimised tables in SQL Server 2014, you will surely know that once you have created such a table, you are not able to alter it in any way. The same principle applies to the Memory-Optimised Stored Procedures in SQL Server 2014, where in order to change one, you have to drop and re-create it.

In SQL Server 2016 these limitations (and many others) were removed, and we can change our Memory-Optimised tables by altering their meta-data: adding or removing columns, adding or removing indexes, or changing the bucket count of hash indexes.
We can also add Memory-Optimised Clustered Columnstore Indexes to our Hekaton tables (within the table definition or after the table is created) and everything should be fine and work perfectly.

🙂

Let’s take it for a ride by restoring a copy of my favourite free ContosoRetailDW database and adding a memory-optimised filegroup:
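A minimal sketch of the filegroup step, assuming the database has already been restored. The filegroup name and the logical file name are hypothetical; the container folder is the C:\Data\Contosoxtp path referred to later in this post.

```sql
-- Hypothetical filegroup and logical file names (assumptions for illustration).
-- A database can have only one memory-optimised filegroup.
ALTER DATABASE [ContosoRetailDW]
    ADD FILEGROUP [ContosoRetailDW_Hekaton]
    CONTAINS MEMORY_OPTIMIZED_DATA;
GO

-- FILENAME points to a folder (a container), not to a single file.
ALTER DATABASE [ContosoRetailDW]
    ADD FILE (NAME = N'ContosoRetailDW_HekatonDir',
              FILENAME = N'C:\Data\Contosoxtp')
    TO FILEGROUP [ContosoRetailDW_Hekaton];
GO
```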

Let’s create our test Memory-Optimised table:
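The table definition could look roughly like the sketch below. The table name dbo.FactOnlineSales_Hekaton, the constraint name and the reduced column list are assumptions made for illustration, loosely modelled on dbo.FactOnlineSales from ContosoRetailDW.

```sql
-- Hypothetical test table: name, constraint name and column subset are assumptions.
CREATE TABLE dbo.FactOnlineSales_Hekaton
(
    OnlineSalesKey INT      NOT NULL
        CONSTRAINT PK_FactOnlineSales_Hekaton PRIMARY KEY NONCLUSTERED,
    DateKey        DATETIME NOT NULL,
    StoreKey       INT      NOT NULL,
    ProductKey     INT      NOT NULL,
    SalesQuantity  INT      NOT NULL,
    SalesAmount    MONEY    NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```

The table is created without a columnstore index on purpose, so that the index can be added later with ALTER TABLE.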

Let’s load 2 million rows
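One way to do the load, sketched under the assumption that the target is a hypothetical dbo.FactOnlineSales_Hekaton table holding a subset of the dbo.FactOnlineSales columns:

```sql
-- Copy the first 2 million rows from the disk-based fact table.
-- Target table name and column list are assumptions for illustration.
INSERT INTO dbo.FactOnlineSales_Hekaton
    (OnlineSalesKey, DateKey, StoreKey, ProductKey, SalesQuantity, SalesAmount)
SELECT TOP (2000000)
    OnlineSalesKey, DateKey, StoreKey, ProductKey, SalesQuantity, SalesAmount
FROM dbo.FactOnlineSales
ORDER BY OnlineSalesKey;
```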

We can change the meta-data structure by adding another column
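For example (the table and column names here are hypothetical):

```sql
-- Adding a nullable column to a memory-optimised table, allowed in SQL Server 2016.
ALTER TABLE dbo.FactOnlineSales_Hekaton
    ADD TestColumn INT NULL;
```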

Let’s test if we can drop it, just to make sure we can do that:
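A sketch of the drop, assuming a hypothetical column called TestColumn was added to a hypothetical dbo.FactOnlineSales_Hekaton table:

```sql
-- Removing the column again; also allowed while no columnstore index exists.
ALTER TABLE dbo.FactOnlineSales_Hekaton
    DROP COLUMN TestColumn;
```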

Now it’s time to add a Clustered Columnstore Index:
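On a memory-optimised table the index is added through ALTER TABLE rather than CREATE INDEX. A sketch, with hypothetical table and index names:

```sql
-- Adding a Clustered Columnstore Index to an existing memory-optimised table.
ALTER TABLE dbo.FactOnlineSales_Hekaton
    ADD INDEX CCI_FactOnlineSales_Hekaton CLUSTERED COLUMNSTORE;
```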

Works fine as expected! Lovely stuff!

Now, let’s add a Hash Index, which we find to be necessary for our OLTP queries:
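A statement of roughly this shape is enough to trigger the error. The table name, index name, key column and bucket count are assumptions for illustration:

```sql
-- Attempt to add a hash index after the columnstore index is already in place.
-- On SQL Server 2016 RTM this is expected to fail with Msg 10794.
ALTER TABLE dbo.FactOnlineSales_Hekaton
    ADD INDEX NCHI_FactOnlineSales_Hekaton_ProductKey
        NONCLUSTERED HASH (ProductKey) WITH (BUCKET_COUNT = 2000000);
```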

Msg 10794, Level 16, State 15, Line 1
The operation ‘ALTER TABLE’ is not supported with memory optimized tables that have a column store index.

Wait a second! What’s going on? WHY?
Can we do anything else, like adding or removing a column?

Msg 12349, Level 16, State 1, Line 1
Operation not supported for memory optimized tables having columnstore index.

Msg 12349, Level 16, State 1, Line 5
Operation not supported for memory optimized tables having columnstore index.

Bang!
That looks like SQL Server 2014 Memory-Optimised Tables: we can’t change anything after we add a columnstore index (and the addition takes a lot of time, just check Columnstore Indexes – part 81 (“Adding Columnstore Index to InMemory Tables”)).
That is a very serious bummer, and you should be extra careful when stepping into this kind of adventure if you expect your schema to change regularly, because every index addition and removal operation for Columnstore Indexes is offline only!

Let’s drop the index and try to add it with Columnstore Archival compression (also notice that there is no partition support for the Memory-Optimised tables):
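A sketch of such an attempt, assuming the compression option is passed in the WITH clause of the index definition. The table and index names are hypothetical:

```sql
-- Drop the existing columnstore index first.
ALTER TABLE dbo.FactOnlineSales_Hekaton
    DROP INDEX CCI_FactOnlineSales_Hekaton;

-- Try to re-create it with archival compression (assumed form of the option).
-- This is expected to be rejected with Msg 10794.
ALTER TABLE dbo.FactOnlineSales_Hekaton
    ADD INDEX CCI_FactOnlineSales_Hekaton CLUSTERED COLUMNSTORE
        WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
```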

Msg 10794, Level 16, State 91, Line 2
The index option ‘data_compression’ is not supported with indexes on memory optimized tables.

Yeah, no Archival compression.
It is important to note that its improvement would not be that relevant, since the vast majority of the memory space will be occupied by the memory-optimised table itself (we are talking about some 5% of the overall memory space), but I believe it is important for everyone to know what to expect when migrating to InMemory Columnstore.

In-Memory? 🙂

🙂
As you know, the Memory-Optimised Tables have all their Indexes only in Memory, and that’s the reason why they are so fast and so great.
Let’s put it to the test. 🙂

Let’s re-create our Memory-Optimised Table and observe what happens with the disk space when we add a columnstore index and compare it to the regular hash in-memory index:

Using the upcoming version of the MOSL, let’s take a look at the checkpoint files and their sizes while also observing the C:\Data\Contosoxtp\$HKv2 folder, which we have configured for storing the data:
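Without MOSL, a rough approximation of the same overview can be taken directly from the sys.dm_db_xtp_checkpoint_files DMV. This is only a sketch and not the query that MOSL itself runs:

```sql
-- Summarise the checkpoint files of the current database by type and state.
SELECT file_type_desc,
       state_desc,
       COUNT(*) AS file_count,
       SUM(file_size_in_bytes) / 1024. / 1024 AS size_in_mb,
       SUM(file_size_used_in_bytes) / 1024. / 1024 AS used_size_in_mb
FROM sys.dm_db_xtp_checkpoint_files
GROUP BY file_type_desc, state_desc
ORDER BY file_type_desc, state_desc;
```

Running it before and after a CHECKPOINT makes it easier to relate the numbers to what appears in the container folder.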
Here are the results that I can see with my WorkInProgress version for SQL Server 2016:
mosl-checkpoint-files-stats
The FileSystem looks the following way:
filesystem-original
Let’s add the Clustered Columnstore Index and issue a checkpoint:

After a good 9 seconds 🙂 on my VM, I have the following results for the Checkpoint File Pairs and for the file system:
mosl-checkpoint-file-stats-after-adding-columnstore
filesystem-after-adding-columnstore
What happened on the file system is that the occupied space has simply doubled: it went from 0.328 GB to 0.672 GB! So what is the effectiveness of the Columnstore compression? Why do we have any file system footprint at all?
Why doesn’t SQL Server keep all the data In-Memory like it is supposed to?

Let’s step back and see what happens when we add a regular index to our table (this includes re-running the whole restore and table creation part):

filesystem-after-adding-hash-index
On the picture on the right side, you can see the complete list of the checkpoint files that were added to the file system after adding a new Hash Index. The newly added files are highlighted, and their sum is around 68 MB.

Below this text, you can see the relative impact that the addition brings, but keep in mind that while the footprint of the columnstore index will grow proportionally to the number of rows added, the nonclustered hash indexes will keep their size at pretty much the same level.
effect-in-mb-for-adding-2-million-rows

Columnstore: Disk vs In-Memory

Let’s compare the size of the Row Groups between the Disk-Based Clustered Columnstore Indexes and the In-memory Clustered Columnstore. Yeah, I know – the algorithms should be absolutely the same or similar and one would not expect to have any significant differences, but you know, just in case 😉
Let’s use CISL for discovering the sizes of the Row Groups:
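For those without CISL at hand, a rough comparison can be sketched directly against the sys.dm_db_column_store_row_group_physical_stats DMV. This is an approximation and not the CISL query itself:

```sql
-- Total compressed Row Group size per table with a columnstore index.
SELECT OBJECT_NAME(object_id) AS table_name,
       COUNT(*) AS row_groups,
       SUM(total_rows) AS total_rows,
       CAST(SUM(size_in_bytes) / 1024. / 1024 AS DECIMAL(12, 2)) AS size_in_mb
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE state_desc = 'COMPRESSED'
GROUP BY object_id;
```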

row-group-sizes-disk-based-vs-in-memory

total-row-group-sizes-disk-based-vs-in-memory
One can very clearly notice the huge difference (almost double the size) between the sizes of the disk-based columnstore indexes and the in-memory ones. As the size of the table increases, so will the occupied space, and keep in mind that for the disk-based columnstore you can always cut the disk space even further by using Columnstore Archival compression.
Let’s check whether there are some serious differences in the dictionaries, and to verify that, let’s run the dbo.cstore_getDictionaries function from the CISL:

dictionaries-differences
It’s really interesting that though the sizes of the dictionaries are very similar, the total number and their distribution are definitely not. I will be looking into the whole dictionary and compression story and the differences between the columnstore indexes in the upcoming months, but in the meantime let’s leave it here.

The reason for this significant difference is that the disk-based columnstore indexes are optimised for compression while the Memory-Optimised ones are not, and to see that information you can easily use the CISL function “dbo.cstore_getRowGroupsDetails”:
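The same flag can also be read straight from the DMV. The sketch below assumes that the has_vertipaq_optimization column of sys.dm_db_column_store_row_group_physical_stats is the flag in question:

```sql
-- has_vertipaq_optimization = 1 means the Row Group was compressed
-- with the Vertipaq optimisation; 0 means it was not.
SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       total_rows,
       size_in_bytes,
       has_vertipaq_optimization
FROM sys.dm_db_column_store_row_group_physical_stats
ORDER BY table_name, row_group_id;
```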

You can see the results marked in red on the right side of the image below:
row-group-sizes-disk-based-vs-in-memory-focused-on-optimisation

Natively Compiled Stored Procedures

The message that has been delivered over and over again across all types of media, live presentations and blogs is that the natively compiled stored procedures are much better and that everyone should be using them when working with Memory-Optimised tables.

Let’s put it to the test 🙂
Let’s run the very same query as an interpreted one and compare its performance with the query written as a natively compiled stored procedure:
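A comparison of this kind could be set up along the following lines. The procedure name, the table name and the aggregation query are assumptions for illustration and not the exact query measured here:

```sql
-- Hypothetical natively compiled procedure wrapping a simple aggregation.
CREATE PROCEDURE dbo.usp_SalesByProduct_Native
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    SELECT ProductKey, SUM(SalesAmount) AS TotalSales
    FROM dbo.FactOnlineSales_Hekaton
    GROUP BY ProductKey;
END;
GO

-- Report elapsed and CPU time for both variants.
SET STATISTICS TIME ON;

-- Natively compiled variant.
EXEC dbo.usp_SalesByProduct_Native;

-- Interpreted variant of the very same query.
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactOnlineSales_Hekaton
GROUP BY ProductKey;
```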

Since the natively compiled stored procedure with Columnstore indexes does not show:
– logical reads for the columnstore index
– execution plan,
the only thing we can easily measure are the execution times, and they look like this:
317 ms vs 174 ms (Native Compiled Stored Procs vs Interpreted Query). Visually this looks like this:
natively-compiled-stored-proc-vs-interpreted-query
That’s because the first query, the one inside the Natively Compiled Stored Procedure, runs on a single core and will not use the Columnstore Index, while the interpreted query is not limited to just 1 core! This means that you have to be very careful when going all in on the natively compiled stored procedures, because when your analytical query requires more power than just 1 core, you will not be able to get it from SQL Server 2016.

OLAP vs HTAP (Hybrid Transactional/Analytical Processing)

Whenever you are comparing against a real data warehousing solution (star schema or whatever design you are using), you should clearly understand the dangers of going for the Operational Analytics kind of design (now better known as HTAP, Hybrid Transactional/Analytical Processing).
Even though we have Clustered Columnstore Indexes on disk-based and on memory-optimised tables, they are very different solutions that are not really comparable in terms of design, optimisations and, naturally, performance.

Final Thoughts

This post is not intended to be used as a bashing tool of Microsoft, but rather as a warning for those who run for the newest solutions, without understanding their true limits.

to be continued with Columnstore Indexes – part 90 (“In-Memory Columnstore Improvements in Service Pack 1 of SQL Server 2016”)

2 thoughts on “Columnstore Indexes – part 89 (“Memory-Optimised Columnstore Limitations 2016”)

  1. Mohsen

    Hi, great article. But I have a specific question: we have a table with 90 million records and a server with 8 GB of memory allocated.
    When we use the xVelocity in-memory analytics engine on SSAS 2016 there is no problem and it takes just 400 MB of memory, but when we want to use an in-memory optimized ColumnStore index in 2016 (creating an in-memory table with a clustered columnstore index) we encounter an error like this:
    There is insufficient system memory in resource pool ‘default’ to run this query.
    I thought the memory optimized ColumnStore index would use less memory than the xVelocity in-memory analytics engine, and that these two technologies are just the same! Why is it so?
    thanks in advance.

    1. Niko Neugebauer Post author

      Hi Mohsen,

      In-Memory Clustered Columnstore does not use Vertipaq Compression and compresses less in comparison to the disk-based Clustered Columnstore (Logical reason is that currently In-Memory Columnstore is targeting HTAP aka Operational Analytics, where insertion speed is the key). I have mentioned that difference in the following blog post http://www.nikoport.com/2016/10/25/columnstore-indexes-part-89-memory-optimised-columnstore-limitations-2016/
      Did you try working with Resource Governor, configuring a different amount of RAM for the memory grants as a solution ?

      Best regards,
      Niko
