Columnstore Indexes – part 108 (“Computed Columns”)

Continuation from the previous 107 parts, the whole series can be found at http://www.nikoport.com/columnstore/

As we are approaching the RTM (release to manufacturing) of the SQL Server 2017, I decided to focus on the implemented features in the upcoming release, thinking that if they are in the RC (release candidate) than most probably they will be here to stay and some of the next blog posts will be focusing on the features released in the CTP’s of the SQL Server 2017 that I have not mentioned before.

The Release Candidate 1 of the SQL Server 2017 has brought one very significant and extremely long overdue feature support for the Columnstore Indexes. I am talking about the feature that was originally available in the SQL Server 2012 RTM and was taken away from the engine in the CU1 or CU2. Yes, that feature worked for the Nonclustered Columnstore Indexes that were read-only at the time of SQL Server 2012. The feature that in the Banking and Health industries are extremely widely used: Computed Columns.
For the SQL Server 2017 RC1 Microsoft silently added the possibility of using computed columns for the Clustered Columnstore Indexes and this might make a huge difference for some companies when deciding wether migration to SQL Server 2017 is worth it. From my own experience there were some scenarios were clients would get upset by the idea of moving away from the computed columns just because Columnstore Indexes did not support them, so that after asking for a long time about the computed columns I even created Include support for Calculated Columns on Columnstore Indexes.

So when I found out that it was implemented I confess that I jumped for a joy a couple of times, I was so excited about it. My huge thanks goes to Jovan Popovic (Microsoft PM) for pointing out at it, since without a major announcement I did not found out this news on my own.
I had already a VM on Azure running with a RC1 of the SQL Server 2017 and so I decided to advance with some basic tests to discover the supportability of the computed columns and below you will find my tests that I have run on the restored version of the ContosoRetailDW free database:

I decided to create a new table, that I will call FactOnlineSales_Computed, containing a new column [EarnedAmount] that will be automatically calculated from the following equation that involves 4 different columns of the very same table: ([SalesAmount] – [ReturnAmount] – [DiscountAmount] – [TotalCost]).
Additionally I decided load the whole FactOnlineSales table into the new table – this means 12.6 million rows and run a couple of tests to see how the engine behaves.

This script executed without any problem on the SQL Server 2017 RC1, while running the same script on the SQL Server 2016 with Service Pack 1 will result in the following error message:

The fact that the we can have computed columns feels my heart with gratitude for finally getting this feature into the engine, this will allow some scenarios that were a total no-go to advance with the Columnstore Indexes, but let us put a couple of queries against the table with the Clustered Columnstore Index in order to find out how it functions and if there are some differences with the traditional RowStore implementation of the computed columns:


If you look carefully at the execution plan, you will notice that the Columnstore Index Scan, Compute Scalar and Filter are basically where the magic happens – we scan the original 4 columns, compute the desired value for the [EarnedAmount] and only then filter those values.

Right ? RIIIIHT ?

Let’s set up a different version of the very same table, this time using the Rowstore Indexes and then we shall fire the very same query to compare the result:

Let’s execute the same query we have run against the Columnstore Indexes table but this time against our Rowstore table:


Look, this time we have a predicate pushdown to the storage engine with just 4.9 million rows out of the 12.6 million rows coming out of the Clustered Index Scan, plus there are no computer or filter operations. I definitely wish to see the same functionality on the Columnstore Indexes side, if not now – then fixed in the near future.

Lets see now if we run the query getting the average EarnedAmount out of our table – will we get some surprises on the execution side for both types of the table (Rowstore & Columnstore). Will there be a Aggregate Predicate Pushdown for the Columnstore Indexes?

This time the execution plans are essentially the same in the form with 2 Compute Scalar operations where the value for the EarnedAmount is calculated and then aggregated. The execution time is of course slower for the Rowstore, since all relevant operations for the Columnstore Indexes runs in the Batch Execution Mode and this is where we get the biggest speed advantage over the Rowstore Indexes.
Of course the Aggregate Predicate Pushdown did not take place, because there is no support for the simple predicate pushdown with the Computed Columns, and so the more complicated the formula is, the more time we spent calculating the final value – but for the scenarios where the performance and filtering are important we have PERSISTED computed columns, and so let us try them out for the Clustered Columnstore Index:

Persisted Computed Columns

Let’s recreate our table, this time making our calculated column [EarnedAmount] as a persisted one.

Now all we need to do is to create the Clustered Columnstore Index:

The above query results in the following error message:

First of all this is a huge pity that the computed columns are not supported right now, but the irritating part here is the error message itself, delivering wrong message to the final user – the non-persisted computed columns are in fact supported.
I understand that storing the data in the Columnstore Index is much more difficult than just calculating the values out of it, but this feature is very needed, especially if the calculations are extremely complex or if we want to use this computed value as a predicate.

Nonclustered Columnstore Index

The HTAP(aka Operational Anyltics) scenarios might have some computed columns, enabling some more complex scenarios that are avoiding views usage because of the existing code, for example – let’s take it to the test by creating a new table, containing NCCI and a computed column:

The error message resulting in the message will still be the generic ones, but its easy to interpret – no computed columns are supported for the Nonclustered Columnstore Indexes:

Notice that we can still create a Nonclustered Columnstore Index on the table with a computed column, simply by not including the column into our index:

In-Memory Columnstore Index

Let’s see if the In-Memory Columnstore Index supports computed column and I do not hold my breath here, especially since the on-disk Nonclustered Columnstore Index does not support them.

As expected, the script above generates an error message, since the computed columns are not supported for the Columnstore Indexes:

Final Thoughts

First of all – I am HUGELY excited about computed columns getting their way into the disk-based Clustered Columnstore Indexes. HUGELY – I have been missing this feature for way too long!
It is a huge pity that right now we do have support for the computed columns only in their persisted variant, and the only scenario that is supported is the one for the DWH (Data Warehousing) with disk-based Clustered Columnstore Indexes.
I do admit thought that mostly the usage of the computed columns in my experience is related to the DWH, mostly because regular developers have no need for calculations with complex formulas to deliver to the final user, and if there are such computations – they do not change frequently.
Still, I am hoping for a full implementation of the computed columns support for all Columnstore Indexes scenario, as I put in my original connect item: Include support for Calculated Columns on Columnstore Indexes, especially for the persisted computed columns.
Especially the predicate pushdown is the functionality that I am looking forward to … One day hopefully. 🙂

to be continued with Columnstore Indexes – part 109 (“Trivial Plans in SQL Server 2017”)

Leave a Reply

Your email address will not be published. Required fields are marked *