Clustered Columnstore Indexes – part 49 (“Data Types & Predicate Pushdown”)

Continuation from the previous 48 parts, starting from

Note: Updated on 05th of March 2015 with information on SQL Server 2012.

Having blogged a couple of times on the matters of Segment Elimination (“Data Loading for Better Segment Elimination”, “Deleted Segments Elimination” ), I decided to go 1 step deeper in this direction and see which data types support segment elimination and which not.

For this purpose I will re-create my table MaxDataTable will all supported Data Types and then will run test queries on it, determining if Predicate Pushdown is taking place on the Data Type in question:

Next step is to load 2.5 Million Rows, so I would have some data to play with:

Now I am ready to update the table with some random data, and notice that for numerical values I am putting lower values (150 and below) into the first segment (1045678 rows):

Time to create a Clustered Columnstore Index with MAXDOP = 1, to make sure that the data is not shuffled around:

For SQL Server 2012, I have used the following Nonclustered Columnstore Index, which excludes the columns (2,13,22,23,24) with data types that were not supported in SQL Server 2012:

Warning: you will see different values, since the data I have loaded is pretty much random.

After having all data loaded, we can start analysing it:

segments_info_for_eliminationFrom the picture on the left you should be able to see that I have different data in the listed segments and so I can actually execute queries expecting Segment Elimination to take place.

Before advancing to far, the first thing I would love to do is to filter out those Data Types that do not support minimum & maximum values for the segments thus disabling any possibility of the Segment Elimination or Predicate Pushdown. It is easy to determine them, by running the following query, which will filter out all the columns that have only zeros:

segments_with_no_information_on_min_max_values_You can clearly see that the expectation for the following Data Types (binary, varbinary and uniqueidentifier) is pretty low – even though they have distinct data values, they show through sys.column_store_segments DMV that they do not have any values different to 0.

To ensure that we do have distinct values indeed, you can run the following query, which will give you the number of distinct values for each of the columns:

Testing Segment Elimination

There is nothing better as to run a test query and see the practical results for each of the data types, and so I will kick off with the BIGINT data type, which is pretty much the basic requirement of almost any BI or DataWarehouse solution:

bigint_execution_plan_exampleThis is the execution plan for the above query, you can see that we have only the essential elements with no Filter iterators. The absence of the parallelism iterators (and hence the batch mode if you are wondering) is explained by the low query cost – the whole subtree in my case is estimated to the value of 0.913655.
Should you have any doubts, you can use the column_store_segment_eliminate Extended Event, described in the Clustered Columnstore Indexes – part 47 (“Practical Monitoring with Extended Events”)

bigint_predicate_pushdownYou can clearly notice that we have a predicate well defined, it is written under the table of the Columnstore Table Scan – [Columnstore_Play].[dbo].[MaxDataTable].[c1]<CONVERT_IMPLICIT(bigint,[@1],0) . There is no residual predicate and the execution plan does not include any additional filters – everything is fine.

That’s fine, we have BigInt with us, but what about the others – numeric for example:
After analysing what’s inside the Segments I have created a query that have eliminated the 2nd and the 3rd Segments,

numeric_execution_plan_exampleIf you look at the execution plan, you will notice 2 important things – we are reading too many rows from the Columnstore Table (we are reading them all actually – all 2.5 Million of them) and the new iterator Filter which is actually doing all the work, filtering 69839 rows in my case.
numeric_predicate_pushdownIn case you are still wondering, there were no events of segment elimination in my Extended Events session which serve as an additional proof that the Numeric values are not supported for Segment Elimination.
This is an interesting situation, showing that the implementation of Segment Elimination for Numeric Data Type is probably on the way, because there is a support for minimum and maximum values in Segment information. I understand that it might be not the biggest priority for the development team because in the most cases the biggest is made with integer data types (they are occupying less space typically and they are used for determining the relationship between fact & dimensions in DWH), but I know that there are a lot of queries where filtering on some numerical information would be a great improvement – filtering out outliers in a IOT table for example 🙂

Now to the one of the most interesting & probably less useful Data Type – the bit. As you might remember, I have set the column C3 equals to 0 for the first Row Group while the second and the third ones include both 0 and 1 values.
Let’s run the query accessing the values of the first Row Group:

The actual execution plan includes the very same iterators, as the one with Integer Data Type, with number of values being passed from Columnstore Index Scan to Stream Aggregates being equal to the maximum number of rows in a Row Group – 1048576. Extended Events confirm what is also visible from the execution – that the Segment Elmination took place for this Data Type.

I will avoid going into details of each of the Data Type that I have tested, by simply listing the queries I have run and the respective results:

Here are the results of my investigation on SQL Server 2014:

Data Type Supports Min & Max Predicate Pushdown Segment Elimination 2012 Predicate Pushdown
bigint yes yes yes yes
numeric yes no no type is not supported for precision above 18
bit yes yes yes yes
smallint yes yes yes yes
decimal yes yes yes yes
smallmoney yes yes yes yes
int yes yes yes yes
tinyint yes yes yes yes
money yes yes yes yes
float yes yes yes yes
real yes yes yes yes
date yes yes yes yes
datetimeoffset yes no no type is not supported for precision above 2
datetime2 yes yes yes yes
smalldatetime yes yes yes yes
datetime yes yes yes yes
time yes yes yes yes
char yes no no no
varchar yes no no no
nchar yes no no no
nvarchar yes no no no
binary no no no type is not supported
varbinary no no no type is not supported
uniqueidentifier no no no type is not supported

From the table above we can see that should the Data Type not support the predicate pushdown operation, then obviously there is no Segment Elimination for the Columnstore Indexes and thus the overall performance of the queries should suffer significantly.
The difficulties with the Character Data Types are one of the areas where Columnstore Indexes could improve in the future, since SQL Server 2012 it is known that the performance with those types is a major problem.

Numeric Data Type appears to be one of the 2 major outsiders and as far as I am personally concerned – the biggest one. It should not be completely impossible to solve this one and I would hope that the next SQL Server version shall include support for it.

The surprising situation with DateTimeOffset Data Type is something that is quite unsurprising, since this is not one of the most popular & frequently used Data Types out there. I think that besides focusing on including some of the not supported Data Types such as CLR and (N)Varchar(max) the development team should focus on improving the performance of the existing data types.

to be continued with Clustered Columnstore Indexes – part 50 (“Columnstore IO”)

13 thoughts on “Clustered Columnstore Indexes – part 49 (“Data Types & Predicate Pushdown”)

  1. Vassilis Papadimos

    Greetings Niko — Another thorough writeup! One nit: The difference between c2 and c5 in your example is not about decimal vs. numeric (SQL rarely distinguishes between the two), but about declared precision, 18. Same deal with datetimeoffset, where the boundary precision is 2. (P.S. I work for Microsoft).

    1. Niko Neugebauer Post author

      Hi Vassilis,

      thank you very much, your help is truly appreciated!
      I have updated the table with your information – wish I had it before SQLBits, where I have presented an updated version of it (mentioning numeric & datetimeoffset precisions but not expanding on the c2 & c5 differences.)

      Best regards,
      Niko Neugebauer

      1. James Lupolt

        Hi Niko and Vassilis,

        So just to be clear — segment elimination for decimal & numeric types with precision over 18 is currently not supported in SQL 2014 (as of the current release, SP1)?


        1. Niko Neugebauer Post author

          Hi James,

          to my understandings, there is no support for Segment Elimination in SQL Server 2014 at all.
          It all starts with no support for predicate pushdown (is it coming for SQL Server 2016? :)) for numeric data types, and it does not really matter what precision do you use.

          Best regards,
          Niko Neugebauer

    1. Niko Neugebauer Post author

      Hi Mato,

      Segment elimination happens when we avoid reading Columnstore Segments and Row Groups, while predicate pushdown is a way of doing data processing at the deep level of the storage engine, without pushing the information into the memory and processing it later. For aggregation or filtering this does miracles.

      Best regards,
      Niko Neugebauer

  2. Alexander Sharovarov

    Hi Niko,

    Thank you for your great blog post series. I learned a lot from it. We are using Columnstore index for one of our fact tables and I noticed that Aggregate pushdown doesn’t work for float/real data types even though they fit in 8bytes (64bit) size limit. We are running Microsoft SQL Server 2016 (SP1-CU3).

    I wonder if I miss something or it is a big. Have you experienced anything like that?

    Thank you in advance,

    1. Niko Neugebauer Post author

      Hi Alexander,

      thank you for the kind words.
      You have written that the data types FIT into 8 bytes Float/Real, but are they declared as such ?
      Can you please share the column definition ?

      Best regards,

        1. Niko Neugebauer Post author

          Hi Alexander,

          interesting, I have confirmed that Float & Real are not supported. I thought that there were just other prohibitive limitations for them, but they are a no-go for way too many situations.
          I expect the docs to be updated soon.

          Regarding Numeric data type, only when it hits 19 precision, will it occupy 9 bytes – the docs are borked again, but before it was well written there, I believe. I have been showing it in my workshops for quite some time.

          Best regards,

Leave a Reply

Your email address will not be published. Required fields are marked *