Columnstore Indexes – part 86 (“New Trace Flags in SQL Server 2016”)

Continuation from the previous 85 parts, the whole series can be found at

With the release of SQL Server 2016, a great number of new functionalities with specific internal structures and own behaviours were created (Columnstore Indexes – part 60 (“3 More Batch Mode Improvements in SQL Server 2016”), Columnstore Indexes – part 60 (“3 More Batch Mode Improvements in SQL Server 2016”), Columnstore Indexes – part 65 (“Clustered Columnstore Improvements in SQL Server 2016”), Columnstore Indexes – part 72 (“InMemory Operational Analytics”)) – so that there is a customer need to execute a good control under some certain circumstances.

2 years ago I have blogged about available Trace Flags & Query Rules for the Columnstore Indexes in SQL Server 2014, and now it is the time to revisit this topic and show what new Trace Flags are available in SQL Server 2016, and this article focuses on this topic.

Batch Sort

The possibility of having your data sorted with the help of the Batch Execution Mode was introduced in SQL Server and some of the key improvements for the SQL Server 2016, such as Window Functions depend on the velocity of execution of the Sort iterator
The vast majority of the time, it works fine, but some rare times it can bring you into some serious trouble.
For that purpose, the documented (I call it documented, since it is mentioned in the KB 3172787) Trace Flag was introduced to allow to disable the Batch Execution Mode for the iterator.

This trace flag works as a Configuration Parameter, as a Global Trace Flag, Session Trace Flag or even QueryTraceOn option).

I will use the free ContosoRetailDW database (it is so easy to play) and run the standard script for restoring a copy of it from C:\Install\, upgrading it’s compatibility level to 130 (batch mode improvements require it) and then dropping the primary clustered key from the FactOnlineSales table and create a Clustered Columnstore Index on it:

Let’s run the following test query against our Clustered Columnstore table, it should give us in the execution plan a Sort Iterator which will function in the Batch Execution Mode:

Sort Iterator in Batch Mode
The 3rd from the left, the Sort Iterator is there, running in the Batch Execution Mode and occupying estimated 34% of the resources. It took just 171ms on my virtual machine.

Now, let’s us enable the Trace Flag 9347 and see if it shall bring the expected impact to the Batch Sort iterator (notice that if you are running on the RTM version of SQL Server, you will face a bug that was fixed in the Cumulative Update 1, which is already available):

Sort Iterator in Row Mode with Trace Flag 9347
Occupying estimated 84%! of the resources, the very same query runs over 15 times slower – 2731 ms, all because of the Sort Iterator being executed in the Row Execution Mode. What makes the performance suffer more is that the Sort Iterator does not get enough memory through the memory grant and that is why it is spilling on the disk.

Top N Sort

The Top N Sort iterator does sort like the above mentioned Sort iterator, with the difference that it passes only top N rows after sorting. There are different optimisations to the used algorithm that make this iterator functioning differently from the Sort iterator.
Given it is a different functionality and a different iterator, SQL Server 2016 has a separate Trace Flag that will allow you to disable the Batch Mode on it – the Trace Flag 9349. It is important to have it separate from the complete Sort Iterator, since a number of times one can catch both operators and having a possibility to shut down just a Top N Sort can be advantageous.

Let’s run a SELECT TOP X query, selecting 100.000 rows from the results of our previous query:

Top Sort Iterator in Batch Mode
As you can see on the image above, the execution plan presents TOP N Sort Iterator functioning in the Batch Execution Mode, taking 101ms in total execution time on my VM, while occupying estimated 53% of the execution plan resources.

Let’s execute the same query this time with the help of the Trace Flag 9349, thus preventing the TOP N Sort Iterator from running in the Batch Execution Mode:

Top Sort Iterator in Row Mode with Trace Flag 9349
This time the query took good 1820ms, over 18 times slower then the query with Sort Iterator using the Batch Execution Mode. Even the estimated resource consumption jumped up to 92%, but from the other side the amount of memory that was granted for the second query is around 38MB vs 72MB that was granted for the query with the Batch Execution Mode. When running an OLTP that is using a huge number of parallel connections, the amount of used memory can be the most important resource and keeping it down to an accepted level can become the paramount of keeping the system alive.

Sort in a Complex Parallel Query

Sometimes there are complex queries that suffers from the unnecessary Batch Execution Mode for Sort Iterators and since Cumulative Update 1 for SQL Server 2016, there is a new trace flag – 9358 for disabling it.

I could not easily produce with a sample Database such an example query, but I hope to get back to this blog post one day and provide a practical example.
An important additional part to consider is that according to the official documentation, the good old Trace Flag 4199 will produce the same effect, meaning that if it is active – you will see some of your Sort Iterators running in the Row Execution Mode.

Dynamic memory grant for batch mode operators

Starting with SQL Server 2016, if you have enough RAM and suffering from the TempDB Spills that do have a significant impact on your workload, then you can enable the Trace Flag 9389 that will enable Batch Mode Iterators to request additional memory for the work and thus avoiding producing additional unnecessary I/O.

I am glad that Microsoft has created this functionality and especially that at the current release, it is hidden behind this track, and so Microsoft can learn from the applications before enabling it by default, hopefully in the next major release of SQL Server.

Merge & recompress during columnstore index reorganisation

Some of the most incredibly useful feature in SQL Server 2016 was removal, recompression and merge of the fragmented Row Groups (containing deleted data in the Deleted Bitmap) that I have already described in details in Columnstore Indexes – part 74 (“Row Group Merging & Cleanup, SQL Server 2016 edition”).
In SQL Server 2016 we also have a special knob for disabling this functionality and it is a Trace Flag 10204. In other words, your SQL Server 2016 database with compatibility level 130 will become a kind of SQL Server 2014, where Row Groups maintenance was non-functioning with ALTER INDEX … REORGANIZE command.

Disk-Based Clustered Columnstore

Nothing beats a practical example that is also presentable in a blog post 🙂 and so here is the code to create an exact copy of a FactOnlineSales with a Clustered Columnstore Index, which will serve us for testing the ALTER INDEX … REORGANIZE command:

The next step is to install the CISL – Columnstore Indexes Script Library, which will provide us with the insights on the internal structures of our table:

Data Loaded into CCI Table
Everything is fine as you can see. We have a copy of our data with the Clustered Columnstore Index.

Let’s enable the Trace Flag 10204 (it does not make a lot of sense to do it on the session level of course, since the automated Tuple Mover runs in the background, unless disabled with the other Trace Flag 634):

We can now easily advance with our delete process, and so the next statement shall delete the first 2 million rows ordered by the column OnlineSalesKey:

Let’s consult the internal structures of our FactOnlineSales_CCI table:

You can see on the image below, that we have 2 million rows marked as deleted in the Deleted Bitmap and that the very first Row Group is completely deleted:
2 Million Deleted Rows with Trace Flag 10204

All we need to do now, is to invoke the Tuple Mover with the ALTER INDEX … REORGANIZE statement, so that we can see the removal of the first RowGroup and Auto-Merge of the second Row Group:

The statement does not even take 1 millisecond on my VM .. Because it does nothing. The activated Trace Flag goes deeper than the corresponding Extended Event columnstore_no_rowgroup_qualified_for_merge – which registers nothing at this point. Troubleshooting the issue with another potentially helpful Extended Event columnstore_rowgroup_merge_failed brings nothing as well. The trace flag simply shuts it down without a trace (pun intended).

Running the CISL functions will deliver the same result as we have had above, the ALTER INDEX … REORGANIZE falls back into SQL Server 2014 mode, where it takes care only of the open Delta-Store by default.

2 Million Deleted Rows with Trace Flag 10204
I guess that this functionality will drive people completely mad, if they know nothing about this trace flag, but hey, if you are using CISL from the version 1.3.1 and later, your cstore_GetSQLInfo.sql will provide you with the correct output (the current file at GitHub gives you this functionality already):


The Nonclustered Columnstore

We shall make now the same test we did for the Clustered Columnstore, but this time we shall do it for the Nonclustered Columnstore Index, which is also disk-based, as the previous one:

Let’s consult the internal structures:

Data Loaded into NCCI Table

Everything is under control here, let’s rebuild the Index 2 times (data will be first moved from the Deleted Buffer to the deleted Bitmap in the first step)

Let’s consult the internal structures again:

2 Million Rows Deleted and Merged from Deleted Buffer
Same as for the Clustered Columnstore Index, the data will not be migrated or merged between the Row Groups, because of the active trace flag 10204.

Memory-Optimized Columnstore

But let’s take it one step further, and see if this trace flag affects the Memory-Optimised Tables with Clustered Columnstore Index as well:

In order to move the data from the Tail Row Group, I will use the newest addition for the CISL library, the cstore_doMaintenance function:

Let’s see the current internals of our Memory-Optimised table:

memory-optimized columnstore with 5 million rows
Looks fine, as expected.

Let’s delete 2 Million Rows from our Memory-Optimised Table:

Since the process of moving the data from the Tail Row Group does not guarantee the sequential of the data, let’s delete 2.8 million rows, leaving only 200.000 rows active within our table:

Now, if we ran the Reorganize command on the Memory-Optimised table:

or you can do like me and simply invoke the CISL cstore_doMaintance function 🙂

The miracle happens, and you can see below that the internal structure of our Row Groups is very well affected by the Row Migration:

200 000 Rows in Memory-Optimized Columnstore Table
Surprise! On the image above, we have a Tail Row Group (row_group_id = -1), which is a kind of a Delta-Store, containing the surviving 200.000 rows, and all completely deleted Row Groups are simply removed from the internal structures.

Why did this happen ?
1. There is no REBUILD functionality for the Memory-Optimized Columnstore Tables yet, and so there must be a way to keep the table alive.
2. The Row Migration process for the Memory-Optimized Columnstore Tables moves rows from Tail Row Group into the compressed Row Groups and back, it does function in a different way than a disk-based columnstore and with no REBUILD possibility, this is the only way of cleaning up the rows from the Memory-Optimized Columnstore table (besides dropping and recreating the Columnstore Index, of course).

I am happy to see that the Trace Flag 10204 does not function for the Memory-Optimized Columnstore tables, because technically for the current version of SQL Server it would make no sense at all.

Old Trace Flags & Query Rules

The Trace Flags for SQL Server 2014 were described in Columnstore Indexes – part 35 (“Trace Flags & Query Optimiser Rules”) are still active in SQL Server 2016, I will not provide the tests here, feel free to simply re-run the queries that are specified in that post.

Final Thoughts

These Trace Flags presented in this article are some really great knobs, that Microsoft has inserted into the SQL Server 2016 (some of them are available only from SQL Server 2016 CU1), allowing you to expand the potential performance of your Columnstore Batch Execution Mode queries (Dynamic memory grant with TF 9389), and preventing the Sort operations from running in the Batch Execution Mode (could be important for overloaded OLTP systems).
The possibility of disabling Row Group Merge & Self-Compress should target the extreme OLTP System (Trace Flag 10204) is also nice, but I really hope that no unexperienced engineer will start applying this trace flag everywhere by default – though I will be updating the CISL to recognise and advise the user of this configuration.

to be continued …

6 thoughts on “Columnstore Indexes – part 86 (“New Trace Flags in SQL Server 2016”)

  1. tobi

    > Note The batch mode sort operations in a complex parallel query are also disabled when trace flag 4199 is enabled.

    Any idea why this was done? 4199 is supposed to be the trace flag that makes plans better, not worse.

    1. Geoff Patterson

      That does seem very strange to include 9358 under 4199. So basically if we are using columnstore, we need to keep all new QUERY_OPTIMIZER_HOTFIXES off in order to achieve the full benefit of batch mode? Weird. But at least it is nice that the earlier optimizer hotfixes are including in SQL 2016 by default, so we will not be using 4199 unless we find a critical real-world problem that it fixes.

      I am very excited about 9389 (“dynamic memory grant for batch mode operators”). I have tested it out on some real-world queries that were spilling and it worked very well (removed spills, significantly reduced elapsed time). From Microsoft’s perspective, I think the main concern with enabling by default is that a dynamic increase of memory could negatively impact throughput for concurrent requests under some workloads. So it seems likely to stay a trace flag unless there is enough community feedback to merit reconsidering that decision.

      1. Niko Neugebauer Post author

        Hi Geoff,

        totally agree with you – 9358 inclusion under 4199 is strange. I struggle about the reasons, let’s hope some of my Microsoft contacts can clear it up. If I find out something that is possible to share – I will do that here in this post.
        Regarding 9389 – this one is absolutely epic and hopefully if the feedback will be positive, Microsoft will consider putting it into the RTM of the vNext.

        Best regards,
        Niko Neugebauer

        1. Jakub Kaczanowski

          Will leaving 4199 off have any effect though? Won’t 9358 be turned on by default in 2016 Compatability Mode 130 in CU1? My understanding is that 130 automatically turns on 4199 (and therefore 9358).

          I too find this disturbing………. (same as your findings re: performance regression your part 85 post)

          1. Niko Neugebauer Post author

            Hi Jakub,

            I need to test it, but to my current understanding – only the 4199 fixes before 2016 RTM are included in the 130 compatibility level, the TF 9358 is not part of the TF4199 before SQL Server 2016 and so it is not included.

            Best regards,
            Niko Neugebauer

Leave a Reply

Your email address will not be published. Required fields are marked *