This is a continuation of the previous 131 parts; the whole series can be found at https://www.nikoport.com/columnstore/.
After spending some time thinking about the best way to come back to writing about Columnstore Indexes after a five-and-a-half-year hiatus, I came to the conclusion that I have never published a post on what is still missing. With that in mind, I decided to mark my comeback to writing technical posts on my blog with a rather simple post on the things that are needed, but have not made it into the SQL Server-based engines so far (as of December 2025).
Over the last decade, all of these items were shared with Microsoft, but they were never prioritized by the engineering group.
I want to stress and underline my huge respect for the engineers working on the Columnstore Indexes (from time to time, haha), who are giving it their best.
With that, here is my current top Columnstore Wish List, which I decided to keep at the magical number of 7 items:
1. Online Resumable Build/Rebuild of the Columnstore Indexes
2. Persisted Computed Columns
3. Row Group – level aggregation
4. Configurable sizes for the Row Groups
5. Fast Copy of Columnstore Data
6. Faster Joins between Columnstore Tables
7. Big Dictionaries
2026 Columnstore Wish List
1. Online Resumable Build/Rebuild of the Columnstore Indexes
This is such a no-brainer from the customer's point of view. Tables with Columnstore Indexes are typically the bigger ones (for Data Warehousing, typically the Fact tables), and their index operations are longer and more resource-consuming.
Microsoft added resumable online index rebuild in SQL Server 2017 and resumable online index creation in SQL Server 2019, but never came back to finish the job for Columnstore Indexes.
The idea behind online resumable indexes is to be able to pause and later resume the operation whenever needed. This would be such a huge benefit for those who run massive Columnstore tables.
I understand some of the technical challenges for the different isolation levels, the synchronization of the Delta-Stores, Dictionaries, Mapping Indexes, etc. – but the need is still there. I know this is not an AI feature, but even AI will eventually take advantage of it. :)
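To make the wish concrete, here is a minimal sketch (all object names are made up): the resumable syntax that already works for Rowstore indexes today, followed by the same options that I wish would be accepted for a Clustered Columnstore Index.

```sql
-- Works today on a Rowstore index (rebuild since SQL Server 2017, create since 2019):
ALTER INDEX IX_FactSales_Date ON dbo.FactSales
    REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 240 MINUTES);

-- Suspend during business hours, pick the operation up again at night:
ALTER INDEX IX_FactSales_Date ON dbo.FactSales PAUSE;
ALTER INDEX IX_FactSales_Date ON dbo.FactSales RESUME;

-- The wish: the very same statement against a Clustered Columnstore Index.
-- Today, RESUMABLE = ON is rejected for Columnstore Indexes.
ALTER INDEX CCI_FactSales ON dbo.FactSales
    REBUILD WITH (ONLINE = ON, RESUMABLE = ON);
```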
2. Persisted Computed Columns
You might not be aware of it, but in the original SQL Server 2012 release it was possible to create a Nonclustered Columnstore Index over a persisted computed column. That capability has not been supported in any of the 6 subsequent releases. This is wrong.
I know that not every single DWH needs this, and that one can always maintain such a column by themselves, but I think that a modern engine needs to include support for Computed Columns, because this is a chicken-and-egg kind of problem. The Product Manager in me argues that this feature is not of the highest priority, but the customer who has a DWH with complex financial computations needs it.
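To illustrate the pattern customers keep asking for, here is a minimal sketch (all object names are made up):

```sql
CREATE TABLE dbo.FactFinance
(
    OrderId   bigint NOT NULL,
    Quantity  int    NOT NULL,
    UnitPrice money  NOT NULL,
    -- the pre-calculated financial value that should be stored and scanned columnar:
    LineTotal AS (Quantity * UnitPrice) PERSISTED
);

-- The wish: this should simply work and cover LineTotal as well.
-- Today the engine rejects this combination (the exact behavior varies
-- by version and by index type).
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactFinance ON dbo.FactFinance;
```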
This feature is interconnected with the next one, which I originally asked the Product Team for back in 2016 – the Row Group-level aggregation.
3. Row Group – level aggregation
A rather primitive but extremely powerful feature for the massive Columnstore Indexes – allow aggregation calculations on the level of the Row Groups. Let’s start treating the Row Groups as what they are – horizontal partitions. By adding the possibility of pre-calculating the outcome of an operation, we can avoid scanning the Row Group at all and execute the operation on the level of the meta-data instead. You might argue that for smaller Columnstore Indexes the benefit will be smaller, but I will argue that on any slow IO system (such as the current state of the Azure SQL PaaS / Fabric offerings), the benefit might actually be huge.
I do understand that updates will fragment these aggregates and impact the performance of the operation, but in a lot of scenarios the updates are rather negligible, impacting just a couple of Row Groups while hundreds of others stay intact and get updated only as part of the Rebuild process.
Building this feature would unlock massive improvements for complex calculations, and allowing customers to join Persisted Computed Columns into the mix would multiply the performance gains.
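There is already a precedent inside the engine: every Segment carries min and max metadata that powers Segment Elimination, and the wish is essentially to extend that idea to real aggregates per Row Group. Here is how you can inspect the existing per-Segment metadata today (dbo.FactSales is a placeholder name):

```sql
-- Per-Segment min/max metadata that already enables metadata-only Segment Elimination
SELECT  p.object_id,
        s.column_id,
        s.segment_id,
        s.row_count,
        s.min_data_id,   -- encoded minimum value within the Segment
        s.max_data_id    -- encoded maximum value within the Segment
FROM sys.column_store_segments AS s
INNER JOIN sys.partitions AS p
    ON s.partition_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.FactSales');
```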
4. Configurable sizes for the Row Groups
Since the beginning of the existence of the SQL Server Analysis Services Tabular engine, we have had the opportunity to configure the sizes of the Row Groups, and a decade ago I spent enough time arguing that the SQL Server engine needs the same. This is quite similar to the ordered Columnstore Indexes functionality (added in SQL Server 2022 and improved in SQL Server 2025) – customers need to be able to go beyond the defaults. “1,048,576 rows per Row Group is not enough for everyone” – signed by a huge number of customers with billions of rows in a single Columnstore Index. I want to be able to go for 2 / 4 / 8 / 16 / 32 million rows per Row Group. I totally understand that the amount of available memory (on the instance or for the query processing) will affect the outcome, but those of us with TBs of memory could get a huge improvement in query processing. Why? The fewer meta-data structures (Deleted Bitmaps, Dictionaries, etc.) there are out there, the faster their loading and hence the query execution will be. Allowing larger Row Groups will be a bigger advantage for the huge tables. My additional argument is that the Columnstore Archival option sometimes compresses the Row Groups into such tiny little pieces that the default size limit becomes simply irrelevant.
With the continued existence of the Memory Pressure & Dictionary Pressure, we can use them to control the size of the Row Groups (by using Resource Governor or by adding large useless strings :D), but that is neither an elegant nor a practical solution.
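For completeness, here is a sketch of that inelegant Resource Governor workaround (all names are made up, the percentages need tuning per workload, and note that it can only make Row Groups smaller – there is no workaround at all for making them bigger):

```sql
-- Run in master: starve index builds of memory so the engine cuts Row Groups smaller
CREATE RESOURCE POOL pool_SmallGrants
    WITH (MAX_MEMORY_PERCENT = 10);
CREATE WORKLOAD GROUP wg_SmallRowGroups
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 5)
    USING pool_SmallGrants;
GO
-- Classifier routing a dedicated ETL login into the constrained workload group
CREATE FUNCTION dbo.fn_rg_classifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE WHEN SUSER_SNAME() = N'etl_small_rg'
                THEN N'wg_SmallRowGroups' END;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```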
5. Fast Copy of Columnstore Data
This is an item I know Thomas Grohser will agree with – we have discussed it quite a number of times over the years. When we copy data from one Columnstore Index table into another (ETL, data replication needs, etc.), a huge waste of resources takes place: SQL Server decompresses the data only to start compressing it again, passing the data unnecessarily twice – from the Columnstore Object Pool into the Buffer Pool and then back into the Columnstore Object Pool – spending CPU cycles and memory while doing largely nothing.
Here is the basic idea behind it: copy the relevant Segments into the new Columnstore Index without decompression, mark the non-copied rows as removed in the Deleted Bitmap, copy the meta-data such as the Dictionaries, and voilà! No need to overspend the CPU, a tiny impact on memory and IO – and a lot of happy customers!
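A purely hypothetical syntax sketch of what such a fast copy could look like (nothing like this exists today; all names and keywords are invented):

```sql
-- Hypothetical: move the compressed Segments and Dictionaries as-is,
-- with no decompress/recompress round-trip through the Buffer Pool.
ALTER TABLE dbo.FactSales_Archive
    COPY COLUMNSTORE DATA FROM dbo.FactSales
    WHERE SalesDate < '2020-01-01';
-- Rows not matching the predicate would simply be flagged as removed
-- in the Deleted Bitmap of the destination.
```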
Ok, now for the technical part that really matters, in case a curious SQL Server engineering team member is reading this blog: add an additional structure masquerading the moved object_ids and all the other references for the time being, until the next Rebuild!
Think about the same opportunity on the Data Lake (think OneLake in Fabric) – this is precisely what customers need!
This feature, if ever implemented, would make a huge impact on so many ETL processes. Many years ago, before joining Microsoft, I was asked by an engineering manager to explain whether it is common for customers to copy data from one table into another. I remember saying yes and looking carefully around to make sure there was no candid camera.
6. Faster Joins between Columnstore Tables
Columnstore Indexes are amazing at compressing information, and Late Materialization is amazing for data processing performance. This is great when we read data from a single table or when the join operation is done on non-impacted columns, which is … not always the case. :)
Most of the time, I have seen customers choosing not to put Columnstore Indexes on the Dimension tables, because there are no explicit performance gains, especially when the Dimension table is small. If you look at the typical queries and their execution plans, you will find, a lot of the time, that there is a negative impact from joining a Rowstore Dimension table with a Columnstore Fact table. A lot of CPU and memory gets spent on early materialization, without any considerable benefit.
Now, what if we had a mechanism for sharing a dictionary between multiple tables with Columnstore Indexes? A lot of the join work could be simplified and executed on the level of the Dictionaries instead of the Segments and Row Groups. Even for the simplest Star Schema, where a typical query touches maybe 1 Fact and 4-5 Dimension tables, there could be a huge benefit for the joins. Think of it as something as simple as a Conformed Dictionary (akin to a Conformed Dimension).
By allowing tables to use a common dictionary, the process of joining the tables would not need to materialize the data before the join, but could rather join the data by using the dictionary references.
This could be enforced by a Foreign Key plus some additional T-SQL syntax, forcing the interconnection between the Columnstore columns. It might bring a negative impact on the compression level, but imagine indicating Parent & Child (small and big) tables, thus focusing the improvements on the Fact (bigger) table.
Naturally, the Columnstore Index rebuild process on both tables would have its impacts and implications – it would take longer and might produce bigger tables, but the reading and processing of such joined tables would be done on the fly!
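A purely hypothetical syntax sketch of such a Conformed Dictionary (no such feature or keyword exists today; all names are invented):

```sql
-- Hypothetical: let the big Fact table reuse the dictionary of the small Dimension,
-- so that joins on CustomerKey can compare dictionary ids directly,
-- skipping the early materialization of the Dimension side.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales
    WITH (SHARED_DICTIONARY = (COLUMN = CustomerKey,
                               SOURCE = dbo.DimCustomer(CustomerKey)));
```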
7. Big Dictionaries
I was rather hopeful that SQL Server 2025 would address the Dictionary Pressure, and after trying out SQL Server 2025 RTM (1700.0.1000.7) I was disappointed. A lot of the words in this blog post have been spent on the need for controlling the size of the Row Groups, and with Dictionary Pressure there is no escape. One could wish to avoid using strings in the Fact tables altogether, but unless you are an NVTP (Not-Very-Technical-Person), you know what reality looks like.
There are some great improvements for strings in SQL Server 2022 and SQL Server 2025, but this one – the maximum size of the dictionary – has stayed the same since SQL Server 2012. I have not tested Azure SQL Database or Azure SQL Managed Instance, but if something is different out there, I am sure somebody will correct me very publicly. :)
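If you want to see how close your own dictionaries are to that hard cap (16 MB since SQL Server 2012), here is a simple check (dbo.FactSales is a placeholder name):

```sql
-- Dictionary sizes vs. the 16 MB limit that triggers Dictionary Pressure
SELECT  OBJECT_NAME(p.object_id)  AS table_name,
        d.column_id,
        d.dictionary_id,
        d.entry_count,
        d.on_disk_size,           -- in bytes
        d.on_disk_size * 100.0 / (16 * 1024 * 1024) AS pct_of_limit
FROM sys.column_store_dictionaries AS d
INNER JOIN sys.partitions AS p
    ON d.partition_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.FactSales')
ORDER BY d.on_disk_size DESC;
```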
This is not a complete list of my wishes, but it is good enough for me to get back to blogging.
to be continued …