This is a continuation of the previous 131 parts; the whole series can be found at https://www.nikoport.com/columnstore/.
After spending some time thinking about the best way to come back to writing about Columnstore Indexes after a five-and-a-half-year hiatus, I came to the conclusion that I have never published a post on what is still missing. With that in mind, I decided to mark my comeback to writing technical posts on my blog with a rather simple post on the things that are needed, but have not made it into the SQL Server-based engines so far (as of December 2025).
Over the last decade, all of these items were shared with Microsoft, but they were never prioritized by the engineering group.
I want to stress and underline my huge respect for the engineers working on the Columnstore Indexes (from time to time, haha), who are giving it their best.
With that, here is my current top Columnstore Wish List, which I decided to keep at the magical number of 7 items:
1. Online Resumable Build/Rebuild of the Columnstore Indexes
2. Persisted Computed Columns
3. Row Group – level aggregation
4. Configurable sizes for the Row Groups
5. Fast Copy of Columnstore Data
6. Faster Joins between Columnstore Tables
7. Big Dictionaries
2026 Columnstore Wish List
1. Online Resumable Build/Rebuild of the Columnstore Indexes
This is such a no-brainer from the customer's point of view. Tables with Columnstore Indexes are typically the bigger ones (for Data Warehousing, typically the Fact tables), and their index operations are longer and more resource-consuming.
Microsoft added resumable online index rebuild in SQL Server 2017 and resumable online index creation in SQL Server 2019, but never came back to finish the job for Columnstore Indexes.
The idea behind online resumable indexes is to be able to pause and later resume the operation whenever needed. This would be such a huge benefit for those who run massive Columnstore tables.
I understand some of the technical challenges for the different isolation levels, the synchronization of the Delta-Stores, Dictionaries, Mapping Indexes, etc. – but the need is still there. I know this is not an AI feature, but even AI will eventually take advantage of it. :)
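To make the wish concrete, here is a minimal sketch (all object names are made up): the resumable syntax that already works for Rowstore indexes today, followed by the same options that I wish would be accepted for a Clustered Columnstore Index.

```sql
-- Works today on a Rowstore index (rebuild since SQL Server 2017, create since 2019):
ALTER INDEX IX_FactSales_Date ON dbo.FactSales
    REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 240 MINUTES);

-- Suspend during business hours, pick the operation up again at night:
ALTER INDEX IX_FactSales_Date ON dbo.FactSales PAUSE;
ALTER INDEX IX_FactSales_Date ON dbo.FactSales RESUME;

-- The wish: the very same statement against a Clustered Columnstore Index.
-- Today, RESUMABLE = ON is rejected for Columnstore Indexes.
ALTER INDEX CCI_FactSales ON dbo.FactSales
    REBUILD WITH (ONLINE = ON, RESUMABLE = ON);
```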
2. Persisted Computed Columns
You might not be aware of it, but in the original SQL Server 2012 release it was possible to create a Nonclustered Columnstore Index over a persisted computed column. That capability has not been supported in any of the 6 subsequent releases. This is wrong.
I know that not every single DWH needs this, and that one can always maintain such a column by themselves, but I think that a modern engine needs to include support for Computed Columns, because this is a chicken-and-egg kind of problem. The Product Manager in me argues that this feature is not of the highest priority, but the customer who has a DWH with complex financial computations needs it.
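To illustrate the pattern customers keep asking for, here is a minimal sketch (all object names are made up):

```sql
CREATE TABLE dbo.FactFinance
(
    OrderId   bigint NOT NULL,
    Quantity  int    NOT NULL,
    UnitPrice money  NOT NULL,
    -- the pre-calculated financial value that should be stored and scanned columnar:
    LineTotal AS (Quantity * UnitPrice) PERSISTED
);

-- The wish: this should simply work and cover LineTotal as well.
-- Today the engine rejects this combination (the exact behavior varies
-- by version and by index type).
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactFinance ON dbo.FactFinance;
```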
This feature is interconnected with the next one, which I originally asked the Product Team for back in 2016 – the Row Group-level aggregation.
3. Row Group – level aggregation
A rather primitive but extremely powerful feature for the massive Columnstore Indexes – allow aggregation calculations on the level of the Row Groups. Let’s start treating the Row Groups as what they are – horizontal partitions. By adding the possibility of pre-calculating the outcome of an operation, we can avoid scanning the Row Group at all and execute the operation on the level of the meta-data instead. You might argue that for smaller Columnstore Indexes the benefit will be smaller, but I will argue that on any slow IO system (such as the current state of the Azure SQL PaaS / Fabric offerings), the benefit might actually be huge.
I do understand that updates will fragment these aggregates and impact the performance of the operation, but in a lot of scenarios the updates are rather negligible, impacting just a couple of Row Groups while hundreds of others stay intact and get updated only as part of the Rebuild process.
Building this feature would unlock massive improvements for complex calculations, and allowing customers to join Persisted Computed Columns into the mix would multiply the performance gains.
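There is already a precedent inside the engine: every Segment carries min and max metadata that powers Segment Elimination, and the wish is essentially to extend that idea to real aggregates per Row Group. Here is how you can inspect the existing per-Segment metadata today (dbo.FactSales is a placeholder name):

```sql
-- Per-Segment min/max metadata that already enables metadata-only Segment Elimination
SELECT  p.object_id,
        s.column_id,
        s.segment_id,
        s.row_count,
        s.min_data_id,   -- encoded minimum value within the Segment
        s.max_data_id    -- encoded maximum value within the Segment
FROM sys.column_store_segments AS s
INNER JOIN sys.partitions AS p
    ON s.partition_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.FactSales');
```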
4. Configurable sizes for the Row Groups
Since the beginning of the existence of the SQL Server Analysis Services Tabular engine, we have had the opportunity to configure the sizes of the Row Groups, and a decade ago I spent enough time arguing that the SQL Server engine needs the same. This is quite similar to the ordered Columnstore Indexes functionality (added in SQL Server 2022 and improved in SQL Server 2025) – customers need to be able to go beyond the defaults. “1,048,576 rows per Row Group is not enough for everyone” – signed by a huge number of customers with billions of rows in a single Columnstore Index. I want to be able to go for 2 / 4 / 8 / 16 / 32 million rows per Row Group. I totally understand that the amount of available memory (on the instance or for the query processing) will affect the outcome, but those of us with TBs of memory could get a huge improvement in query processing. Why? The fewer meta-data structures (Deleted Bitmaps, Dictionaries, etc.) there are out there, the faster their loading and hence the query execution will be. Allowing larger Row Groups will be a bigger advantage for the huge tables. My additional argument is that the Columnstore Archival option sometimes compresses the Row Groups into such tiny little pieces that the default size limit becomes simply irrelevant.
With the continued existence of the Memory Pressure & Dictionary Pressure, we can use them to control the size of the Row Groups (by using Resource Governor or by adding large useless strings :D), but that is neither an elegant nor a practical solution.
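For completeness, here is a sketch of that inelegant Resource Governor workaround (all names are made up, the percentages need tuning per workload, and note that it can only make Row Groups smaller – there is no workaround at all for making them bigger):

```sql
-- Run in master: starve index builds of memory so the engine cuts Row Groups smaller
CREATE RESOURCE POOL pool_SmallGrants
    WITH (MAX_MEMORY_PERCENT = 10);
CREATE WORKLOAD GROUP wg_SmallRowGroups
    WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 5)
    USING pool_SmallGrants;
GO
-- Classifier routing a dedicated ETL login into the constrained workload group
CREATE FUNCTION dbo.fn_rg_classifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    RETURN CASE WHEN SUSER_SNAME() = N'etl_small_rg'
                THEN N'wg_SmallRowGroups' END;
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.fn_rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;
```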
5. Fast Copy of Columnstore Data
This is an item I know Thomas Grohser will agree with – we have discussed it quite a number of times over the years. When we copy data from one Columnstore Index table into another (ETL, data replication needs, etc.), a huge waste of resources takes place: SQL Server decompresses the data only to start compressing it again, passing the data unnecessarily twice – from the Columnstore Object Pool into the Buffer Pool and then back into the Columnstore Object Pool – spending CPU cycles and memory while doing largely nothing.
Here is the basic idea behind it: copy the relevant Segments into the new Columnstore Index without decompression, mark the non-copied rows as removed in the Deleted Bitmap, copy the meta-data such as the Dictionaries, and voilà! No need to overspend the CPU, a tiny impact on memory and IO – and a lot of happy customers!
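A purely hypothetical syntax sketch of what such a fast copy could look like (nothing like this exists today; all names and keywords are invented):

```sql
-- Hypothetical: move the compressed Segments and Dictionaries as-is,
-- with no decompress/recompress round-trip through the Buffer Pool.
ALTER TABLE dbo.FactSales_Archive
    COPY COLUMNSTORE DATA FROM dbo.FactSales
    WHERE SalesDate < '2020-01-01';
-- Rows not matching the predicate would simply be flagged as removed
-- in the Deleted Bitmap of the destination.
```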
Ok, now for the technical part that really matters, in case a curious SQL Server engineering team member is reading this blog: add an additional structure masquerading the moved object_ids and all the other references for the time being, until the next Rebuild!
Think about the same opportunity on the Data Lake (think OneLake in Fabric) – this is precisely what customers need!
This feature, if ever implemented, would make a huge impact on so many ETL processes. Many years ago, before joining Microsoft, I was asked by an engineering manager to explain whether it is common for customers to copy data from one table into another. I remember saying yes and looking carefully around to make sure there was no candid camera.
6. Faster Joins between Columnstore Tables
Columnstore Indexes are amazing at compressing information, and Late Materialization is amazing for data processing performance. This is great when we read data from a single table or when the join operation is done on non-impacted columns, which is … not always the case. :)
Most of the time, I have seen customers choosing not to put Columnstore Indexes on the Dimension tables, because there are no explicit performance gains, especially when the Dimension table is small. If you look at the typical queries and their execution plans, you will find, a lot of the time, that there is a negative impact from joining a Rowstore Dimension table with a Columnstore Fact table. A lot of CPU and memory gets spent on early materialization, without any considerable benefit.
Now, what if we had a mechanism for sharing a dictionary between multiple tables with Columnstore Indexes? A lot of the join work could be simplified and executed on the level of the Dictionaries instead of the Segments and Row Groups. Even for the simplest Star Schema, where a typical query touches maybe 1 Fact and 4-5 Dimension tables, there could be a huge benefit for the joins. Think of it as something as simple as a Conformed Dictionary (akin to a Conformed Dimension).
By allowing tables to use a common dictionary, the process of joining the tables would not need to materialize the data before the join, but could rather join the data by using the dictionary references.
This could be enforced by a Foreign Key plus some additional T-SQL syntax, forcing the interconnection between the Columnstore columns. It might bring a negative impact on the compression level, but imagine indicating Parent & Child (small and big) tables, thus focusing the improvements on the Fact (bigger) table.
Naturally, the Columnstore Index rebuild process on both tables would have its impacts and implications – it would take longer and might produce bigger tables, but the reading and processing of such joined tables would be done on the fly!
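A purely hypothetical syntax sketch of such a Conformed Dictionary (no such feature or keyword exists today; all names are invented):

```sql
-- Hypothetical: let the big Fact table reuse the dictionary of the small Dimension,
-- so that joins on CustomerKey can compare dictionary ids directly,
-- skipping the early materialization of the Dimension side.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales
    WITH (SHARED_DICTIONARY = (COLUMN = CustomerKey,
                               SOURCE = dbo.DimCustomer(CustomerKey)));
```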
7. Big Dictionaries
I was rather hopeful that SQL Server 2025 would address the Dictionary Pressure, and after trying out SQL Server 2025 RTM (1700.0.1000.7) I was disappointed. A lot of the words in this blog post have been spent on the need for controlling the size of the Row Groups, and with Dictionary Pressure there is no escape. One could wish to avoid using strings in the Fact tables altogether, but unless you are an NVTP (Not-Very-Technical-Person), you know what reality looks like.
There are some great improvements for strings in SQL Server 2022 and SQL Server 2025, but this one – the maximum size of the dictionary – has stayed the same since SQL Server 2012. I have not tested Azure SQL Database or Azure SQL Managed Instance, but if something is different out there, I am sure somebody will correct me very publicly. :)
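If you want to see how close your own dictionaries are to that hard cap (16 MB since SQL Server 2012), here is a simple check (dbo.FactSales is a placeholder name):

```sql
-- Dictionary sizes vs. the 16 MB limit that triggers Dictionary Pressure
SELECT  OBJECT_NAME(p.object_id)  AS table_name,
        d.column_id,
        d.dictionary_id,
        d.entry_count,
        d.on_disk_size,           -- in bytes
        d.on_disk_size * 100.0 / (16 * 1024 * 1024) AS pct_of_limit
FROM sys.column_store_dictionaries AS d
INNER JOIN sys.partitions AS p
    ON d.partition_id = p.partition_id
WHERE p.object_id = OBJECT_ID('dbo.FactSales')
ORDER BY d.on_disk_size DESC;
```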
This is not a complete list of my wishes, but it is good enough for me to get back to blogging.
to be continued …