This is a first blog post in a very large series of posts about the CLustered Columnstore Indexes:
Columnstore Indexes – part 2 (“Internalsâ€)
Columnstore Indexes – part 3 (“More Internalsâ€)
Columnstore Indexes – part 4 (“Basic T-SQLâ€)
Columnstore Indexes – part 5 (“New Meta-Information and System Stored Procedureâ€)
Columnstore Indexes – part 6 (“Observing the behaviorâ€)
Columnstore Indexes – part 7 (“Transaction Isolationâ€)
Columnstore Indexes – part 8 (“Lockingâ€)
Columnstore Indexes – part 9 (“CTP1 Observationsâ€)
Columnstore Indexes – part 10 (“Compression basicsâ€)
Columnstore Indexes – part 11 (“Clustered vs Nonclustered compression basicsâ€)
Columnstore Indexes – part 12 (“Compression Diveâ€)
Columnstore Indexes – part 13 (“Dictionaries Analyzedâ€)
Columnstore Indexes – part 14 (“Partitioning”)
Columnstore Indexes – part 15 (“Partitioning Advanced”)
Columnstore Indexes – part 16 (“Index Builds”)
Columnstore Indexes – part 17 (“Resources 2012 vs 2014â€)
Columnstore Indexes – part 18 (“Basic Batch Mode Improvementsâ€)
Columnstore Indexes – part 19 (“Batch Mode 2012 Limitations … Updated!â€)
Columnstore Indexes – part 20 (“TempDB Spills – when memory is not enoughâ€)
Columnstore Indexes – part 21 (“DBCC CSIndexâ€)
Columnstore Indexes – part 22 (“Invisible Row Groupsâ€)
Columnstore Indexes – part 23 (“Data Loadingâ€)
Columnstore Indexes – part 24 (“Data Loading continuedâ€)
Columnstore Indexes – part 25 (“Faster Smaller Better Strongerâ€)
Columnstore Indexes – part 26 (“Backup & Restoreâ€)
Columnstore Indexes – part 27 (“Load with Delta-Storesâ€)
Columnstore Indexes – part 28 (“Update vs Delete + Insertâ€)
Columnstore Indexes – part 29 (“Data Loading for Better Segment Eliminationâ€)
Columnstore Indexes – part 30 (“Bulk Load API Magic Number”)
Columnstore Indexes – part 31 (“Memory Pressure and Row Group Sizesâ€)
Columnstore Indexes – part 32 (“Size Does Matter, but how ?â€)
Columnstore Indexes – part 33 (“A Tuple Mover that closes open Delta-Stores”)
Columnstore Indexes – part 34 (“Deleted Segments Eliminationâ€)
Columnstore Indexes – part 35 (“Trace Flags & Query Optimiser Rulesâ€)
Columnstore Indexes – part 36 (“Maintenance Solutions for Columnstoreâ€)
Columnstore Indexes – part 37 (“Deleted Bitmap & Delta-Store Sizesâ€)
Columnstore Indexes – part 38 (“Memory Structuresâ€)
Columnstore Indexes – part 39 (“Memory in Actionâ€)
Columnstore Indexes – part 40 (“Compression Algorithmsâ€)
Columnstore Indexes – part 41 (“Statisticsâ€)
Columnstore Indexes – part 42 (“Materialisationâ€)
Columnstore Indexes – part 43 (“Transaction Log Basicsâ€)
Columnstore Indexes – part 44 (“Monitoring with Extended Eventsâ€)
Columnstore Indexes – part 45 (“Multi-Dimensional Clusteringâ€)
Columnstore Indexes – part 46 (“DateTime compression and performanceâ€)
Columnstore Indexes – part 47 (“Practical Monitoring with Extended Eventsâ€)
Columnstore Indexes – part 48 (“Improving Dictionary Pressureâ€)
Columnstore Indexes – part 49 (“Data Types & Predicate Pushdownâ€)
Columnstore Indexes – part 50 (“Columnstore IOâ€)
Columnstore Indexes – part 51 (“SSIS, DataFlow & Max Buffer Memoryâ€)
Columnstore Indexes – part 52 (“What’s new for Columnstore XE in SQL Server 2014 SP1â€)
Columnstore Indexes – part 53 (“What’s new for Columnstore in SQL Server 2014 SP1â€)
Columnstore Indexes – part 54 (“Thoughts on upcoming improvements in SQL Server 2016″)
Columnstore Indexes – part 55 (“New Architecture Elements in SQL Server 2016”)
Columnstore Indexes – part 56 (“New DMV’s in SQL Server 2016”)
Columnstore Indexes – part 57 (“Segment Alignment Maintenanceâ€)
Columnstore Indexes – part 58 (“String Predicate Pushdownâ€)
Columnstore Indexes – part 59 (“Aggregate Pushdown”)
Columnstore Indexes – part 60 (“3 More Batch Mode Improvements in SQL Server 2016”)
Columnstore Indexes – part 61 (“Window aggregate functionsâ€)
Columnstore Indexes – part 62 (“Row Groups Trimmingâ€)
Columnstore Indexes – part 63 (“Parallel Data Insertionâ€)
Columnstore Indexes – part 64 (“T-SQL Improvements in SQL Server 2016â€)
Columnstore Indexes – part 65 (“Clustered Columnstore Improvements in SQL Server 2016â€)
Columnstore Indexes – part 66 (“More Clustered Columnstore Improvements in SQL Server 2016â€)
Columnstore Indexes – part 67 (“Clustered Columstore Isolation Levels & Transactional Lockingâ€)
Columnstore Indexes – part 68 (“Data Loading, Delta-Stores & Vertipaq Compression Optimisationâ€)
Columnstore Indexes – part 69 (“Operational Analytics – Rowstoreâ€)
Columnstore Indexes – part 70 (“Filtered Indexes in Actionâ€)
Columnstore Indexes – part 71 (“Change Data Capture, Change Tracking & Temporalâ€)
Columnstore Indexes – part 72 (“InMemory Operational Analyticsâ€)
Columnstore Indexes – part 73 (“Big Delta-Stores with Nonclustered Columnstoreâ€)
Columnstore Indexes – part 74 (“Row Group Merging & Cleanup, SQL Server 2016 editionâ€)
Columnstore Indexes – part 75 (“Stretch DB & Columnstoreâ€)
Columnstore Indexes – part 76 (“Compression Delayâ€)
Columnstore Indexes – part 77 (“SSIS 2016 & Columnstoreâ€)
Columnstore Indexes – part 78 (“Temporary Objectsâ€)
Columnstore Indexes – part 79 (“Loading Data into Non-Updatable Nonclustered Columnstoreâ€)
Columnstore Indexes – part 80 (“Local Aggregationâ€)
Columnstore Indexes – part 81 (“Adding Columnstore Index to InMemory Tablesâ€)
Columnstore Indexes – part 82 (“Extended Events in SQL Server 2016â€)
Columnstore Indexes – part 83 (“Columnstore Replication in SQL Server 2016â€)
Columnstore Indexes – part 84 (“Practical Dictionary Casesâ€)
Columnstore Indexes – part 85 (“Important Batch Mode Changes in SQL Server 2016â€)
Columnstore Indexes – part 86 (“New Trace Flags in SQL Server 2016â€)
Columnstore Indexes – part 87 (“Indexed Viewsâ€)
Columnstore Indexes – part 88 (“Minimal Logging in SQL Server 2016â€)
Columnstore Indexes – part 89 (“Memory-Optimised Columnstore Limitations 2016â€)
Columnstore Indexes – part 90 (“In-Memory Columnstore Improvements in Service Pack 1 of SQL Server 2016 “)
Columnstore Indexes – part 91 (“SQL Server 2016 Standard Edition Limitationsâ€)
Columnstore Indexes – part 92 (“Lobsâ€)
Columnstore Indexes – part 93 (“Batch Mode Adaptive Memory Grant Feedbackâ€)
Columnstore Indexes – part 94 (“Use Partitioning Wiselyâ€)
Columnstore Indexes – part 95 (“Basic Query Patternsâ€)
Columnstore Indexes – part 96 (“Nonclustered Columnstore Index Online Rebuildâ€)
Columnstore Indexes – part 97 (“Working with Stringsâ€)
Columnstore Indexes – part 98 (“Null Expressions & String Aggregatesâ€)
Columnstore Indexes – part 99 (“Mergeâ€)
Columnstore Indexes – part 100 (“Identityâ€)
Columnstore Indexes – part 101 (“Estimated? Similar! Similar How?â€)
Columnstore Indexes – part 102 (“CCI with Secondary Rowstore Indexes on SQL 2014â€)
Columnstore Indexes – part 103 (“Partitioning 2016 vs Partitioning 2014â€)
Columnstore Indexes – part 104 (“Batch Mode Adaptive Joinsâ€)
Columnstore Indexes – part 105 (“Performance Countersâ€)
Columnstore Indexes – part 106 (“Memory Requirements for Rebuild & Reorganizeâ€)
Columnstore Indexes – part 107 (“Dictionaries Deeper Diveâ€)
Columnstore Indexes – part 108 (“Computed Columnsâ€)
Columnstore Indexes – part 109 (“Trivial Plans in SQL Server 2017â€)
Columnstore Indexes – part 110 (“The best column for sorting Columnstore Index onâ€)
Columnstore Indexes – part 111 (“Row Group Elimination – Pain Pointsâ€)
Columnstore Indexes – part 112 (“Linked Serversâ€)
Columnstore Indexes – part 113 (“Row Groups Merging Limitationsâ€)
Columnstore Indexes – part 116 (“Partitioning Specificsâ€)
Columnstore Indexes – part 117 (“Clustered vs Nonclusteredâ€)
Columnstore Indexes – part 118 (“SQL Server 2017 Editions Limitationsâ€)
Columnstore Indexes – part 119 (“In-Memory Columnstore Locationâ€)
Columnstore Indexes – part 120 (“Merge Replication 2016-2017â€)
Columnstore Indexes – part 121 (“Columnstore Indexes on Standard Tier of Azure SQL DBâ€)
Columnstore Indexes – part 122 (“Wait Typesâ€)
Columnstore Indexes – part 123 (“Clustered Columnstore Index Online Rebuildâ€)
Columnstore Indexes – part 124 (“Estimate Columnstore Compressionâ€)
Columnstore Indexes – part 125 (“Estimate Columnstore Compression as a System Stored Procâ€)
Columnstore Indexes – part 126 (“Extracting Columnstore Statistics to Cloned Database”)
Columnstore Indexes – part 127 (“Batch Mode on Rowstore – is it a Columnstore Killer?â€)
Columnstore Indexes – part 128 (“Ordering Columnstore Indexes in Azure SQL Datawarehouseâ€)
Columnstore Indexes – part 129 (“Compatibility with the New Features of Sql Server 2019”)
Columnstore Indexes – part 130 (“Columnstore Indexes on Azure SQL DBâ€)
Columnstore Indexes – part 131 (“Rebuilding Rowstore Indexes ONLINEâ€)
After the Microsoft’s first attempt to improve DataWarehouse performance drastically with introduction of the Nonclustered Columnstore Indexes, which are extremely limited (non-updatable, schema modification prohibited, limited number of data types supported, cannot be unique, batch execution mode is also very limited in SQL Server 2012, etc),
this time for SQL Server 2014 we shall have a feature that really promises to change this game around – CLUSTERED columnstore indexes.
As far as I know, not many organizations around the world are using Nonclusted NonUpdateable Columnstore Indexes mainly because of it is being a very new feature and the other reasons is its limitations.
Clustered Columnstore Indexes in SQL Server 2014 are promising to solve a lot of the problems, such as:
– making tables updatable
– schema modification is available
– more datatypes included
– mixed execution modes support (batch & row)
– more operations support for the batch mode ( outer join, union all, etc )
– improved global dictionaries for segments compression
– support for even better data compression (aka Archival)
– seek operation support
– spillbulk insert operation support and associated improvements
– etc
Since CTP1 is already available I decide to give it a try and to play with Clustered Columnstore Indexes a bit :)
I will start with introduction of some of the architectural concepts and then in the following posts I will try to explore the actual improvements:
Columnstore Indexes are using the X-Velocity In-Memory Compression Engine which is already being used by a number of Microsoft Technologies, such as PowerPivot, Tabular Mode for Analysis Services and PowerView (Thanks to Jamie Thompson and to Riccardo Mutti for clarifications.) for example.
The basic technic of this engine consist in dividing available data into different row groups, also known as segments. Each of those segments should consist of around 1.000.000 rows (this number is not controlled by a parameter in the current versions) and the rows are encoded & compressed by using some internal algorithm. For some of the columns which are requiring dictionaries (character types for example) an additional dictionary encoding conversion is being used.
Note: each column is separated into its own groups of the Segments – and so this where the ColumnStore idea comes from.
After the process of encoding and compression is done, then the segments and the dictionaries are converted to Blobs and stored inside of the SQL Server. The underlying storage mechanism is still the good old 8K pages. Should one of the Blobs span more then 1 page, then the usual blob storage mechanisms are invoked.
Note: a very important part of understanding and optimization of Columnstore Indexes is that data is not sorted inside of the Segment, which will lead us to a number of technics to improve their compression and performance in some of the next posts.
There is a principal meta-information storage, which is called a Directory. Inside of this Directory, the information about the segment & dictionaries allocation status is stored. It is also contains additional metadata about number of rows, size, min and max values inside of each of the segments. This information is available inside of the sys.column_store_segments DMV.
There is one more very important element for the meta-data storage inside of the Clustered Columnstore Indexes – Delete Bitmap. Delete Bitmap is basically a storage which contains information about the deleted rows inside of the Segments. Its representation in-memory is a bitmap, while on the disk it is stored as a b-tree, where ids of the deleted rows are stored. Delete Bitmap is consulted on a regular basis in order to avoid returning the rows which were already deleted.
The Clustered Columnstore Index is the primary source for the data storage, as the name Clustered typically referrers to the original source of the table data.
Note 1: Upcoming SQL Server 2014 has the limitation that no other indexes are allowed to be created on the table with a Clustered Columnstore Index.
Note 2: The data is not really clustered inside of the CCI, since it is not really being ordered in the upcoming SQL Server version.
The whole architecture of the updatable Clustered Columnstore Indexes is basically divided into 2 parts: Row Groups (aka Segments) and Delta Stores. While Segments are encoded & compressed, as well as unupdateable directly, the Delta Stores are a “normal” good old b-tree row stores, which have all the characteristics of an habitual Sql Server table.
Same as with Segments, every Clustered Columnstore Index can have multiple Delta Stores.
Microsoft has introduced a brand new DMV sys.column_store_row_groups for the SQL Server 2014 CTP1. This view contains the information about all segments and delta stores for the Columnstore Indexes (Clustered & NonClustered).
It seems like internally they are represented equally, so you will see them listed all together. The difference for distinction is that Segments are actually having their status “state” column set on 3, with the description of “Compressed” while the Delta Stores can have status of 1 & 2 as a state with corresponding description of “Open” / “Close”. This status for the Delta Store is delivering the information if the Delta Store still accepting new information or it is just waiting to become a Segment (encoded & compressed).
Some words about some of the basic operations:
Inserts:
Inserts are simply added to one of the currently open Delta Stores.
Deletes:
If the deleted row is found inside of a Segment, then the Deleted Bitmap information is updated with the row id of the respective row.
If the deleted row is actually inside of a Delta Store, then the direct process of removal is executed on the b-tree.
Updates:
They are basically represented as deletes and inserts.
To be continued with Clustered Columnstore Indexes – part 2 (“Internalsâ€)
Pingback: Clustered Columnstore Indexes – part 2 (“Internalsâ€) | Nikoport
Pingback: Clustered Columnstore Indexes – part 6 (“Observing the behavior”) | Nikoport
Pingback: Clustered Columnstore Indexes – part 3 (“More Internalsâ€) | Nikoport
Pingback: Clustered Columnstore Indexes – part 4 (“Basic T-SQLâ€) | Nikoport
Pingback: Clustered Columnstore Indexes – part 5 (“New Meta-Information and System Stored Procedureâ€) | Nikoport
Pingback: Clustered Columnstore Indexes – part 7 (“Transaction Isolationâ€) | Nikoport
Pingback: Clustered Columnstore Indexes – part 9 (“CTP1 Observationsâ€) | Nikoport
Pingback: Hekaton is Microsoft’s In-Memory OLTP solution coming to SQL Server 2014 | Scott Stauffer dot C-eh?
Pingback: Clustered Columnstore Indexes – part 10 (“Compression basicsâ€) | Nikoport
Pingback: Clustered Columnstore Indexes – part 11 (“Clustered vs Nonclustered compression basicsâ€) | Nikoport
Pingback: SQL Server 2014: Columnstore Index improvements | James Serra's Blog
Pingback: SQL Server 2014: Columnstore Index improvements - SQL Server - SQL Server - Toad World
Pingback: SQL Server 2014: Columnstore Index improvements - SQL Server - SQL Server - Toad World
Pingback: Clustered Columnstore Indexes – part 13 (“Dictionaries Analyzedâ€) | Nikoport
Pingback: Clustered Columnstore Indexes – part 14 (“Partitioningâ€) | Nikoport
Pingback: Clustered Columnstore Indexes – part 15 (“Partitioning Advancedâ€) | Nikoport
Pingback: ColumnStore Archival Compression–SQL Server 2014 | The SQL Herald
Pingback: Clustered Columnstore Indexes – part 17 (“2012 vs 2014â€) | Nikoport
Pingback: Columnstore Indexes and Query Plans in SQL Server 2014 CTP1 - Bob Beauchemin
Pingback: Clustered Columnstore Indexes – part 18 (“Basic Batch Mode Improvementsâ€) | Nikoport
Pingback: Clustered Columnstore Indexes – part 19 (“Batch Mode 2012 Limitations … Removed!â€) | Nikoport
Pingback: Clustered Columnstore Indexes – part 19 (“Batch Mode 2012 Limitations … Updated!â€) | Nikoport
Excellent post.
Thank you, Murtaza!
Best regards,
Niko Neugebauer
Hey Niko, fantastic articles on Columnstore Indexes! Can you tell me which if any sections may be outdated due to recent updates in SQL Server? Thanks!
Hi Henry,
all of these blog posts are still valid, depending on the used SQL Server versions.
Some of the things get constantly improved in the next versions of SQL Server – The part of the deleted segments that are not removed or deleted rows within segments that are not removed are solved in SQL Server 2016.
Best regards,
Niko
Hi, Just wondering what is actually stored in memory with a cluster column store index ? Is it the meta data only for the table ? e.i. MIN, MAX for each segment etc..
Hi Ludo,
depending on the version of the SQL Server, different informations are stored.
For example, if you query sys.column_store_segments, you will be able to see some of the informations, such as min, max, has nulls, magnitude (compression technique), encoding type, etc …
For diving really deep into the internals, consider using DBCC CSINDEX (http://www.nikoport.com/2013/11/07/clustered-columnstore-indexes-part-21-dbcc-csindex/), but only on your private DEVELOPMENT environment.
Now regarding the memory part – the whole segment meta-data is stored in the memory, once the segment is landed in the Columnstore Object Pool, but it might be taken off through the memory pressure. Consider also taking a look at the following blog post for more details http://www.nikoport.com/2014/08/11/clustered-columnstore-indexes-part-38-memory-structures/
Best regards,
Niko