Columnstore

I am deeply fascinated by Columnstore Indexes, and I have opened several Connect Items suggesting important improvements to them:
Implement Batch Mode Support for Row Store
Multi-threaded rebuilds of Clustered Columnstore Indexes break the sequence of pre-sorted segment ordering (Order Clustering)
Columnstore Segments Maintenance – Remove & Merge
Implement Computed Columns for Clustered Columnstore Indexes

Scripts Library:
I am publishing CISL – Columnstore Indexes Scripts Library, with the first release targeting the 1st of September 2015.
Sign up for notifications if you are interested!

Here is the series of blog posts that I have written about them:

Azure:
Azure Columnstore, part 1 – The initial Preview offering
Azure Columnstore, part 2 – Snapshot Isolation & Batch Mode DOP
Azure Columnstore, part 3 – Modern Segment Elimination and Set Statistics IO

SQL Server:
Columnstore Indexes – part 1 (“Intro”)
Columnstore Indexes – part 2 (“Internals”)
Columnstore Indexes – part 3 (“More Internals”)
Columnstore Indexes – part 4 (“Basic T-SQL”)
Columnstore Indexes – part 5 (“New Meta-Information and System Stored Procedure”)
Columnstore Indexes – part 6 (“Observing the behavior”)
Columnstore Indexes – part 7 (“Transaction Isolation”)
Columnstore Indexes – part 8 (“Locking”)
Columnstore Indexes – part 9 (“CTP1 Observations”)
Columnstore Indexes – part 10 (“Compression basics”)
Columnstore Indexes – part 11 (“Clustered vs Nonclustered compression basics”)
Columnstore Indexes – part 12 (“Compression Dive”)
Columnstore Indexes – part 13 (“Dictionaries Analyzed”)
Columnstore Indexes – part 14 (“Partitioning”)
Columnstore Indexes – part 15 (“Partitioning Advanced”)
Columnstore Indexes – part 16 (“Index Builds”)
Columnstore Indexes – part 17 (“Resources 2012 vs 2014”)
Columnstore Indexes – part 18 (“Basic Batch Mode Improvements”)
Columnstore Indexes – part 19 (“Batch Mode 2012 Limitations … Updated!”)
Columnstore Indexes – part 20 (“TempDB Spills – when memory is not enough”)
Columnstore Indexes – part 21 (“DBCC CSIndex”)
Columnstore Indexes – part 22 (“Invisible Row Groups”)
Columnstore Indexes – part 23 (“Data Loading”)
Columnstore Indexes – part 24 (“Data Loading continued”)
Columnstore Indexes – part 25 (“Faster Smaller Better Stronger”)
Columnstore Indexes – part 26 (“Backup & Restore”)
Columnstore Indexes – part 27 (“Load with Delta-Stores”)
Columnstore Indexes – part 28 (“Update vs Delete + Insert”)
Columnstore Indexes – part 29 (“Data Loading for Better Segment Elimination”)
Columnstore Indexes – part 30 (“Bulk Load API Magic Number”)
Columnstore Indexes – part 31 (“Memory Pressure and Row Group Sizes”)
Columnstore Indexes – part 32 (“Size Does Matter, but how ?”)
Columnstore Indexes – part 33 (“A Tuple Mover that closes open Delta-Stores”)
Columnstore Indexes – part 34 (“Deleted Segments Elimination”)
Columnstore Indexes – part 35 (“Trace Flags & Query Optimiser Rules”)
Columnstore Indexes – part 36 (“Maintenance Solutions for Columnstore”)
Columnstore Indexes – part 37 (“Deleted Bitmap & Delta-Store Sizes”)
Columnstore Indexes – part 38 (“Memory Structures”)
Columnstore Indexes – part 39 (“Memory in Action”)
Columnstore Indexes – part 40 (“Compression Algorithms”)
Columnstore Indexes – part 41 (“Statistics”)
Columnstore Indexes – part 42 (“Materialisation”)
Columnstore Indexes – part 43 (“Transaction Log Basics”)
Columnstore Indexes – part 44 (“Monitoring with Extended Events”)
Columnstore Indexes – part 45 (“Multi-Dimensional Clustering”)
Columnstore Indexes – part 46 (“DateTime compression and performance”)
Columnstore Indexes – part 47 (“Practical Monitoring with Extended Events”)
Columnstore Indexes – part 48 (“Improving Dictionary Pressure”)
Columnstore Indexes – part 49 (“Data Types & Predicate Pushdown”)
Columnstore Indexes – part 50 (“Columnstore IO”)
Columnstore Indexes – part 51 (“SSIS, DataFlow & Max Buffer Memory”)
Columnstore Indexes – part 52 (“What’s new for Columnstore XE in SQL Server 2014 SP1”)
Columnstore Indexes – part 53 (“What’s new for Columnstore in SQL Server 2014 SP1”)
Columnstore Indexes – part 54 (“Thoughts on upcoming improvements in SQL Server 2016”)
Columnstore Indexes – part 55 (“New Architecture Elements in SQL Server 2016”)
Columnstore Indexes – part 56 (“New DMV’s in SQL Server 2016”)
Columnstore Indexes – part 57 (“Segment Alignment Maintenance”)
Columnstore Indexes – part 58 (“String Predicate Pushdown”)
Columnstore Indexes – part 59 (“Aggregate Pushdown”)
Columnstore Indexes – part 60 (“3 More Batch Mode Improvements in SQL Server 2016”)
Columnstore Indexes – part 61 (“Window aggregate functions”)
Columnstore Indexes – part 62 (“Row Groups Trimming”)
Columnstore Indexes – part 63 (“Parallel Data Insertion”)
Columnstore Indexes – part 64 (“T-SQL Improvements in SQL Server 2016”)
Columnstore Indexes – part 65 (“Clustered Columnstore Improvements in SQL Server 2016”)
Columnstore Indexes – part 66 (“More Clustered Columnstore Improvements in SQL Server 2016”)
Columnstore Indexes – part 67 (“Clustered Columnstore Isolation Levels & Transactional Locking”)
Columnstore Indexes – part 68 (“Data Loading, Delta-Stores & Vertipaq Compression Optimisation”)
Columnstore Indexes – part 69 (“Operational Analytics – Rowstore”)
Columnstore Indexes – part 70 (“Filtered Indexes in Action”)
Columnstore Indexes – part 71 (“Change Data Capture, Change Tracking & Temporal”)
Columnstore Indexes – part 72 (“InMemory Operational Analytics”)
Columnstore Indexes – part 73 (“Big Delta-Stores with Nonclustered Columnstore”)
Columnstore Indexes – part 74 (“Row Group Merging & Cleanup, SQL Server 2016 edition”)
Columnstore Indexes – part 75 (“Stretch DB & Columnstore”)
Columnstore Indexes – part 76 (“Compression Delay”)
Columnstore Indexes – part 77 (“SSIS 2016 & Columnstore”)
Columnstore Indexes – part 78 (“Temporary Objects”)
Columnstore Indexes – part 79 (“Loading Data into Non-Updatable Nonclustered Columnstore”)
Columnstore Indexes – part 80 (“Local Aggregation”)
Columnstore Indexes – part 81 (“Adding Columnstore Index to InMemory Tables”)
Columnstore Indexes – part 82 (“Extended Events in SQL Server 2016”)
Columnstore Indexes – part 83 (“Columnstore Replication in SQL Server 2016”)
Columnstore Indexes – part 84 (“Practical Dictionary Cases”)
Columnstore Indexes – part 85 (“Important Batch Mode Changes in SQL Server 2016”)
Columnstore Indexes – part 86 (“New Trace Flags in SQL Server 2016”)
Columnstore Indexes – part 87 (“Indexed Views”)
Columnstore Indexes – part 88 (“Minimal Logging in SQL Server 2016”)
Columnstore Indexes – part 89 (“Memory-Optimised Columnstore Limitations 2016”)
Columnstore Indexes – part 90 (“In-Memory Columnstore Improvements in Service Pack 1 of SQL Server 2016”)
Columnstore Indexes – part 91 (“SQL Server 2016 Standard Edition Limitations”)
Columnstore Indexes – part 92 (“Lobs”)
Columnstore Indexes – part 93 (“Batch Mode Adaptive Memory Grant Feedback”)
Columnstore Indexes – part 94 (“Use Partitioning Wisely”)
Columnstore Indexes – part 95 (“Basic Query Patterns”)
Columnstore Indexes – part 96 (“Nonclustered Columnstore Index Online Rebuild”)
Columnstore Indexes – part 97 (“Working with Strings”)
Columnstore Indexes – part 98 (“Null Expressions & String Aggregates”)
Columnstore Indexes – part 99 (“Merge”)
Columnstore Indexes – part 100 (“Identity”)
Columnstore Indexes – part 101 (“Estimated? Similar! Similar How?”)
Columnstore Indexes – part 102 (“CCI with Secondary Rowstore Indexes on SQL 2014”)
Columnstore Indexes – part 103 (“Partitioning 2016 vs Partitioning 2014”)
Columnstore Indexes – part 104 (“Batch Mode Adaptive Joins”)
Columnstore Indexes – part 105 (“Performance Counters”)
Columnstore Indexes – part 106 (“Memory Requirements for Rebuild & Reorganize”)
Columnstore Indexes – part 107 (“Dictionaries Deeper Dive”)
Columnstore Indexes – part 108 (“Computed Columns”)
Columnstore Indexes – part 109 (“Trivial Plans in SQL Server 2017”)
Columnstore Indexes – part 110 (“The best column for sorting Columnstore Index on”)
Columnstore Indexes – part 111 (“Row Group Elimination – Pain Points”)
Columnstore Indexes – part 112 (“Linked Servers”)
Columnstore Indexes – part 113 (“Row Groups Merging Limitations”)
Columnstore Indexes – part 114 (“Machine Learning Services”)
Columnstore Indexes – part 115 (“Bulk Load API and Pressure”)
Columnstore Indexes – part 116 (“Partitioning Specifics”)
Columnstore Indexes – part 117 (“Clustered vs Nonclustered”)
Columnstore Indexes – part 118 (“SQL Server 2017 Editions Limitations”)
Columnstore Indexes – part 119 (“In-Memory Columnstore Location”)
Columnstore Indexes – part 120 (“Merge Replication 2016-2017”)
Columnstore Indexes – part 121 (“Columnstore Indexes on Standard Tier of Azure SQL DB”)
Columnstore Indexes – part 122 (“Wait Types”)
Columnstore Indexes – part 123 (“Clustered Columnstore Index Online Rebuild”)
Columnstore Indexes – part 124 (“Estimate Columnstore Compression”)
Columnstore Indexes – part 125 (“Estimate Columnstore Compression as a System Stored Proc”)
Columnstore Indexes – part 126 (“Extracting Columnstore Statistics to Cloned Database”)
Columnstore Indexes – part 127 (“Batch Mode on Rowstore – is it a Columnstore Killer?”)
Columnstore Indexes – part 128 (“Ordering Columnstore Indexes in Azure SQL Datawarehouse”)
Columnstore Indexes – part 129 (“Compatibility with the New Features of SQL Server 2019”)
Columnstore Indexes – part 130 (“Columnstore Indexes on Azure SQL DB”)
Columnstore Indexes – part 131 (“Rebuilding Rowstore Indexes ONLINE”)

27 thoughts on “Columnstore”

  1. Anuj Saboo

    Hello,

    I have heard praise for your blog on Brent Ozar's podcasts, and as a DBA I would like to ask you a question about Columnstore Indexes. I use SQL Server 2014, and with the traditional DMV, sys.dm_db_index_physical_stats, I am not able to find any fragmentation on a Clustered Columnstore Index. When I check the fragmentation manually in the Index Properties, it shows 0%, which is quite surprising, given that I do a lot of data inserts and deletes in my Data Warehouse.

    Does fragmentation work differently for them? Is there any other method to see the fragmentation of Columnstore Indexes?

    1. Niko Neugebauer Post author

      Hi Anuj,

      Columnstore Indexes do not have physical fragmentation in the same sense as traditional Rowstore indexes. The columnstore segments are stored contiguously as LOBs.
      What you do have is logical fragmentation, caused by the deleted rows. For more information, check out these posts:
      http://www.nikoport.com/2014/07/29/clustered-columnstore-indexes-part-36-maintenance-solutions-for-columnstore/
      http://www.nikoport.com/2015/06/28/columnstore-indexes-part-57-segment-alignment-maintenance/
      http://www.nikoport.com/2014/07/20/clustered-columnstore-indexes-part-34-deleted-segments-elimination/

      Additionally, check out the following script in the CISL library (SQL Server 2016 version):
      https://github.com/NikoNeugebauer/CISL/blob/master/SQL-2016/fragmentation.sql

      Best regards,
      Niko
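As a rough illustration of the logical fragmentation described in the reply above, the following query is a minimal sketch that measures the share of deleted rows in the compressed Row Groups of each columnstore index. It is not the CISL fragmentation.sql script. It only assumes sys.column_store_row_groups, which is available from SQL Server 2014 onwards. The column aliases are hypothetical names chosen for readability.

```sql
-- Logical fragmentation per columnstore index, measured as the share of rows
-- marked as deleted inside compressed Row Groups (sketch, not the CISL script).
SELECT OBJECT_SCHEMA_NAME(rg.object_id) AS SchemaName,
       OBJECT_NAME(rg.object_id)        AS TableName,
       i.name                           AS IndexName,
       SUM(rg.total_rows)               AS TotalRows,
       SUM(rg.deleted_rows)             AS DeletedRows,
       CAST(100.0 * SUM(rg.deleted_rows) / NULLIF(SUM(rg.total_rows), 0)
            AS DECIMAL(5,2))            AS DeletedPercent
FROM sys.column_store_row_groups AS rg
INNER JOIN sys.indexes AS i
    ON i.object_id = rg.object_id
   AND i.index_id  = rg.index_id
WHERE rg.state = 3   -- 3 = COMPRESSED; open and closed Delta-Stores are excluded
GROUP BY rg.object_id, i.name
ORDER BY DeletedPercent DESC;
```

A high DeletedPercent is the columnstore counterpart of what sys.dm_db_index_physical_stats reports for Rowstore indexes. The maintenance posts linked above discuss what to do about it.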

  2. Ovidiu Sorin Berca

    Hi Niko !
    I am a columnstore index beginner living and working in the United States. I am preparing a presentation and a demo for my company, and I have a related question for you:
    I have to admit I am very confused. The Microsoft page says that, for columnstore bulk insert, the number of rows needed for a batch to be compressed is 102,400. However, when I load that many rows using an INSERT ... SELECT in SQL Server 2016, I still get Delta-Stores. I do not see any compressed data until I reach the other number, 1,048,576 (2^20).
    The Microsoft article is at
    https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview

    I will try bcp and other methods of bulk insert, but what am I doing wrong here?
    If I create the columnstore index on the heap that already contains the data, then yes, I get compressed Row Groups directly, but not with the INSERT ... SELECT.
    If you have already answered this somewhere on your blog, please just point me there.

    Thank you !
    Sorin

    1. Niko Neugebauer Post author

      Hi Sorin,

      The number of 102,400 rows is correct: it activates the switch that loads the data directly into a compressed Row Group without touching the Delta-Stores.
      Are you using the TABLOCK hint?
      Are you using SSIS? Can you share an example of the statement you are invoking?
      Have you taken a look at these articles?
      http://www.nikoport.com/2014/06/20/clustered-columnstore-indexes-part-30-bulk-load-api-magic-number/
      http://www.nikoport.com/2015/08/19/columnstore-indexes-part-62-parallel-data-insertion/

      Best regards,
      Niko
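The following minimal sketch lets you check this behaviour on SQL Server 2016. It uses a hypothetical demo table, dbo.LoadTest, and loads exactly 102,400 rows in a single INSERT ... SELECT with the TABLOCK hint. It also forces a serial plan with OPTION (MAXDOP 1). This is an assumption worth stating: with a parallel insertion the rows are presumably split between threads, and each thread then needs its own batch of at least 102,400 rows to bypass the Delta-Stores.

```sql
-- Hypothetical demo table (SQL Server 2016 or later)
CREATE TABLE dbo.LoadTest (
    Id  INT NOT NULL,
    Val INT NOT NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_LoadTest ON dbo.LoadTest;

-- Load exactly 102,400 rows in one statement with the TABLOCK hint.
-- MAXDOP 1 keeps all rows in a single thread, so the batch is not divided
-- into several smaller batches by a parallel insertion.
INSERT INTO dbo.LoadTest WITH (TABLOCK) (Id, Val)
SELECT TOP (102400)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
       CHECKSUM(NEWID())
FROM sys.all_columns AS c1
CROSS JOIN sys.all_columns AS c2
OPTION (MAXDOP 1);

-- COMPRESSED means the Delta-Store was bypassed;
-- OPEN means the rows landed in a Delta-Store.
SELECT row_group_id, state_description, total_rows
FROM sys.column_store_row_groups
WHERE object_id = OBJECT_ID('dbo.LoadTest');
```

Rerunning the INSERT with TOP (102399) should, by the same rule, leave the rows in an OPEN Delta-Store. That contrast is the switch described in the reply above.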

  3. Thorsten

    Hello Niko,
    Great job!

    One remark for VLDBs:
    In Suggested Tables.sql, [Min RowGroups] should be int, not smallint.

    Best regards,
    Thorsten

  4. Sagar Bathe

    Hi Niko – First of all, let me say this is a remarkable collection of information on CCIs. Awesome!!
    I have an issue with CCIs that I hope you may be able to help with. I am creating a partitioned CCI on a fact table (1.5B records), but when I look at the execution plan, I see a big difference between the estimated and actual row counts. I think this is leading to TempDB spills, which slow down our reporting queries (most of our queries use GROUP BY/ORDER BY). I ran DBCC SHOW_STATISTICS on the CCI and saw that it did not return any records (which I believe is the expected behavior).
    My question is: is the CCI built properly? Is there a way to build statistics on CCIs that I am missing, and which is causing the mismatch between actual and estimated rows?

    This is how I am building the CCI (based on suggestions from Microsoft). AuditWebsite is our partitioning column:
    CREATE CLUSTERED INDEX TableName_cci ON TableName (AuditWebsite)
    WITH (MAXDOP = 0, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
    ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
    ON PS_FactEligibility (AuditWebsite);

    -- DROP_EXISTING = ON requires the same name as the existing clustered index
    CREATE CLUSTERED COLUMNSTORE INDEX TableName_cci ON TableName
    WITH (MAXDOP = 0, DROP_EXISTING = ON);

    Any pointers are appreciated.

    1. Niko Neugebauer Post author

      Hi Sagar,

      I suggest that you update the statistics on your table manually before building the Clustered Columnstore Index.
      Also note that the statistics object of a columnstore index is populated on the fly, either when a query is executed against the columnstore index or when DBCC SHOW_STATISTICS is run against it. However, the columnstore index statistics are not persisted in storage.

      Best regards,
      Niko

  5. Fred

    Hi Niko,
    Great job on CCI & co!
    A little question:
    Do you have a PDF document compiling all the posts on CCI/CI?
    Thanks
    Fred

    1. Niko Neugebauer Post author

      Hi Fred,

      Thank you very much. There is no PDF, but I know that some people simply convert the web pages into PDFs for reading.
      Later this year, there will be a PDF in the form of a book.

      Best regards,
      Niko Neugebauer

  6. Andrey

    Niko hi!

    Do you have any reasons not to make all tables clustered columnstore, even the small ones (<100 records)?

    Our developers prefer to have all tables unified (all CCIs), regardless of their size.
    I have a feeling that it's not a good approach, but I have no valid reasons yet. The only one is a query that fails when a small table is a CCI and runs fine when the same table is a classic table with a clustered index.

    Thanks in advance,
    Andrey.

    1. Niko Neugebauer Post author

      Hi Andrey,

      The unnecessary use of Hash Joins might penalize your applications. The forced preference for Hash Joins over Nested Loops Joins will definitely have an effect, even though it might be small.
      One day the situation might change and the penalty might become too big, because different kinds of tests and different kinds of artefacts will appear.
      I suggest being EXTREMELY careful when building CCIs on such small tables.

      Best regards,
      Niko Neugebauer

      1. Andrey

        Niko, thanks for reply!

        I didn’t mention that the database is a DWH and that analytical queries are the most frequent ones.
        In this case, Hash Joins are more typical than Nested Loops, if I’m not mistaken. What do you think?

        Anyway, I will share your opinion with our developers. Thanks again for that.

        Regards,
        Andrey.

        1. Niko Neugebauer Post author

          Hi Andrey,

          Regarding the Joins – you write that they are more typical but not exclusive. :)
          I would give the Query Optimiser the opportunity to make the hard choice. Unless it is badly wrong, I love being able to get better plans for the current scenario.
          Sounds like your developers are looking for a hammer… As long as they only have nails to hit, all is fine. ;)

          Best regards,
          Niko Neugebauer

          1. Andrey

            Thanks again, Niko!

            All the best to you. I appreciate your support of the SQL community :)

            Regards,
            Andrey.
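Related to the discussion above about CCIs on very small tables, the following minimal sketch lists the Clustered Columnstore Indexes whose tables stay under a given row count. This can help check how widespread the "all tables as CCI" approach already is. The threshold of 102,400 rows is a hypothetical choice for illustration; adjust it to your own definition of "small". Row counts in sys.partitions are approximate.

```sql
-- Hypothetical threshold for a "small" table; adjust as needed.
DECLARE @SmallTableRows BIGINT = 102400;

SELECT OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName,
       OBJECT_NAME(i.object_id)        AS TableName,
       i.name                          AS IndexName,
       SUM(p.rows)                     AS TotalRows
FROM sys.indexes AS i
INNER JOIN sys.partitions AS p
    ON p.object_id = i.object_id
   AND p.index_id  = i.index_id
WHERE i.type = 5   -- 5 = CLUSTERED COLUMNSTORE
GROUP BY i.object_id, i.name
HAVING SUM(p.rows) < @SmallTableRows
ORDER BY TotalRows;
```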

  7. Gw van Olderen

    Hello,

    Very informative blog series on columnstore indexes.
    Do you have any tips or insights on how to use Visual Studio to automate the deployment of these indexes?
    If I add a column to a table with a columnstore index and deploy that to a production environment, the publish script first drops the index, adds the column, adds a normal clustered index, and then recreates the columnstore index.

    Regards,
    Gerwin

      1. Gerwin

        Hi Niko,

        Thanks for your quick reply. For now, it's usually possible to add the column manually, but with continuous integration and automated deployment it would be nice not to have to intervene manually.

        I reported the problem through Visual Studio (2019), hoping that it will be picked up and fixed.
        If people reading this add their comments to the report, it will hopefully get the attention it needs.

        https://developercommunity.visualstudio.com/content/problem/787825/when-publishing-a-datbase-with-a-new-column-on-a-t.html

  8. Sumrin

    Hi Niko,

    I am inserting 1M rows into a table that has a Columnstore Index (with MAXDOP = 0), but the insertion of the records is taking more than 2 hours. Do you have any tips on how I can overcome this?
