Columnstore Indexes – part 57 (“Segment Alignment Maintenance”)

Continuation from the previous 56 parts, the whole series can be found at http://www.nikoport.com/columnstore/

In this post I want to focus on a very important aspect of Columnstore Indexes maintenance – Segment Alignment. I have extensively blogged and presented on the matters of the Segment Elimination in the past (Columnstore Indexes – part 34 (“Deleted Segments Elimination”) & Columnstore Indexes – part 29 (“Data Loading for Better Segment Elimination”), for example), but in this post I want to show how to analyse the Segment Alignment, for making the right decision if Columnstore Index is dealigned.

I have already written some of my thoughts about Columnstore Indexes maintenance in Columnstore Indexes – part 36 (“Maintenance Solutions for Columnstore”), but in this post I wanted to focus on the Segment Alignment maintenance.

Notice that this blogpost applies for any SQL Server version starting from 2012 with non-updateable Nonclustered Columnstore Indexes, because basic structure (Compressed Segment) for Columnstore Indexes is available through the sys.column_store_segments DMV, which can be found in every SQL Server version since Columnstore Indexes inception.

The difference in performance of the aligned Segments versus the non-aligned Segments can be huge, because of less Hard Drive involvement. Imagine that your query is reading 100GB of Columnstore Data instead of reading lets say 150GB, or in the very worst case 500GB. You will notice a great performance improvement once you will use it correctly on your Fact Tables, and even if you are applying Columnstore Indexes as Operational Analytics (in SQL Server 2016 for example), the impact can be quite significant, even through just the amount of Data your CPUs will be processing.

Let’s see how the things are functioning in practice and how can we analyse the current Columnstore alignment situation.

My favourite basic test Database ContosoRetailDW is here to support me once again.
I am simply restoring a copy of it, and after upgrading to SQL Server 2014 compatibility level, dropping all Foreign Keys and Primary Key on the test table FactOnlineSales, before creating Clustered Columnstore Index. Notice that this test is executed on SQL Server 2014, but can be adapted for SQL Server 2012,2014,2016 and of course Azure SQLDatabase:

Now let’s run a simple test query to see how good our Segment Elimination is:

Notice that you will need SQL Server 2014 SP1 or superior to see the results of the segment elimination, as I have already disclosed in Columnstore Indexes – part 53 (“What’s new for Columnstore in SQL Server 2014 SP1”), alternatively you will be able to determine it through the column_store_segment_eliminate extended event – which was described in Columnstore Indexes – part 47 (“Practical Monitoring with Extended Events”).

After executing the query, I have received the following results (filtering out the concrete Reads, which are less relevant for the point I am trying to make):

For improving the performance of this query I will align Segments of my Columnstore Index on the DateKey column:

Executing the test query again,

leads us to the following result:

The main thing here is that we simply went from 9 read Segments to 5 Segments, which means we have improved our IO performance almost 2 times, with the very same data. Cool feature. 🙂

In the real-life situation we will have a number of updates on daily/weekly basis, and at some point we might need to make decision if the Columnstore Indexes are truly dealigned or they are still finely tuned.

The issue with the Columnstore Indexes in the real life is that the order of the Segments is not guaranteed at all. We can observe Segments appearing in any order possible, and loading data ordered into the table/partition multiple times will make you loose all that perfect order you have created.

Basically in order to detect Segment overlapping we have 2 situations:
1. The minimum value of the compared Segment is between the Min & Max values of the current Segment
2. The maximum value of the compared Segment is between the Min & Max values of the current Segment.
Scenario 1 for Segment OverlappingScenario 2 for Segment Overlapping

Additionally, I have included the message for the support of the respective datatype for Segment Elimination (you do not want to waste your time aligning on the wrong data type, which shall simply won’t do any Segment Elimination at all).

I could have went much into this beta-version implementation, but at this point it is as simple loop through each of the available Segment in the Table partition and compared its maximum & minimum values.
I understand that it performance can be improved and I promise to post updates for this post very soon. 🙂

Segment Dealignment AnalysedHere is the result of this query, which shows Segment Dealignment for each of the available columns in my FactOnlineSales table. You can clearly see which columns Segments are 100% aligned, which are totally overlapping and which you should not care about.

You can try to order the Segments on the different column and re-execute the script to see what combination and alignment percentage it will show.
Feel free to play with this script and let me know about the results.

to be continued with Columnstore Indexes – part 58 (“String Predicate Pushdown”)

6 thoughts on “Columnstore Indexes – part 57 (“Segment Alignment Maintenance”)

  1. Hugo Kornelis

    Hi Niko,

    There is actually a simpler way to detect overlapping segments. Key is to first think about the scenario’s of non-overlapping: either the other segment ends before the current segment starts, or the other segments starts after the current segment ends. All other scenarios are then overlap.
    In T-SQL terms, the condition for overlap can be written as

    WHERE seg.hobt_id = otherSeg.hobt_id
    AND seg.partition_id = otherSeg.partition_id
    AND seg.column_id = otherSeg.column_id
    AND seg.segment_id otherSeg.segment_id
    AND seg.min_data_id otherSeg.min_data_id

    I expect this to perform a bit better as well (but you’ll need a REALLY huge table before you get so many rows in sys.column_store_segments that you will notice the difference)

    1. Niko Neugebauer Post author

      Hey Hugo,

      Interesting. Currently in the CISL, the algorithm used is different to the one posted originally in this blog post. Looking at your suggestion I would guess that they are equal or extremely similar. 🙂
      I start to believe, I should go back to my elder posts and update them – but generally I am too lazy for that.

      Best regards,
      Niko Neugebauer

  2. Alex

    Thank you for your article.
    When you drop the existing index to create the clustered (Rowstore) index on DateKey, that is to reorder the physical data by the DateKey right?
    In my own environment for some reason, when I follow this step with dropping the index and creating the columnstore index, I am not achieving the segment alignment as you have. In fact I have 90 to 100 dealignment, even though I am first creating a clustered index. Can you recommend what I look into?

      1. Alex

        Yes, I did notice that for higher DOP than 1 then the CCI build will not keep the ordering, thank you for writing about that. I have no dictionaries and I did set MAXDOP = 1, so memory may be where I’m having a problem – I only have 8GB.

Leave a Reply

Your email address will not be published. Required fields are marked *