Clustered Columnstore Indexes – part 21 (“DBCC CSIndex”)

Continuation from the previous 20 parts, starting from https://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

There is a new kid on the block for Columnstore Indexes (Clustered & Nonclustered), which allow us to take a deep look into the internal stuff of the Columnstore Indexes. This one is a kind of newer brother to the good old DBCC Page function, which a lot of SQL Server professionals try to use at one occasion or another.
One absolutely awesome detail about it – is that it exists since SQL Server 2012 actually. Yes, that means that you can actually run this command on the good old Nonclustered Columnstore Indexes on SQL Server 2012. :)

I would be glad to claim the authorship of this discovery, but in the reality it was the awesome Dmitri Pilugin that already wrote about it a lot of time ago.

So the name of this command is simply – DBCC CSIndex and here is the known syntax for this command:

DBCC CSIndex (
    {'dbname' | db_id}, 
    rowsetid, -- HoBT or PartitionID from sys.column_store_segments
    columnid, -- column_id from sys.column_store_segments
    rowgroupid, -- segment_id from sys.column_store_segments
    object_type, -- 1 (Segment), 2 (Dictionary),
    print_option -- {0 or 1 or 2}; No idea what is the difference between those values.
    [, start]
    [, end]
)

I am keep on playing with Contoso BI database while using SQL Server 2014 CTP 2.

To start I am running the necessary queries to create the Clustered Columnstore Index on the dbo.FactOnlineSales table.

ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [PK_FactStrategyPlan_StrategyPlanKey];

ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCurrency];
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCustomer];
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimDate];
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimProduct];
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimPromotion];
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimStore];

Create Clustered Columnstore Index PK_FactOnlineSales
	on dbo.FactOnlineSales;

Now let us get the necessary data to make the DBCC CSIndex command running:

-- Get HoBT or PartitionID
select top 1 DB_ID(), seg.HOBT_ID from sys.column_store_segments seg
	inner join sys.partitions as p 
		ON seg.partition_id = p.partition_id
	where OBJECT_id = OBJECT_ID('FactOnlineSales');

We are ready, lets run the command:

-- Print DBCC output into a new Window
dbcc traceon (3604);

-- Here ... We ... Go ... :)
dbcc csindex (
    7,  -- DB_ID()
    72057594048282624, -- HoBT or PartitionID
    18, -- column_id
    0, -- segment_id from sys.column_store_segments
    1, -- 1 (Segment), 2 (Dictionary),
    0 -- [0 or 1 or 2]
);

Here comes the beginning of the output that I have executed on my table (note that your output might be quite different):
Segment Attributes:

Version = 1                         encodingType = 2                    hasNulls = 0
BaseId = -1                         Magnitude = -1.000000e+000          PrimaryDictId = 0
SecondaryDictId = -1                MinDataId = 4800                    MaxDataId = 10602200
NullValue = -1                      OnDiskSize = 318584                 RowCount = 1048576

RLE Header:

Lob type = 3                        RLE Array Count (In terms of Native Units) = 1587
RLE Array Entry Size = 8            
RLE Data:

Index = 0                           Value = 427                         Count = 252747
Index = 1                           Value = 324                         Count = 5279
Index = 2                           Bitpack Array Index = 0             Count = 2548
Index = 3                           Value = 438                         Count = 79
Index = 4                           Bitpack Array Index = 2548          Count = 264
...

First of all we really see all the data that we can already find in the DMVS, such as Version, Base_Id, MinDataId, MaxDataId, etc.
But in the 2nd paragraph the really interesting stuff is appearing:

Lob type – It looks that Microsoft is using different type of LOBs when storing the data
RLE – Run Length Encoding, the encoding that is being used to create the compression
Bitpack Array Index – Bitpack algorithm, used for compression.
… and so on and so forth.
I just hope that there will be someone from the product team that will explain all the output of this function soon, but realistically saying I am not expecting it to happen really soon.

Feel free to try and explore this function, but please do it only on the test VMs, since you will easily manage to produce a good amount of errors.

to be continued with Clustered Columnstore Indexes – part 22 (“Invisible Row Groups”)

Leave a Reply

Your email address will not be published. Required fields are marked *