Clustered Columnstore Indexes – part 22 (“Invisible Row Groups”)

Continuation from the previous 21 parts of this series.

Last week, at an amazing 2013 edition of PASSCamp in Germany, one of the attendees called my attention to the fact that during the hands-on labs he had managed to get an Invisible Row Group. I was in the middle of my demos and did not get the message at first – I thought it was an unexpected Row Group that he had managed to see, but as a matter of fact it was something different.

Indeed, it was something called an Invisible Row Group. I can only say that I was extremely excited about it, since I had never managed to see one in my own tests and demos. I am referring to the very special moment in the life of any Delta-Store when it is about to become a compressed Segment. If you are running tests during the period when the Tuple Mover kicks in, you have a good chance of finding one – on CTP2 for sure.

As soon as I had a couple of free minutes, I managed to recreate the conditions, which I will describe below.
Let's kick off by creating an empty table with a Clustered Columnstore Index:

create table dbo.MaxDataTable(
	c1 bigint,
	c2 numeric (36,3),
	c3 bit,
	c4 smallint,
	c5 decimal (18,3),
	c6 smallmoney,
	c7 int,
	c8 tinyint,
	c9 money,
	c10 float(24),
	c11 real,
	c12 date,
	c13 datetimeoffset,
	c14 datetime2 (7),
	c15 smalldatetime,
	c16 datetime,
	c17 time (7),
	c18 char(100),
	c19 varchar(100),
	c20 nchar(100),
	c21 nvarchar(100),
	c22 binary(8),
	c23 varbinary(8),
	c24 uniqueidentifier );

-- Let's create a Clustered Columnstore Index:
Create Clustered Columnstore Index 
	CC_MaxDataTable on dbo.MaxDataTable;

I will insert 2 new rows and compress this Row Group:

insert into dbo.MaxDataTable
	default values;

insert into dbo.MaxDataTable
	default values;

alter table dbo.MaxDataTable
	rebuild;

Let us insert 10.000 rows and update them immediately:

-- Insert 10.000 rows
insert into dbo.MaxDataTable
	default values;
GO 10000

-- *****************************************************************
-- Lets Update all rows with some data 
with updTable as (
	select *
		, row_number() over(partition by C1 order by C1) as rnk
		from dbo.MaxDataTable
)
update updTable
	set C1 = rnk,
		C2 = ABS(CHECKSUM(NewId())) % 142359.0,
		C3 = 1,
		C4 = cast(ABS(CHECKSUM(NewId())) % 10000 as smallint),
		C5 = ABS(CHECKSUM(NewId())) % 242359.0,
		C6 = rnk / ABS(CHECKSUM(NewId())) % 242359.0,
		C7 = ABS(CHECKSUM(NewId())) % 2000000,
		C8 = ABS(CHECKSUM(NewId())) % 255,
		C9 = rnk / ABS(CHECKSUM(NewId())) % 242359.0,
		C10 = rnk / ABS(CHECKSUM(NewId())) % 242359.0,
		C11 = rnk / ABS(CHECKSUM(NewId())) % 242359.0,
		C12 = getDate(),
		C14 = DATEADD(day, (ABS(CHECKSUM(NEWID())) % 65530), 0),
		C15 = DATEADD(day, (ABS(CHECKSUM(NEWID())) % 65530), 0),
		C16 = DATEADD(day, (ABS(CHECKSUM(NEWID())) % 65530), 0),
		C17 = getDate(),
		C18 = cast( ABS(CHECKSUM(NewId())) % 242359.0 as char(25)),
		C19 = cast( ABS(CHECKSUM(NewId())) % 232659.0 as varchar(25)),
		C20 = cast( ABS(CHECKSUM(NewId())) % 242359.0 as nchar(25)),
		C21 = cast( ABS(CHECKSUM(NewId())) % 232659.0 as nvarchar(25));

Before we advance to the next step, please use the following script to monitor the state of the Row Groups in our table:

SELECT object_name(i.object_id) AS TableName,
		i.name AS IndexName, i.index_id, i.type_desc,
		rg.row_group_id, rg.state, rg.state_description,
		rg.total_rows, rg.deleted_rows,
		cast(100.0 * rg.total_rows / 1048576 as Decimal(6,3)) as PercentFull,
		cast(100.0 * isnull(rg.deleted_rows, 0) / iif(rg.total_rows = 0, 1, rg.total_rows) as Decimal(6,3)) AS PercentDeleted
	FROM sys.indexes AS i
	INNER JOIN sys.column_store_row_groups AS rg
		ON i.object_id = rg.object_id
		AND i.index_id = rg.index_id
	WHERE object_name(i.object_id) = 'MaxDataTable'
		--and rg.state = 0
	ORDER BY object_name(i.object_id), rg.row_group_id;


It's time to insert an extra Row Group's worth of data – 1.048.576 rows:

-- Insert 1.048.576 rows in order to get the Delta-Store full
declare @i as int;
declare @max as int;
select @max = isnull(max(C1),0) from dbo.MaxDataTable;
set @i = 1;

begin tran
while @i <= 1048576
begin
	insert into dbo.MaxDataTable
		default values;

	set @i = @i + 1;
end
commit;

Now you should immediately invoke the Tuple Mover manually, while checking the status of the Row Groups with the script above:

alter index [CC_MaxDataTable] on dbo.MaxDataTable
	reorganize;

This is a screenshot of the result that I have managed to capture:
[Screenshot: Row Group states, including one Row Group with state 0 – INVISIBLE]

There are 4 Row Groups visible, with 1 of them having a state equal to 0 (zero) and a state_description of "INVISIBLE". This is clearly the moment when our closed Delta-Store, identified by row_group_id = 1 (state_description = "CLOSED"), is being compressed.
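For reference, here is a minimal sketch of a query that shows how the state column relates to state_description in sys.column_store_row_groups (assuming the same dbo.MaxDataTable used above):

```sql
-- In SQL Server 2014, the state column maps to state_description as:
--   0 = INVISIBLE, 1 = OPEN, 2 = CLOSED, 3 = COMPRESSED
select rg.row_group_id, rg.state, rg.state_description, rg.total_rows
	from sys.column_store_row_groups as rg
	where rg.object_id = object_id('dbo.MaxDataTable')
	order by rg.row_group_id;
```

Running this repeatedly while the Tuple Mover is working is the easiest way to catch a Row Group in state 0.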

How can I be sure? Very easily: if we keep executing our script for checking on the Row Groups, we should be able to find that our Row Group with id = 2 is gone:
[Screenshot: Row Group states after the compression completes]

A very clear demonstration of how things function in Clustered Columnstore. :)

This raises another important issue for all future scripts consulting the sizes of Clustered Columnstore Indexes – a predicate with state > 0 should be included in order to avoid double counting the number of rows inside the Clustered Columnstore Index.
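As a sketch, a row-count query along these lines (against the same sys.column_store_row_groups DMV used above) would exclude the INVISIBLE Row Groups, whose rows are still counted in the Delta-Store being compressed:

```sql
-- Count rows per Row Group state, excluding INVISIBLE (state = 0)
-- Row Groups to avoid double counting while the Tuple Mover is running
SELECT rg.state_description,
		SUM(rg.total_rows - ISNULL(rg.deleted_rows, 0)) AS VisibleRows
	FROM sys.column_store_row_groups AS rg
	WHERE rg.object_id = object_id('dbo.MaxDataTable')
		AND rg.state > 0	-- filter out INVISIBLE Row Groups
	GROUP BY rg.state_description;
```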

to be continued with Clustered Columnstore Indexes – part 23 ("Data Loading")

4 thoughts on “Clustered Columnstore Indexes – part 22 (“Invisible Row Groups”)”

  1. Michael Zilberstein

    Thanks for this great series! I’ve been exploring the topic myself and also stumbled upon Invisible row-groups. I’ve submitted a Connect item about it:

    Microsoft answered that this type wasn’t supposed to be seen by users, but since it evidently can be seen, they’ve documented it now. So now this “invisible” status is “visible” :-)

  2. Suman

    Any idea how to find the number of rows going into the delta store? I am running a load from an SSIS package using “Fast Load” to an OLE DB Destination, and the row group state changes from OPEN to CLOSED to COMPRESSED. But when the load is done using “INSERT INTO SELECT * FROM”, the row group state moves from INVISIBLE to COMPRESSED, and there is a big performance difference between the 2 methods, of which the latter is significantly faster…

    1. Niko Neugebauer Post author

      Hi Suman,

      it seems to be a bug; I have submitted a Connect Item –
      In order to use the BULK LOAD API in an SSIS Data Flow, please use the “SQL Server Destination” for your table. It will support the Bulk Load API for Data Flows inserting more than 102.400 rows, as explained here –
      I actually consider that it might be by design, since the SQLNCLI version was not changed from SQL Server 2012 to 2014.
      I have seen enough references from the Columnstore developers that the OLE DB Destination should work with the Bulk Load API, but I hope to find a definitive answer soon.

      Best regards,
      Niko Neugebauer
