Clustered Columnstore Indexes – part 34 (“Deleted Segments Elimination”)

Continuation from the previous 33 parts, starting from https://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

Segment Elimination for Deleted Row Groups

After some very interesting and inspiring conversations during SQLBits, I came to an idea of testing a Segment Elimination while working with a fully deleted Segment:

Imagine that you are regularly updating your Columnstore Table and it reaches a certain moment when all data inside a compressed Segment is deleted.
What happens to the Segment elimination process executed by Query Optimiser ?
Will it check on the meta-information level if each of the present rows inside a particular Segment are eliminated and thus reading this segment does not make any sense at all ?

Read this blog post to find it out:

Let us kick of with the a simple setup script, that will create a simple table with just 1 column, which shall be defined as an identity by default.

create table dbo.MaxDataTable(
	c1 bigint identity(1,1));

— Create a Clustered Columnstore Index:
create clustered columnstore index PK_MaxDataTable
	on dbo.MaxDataTable;

— Insert 2 Sequential segments full of default values
declare @i as int;
declare @max as int;
select @max = isnull(max(C1),0) from dbo.MaxDataTable;
set @i = 1;

begin tran
while @i <= 1048576*2
begin
	insert into dbo.MaxDataTable
		default values

	set @i = @i + 1;
end;
commit;

— Invoke Tuple Mover and for all Delta Stores Compression:
alter index PK_MaxDataTable on dbo.MaxDataTable
	Reorganize with (COMPRESS_ALL_ROW_GROUPS = ON);

Now we can check the situation with our Row Groups, and for this purpose I shall use the following script:

-- Clustered Columnstore Indexes Analysis
SELECT seg.segment_id, min_data_id, max_data_id, row_count, deleted_rows
	,state_description, data_compression_desc
	--,*
	FROM sys.column_store_segments AS seg 
		INNER JOIN sys.partitions AS p 
			ON seg.hobt_id = p.hobt_id 
		INNER JOIN sys.column_store_row_groups rg
			ON p.object_id = rg.object_id and seg.segment_id = rg.row_group_id
	WHERE p.object_id = object_id('MaxDataTable')
		and seg.column_id = 1 
	order by seg.segment_id

Row Groups after initial Load As you can see, we have a perfect column clustering on our C1 column, with numbers sequentially occupying their place between 2 compressed Segments.

Now we can test our Table by invoking a test Query for determining if Segment Elimination is working as expected in a situation without any deleted rows.

-- Turn on traces 3605 & 646
dbcc TraceOn(3605,-1);
dbcc TraceOn(646,-1);
dbcc ErrorLog;
go

-- A simple query
select C4, count(C1) 
	from dbo.MaxDataTable 
		where C1 = 10
	group by C4;
go

-- Turn off both traces
dbcc TraceOff(3605,-1);
dbcc TraceOff(646,-1);

-- Read error log
exec sys.xp_readerrorlog

Now we can test our Table by invoking a test Query for determining if Segment Elimination is working as expected in a situation without any deleted rows:
ErrorLog with Segment Elimination
Everything looks fine our second Row Group (id =3) was eliminated from the final result, this is exactly what we have expected.

Lets Delete half of the available rows, all those which do belong to the first of our 2 Segments:

delete from dbo.MaxDataTable
	where C1 <= 1048576;

We need to determine if anything has changed in our meta-information, and so let us run a query to check on the compressed Segments:

-- Clustered Columnstore Indexes Analysis
SELECT seg.segment_id, min_data_id, max_data_id, row_count, deleted_rows
	,state_description, data_compression_desc
	--,*
	FROM sys.column_store_segments AS seg 
		INNER JOIN sys.partitions AS p 
			ON seg.hobt_id = p.hobt_id 
		INNER JOIN sys.column_store_row_groups rg
			ON p.object_id = rg.object_id and seg.segment_id = rg.row_group_id
	WHERE p.object_id = object_id('MaxDataTable')
		and seg.column_id = 1 
	order by seg.segment_id

We need to determine if anything has changed in our meta-information, and so let us run a query to check on the compressed Segments:
Row Groups after deleting first Segment data Quite visible that in our DeletedBitmap for the first of our Segments we have the very same amount of rows that are located in our first Segment - 1048567. This is means that our first Segment has become practically useless, because it does not contain any data at all (all rows are now deleted).
The first question naturally will be what happens with this Segment – and the answer is Nothing. It will stay this way until we shall rebuild our table or partition containing the respective Segment.
I will get back to this topic in the end of this blog post.

For now let us invoke our test query again with the same predicate id (10) again, to see if we can get some Segment Elimination out of Query Optimiser:

-- Turn on traces 3605 & 646
dbcc TraceOn(3605,-1);
dbcc TraceOn(646,-1);
dbcc ErrorLog;
go

-- A simple query
select count(C1) 
	from dbo.MaxDataTable 
		where C1 = 10
go

-- Turn off both traces
dbcc TraceOff(3605,-1);
dbcc TraceOff(646,-1);

-- Read error log
exec sys.xp_readerrorlog

Error Log with no success at Deleted Segment Elimination Oh well, we are not very lucky, Query Optimiser has ignored only the second segment while reading the first one, where the value 10 should be contained, but only in this case as we have seen, each value of the Segment has already been marked as deleted in Deleted Bitmap.

Final Thoughts

I imagine that it might be considered quite a waste of time to check deleted bitmaps information and if a Segment has become useless, even though I could argue that such checks by default for DataWarehousing queries are not that expensive, given the nature and specificity of the long running queries, especially because in such environments there are not a big number of them.

As an alternative, we need some kind of mechanism which would allow us to delete those Segments without invoking REBUILD functionality, because at the moment we would automatically loose our well prepared Segment Clustering and because it might not make any sense at all.

I would be more than happy to have some kind of hint for Tuple Mover command, such as:
ALTER INDEX REORGANIZE with (REMOVE_ALL_DELETED_SEGMENTS = ON);
which would analyse such Segments and remove them completely.
I hope that this blog post shall receive an update soon … :)

Update on 28th of July 2014: A Connect item has been filed suggesting some improvements into the maintenance procedures for Columnstore Indexes.

to be continued with Clustered Columnstore Indexes – part 35 ("Trace Flags & Query Optimiser Rules")

Leave a Reply

Your email address will not be published. Required fields are marked *