Clustered Columnstore Indexes – part 33 (“Tuple Mover that closes open Delta-Stores”)

Continuation from the previous 32 parts, starting from https://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

We all love and use Columnstore Segments, but we all kind of scared of the open Delta-Stores (yes there can be more than 1).
Why would that be?
– because they are uncompressed heaps which most probably do not benefit from the Batch Mode at all, and even more probably slowing it down;
– because there is no way of how we can measure their size definitely;

What can we do about it:
1. Rebuild the Index – this operation will definitely close all open Delta-Store, shuffle the data and after some time (depending on a number of factors) it will present us a fully “Segmented” Table :) It might take long time, depending if we have enough partitioning and are using
2. Wait until we can load more data through the normal API (BULK Load API would create another closed Segment without touching on the Delta-Store), filling out the maximum capacity of a Row Group (1.045.678 Rows)
3. Cry. :)

Reorganize with (COMPRESS_ALL_ROW_GROUPS = ON)

Consider an undocumented (at the moment of the writing) hint COMPRESS_ALL_ROW_GROUPS = ON which so far I understand looks for any open Delta-Stores, closes them as they are, without shuffling any additional data and compresses them immediately.
This is our old good friend Tuple Mover, but which corrects any undesired situations. :)

For the demo, I shall be using a freshly restored version of the good old & free Contoso Retail DWH database:

-- Let us create a table which copies structure of FactOnlineSales
CREATE TABLE [dbo].[FactOnlineSales_Test](
	[OnlineSalesKey] [int] NOT NULL,
	[DateKey] [datetime] NOT NULL,
	[StoreKey] [int] NOT NULL,
	[ProductKey] [int] NOT NULL,
	[PromotionKey] [int] NOT NULL,
	[CurrencyKey] [int] NOT NULL,
	[CustomerKey] [int] NOT NULL,
	[SalesOrderNumber] [nvarchar](20) NOT NULL,
	[SalesOrderLineNumber] [int] NULL,
	[SalesQuantity] [int] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[ReturnQuantity] [int] NOT NULL,
	[ReturnAmount] [money] NULL,
	[DiscountQuantity] [int] NULL,
	[DiscountAmount] [money] NULL,
	[TotalCost] [money] NOT NULL,
	[UnitCost] [money] NULL,
	[UnitPrice] [money] NULL,
	[ETLLoadID] [int] NULL,
	[LoadDate] [datetime] NULL,
	[UpdateDate] [datetime] NULL
) ON [PRIMARY]

-- Create a Clustered Columnstore Index on it
create clustered columnstore index CCI_FactOnlineSales_Test
	on dbo.FactOnlineSales_Test;

Now let’s load 100.000 Rows, under the magic number of 102.400 in order to keep an open Delta-Store:

insert into dbo.FactOnlineSales_Test
select top 100000 *
	from dbo.FactOnlineSales;

Now we can verify that we have an open Delta-Store at our table:

select object_name(object_id) as TableName, * from sys.column_store_row_groups
	where object_id = object_id('FactOnlineSales_Test');

This is what I see at my VM, we have an open Delta-Store, with 100.000 rows in it.
Screen Shot 2014-07-04 at 20.05.46

Now we can go ahead and invoke our Tuple Mover with a hint COMPRESS_ALL_ROW_GROUPS in order to close & compress our Delta-Store:

alter index CCI_FactOnlineSales_Test on FactOnlineSales_Test
	Reorganize with (COMPRESS_ALL_ROW_GROUPS = ON);

We need to consult the situation with the row groups again:

select object_name(object_id) as TableName, * from sys.column_store_row_groups
	where object_id = object_id('FactOnlineSales_Test');

And here we go:
Compressed_Row_Group_after_Reorganize_with_ COMPRESS_ALL_ROW_GROUPS _hint

I consider this to be an extremely valuable hint, which in the case of a bug with Delta-Store allows us to correct the situation or should we be getting a lot of queries hitting an open Delta-Store, to compress it and to remove the pressure.

Final thoughts:
– I am wondering if when loading the data we could take a good usage in order to get the desired Row Group sizes … :)
– Are there any methods to prevent automated Segment creation for BULK Load API processes with more then 102.400 rows…?

to be continued with Clustered Columnstore Indexes – part 34 (“Deleted Segments Elimination”)

6 thoughts on “Clustered Columnstore Indexes – part 33 (“Tuple Mover that closes open Delta-Stores”)

  1. Stijn Wynants

    Niko,

    While executing this on the new SQL Server 2016 I notice that even with the COMPRESS_ALL_ROW_GROUPS = ON statement, it will also start reshuffling the existing row groups. Has there been any changes to the statement? Is it still possible on SS2016 SP1 to force it to only close the open delta stores with some parameter?

    1. Niko Neugebauer Post author

      Hi Stijn,

      on 2016 ALTER INDEX … REORGANIZE executes all operations, including self-merge and inter-group merge.
      So far i know there is no way to close Delta-Stores only without avoiding other operations.

      Best regards,
      Niko Neugebauer

  2. Joe Obbish

    Hi Niko!

    I hope that this message finds you well. In your final thoughts section you asked if there are any methods to insert more than 102399 rows while avoiding the bulk load API. I know that this is an old post, but just in case it’s still useful you can try a MAX_GRANT_PERCENT hint of 0 or writing the query in such a way that the cardinality estimate for the insert is less than 251 rows. In the first case the query executes with a memory grant of 0 KB and in the second case it executes without even asking for a memory grant for the columnstore compression. Of course there can be other consequences to the query if it’s more than just a simple INSERT.

    – Joe Obbish

  3. Greg

    Hi Niko ,
    With the Tuple Mover back ground process will it every move the the delta’s rows to the column store by itself? We have a situation where we are loading around 5000 rows per minute into a columnstore table. These rows never seem to be moved from the Deltastore to the columnstore.
    To fix this we run a reorg on the index .
    To me it seems strange that this wouldn’t be triggered once it past the 102k threshold.

    cheers
    Greg

    1. Niko Neugebauer Post author

      Hi Greg,

      did you check if your table uses COMPRESSION_DELAY property as in Columnstore Indexes – part 76 (“Compression Delay”) [http://www.nikoport.com/2016/02/04/columnstore-indexes-part-76-compression-delay/] ?
      This is a common case for such situations.

      Best regards,
      Niko

Leave a Reply to Niko Neugebauer Cancel reply

Your email address will not be published. Required fields are marked *