CISL 1.1.0: cstore_doMaintenance (Columnstore Maintenace Solution Explained)

After the initial release of the SQL Server Maintenance Solution in CISL 1.1.0, I felt that a separate blog post that would explain the current parameters and the way to use it are needed.
Responding to this inner calling, here is the blog post that will guide you into the basics of the maintenance solution for Columnstore Indexes in SQL Server 2014.

Introduction

With the release of CISL 1.1.0, I have included the very first version of the maintenance script for SQL Server 2014.
The current version has a lot of configuration parameters, but the number of options will be expanded in the following versions as I will be finalising some of the more advanced functionalities.
Current implementation of the cstore_doMaintenance stored procedure is based on other CISL function and requires them to be installed in order to function correctly.
While very common and easy to develop, I recognise that it’s performance right now is pretty far away from I want it to be. In the future versions I will invest time to make it work faster and independently from other CISL stored procedures.

Right now you can maintain any Clustered Columnstore Index based on the different types of logical fragmentations (for more information please read Columnstore Indexes – part 36 (“Maintenance Solutions for Columnstore”) & Columnstore Indexes – part 62 (“Row Groups Trimming”)),
you can force Segment Algiment,
should you work on a SQL Server Instance, where Tuple Mover is compressed – by default the procedure will detect this situation and invoke it manually,
you can force the closure of open Delta-Stores (there is a parameter for that),
and should your table suffer from the Dictionary pressure, then the default parameter will simply ignore and not rebuild it (but of course you can override that with another parameter),
and much more!

The procedure cstore_doMaintenance will allow you to get rid of the deleted data (it is not done automatically), and to maintain your Columnstore Index fit to perform at the maximum speed by taking advantage of the Segment Elimination.

Current Limitations:
SQL Server 2014 – this limitation will be removed in the 1.1.x series in the next 2 months, as I am preparing the SQL Server 2012 version.
I could obviously release a limited SQL Server 2016 version, but I want to concentrate on bringing it to a good level first, where taking advantage of the Row Group Merging is automatic, and all possible scenarios of the combination of the SQL Server 2016 technologies are recognised.
Only Clustered Columnstore Indexes are supported – this limitation should be lifted soon, together with the release of SQL Server 2012 version of the maintenance script.
Segment Clustering on Partition Level – is not yet supported. Coming out together with SQL Server 2016.

Installation

The stored procedure dbo.cstore_doMaintenance can take advantage of the 3 separate tables (cstore_MaintenanceData_Log, cstore_Operation_Log, cstore_Clustering) that will allow you to do better maintenance. The first 2 tables will guard the logging of the columnstore table analysis

The following parameter in the cstore_doMaintenance.sql file will allow you to control if the tables are created and if the cstore_Clustering table is populated with the default values for the current existing Clustered Columnstore Indexes or not.

declare @createLogTables bit = 1;

By default the script will create those 3 tables inside your database. Those tables are needed for logging and for the advanced functionalities.
For example if you want to take advantage of the Segment Clustering, you will need to have table cstore_Clustering existing and well-configured.

Configuration

During the setup if you enable parameter @createLogTables, besides creating dbo.cstore_Clustering table, the setup script will take all existing Clustered Columnstore Indexes and will store the names of the tables together with the partition ids (should there be no partitioning, the default partition_id of 1 will be stored).

As you might have noticed, I am using the script from not yet released or committed version 1.1.1, that contains a couple of improvements :)

-- Configuration table for the Segment Clustering
	create table dbo.cstore_Clustering(
		TableName nvarchar(256)  constraint [PK_cstore_Clustering] primary key clustered,
		Partition int,
		ColumnName nvarchar(256)
	);

	IF OBJECT_ID('tempdb..#ColumnstoreIndexes') IS NOT NULL
		DROP TABLE #ColumnstoreIndexes;

	create table #ColumnstoreIndexes(
		[id] int identity(1,1),
		[TableName] nvarchar(256),
		[Type] varchar(20),
		[Partition] int,
		[Compression Type] varchar(50),
		[BulkLoadRGs] int,
		[Open DeltaStores] int,
		[Closed DeltaStores] int,
		[Compressed RowGroups] int,
		[Total RowGroups] int,
		[Deleted Rows] Decimal(18,6),
		[Active Rows] Decimal(18,6),
		[Total Rows] Decimal(18,6),
		[Size in GB] Decimal(18,3),
		[Scans] int,
		[Updates] int,
		[LastScan] DateTime
	);

	insert into #ColumnstoreIndexes
		exec dbo.cstore_GetRowGroups @indexType = 'CC', @showPartitionDetails = 1;

	insert into dbo.cstore_Clustering( TableName, Partition, ColumnName )
		select TableName, Partition, NULL 
			from #ColumnstoreIndexes ci
			where TableName not in (select clu.TableName from dbo.cstore_Clustering clu);

Should you need to add more tables, because you have created more Clustered Columnstore Indexes, all you need is to execute the following part, which will include all new tables into the dbo.cstore_Clustering table:

insert into dbo.cstore_Clustering( TableName, Partition, ColumnName )
		select TableName, Partition, NULL 
			from #ColumnstoreIndexes ci
			where TableName not in (select clu.TableName from dbo.cstore_Clustering clu);

Parameters

It is important to note that the default parameters will not execute anything.
I have decided to advance with this implementation for the 1.1.x series, so that people who start playing with Columnstore Indexes won’t get into major troubles like accidentally rebuilding 1TB table or something similar :)

@execute bit = 0 – controls if the maintenace is executed.

@orderSegments bit = 0 – controls whether Segment Clustering is being applied or not. You will need to have table cstore_Clustering created and configured. For more details, please consult the previous item in this blog post – the configuration.

@executeReorganize bit = 0, – controls if the Tuple Mover is being invoked or not. By enabling this parameter and disabling @execute you can simply run Tuple Mover on your database.

@closeOpenDeltaStores bit = 0 – controls if the Open Delta-Stores are closed and compressed. Open Delta-Stores are those which have not reached the maximum number of rows – 1048567.

@usePartitionLevel bit = 1 – controls if whole table is maintained or the maintenance is done on the partition level.

@tableName nvarchar(max) = NULL – this parameter allows to filter out only a particular table by it’s name.

@useRecommendations bit = 1 – activates internal optimizations for a more correct maintenance proceedings. In the current version, this parameter controls if procedure will look for the active Trace Flag 634 and automatically activates Tuple Mover for compressing closed Delta-Stores, but in the upcoming releases a lot of intelligence will be connected to this parameter functionality.

@maxdop tinyint = 0 – allows to control the maximum degreee of parallelism. Notice that factual DOP will be lowered to the maximum settings in Resource Governor and the used DOP might be lowered even further, depending on the amount of free memory available.

@logData bit = 1 – controls if functionalites are being logged into the logging tables cstore_MaintenanceData_Log, cstore_Operation_Log. By default it is enabled, but should logging tables not exist, then no arrow will happen.

@debug bit = 0 – this parameter prints out the debug information and the commands that will be executed if the @execute parameter is set to 1. It is mostly used by me in the development

@minSegmentAlignmentPercent tinyint = 70 – sets the minimum alignment percentage, after which the Segment Alignment is forced. This parameter requires @orderSegments = 1, and the configuration table cstore_Clustering to exist and to be well-configured.

@logicalFragmentationPerc int = 15 – defines the maximum logical fragmentation for the Rebuild. This is simply the percentage of the data that is marked as deleted in the Deleted Bitmap. This parameter is influenced by the recognition of the partitions or the whole index with the usage of the @usePartitionLevel parameter.

@deletedRGsPerc int = 10 – defines the maximum percentage of the Row Groups that can be marked as Deleted. This parameter is influenced by the recognition of the partitions or the whole index with the usage of the @usePartitionLevel parameter.

@deletedRGs int = NULL – defines the maximum number of Row Groups that can be marked as Deleted before Rebuild. NULL means that this parameter is to be ignored. This parameter is influenced by the recognition of the partitions or the whole index with the usage of the @usePartitionLevel parameter.

@trimmedRGsPerc int = 30 – defines the maximum percentage of the Row Groups that are trimmed (not full). This parameter is influenced by the recognition of the partitions or the whole index with the usage of the @usePartitionLevel parameter.

@trimmedRGs int = NULL – defines the maximum number of the Row Groups that are trimmed (not full). NULL means to be ignored. This parameter is influenced by the recognition of the partitions or the whole index with the usage of the @usePartitionLevel parameter.

@minAverageRowsPerRG int = 550000 – defines the minimum average number of rows per Row Group for triggering Rebuild. This parameter is influenced by the recognition of the partitions or the whole index with the usage of the @usePartitionLevel parameter.
The default value for this parameter is set to 550.000 rows, which is very low. In the upcoming versions there will be some important changes here.

@maxDictionarySizeInMB Decimal(9,3) = 10. – defines the maximum size of a dictionary to determine the dictionary pressure and avoid rebuilding. This parameter is influenced by the recognition of the partitions or the whole index with the usage of the @usePartitionLevel parameter.

@ignoreInternalPressures bit = 0 – Allows to execute rebuild of the Columnstore, while ignoring the signs of memory & dictionary pressures

Examples

Let’s consider a couple of examples:
First I will use a freshly restored copy of the free sample ContosoRetailDW database, which backup I am storing in C:\Install and the data files are to be stored in C:\Data folder:

USE [master]
 
alter database ContosoRetailDW
    set SINGLE_USER WITH ROLLBACK IMMEDIATE;
 
RESTORE DATABASE [ContosoRetailDW] 
    FROM  DISK = N'C:\Install\ContosoRetailDW.bak' WITH  FILE = 1,  
        MOVE N'ContosoRetailDW2.0' TO N'C:\Data\ContosoRetailDW.mdf', 
        MOVE N'ContosoRetailDW2.0_log' TO N'C:\Data\ContosoRetailDW.ldf',  
        NOUNLOAD,  STATS = 5;
 
alter database ContosoRetailDW
    set MULTI_USER;
GO

GO
ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 120
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0', SIZE = 2000000KB , FILEGROWTH = 128000KB )
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0_log', SIZE = 400000KB , FILEGROWTH = 256000KB )
GO

Since this is SQL Server 2014, we will need to drop all foreign & primary keys:

use ContosoRetailDW;

ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCurrency]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCustomer]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimDate]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimProduct]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimPromotion]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimStore]

ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [PK_FactOnlineSales_SalesKey];

Let’s create a clustered columnstore index on our table:

create clustered columnstore Index PK_FactOnlineSales
	on dbo.FactOnlineSales 

As the next step, I have installed all the CISL stored procedures, by executing the script cstore_install_all_stored_procs.sql, and from now on I can analyse my columnstore table and do maintenance on it.

As the first step, before advancing I will check on the fragmentation of my table, by invoking store_GetFragmentation procedure:

exec dbo.cstore_GetFragmentation;

FactOnlineSales Default Fragmentation
As you can see a default creation of the Columnstore Index on my test table has left some space for optimisations. This happened because the Columnstore Index was build by using more then 1 core.

Because this table has no deleted rows, it’s logical fragmentation is equals to 0 (zero). It is also related to the number of Deleted Row Groups and their percentage.

If you look at the logical trimmed row groups percentage, you will notice that it is lower then the default value for the @trimmedRGsPerc of 30 and should we execute cstore_doMaintenance function, it will not do any maintenance to the table based on this criteria.
The other maintenance criteria related to the number of the trimmed row groups is disabled by default – trimmedRGs int = NULL, and so without any additional parameters it will be ignored as well.

The average number of rows per Row Group is 971.354, which is very well above 550.000 – hence there is no maintenance need as well.
Given that there are just 13 Row Groups with such a high average number of rows, there is no Row Group that can be totally optimised (12627608 / 12 = 1052300.66(6) and this number is above needed 1048576 rows, meaning that we can’t put all the data into just 12 Row Groups).

To see a reasonable maintenance let’s introduce some logical fragmentation by deleting 1 million rows from the table:

delete top (1000000)
	from dbo.FactOnlineSales;

Let’s see the fragmentation status of our table:

exec dbo.cstore_GetFragmentation;

FactOnlineSales after 1M Rows Deleted Fragmentation
Now you can see that we have 1 Row Group that has been completely deleted, that our deleted rows represent 13.89% of the total number of rows in the table.

At this point we are ready to do some maintenance, and for that I will be forcing the logging of the processed information, plus I will activate the logging:

exec dbo.cstore_doMaintenance @execute = 1, @logData = 1, @debug = 1;

It will take under a second to run the procedure, which will return 2 tables with the results, doing no maintenance whatsoever.
cstore_doMaintenance output 1
(Notice that the second result set has a very long list of columns).
This happens because no maintenance thresholds have been reached by the current fragmentation.
If you want to understand better the reasons, then take a careful look at the information from the last of the 2 tables outputted – it represents the logging information from the table cstore_MaintenanceData_Log.
The table cstore_MaintenanceData_Log represents joined information from 3 different CISL functions: cstore_GetAlignment, cstore_GetFragmentation & cstore_GetDictionaries. Every single row inside the table cstore_MaintenanceData_Log represents an information relative a single execution of the cstore_doMaintenance function, with its own unique id, table & partition identifications.

Let’s force the rebuild of our table, by setting the limit of the maximum logical fragmentation to 10%:

exec dbo.cstore_doMaintenance @execute = 1, @logData = 1, @logicalFragmentationPerc = 10;

This time it took around 35 seconds on my test VM, and my table has got a rebuild operation.

To check on the results of the last executed maintenance, you will need to run the following command:

select *
	from dbo.cstore_Operation_Log
	order by id desc;

cstore_Operation_Log
Here you will see the exact operation executed and the exact reason behind it – such as Logical Fragmentation, Deleted RowGroup Percentage, Trimmed RowGroup Percentage, Average Rows per RowGroup or Dictionary Pressure between others.

To see the current state of our table, let’s execute the cstore_GetFragmentation stored procedure:

exec dbo.cstore_GetFragmentation

FactOnlineSales Fragmentation after cstore_doMaintenance
The total number of rows lowered exactly 1 Million of rows that we have previously deleted, and the rest of the parameters are what they are – extremely similar to the original results.

Let’s see if we can get the Segment Clustering (Segment Alignment) for our table, by executing the respective operation.
Before advancing let’s see the current situation with the help of the cstore_GetAlignment stored procedure:

exec dbo.cstore_GetAlignment;

The result set is listed below, here you can see that there is one column SalesOrderNumber, which data type (nvarchar) does not allow to do segment alignment, but all the rest of the visible columns (some are did not fit the screenshot) are ready to be aligned.
GetAlignment - Original
Let’s see if we can align our table on the base of the column DateKey – right now it’s segment only have 15.38% of the alignment, meaning that they are overlapping in their majority.

The first step here would be to set up the segment alignment in the table cstore_Clustering, by checking if our table and partition exist and what column is set for the alignment:

select *
	from dbo.cstore_Clustering;

dbo.cstore_Clustering results
As you can see, we have already our table prepared, but not configured.
Let’s execute the following statement for configuring the Segment Clustering:

update dbo.cstore_Clustering
	set ColumnName = 'DateKey'
	where TableName = '[dbo].[FactOnlineSales]';

Now we are ready to force Segment Clustering on the DateKey column for our table, and so let’s invoke the maintenance procedure of CISL with the parameters asking to do Segment Alignment:

exec dbo.cstore_doMaintenance @orderSegments = 1, @execute = 1, @logData = 1;
exec dbo.cstore_GetAlignment;

This time as you can see, we have 100% alignment for the DateKey column, as one should expect:
GetAlignment - After Alignment

Now let’s consider some other commands, which can be extremely useful for data loading into tables with Columnstore Indexes.
In the following script I am creating a new test table, dbo.TestTable with a Clustered Columnstore Index:

create table dbo.TestTable(
	c1 int );

create clustered columnstore index CCI_TestTable
	on dbo.TestTable;

Let us enable globally the documented and supported Trace Flag 634, which disables the automated Tuple Mover:

dbcc traceon (634,-1)

Let’s insert 3 Million rows from the FactOnlineSales table, loading the data in batches of 100.000 rows, thus disabling a possibility of the automated compressed Row Groups creation:

insert into dbo.TestTable (c1)
	select top 100000 OnlineSalesKey
		from dbo.FactOnlineSales;
GO 30

Let’s consult the inner structure of our table:

exec dbo.cstore_GetRowGroups @tableName = 'TestTable';

Row Groups after Data Load 2
As you can see, currently we have 3 Row Groups, that contain those 3 million rows, but because the automated Tuple Mover was switched off, all we have are Delta-Stores (2 closed and 1 open).
To confirm that, let’s execute the stored procedure cstore_GetRowGroupsDetails:

exec dbo.cstore_GetRowGroupsDetails @tableName = 'TestTable';

Row Groups Details after Data Load 2
Let’s execute the maintenance on our TestTable table specifically, by issuing the following command, which will not use any of the internal recommendations:

exec dbo.cstore_doMaintenance @tableName = 'TestTable', @execute = 1, @logData = 1, @useRecommendations = 0;

After a couple of seconds, we can see that both our closed Delta-Stores are now compressed, but the open Delta-Store is still open:
Row Groups Details after Data Load
Row Groups after Data Load

Now, we are not looking to load more data in the next times and so it would be nice to compress it as well, and for that let’s force the open Delta-Stores closure with activating back the recommendation engine, which will automatically detect the 634 Trace Flag active and will force closure of all open Delta-Stores.

exec dbo.cstore_doMaintenance @tableName = 'TestTable', @logData = 1, @useRecommendations = 1;

As an alternative you can always do it with the @executeReorganize set to 1 and activating the parameter @closeOpenDeltaStores = 1:

exec dbo.cstore_doMaintenance @tableName = 'TestTable', @logData = 1, @executeReorganize = 1, @closeOpenDeltaStores = 1;

Let’s check on the status of our Row Groups and it’s details:

			
exec dbo.cstore_GetRowGroups @tableName = 'TestTable';
exec dbo.cstore_GetRowGroupsDetails @tableName = 'TestTable';

Row Groups after Maintenance
Row Groups Details after Maintenance
As you can see in the results, our open Delta-Store was successfully closed and has become a compressed Row Group now.

There are a number of other different parameters, controlling different aspects of the maintenance, consider especially @maxDictionarySizeInMB (set by default to 10 MB) which defines and in my professional life I have seen way too many cases where Dictionary pressure would start from under 7 MB, and so if you see that your Columnstore Tables are suffering from the internal dictionary pressure, then set that parameter to a lower number or simply exclude them from the maintenance.
Alternatively if you dictionaries are very big, but the Row Groups are still not trimmed, then use the parameter @ignoreInternalPressures by setting it to be equal to 1, so that the dictionary pressure won’t be the factor for the maintenance rejection.

Otherwise, please consider filing bugs & suggestions for the project at GitHub site.

4 thoughts on “CISL 1.1.0: cstore_doMaintenance (Columnstore Maintenace Solution Explained)

  1. Adrian Sugden

    Do you have to create the maintenance stored procedures in the target database or can you deploy them in an admin database and run them against multiple target databases via parameter?

Leave a Reply to Niko Neugebauer Cancel reply

Your email address will not be published. Required fields are marked *