Columnstore Indexes – part 120 (“Merge Replication 2016-2017”)

Continuation from the previous 119 parts, the whole series can be found at https://www.nikoport.com/columnstore/.

A couple of days ago at the SQLKonferenz, during my full-day precon I have recieved a question that I could not answer for sure, because I never tried it out. Empirically I could say that it the mentioned combination would work, but as you know (or should know) – in technology do not expect a technology to work without testing it for your concrete scenario.
This blogpost is about a rather simple and plain test with a technology that gets no love from the SQL Server specialists, nor from the Microsoft – unfortunately. The technology is while old, is still one of the very ways of how you can make multiple servers writing data into the same table in parallel. This blog post is about the Merge Replication and its support for the Columnstore Indexes. I have previously blogged on support for the Transactional Replication with Columnstore Indexes, but this blog post is focusing on the Merge Replication and its interaction and options for the Columnstore Indexes, especially because at the moment of writing of this article, there is no mention of the Columnstore Indexes at the Microsoft Documentation.

If you do not have a lot of knowledge about the Merge Replication, I suggest that you will take a look at the MSDN Documentation and build a test system by at least following the suggested steps.

I have decided to set up a rather simple Merge Replication on the SQL Server 2016 between 2 databases on the same server, where the source shall be my favourite free test database ContosoRetailDW, and the subscriber database will be called Contoso2.

Clustered Columnstore Indexes

I have selected the FactOnlineSales table to build a Clustered Columnstore Indexes (this is mainly DWH/BI/Reporting scenario considered) with the following code:

ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [PK_FactOnlineSales_SalesKey];

create clustered columnstore Index PK_FactOnlineSales
	on dbo.FactOnlineSales;

Adding this table to the publication will end up with the following, self-explaining error message, being very clear that the Clustered Columnstore Indexes are not supported for the Merge Replication:

There is no surprise here, as the same Clustered Columnstore Indexes are not supported for the Transactional Replication, but I feel that a great opportunity is lost and the Replication technology are being quite ignored by the emerged technologies, such as In-Memory & Columnstore, where the scenarios of replicating the Data Warehousing data is something that a lot of people can find very useful.
I wish it would be otherwise, and this would allow to bring more customers to use Columnstore Indexes.

Nonclustered Columnstore Indexes

For the test of the Nonclustered Indexes (Operational Analytics aka Hybrid Transacitonal Analyticsal Processing), I will take an existing FactSales table, and add a filtered Nonclustered Columnstore Index for the first 100 entries (IDs) of the table:

CREATE NONCLUSTERED COLUMNSTORE INDEX [Ncci_MergeRepl_FactSales] ON [dbo].[FactSales]
(
	[SalesKey],
	[DateKey],
	[channelKey],
	[StoreKey],
	[ProductKey],
	[PromotionKey],
	[CurrencyKey],
	[UnitCost],
	[UnitPrice],
	[SalesQuantity],
	[ReturnQuantity],
	[ReturnAmount],
	[DiscountQuantity],
	[DiscountAmount],
	[TotalCost],
	[SalesAmount],
	[ETLLoadID],
	[LoadDate],
	[UpdateDate]
)
WHERE ([SalesKey]<(100));

Going into the Pusblished Article properties will present the same good old properties of the Transactional Replication where the copying of the Nonclustered Columnstore Indexes and/or the filtered Nonclustered Columnstore Indexes, and I will pick up both of them, just making sure right from the beginning that I cover all scenarios for the Columnstore Indexes.

After initialising the Subscriber with the snapshot, let's run the simplest queries against both databases:

SELECT COUNT(*)
	FROM ContosoRetailDW.dbo.FactSales;

SELECT COUNT(*)
	FROM Contoso2.dbo.FactSales;

We shall obtain the number 3.406.089 of rows as the result.

Let's check on the Row Groups with the help of the CISL on the publisher (the database ContosoRetailDW):

EXECUTE dbo.cstore_getRowGroupsDetails @tableName = 'FactSales';

Let's execute a simple update statement, which will provoke the update over the compressed Row Group, marking the row with id = 1 as inactive through the Deleted Bitmap and shall insert a brand new updated row into a new open Delta Store:

UPDATE Contoso2.dbo.FactSales
	SET ReturnQuantity = 1
	WHERE SalesKey = 1;

To verify the successful synchronisation, lets run the CISL dbo.cstore_getRowGroupsDetails stored procedure on the both databases to verify the correct execution .

EXECUTE ContosoRetailDW.dbo.cstore_getRowGroupsDetails @tableName = 'FactSales'
EXECUTE contoso2.dbo.cstore_getRowGroupsDetails @tableName = 'FactSales'


Looks good, lets verify by updating the publisher, and let's update whole filtered Nonclustered Columnstore Index:

UPDATE ContosoRetailDW.dbo.FactSales
	SET ReturnQuantity = 100
	WHERE SalesKey < 100;

Let's check what is going on with out the Row Groups, let us re-run the dbo.cstore_getRowGroupsDetails stored procedure on the both Publisher and Subscriber:

EXECUTE ContosoRetailDW.dbo.cstore_getRowGroupsDetails @tableName = 'FactSales'
EXECUTE contoso2.dbo.cstore_getRowGroupsDetails @tableName = 'FactSales'


You can see that we have 99 rows in the Delta-Stores on the both ends, but the only observable difference is that they are not marked as deleted for the Row Groups. This happens because for the Nonclustered Columnstore Indexes we are using Deleted Buffers (2 to be more precise),

You can verify that the Deleted Buffers are affected, by issuing the following query against the sys.internal_partitions DMV:

SELECT *
	FROM sys.internal_partitions
	WHERE object_id = OBJECT_ID('FactSales')
		AND internal_object_type = 4 /* COLUMN_STORE_DELETE_BUFFER
COLUMN_STORE_DELETE_BUFFER */;


and on the picture above you will notice that one of the Deleted Buffers (the active one) has already 99 rows marked as deleted.
Running ALTER INDEX ... REORGANIZE (especially with COMPRESS_ALL_ROW_GROUPS = ON) will help to get rid of the Delta-Stores (and even from the open ones), but remember that it won't solve the fragmentation that will take place, once any of the compressed rows shall get updated. To keep so-called Hot Data in the Delta-Stores, preventing from increasing the logical fragmentation (creating obsolete row-versions through the Deleted Bitmap and Deleted Buffers), consider using the COMPRESSION_DELAY as explained in the Columnstore Indexes – part 76 ("Compression Delay").

Final Thoughts

I am happy to see that the Merge Replication is compatible with the Nonclustered Columnstore Indexes, including the filtered ones, but I see that the replication of the BI/Data Mart data as an important task and the support of the Replication for the Clustered Columnstore Indexes will be a pressing item in the years to come.
Microsoft has promised to deliver improvements for the Replication in the next versions of SQL Server and I really hope that the Clustered Columnstore Indexes will get a full support as they deserve.
Well, and if In-Memory Indexes are to have any future, the Replication should support In-Memory Clustered Columnstore Indexes as well :)

to be continued with Columnstore Indexes – part 121 ("Columnstore Indexes on Standard Tier of Azure SQL DB")

4 thoughts on “Columnstore Indexes – part 120 (“Merge Replication 2016-2017”)

Leave a Reply to Niko Neugebauer Cancel reply

Your email address will not be published. Required fields are marked *