Clustered Columnstore Indexes – part 28 (“Update vs Delete + Insert”)

Continuation from the previous 27 parts, starting from http://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

This blog post is all about the architectural specifics & performance optimizations for the Clustered Columnstore Indexes.
It is known that Inserts are mostly faster than pure Update operations, since we could get less page splits (when inserting ordered data), but what about the latest & the greatest Clustered Columnstore Indexes in SQL Server?
We know from the 1st blog in this Clustered Columnstore series, that an Update in Clustered Columnstore Indexes is being executed as a update of the Deleted Bitmap structure for the rows that we are modifying and the newest values are inserted into a Delta Store, so we can say that Update operation is actually being executed as an Delete + Insert.
I wanted to see how fast/slow the Update operation looks like when compared with a direct Delete & Insert operations.

So lets restore a fresh copy of the ContosoRetailDW database (download it for free).
Note: I maintain my Backup at C:\Install and my default test data location is c:\Data

USE [master]

RESTORE DATABASE [ContosoRetailDW] 
	FROM  DISK = N'C:\Install\ContosoRetailDW.bak' WITH  FILE = 1,  
		MOVE N'ContosoRetailDW2.0' TO N'C:\DATA\ContosoRetailDW.mdf', 
		MOVE N'ContosoRetailDW2.0_log' TO N'C:\DATA\ContosoRetailDW.ldf',  
		NOUNLOAD,  STATS = 5

Let us upgrade it a little bit for not battling the limitations while testing:

 USE [master]
GO
ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 120
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0', SIZE = 2560000KB , FILEGROWTH = 512000KB )
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0_log', SIZE = 2560000KB , FILEGROWTH = 512000KB )
GO
USE [ContosoRetailDW]
GO
EXEC dbo.sp_changedbowner @loginame = N'sa', @map = false
GO

Next step would be to drop all the primary & foreign keys from our main 5 tables:

/* 
 * This commands drop all Primary & Foreign Keys on the Contoso BI Database 
 */
 
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCurrency]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCustomer]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimDate]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimProduct]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimPromotion]
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimStore]
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimAccount]
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimCurrency]
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimDate]
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimEntity]
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimProductCategory]
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimScenario]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimChannel]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimCurrency]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimDate]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimProduct]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimPromotion]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimStore]
ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [FK_FactInventory_DimCurrency]
ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [FK_FactInventory_DimDate]
ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [FK_FactInventory_DimProduct]
ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [FK_FactInventory_DimStore]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimChannel]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimCurrency]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimDate]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimProduct]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimScenario]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimStore]

ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [PK_FactOnlineSales_SalesKey]
ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [PK_FactStrategyPlan_StrategyPlanKey]
ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [PK_FactSales_SalesKey]
ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [PK_FactInventory_InventoryKey]
ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [PK_FactSalesQuota_SalesQuotaKey]

Now we need to create our Clustered Columnstore Index on the Test table – FactOnlineSales:

create clustered columnstore Index PK_FactOnlineSales
	on dbo.FactOnlineSales;

For the sake of the test, lets create a copy of our table:

CREATE TABLE [dbo].[FactOnlineSalesCopy](
	[OnlineSalesKey] [int] IDENTITY(1,1) NOT NULL,
	[DateKey] [datetime] NOT NULL,
	[StoreKey] [int] NOT NULL,
	[ProductKey] [int] NOT NULL,
	[PromotionKey] [int] NOT NULL,
	[CurrencyKey] [int] NOT NULL,
	[CustomerKey] [int] NOT NULL,
	[SalesOrderNumber] [nvarchar](20) NOT NULL,
	[SalesOrderLineNumber] [int] NULL,
	[SalesQuantity] [int] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[ReturnQuantity] [int] NOT NULL,
	[ReturnAmount] [money] NULL,
	[DiscountQuantity] [int] NULL,
	[DiscountAmount] [money] NULL,
	[TotalCost] [money] NOT NULL,
	[UnitCost] [money] NULL,
	[UnitPrice] [money] NULL,
	[ETLLoadID] [int] NULL,
	[LoadDate] [datetime] NULL,
	[UpdateDate] [datetime] NULL
);

Create Clustered Columnstore Index PK_FactOnlineSalesCopy on dbo.FactOnlineSalesCopy;
GO

set identity_insert dbo.FactOnlineSalesCopy ON

insert into dbo.FactOnlineSalesCopy
(OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate)
SELECT [OnlineSalesKey]
      ,[DateKey]
      ,[StoreKey]
      ,[ProductKey]
      ,[PromotionKey]
      ,[CurrencyKey]
      ,[CustomerKey]
      ,[SalesOrderNumber]
      ,[SalesOrderLineNumber]
      ,[SalesQuantity]
      ,[SalesAmount]
      ,[ReturnQuantity]
      ,[ReturnAmount]
      ,[DiscountQuantity]
      ,[DiscountAmount]
      ,[TotalCost]
      ,[UnitCost]
      ,[UnitPrice]
      ,[ETLLoadID]
      ,[LoadDate]
      ,[UpdateDate]
  FROM [ContosoRetailDW].[dbo].[FactOnlineSales]
  order by OnlineSalesKey

set identity_insert dbo.FactOnlineSalesCopy OFF

Columnstore Update Tests

Now we are ready to run a couple of tests and to compare the performance and the impact of the Update Statement with the Direct Delete + Insert operations.
In order to minimize the impact I shall try to update only 1 Million rows out of our 12.5 Million rows table.

dbcc dropcleanbuffers;

set statistics io on
set statistics time on

update top (1000000)
	dbo.FactOnlineSales
	set UpdateDate = GetDate();

After running multiple times the setup and the actual update statement I came the following performance numbers on my virtual machine:
– CPU time = 23563 ms
– Elapsed time = 25053 ms
– Reads ~= 26500 Pages
The key performance indicator here is the actual Columnstore Index Update operator at the execution plan – it runs in the Row Mode.
Screen Shot 2014-03-17 at 00.05.37

Also, we can clearly see that this operation has modified 3 different Row Groups where updated rows were located, plus we have received a new Delta Store which is storing our 1 Million rows.

SELECT rg.total_rows, 
		cast(100.0*(total_rows - ISNULL(deleted_rows,0))/iif(total_rows = 0, 1, total_rows) as Decimal(6,3)) AS PercentFull, 
		i.object_id, object_name(i.object_id) AS TableName, 
		i.name AS IndexName, i.index_id, i.type_desc, 
		rg.*
	FROM sys.indexes AS i
	INNEr JOIN sys.column_store_row_groups AS rg
		ON i.object_id = rg.object_id
	AND i.index_id = rg.index_id 
	WHERE object_name(i.object_id) = 'FactOnlineSales' 
	ORDER BY object_name(i.object_id), i.name, row_group_id;

Here is a screenshot of the modified Row Groups:
Screen Shot 2014-03-16 at 23.21.35

Columnstore Delete + Insert Tests

Now lets get to the Delete + Insert operations.

Setup is very simple – truncate the actual test table, insert all the data from the table with a copy and rebuild it in order to get better Row Groups:

truncate table dbo.FactOnlineSales;

set identity_insert dbo.FactOnlineSales ON

insert into dbo.FactOnlineSales
(OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate)
SELECT [OnlineSalesKey]
      ,[DateKey]
      ,[StoreKey]
      ,[ProductKey]
      ,[PromotionKey]
      ,[CurrencyKey]
      ,[CustomerKey]
      ,[SalesOrderNumber]
      ,[SalesOrderLineNumber]
      ,[SalesQuantity]
      ,[SalesAmount]
      ,[ReturnQuantity]
      ,[ReturnAmount]
      ,[DiscountQuantity]
      ,[DiscountAmount]
      ,[TotalCost]
      ,[UnitCost]
      ,[UnitPrice]
      ,[ETLLoadID]
      ,[LoadDate]
      ,[UpdateDate]
  FROM [ContosoRetailDW].[dbo].[FactOnlineSalesCopy]
  order by OnlineSalesKey

set identity_insert dbo.FactOnlineSales OFF

alter table dbo.FactOnlineSales
	rebuild;

Let us delete 1 Million Rows:

dbcc dropcleanbuffers;

set statistics io on
set statistics time on

delete top (1000000)
	dbo.FactOnlineSales

My results were
– CPU time = 3656 ms
– Elapsed time = 4025 ms
which are quite nice in my opinion.

but we need to insert data to arrive to any conclusion, and for that lets create a copy of our 1 Million Rows into a separate table:


select top 1000000 [OnlineSalesKey]
      ,[DateKey]
      ,[StoreKey]
      ,[ProductKey]
      ,[PromotionKey]
      ,[CurrencyKey]
      ,[CustomerKey]
      ,[SalesOrderNumber]
      ,[SalesOrderLineNumber]
      ,[SalesQuantity]
      ,[SalesAmount]
      ,[ReturnQuantity]
      ,[ReturnAmount]
      ,[DiscountQuantity]
      ,[DiscountAmount]
      ,[TotalCost]
      ,[UnitCost]
      ,[UnitPrice]
      ,[ETLLoadID]
      ,[LoadDate]
      ,[UpdateDate]
	into dbo.FactOnlineSales_1M
  FROM [ContosoRetailDW].[dbo].[FactOnlineSalesCopy]
  order by OnlineSalesKey 

Now we are ready to execute our insert operation:

dbcc dropcleanbuffers;

set statistics io on
set statistics time on

set identity_insert dbo.FactOnlineSales ON

insert into dbo.FactOnlineSales
(OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate)
SELECT [OnlineSalesKey]
      ,[DateKey]
      ,[StoreKey]
      ,[ProductKey]
      ,[PromotionKey]
      ,[CurrencyKey]
      ,[CustomerKey]
      ,[SalesOrderNumber]
      ,[SalesOrderLineNumber]
      ,[SalesQuantity]
      ,[SalesAmount]
      ,[ReturnQuantity]
      ,[ReturnAmount]
      ,[DiscountQuantity]
      ,[DiscountAmount]
      ,[TotalCost]
      ,[UnitCost]
      ,[UnitPrice]
      ,[ETLLoadID]
      ,[LoadDate]
      ,[UpdateDate]
  FROM [ContosoRetailDW].[dbo].[FactOnlineSales_1M]
  order by OnlineSalesKey 

set identity_insert dbo.FactOnlineSales OFF

Here are the average results that I have got:
– CPU time = 5563 ms
– Elapsed time = 5716 ms
– Reads ~= 14700 Pages

Now to the execution plan – it was actually running in the Row Mode:
Screen Shot 2014-03-17 at 00.53.36

And if we check our Row Groups we can actually see that we have got a new compressed Segment – the best possible situation in this case:
Screen Shot 2014-03-17 at 00.53.09

Final Thoughts

We can actually speed the whole insertion process up by using Bulk Load API or even running our loads in Batch Mode, so it is a very clear indicator of a better way to work with new data when updating a table.

Not only it looks over 2.5 times faster to delete & insert data instead of executing a direct update, but such aspects as locking & blocking can become quite an issue as it was shown in the 8th part of the series Clustered Columnstore (Locking).
Screen Shot 2014-03-17 at 01.01.58

Whenever you start working with SQL Server 2014, think twice if you really want to update your data or you just want to delete it and insert/load new information directly into the table.

to be continued with Clustered Columnstore Indexes – part 29 (“Data Loading for Better Segment Elimination”)

4 thoughts on “Clustered Columnstore Indexes – part 28 (“Update vs Delete + Insert”)

Leave a Reply

Your email address will not be published. Required fields are marked *