Continuation from the previous 27 parts, starting from https://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/
This blog post is all about the architectural specifics & performance optimizations for the Clustered Columnstore Indexes.
It is known that Inserts are mostly faster than pure Update operations, since we could get less page splits (when inserting ordered data), but what about the latest & the greatest Clustered Columnstore Indexes in SQL Server?
We know from the 1st blog in this Clustered Columnstore series, that an Update in Clustered Columnstore Indexes is being executed as a update of the Deleted Bitmap structure for the rows that we are modifying and the newest values are inserted into a Delta Store, so we can say that Update operation is actually being executed as an Delete + Insert.
I wanted to see how fast/slow the Update operation looks like when compared with a direct Delete & Insert operations.
So lets restore a fresh copy of the ContosoRetailDW database (download it for free).
Note: I maintain my Backup at C:\Install and my default test data location is c:\Data
USE [master] RESTORE DATABASE [ContosoRetailDW] FROM DISK = N'C:\Install\ContosoRetailDW.bak' WITH FILE = 1, MOVE N'ContosoRetailDW2.0' TO N'C:\DATA\ContosoRetailDW.mdf', MOVE N'ContosoRetailDW2.0_log' TO N'C:\DATA\ContosoRetailDW.ldf', NOUNLOAD, STATS = 5
Let us upgrade it a little bit for not battling the limitations while testing:
USE [master] GO ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 120 GO ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0', SIZE = 2560000KB , FILEGROWTH = 512000KB ) GO ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0_log', SIZE = 2560000KB , FILEGROWTH = 512000KB ) GO USE [ContosoRetailDW] GO EXEC dbo.sp_changedbowner @loginame = N'sa', @map = false GO
Next step would be to drop all the primary & foreign keys from our main 5 tables:
/* * This commands drop all Primary & Foreign Keys on the Contoso BI Database */ ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCurrency] ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimCustomer] ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimDate] ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimProduct] ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimPromotion] ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [FK_FactOnlineSales_DimStore] ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimAccount] ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimCurrency] ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimDate] ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimEntity] ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimProductCategory] ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [FK_FactStrategyPlan_DimScenario] ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimChannel] ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimCurrency] ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimDate] ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimProduct] ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimPromotion] ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [FK_FactSales_DimStore] ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [FK_FactInventory_DimCurrency] ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [FK_FactInventory_DimDate] ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [FK_FactInventory_DimProduct] ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [FK_FactInventory_DimStore] ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimChannel] ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimCurrency] ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimDate] ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimProduct] ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimScenario] ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [FK_FactSalesQuota_DimStore] ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [PK_FactOnlineSales_SalesKey] ALTER TABLE dbo.[FactStrategyPlan] DROP CONSTRAINT [PK_FactStrategyPlan_StrategyPlanKey] ALTER TABLE dbo.[FactSales] DROP CONSTRAINT [PK_FactSales_SalesKey] ALTER TABLE dbo.[FactInventory] DROP CONSTRAINT [PK_FactInventory_InventoryKey] ALTER TABLE dbo.[FactSalesQuota] DROP CONSTRAINT [PK_FactSalesQuota_SalesQuotaKey]
Now we need to create our Clustered Columnstore Index on the Test table – FactOnlineSales:
create clustered columnstore Index PK_FactOnlineSales on dbo.FactOnlineSales;
For the sake of the test, lets create a copy of our table:
CREATE TABLE [dbo].[FactOnlineSalesCopy]( [OnlineSalesKey] [int] IDENTITY(1,1) NOT NULL, [DateKey] [datetime] NOT NULL, [StoreKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [PromotionKey] [int] NOT NULL, [CurrencyKey] [int] NOT NULL, [CustomerKey] [int] NOT NULL, [SalesOrderNumber] [nvarchar](20) NOT NULL, [SalesOrderLineNumber] [int] NULL, [SalesQuantity] [int] NOT NULL, [SalesAmount] [money] NOT NULL, [ReturnQuantity] [int] NOT NULL, [ReturnAmount] [money] NULL, [DiscountQuantity] [int] NULL, [DiscountAmount] [money] NULL, [TotalCost] [money] NOT NULL, [UnitCost] [money] NULL, [UnitPrice] [money] NULL, [ETLLoadID] [int] NULL, [LoadDate] [datetime] NULL, [UpdateDate] [datetime] NULL ); Create Clustered Columnstore Index PK_FactOnlineSalesCopy on dbo.FactOnlineSalesCopy; GO set identity_insert dbo.FactOnlineSalesCopy ON insert into dbo.FactOnlineSalesCopy (OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate) SELECT [OnlineSalesKey] ,[DateKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey] ,[CustomerKey] ,[SalesOrderNumber] ,[SalesOrderLineNumber] ,[SalesQuantity] ,[SalesAmount] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity] ,[DiscountAmount] ,[TotalCost] ,[UnitCost] ,[UnitPrice] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate] FROM [ContosoRetailDW].[dbo].[FactOnlineSales] order by OnlineSalesKey set identity_insert dbo.FactOnlineSalesCopy OFF
Columnstore Update Tests
Now we are ready to run a couple of tests and to compare the performance and the impact of the Update Statement with the Direct Delete + Insert operations.
In order to minimize the impact I shall try to update only 1 Million rows out of our 12.5 Million rows table.
dbcc dropcleanbuffers; set statistics io on set statistics time on update top (1000000) dbo.FactOnlineSales set UpdateDate = GetDate();
After running multiple times the setup and the actual update statement I came the following performance numbers on my virtual machine:
– CPU time = 23563 ms
– Elapsed time = 25053 ms
– Reads ~= 26500 Pages
The key performance indicator here is the actual Columnstore Index Update operator at the execution plan – it runs in the Row Mode.
Also, we can clearly see that this operation has modified 3 different Row Groups where updated rows were located, plus we have received a new Delta Store which is storing our 1 Million rows.
SELECT rg.total_rows, cast(100.0*(total_rows - ISNULL(deleted_rows,0))/iif(total_rows = 0, 1, total_rows) as Decimal(6,3)) AS PercentFull, i.object_id, object_name(i.object_id) AS TableName, i.name AS IndexName, i.index_id, i.type_desc, rg.* FROM sys.indexes AS i INNEr JOIN sys.column_store_row_groups AS rg ON i.object_id = rg.object_id AND i.index_id = rg.index_id WHERE object_name(i.object_id) = 'FactOnlineSales' ORDER BY object_name(i.object_id), i.name, row_group_id;
Here is a screenshot of the modified Row Groups:
Columnstore Delete + Insert Tests
Now lets get to the Delete + Insert operations.
Setup is very simple – truncate the actual test table, insert all the data from the table with a copy and rebuild it in order to get better Row Groups:
truncate table dbo.FactOnlineSales; set identity_insert dbo.FactOnlineSales ON insert into dbo.FactOnlineSales (OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate) SELECT [OnlineSalesKey] ,[DateKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey] ,[CustomerKey] ,[SalesOrderNumber] ,[SalesOrderLineNumber] ,[SalesQuantity] ,[SalesAmount] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity] ,[DiscountAmount] ,[TotalCost] ,[UnitCost] ,[UnitPrice] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate] FROM [ContosoRetailDW].[dbo].[FactOnlineSalesCopy] order by OnlineSalesKey set identity_insert dbo.FactOnlineSales OFF alter table dbo.FactOnlineSales rebuild;
Let us delete 1 Million Rows:
dbcc dropcleanbuffers; set statistics io on set statistics time on delete top (1000000) dbo.FactOnlineSales
My results were
– CPU time = 3656 ms
– Elapsed time = 4025 ms
which are quite nice in my opinion.
but we need to insert data to arrive to any conclusion, and for that lets create a copy of our 1 Million Rows into a separate table:
select top 1000000 [OnlineSalesKey] ,[DateKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey] ,[CustomerKey] ,[SalesOrderNumber] ,[SalesOrderLineNumber] ,[SalesQuantity] ,[SalesAmount] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity] ,[DiscountAmount] ,[TotalCost] ,[UnitCost] ,[UnitPrice] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate] into dbo.FactOnlineSales_1M FROM [ContosoRetailDW].[dbo].[FactOnlineSalesCopy] order by OnlineSalesKey
Now we are ready to execute our insert operation:
dbcc dropcleanbuffers; set statistics io on set statistics time on set identity_insert dbo.FactOnlineSales ON insert into dbo.FactOnlineSales (OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate) SELECT [OnlineSalesKey] ,[DateKey] ,[StoreKey] ,[ProductKey] ,[PromotionKey] ,[CurrencyKey] ,[CustomerKey] ,[SalesOrderNumber] ,[SalesOrderLineNumber] ,[SalesQuantity] ,[SalesAmount] ,[ReturnQuantity] ,[ReturnAmount] ,[DiscountQuantity] ,[DiscountAmount] ,[TotalCost] ,[UnitCost] ,[UnitPrice] ,[ETLLoadID] ,[LoadDate] ,[UpdateDate] FROM [ContosoRetailDW].[dbo].[FactOnlineSales_1M] order by OnlineSalesKey set identity_insert dbo.FactOnlineSales OFF
Here are the average results that I have got:
– CPU time = 5563 ms
– Elapsed time = 5716 ms
– Reads ~= 14700 Pages
Now to the execution plan – it was actually running in the Row Mode:
And if we check our Row Groups we can actually see that we have got a new compressed Segment – the best possible situation in this case:
Final Thoughts
We can actually speed the whole insertion process up by using Bulk Load API or even running our loads in Batch Mode, so it is a very clear indicator of a better way to work with new data when updating a table.
Not only it looks over 2.5 times faster to delete & insert data instead of executing a direct update, but such aspects as locking & blocking can become quite an issue as it was shown in the 8th part of the series Clustered Columnstore (Locking).
Whenever you start working with SQL Server 2014, think twice if you really want to update your data or you just want to delete it and insert/load new information directly into the table.
to be continued with Clustered Columnstore Indexes – part 29 (“Data Loading for Better Segment Eliminationâ€)
Great article.
Thank you! :)
Hi,
Does this apply to SQL 2016 as well?
For SQL 2017 is it true there is in place updates now? So this would improve?
thanks
Hi DC,
to my knowledge there are no improvements in this area.
Best regards,
Niko
Hi Niko,
Also, is this relevant for Non-clustered ColumnStore indexes?
Also, according to following link, in SQL Server 2017 there are in place updates for ColumnStore indexes:
https://www.sqlshack.com/sql-server-2017-columnstore-in-place-updates/
Can you confirm?
Thanks
Hi Alex,
Yes, the UPDATES indeed were improved in SQL Server 2017, it was in one of the first RCs, I believe it was launched in January 2017 or somewhere around.
Best regards,
Niko Neugebauer