Columnstore Indexes – part 123 (“Clustered Columnstore Index Online Rebuild”)

[Updated VERY SIGNIFICANTLY on 9th of October 2018 after SQL Server 2019 CTP 2.0]
Continuation from the previous 122 parts, the whole series can be found at https://www.nikoport.com/columnstore/.

I am finally ready to write about one of the most important enterprise option for the Clustered Columnstore Indexes and one of the most long-awaited one – the online rebuild option.
Given that Clustered Columnstore Indexes are keep getting the improvement on the basis of the type of the indexes and the interconnected feature, I kept waiting for almost 2 years for Microsoft to fix the parity and allow Clustered Columnstore Indexes to have an ONLINE creation & rebuild option, like it is a feature for the Nonclustered Columnstore Indexes.

I have already blogged previously about the “Nonclustered Columnstore Index Online Rebuild” in the part 96 of this series, when SQL Server 2017 has introduced the support for the Operational Analytics (also known as HTAP) scenarios, but the Clustered Columnstore Indexes and the Data Warehousing/BI/Reporting scenarios were largely offline.
In January at SQL Saturday Vienna 2018, Lindsey Allen (Engineering Manager for Azure SQLDB & SQL Server) announced that the next version of SQL Server will support the Clustered Columnstore Indexes, which has been publicly confirmed by Jovan Popovic (Program Manager) during my session at SQLKonferenz 2018 in Germany.
So I have been impatiently waiting until yesterday night, when while testing a new script for my open-source Columnstore Indexes Library CISL – I tried to rebuild a table with Clustered Columnstore Index and for the sake of testing specified that the Rebuild should be executed online. And no error message came back…
This blog post is about what I have found out so far.

Clustered Columnstore Index Online Rebuild

Even though there is no public preview version of the SQL Server that contains Clustered Columnstore Indexes with Online Rebuild, we always have the latest bits in the Azure SQL DB instances and for that purpose I have created a brand new test database on the Premium 1 tier.
Let us create a sample table “SampleDataTable” with a Clustered Columnstore Index that shall contain 10 full Row Groups:

DROP TABLE IF EXISTS dbo.SampleDataTable;

CREATE TABLE dbo.SampleDataTable (
    C1 BIGINT NOT NULL,
    INDEX CCI_SampleDataTable CLUSTERED COLUMNSTORE
);

INSERT INTO dbo.SampleDataTable WITH (TABLOCK)
SELECT t.RN
	FROM
	(
		SELECT TOP (10 * 1048576) ROW_NUMBER()
			OVER (ORDER BY (SELECT NULL)) RN
		FROM sys.objects t1
		CROSS JOIN sys.objects  t2
		CROSS JOIN sys.objects  t3
		CROSS JOIN sys.objects  t4
		CROSS JOIN sys.objects  t5    
	) t
	OPTION (MAXDOP 1);

It took just a couple of seconds to execute and observing the internals with the help of my CISL function cstore_getRowGroupsDetails:

EXECUTE dbo.cstore_getRowGroupsDetails @tableName = 'SampleDataTable';

You can clearly observe 10 perfect Row Groups with all things that we would expect: no trimming, perfectly compressed!

It was the right time to try out the Online Rebuild for the Clustered Columnstore Index:

ALTER INDEX CCI_SampleDataTable
	ON dbo.SampleDataTable
		REBUILD WITH (ONLINE = ON);

After a couple of seconds of waiting … and another couple of seconds of waiting … I started worrying. I worried if something went wrong and rebuilding a couple of million rows for a Premium 1 instance with one full CPU core should definitely take less time, and so I went to check on the progress:

EXECUTE dbo.cstore_getRowGroupsDetails @tableName = 'SampleDataTable';

and it seems that after 2 minutes only 2 Row Groups were rebuilt – this sounds like an RBAR (Row By Agonising Row) Execution Mode on a table that simply contains just one 8-byte big integer column. I have marked with a red color the Row Groups that contained more than one copy (the original and the new one). To see which group is which, I recommend observing the property generation which is returned by the cstore_getRowGroupsDetails function or you can check it directly in the DMV sys.dm_db_column_store_row_group_physical_stats (available since SQL Server 2016) – it will display the order of the Row Group generation and it will contain the value of 1 for the newly generated Row Groups. Additionally created_time property can help you to pinpoint exact time of the Row Group creation (and it is available in both functions – mine & in the above mentioned DMV).

How much time did it took, you might want to ask. :) On multiple attempts this very table on the Premium 1 Instance took whooping 10 Minutes and 41 seconds! That is definitely not easily comparable to the 7 seconds that it typically takes to execute an OFFLINE rebuild operation. You can see on the picture on the left, how big this difference is, but please keep in mind, that most probably this is an early variant and that execution times must (please!) improve before it will become a full-feature in Azure SQLDB and SQL Server vNext.

Regarding the Delta-Stores, I have assumed that their rebuild behaviour would be the same as for the Row Store indexes, but I have had to verify it (within SSMS, that is why I am using GO keyword while loading 1 Million Rows into the Delta-Store:

DROP TABLE IF EXISTS dbo.SampleDeltaStore;

CREATE TABLE dbo.SampleDeltaStore (
    C1 BIGINT NOT NULL,
    INDEX CCI_SampleDeltaStore CLUSTERED COLUMNSTORE
);
GO

INSERT INTO dbo.SampleDeltaStore 
SELECT t.RN
	FROM
	(
		SELECT TOP (1 * 100000) ROW_NUMBER()
			OVER (ORDER BY (SELECT NULL)) RN
		FROM sys.objects t1
		CROSS JOIN sys.objects  t2
		CROSS JOIN sys.objects  t3
		CROSS JOIN sys.objects  t4
		CROSS JOIN sys.objects  t5    
	) t
	OPTION (MAXDOP 1);
GO 10

Again with the help of CISL function cstore_getRowGroupsDetails we can re-confirm that we still have an open Delta-Store:

EXECUTE dbo.cstore_getRowGroupsDetails @tableName = 'SampleDeltaStore';

Running an ONLINE Rebuild operation against this table took me precisely 1 minute with the following command:

ALTER INDEX CCI_SampleDeltaStore
	ON dbo.SampleDeltaStore
		REBUILD WITH (ONLINE = ON);

And the outlook on the Row Groups was following:
We have obtained a compressed Row Group that was trimmed as a Residual one (not enough rows), which totally makes sense. What does not make any sense is the execution time of one minute, but that’s just me complaining against what is most probably still a beta-version :)

SQL Server 2019 CTP 2.0

To make a more representative test, I decided to take a SQL Server 2019 CTP 2.0 and to test the online rebuild of the Clustered Columnstore Index against the tables dbo.lineitem and dbo.orders which in TPCH 10GB have 60 Million Rows &15Million Rows respectively.
Below you will find the script to build the rowstore & columnstore equivavelents for both of the tables with all the constraints but no additional indexes:


DROP TABLE IF EXISTS [dbo].[lineitem_cci];

CREATE TABLE [dbo].[lineitem_cci](
	[l_shipdate] [date] NULL,
	[l_orderkey] [bigint] NOT NULL,
	[l_discount] [money] NOT NULL,
	[l_extendedprice] [money] NOT NULL,
	[l_suppkey] [int] NOT NULL,
	[l_quantity] [bigint] NOT NULL,
	[l_returnflag] [char](1) NULL,
	[l_partkey] [bigint] NOT NULL,
	[l_linestatus] [char](1) NULL,
	[l_tax] [money] NOT NULL,
	[l_commitdate] [date] NULL,
	[l_receiptdate] [date] NULL,
	[l_shipmode] [char](10) NULL,
	[l_linenumber] [bigint] NOT NULL,
	[l_shipinstruct] [char](25) NULL,
	[l_comment] [varchar](44) NULL
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[lineitem_cci]  WITH NOCHECK ADD  CONSTRAINT [lineitem_cci_order_fk] FOREIGN KEY([l_orderkey])
REFERENCES [dbo].[orders] ([o_orderkey])
GO

ALTER TABLE [dbo].[lineitem_cci] CHECK CONSTRAINT [lineitem_cci_order_fk]
GO

ALTER TABLE [dbo].[lineitem_cci]  WITH NOCHECK ADD  CONSTRAINT [lineitem_cci_partkey_fk] FOREIGN KEY([l_partkey])
REFERENCES [dbo].[part] ([p_partkey])
GO

ALTER TABLE [dbo].[lineitem_cci] CHECK CONSTRAINT [lineitem_cci_partkey_fk]
GO

ALTER TABLE [dbo].[lineitem_cci]  WITH NOCHECK ADD  CONSTRAINT [lineitem_cci_partsupp_fk] FOREIGN KEY([l_partkey], [l_suppkey])
REFERENCES [dbo].[partsupp] ([ps_partkey], [ps_suppkey])
GO

ALTER TABLE [dbo].[lineitem_cci] CHECK CONSTRAINT [lineitem_cci_partsupp_fk]
GO

ALTER TABLE [dbo].[lineitem_cci]  WITH NOCHECK ADD  CONSTRAINT [lineitem_cci_suppkey_fk] FOREIGN KEY([l_suppkey])
REFERENCES [dbo].[supplier] ([s_suppkey])
GO

ALTER TABLE [dbo].[lineitem_cci] CHECK CONSTRAINT [lineitem_cci_suppkey_fk]
GO


INSERT INTO dbo.lineitem_cci WITH (TABLOCK)
SELECT [l_shipdate]
      ,[l_orderkey]
      ,[l_discount]
      ,[l_extendedprice]
      ,[l_suppkey]
      ,[l_quantity]
      ,[l_returnflag]
      ,[l_partkey]
      ,[l_linestatus]
      ,[l_tax]
      ,[l_commitdate]
      ,[l_receiptdate]
      ,[l_shipmode]
      ,[l_linenumber]
      ,[l_shipinstruct]
      ,[l_comment]
  FROM [dbo].[lineitem];



DROP TABLE IF EXISTS [dbo].[orders_cci]

CREATE TABLE [dbo].[orders_cci](
	[o_orderdate] [date] NULL,
	[o_orderkey] [bigint] NOT NULL,
	[o_custkey] [bigint] NOT NULL,
	[o_orderpriority] [char](15) NULL,
	[o_shippriority] [int] NULL,
	[o_clerk] [char](15) NULL,
	[o_orders_ccitatus] [char](1) NULL,
	[o_totalprice] [money] NULL,
	[o_comment] [varchar](79) NULL,
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[orders_cci]  WITH NOCHECK ADD  CONSTRAINT [orders_cci_customer_fk] FOREIGN KEY([o_custkey])
REFERENCES [dbo].[customer] ([c_custkey])
GO

ALTER TABLE [dbo].[orders_cci] CHECK CONSTRAINT [orders_cci_customer_fk]
GO

INSERT INTO dbo.orders_cci WITH (TABLOCK)

SELECT  [o_orderdate]
      ,[o_orderkey]
      ,[o_custkey]
      ,[o_orderpriority]
      ,[o_shippriority]
      ,[o_clerk]
      ,[o_orderstatus]
      ,[o_totalprice]
      ,[o_comment]
  FROM [dbo].[orders];

Building and the rebuilding of the Clustered Columnstore Index was made with the following operation:

CREATE CLUSTERED COLUMNSTORE INDEX CCI_lineitem_CCI
	ON dbo.lineitem_CCI 
		WITH (ONLINE = ON);

ALTER INDEX CCI_lineitem_CCI
	ON dbo.lineitem_CCI 
		REBUILD 
			WITH (ONLINE = ON);

For the Rowstore Tables, here is the code that I have used:

DROP TABLE IF EXISTS [dbo].[lineitem_rowstore];

CREATE TABLE [dbo].[lineitem_rowstore](
	[l_shipdate] [date] NULL,
	[l_orderkey] [bigint] NOT NULL,
	[l_discount] [money] NOT NULL,
	[l_extendedprice] [money] NOT NULL,
	[l_suppkey] [int] NOT NULL,
	[l_quantity] [bigint] NOT NULL,
	[l_returnflag] [char](1) NULL,
	[l_partkey] [bigint] NOT NULL,
	[l_linestatus] [char](1) NULL,
	[l_tax] [money] NOT NULL,
	[l_commitdate] [date] NULL,
	[l_receiptdate] [date] NULL,
	[l_shipmode] [char](10) NULL,
	[l_linenumber] [bigint] NOT NULL,
	[l_shipinstruct] [char](25) NULL,
	[l_comment] [varchar](44) NULL
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[lineitem_rowstore]  WITH NOCHECK ADD  CONSTRAINT [lineitem_rowstore_order_fk] FOREIGN KEY([l_orderkey])
REFERENCES [dbo].[orders] ([o_orderkey])
GO

ALTER TABLE [dbo].[lineitem_rowstore] CHECK CONSTRAINT [lineitem_rowstore_order_fk]
GO

ALTER TABLE [dbo].[lineitem_rowstore]  WITH NOCHECK ADD  CONSTRAINT [lineitem_rowstore_partkey_fk] FOREIGN KEY([l_partkey])
REFERENCES [dbo].[part] ([p_partkey])
GO

ALTER TABLE [dbo].[lineitem_rowstore] CHECK CONSTRAINT [lineitem_rowstore_partkey_fk]
GO

ALTER TABLE [dbo].[lineitem_rowstore]  WITH NOCHECK ADD  CONSTRAINT [lineitem_rowstore_partsupp_fk] FOREIGN KEY([l_partkey], [l_suppkey])
REFERENCES [dbo].[partsupp] ([ps_partkey], [ps_suppkey])
GO

ALTER TABLE [dbo].[lineitem_rowstore] CHECK CONSTRAINT [lineitem_rowstore_partsupp_fk]
GO

ALTER TABLE [dbo].[lineitem_rowstore]  WITH NOCHECK ADD  CONSTRAINT [lineitem_rowstore_suppkey_fk] FOREIGN KEY([l_suppkey])
REFERENCES [dbo].[supplier] ([s_suppkey])
GO

ALTER TABLE [dbo].[lineitem_rowstore] CHECK CONSTRAINT [lineitem_rowstore_suppkey_fk]
GO


INSERT INTO dbo.lineitem_rowstore WITH (TABLOCK)
SELECT [l_shipdate]
      ,[l_orderkey]
      ,[l_discount]
      ,[l_extendedprice]
      ,[l_suppkey]
      ,[l_quantity]
      ,[l_returnflag]
      ,[l_partkey]
      ,[l_linestatus]
      ,[l_tax]
      ,[l_commitdate]
      ,[l_receiptdate]
      ,[l_shipmode]
      ,[l_linenumber]
      ,[l_shipinstruct]
      ,[l_comment]
  FROM [dbo].[lineitem];

DROP TABLE IF EXISTS [dbo].[orders_rowstore]

CREATE TABLE [dbo].[orders_rowstore](
	[o_orderdate] [date] NULL,
	[o_orderkey] [bigint] NOT NULL,
	[o_custkey] [bigint] NOT NULL,
	[o_orderpriority] [char](15) NULL,
	[o_shippriority] [int] NULL,
	[o_clerk] [char](15) NULL,
	[o_orders_rowstoretatus] [char](1) NULL,
	[o_totalprice] [money] NULL,
	[o_comment] [varchar](79) NULL,
) ON [PRIMARY]
GO

ALTER TABLE [dbo].[orders_rowstore]  WITH NOCHECK ADD  CONSTRAINT [orders_rowstore_customer_fk] FOREIGN KEY([o_custkey])
REFERENCES [dbo].[customer] ([c_custkey])
GO

ALTER TABLE [dbo].[orders_rowstore] CHECK CONSTRAINT [orders_rowstore_customer_fk]
GO

INSERT INTO dbo.orders_rowstore WITH (TABLOCK)
SELECT  [o_orderdate]
      ,[o_orderkey]
      ,[o_custkey]
      ,[o_orderpriority]
      ,[o_shippriority]
      ,[o_clerk]
      ,[o_orderstatus]
      ,[o_totalprice]
      ,[o_comment]
  FROM [dbo].[orders]

For the building and rebuilding of the indexes the following code was applied:

CREATE CLUSTERED INDEX [CI_lineitem_rowstore] ON [dbo].[lineitem_rowstore]
(
	[l_shipdate] ASC
)WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);


ALTER INDEX [CI_lineitem_rowstore]
	ON dbo.lineitem_rowstore 
		REBUILD 
			WITH (ONLINE = ON);


CREATE CLUSTERED INDEX [CI_orders_rowstore] ON [dbo].[orders_rowstore]
(
	[o_orderkey] ASC
) WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);


ALTER INDEX [CI_orders_rowstore] 
    ON [dbo].[orders_rowstore]
	REBUILD
        WITH (ONLINE = ON, DATA_COMPRESSION = PAGE);


CREATE CLUSTERED COLUMNSTORE INDEX [CCI_orders_CCI] 
        ON [dbo].[orders_cci]
               WITH (ONLINE = ON);

ALTER INDEX [CCI_orders_CCI] 
	ON [dbo].[orders_cci]
		REBUILD
			WITH (ONLINE = ON);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_lineitem_CCI
	ON dbo.lineitem_CCI 
		WITH (ONLINE = ON);

ALTER INDEX CCI_lineitem_CCI
	ON dbo.lineitem_CCI 
		REBUILD 
			WITH (ONLINE = ON);

The results for the Online Clustered Columnstore Index creation and comparison with the Online Rowstore and with the Offline Columnstore Index creation are presented below:

The pattern here for both tables is pretty much clear – the overall performance of going from the Offline Clustered Columnstore Index creation to the Online Columnstore Index creation is going into the direction of doubling the elapsed time spent, which is the news that can be incredibly great for those who have enough resources, or using the Resource Governor can allocate a little bit more of CPU, Disk IOPs & Memory, but in some cases it might be a pretty heavy penalty to pay. From the other side – if you want to have an Online Index operation for the Columnstore Indexes – you will need to start somewhere and this result is reasonably good for a lot of cases, but mostly it will be useful for the partition-based maintenance. The overall increase of the elapsed time from the Online Rowstore Index creation is similar, but in the number of different cases I have tried – stayed in the range between 2-3 times slower, and that is the news that everyone will have to decide for themselves if this is enough or not. Given that this is a definitely Enterprise-Edition feature, I am also expecting the kind of answer from Microsoft that I have written above – if its ain’t fast enough and you need more speed while caring for the highest availability: throw more resources at it. With Azure SQL Database, with very low downtime while migrating between the Premium Editions you can get some very high processing power, that will allow you to achieve your expected SLA.

Vertipaq Optimisation

There is also something happening with the compression of the Row Groups that is beyond obvious – one of the compression & optimisation layers, the Vertipaq Compression is not being used as you can see on the picture below, showing the aftermath of the ONLINE rebuild operation:

This will result in loss of the compression quality, but the true impact of it will be interesting to measure when comparing really big tables (and I am quite unsure if I want to try ONLINE rebuild right now against a Multi-Billion row tables.

Rebuild Cancelling

Canceling an online rebuild seems to be a pretty easy task if there are not too many other tasks are being executed in parallel – the process will simply remove the new versions of the compressed Row Groups thus backing off to the initial state of the operation. I tried it a couple of times and the cancelling was impressively good.

I wish we could have a Resumable Online Index Rebuild, like the one we have for the RowStore Indexes since SQL Server 2017, but even though in my interpretation at the PASS Summit at the open Panel it was mentioned like very probable, I find it sad to observe no other mention ever since.
At least for the Online Index Rebuild currently I would love to have some processed Row Groups kept, but I understand the difficulty of keeping up with the implementation of half-deleted Row Groups that are being updated in parallel …
Well, it does not mean that I would prefer having it other way. :)

Parallel Processes and Clustered Columnstore Index Online Rebuild Process

Starting an online rebuild of the Clustered Columnstore Index with one process

ALTER INDEX CCI_SampleDataTable
	ON dbo.SampleDataTable
		REBUILD WITH (ONLINE = ON);

and running some queries against the table itself in the other process, works very fine:

ALTER DATABASE SCOPED CONFIGURATION 
	CLEAR PROCEDURE_CACHE;

SELECT sum(C1)
	FROM dbo.SampleDataTable;

no matter if we are doing an Online or Offline rebuild of the Clustered Columnstore Index.

What is very different is the locking, which can be observed with the following query:

-- What's being locked
SELECT dm_tran_locks.request_session_id,
       dm_tran_locks.resource_database_id,
       DB_NAME(dm_tran_locks.resource_database_id) AS dbname,
       CASE
           WHEN resource_type = 'object'
               THEN OBJECT_NAME(dm_tran_locks.resource_associated_entity_id)
           ELSE OBJECT_NAME(partitions.OBJECT_ID)
       END AS ObjectName,
       partitions.index_id,
       indexes.name AS index_name,
       dm_tran_locks.resource_type,
       dm_tran_locks.resource_description,
       dm_tran_locks.resource_associated_entity_id,
       dm_tran_locks.request_mode,
       dm_tran_locks.request_status
FROM sys.dm_tran_locks
LEFT JOIN sys.partitions ON partitions.hobt_id = dm_tran_locks.resource_associated_entity_id
JOIN sys.indexes ON indexes.OBJECT_ID = partitions.OBJECT_ID AND indexes.index_id = partitions.index_id
WHERE resource_associated_entity_id > 0
  AND resource_database_id = DB_ID()
ORDER BY request_session_id, resource_associated_entity_id;

Ladies and Gentlemen here are the differences between Online and Offline Rebuilds:

Locking for Online Rebuild

Locking for Offline Rebuild

The usual suspects, such as Exclusive Lock for the Data Pages and the system tables (for the duration of the process) plus Schema-Modification lock on the HOBT are present for the Offline rebuild, while Online Rebuild shows no locks at all (this is obviously not a complete picture, because Schema-Modification lock will be needed and the sysrscols and sysrowsets will need to be modified and there will be locks, but I am aint going to hunt for them in this blog post).

Let’s try to update 100.000 Rows in the table during the rebuild process (and to be manageable I will run it against 1 Million Rows table), while rebuilding the table with the Clustered Columnstore Index online:

ALTER INDEX CCI_SampleDeltaStore
	ON dbo.SampleDeltaStore
		REBUILD WITH (ONLINE = ON);

update top (100000)  
	dbo.SampleDeltaStore 
	SET C1 = C1 + 1

Running the rebuild process ONLINE vs OFFLINE produces the following locking information:

Locking during Updates for Online Rebuild

Locking during Updates for Offline Rebuild

Given that none of the UPDATE process currently ends before the rebuild operation finishes (and I consider it to be a total unfinished development for the ONLINE rebuild), it feels like a very work in progress version, and because there were no official announcement about it being ready so far, to my knowledge, I consider it to be a beta version.

Offline vs Online Rebuild Transaction Log Impact

Let us explore more of the impact in the different ways that is possible. For now, let’s take a look at the Transaction Log size after each of the operations, and for that purpose I will use the following script for the OFFLINE rebuild operation:

CHECKPOINT;

ALTER INDEX CCI_SampleDeltaStore
	ON dbo.SampleDeltaStore
		REBUILD;

SELECT CAST(SUM([log record length]) / 1024. / 1024 as Decimal(9,2)) as LogSizeInMB
	FROM sys.fn_dblog (NULL, NULL);

It does not take a lot of time and in my typical attempt the transaction log size was 5.89 MB – and notice that I did not run any other t-log affecting transaction in the mean time on my Azure SQL Database.

Let’s try the same for the ONLINE rebuild operation and measure the transaction log impact:

CHECKPOINT;

ALTER INDEX CCI_SampleDeltaStore
	ON dbo.SampleDeltaStore
		REBUILD WITH (ONLINE = ON);

SELECT CAST(SUM([log record length]) / 1024. / 1024 as Decimal(9,2)) as LogSizeInMB
	FROM sys.fn_dblog (NULL, NULL);

It took 1 Minute and 28 Seconds on the average, where 1/3 of the time was spent on analysing the transaction log size, which was a huge 364.20 MB
You can see the comparison between the transaction log impact on the picture on the left, and to me, for a 1 Million Row table this looks pretty impressive. Impressive for a beta-version and this is how I truly consider the current status, because no one in their sane mind will attempt any of these operations on multi-billion row tables.

Online Index Creation

One of the thing missing from this first attempt is to create a Clustered Columnstore Index online, and here is a script for that.

DROP TABLE IF EXISTS dbo.SampleDeltaStore;

CREATE TABLE dbo.SampleDeltaStore (
    C1 BIGINT NOT NULL
);
GO

INSERT INTO dbo.SampleDeltaStore WITH(TABLOCK)
SELECT t.RN
	FROM
	(
		SELECT TOP (1 * 1000000) ROW_NUMBER()
			OVER (ORDER BY (SELECT NULL)) RN
		FROM sys.objects t1
		CROSS JOIN sys.objects  t2
		CROSS JOIN sys.objects  t3
		CROSS JOIN sys.objects  t4
		CROSS JOIN sys.objects  t5    
	) t
	OPTION (MAXDOP 1);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_SampleDeltaStore
	ON dbo.SampleDeltaStore
		WITH (ONLINE = ON);

Creation & creation with (DROP_EXISTING = ON) works without any problems and delivers no observable locks through the DMV’s, which are really good news for those of us who are waiting to be able to migrate to Columnstore Indexes with as small downtime as possible.
I would not vote for the current state of the feature with the possible parallel insertion of the data though :)

Final Thoughts

The current state of the Clustered Columnstore Index ONLINE rebuild points to be an unfinished version, which will definitely get vastly improved before being released & supported in SQL Server. I have seen a couple of deadlocks and canceled transactions and so I decided that this blog post will get updated as soon as there will be an official announcement of this feature.
If you are still looking to start working on this feature, then I would suggest trying it on smaller tables. Like really, really small ones.
Oh, and for online rebuild operation focus on using partition rebuild – you are using the partitioning, right ? :)

to be continued with Columnstore Indexes – part 124 (“Estimate Columnstore Compression”)

One thought on “Columnstore Indexes – part 123 (“Clustered Columnstore Index Online Rebuild”)”

Jovan Popovic(MSFT) July 19, 2018 at 3:51 pm

I think that you hit some optimization of offline index rebuild that happens only if you have a single column table. I have ran your script but I have added more columns with random values in the table and I got 00:17 for offline rebuild (instead of 4 sec for single column table), and 01:33 online rebuild. Interesting is that I also got ~01:30 when I executed online rebuild on single columns CCI.

Single-column table is not representative example, so could you try with some ContosoDW table?
When I run online/offline on a tpch database that I use for texting, I got the following results:

General purpose(Standard) TPCH300 database
alter index ord_cs on dbo.orders rebuild
–00:44:14
–00:30:24
–00:33:20
–00:30:29

alter index ord_cs on dbo.orders rebuild with (online=on)
–03:49:07
–02:50:17
–02:49:59
–02:50:29

Business Critical(Premium) TPCH1000 database
alter index ord_cs on dbo.orders rebuild
–00:03:35
–00:02:56
–00:03:03
–00:03:07

alter index ord_cs on dbo.orders rebuild with (online=on)
–00:14:46
–00:14:23
–00:14:15
–00:14:30
–00:14:17

In my experiments online build is slower than offline as expected, but it is around x5 slower.

Reply ↓

Niko Neugebauer

SQL Server, Columnstore, Data Platform & Community