Resumable Online Index Creation

It is quite strange to write an article about Resumable Online Index Creation 14 months after writing about Resumable Online Index Rebuild operation, because from the physical world logic there should be first the initial creation and then rebuild process, but in Data Platform world, Microsoft advanced with a Rebuild operation before Initial Creation operation support.

I do not know about you, but I find the Resumable Online Index feature as one of the absolute highlights of the Microsoft Data Platform with a huge potential to take it to a brand new height – such as asynchronous indexing of the content and/or defining the policy of when new data should be indexed (think hot/cold data). Being able to manage the indexing process is something that most of the relational Data Platform providers struggled to provide so far, but times are definitely changing and I love the response of the relational databases providers, who are tackling quite effectively the new challenges.

Besides the new businesses and data management models, there is a goodly one necessity – as the data in the databases grows bigger & bigger, there is a necessity to keep the business online, and as unaffected as possible while adding new optimisation paths, such as indexes. Yes, we can build right now almost every possible type of index online (without interrupting business operations), but that does not mean that the business operations won’t get affected (cough, cough).

Before advancing let’s try to think logically about the Resumable Online Index Creation – what is logically to expect from the feature:
– While being created asynchronously for the first time, it should not become available until the creation is finished;
– Being suspendible and resumable, meaning that we can pause the process and then resume later;
– Having an opportunity to completely cancel the process;
– Ability to define the maximum amount of time that the process shall be executed;

After creating a test database in Azure SQL DB, and I have created a P1 (Premium 1) one, in order to be able to run some reasonable tests, I shall set the compatibility level to 150 – which corresponds to SQL Server vNext (in the 2018):

ALTER DATABASE CloudColumnstore -- Yeah, that's how I call my Azure SQL DB for the test purposes
	SET compatibility_level = 150;

Let us create a plain simple SampleDataTable with 10.485.760 Rows containing simple integer numbers with the following script:

DROP TABLE IF EXISTS dbo.SampleDataTable;

CREATE TABLE dbo.SampleDataTable (
    C1 BIGINT NOT NULL
);

INSERT INTO dbo.SampleDataTable WITH (TABLOCK)
SELECT t.RN
	FROM
	(
		SELECT TOP (10 * 1048576) ROW_NUMBER()
			OVER (ORDER BY (SELECT NULL)) RN
		FROM sys.objects t1
		CROSS JOIN sys.objects  t2
		CROSS JOIN sys.objects  t3
		CROSS JOIN sys.objects  t4
		CROSS JOIN sys.objects  t5    
	) t
	OPTION (MAXDOP 1);

It is about the time to create our first Clustered Online Resumable Index:

CREATE CLUSTERED INDEX CI_SampleDataTable
	ON dbo.SampleDataTable (c1)
		WITH ( ONLINE = ON, RESUMABLE = ON ) ;

But all we shall get is an error message:

Msg 155, Level 15, State 1, Line 25
'RESUMABLE' is not a recognized CREATE CLUSTERED INDEX option.

I was shocked and I was disappointed, but I have understood that it was my own mind’s fault. Nobody, I repeat – NOBODY has told me that it would work for the CLUSTERED Indexes, but when I see an announcement that the Indexes are supported, I was totally believing that the traditional (not XML, no CLR, no LOB’s) Rowstore Indexes would be totally supported. Oh yes, I know that it is crazy difficult. I know that this is a pretty forward-facing feature, but come on – my mind played trick on me, telling me the story that does not exist, for now, at least.

After realising my mind’s mistake I took a deeper breath and decided to try out the Resumable Nonclustered Index Creation with the following command:

CREATE NONCLUSTERED INDEX NCI_SampleDataTable
	ON dbo.SampleDataTable (c1)
		WITH ( ONLINE = ON, RESUMABLE = ON );

and on the very first attempt on my P1 (Premium 1) instance it took 4 Minutes and 37 Seconds to run, and so I decided that I truly wanted to have an average value out of at least 3 attempts and to make things easy I would simply add a DROP_EXISTING = ON command, just like this:

CREATE NONCLUSTERED INDEX NCI_SampleDataTable
	ON dbo.SampleDataTable (c1)
		WITH ( ONLINE = ON, RESUMABLE = ON, DROP_EXISTING = ON );

but the Query Optimiser did not agreed with me and the following error message was presented to me:

Msg 10665, Level 15, State 1, Line 23
'CREATE INDEX' with DROP_EXISITNG option is not supported with RESUMABLE option.




Well, I … I … I simply wanted to rebuild my index by substituting it with a new one … Yes, I could have used Online Resumable Index Rebuild operation, but that’s not the point. That is not the point at all. It is not the point either that if I wanted to change my index I could create a new one with Online Resumable Create command and drop the existing one … It is not the point.
The point is that this is the functionality that (I Guess) most good DBA are expecting to be supported, without regard of the technical difficulty of it’s implementation.
In any case I simply decided to add indexes NCI_SampleDataTable_2, NCI_SampleDataTable_3 & NCI_SampleDataTable_4 & NCI_SampleDataTable_5 to my table in order to have some kind of result for the comparison with the default index creation (I throw away one outlier on both sides).
My consecutive operations had a pretty interesting distribution and the average value would fall down to 3:36 Minutes.

To compare I ran multiple comparable indexes creation operations that

CREATE NONCLUSTERED INDEX NCI_SampleDataTable_OLD
	ON dbo.SampleDataTable (c1)
		WITH ( ONLINE = ON ) ;

And I patiently executed the same creation process with the names %2, %3, %4 & %5 in order to make sure that the execution time was the same – 3:36 Minutes, the very same time as in the case of the Resumable Online Index Creation (ROIC). I admit that it is very impressive, even though very similar results have been shown during the default and unaffected Resumable Online Index Rebuild (ROIR) operations that I have blogged previously about. This virtually can hopefully mean that we are looking by a default improvement for the existing functionality that will not require an extra intervention and troubleshooting (Yes, by default – which means there will be still enough exceptions, but it will be good enough for the vast majority of the users).

Before advancing any further, please drop any of the existing indexes on the test table dbo.SampleDataTable.

This is a default test, but what if we update some data in our table while rebuilding the index, with the following statement:

UPDATE dbo.SampleDataTable
	SET C1 += 1;

while creating a new Nonclustered Resumable Online Index:

CREATE NONCLUSTERED INDEX NCI_SampleDataTable2
	ON dbo.SampleDataTable (c1)
		WITH ( ONLINE = ON, RESUMABLE = ON ) ; 

I should have written before the execution start, but I will do it only now – it will take an extraordinary amount of time to bring the execution of both of the queries to an end. The whooping 31:12 Minutes were spent for updating the whole table, while creating the index on our table that I have started at the same time in parallel took 34:46 Minutes on my Azure SQL Database P1 Instance! That was a very long operation for just 10 Million Rows, but given the context of the update operation maybe it is simply too early to give any serious judgement – I thought. So while waiting for those wonderful 30+ Minutes I thought I would check on the DMV’s to see the progress of the operations and the most interesting thing that I already knew were the sys.index_resumable_operations DMV, which shows the status of the current operation, and so I kept firing the following trivial query against it in the hopes to see and evaluate the progress:

SELECT *
	FROM sys.index_resumable_operations;

The funny part was that before the update operation has finished, the response I would receive was following:

and after the update was finished I could finally observe the Resumable Online Index Creation:

Doing a simple mathematics lead us to a reasonable conclusion that the difference in time between finishing the update operation and the finishing the Resumable Online Index Creation is extremely similar to the time spent on the time to just create the index – 3:35 Minutes, and the fact that there were no entries in the sys.index_resumable_operations DMV just confirms that.

The exciting part is that if we start the other way around – first the Resumable Online Index Creation and then the update function: we can see the entry in the sys.index_resumable_operations DMV and then we can truly see it’s progress if we focus on the percent_complete property:

select percent_complete, *
	from sys.index_resumable_operations;


but there will be no real change in the overall performance of the Rebuild process – it will take the same amount of time to execute the creation of the Resumable Index, but it will not complete until the long operation on the table comes to an end (which is my test case was around 31:30 Minutes).

There can be only one

While testing and playing with Resumable Online Index creation process, I have found out a couple of interesting things, the most important of which is the following – should we start the Resumable Online Index creation in a window:

CREATE NONCLUSTERED INDEX NCI_SampleDataTable_Parallel1
	ON dbo.SampleDataTable (c1)
		WITH ( ONLINE = ON, RESUMABLE = ON ) ;

SUSPENDING the process after a couple of seconds and verifying in the sys.index_resumable_operations that the process is suspended like it should be:

select percent_complete, *
	from sys.index_resumable_operations;


and try to START A NEW Resumable Online Index creation (ROIC) process:

CREATE NONCLUSTERED INDEX NCI_SampleDataTable_Parallel2
	ON dbo.SampleDataTable (c1)
		WITH ( ONLINE = ON, RESUMABLE = ON ) ;

The response we shall get from the query processor is quite clear:

Msg 10637, Level 16, State 3, Line 23
Cannot perform this operation on 'object' with ID 754101727 as one or more indexes are currently in resumable index rebuild state. Please refer to sys.index_resumable_operations for more details.

There can be only one Resumable Online Index Creation process per table ? Hmm, this sounds like not what I would expect …

The resumable operations

As I have already blogged previously the basic operations for the Resumable Online Indexes are SUSPEND, RESUME & CANCEL and as we have suspended the index creation process, let us resume it with the help of the following command:

ALTER INDEX NCI_SampleDataTable_Parallel1
	ON dbo.SampleDataTable 
               RESUME;

And after suspending operation once again we can easily experiment cancelling Resumable Online Index creation by invoking ABORT command

ALTER INDEX NCI_SampleDataTable_Parallel1
	ON dbo.SampleDataTable 
               ABORT;

And this operation is successful, but notice that in my experience it takes a couple of seconds because it must clean up already prepared structures.

Some smaller limitations (the old ones, if I might say)

Like in the case of the Resumable Online Index Rebuild (ROIR) operations, some data types still do not play well with the Resumable Online Index operations and that might be taking off the line some of the scenarios, like in the case of the Dynamics for using with Resumable Online Indexes.

Here is the old example for the Timestamp

DROP TABLE IF EXISTS dbo.ROIR_Test1;

CREATE TABLE dbo.ROIR_Test1(
	c1 int identity(1,1) not null,
	c2 timestamp,
	constraint pk_roir_test1 primary key(c1),
	index ix_ROIR_Test1 nonclustered(c2)
	);
GO

set nocount on
INSERT INTO dbo.ROIR_Test1 default values;
GO 10

CREATE NONCLUSTERED INDEX ix_ROIR_Test1 on ROIR_Test1 (c2)
 WITH ( RESUMABLE = ON, ONLINE = ON );

which will produce the following error message:

Msg 10639, Level 16, State 1, Line 15
Resumable index operation for index 'ix_ROIR_Test1' failed because the index contains column 'c2' of type timestamp as a key column.

Another item would be sorting in TempDB, where this operation is not supported, just try this:

CREATE NONCLUSTERED INDEX NCI_SampleDataTable_Parallel2
	ON dbo.SampleDataTable (c1)
		WITH ( ONLINE = ON, RESUMABLE = ON, SORT_IN_TEMPDB = ON ) ;

which results in the following error message:

Msg 11438, Level 15, State 2, Line 25
The SORT_IN_TEMPDB option cannot be set to 'ON' when the RESUMABLE option is set to 'ON'.

The Query Store

There looks to be a quite interesting issue for the plan forcing in the Query Store where plan forcing is not supported for resumable index builds, but I am still asking myself is there a such a need for forcing an alternative plan.

Database Scoped Configurations

I absolutely love the improvement that Microsoft has been pushing with the Database Scoped Configurations and there will be a much bigger blog post on this topic in the next weeks to come, but when writing about the Resumable Online in Azure SQL Database I feel excited to touch on some of the more exciting additions there – the ELEVATE_RESUMABLE option.

To see what is currently available there, please use the following query against the sys.database_scoped_configurations DMV:

select *
	from sys.database_scoped_configurations

The configuration that interests us is the ELEVATE_RESUMABLE, which is currently supports the following options:
– OFF (by default, meaning that Query Processor will not try to change the type of the operations for the Index creation or rebuilds, and so you have total control manually over the type of the operations that you are executing)
– FAIL_UNSUPPORTED (meaning that the Query Processor will try to change the type of the operation to the Resumable Online Index creation or rebuilds and will fail the operation if they are not supported.
– WHEN_SUPPORTED (meaning that Query Processor will try to execute the Resumable Rebuild & Create (aka ROIR & ROIC) operations for the indexes)

Notice that it is necessary to have the other brand new Database Scoped Configuration ELEVATE_ONLINE, in order for Resumable ONLINE Index Operation to take if you did not specify ONLINE = ON explicitly – which makes total sense, if you are not running an Resumable Operation – it can be online & resumable. :)

For changing this configuration, you will need to execute the following script (where in this example I am changing this option to WHEN_SUPPORTED, meaning that it will try to automatically convert the Resumable Online Index operation where available):

ALTER DATABASE SCOPED CONFIGURATION 
    SET ELEVATE_RESUMABLE = WHEN_SUPPORTED;
	
ALTER DATABASE SCOPED CONFIGURATION 
    SET ELEVATE_ONLINE = WHEN_SUPPORTED;

Unfortunately so far, the Index creation attempts to be elevated have failed in my tests – which is why most probably, the feature is freshly announced and still in the public preview, but the Online Rebuild elevation for the Resumable Index Rebuilds works very fine:

CREATE NONCLUSTERED INDEX NCI_SampleDataTable_DBScoped2
	ON dbo.SampleDataTable (c1);

ALTER INDEX NCI_SampleDataTable_DBScoped
	ON dbo.SampleDataTable
                REBUILD;

If you catch the ALTER INDEX REBUILD operation, it will be both Online & Resumable:

select percent_complete, *
	from sys.index_resumable_operations;

Another important notice to be added here is that:
ALTER INDEX REBUILD does not support rebuilding ALL WITH (RESUMABLE=ON) and the ELEVATE_RESUMABLE database scoped configuration is set to FAIL_UNSUPPORTED, but I love the direction where Azure SQL Database team is trying to take all those feature – it will allow so many more people to achieve much more through general configuration and hopefully even by default one day.

Oh, and the Columnstore?

It is very far from being a secret that I have been asking and begging for the Columnstore Indexes support of this feature, because it is exactly on the huge amounts of data that it makes total sense to be able to have a well-defined data maintenance window that could be used to optimise data maintenance processes,
and in 2018 who would be in their sane mind to work with big amounts of data without Clustered Columnstore Index ? (Yes, the OLTP people, I know … )

Final Thoughts

I am very happy to see that the current Azure SQL Database is moving into the right direction as well as the next SQL Server version – giving more granular & more online control over the indexing operations. I am hoping to see the support for the Clustered Index in the final version and I do really, really, really hope to see the support for the Columnstore Indexes, by the way. :) hahaha
Oh, and a totally asynchronous indexing is such a nice idea, right ? :)
I have no doubt that this is an extraordinary effort by the engineering team and while I am sitting comfortably on my sofa and writing this text – it is easy to say, but making any online parallel operations work flawlessly is definitely the most difficult task in the IT overall.
I am hungry for more and I will be watching very closely this space.

to be definitely continued …

2 thoughts on “Resumable Online Index Creation

  1. Ricardo Peres

    Hi, Niko! I think the first code snippet has a typo: “ALTER DATABAS” should be “ALTER DATABASE”. Great job, as usual!

Leave a Reply

Your email address will not be published.