Columnstore Indexes – part 102 (“CCI with Secondary Rowstore Indexes on SQL 2014”)

Continuation from the previous 101 parts, the whole series can be found at https://www.nikoport.com/columnstore/.

In SQL Server 2014 we have received a possibility to update the Columnstore Index directly with an addition of the Clustered Columnstore Index. This addition had some important limitations, between which were the impossibility of adding secondary Rowstore Indexes (making the Clustered Columnstore Index the only index on the table) and the lack of the unique constraints (

Meet the good old and absolutely underused friend – the Indexed Views, which in fact (and for some cost) will provide you with the above mentioned features.
If you absolutely need those features, follow this article to find this absolutely trivial way.

Using the good old free database ContosoRetilDW, download the backup and restore it from the C:\Install\:

USE [master]
 
alter database ContosoRetailDW
    set SINGLE_USER WITH ROLLBACK IMMEDIATE;
 
RESTORE DATABASE [ContosoRetailDW] 
    FROM  DISK = N'C:\Install\ContosoRetailDW.bak' WITH  FILE = 1,  
        MOVE N'ContosoRetailDW2.0' TO N'C:\Data\ContosoRetailDW.mdf', 
        MOVE N'ContosoRetailDW2.0_log' TO N'C:\Data\ContosoRetailDW.ldf',  
        NOUNLOAD,  STATS = 5;
 
alter database ContosoRetailDW
    set MULTI_USER;
GO

GO
ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 120
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0', SIZE = 2000000KB , FILEGROWTH = 128000KB )
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0_log', SIZE = 400000KB , FILEGROWTH = 256000KB )
GO

Let’s convert the FactOnlineSales table to the Clustered Columnstore one, by dropping the primary key and all the foreign keys:

alter table dbo.[FactOnlineSales] DROP CONSTRAINT PK_FactOnlineSales_SalesKey;
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimCurrency;
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimCustomer;
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimDate;
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimProduct;
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimPromotion;
alter table dbo.[FactOnlineSales] DROP CONSTRAINT FK_FactOnlineSales_DimStore;

create clustered columnstore Index PK_FactOnlineSales
	on dbo.FactOnlineSales;

Here is query that I would like to optimise today, which is a rather basic SalesAmount aggregation on the daily basis for some certain orders, where we try to sort the data based on the Total Sales Amount per Day:

SET STATISTICS TIME, IO ON

select st.CalendarDayOfWeekLabel, SUM(SalesAmount) as TotalSales
	from dbo.FactOnlineSales v 
	inner join dbo.DimDate st
		on v.DateKey = st.DateKey
	where v.SalesOrderNumber >= N'200912282CS780'
	GROUP BY st.CalendarDayOfWeekLabel
	ORDER BY TotalSales desc;

This query takes almost 330 ms on the average to run on my virtual machine, at the same burning 1281 ms of the CPU time, doing over incredible 36.000 LOB logical reads!

Table 'FactOnlineSales'. Scan count 4, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 36367, lob physical reads 0, lob read-ahead reads 0.
Table 'DimDate'. Scan count 5, logical reads 334, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1281 ms,  elapsed time = 330 ms.

Here is the execution plan for this query:

If you run the query and look at the actual execution plan, you will think about adding a nonclustered index:

USE [ContosoRetailDW]
GO
CREATE NONCLUSTERED INDEX IX_MyMissingIndex
ON [dbo].[FactOnlineSales] ([DateKey])
INCLUDE ([SalesAmount])
WITH (DATA_COMPRESSION = PAGE);

this operation will naturally fail, as expected:

Msg 35303, Level 16, State 1, Line 3
CREATE INDEX statement failed because a nonclustered index cannot be created on a table that has a clustered columnstore index. Consider replacing the clustered columnstore index with a nonclustered columnstore index.

We can solve and work around this problem if we build an indexed view, and we shall use it: :)

create view dbo.vContosoTest WITH SCHEMABINDING  
AS
	select OnlineSalesKey, DateKey, SalesAmount
		from dbo.FactOnlineSales;
GO

We shall need a unique clustered index to turn this view an indexed one:

create unique clustered index pk_vContosoTest
	on dbo.vContosoTest (OnlineSalesKey)
		WITH (DATA_COMPRESSION = PAGE);

And let us add an additional nonclustered rowstore index, to make this view performing great:

create nonclustered index cci_vContosoTest
	on dbo.vContosoTest (SalesOrderNumber,DateKey)
		Include(SalesAmount)	
		WITH (DATA_COMPRESSION = PAGE);

Let us re-execute the query, this time against our indexed view to see if the query is performing well or not:

set statistics time, io on

select st.CalendarDayOfWeekLabel, SUM(SalesAmount) as TotalSales
	from dbo.vFactOnlineSales v 
	inner join dbo.DimDate st
		on v.DateKey = st.DateKey
	where v.SalesOrderNumber >= N'200912282CS780'
	GROUP BY st.CalendarDayOfWeekLabel
	ORDER BY TotalSales desc;

This time the results are returned with a different speed – it took just 23ms of the total execution time, while spending just 31ms of the CPUT time! From the read accesses we are talking here about less than 170 logical reads. Compare these results with the original ones with 330 ms of the total execution (10 times improvement), and over 36000 LOB logical reads (over 200 times improvement).

(4 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DimDate'. Scan count 1, logical reads 81, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'vFactOnlineSales'. Scan count 1, logical reads 85, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
 SQL Server Execution Times:
   CPU time = 31 ms,  elapsed time = 23 ms.

Here is the actual execution plan that was produced for this query:

Here we have a beautiful and a simple execution plan, which delivers what we need – great performance with a relative low overall cost. And in this way we enjoy the possibility to get the best out of the 2 worlds – Columnstore & Rowstore.
Should we need to run a similar query but agains the whole dataset, it will be redirected to our Columnstore Index which will deliver great performance:

select st.CalendarDayOfWeekLabel, SUM(SalesAmount) as TotalSales
	from dbo.vFactOnlineSales v 
	inner join dbo.DimDate st
		on v.DateKey = st.DateKey
	GROUP BY st.CalendarDayOfWeekLabel
	ORDER BY TotalSales desc;

Uniqueness

Another frequent requirements is the uniqueness for the Clustered Columnstore Indexes on SQL Server 2014, and this one can be achieved by simply adding unique indexes to our indexed view (and notice that there could be more than 1 default required unique index on it).
The operation that we have already executed to make an indexed view out of the simple view is the one which shall guarantee the uniqueness for the column OnlineSalesKey:

create unique clustered index pk_vContosoTest
	on dbo.vContosoTest (OnlineSalesKey)
		WITH (DATA_COMPRESSION = PAGE);

Should we try to insert a duplicate into our table FactOnlineSales, here by copying one of the rows into our test table FactOnlineSales:

SET IDENTITY_INSERT dbo.FactOnlineSales ON

insert into dbo.FactOnlineSales
	(OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate)
	select top 1
		OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate
		from dbo.FactOnlineSales;

SET IDENTITY_INSERT dbo.FactOnlineSales OFF

The error message is very clear that it the uniqueness of the data is enforced and will be preserved by the unique index in our view:

Msg 2601, Level 14, State 1, Line 3
Cannot insert duplicate key row in object 'dbo.vFactOnlineSales' with unique index 'pk_vFactOnlineSales'. 
The duplicate key value is (20362532).
The statement has been terminated.

Now, if only we could build primary keys/foreign keys to the views … Ok, Ok – I know, this is a different topic …

Final Thoughts

The only thing to add here is that one should evaluate costs very carefully, but if you are dealing with reasonably (not huge) tables in the low million rows, this might be a great solution for your unique constraints and secondary indexes.

to be continued with Columnstore Indexes – part 103 (“Partitioning 2016 vs Partitioning 2014”)

7 thoughts on “Columnstore Indexes – part 102 (“CCI with Secondary Rowstore Indexes on SQL 2014”)

  1. Avinash Jaiswal

    Hi Niko,

    How does the secondary rowstore index work physically? Is the data completely duplicated (at leaf of B-Tree) once we create any B-Tree index on a table with CCI or does it have references within the CCI somehow?
    In case the data is duplicated completely then shouldn’t we prefer Non Clustered Columnstore Index if we intend to have more than one B-Tree non clustered index since the data will be duplicated for each NCI.

    Thanks.

    1. Niko Neugebauer Post author

      Hi Avinash,

      The secondary rowstore index duplicates the data of the clustered index and it does not matter which types are of indexes are involved.
      Secondary Rowstore Indexes over Clustered Columnstore Index serve to speed up and optimise resource spending for the Short-Range (couple of thousands ordered) & Point Lookup (couple of rows) queries.
      Data Duplication on the Index-Level is ok, if you are spending less system resources overall in the end.

      Best regards,
      Niko

      1. Avinash

        I agree about the performance gains. I use that in my production too.
        I am more curious to know your opinion in the case when we need “more than one” NCIs along with the CCI. Will it be better to then have a traditional clustered index (say on the primary key) and have a non clustered column store index(for performance) and make other indexes traditional NCIs? This strategy can save us storage space because the NCIs don’t need to duplicate the data. Of course the column store will duplicate the data but in case of CCI all NCIs would have their own copies of the data.

        1. Niko Neugebauer Post author

          Hi Avinash,

          so if I am understanding correctly, you are wondering wether
          CCI + Rowstore NCCI
          vs
          Rowstore CCI + NCCI + Rowstore NCCI
          are good fit for your scenario.
          I suggest you take a look the following blog post: http://www.nikoport.com/2017/12/29/columnstore-indexes-part-117-clustered-vs-nonclustered/
          it might have some answers for you.
          As for NCCI, it will not solve your Short-Range Scans or Point Lookups and will introduce a greater resource spending.

          Best regards,
          Niko

  2. Shyam Viking

    Niko, Isn’t the view vContosoTest, but your second query with the view is vFactOnlineSales? Is this view different from vContosoTest?

Leave a Reply to Niko Neugebauer Cancel reply

Your email address will not be published. Required fields are marked *