Continuation from the previous 65 parts, the whole series can be found at https://www.nikoport.com/columnstore/.
Continuing writing about the improvements added to Clustered Columnstore Indexes (DataWarehousing & Business Intelligence) in SQL Server 2016, this blog post is a direct continuation from my previous post on the Nonclustered B-Tree Indexes added to Clustered Columnstore Index in SQL Server 2016.
This blog is focused on the improvements for the Clustered Columnstore Indexes and especially the support of the unique constraints and referential integrity between the tables in SQL Server 2016.
In SQL Server 2014 and before we were always told to verify the referral integrity between the tables by setting up a correct ETL process. This is a very correct recommendation, and indeed the verification of data integrity should be executing during the loading process – there is nothing new to add or to object at.
But … :)
The existence of the referential constraints and integrity constraints is something that any data solution will greatly appreciate. Take a look at NoSQL or Hadoop people – they started negating the need of constraints, but once going for the performance optimisations, one of the first thing to implement are the referential constraints.
As for me, I love Unique Constraints, Primary & Foreign Keys – they allow to describe the state of the database on the meta level, by describing the relationship between sets of data. They allow us to enforce those relations with just few lines of code. They help Query Optimiser to take more precise decisions on how to process our queries as fast as possible, and for DataWarehouse/BI Solutions the speed with with the information is being returned is one of the keys to success.
In SQL Server 2016, Microsoft has added a couple of very important improvements, regarding the unique Constraints, Primary Keys & Foreign Keys, and so let’s dive into them without any further delay.
For the start, let’s consider some clean & shiny new tables, that will contain a unique constrain, clustered columnstore index and a second table will contain a foreign key on the primary one:
create table dbo.T1( c1 int identity(1,1), c2 varchar(20), constraint UQ_T1_C1 unique nonclustered (c1), index CCI_T1 clustered columnstore ); create table dbo.T2( id int, c1_t1 int not null, constraint FK_T2_T1_c1 foreign key (c1_t1) references dbo.T1(c1) );
Notice that right from the beginning, I am including a unique constraint for the table T1. This is impossible in SQL Server 2014 and earlier versions. Observe that I have defined my unique constraint UQ_T1_C1 as a nonclustered – this is because I need to define a Nonclustered Index on it.
My table T2 has a foreign key FK_T2_T1_c1 that references the
T1 table! This is really amazing, but let’s test if the data insertion, to verify if everything works correctly whenever users are inserting and reading the data:
insert into dbo.T1 (c2) values ('Alex'), ('Brown'), ('Cat'); insert into dbo.T2 (id, c1_t1) values (1, 1 );
Worked like magic, but what about reading data from the tables:
select * from dbo.T1 inner join dbo.T2 on T1.c1 = T2.c1_t1;
Works very well, with no problems that I have seen so far. You can reference the table itself (from one column to another unique column by using a foreign key), you can naturally make cross references between any desired number of tables – you will face the consequences, but yes you can! ;)
Let’s try some disruptive changes – let’s add a clustered columnstore index to the T2 table, so that we can test that a table containing columnstore together with a foreign key:
alter table dbo.T2 add index CCI_T2 clustered columnstore;
Ha! Interesting …
Msg 10785, Level 16, State 1, Line 27 The operation 'ALTER TABLE ADD INDEX' is supported only with memory optimized tables. Msg 1750, Level 16, State 0, Line 27 Could not create constraint or index. See previous errors.
It looks like using this specific syntax to add a clustered columnstore index does not work in SQL Server 2016 CTP 2.3 yet. Let’s try to add a clustered columnstore index with a good old syntax:
create clustered columnstore index CCI_T2 on dbo.T2;
This time it functioned without any problem.
To continue with a more complete example, I am restarting with a fresh copy of the free ContosoRetailDW database, that I am using for most of my tests.
Let’s start the engines with the a fresh restore, upgrading the database compatibility level to 130, and making a bit more space for the Data File & the Log File:
USE [master] alter database ContosoRetailDW set SINGLE_USER WITH ROLLBACK IMMEDIATE; RESTORE DATABASE [ContosoRetailDW] FROM DISK = N'C:\Install\ContosoRetailDW.bak' WITH FILE = 1, MOVE N'ContosoRetailDW2.0' TO N'C:\Data\ContosoRetailDW.mdf', MOVE N'ContosoRetailDW2.0_log' TO N'C:\Data\ContosoRetailDW.ldf', NOUNLOAD, STATS = 5; alter database ContosoRetailDW set MULTI_USER; GO GO ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 130 GO ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0', SIZE = 2000000KB , FILEGROWTH = 128000KB ) GO ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0_log', SIZE = 400000KB , FILEGROWTH = 256000KB ) GO
Now, its time to play with my favourite test table: FactOnlineSales.
Notice, that not like before, I am not dropping any of the 6 foreign keys, that are defined on it, but I will simply try to replace the existing Clustered Index on it, by creating a Clustered Columnstore Index with a hint (DROP_EXISTING = ON):
create clustered columnstore Index [PK_FactOnlineSales_SalesKey] on dbo.FactOnlineSales with( drop_existing = on);
This did not work the way I have expected, I have to say:
Msg 1907, Level 16, State 1, Line 35
Cannot recreate index ‘PK_FactOnlineSales_SalesKey’. The new index definition does not match the constraint being enforced by the existing index.
This message and the lacking functionality seems to be an overkill, and while in my thoughts I can imagine that actual development of this feature can be complex and compelling, there is no doubt, that the actual SQL Server users are expecting this to work directly.
Since there is a Primary Key with a clustered Index built as one on this table, I will have to drop the Primary Key thus dropping the clustered Index as well:
ALTER TABLE dbo.[FactOnlineSales] DROP CONSTRAINT [PK_FactOnlineSales_SalesKey];
Now, FactOnlineSales is a HEAP, and this way I can add a Primary Key that is NONCLUSTERED, before creating Clustered Columnstore Index:
alter table dbo.FactOnlineSales add constraint PK_FactOlnineSales primary key Nonclustered (OnlineSalesKey);
create clustered columnstore Index [CCI_FactOnlineSales] on dbo.FactOnlineSales;
This time, it works like magic!
Now we have a table with Clustered Columnstore Index, Primary Key & 6 Foreign Keys!
Looking at the picture on the left side you can see a view on a table that you would expect to find in a well configured environment, with a unique constraint, foreign keys and a clustered columnstore index. :)
Having foreign keys enabled on our table, will allow Query Optimiser to take advantage of the better estimations for memory grants and more precise execution plans, which are some of the most important aspects of the performance, and their absence for Columnstore Indexes for the versions 2012 & 2014 of SQL Server was a very sad miss. Now with SQL Server 2014 and Azure SQLDatabase we can finally take advantage of them.