Columnstore Indexes – part 78 (“Temporary Objects”)

Continuation from the previous 77 parts, the whole series can be found at https://www.nikoport.com/columnstore/.

I have decided to spend some time exploring the opportunities that Columnstore Indexes offer in the relation of the temporary objects. Temporary objects represent a very important part of a lot of ETL processes and if you are running an OLTP, good chances are that in your stored procedures sometimes you need the temporary objects as well.

I have seen in a lot of ETL process the creation of temporary tables, which serve as a supporting vehicle for some kind of a transformation, or for a number of times I have faced stored procedures which were storing the intermediate results of the calculations in a stored procedure, before doing even more calculations and then returning some info the final user.

Let’s consider which temporary objects exist in the upcoming release of the SQL Server 2016, and which temporary objects were supporting the Columnstore Indexes in the previous releases since the very first one – the SQL Server 2012.

The most popular temporary object where Columnstore Indexes make a lot of sense are the temporary tables. Their usage for ETL processes for complex scenarios (where transferring data takes a couple of days) are essential.

For the test I am picking up the my favourite free test database ContosoRetailDW from Microsoft, which I will restore from the downloaded backup that was copied into C:\Install folder:

USE [master]
 
alter database ContosoRetailDW
    set SINGLE_USER WITH ROLLBACK IMMEDIATE;
 
RESTORE DATABASE [ContosoRetailDW] 
    FROM  DISK = N'C:\Install\ContosoRetailDW.bak' WITH  FILE = 1,  
        MOVE N'ContosoRetailDW2.0' TO N'C:\Data\ContosoRetailDW.mdf', 
        MOVE N'ContosoRetailDW2.0_log' TO N'C:\Data\ContosoRetailDW.ldf',  
        NOUNLOAD,  STATS = 5;
 
alter database ContosoRetailDW
    set MULTI_USER;
GO

GO
ALTER DATABASE [ContosoRetailDW] SET COMPATIBILITY_LEVEL = 130
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0', SIZE = 2000000KB , FILEGROWTH = 128000KB )
GO
ALTER DATABASE [ContosoRetailDW] MODIFY FILE ( NAME = N'ContosoRetailDW2.0_log', SIZE = 400000KB , FILEGROWTH = 256000KB )
GO

Local Temporary Tables

Let’s create a simple local temporary table, which will be named FactOnlineSalesTemp:

CREATE TABLE #FactOnlineSalesTemp(
	[OnlineSalesKey] [int] NOT NULL,
	[DateKey] [datetime] NOT NULL,
	[StoreKey] [int] NOT NULL,
	[ProductKey] [int] NOT NULL,
	[PromotionKey] [int] NOT NULL,
	[CurrencyKey] [int] NOT NULL,
	[CustomerKey] [int] NOT NULL,
	[SalesOrderNumber] [nvarchar](20) NOT NULL,
	[SalesOrderLineNumber] [int] NULL,
	[SalesQuantity] [int] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[ReturnQuantity] [int] NOT NULL,
	[ReturnAmount] [money] NULL,
	[DiscountQuantity] [int] NULL,
	[DiscountAmount] [money] NULL,
	[TotalCost] [money] NOT NULL,
	[UnitCost] [money] NULL,
	[UnitPrice] [money] NULL,
	[ETLLoadID] [int] NULL,
	[LoadDate] [datetime] NULL,
	[UpdateDate] [datetime] NULL
);

Let’s add a clustered columnstore Index to it and then load 5 million rows from the original table FactOnlineSales:

create clustered columnstore index CCI_FactOnlineSalesTemp
    on #FactOnlineSalesTemp;

insert into #FactOnlineSalesTemp
select top 5000000 OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate
	from dbo.FactOnlineSales;

All those operations work perfectly and everything is fine – the birds are singing :)

Now let’s get to the curious details, that I have observed while working on my free CISL library – the information on the temporary object’s internals is not to be found in the columnstore dmvs.
Check out the following query that should return nothing, if you have not created any other columnstore indexes previously:

select *
	from sys.column_store_row_groups;

To be sure that the columnstore index really functions and does it well, I have executed the following query and included the actual execution plan:

select sum(SalesAMount)
	from #FactOnlineSalesTemp;

temporary table sample query execution planAs you can see on the image to the left, the actual execution plan includes a real Clustered Columnstore Index Scan with all the juicy improvements of the SQL Server 2016 such as storage level aggregation operation.
Running a rebuild operation for the Columnstore Index, trying to ensure that the information in the DMV is refreshed brought no changes whatsoever.
I am trying to find out if this is a planned feature or a small bug, and I will post an update here once I have some answers from Microsoft.

Global Temporary Tables

The global temporary tables are very useful, though from the security side can be quite dangerous implementations (unless you are making sure that only the users who should access the information are truly able to do so).
Let’s create a global temporary table with a clustered columnstore index, loading the same 5 million rows into it then:

CREATE TABLE ##FactOnlineSalesGlobalTemp(
	[OnlineSalesKey] [int] NOT NULL,
	[DateKey] [datetime] NOT NULL,
	[StoreKey] [int] NOT NULL,
	[ProductKey] [int] NOT NULL,
	[PromotionKey] [int] NOT NULL,
	[CurrencyKey] [int] NOT NULL,
	[CustomerKey] [int] NOT NULL,
	[SalesOrderNumber] [nvarchar](20) NOT NULL,
	[SalesOrderLineNumber] [int] NULL,
	[SalesQuantity] [int] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[ReturnQuantity] [int] NOT NULL,
	[ReturnAmount] [money] NULL,
	[DiscountQuantity] [int] NULL,
	[DiscountAmount] [money] NULL,
	[TotalCost] [money] NOT NULL,
	[UnitCost] [money] NULL,
	[UnitPrice] [money] NULL,
	[ETLLoadID] [int] NULL,
	[LoadDate] [datetime] NULL,
	[UpdateDate] [datetime] NULL
);

create clustered columnstore index CCI_FactOnlineSalesGlobalTemp
    on ##FactOnlineSalesGlobalTemp;

insert into ##FactOnlineSalesGlobalTemp
select top 5000000 OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate
	from dbo.FactOnlineSales;

Once again, like with the local temporary Columnstore Indexes – there is no information to be found on this matter in the sys.column_store_row_groups DMV:

select object_name(object_id), * 
	from sys.column_store_row_groups

Of course, feel free to run the following test making sure that you are really using the Columnstore Indexes, just do not forget to include the actual execution plan before executing:

select sum(SalesAMount)
	from ##FactOnlineSalesGlobalTemp;

Temporary Variables

While admittedly more problematic and absolutely less desired situation, I thought that there might be a possibility for creating Columnstore Index on the temporary variables. While temporary variables are absolutely not suited for the huge amounts of data that the Columnstore Indexes are optimised for, but still I thought – let’s make a try and test if it is possible at all:

Declare @FactOnlineSalesTemp TABLE(
	[OnlineSalesKey] [int] NOT NULL,
	[DateKey] [datetime] NOT NULL,
	[StoreKey] [int] NOT NULL,
	[ProductKey] [int] NOT NULL,
	[PromotionKey] [int] NOT NULL,
	[CurrencyKey] [int] NOT NULL,
	[CustomerKey] [int] NOT NULL,
	[SalesOrderNumber] [nvarchar](20) NOT NULL,
	[SalesOrderLineNumber] [int] NULL,
	[SalesQuantity] [int] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[ReturnQuantity] [int] NOT NULL,
	[ReturnAmount] [money] NULL,
	[DiscountQuantity] [int] NULL,
	[DiscountAmount] [money] NULL,
	[TotalCost] [money] NOT NULL,
	[UnitCost] [money] NULL,
	[UnitPrice] [money] NULL,
	[ETLLoadID] [int] NULL,
	[LoadDate] [datetime] NULL,
	[UpdateDate] [datetime] NULL,
	index CCI_FactOnlineSalesTemp clustered columnstore 
);

I am happy to share that it is impossible to create a columnstore index on the temporary variables:

Msg 35310, Level 15, State 1, Line 23
The statement failed because columnstore indexes are not allowed on table types and table variables. Remove the column store index specification from the table type or table variable declaration.

This is a really good design decision if you ask me. :)

InMemory Schema-Only Tables

Another very useful scenario for the ETL processes are the InMemory Schema-Only tables that are part of the Hekaton:

Let’add a Memory Optimized File Group and a Memory Optimized Data File for the database, in order to be able to start creation of the InMemory objects:

USE master;

ALTER DATABASE [ContosoRetailDW] 
	ADD FILEGROUP [ContosoRetailDW_Hekaton] 
	CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE [ContosoRetailDW]
	ADD FILE(NAME = ContosoRetailDW_HekatonDir, 
	FILENAME = 'C:\Data\xtp') 
	TO FILEGROUP [ContosoRetailDW_Hekaton];

Now, let’s create an in-memory table with a clustered columnstore index that will not maintain it’s data after restart, making a temporary table in the relation to data, while having a permanent schema:

CREATE TABLE [dbo].[FactOnlineSales_Hekaton](
	[OnlineSalesKey] [int] NOT NULL,
	[DateKey] [datetime] NOT NULL,
	[StoreKey] [int] NOT NULL,
	[ProductKey] [int] NOT NULL,
	[PromotionKey] [int] NOT NULL,
	[CurrencyKey] [int] NOT NULL,
	[CustomerKey] [int] NOT NULL,
	[SalesOrderNumber] [nvarchar](20) NOT NULL,
	[SalesOrderLineNumber] [int] NULL,
	[SalesQuantity] [int] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[ReturnQuantity] [int] NOT NULL,
	[ReturnAmount] [money] NULL,
	[DiscountQuantity] [int] NULL,
	[DiscountAmount] [money] NULL,
	[TotalCost] [money] NOT NULL,
	[UnitCost] [money] NULL,
	[UnitPrice] [money] NULL,
	[ETLLoadID] [int] NULL,
	[LoadDate] [datetime] NULL,
	[UpdateDate] [datetime] NULL,
	CONSTRAINT [pk_t_colstor_hk] PRIMARY KEY NONCLUSTERED HASH (OnlineSalesKey) WITH (BUCKET_COUNT = 10000000),
	INDEX NCCI_FactOnlineSales_Hekaton 
		CLUSTERED COLUMNSTORE
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

That’s not good:

Msg 35320, Level 16, State 1, Line 1
Column store indexes are not allowed on tables for which the durability option SCHEMA_ONLY is specified.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint or index. See previous errors.

Unfortunately, not even in the RC0 of SQL Server 2016 there is a support for the Schema-only InMemory tables, making it extremely probable that we shall have to wait at least until the next version after SQL Server 2016 for having this functionality enabled.

Nonclustered Columnstore Indexes

There is no major difference to the Nonclustered Columnstore Indexes, I have tested them in different versions of SQL Server versions, plus Azure SQLDatabase – but found no real differences. For those interested to play, here is the script for the Nonclustered Columnstore Index creation on the temporary table, that will work in all SQL Server versions starting with SQL Server 2012:

CREATE TABLE #FactOnlineSalesTemp(
	[OnlineSalesKey] [int] NOT NULL,
	[DateKey] [datetime] NOT NULL,
	[StoreKey] [int] NOT NULL,
	[ProductKey] [int] NOT NULL,
	[PromotionKey] [int] NOT NULL,
	[CurrencyKey] [int] NOT NULL,
	[CustomerKey] [int] NOT NULL,
	[SalesOrderNumber] [nvarchar](20) NOT NULL,
	[SalesOrderLineNumber] [int] NULL,
	[SalesQuantity] [int] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[ReturnQuantity] [int] NOT NULL,
	[ReturnAmount] [money] NULL,
	[DiscountQuantity] [int] NULL,
	[DiscountAmount] [money] NULL,
	[TotalCost] [money] NOT NULL,
	[UnitCost] [money] NULL,
	[UnitPrice] [money] NULL,
	[ETLLoadID] [int] NULL,
	[LoadDate] [datetime] NULL,
	[UpdateDate] [datetime] NULL
);

insert into #FactOnlineSalesTemp
select top 5000000 OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate
	from dbo.FactOnlineSales;

create nonclustered columnstore index CCI_FactOnlineSalesTemp
    on #FactOnlineSalesTemp (OnlineSalesKey, DateKey, StoreKey, ProductKey, PromotionKey, CurrencyKey, CustomerKey, SalesOrderNumber, SalesOrderLineNumber, SalesQuantity, SalesAmount, ReturnQuantity, ReturnAmount, DiscountQuantity, DiscountAmount, TotalCost, UnitCost, UnitPrice, ETLLoadID, LoadDate, UpdateDate);

The results for the DMV’s are the same – none, as for the Clustered Columnstore Indexes.

TempDB

The good old alternative of creating objects in TempDB directly is also viable for the Columnstore Indexes, unless of course you are trying to use the InMemory feature, which is not supported unfortunately – trying to add a memory optimised FileGroup:

ALTER DATABASE TempDB 
	ADD FILEGROUP [TempDB_Hekaton] 
	CONTAINS MEMORY_OPTIMIZED_DATA;

gives an immediate error message in both SQL Server 2014 & SQL Server 2016 (in SQL Server 2012 there were no Hekaton):

Msg 1826, Level 16, State 1, Line 3
User-defined filegroups are not allowed on “TempDB”.

Conclusion

There is a very usable support for Columnstore Indexes within the temporary objects, but they are not appearing in any of the DMV’s to be analysed or optimised. This is especially sad in the relation to the global temporary tables which are some of the more useful temporary objects.

For the future of course the support for the InMemory Optimised Schema-Only Columnstore Indexes will be very nice, but the number of tools available right now allows to achieve great speeds with ETL processes already.

to be continued with Columnstore Indexes – part 79 (“Loading Data into Non-Updatable Nonclustered Columnstore”)

2 thoughts on “Columnstore Indexes – part 78 (“Temporary Objects”)

  1. Vassilis Papadimos

    Greetings Niko — It’s not very intuitive, but since temp tables live in tempdb, you’d want to check the DMVs in tempdb. You’d also need to pass a database_id argument to use object_name on objects from a different database, like so:
    select object_name(object_id, db_id(‘tempdb’)), *
    from tempdb.sys.column_store_row_groups
    It’s a bit of a trap, people can use #t unqualified in queries, but then they do a double take when object_id(‘#t’) returns null and they realize they need to use object_id(‘tempdb..#t’) instead. In this case it would just be wrong for the DMV to show rowgroups from a different database though.

Leave a Reply to Vassilis Papadimos Cancel reply

Your email address will not be published. Required fields are marked *