Clustered Columnstore Indexes – part 19 (“Batch Mode 2012 Limitations … Updated!”)

Continuation from the previous 18 parts, starting from https://www.nikoport.com/2013/07/05/clustered-columnstore-indexes-part-1-intro/

This post is dedicated to the limitations of the Batch Mode of the Columnstore Indexes in SQL Server 2012 and the improvements which are already present in SQL Server 2014 CTP1.
I am continuing using the VMs and the Contoso BI database with the alterations that I have done in the previous post of the series on Clustered Columnstore Indexes.

If you have seen some of the presentations or read some documents or blog posts on the NonClustered Columnstore Indexes in SQL Server 2012, you have definitely heard a good list of the limitations of the Batch Mode implementation. Basically, in SQL Server 2012 should you go just a couple of steps away from the instructions on how you should write queries, then you are back into the Row Mode, which will be painfully slow since we are working with the tables containing at least a couple of million of rows.
Well, lets kick off and check on them, if SQL Server 2014 and the Clustered Columnstore Indexes in it are fit and able to be called the next generation of the DataWarehousing solutions:

The following limitations are being considered and tested in this blog post (more on them can be found on the respective page at MSDN:

  • OUTER JOIN
  • IN & EXISTS
  • NOT IN
  • UNION ALL
  • Scalar Aggregates
  • One or more DISTINCT Aggregates

Lets consider a simple query which has a right OUTER JOIN:

select prod.ProductName, sum(sales.SalesAmount)
	from dbo.DimProduct prod
		right outer join dbo.FactOnlineSales sales
			on sales.ProductKey = prod.ProductKey
	group by prod.ProductName
	order by prod.ProductName

Here are the execution plans of both of the queries – on SQL Server 2012 and on SQL Server 2014 CTP1:
SQL Server 2012:Screen Shot 2013-08-15 at 21.33.46
SQL Server 2014 CTP1:
Screen Shot 2013-08-15 at 9.33.17 PM

Unfortunately no tool so far I have seen shows the Batch Mode of the operators (Not even the SQL Sentry Plan Explorer that I am using), but if you look carefully on the execution plan difference, you will notice that there are 4 Parallelism operators in SQL Server 2012 plan, and that there are just 2 of them for SQL Server 2014 CTP1 execution plan. The presence of the Hash Match (Left Outer Join) is another indication of the SQL Server 2014 CTP1 plan being using Batch mode extensively. On my VM’s the execution time difference was extremely big ~10 seconds vs < 1 second. This sounds exciting, but lets not stop and see other limitations and so the next one in the queue - IN & EXISTS:
Lets execute the following query:

-- In
select (select store.StoreName from dbo.DimStore store where store.StoreKey = sales.StoreKey) as 'Store', sum(sales.SalesAmount) as 'Store Sales'
	from dbo.FactOnlineSales sales
		where sales.ProductKey  in ( select prod.ProductKey from dbo.DimProduct prod )
	group by sales.StoreKey
	order by sales.StoreKey

-- Exists 
select (select store.StoreName from dbo.DimStore store where store.StoreKey = sales.StoreKey) as 'Store', sum(sales.SalesAmount) as 'Store Sales'
	from dbo.FactOnlineSales sales
		where EXISTS ( select prod.ProductKey from dbo.DimProduct prod where sales.ProductKey = prod.ProductKey )
	group by sales.StoreKey
	order by sales.StoreKey

Here are the execution plans of both of the queries since they are really equal for the SQL Server Query Optimizer:
SQL Server 2012:Screen Shot 2013-08-15 at 22.34.18
SQL Server 2014 CTP1:Screen Shot 2013-08-15 at 10.38.17 PM

It is quite clear where Batch Mode lives, since on SQL Server 2012 those queries take around 7.3 seconds on my laptop, and on a similar VM with SQL Server 2014 CTP1 it takes around 0.25 seconds. This is one more great improvement, since a lot of people are keep on using IN & EXISTS ignoring the possibilities of re-writing the queries using inner joins.

Actually a good indicator of a Batch Mode from the plan that I am seeing is the presence and the position of the Parallelism operator – should it happen before the Hash Match operator, then most probably the execution mode is a Row, if otherwise we see just a few of the Parallism operators which are located after any join operators, then we are more probably running the relevant operations in Batch Mode.

Now let’s go and consider a case with a NOT IN:
From our previous Sales Analysis query, let’s try to adapt it to see the global sales of only non-luxury products:

-- Sales without Luxury Products
select (select store.StoreName from dbo.DimStore store where store.StoreKey = sales.StoreKey) as 'Store', sum(sales.SalesAmount) as 'Non-Luxury Store Sales'
	from dbo.FactOnlineSales sales
		where sales.ProductKey  not in ( select prod.ProductKey from dbo.DimProduct prod where prod.ClassID = 3 )
	group by sales.StoreKey
	order by sales.StoreKey

Lets check the execution plans:
SQL Server 2012: Screen Shot 2013-08-15 at 23.10.57
SQL Server 2014 CTP1:Screen Shot 2013-08-15 at 11.16.43 PM

Once again the difference is as described for the previous case, Row Mode for SQL Server 2012 and 7.6 seconds execution time vs a Batch Mode execution in SQL Server 2014 CTP1 with 0.5 seconds of execution time.

Next one in line is UNION ALL:

-- Let us create a view on the totality of the purchases 
create view dbo.vPurchases as
select [SalesQuantity], [TotalCost]
	from dbo.FactOnlineSales
union all
select [SalesQuantity], [TotalCost]
	from dbo.FactSales 

-- Lets get the totals out of this view
select sum([SalesQuantity]) as TotalSales, sum([TotalCost]) as TotalCosts
	from dbo.vPurchases

The execution plan for SQL Server 2012:Screen Shot 2013-08-16 at 00.46.13
Here comes the execution plan for SQL Server 2014 CTP1:
Screen Shot 2013-08-16 at 12.52.38 AM

It is clear that one of the slowest points of the query is the scanning of the totality of the clustered index (Non-Columnstore) on the FactSales table, but the difference in the execution plans is quite clear – once again we have 3 operators in Batch Mode for the SQL Server 2014 CTP1, while everything is in Row Mode for SQL Server 2012. One more improvement, especially if you compare it to writing 3 CTEs + 1 SELECT instead :)

Scalar Aggregates
This one is very easy, it was already referred to in the previous post on the matter of the Batch Mode improvements:

-- Total count of the sales
select count(*)
	from dbo.FactOnlineSales

Here are the execution plans, I will just mention that the execution times were 1469 ms for SQL Server 2012 and 31 ms for SQL Server 2014 CTP1:
SQL Server 2012 execution plan:Screen Shot 2013-08-16 at 01.07.25

SQL Server 2014 CTP1 execution plan:
Screen Shot 2013-08-16 at 1.04.44 AM

This all looks quite clear.

Lets check then on queries with 1+ Scalar Aggregates:
For the not faint of a heart and with good Hardware, here comes a simple query to test the performance:

select count( distinct StoreKey ) as 'Distinct Stores'
	, count( distinct CustomerKey ) as 'Distinct Customers'
	, count( distinct CurrencyKey ) as 'Distinct Currencies'
	from dbo.FactOnlineSales sales
		inner join dbo.DimPromotion prom
			on sales.PromotionKey = prom.PromotionKey
	where prom.DiscountPercent = 0
	group by sales.LoadDate

On my SQL Server 2012 VM it took around 18 seconds to run this one. :)
Here comes a little monster of the execution plan:Screen Shot 2013-08-16 at 17.17.27
Well, but here comes the rain – the situation for this query is pretty much the same on SQL Server 2014 CTP1:
Screen Shot 2013-08-16 at 5.43.23 PM
With very similar execution times (17 seconds) and just small insignificant cosmetic changes within the execution plans, this is one of the situations Microsoft needs to keep on pushing to make this plan working fine in a Batch Mode.

This is a very real scenarios for a lot of reporting queries but I do remember some of the similar queries performance problems for a Row Mode in all available SQL Server version. Lets hope that something will change for the RTM, but in this case I am definitely not holding my breath.

All in all I am very excited about the changes in the upcoming version of SQL Server, I am eagerly waiting for CTP2 to see if the last one of the situations described above will suffer any changes.

to be continued with Clustered Columnstore Indexes – part 20 (“TempDB Spills – when memory is not enough”)

2 thoughts on “Clustered Columnstore Indexes – part 19 (“Batch Mode 2012 Limitations … Updated!”)

Leave a Reply

Your email address will not be published. Required fields are marked *