Azure SQL DW Materialized Views (part 2)

This blog post is a continuation of a small series of posts that I am writing about Azure SQL DW Materialized Views, and you can catch up the initial things in Azure SQL DW Materialized Views (part 1).

In this blog post I will focus on the practical aspects & specifics of the Materialized Views creation – yes, it is not exactly trivial as one might (or might not) expected it to be.
Without much further ado, let’s start with the tests:

Let’s set up a plain table with just 2 columns, one is NULLable and the other is NOT NULLable and let’s load plain 60 complete Row Groups there (60 * 1048576 rows).

DROP TABLE dbo.SrcDataTable;

CREATE TABLE dbo.SrcDataTable (
    C1 BIGINT NOT NULL,
	C2 BIGINT NULL
);
GO

SET NOCOUNT ON

INSERT INTO dbo.SrcDataTable (C1, C2)
SELECT t.RN, t.RN
	FROM
	(
		SELECT TOP (60 * 1048576) ROW_NUMBER()
			OVER (ORDER BY (SELECT NULL)) RN
		FROM sys.objects t1
		CROSS JOIN sys.objects  t2
		CROSS JOIN sys.objects  t3
		CROSS JOIN sys.objects  t4
		CROSS JOIN sys.objects  t5    
        CROSS JOIN sys.objects  t6   
        CROSS JOIN sys.objects  t7   
        CROSS JOIN sys.objects  t8   
	) t

Let’s create a simple COUNT(*) calculation for our test view:

IF( EXISTS (SELECT V.name
	FROM sys.views V 
	JOIN sys.indexes I 
		ON V.object_id= I.object_id AND I.index_id < 2
        WHERE V.Name = 'vCountTest')) 
	DROP VIEW dbo.vCountTest ;
GO

CREATE MATERIALIZED VIEW dbo.vCountTest  
WITH (distribution = round_robin)  
AS
SELECT COUNT(*) as CountC1
	FROM dbo.SrcDataTable;
GO

Updating the Materialized View

Let's see if we can edit & update the view:

ALTER MATERIALIZED VIEW dbo.vCountTest   
AS
SELECT COUNT(C1) as CountC1ForUpdate
	FROM dbo.SrcDataTable;
GO

This attempt results in the following error message:

Msg 103010, Level 16, State 1, Line 5
Parse error at line: 3, column: 1: Incorrect syntax near 'AS'.

Boo! Ok ... What if we update it as a view ?

ALTER VIEW dbo.vCountTest   
AS
SELECT COUNT(C1) as CountC1ForUpdate
	FROM dbo.SrcDataTable;

Well .. It looks like the materialized views are not editable!

Msg 106102, Level 16, State 1, Line 4
Materialized views cannot be altered using the 'ALTER VIEW' command.

Consulting the official documentation that the only operations currently supported with the ALTER MATERIALIZED VIEW are DISABLE & REBUILD.
That does not get me overly happy ... Even though we can definitely alter the indexed views (which will force us to rebuild the indexes), but that's OK to me, when the situation with the Materialized Views is definitely not excessively fine.
We can and should use the script of dropping the view after checking that it already exists (

IF( EXISTS (SELECT V.name
	FROM sys.views V 
	JOIN sys.indexes I 
		ON V.object_id= I.object_id AND I.index_id < 2
        WHERE V.Name = 'vCountTest')) 
	DROP VIEW dbo.vCountTest ;
GO

CREATE MATERIALIZED VIEW dbo.vCountTest  
WITH (distribution = round_robin)  
AS
SELECT COUNT(*) as CountC1
	FROM dbo.SrcDataTable;
GO

If only there would be some kind of a command ... such as CREATE OR ALTER in the Microsoft Data Platform ... It would potentially allow to copy the already existing code and port it to Azure SQL DW ... :)

The materialized view construction

Querying it returns our created column as expected:

SELECT *
	FROM vCountTest;

It might look like a waste of time to you, but there are multiple points coming, so please bare with me at the moment.

Let's modify our test Materialized View and count just the C1, as COUNT(C1) instead of COUNT(*), it should be no big thing here, right ?

IF( EXISTS (SELECT 1 FROM sys.views WHERE Name = 'vCountTest')) 
	DROP VIEW dbo.vCountTest ;
GO

CREATE MATERIALIZED VIEW dbo.vCountTest  
WITH (distribution = round_robin)  
AS
SELECT COUNT(C1) as CountC1
	FROM dbo.SrcDataTable;
GO

Let's run the test query ... and (drumroll)

SELECT *
	FROM vCountTest;

Wait a second, party people! Who placed the [cb] column inside my materialized view !? According to my code there should be none! Well, it turns out, if you did not notice/forget/knew - traditionally you are required to have a COUNT_BIG(*) (probably the reason for the [c]ount[b]ig name) column in your indexed views and hence we have to say a pretty big THANK YOU to Microsoft for doing it for us autoMagically in Azure Synapse Analytics (Azure SQL DW).
You might wonder what happens if we name our own column [cb] and I will tell you right away - we shall get a [cb_2] column!
This means that we should not expect the automated COUNT_BIG(*) column to have a static/reserved name (ooh, I would not want that) but rather being dynamically added to the materialized view.
Should we have added the COUNT_BIG(*) column into our query directly, it would interestingly (and potentially dangerously) move it to the first place within the result set:

IF( EXISTS (SELECT V.name
	FROM sys.views V 
	JOIN sys.indexes I 
		ON V.object_id= I.object_id AND I.index_id < 2
        WHERE V.Name = 'vCountTest')) 
	DROP VIEW dbo.vCountTest ;
GO

CREATE MATERIALIZED VIEW dbo.vCountTest  
WITH (distribution = round_robin)  
AS
SELECT COUNT(C1) as CountC1, COUNT_BIG(*) as CountBig
	FROM dbo.SrcDataTable;
GO

Not that I excessively care, but this is a pretty much a breaking change in my opinion. I do not consume any queries in applications without pointing to the name/position of the element and test my workloads, but it is a kind of a perk that nobody expects.

Nullable Expressions

Let's build a simple aggregation counting the sum of our columns

IF( EXISTS (SELECT V.name
	FROM sys.views V 
	JOIN sys.indexes I 
		ON V.object_id= I.object_id AND I.index_id < 2
        WHERE V.Name = 'vCountTest')) 
	DROP VIEW dbo.vCountTest ;
GO

CREATE MATERIALIZED VIEW dbo.vCountTest  
WITH (distribution = round_robin)  
AS
SELECT SUM(C1) as SumC1, SUM(C2) as SumC2
	FROM dbo.SrcDataTable;
GO

Surprisingly we receive an error message over the nullable expression (Column C2 is indeed NULLable), and in this case the COUNT_BIG is explicitly required in order to make the materialized view work.

The first attempt by just simply adding a COUNT_BIG(*) to the materialized view fails with the very same error message,
which I would hope to be a little bit more explicit

CREATE MATERIALIZED VIEW dbo.vCountTest  
WITH (distribution = round_robin)  
AS
SELECT SUM(C1) as SumC1, SUM(C2) as SumC2, COUNT_BIG(*) as CountBig
	FROM dbo.SrcDataTable;
GO

What is needed in the reality is the addition of the COUNT_BIG([our nullable expression]) calculation and so here it, working totally fine:

CREATE MATERIALIZED VIEW dbo.vCountTest  
WITH (distribution = round_robin)  
AS
SELECT SUM(C1) as SumC1, SUM(C2) as SumC2, COUNT_BIG(C2) as CountBigC2
	FROM dbo.SrcDataTable;
GO

The total COUNT_BIG(*) will be added automatically with our already good friend [cb] column :)

SELECT *
	FROM vCountTest;

This time we have 4 columns as already expected. I guess that while addition of the COUNT_BIG(*) is pretty easy, the addition of the missing COUNT_BIG([Nullable Expression]) is not the easiest or most important tasks, but it would be a huge delighter.

The AVG

Let's try out more functions to see their advantages and limitations and in this case take an AVG calculations for a ride. Remember that running AVG calculation directly in an indexed view is not supported, with the original indexed view error message appearing at the CLUSTERED UNIQUE INDEX creation, such as this:

Msg 10125, Level 16, State 1, Line 42
Cannot create index on view "ContosoRetailDW.dbo.vMaxTest" because it uses aggregate "AVG". 
Consider eliminating the aggregate, not indexing the view, or using alternate aggregates. For example, for AVG substitute SUM and COUNT_BIG, or for COUNT, substitute COUNT_BIG.

Let us verify how it is being handled in the Azure SQL DW (Azure Synapse Analytics) Materialized Views:

IF( EXISTS (SELECT V.name
	FROM sys.views V 
	JOIN sys.indexes I 
		ON V.object_id= I.object_id AND I.index_id < 2
        WHERE V.Name = 'vCountTest')) 
	DROP VIEW dbo.vCountTest ;
GO

CREATE MATERIALIZED VIEW dbo.vCountTest  
WITH (distribution = round_robin)  
AS
SELECT AVG(C1) as AvgC1, AVG(C2) as AvgC2
	FROM dbo.SrcDataTable;
GO

This one works fine ... but is it ?

SELECT *
	FROM vCountTest;


What the QuestionMark? Ok, I do get that the COUNT_BIG(*) needs to be added, and I also kind of understand that SUM([expression]) needs to be added for each of the columns, but what I DEFINITELY do not understand is why those results are presented in such form: the original name of the column is modified (How dare!) and instead of the Average I get a sum on that value (how ridiculous!). I know that from those results I can calculate the average, but when I give to the engine a function of AVG it has no right to alter the calculations nor rename the column - otherwise why am I giving the instructions !!!

I get the same result in the end - but this is not the functionality I want to use ... like ever!

STDEV & STDEVP

Well, you might have expected this blog post to have only happy things ... and I can give you happy news ! Besides that current documentation does not list STDEVP function as supported, you can use the listed STDEV function. Apparently. Let's see :)

IF( EXISTS (SELECT V.name
	FROM sys.views V 
	JOIN sys.indexes I 
		ON V.object_id= I.object_id AND I.index_id < 2
        WHERE V.Name = 'vCountTest')) 
	DROP VIEW dbo.vCountTest ;
GO

CREATE MATERIALIZED VIEW dbo.vCountTest  
WITH (distribution = round_robin)  
AS
SELECT STDEV(C1) as AvgC1
GO

The documentation indicates that the necessary adjustments (adding SUM(a), COUNT_BIG(a), SUM(square(a)) where [a] is our expression),
but all I can see is the error message, that is not very clear to me - our C1 column is precise and not computed.

Msg 1901, Level 16, State 1, Line 8
Cannot create index or statistics 'vCountTest' on view 'dbo.vCountTest' because key column 'AvgC1_sum' is imprecise, computed and not persisted. 
Consider removing reference to column in view index or statistics key or changing column to be precise. If column is computed in base table consider marking it PERSISTED there.

To get a workaround just add SUM(C1) to the list of your calculations and voila:

IF( EXISTS (SELECT V.name
	FROM sys.views V 
	JOIN sys.indexes I 
		ON V.object_id= I.object_id AND I.index_id < 2
        WHERE V.Name = 'vCountTest')) 
	DROP VIEW dbo.vCountTest ;
GO

CREATE MATERIALIZED VIEW dbo.vCountTest  
WITH (distribution = round_robin)  
AS
SELECT STDEV(C1) as AvgC1, SUM(C1) as SumC1
GO

Now it works without any problem ... or does it ?

SELECT *
	FROM vCountTest;

Right now, I am not in the mood of decoding the output:
I confess that I find it extremely confusing and unfriendly. Like before where I had additional columns obliging me to finalize the calculations, forcing me effectively to create an additional layer with a view that will do the necessary work. That sounds disappointing.

What I am looking for is as simple / (and I admit that it might be hot grill difficult on MPP) as this:

SELECT STDEV(C1) as StDevC1
	FROM dbo.SrcDataTable;

No hassle, no recalculations - no extra views, but simple raw processing of the information. This is what I expect from any feature within Microsoft Data Platform - and I know that given the last developments it might be a kind of wishful thinking, but I tend to think & expect higher values.

The usual suspects

The usual for the indexed views, are they still around for the Materialized Views?

Outer Joins
CREATE MATERIALIZED VIEW dbo.vCountTest  
WITH (distribution = round_robin)  
AS
SELECT COUNT(C1) as StDevC1
	FROM dbo.SrcDataTable src
	LEFT OUTER JOIN dbo.DimCustomer cust
		ON src.C1 = cust.CustomerKey ;
Msg 106110, Level 16, State 1, Line 8
Cannot create materialized view because it uses a LEFT, RIGHT, or FULL OUTER join.

Ok, no outer joins for materialized views.

Self Joins
CREATE MATERIALIZED VIEW dbo.vNewFuncs  
WITH (distribution = round_robin)  
AS
SELECT COUNT(src.C1) as StDevC1
	FROM dbo.SrcDataTable src
	INNER JOIN dbo.SrcDataTable src2
		ON src.C1 > src2.c2 * 2 ;
Msg 1947, Level 16, State 1, Line 10
Cannot create index on view "resultsetcaching.dbo.vNewFuncs". The view contains a self join on "resultsetcaching.dbo.SrcDataTable".
Distinct & Apply & Top & Offset & Union & Intersect & Except & Having
CREATE MATERIALIZED VIEW dbo.vNewFuncs  
WITH (distribution = round_robin)  
AS
SELECT DISTINCT src.C1 as TestResult
	FROM dbo.SrcDataTable src
Msg 106108, Level 16, State 1, Line 10
Cannot create materialized view because it contains an invalid keyword among 
DISTINCT, TOP, OFFSET, TABLESAMPLE, APPLY, PIVOT, UNPIVOT, UNION, INTERSECT, EXCEPT, CUBE, ROLLUP, GROUPING SETS, and HAVING.

Well, this is pretty explicit, I would say... :(

Aggregate Window Functions
CREATE MATERIALIZED VIEW dbo.vNewFuncs  
WITH (distribution = round_robin)  
AS
SELECT c2%10 as L1, SUM(src.C1) OVER (PARTITION BY c2%10) as TestResult
	FROM dbo.SrcDataTable src
	GROUP BY c2%10, c1
Msg 106113, Level 16, State 1, Line 10
Cannot create materialized view because it references a ranking or aggregate window function.

Not that I would expect it from the v1 implementation, but given the usefulness of the window functions ...

Subqueries
CREATE MATERIALIZED VIEW dbo.vNewFuncs  
WITH (distribution = round_robin)  
AS
SELECT SUM(src.C1) as TestResult
	FROM dbo.SrcDataTable src
	WHERE c1 NOT IN (SELECT MAX(CustomerKey) as MaxCustomerKey FROM dbo.DimCustomer ) 
Msg 106109, Level 16, State 1, Line 10
Cannot create materialized view because it references subqueries.

...

Current Thoughts

There is no doubt, that the Materialized Views on Azure SQL DW are more advanced than the Indexed Views on Azure SQL DB (Min,Max,Avg,Nullable Expressions,...) but they are very very far from what I was hoping them to be. Hopefully there will be progress and update for this feature, but it will be back to the chicken & egg (use first and get improvements later vs get feature right and then we shall use it).
I love the part of the materialized views recommendation - with some care and brains the developers/DBAs can get to some great results without much hustle. I would just be careful about implementing them right away. :)
I have no doubt that pre-aggregating some data and allowing to search & filter it will kick some serious butts even out of the Result Set Caching feature in some scenarios (diverse non-repetitive ad-hoc queries, for example), but how I wish that this feature would be more ...

Disclaimer: I am not attempting to bash the amazing people working/worked on Azure SQL DW, I am exploring & testing the features. I wish there would be more people (PMs & Devs) working on them, so the features would be more complete ...

to be continued ?

One thought on “Azure SQL DW Materialized Views (part 2)

  1. James Martinez

    Buenas tardes,

    Me gustaría saber si me podrían regalar un alternativa para solucionar este error

    Msg 106108, Level 16, State 1, Line 10
    Cannot create materialized view because it contains an invalid keyword among
    DISTINCT, TOP, OFFSET, TABLESAMPLE, APPLY, PIVOT, UNPIVOT, UNION, INTERSECT, EXCEPT, CUBE, ROLLUP, GROUPING SETS, and HAVING.

Leave a Reply

Your email address will not be published. Required fields are marked *