This blog post is a continuation of a small series of posts that I am writing about Azure SQL DW Materialized Views, and you can catch up the initial things in Azure SQL DW Materialized Views (part 1).
In this blog post I will focus on the practical aspects & specifics of the Materialized Views creation – yes, it is not exactly trivial as one might (or might not) expected it to be.
Without much further ado, let’s start with the tests:
Let’s set up a plain table with just 2 columns, one is NULLable and the other is NOT NULLable and let’s load plain 60 complete Row Groups there (60 * 1048576 rows).
DROP TABLE dbo.SrcDataTable; CREATE TABLE dbo.SrcDataTable ( C1 BIGINT NOT NULL, C2 BIGINT NULL ); GO SET NOCOUNT ON INSERT INTO dbo.SrcDataTable (C1, C2) SELECT t.RN, t.RN FROM ( SELECT TOP (60 * 1048576) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) RN FROM sys.objects t1 CROSS JOIN sys.objects t2 CROSS JOIN sys.objects t3 CROSS JOIN sys.objects t4 CROSS JOIN sys.objects t5 CROSS JOIN sys.objects t6 CROSS JOIN sys.objects t7 CROSS JOIN sys.objects t8 ) t
Let’s create a simple COUNT(*) calculation for our test view:
IF( EXISTS (SELECT V.name FROM sys.views V JOIN sys.indexes I ON V.object_id= I.object_id AND I.index_id < 2 WHERE V.Name = 'vCountTest')) DROP VIEW dbo.vCountTest ; GO CREATE MATERIALIZED VIEW dbo.vCountTest WITH (distribution = round_robin) AS SELECT COUNT(*) as CountC1 FROM dbo.SrcDataTable; GO
Updating the Materialized View
Let's see if we can edit & update the view:
ALTER MATERIALIZED VIEW dbo.vCountTest AS SELECT COUNT(C1) as CountC1ForUpdate FROM dbo.SrcDataTable; GO
This attempt results in the following error message:
Msg 103010, Level 16, State 1, Line 5 Parse error at line: 3, column: 1: Incorrect syntax near 'AS'.
Boo! Ok ... What if we update it as a view ?
ALTER VIEW dbo.vCountTest AS SELECT COUNT(C1) as CountC1ForUpdate FROM dbo.SrcDataTable;
Well .. It looks like the materialized views are not editable!
Msg 106102, Level 16, State 1, Line 4 Materialized views cannot be altered using the 'ALTER VIEW' command.
Consulting the official documentation that the only operations currently supported with the ALTER MATERIALIZED VIEW are DISABLE & REBUILD.
That does not get me overly happy ... Even though we can definitely alter the indexed views (which will force us to rebuild the indexes), but that's OK to me, when the situation with the Materialized Views is definitely not excessively fine.
We can and should use the script of dropping the view after checking that it already exists (
IF( EXISTS (SELECT V.name FROM sys.views V JOIN sys.indexes I ON V.object_id= I.object_id AND I.index_id < 2 WHERE V.Name = 'vCountTest')) DROP VIEW dbo.vCountTest ; GO CREATE MATERIALIZED VIEW dbo.vCountTest WITH (distribution = round_robin) AS SELECT COUNT(*) as CountC1 FROM dbo.SrcDataTable; GO
If only there would be some kind of a command ... such as CREATE OR ALTER in the Microsoft Data Platform ... It would potentially allow to copy the already existing code and port it to Azure SQL DW ... :)
The materialized view construction
Querying it returns our created column as expected:
SELECT * FROM vCountTest;
It might look like a waste of time to you, but there are multiple points coming, so please bare with me at the moment.
Let's modify our test Materialized View and count just the C1, as COUNT(C1) instead of COUNT(*), it should be no big thing here, right ?
IF( EXISTS (SELECT 1 FROM sys.views WHERE Name = 'vCountTest')) DROP VIEW dbo.vCountTest ; GO CREATE MATERIALIZED VIEW dbo.vCountTest WITH (distribution = round_robin) AS SELECT COUNT(C1) as CountC1 FROM dbo.SrcDataTable; GO
Let's run the test query ... and (drumroll)
SELECT * FROM vCountTest;
Wait a second, party people! Who placed the [cb] column inside my materialized view !? According to my code there should be none! Well, it turns out, if you did not notice/forget/knew - traditionally you are required to have a COUNT_BIG(*) (probably the reason for the [c]ount[b]ig name) column in your indexed views and hence we have to say a pretty big THANK YOU to Microsoft for doing it for us autoMagically in Azure Synapse Analytics (Azure SQL DW).
You might wonder what happens if we name our own column [cb] and I will tell you right away - we shall get a [cb_2] column!
This means that we should not expect the automated COUNT_BIG(*) column to have a static/reserved name (ooh, I would not want that) but rather being dynamically added to the materialized view.
Should we have added the COUNT_BIG(*) column into our query directly, it would interestingly (and potentially dangerously) move it to the first place within the result set:
IF( EXISTS (SELECT V.name FROM sys.views V JOIN sys.indexes I ON V.object_id= I.object_id AND I.index_id < 2 WHERE V.Name = 'vCountTest')) DROP VIEW dbo.vCountTest ; GO CREATE MATERIALIZED VIEW dbo.vCountTest WITH (distribution = round_robin) AS SELECT COUNT(C1) as CountC1, COUNT_BIG(*) as CountBig FROM dbo.SrcDataTable; GO
Not that I excessively care, but this is a pretty much a breaking change in my opinion. I do not consume any queries in applications without pointing to the name/position of the element and test my workloads, but it is a kind of a perk that nobody expects.
Nullable Expressions
Let's build a simple aggregation counting the sum of our columns
IF( EXISTS (SELECT V.name FROM sys.views V JOIN sys.indexes I ON V.object_id= I.object_id AND I.index_id < 2 WHERE V.Name = 'vCountTest')) DROP VIEW dbo.vCountTest ; GO CREATE MATERIALIZED VIEW dbo.vCountTest WITH (distribution = round_robin) AS SELECT SUM(C1) as SumC1, SUM(C2) as SumC2 FROM dbo.SrcDataTable; GO
Surprisingly we receive an error message over the nullable expression (Column C2 is indeed NULLable), and in this case the COUNT_BIG is explicitly required in order to make the materialized view work.
The first attempt by just simply adding a COUNT_BIG(*) to the materialized view fails with the very same error message,
which I would hope to be a little bit more explicit
CREATE MATERIALIZED VIEW dbo.vCountTest WITH (distribution = round_robin) AS SELECT SUM(C1) as SumC1, SUM(C2) as SumC2, COUNT_BIG(*) as CountBig FROM dbo.SrcDataTable; GO
What is needed in the reality is the addition of the COUNT_BIG([our nullable expression]) calculation and so here it, working totally fine:
CREATE MATERIALIZED VIEW dbo.vCountTest WITH (distribution = round_robin) AS SELECT SUM(C1) as SumC1, SUM(C2) as SumC2, COUNT_BIG(C2) as CountBigC2 FROM dbo.SrcDataTable; GO
The total COUNT_BIG(*) will be added automatically with our already good friend [cb] column :)
SELECT * FROM vCountTest;
This time we have 4 columns as already expected. I guess that while addition of the COUNT_BIG(*) is pretty easy, the addition of the missing COUNT_BIG([Nullable Expression]) is not the easiest or most important tasks, but it would be a huge delighter.
The AVG
Let's try out more functions to see their advantages and limitations and in this case take an AVG calculations for a ride. Remember that running AVG calculation directly in an indexed view is not supported, with the original indexed view error message appearing at the CLUSTERED UNIQUE INDEX creation, such as this:
Msg 10125, Level 16, State 1, Line 42 Cannot create index on view "ContosoRetailDW.dbo.vMaxTest" because it uses aggregate "AVG". Consider eliminating the aggregate, not indexing the view, or using alternate aggregates. For example, for AVG substitute SUM and COUNT_BIG, or for COUNT, substitute COUNT_BIG.
Let us verify how it is being handled in the Azure SQL DW (Azure Synapse Analytics) Materialized Views:
IF( EXISTS (SELECT V.name FROM sys.views V JOIN sys.indexes I ON V.object_id= I.object_id AND I.index_id < 2 WHERE V.Name = 'vCountTest')) DROP VIEW dbo.vCountTest ; GO CREATE MATERIALIZED VIEW dbo.vCountTest WITH (distribution = round_robin) AS SELECT AVG(C1) as AvgC1, AVG(C2) as AvgC2 FROM dbo.SrcDataTable; GO
This one works fine ... but is it ?
SELECT * FROM vCountTest;
What the QuestionMark? Ok, I do get that the COUNT_BIG(*) needs to be added, and I also kind of understand that SUM([expression]) needs to be added for each of the columns, but what I DEFINITELY do not understand is why those results are presented in such form: the original name of the column is modified (How dare!) and instead of the Average I get a sum on that value (how ridiculous!). I know that from those results I can calculate the average, but when I give to the engine a function of AVG it has no right to alter the calculations nor rename the column - otherwise why am I giving the instructions !!!
I get the same result in the end - but this is not the functionality I want to use ... like ever!
STDEV & STDEVP
Well, you might have expected this blog post to have only happy things ... and I can give you happy news ! Besides that current documentation does not list STDEVP function as supported, you can use the listed STDEV function. Apparently. Let's see :)
IF( EXISTS (SELECT V.name FROM sys.views V JOIN sys.indexes I ON V.object_id= I.object_id AND I.index_id < 2 WHERE V.Name = 'vCountTest')) DROP VIEW dbo.vCountTest ; GO CREATE MATERIALIZED VIEW dbo.vCountTest WITH (distribution = round_robin) AS SELECT STDEV(C1) as AvgC1 GO
The documentation indicates that the necessary adjustments (adding SUM(a), COUNT_BIG(a), SUM(square(a)) where [a] is our expression),
but all I can see is the error message, that is not very clear to me - our C1 column is precise and not computed.
Msg 1901, Level 16, State 1, Line 8 Cannot create index or statistics 'vCountTest' on view 'dbo.vCountTest' because key column 'AvgC1_sum' is imprecise, computed and not persisted. Consider removing reference to column in view index or statistics key or changing column to be precise. If column is computed in base table consider marking it PERSISTED there.
To get a workaround just add SUM(C1) to the list of your calculations and voila:
IF( EXISTS (SELECT V.name FROM sys.views V JOIN sys.indexes I ON V.object_id= I.object_id AND I.index_id < 2 WHERE V.Name = 'vCountTest')) DROP VIEW dbo.vCountTest ; GO CREATE MATERIALIZED VIEW dbo.vCountTest WITH (distribution = round_robin) AS SELECT STDEV(C1) as AvgC1, SUM(C1) as SumC1 GO
Now it works without any problem ... or does it ?
SELECT * FROM vCountTest;
Right now, I am not in the mood of decoding the output:
I confess that I find it extremely confusing and unfriendly. Like before where I had additional columns obliging me to finalize the calculations, forcing me effectively to create an additional layer with a view that will do the necessary work. That sounds disappointing.
What I am looking for is as simple / (and I admit that it might be hot grill difficult on MPP) as this:
SELECT STDEV(C1) as StDevC1 FROM dbo.SrcDataTable;
No hassle, no recalculations - no extra views, but simple raw processing of the information. This is what I expect from any feature within Microsoft Data Platform - and I know that given the last developments it might be a kind of wishful thinking, but I tend to think & expect higher values.
The usual suspects
The usual for the indexed views, are they still around for the Materialized Views?
Outer Joins
CREATE MATERIALIZED VIEW dbo.vCountTest WITH (distribution = round_robin) AS SELECT COUNT(C1) as StDevC1 FROM dbo.SrcDataTable src LEFT OUTER JOIN dbo.DimCustomer cust ON src.C1 = cust.CustomerKey ;
Msg 106110, Level 16, State 1, Line 8 Cannot create materialized view because it uses a LEFT, RIGHT, or FULL OUTER join.
Ok, no outer joins for materialized views.
Self Joins
CREATE MATERIALIZED VIEW dbo.vNewFuncs WITH (distribution = round_robin) AS SELECT COUNT(src.C1) as StDevC1 FROM dbo.SrcDataTable src INNER JOIN dbo.SrcDataTable src2 ON src.C1 > src2.c2 * 2 ;
Msg 1947, Level 16, State 1, Line 10 Cannot create index on view "resultsetcaching.dbo.vNewFuncs". The view contains a self join on "resultsetcaching.dbo.SrcDataTable".
Distinct & Apply & Top & Offset & Union & Intersect & Except & Having
CREATE MATERIALIZED VIEW dbo.vNewFuncs WITH (distribution = round_robin) AS SELECT DISTINCT src.C1 as TestResult FROM dbo.SrcDataTable src
Msg 106108, Level 16, State 1, Line 10 Cannot create materialized view because it contains an invalid keyword among DISTINCT, TOP, OFFSET, TABLESAMPLE, APPLY, PIVOT, UNPIVOT, UNION, INTERSECT, EXCEPT, CUBE, ROLLUP, GROUPING SETS, and HAVING.
Well, this is pretty explicit, I would say... :(
Aggregate Window Functions
CREATE MATERIALIZED VIEW dbo.vNewFuncs WITH (distribution = round_robin) AS SELECT c2%10 as L1, SUM(src.C1) OVER (PARTITION BY c2%10) as TestResult FROM dbo.SrcDataTable src GROUP BY c2%10, c1
Msg 106113, Level 16, State 1, Line 10 Cannot create materialized view because it references a ranking or aggregate window function.
Not that I would expect it from the v1 implementation, but given the usefulness of the window functions ...
Subqueries
CREATE MATERIALIZED VIEW dbo.vNewFuncs WITH (distribution = round_robin) AS SELECT SUM(src.C1) as TestResult FROM dbo.SrcDataTable src WHERE c1 NOT IN (SELECT MAX(CustomerKey) as MaxCustomerKey FROM dbo.DimCustomer )
Msg 106109, Level 16, State 1, Line 10 Cannot create materialized view because it references subqueries.
...
Current Thoughts
There is no doubt, that the Materialized Views on Azure SQL DW are more advanced than the Indexed Views on Azure SQL DB (Min,Max,Avg,Nullable Expressions,...) but they are very very far from what I was hoping them to be. Hopefully there will be progress and update for this feature, but it will be back to the chicken & egg (use first and get improvements later vs get feature right and then we shall use it).
I love the part of the materialized views recommendation - with some care and brains the developers/DBAs can get to some great results without much hustle. I would just be careful about implementing them right away. :)
I have no doubt that pre-aggregating some data and allowing to search & filter it will kick some serious butts even out of the Result Set Caching feature in some scenarios (diverse non-repetitive ad-hoc queries, for example), but how I wish that this feature would be more ...
Disclaimer: I am not attempting to bash the amazing people working/worked on Azure SQL DW, I am exploring & testing the features. I wish there would be more people (PMs & Devs) working on them, so the features would be more complete ...
to be continued ?
Buenas tardes,
Me gustarÃa saber si me podrÃan regalar un alternativa para solucionar este error
Msg 106108, Level 16, State 1, Line 10
Cannot create materialized view because it contains an invalid keyword among
DISTINCT, TOP, OFFSET, TABLESAMPLE, APPLY, PIVOT, UNPIVOT, UNION, INTERSECT, EXCEPT, CUBE, ROLLUP, GROUPING SETS, and HAVING.
Translation to English:
Can you give me an alternative to resolve this error
Msg 106108, Level 16, State 1, Line 10
Cannot create materialized view because it contains an invalid keyword among
DISTINCT, TOP, OFFSET, TABLESAMPLE, APPLY, PIVOT, UNPIVOT, UNION, INTERSECT, EXCEPT, CUBE, ROLLUP, GROUPING SETS, and HAVING.
I would suggest you post the query would be helpful
Traduccion al español:
Yo recomendarÃa que incluyeras to código de SQL.
Thanks for this article. It helped me out of a bind!
You are very welcome, Michael.