Query Identification in Azure SQL DW (Synapse Analytics)

If you have been following this blog closely in the past month, you might have noticed that I have started blogging actively about the Azure Synapse Analytics (Azure SQL DW) lately – (Azure SQLDW Result Set Caching, Azure SQL DW Materialized Views (part 1) & Azure SQL DW Materialized Views (part 2)). My interest and my research are picking up as I am discovering more of the specifics.

This time I will blog about a pretty old feature (I am not sure how many years ago was it launched, but many to be sure), that I am still fascinated with and I still (like with so many feature of the Azure SQL DW) want to get on the relational SPP Microsoft Data Platform offerings (SQL Server, Azure SQL Database, Azure SQL Database Managed Instance) – query labelling. Yes, indeed you can really label a query in Azure SQL DW !

Let us consider a simple aggregation query for the AdventureWorks sample database that I installed with the new instance of the Azure Synapse Analytics (Azure SQL DW).

 
SELECT prom.[EnglishPromotionName], SUM(sales.SalesAmount)
	FROM dbo.FactInternetSales sales
	INNER JOIN dbo.DimPromotion prom
		ON sales.PromotionKey = prom.PromotionKey
	INNER JOIN dbo.DimCustomer cust
		ON sales.CustomerKey = cust.CustomerKey
	WHERE cust.BirthDate < DATEADD(YEAR,-50,GETDATE())
	GROUP BY prom.[EnglishPromotionName]
	HAVING SUM(sales.SalesAmount) > 100000

that takes a couple of seconds to process on the DW200 instance.
How should one find out about its execution progress ?
What if there are other queries executing in parallel and what if they are semantically equal to mine ?
Looking into the 2 fantastic DMVs sys.dm_pdw_exec_requests & sys.dm_pdw_request_steps one can always determine some elements and have quite a few ideas and insights, but we shall simply struggle to do the expected task.

Running the following query won’t give us a lot of useful answers on a busy Azure SQL DW instance:

 SELECT req.session_id, req.start_time, req.end_time, reqSteps.location_type, reqSteps.total_elapsed_time, reqSteps.row_count,
		reqSteps.[status], reqSteps.command, reqSteps.step_index, reqSteps.operation_type, reqSteps.request_id
	FROM sys.dm_pdw_exec_requests req
	INNER JOIN sys.dm_pdw_request_steps reqSteps
		ON req.request_id = reqSteps.request_id 
	ORDER BY req.start_time, reqSteps.step_index

Of corse we can filter out using the query text (command)

SELECT req.session_id, req.start_time, req.end_time, reqSteps.location_type, reqSteps.total_elapsed_time, reqSteps.row_count,
		reqSteps.[status], reqSteps.command, reqSteps.step_index, reqSteps.operation_type, reqSteps.request_id
	FROM sys.dm_pdw_exec_requests req
	INNER JOIN sys.dm_pdw_request_steps reqSteps
		ON req.request_id = reqSteps.request_id 
	WHERE req.command = 'SELECT prom.[EnglishPromotionName], SUM(sales.SalesAmount)
							FROM dbo.FactInternetSales sales
							INNER JOIN dbo.DimPromotion prom
								ON sales.PromotionKey = prom.PromotionKey
							INNER JOIN dbo.DimCustomer cust
								ON sales.CustomerKey = cust.CustomerKey
							WHERE cust.BirthDate < DATEADD(YEAR,-50,GETDATE())
							GROUP BY prom.[EnglishPromotionName]
							HAVING SUM(sales.SalesAmount) > 100000'
	ORDER BY req.start_time, reqSteps.step_index


, but if the command is really big or if there are some other particular nuances of the query of query filtering, we shall finish up adding more filters for the command column overloading and potentially missing some of the important results.

In Azure Synapse Analytics (Azure SQL DW) we have a tool that can help us – the query labels. Firing up the same analytical query, but this time with the OPTION (LABEL = ‘QueryLabelIdentification’) can help us with the identification of the processing. So for the test example I have simply included the format QL – [Query Pupose] where QL stands for Query Labelling:

SELECT prom.[EnglishPromotionName], SUM(sales.SalesAmount)
	FROM dbo.FactInternetSales sales
	INNER JOIN dbo.DimPromotion prom
		ON sales.PromotionKey = prom.PromotionKey
	INNER JOIN dbo.DimCustomer cust
		ON sales.CustomerKey = cust.CustomerKey
	WHERE cust.BirthDate < DATEADD(YEAR,-50,GETDATE())
	GROUP BY prom.[EnglishPromotionName]
	HAVING SUM(sales.SalesAmount) > 100000
	OPTION (LABEL = 'QL - EnglishPromotions');

This allows to rewrite our original query with filter of label within the sys.dm_pdw_exec_requests DMV, that stores those identification for each of the invoked processes:

 SELECT req.session_id, req.start_time, req.end_time, reqSteps.location_type, reqSteps.total_elapsed_time, reqSteps.row_count,
		reqSteps.[status], reqSteps.command, reqSteps.step_index, reqSteps.operation_type, reqSteps.request_id
	FROM sys.dm_pdw_exec_requests req
	INNER JOIN sys.dm_pdw_request_steps reqSteps
		ON req.request_id = reqSteps.request_id 
	WHERE req.[label] = 'QL - EnglishPromotions'
	ORDER BY req.start_time, reqSteps.step_index


On the picture above you can identify the 12 steps within the query execution with the temp tables [TEMP_ID_18],[TEMP_ID_17], & [TEMP_ID_16] creation and eventual dropping, also note that we do have a [status] column and can track in real time the status of the particular execution step.

Is not this an absolutely fantastic feature ?

Even more we can track some of the system activities within Azure SQL DW (Synapse Analytics) by filtering the output on the ‘system’ label:

SELECT req.session_id, req.start_time, req.end_time, reqSteps.location_type, reqSteps.total_elapsed_time, reqSteps.row_count,
		reqSteps.[status], reqSteps.command, reqSteps.step_index, reqSteps.operation_type, reqSteps.request_id
	FROM sys.dm_pdw_exec_requests req
	INNER JOIN sys.dm_pdw_request_steps reqSteps
		ON req.request_id = reqSteps.request_id 
	WHERE req.[label] = 'system'
	ORDER BY req.start_time, reqSteps.step_index


For this particular test instance of Azure SQL DW you can see internal workload operations (in this case Backups) that were executed in the past 9 hours in my case. If you need some proofs for the backup for the audit in your company or want to see a little bit under the cover, that’s a good place to start.

Final Thoughts

I love this feature and every time I remember it and its existence for many years in Azure SQL DW, I cringe with the lack of understanding why this feature is still not ported to SQL Server & Azure SQL Database.

I understand why it might seems that is not the best idea of bringing this query over to the regular Microsoft Data Platform – it will eventually get used for the wrong purposes for the OLTP scenarios where server/instance might get simply crashed by storing millions of queries with unnecessary information.
There are some of us who do run Data Warehouses on the those platforms you have relatively and comparatively to OLTP few queries, but which are consuming a big amount of resources and essential to be tracked, just like in Azure SQL DW.
Indeed we have a Live Query Statistics, which do not require a trace flag by default anymore in SQL Server 2019, but that’s a different kind of a beast. After query is finished, there is no way to track its execution, especially if the selected execution plan was the same as with hundreds of other queries.

The possibility of easily tracking a single ad-hoc query is absolutely amazing and we should not be filtering it by the query text within the potentially huge space inside the DMVs on big servers, but simply by invoking a single label.

This post is just the beginning of the bigger dive into Workload importance, classification and isolation that is available on Azure Synapse Analytics (Azure SQL DW).

to be certainly continued …

Leave a Reply

Your email address will not be published. Required fields are marked *