Workload Classification in Azure Synapse Analytics (Resource Governor)

This blog post is part of the series dedicated to Azure Synapse Analytics that can be found at my Azure Synapse Analytics page.

Continuing from the previous post in the Resource Governor in Azure Synapse Analytics – Workload Isolation in Azure Synapse Analytics (Resource Governor), let us dive into the configuration of the Workload Classifier – thus how we can map the users to the workload groups (default ones or custom ones).

Workload Classifier

As mentioned in the introduction, Workload Classifier is a mapping function that connects the Database Users with the workload groups, thus working like a kind of a good old msdb-based classifier function for the SQL Server Resource Governor.

The current syntax for creating a Workload Classifier is presented below:

CREATE WORKLOAD CLASSIFIER classifier_name  
WITH  
    (   WORKLOAD_GROUP = 'name'  
    ,   MEMBERNAME = 'security_account' 
[ [ , ] WLM_LABEL = 'label' ]  
[ [ , ] WLM_CONTEXT = 'context' ]  
[ [ , ] START_TIME = 'HH:MM' ]  
[ [ , ] END_TIME = 'HH:MM' ]    
[ [ , ] IMPORTANCE = { LOW | BELOW_NORMAL | NORMAL | ABOVE_NORMAL | HIGH }]) 
[;]

with the parameters bringing the following meanings
– WORKLOAD_GROUP – defines the name of the workload group (default (aka Resource Class – either static or dynamic) or custom ones that you create, such as the one we did in the previous post)

– MEMBERNAME – is the security account within our Azure Synapse Database that is serving as a primary or secondary identification mechanism for the workload mapping. It is not limited just to a Username, but can also be a Database User Role, Azure AD Account or Azure AD Group.

– WLM_LABEL – the label that will be used for the query identification and that I have already described in Query Identification in Azure SQL DW (Synapse Analytics). For that purpose you can use OPTION(LABEL=”) syntax. It can serve as a primary or as a secondary identification mechanism for the username and workload group mapping.

– WLM_CONTEXT – defines the session context that a request can be mapped to. It can serve as a primary or as a secondary identification mechanism for the username and workload group mapping, meaning that this parameter is optional. (Note that the session context is defined with the help of the sys.sp_set_session_context Stored Procedure). It can serve as a primary or as a secondary identification mechanism for the username and workload group mapping.

– START_TIME define the end time for mapping of a request. Specifying START_TIME requires specifying END_TIME. The time are specified in HH:MM format and the time is set in UTC timezone.
– END_TIME – define the end time for mapping of a request. Specifying START_TIME requires specifying END_TIME. The time are specified in HH:MM format and the time is set in UTC timezone.
The combination of those 2 parameters is incredible, because it will allow to specify the timeframe with so much ease and precision – it is absolutely wonderful. This task does not require a relatively senior administrator to handle. It can serve as a primary or as a secondary identification mechanism for the username and workload group mapping.

– IMPORTANCE – defines the relative priority for the request. The requests with higher priority will be pushed in front of the regular ones in the request queue. The default value is NORMAL, as in the case of the Workload Groups.

Parameter Weighting

Given that we can specify 5 different parameters (USER MEMBERNAME, ROLE MEMBERNAME, WLM_LABEL, WLM_CONTEXT, START_TIME/END_TIME) – there must be a prioritisation mechanism in order to decide which condition gets selected. This mechanism is called Parameter Weighting in Azure Synapse and it assigns the following weight to each of those parameters:
USER = 64
ROLE = 32
WLM_LABEL = 16
WLM_CONTEXT = 8
START_TIME/END_TIME = 4
meaning that if the Workload Classifier fits into the timeframe START_TIME/END_TIME, WLM_LABEL & ROLE – it will receive 52 points = 4 + 16 + 32,
while a different Workload Classifier that fits into WLM_CONTEXT & USER will get 72 points = 8 + 64, thus will prevail and will be selected over the first Workload Classifier.

DMVs for checking the Workload Classifiers

There are 2 DMVs that contain information relevant to the Workload Classification – sys.workload_management_workload_classifiers & sys.workload_management_workload_classifier_details that do not provide an excessive amount of informations about the classifiers, besides the basic definitions. They are useful in finding out programatically what has been configured, but even more useful information on what classifier is being used with what frequency is not available (hopefully just yet, while the feature is still in preview).
Let us check on what has been configured so far, by default with the following query:

SELECT cl.classifier_id, cl.name, cl.group_name, cl.importance, det.classifier_type, det.classifier_value, cl.is_enabled, cl.create_time, cl.modify_time
    FROM sys.workload_management_workload_classifiers cl
    INNER JOIN sys.workload_management_workload_classifier_details det
        ON cl.classifier_id = det.classifier_id;


We can see all our mapped by default Workload Groups and Resource Classes. According to the documentation, the values below 13 are reserved for the system purposes and I expect that as the new generation of workloads & the new generation of Synapse eventually will come, this situation might change – so do not hold on them and given that classifier_value seems to be pretty dynamic – I would not map it with 100% certainty right away to other DMVs.
Having said that, here is a query joining sys.sysusers and given you some very interesting insight on the default configuration and raisin interesting questions about the smallrc resource class that is also a default one:

SELECT cl.classifier_id, cl.name, cl.group_name, cl.importance, det.classifier_type, det.classifier_value, cl.is_enabled, us.altuid, us.gid, us.hasdbaccess, us.islogin, us.isntname, us.isntgroup, us.isntuser, us.issqluser, us.isaliased, us.issqlrole, us.isapprole
    FROM sys.workload_management_workload_classifiers cl
    INNER JOIN sys.workload_management_workload_classifier_details det
        ON cl.classifier_id = det.classifier_id
    LEFT JOIN sys.sysusers us    
        ON det.classifier_type = 'membername' and det.classifier_value = us.name;

Using Resource Class assignments with Workload Classifiers

There is one danger with mixing the Resource Class assignments and Workload Classifiers – you will be using 2 systems concurrently where the analysis of the permissions might be getting a little bit tricky.
If you are using the Stored Procedure sys.sp_addrolemember for your users while using the Workload Classifiers – you might be assigning multiple overlapping classifications, and where the MEMBERNAME (aka USER = 64 points) will be overriding any other combination of the permissions. Be mindful about it and if possible – once the Workload Classifier is Generally Available and if you are intending to use it – consider it to be the exclusive tool.

For the situations when you are troubleshooting the current permission assignments, please consider using the following query (this one is pretty much a copy with slight modification of one found in the official documentation):

SELECT  r.name as ResourceClass, m.name AS MemberName
FROM    sys.database_role_members rm
JOIN    sys.database_principals AS r ON rm.role_principal_id = r.principal_id
JOIN    sys.database_principals AS m ON rm.member_principal_id = m.principal_id
WHERE   r.name IN (SELECT cl.name FROM sys.workload_management_workload_classifiers cl WHERE cl.classifier_id <= 12);

Setup

We can start by creating the respective logins for our final users in the Azure SQL Server MASTER database:

CREATE LOGIN [ETLDeveloper] WITH PASSWORD='rx8TAtD,;sK5fGS}72BZr=e8J.YC';
CREATE LOGIN [CEO] WITH PASSWORD='|x6Rfcw]Z1^m=Jek|P75IP6daSwN';
CREATE LOGIN [CTO] WITH PASSWORD='7xw`Y1P|[8vgfHJcv8^e3JrQR]Wv';
CREATE LOGIN [CFO] WITH PASSWORD='heg5vF5A>qy]aRsoM8o2!%P6TGi%';

We need to add the respective users for each of the created logins in our Azure Synapse Analytics database:

CREATE USER [ETLDeveloper] FOR LOGIN [ETLDeveloper];
CREATE USER [CEO] FOR LOGIN [CEO];
CREATE USER [CTO] FOR LOGIN [CTO];
CREATE USER [CFO] FOR LOGIN [CFO];

Making sure our queries work, do not forget to create the workload Groups that were created in the previous blog post Workload Isolation in Azure Synapse Analytics (Resource Governor)

CREATE WORKLOAD GROUP wgETL
WITH
  ( MIN_PERCENTAGE_RESOURCE = 60                
    , CAP_PERCENTAGE_RESOURCE = 80
    , REQUEST_MIN_RESOURCE_GRANT_PERCENT = 10 
  );

CREATE WORKLOAD GROUP wgDashboard
WITH ( MIN_PERCENTAGE_RESOURCE = 40
      ,CAP_PERCENTAGE_RESOURCE = 80
      ,REQUEST_MIN_RESOURCE_GRANT_PERCENT = 2 
      ,QUERY_EXECUTION_TIMEOUT_SEC = 30 
      ,QUERY_WAIT_TIMEOUT_SEC = 30 );

Creating Workload Classifiers

Let us create the very first Workload Classifier for the ETL Developer, who will run his tasks during the night between 10 PM and 6 AM

CREATE WORKLOAD CLASSIFIER wgcETL WITH
( WORKLOAD_GROUP = 'wgETL'  
 ,MEMBERNAME = 'ETLDeveloper'
 ,WLM_LABEL = 'ql:ETLNightDebugger'
 ,IMPORTANCE = HIGH
 ,START_TIME = '22:00'
 ,END_TIME = '07:00' 
);

His parameter weight for this Workload Classifier will be 68 = 64 + 4,
and if we want to supersede this classifier with an exceptional one - let us create and define that will specify a specific label 'ql:ETLNightDebugger' as one of parameters:

CREATE WORKLOAD CLASSIFIER [wgcETLDebugger] WITH
( WORKLOAD_GROUP = 'wgETL'  
 ,MEMBERNAME = 'ETLDeveloper'
 ,WLM_LABEL = 'ql:ETLNightDebugger'
 ,IMPORTANCE = HIGH
 ,START_TIME = '22:00'
 ,END_TIME = '07:00' 
);

that will get a higher weight of 80 = 64 + 16 + 4, allowing any query with the label to be mapped against the HIGH importance - thus jumping in front of the other mapped requests that will be classified as ABOVE_NORMAL only.

Checking on the output within the DMVs we can some interesting details:

SELECT cl.classifier_id, cl.name, cl.group_name, cl.importance, det.classifier_type, det.classifier_value, cl.is_enabled, cl.create_time, cl.modify_time
    FROM sys.workload_management_workload_classifiers cl
    INNER JOIN sys.workload_management_workload_classifier_details det
        ON cl.classifier_id = det.classifier_id;


Given that classifier_type & classifier_value are NVARCHAR (128) & (255) respectively and that they are not documented at the moment of this article writing, I will assume that they are designed to be very much dynamical and that the Azure Synapse Analytics has bigger plans for them in the future - you can put a good variety of values in them, thus creating even more complex/precise workload classifications.

WLM_CONTEXT

This blog post would not be complete without WLM_CONTEXT example. Very few people in the real life yet are using this wonderful mechanism for session context identification (and I have already pointed out in SQL Server 2019: Session Context & Parallel Plans that those using SQL Server previous to SQL Server 2019 have had good reasons for mistrusting the sys.sp_set_session_context application result when using in high-performing environments.

Still yet and without affect on the parallelism you can use sys.sp_set_session_context for setting the context value per user session.
You can force them to use the smallest Resource Class or Workload Group - whatever is necessary for your Azure Synapse Analytics best performance. All you will need to do is simply to assign the inside the sys.sp_set_session_context Stored Procedure invocation the context to be equal to WLM_CONTEXT and the value to the one you are mapping in your workload classifier.

The best thing about this feature mapping is that you do not have to run the whole workload under the same circumstances, meaning that you can change your context dynamically during the batch.

Consider the example where the CTO user needs to be automatically to the highest consumption priority for just one query, while other queries should be run under the normal context and priority.
Let's create a specific workload classifier for this purpose, giving access to the ETL workload group:

CREATE WORKLOAD CLASSIFIER wcDataLoad WITH  
( WORKLOAD_GROUP = 'wgETL'
 ,MEMBERNAME     = 'CTO'
 ,WLM_CONTEXT    = 'wlmTechAnalytics' )

In our script we shall run 2 queries with normal priority and context, switch to the wgETL workload group and then switch back to our regular priority and context.

-- Invoking 2 regular queries
SELECT Username FROM dim.Users;
SELECT AVG(session_time) as AverageSessionTime from fact.BusinessRequests;

-- Activate session content for the workload classifier wlmTechAnalytics
EXEC sys.sp_set_session_context @key = 'wlm_context', @value = 'wlmTechAnalytics'

-- This query will be given resources from the wgETL Workload Group
SELECT COUNT(DISTINCT user_id) as DistinctUsers, COUNT(DISTINSTC request_id) as DistinctRequests, COUNT(*) as TotalRequest FROM fact.BusinessRequests;

-- Disable current session content
EXEC sys.sp_set_session_context @key = 'wlm_context', @value = null

-- Back to our regular queries
SELECT YEAR(year) as SalesYear, SUM(SalesAmount) as YearSalesAmount FROM fact.Sales GROUP BY YEAR(year) ORDER BY YEAR(year) DESC;

Running this workload will give us 2 queries under normal context, 1 query (distinct counts) with wgETL Workload Group and then 1 query back to our regular execution.

Final Thoughts

I love so much this classification mechanism. Just as the fact that it will make Resource Governance available not only to the senior DBAs.
The easiness of setup, without hassle of writing a t-sql function is brilliant. Of course, for this feature to be available, some certain flexibility and customisation are lost, and I hope that in some future, a custom function would make its way into classifier, because more often then not, at some points clients start asking for the complex features.
The possibility to configure the priorities, session context and the timing mechanism are absolutely awesome.
Now, tell me, Microsoft - when are those features coming over to Azure SQL Database and SQL Server ? :)

Leave a Reply

Your email address will not be published.