Resource Governor IOPS_PER_VOLUME is not enough

Microsoft originally introduced Resource Governor in the first decade of the 21st century – with the release of SQL Server 2008, to be more precise – and a number of changes have been made to the original engine ever since. Looking back, I can't help noticing how limited the original release was, with the more significant Resource Pool properties appearing only in later versions: CAP_CPU_PERCENT (probably my favourite one, the hammer to nail down a lot of problems), which appeared in SQL Server 2012; the probably least used AFFINITY (which also appeared in SQL Server 2012); and then, finally, in SQL Server 2014 we received the first version of IO control in the Resource Pools – the MIN_IOPS_PER_VOLUME & MAX_IOPS_PER_VOLUME properties, which allow us to control the number of IOPS per volume (and if you read that last part carefully, you already have the idea/TLDR of where I am going with this blog post).
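For reference, this is what those SQL Server 2014 IO controls look like in practice (the pool name and the numbers are just an illustration):

-- A Resource Pool limiting its sessions to at most 500 IOPS on every volume they touch
CREATE RESOURCE POOL [LimitedIOPool]
WITH ( MIN_IOPS_PER_VOLUME = 0,
       MAX_IOPS_PER_VOLUME = 500 );

ALTER RESOURCE GOVERNOR RECONFIGURE;

Notice that both properties speak in IOPS per volume – and in nothing else.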

Imagine that you have an Azure VM with SQL Server with the following volumes:
– Four 2 TB drives (P40) striped into one single volume, which will give you roughly 7,500 × 4 = 30,000 IOPS and 250 MB/s × 4 = 1,000 MB/s maximum throughput.
– One 2 TB drive (P40) as another single volume, which will give you roughly 7,500 IOPS and 250 MB/s maximum throughput.
– Another 2 TB drive (P40) for TempDB, as yet another single volume, which will give you roughly 7,500 IOPS and 250 MB/s maximum throughput.
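As a side note, if you want to double-check which volumes your SQL Server files actually live on, a quick sketch like this one will do:

-- List the volumes behind the database files, with their total sizes in GB
SELECT DISTINCT vs.volume_mount_point,
       vs.logical_volume_name,
       vs.total_bytes / 1024 / 1024 / 1024 AS size_gb
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs;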

Ignoring the VM itself & its limits (which is such a huge topic in reality, and I will hopefully get back to it in the coming weeks), let us just think about what we are trying to limit.
Imagine that we are dealing with DATA – we shall mostly be hitting the 4 × 2 TB volume with all possible block sizes of operations.
If we are parsing the log for the Full or Differential Backups (or, let us say, with CDC in effect, making log reading pretty much constant) – we shall be hammering over a TB (and sometimes a couple of TB – yes, that means multiple log files, because the absolute limit for a single log file is 2 TB).
If we are facing a big TempDB workload, we shall be hitting a different drive (the last one), with different impacts at different times.
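If you want to see for yourself how much IO each of those drives really takes, the numbers (cumulative since the last restart) are sitting in sys.dm_io_virtual_file_stats:

-- Reads & writes per database file, in MB, since the last SQL Server restart
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.num_of_bytes_read / 1048576 AS mb_read,
       vfs.num_of_bytes_written / 1048576 AS mb_written
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf
        ON mf.database_id = vfs.database_id
       AND mf.file_id = vfs.file_id
ORDER BY vfs.num_of_bytes_read + vfs.num_of_bytes_written DESC;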

IOPS?

As you might have heard, the block size of SQL Server operations can be quite different, depending on the operation (here is a link to a small but very concise view from Argenis Fernandez – What is SQL Server's IO Block Size?), and as Argenis has pointed out, consider consulting the ever-deep and marvellous presentation by Bob Ward from the PASS Summit 2014 – Inside SQL Server IO.

The range of the block sizes goes from 512 bytes right up to 8 MB (yeah, Columnstore, I am looking at you, for example), so how on planet Earth are we going to calculate the exact impact on our drives?
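Just to make the problem visible, here is the back-of-the-envelope arithmetic – the very same 7,500 IOPS translate into wildly different throughput, depending on the block size of the operation:

-- 7,500 IOPS at different block sizes: from under 4 MB/s to tens of GB/s
SELECT io.block_size_kb,
       CAST(7500 * io.block_size_kb / 1024.0 AS DECIMAL(12,2)) AS mb_per_sec
FROM (VALUES (0.5), (8.0), (64.0), (1024.0), (8192.0)) AS io(block_size_kb);

At 8 KB pages, 7,500 IOPS mean under 60 MB/s; at 8 MB Columnstore reads, the same 7,500 IOPS would theoretically mean around 60 GB/s – three orders of magnitude apart.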

How can I throttle my disk throughput if the combined throughput of my disks is above my VM's capacity, and I am constantly getting those "Cluster Shakes" (TM) that force failover attempts every single time the TOTAL size of the IO goes over the limit?

The actual limits that we are hitting are the MB/second speed limits of the Virtual Machine, and that makes total sense, since the general focus of the Azure VMs & Disks is not on SQL Server, but rather on general applications.

Why are we limiting IOPS when the real threat for the unwanted throttling is the MB/s?

I have yet to see a real-world SQL Server workload that hits the IOPS limitation of the Azure drives and not the MB/second limitation. You must be running some extremely small OLTP kind of buffer-pool-extension-please-avoid-it-at-all-costs (R) solution, reading those perky 8 KB pages.

My Backup operation does not do reads in 8 KB, and neither does yours, and facing the cancellation of your multi-hour backup to URL does not sound like a fun adventure, trust me on that. Note: the backup does its operations with a maximum of 1 MB (16 extents) by default, but you can control the block size with the different parameters of the BACKUP command (MAXTRANSFERSIZE, BLOCKSIZE, …), and in the case of the Backup to URL you will need to set them to MAXTRANSFERSIZE = 4194304 & BLOCKSIZE = 65536, unless you are looking to hit the limit on the single BLOB file or the stripe size for the bigger databases, as I have mentioned in the blog post Striping Backups to Azure Blob Storage. :)
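For illustration, this is what that looks like in practice (the database name and the storage account URL are placeholders):

-- Backup to URL with the maximum allowed transfer size and block size
BACKUP DATABASE [MyDatabase]
TO URL = N'https://mystorageaccount.blob.core.windows.net/backups/MyDatabase.bak'
WITH MAXTRANSFERSIZE = 4194304,
     BLOCKSIZE = 65536,
     COMPRESSION, CHECKSUM;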

Which operations are you trying to throttle?
Reading (which reads)? Writing? Backups?

How many Resource Pools should you define, and how can you guarantee that there is no overlap between the operations?
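The only instrument we have for that separation is the classifier function, and even a simple sketch like this one (the group names and the login are hypothetical) shows how fragile that mapping really is – everything hangs on logins and application names:

-- Created in master; routes each new session into a Workload Group (and thus a Pool)
CREATE FUNCTION dbo.rg_classifier()
RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    IF SUSER_SNAME() = N'backup_service'   -- hypothetical backup login
        RETURN N'BackupGroup';
    IF APP_NAME() LIKE N'%Reporting%'      -- hypothetical reporting application
        RETURN N'ReportingGroup';
    RETURN N'default';
END;
GO

ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;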

We need a CAP, not a MAX

Just like in the case of the CPU, we need a way to distribute the total disk throughput between the different Resource Pools, guaranteeing that one or multiple pools will not go over the maximum defined capacity.
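On the CPU side we have had exactly that since SQL Server 2012:

-- MAX_CPU_PERCENT is a soft limit (enforced only under contention),
-- while CAP_CPU_PERCENT is a hard cap that is always enforced
ALTER RESOURCE POOL [ReportingPool]   -- hypothetical pool name
WITH ( MAX_CPU_PERCENT = 50,
       CAP_CPU_PERCENT = 50 );

ALTER RESOURCE GOVERNOR RECONFIGURE;

This is the kind of hard guarantee we need for the IO as well – only expressed in MB/s, not in IOPS.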

It took Microsoft 4 years to introduce CAP_CPU_PERCENT (SQL Server 2012), so now it is definitely the time to give us a tool for capping the IO.

Per Volume?

What if we add a new 2 TB Standard HDD disk (S40) for the archival data which is rarely accessed?
How can we calculate this factor? How should we limit the access to the IO? Impossible.

Imagine we have a maximum cap of 256 MB/second on our VM – how would you like to distribute the IO between the Data, the Archive Data, the Log and the TempDB?
If you define it at 250 MB/s (right now, with IOPS, you cannot calculate it at all, unless your workload all uses the same block size), how would you like to distribute the limit between the drives?

If we somehow manage to find a reasonable number of IOPS (hahaha) to cap, are we setting it at 250 MB/second, or at 62.5 MB/second (the 250 MB/second split evenly between the 4 volumes)?
Do we want to limit our total data drive (8 TB) throughput to just 62.5 MB/second? This sounds pretty ridiculous.
Unless someone copies something onto the C drive (oops, another point not to forget), we should be kind of safe, but slow as a snail!

Notice: I am not touching on the cached/uncached throughput here, in order not to make it even more complicated.

We need to be able to define the maximum total throughput per Resource Pool, and to cap the total maximum throughput on the instance level in Resource Governor.
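Something like this imaginary syntax (which does NOT exist today – I am merely sketching the feature I am asking for) would solve the problem:

-- IMAGINARY, non-existing syntax: a hard throughput cap per Resource Pool...
ALTER RESOURCE POOL [ReportingPool]
WITH ( CAP_IO_MBPS = 100 );

-- ...plus a total cap on the instance level
ALTER RESOURCE GOVERNOR
WITH ( CAP_TOTAL_IO_MBPS = 200 );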

Final Thoughts

The lack of Resource Governor in Azure SQL Database (single database) is a very long overdue topic, even though, quite obviously, there is very serious resource governance behind each Azure SQL Database, since we are talking about shared resources in Azure.
I would love to believe that those responsible for prioritising the implementation will take the necessary step and courageously give it a green light. I know that Azure SQL Database is primarily suited for the OLTP workloads, while the current implementation of the Resource Governor is more effective with the longer OLAP-style queries, but we do all hope to see more OLAP-style workloads migrating to PaaS, and not necessarily to Synapse Analytics.

We need to be able to CAP the TOTAL IO of the SQL Server, and for that purpose I have opened a Suggestion at the respective site – Allow Resource Governor to control the CAP of the TOTAL IO in MB/s.
Please vote it up, and trust me – even if you are going to run SQL Server with Availability Groups or Failover Clusters in the future, you ARE going to need these options.
