Optimize for Analytics (Should be a Database Configuration option)

There is a world-famous “optimize for ad hoc workloads” Server Configuration option in SQL Server since version 2008 (meaning it is well over 10 years old), which is a great help for cleaning up the single-use plan cache bloat, as Kimberly Tripp so eloquently wrote in her article “Plan cache, adhoc workloads and clearing the single-use plan cache bloat”.
I confess I have never seen a server where I could not activate this option so far – I use it everywhere (oh, and I wish it were activated by default …)
This option is so popular and needed that Microsoft is bringing it into the Database Scoped Configurations for Azure SQL Database and SQL Server 2019, as I blogged in Database Scoped Configurations in 2018 – yahoo!
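For reference, here is what enabling the existing option looks like today – a minimal sketch, assuming a SQL Server instance (the database-scoped variant requires Azure SQL Database or SQL Server 2019):

```sql
-- Server-wide, the classic way:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE;

-- Per database, on Azure SQL Database / SQL Server 2019:
ALTER DATABASE SCOPED CONFIGURATION SET OPTIMIZE_FOR_AD_HOC_WORKLOADS = ON;
```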

This rather short blog post is about a feature I would love to have in Azure SQL Database and SQL Server – “Optimize for Analytics”. I would like to have this feature just like the “Optimize for ad hoc workloads” configuration option – on the server as well as on the database level.

Which functionalities do I imagine being hidden behind this configuration? Here are some basic suggestions:

The -E startup option, which increases the number of extents allocated for each file in a filegroup, as described in the good old Data Loading Performance Guide from the year 2008.
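As a reminder, -E is a startup parameter rather than a runtime setting – it has to go into the service startup options (via SQL Server Configuration Manager) or onto the command line:

```
-- Add -E to the SQL Server service startup parameters
-- (Configuration Manager), or from the command line:
sqlservr.exe -E
```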

A huge Query Optimiser preference for Hash Joins, whenever we are working with larger sets of data (think Batch Execution Mode for Rowstore) and/or with Columnstore Indexes. What do I mean by a huge preference? Whenever the Query Optimiser makes its cost-based selection between the possible execution plans, it should assign a significantly higher value to Hash joins over Merge and Nested Loops joins – thus helping us nail down the problem of the sudden Merge & Nested Loops joins, which kill the performance of the Batch Execution Mode.
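Until such a global preference exists, the closest thing we have is a per-query hint. A minimal sketch (the table and column names below are hypothetical):

```sql
-- Force Hash joins for every join in this query:
SELECT d.CalendarYear, SUM(f.Quantity) AS TotalQuantity
FROM Fact.Sale AS f                    -- hypothetical fact table
INNER JOIN Dimension.Date AS d         -- hypothetical dimension table
    ON f.DeliveryDateKey = d.DateKey
GROUP BY d.CalendarYear
OPTION (HASH JOIN);
```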

A bigger preference for the Batch Execution Mode for Rowstore tables. As the data amounts grow bigger, I foresee the eventual marginalisation of the Row Execution Mode, and I would love to be able to influence the choice as described in Batch Mode – part 3 (“Basic Heuristics and Analysis”).
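For completeness, SQL Server 2019 does make this choice controllable per database – a sketch, assuming SQL Server 2019 or Azure SQL Database:

```sql
-- Batch Execution Mode for Rowstore arrives with compatibility level 150 ...
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;

-- ... and can be switched independently of the queries themselves:
ALTER DATABASE SCOPED CONFIGURATION SET BATCH_MODE_ON_ROWSTORE = ON;
```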

A bigger preference for Scans and Prefetching in the Storage Engine. Every time I scan a whole table, I wish I could indicate to the Storage Engine that it truly can throw even more resources at the operation and dive more recklessly (as opposed to over-carefully) into prefetching, when I am running a Data Warehousing environment.

A bigger Columnstore Object Pool allocation for the server. Yes, I admit I would like an option to force its bigger size on the server side.
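Even without such a knob, the pool’s current footprint can at least be inspected – a small sketch against the memory clerks DMV:

```sql
-- Current size of the Columnstore Object Pool cache store:
SELECT type,
       SUM(pages_kb) / 1024 AS size_mb
FROM sys.dm_os_memory_clerks
WHERE type = 'CACHESTORE_COLUMNSTOREOBJECTPOOL'
GROUP BY type;
```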

Elevate Online Resumable Columnstore Index creation and rebuild. Indeed, the feature itself is still missing (and I am critically upset by this fact, since it is not the Rowstore Indexes that hold the bigger amounts of data, but the Columnstore ones), but once it is implemented, I would really love the default Columnstore Index build and rebuild operations to be resumable. This is what non-technical people expect from a modern database system.
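What I am asking for is the Columnstore equivalent of what Rowstore indexes already enjoy – a sketch of today’s Rowstore-only syntax (the index and table names are hypothetical):

```sql
-- Resumable online rebuild (Rowstore indexes only, as of this writing):
ALTER INDEX PK_Sale ON Fact.Sale
REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 60 MINUTES);

-- The operation can be paused and picked up again later:
ALTER INDEX PK_Sale ON Fact.Sale PAUSE;
ALTER INDEX PK_Sale ON Fact.Sale RESUME;
```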

Improve the actual plans. I shared this old idea of mine with the Query Optimiser team last November: I would love to be able to allocate some of the processing capacity of the server, at configurable moments, to finding better execution plans (a lot of the time a server has clearly identifiable workloads, and the stale times are quite easy to identify for maintenance and ETL tasks). As we know, execution plans are analysed in an extremely fast fashion, most of the time totally invisibly to the final user who is firing the query. What if we could say to the server – use your spare capacity for analytics and find me better plans than the ones in cache, so that the next time/day they are used, they can take advantage of a more complete analysis of, let’s say, 2 minutes instead of 50 milliseconds. This is totally something that needs a mind that is inside the matter and is absolutely brilliant to find the holes in my idea (such as Gjorgji Gjeorgjievski‘s or Conor Cunningham‘s).

There must be others that I do not have in mind right away, but I am sure that more intelligent and experienced people will contribute valuable suggestions and ideas, and for that purpose I have opened a Connect (oops!) Azure Feedback item here – “https://feedback.azure.com/forums/908035-sql-server/suggestions/36454111-optimise-for-analytics-configuration-option“.

This configuration would also be a placeholder for any other future enhancements, giving regular users and DBAs an opportunity to “subscribe” to the Data Warehousing enhancements without too much knowledge and/or research, since Trace Flags are still a no-go for most people, even when they are well documented and recommended.

4 thoughts on “Optimize for Analytics (Should be a Database Configuration option)”

  1. tobi

    It seems that the optimizer should generally spend a lot more time on optimizations. For most queries compile times do not matter.

    What about finally fixing the totally outdated cost model? I have seen quite a few bad plans where I suspected that the cost model caused them.

    In particular, the optimizer needs to do justice to the fact that different kinds of hardware have 1000x different relative performance metrics, such as IOPS, scan speed, memory speed and CPU. The problem is not just that the cost model is outdated – no single cost model can account for these vast hardware differences. It must be dynamic.

    1. Niko Neugebauer Post author

      Hi tobi,

      The relation & connection to modern hardware is a huge point, and I am sure that Microsoft Research is looking into this and other opportunities. Given that the low-hanging fruits (easy wins) are something Microsoft is quite focused on implementing, I am hoping to get more of those rather than the big ones, such as the hardware connection & relation, which will take years to implement and years to make work correctly.
      But how I would love to have it!

      Best regards,
      Niko

  2. Christo Kutrovsky

    Hi Niko, this is something I’ve been struggling with as well.

    Very often I find the optimizer choosing a nested loops plan with index as opposed to hash join plan – especially given that I have clustered column store indexes…

    Have you found a good way to adjust the statistics of the tables, to influence the optimizer to choose hash joins?

    1. Niko Neugebauer Post author

      Hi Christo,

      Statistics & hinting are some of the easiest ways to solve these problems.
      Rewriting the queries on an individual basis would be an alternative approach to force them to do what is expected.
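      A minimal sketch of both levers (the table and column names are hypothetical):

      ```sql
      -- Refresh the statistics with a full scan:
      UPDATE STATISTICS Fact.Sale WITH FULLSCAN;

      -- Or hint the join algorithm directly on the problematic join:
      SELECT c.City, SUM(f.Quantity) AS TotalQuantity
      FROM Fact.Sale AS f
      INNER HASH JOIN Dimension.City AS c
          ON f.CityKey = c.CityKey
      GROUP BY c.City;
      ```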

      Best regards,
      Niko
