SET Statistics IO Update in SQL Server 2019

Playing with SQL Server 2019 CTPs for some time, I have noticed some very exciting feature that I did not see mentioned by anyone before.
The feature if you think about it, is so straight and obvious that after reading this blog post you will ask yourself like I did – how on earth did it happen that it was not implemented before.
I also suspect that this item will make a lot of people (and especially those ones who are doing a lot of ETL) think about the data loading process & practices – because it will uncover the resources consumption which are extremely logical, but from the other side are way too often forgotten.

For the purpose of the test, I will ran a very simple sample on the SQL Server 2019 CTP 2.0 where I will use my good old test database from Microsoft – ContosoRetailDW.
I have already pre-created a Columnstore Index on the FactOnlineSales table, and this should be a no-brainer for the regular readers of my blog, but if you are struggling, please take a look at the scripts from the Columnstore Indexes – part 107 (“Dictionaries Deeper Dive”), for example.

For the simple test, let’s create a new table FactSalesDestination, with a primary key on the OnlineSalesKey column – like on the script shown below:

DROP TABLE IF EXISTS [dbo].[FactSalesDestination];

CREATE TABLE [dbo].[FactSalesDestination](
	[OnlineSalesKey] [int] NOT NULL PRIMARY KEY,
	[DateKey] [datetime] NOT NULL,
	[StoreKey] [int] NOT NULL,
	[ProductKey] [int] NOT NULL,
	[PromotionKey] [int] NOT NULL,
	[CurrencyKey] [int] NOT NULL,
	[CustomerKey] [int] NOT NULL,
	[SalesOrderNumber] [nvarchar](20) NOT NULL,
	[SalesOrderLineNumber] [int] NULL,
	[SalesQuantity] [int] NOT NULL,
	[SalesAmount] [money] NOT NULL,
	[ReturnQuantity] [int] NOT NULL,
	[ReturnAmount] [money] NULL,
	[DiscountQuantity] [int] NULL,
	[DiscountAmount] [money] NULL,
	[TotalCost] [money] NOT NULL,
	[UnitCost] [money] NULL,
	[UnitPrice] [money] NULL,
	[ETLLoadID] [int] NULL,
	[LoadDate] [datetime] NULL,
	[UpdateDate] [datetime] NULL
);

Let’s run a rather trivial query below, wrapping it into the SET STATISTICS IO ON command

SET STATISTICS IO ON;

INSERT INTO dbo.FactSalesDestination WITH (TABLOCK)
	SELECT *
		FROM dbo.FactOnlineSales_CCI;

SET STATISTICS IO OFF;

Let’s take a look at the results in the Messages window:

I totally understand access to the table FactOnlineSales_CCI, the scan is totally expected the lob logical and physical reads are all in order.
I love the fact that we can see the number of Segments (well, actually Row Groups) read and skipped for the Columnstore table FactOnlineSales_CCI – this is the feature that we have received in SQL Server 2016.
The Worktable is a product of the execution plan as you can see below:

We have a sorting iterator, because of the insertion into a table with a primary key.

But, what does the FactSalesDestination does in the Statistics output ?
There has never been such information before!
We are just writing into it!
Why do we have those wonderful 1351498 logical reads ?
Are they actually writes ? And if they would be, would not it be correct to display them as physical accesses ?

The answer is rather simple and actually should have been expected.
We are inserting a big amount of data into an empty table with a Primary Key, which triggers a creation/update of the statistics and those are the reads of the statistics scan operation. This is why when you really need to achieve the best speed on the ETL and you will be updating the table very heavily, you should consider disabling automated statistics update and executing it with all available cores once you are finished working with this table.
In this example it takes me an easy 20 seconds out of the total 50 seconds execution time, while the manual parallel statistics update with a FULL SCAN will take just 4 seconds. :) Tip: yeap, that is the kind of stuff I was presenting on my SQLBits precon 2 days ago (by the time of this article publishing).

How can you disable the automated statistics creation/updates ?
You can do it on 3 levels:
On the Table/Index/Statistics Level by executing UPDATE STATISTICS command with NORECOMPUTE parameter:

UPDATE STATISTICS dbo.FactSalesDestination WITH NORECOMPUTE; 

On the Table Level globally, by using the good old sys.sp_autostats stored procedure:

EXEC sys.sp_autostats 'dbo.FactSalesDestination', 'OFF'; 

or you can advance on the whole database level (BE CAREFUL ABOUT IT!) with the command shown below, disabling automated creation and update of the statistics in a synchronous & asynchronous ways:

ALTER DATABASE [ETL Destination] SET  AUTO_CREATE_STATISTICS  = OFF;   
ALTER DATABASE [ETL Destination] SET  AUTO_UPDATE_STATISTICS  = OFF;   
ALTER DATABASE [ETL Destination] SET  AUTO_CREATE_STATISTICS_ASYNC  = OFF;

Let’s set the NORECOMPUTE on our destination table:

UPDATE STATISTICS dbo.FactSalesDestination WITH NORECOMPUTE; 

and re-run the data loading, this time achieving very significantly speed improvement and noticing that the STATISTICS IO are not showing the logical reads of the destination table:

TRUNCATE TABLE dbo.FactSalesDestination 
	
SET STATISTICS IO ON;

INSERT INTO dbo.FactSalesDestination WITH(TABLOCK)
	SELECT *
		FROM dbo.FactOnlineSales_CCI;

SET STATISTICS IO OFF;


I love this new feature! I love the little but important improvements that Microsoft is bringing into the product with each release.

Final Thoughts

Now what’s else that is important but is missing from SET STATISTICS IO ?
The sum of the read operations – I am tired of calculating the sum of the Logical & Physical IOs across big number of tables, I would like to have this rather primitive calculations to be delivered at the end of the transaction!
Let us hope that this item will get solved in SQL Server 2019 as well :)

5 thoughts on “SET Statistics IO Update in SQL Server 2019

  1. jeremyh

    I thought that the logical reads included the physical reads. i.e. Each physical read counts as 1 physical read and 1 logical read.

Leave a Reply

Your email address will not be published.