Columnstore Indexes – part 112 (“Linked Servers”)

Continuation from the previous 111 parts, the whole series can be found at https://www.nikoport.com/columnstore/.

This blogpost is focusing on the Server Object functionality of the SQL Server – the Linked Servers. Almost everybody knows that the Linked Servers are evil (yeah, I know of some exceptions), almost everybody loves to bash them … and still people use them.
First of all because it allows them to stay within their commodity area of knowledge – TSQL, and secondly because it limits number of services to run and service (Yay, SSIS!).
If you look at the elder designed Database Systems (7+ years) or elder Database Engineers or Developers (7+ years) or those who Google Stackoverflow without questioning about the consequences, you will find a good number of LinkedServers. I keep on facing the systems with the LinkedServers, no matter where I go, but lately at a client I have seen something that was different – they are using Clustered Columnstore Indexes with the Linked Servers and encouraged by them to write about the experience this blog post is to focus on some less expected limitations, at least for me.

The Setup

For the start, let us create a simple Linked Server to the SQL Server 2016 Instance, which is a named instance .\SQL16 in my case, as you can see on the screenshot on the left side. I am using a SQL Server 2014 instance on the same virtual machine for this purpose, but I expect the behaviour to be equal in 2005+ versions of SQL Server to be exactly equal. Now we have a connection between our instances that we can use and explore.

Let us start with the reading and joining data from the LinkedServer, the most basic operation one would use the Linked Server for:

Reading data from the LinkedServer

On the linked Instance (SQL Server 2016), let us set up in the database [Test] a simple table T1 with just 1 integer column and a Clustered Columnstore Index:

DROP TABLE IF EXISTS dbo.T1;

CREATE TABLE dbo.T1(
	C1 INT,
	INDEX CCI_T1 CLUSTERED COLUMNSTORE );

INSERT INTO dbo.T1 (C1)
	VALUES (1),(2),(3);

ALTER TABLE dbo.T1
	REBUILD;

Now, from the original Server (in my case this is a SQL Server 2014 instance), I will fire a simple query, calculating the total of the values contained within our table

SELECT SUM(C1) as Total
	FROM [.\SQL16].Test.dbo.T1;

The query runs successfully and the execution plan is rather plain and simple, but it will start hiding important information from you if the Predicate Pushdown took place (yes, it did) or not, but this situation is the very same as for the RowStore Indexes, here are no visible exceptions for the Columnstore Indexes.

Joining the remote table with the local table is also an easy item, which works as expected, and here a basic example with an execution plan:

SELECT SUM(C1) as Total
	FROM [.\SQL16].Test.dbo.T1
	INNER JOIN sys.objects so
		on T1.C1 < so.object_id;

All other basic operations that I ran against the LinkedServer were successful in the equal measure, and so we should focus on writing some data into the Linked Server, which should be a rather trivial thing, right ?

Writing into Linked Server

Lets us make sure everything is fine for data transfer and as we are using our source server (SQL Server 2014) with Linked Server to SQL Server 2016, let us insert a couple of ObjectIds to the T1 table that we have created in the [Test]Â database:

INSERT INTO [.\SQL16].Test.dbo.T1 (C1)
	SELECT so.object_id
		FROM sys.objects so;

This statement will result in the error message that you can find below, telling us something about Cursors (????):

Msg 35370, Level 16, State 1, Line 1
Cursors are not supported on a table which has a clustered columnstore index.

WHAT ? SORRY ? THERE ARE NO CURSORS HERE !
We have no cursors, we just have a Clustered Columnstore Index on our table!
Let us create a simple table on the SQL Server 2016 that does not have a Clustered Columnstore Index and try again:

DROP TABLE IF EXISTS dbo.t2;

CREATE TABLE dbo.t2(
	C1 INT,
        INDEX CI_t2 CLUSTERED (C1) );

INSERT INTO dbo.t2 (C1)
	VALUES (1),(2),(3);

ALTER TABLE dbo.t2
	REBUILD;

Again from SQL Server 2014, our original instance, let's try to insert data into the T2 table, which is a Rowstore with a Clustered Index:

INSERT INTO [.\SQL16].Test.dbo.T2 (C1)
	SELECT so.object_id
		FROM sys.objects so;

This statement was executed successfully without any problem!

So the problem is lying within the Clustered Columnstore Indexes ?

Well ... :) Let's us push it a bit - let's use the Nonclustered Columnstore Index in our destination table:

DROP TABLE IF EXISTS dbo.t2;

CREATE TABLE dbo.t2(
	C1 INT,
	INDEX CI_t2 CLUSTERED (C1),
	INDEX NCI_t2 NONCLUSTERED COLUMNSTORE (C1)
	 );

INSERT INTO dbo.t2 (C1)
	VALUES (1),(2),(3);

ALTER TABLE dbo.t2
	REBUILD;

To my surprise, running the same insert statement from SQL Server 2014 instance (source) works perfectly !

INSERT INTO [.\SQL16].Test.dbo.T2 (C1)
	SELECT so.object_id
		FROM sys.objects so;

This means that once again if you need this scenario (Linked Server as a Destination for the data) and Columnstore Indexes, you can use the Nonclustered Columnstore Indexes, starting from the SQL Server 2016 of course, because before that they were non-updatable.

Analysing the Clustered Columnstore and the Linked Server

Running an Extended Events agains our destination server (SQL Server 2016) has revealed a good number of secrets that I was not really aware of.
This is pure magical unicorns on the steroids kind of stuff...
Let us fire a simple trivial query to insert a single row into the remote table

INSERT INTO [.\SQL16].Test.dbo.T1 (C1)
	VALUES (123);

while we monitor the results with a pretty couple of the Extended Events we have got for this purpose. This will give us instantly the basics of the under the hood operations for understanding if there are any cursors involved.

On the picture below, you will see some of the statements that my Extended Events session have caught:

Be sure to recognise the sp_cursoropen procedure, which is responsible for the problem that we are facing.

To prove it, here is a small script that I executed against the [Test] database directly on the SQL Server 2016 instance:

DECLARE @cursor INT
EXEC sp_cursoropen @cursor OUTPUT, N'SELECT * FROM dbo.T1', 2, 8193
 
-- Close the cursor
EXEC sp_cursorclose @cursor

The error message that I have received was very clear, leaving no doubts that Linked Server is not a good friend of the Clustered Columnstore Indexes:

Msg 35370, Level 16, State 1, Line 1
Cursors are not supported on a table which has a clustered columnstore index.
Msg 16945, Level 16, State 2, Procedure sp_cursoropen, Line 1 [Batch Start Line 0]
The cursor was not declared.
Msg 16909, Level 16, State 1, Procedure sp_cursorclose, Line 1 [Batch Start Line 0]
sp_cursorclose: The cursor identifier value provided (0) is not valid.

For the test purpose, we cab run a similar script, but against the T2 table (Clustered Rowstore Index with a Nonclustered Columnstore Index):

DECLARE @cursor INT
EXEC sp_cursoropen @cursor OUTPUT, N'SELECT * FROM dbo.T2', 2, 8193
 
-- Close the cursor
EXEC sp_cursorclose @cursor

This scripts executes successfully without any problem. This confirms us the support situation regarding the Clustered Columnstore Index and the Nonclustered Columnstore Indexes that we have seen above in this blog post.

Final Thoughts

I guess that no one really cared about the scenario of loading the data into the Clustered Columnstore Index through the Linked Server, but as a matter of a fact - I have faced this situation with a real client. Even though I see the support of the Linked Servers as not the most important missing feature, but still there will be people facing this problem and it might disappoint a number of the real clients.
The lack of support for the Clustered Columnstore Indexes can also serve as a call for action for the future of the Linked Server, where abandoning cursors might be ... a good idea :)

to be continued with Columnstore Indexes – part 113 ("Row Groups Merging Limitations")

3 thoughts on “Columnstore Indexes – part 112 (“Linked Servers”)”

Klaus Ondrich September 15, 2017 at 7:09 am

Did you try to fetch the data instead of sending them, i.e. issueing the INSERT on the destination server and getting the data from the linked server per SELECT?
This is (or at least: should be) our common strategy. Getting data from remote to local server before inserting them was always faster than sending to destination, even with rowstores (we realized this long time ago in 2004).
So even when the loading procedure is on the sending servers, we issued the INSERT via linkedserver.database.dbo.sp_executesql to get much faster loads (tests from 2004 showed 5mins vs 1h).

Reply ↓
1. Niko Neugebauer Post authorSeptember 15, 2017 at 8:30 am
  
  Hi Klaus,
  
  yes, fetching data on the remote server and inserting locally works fine.
  Inserting remotely is a always a nightmare, because of the cursors involved (gosh I wish Microsoft would totally rewrite the current code).
  2004 is long time ago :) and quite different engine so to say (assuming you tested on SQL Server 2000), did you try anything similar on 2012+ engines ?
  
  Best regards,
  Niko
  
  Reply ↓
Klaus Ondrich September 15, 2017 at 4:36 pm

We encountered this behavior throughout the versions. I must admit that I do not have any results from SQL Server 2016 concerning this test case. But it would be a good idea to test it again though.

Involvement of tempdb on the receiving server when sending doesn’t make it better.

Reply ↓

Niko Neugebauer

SQL Server, Columnstore, Data Platform & Community