SQL Graph, part IV – Merge DML Support (aka MATCH support for the MERGE against the Edges)

This is the 4th blog post in the series of blogpost on the Graph features within SQL Server and Azure SQL Database.
This particular post is dedicated to the support of the MERGE DML for Graph tables in Azure SQL Database, that Microsoft has announced at the Ignite in September 2018.

I am not the very best friend of the MERGE Statement for Columnstore Indexes, as you might know :), but I am a fan of the concept of the UPSERT statement (aka MERGE for Microsoft Data Platform) and I am definitely facing those statements in production on regular basis.
Even though I honestly wish that Microsoft would invest into further development and performance improvement for this T-SQL command, and while I am not expecting this to happen any time soon – I still expect to see this command being used and abused in the decades to come, which brings us to the topic of the Graph Database …

The Node Table Upsets

The support for the Merge statement against the Graph Tables is already available in SQL Server 2019 Public CTPs and Azure SQL Database, and so you should be able to try it. :)
Here are the basic tests I did for the Node Tables, once it became available, where I create a table dbo.Person:

DROP TABLE IF EXISTS dbo.Person;

CREATE TABLE dbo.Person(
	PersonID BIGINT NOT NULL PRIMARY KEY CLUSTERED,
	FULLNAME NVARCHAR(250) NOT NULL ) AS NODE;

,

load a couple of rows into it:

DELETE FROM dbo.Person;

INSERT INTO dbo.Person
	VALUES (1,'John Doe'), (2,'Andres Martins');

,

set up a new staging table dbo.PersonStagedData which shall contain the first 2 original rows but with more detailed information on the full names of the people, plus an additional row that will contain a new entry with a PersonId of 3 (aka Graph Updatio :)):

DROP TABLE IF EXISTS dbo.PersonStagedData;

CREATE TABLE dbo.PersonStagedData(
	PersonID BIGINT NOT NULL PRIMARY KEY CLUSTERED,
	FULLNAME NVARCHAR(250) NOT NULL );

INSERT INTO dbo.PersonStagedData
	VALUES (1,'John Brian Doe'), (2,'Andres Ignacio Martins'), (3,'Grapho Updatio');

Now we are ready to UPSERT the original dbo.Person table with the information from the dbo.PersonStagedData table:

SELECT [PersonID], [FULLNAME]
	FROM dbo.Person;

MERGE dbo.Person AS Target
        USING (SELECT [PersonID], [FULLNAME]
			  FROM [dbo].[PersonStagedData])
        AS Source
        ON (Target.[PersonID] = Source.[PersonID])
    WHEN MATCHED THEN
        UPDATE SET [FULLNAME] = Source.[FULLNAME]
    WHEN NOT MATCHED THEN
        INSERT ([PersonID], [FULLNAME])
        VALUES (Source.[PersonID], Source.[FULLNAME]) ;

SELECT [PersonID], [FULLNAME]
	FROM dbo.Person;

The above presented code will executes without any warning or errors, presenting the original plus the updated results, as you can see on the image on the left of this text.
The execution plan for the MERGE DML statement is presented below
,
and you might wonder, why would the same operation would not work in the original SQL Server supporting Graph – the SQL Server 2017. For this purpose I ran the very same workload against the SQL Server 2017 …
and VOILÁ! – it worked perfectly fine! :)
because the Node Tables Upserts are perfectly functioning and are supported since the original release of the Graph Database in SQL Server.
You might have noticed that we do not have the MATCH clause in our MERGE statement, but let me assure you that even a statement like the following one will work very fine on both SQL Server 2017 and SQL Server 2019 CTP 2.2 versions:

-- The Setup part for the Node dbo.Company and the Edge dbo.Follows tables:
DROP TABLE IF EXISTS dbo.Follows;
DROP TABLE IF EXISTS dbo.Company;

CREATE TABLE dbo.Company(
	CompanyID BIGINT NOT NULL  PRIMARY KEY CLUSTERED,
	CompanyName NVARCHAR(250) NOT NULL UNIQUE ) AS NODE;

CREATE TABLE dbo.Follows(
	Since DATETIME2 NULL,
        --CONSTRAINT EDG_Person_Follows CONNECTION (Person TO Company, PersonStagedData TO Company)
)
AS EDGE;

INSERT INTO dbo.Company
	VALUES (1,'Creative Value Company'), (2,'Real Stuff');

INSERT INTO dbo.Follows
	SELECT	p1.$node_id as from_obj_id,
			(SELECT p2.$node_id as to_obj_id FROM dbo.Company p2 WHERE p2.CompanyName = 'Real Stuff'),
			'2019-01-01'
		FROM dbo.Person p1

-- ********************************************************************************
-- The Merge Statement itself
MERGE dbo.Person AS Target
        USING (SELECT PersonStagedData.PersonID, PersonStagedData.FullName
				FROM PersonStagedData, Follows, Company
				WHERE MATCH(PersonStagedData-(Follows)->Company)
					AND CompanyName = 'Real Stuff')
        AS Source			
        ON (Target.[PersonID] = Source.[PersonID])
    WHEN MATCHED THEN
        UPDATE SET [FULLNAME] = Source.[FULLNAME]
    WHEN NOT MATCHED THEN
        INSERT ([PersonID], [FULLNAME])
        VALUES (Source.[PersonID], Source.[FULLNAME]) ;

Yeah, that worked well on my SQL Server 2017 …

Then what is really new for the SQL Server 2019 CTPs then, you might ask – and the answer is simple as it is exotic: the support for the MATCH clause for the Edge Tables while using the MERGE statement:

The Edge Table Upserts (aka That Actually Added Feature)

Let us reset everything by recreating the tables and adding the dbo.Follows Edge Table between the 2 Node tables – dbo.Person and dbo.Company, the same way as in the previous blog post:

DROP TABLE IF EXISTS dbo.Follows;
DROP TABLE IF EXISTS dbo.Person;
DROP TABLE IF EXISTS dbo.Company;

CREATE TABLE dbo.Person(
	PersonID BIGINT NOT NULL PRIMARY KEY CLUSTERED,
	FULLNAME NVARCHAR(250) NOT NULL ) AS NODE;

CREATE TABLE dbo.Company(
	CompanyID BIGINT NOT NULL  PRIMARY KEY CLUSTERED,
	CompanyName NVARCHAR(250) NOT NULL UNIQUE ) AS NODE;


CREATE TABLE dbo.Follows(
    Since DATETIME2 NULL,
    CONSTRAINT EDG_Person_Follows CONNECTION (Person TO Company, PersonStagedData TO Company)
)
AS EDGE;

Let’s insert the data that every person will like every company (2 people and 2 companies), starting with the date of the 1st of January of 2019:

TRUNCATE TABLE dbo.Follows;
DELETE FROM dbo.Person;
DELETE FROM dbo.Company;

INSERT INTO dbo.Person
	VALUES (1,'John Doe'), (2,'Andres Martins');

INSERT INTO dbo.Company
	VALUES (1,'Creative Value Company'), (2,'Real Stuff');

INSERT INTO dbo.Follows
	SELECT	p1.$node_id as from_obj_id,
			(SELECT p2.$node_id as to_obj_id FROM dbo.Company p2 WHERE p2.CompanyName = 'Real Stuff'),
			'2019-01-01'
		FROM dbo.Person p1

INSERT INTO dbo.Follows
	SELECT	p1.$node_id as from_obj_id,
			(SELECT p2.$node_id as to_obj_id FROM dbo.Company p2 WHERE p2.CompanyName NOT like 'Real%' ),
			'2019-01-01'
		FROM dbo.Person p1

Now let’s upsert the data for the Edges of the ‘Real Stuff’ company with the value of the 1st of January of 2001

SELECT Person.FullName, Follows.Since as FollowsSince, Company.CompanyName
	FROM dbo.Follows, dbo.Person, dbo.Company
	WHERE MATCH(Person-(Follows)->Company)


MERGE dbo.Follows
	     USING (SELECT PersonID, FullName
				FROM Person)
        AS Source
		INNER JOIN dbo.Company
			ON Company.CompanyName = 'Real Stuff'			
        ON MATCH(Source-(Follows)->Company)
    WHEN MATCHED THEN
        UPDATE SET Since = '2001-01-01';


SELECT Person.FullName, Follows.Since as FollowsSince, Company.CompanyName
	FROM dbo.Follows, dbo.Person, dbo.Company
	WHERE MATCH(Person-(Follows)->Company)

The queries will be executed successfully, while our execution plan will be a little bit exotic:

and the results will be following with the attributes of our Edge Table becoming updated, as shown on the image on the right to this text, where the imaginary ‘Real Stuff’ company followers will be getting their “follow date since” updated to the 1st of January of 2000.

To prove that this support is something brand new, run the very same queries against the SQL Server 2017 (NOTE: Please remove the EDGE Constraint from the setup!) and it will be very explicit in the lack of understanding of this query pattern, by delivering the following error message:

Msg 102, Level 15, State 1, Line 11
Incorrect syntax near '>'.

I can only imagine that there were some important internal investments for the Query Optimiser to enable the support for the MERGE statement, but it is definitely not something that will get me overly excited, and it comes to my mind like a specific request (aka ASK in Redmond Language :)) from a specific important client, which scenario was not supported…

Final Thoughts

The addition of this statement support as a news announcement is quite disappointing for the expected size of the investments into the Graph database in SQL Server 2019 and Azure SQL Database. In the other times (think timeframe of the SQL Server 2005 and SQL Server 2008), very few would be able to believe that a particular T-SQL statement support would become an announcement – you either support the surface or you not. We live in a very fast changing world and as we saw with SQL Server 2014 and beyond, the reality of the specific implementation & scenarios becomes more and more specific.
An additional partial support of the particular T-SQL statement for the particular scenario is nothing that will get everyone to get started with Graph (unless this would be something totally ground-breaking or hugely missing), especially since it is currently possible to do the same operation starting with the SQL Server 2017, with the help of multiple statement within a transaction.

I am not trying to criticise the engineering team, which does an outstanding work by delivering the features in short timeframes – but I am definitely asking myself about the direction and priority of the Graph investment. I am asking myself, which idea & confidence this feature gives to the final customers … and the answer I have for me is the one I do not like, and I really want to have it other way around.

I am confident that this is not the end of the investment for the Graph in SQL Server 2019 and obviously in Azure SQL Database, but I am wondering what is coming in the pipeline, because Edge Constraints, Derived Views for Heterogeneous Nodes & Edges & Merge Statements ain’t enough for a new release of the product, especially since we are looking into 2 years difference and not just 1, like in the case between SQL Server 2016 & SQL Server 2017.
I do not want to set the hopes too high, but let’s say that I still hold my ground, believing that one day not too far in the future we shall be able to do some useful analysis of the weighted shortest path, transitive closure, betweenness and closeness centralities, colouring, minimal spanning tree & others – effectively avoiding paying the heavy costs & risks of extracting our useful data into the unsecured or additional environments.

But I am not setting those hopes on the SQL Server 2019, for sure …

to be continued …

Leave a Reply

Your email address will not be published. Required fields are marked *