SQL Graph, part III – Derived Tables & Views

This is the 3rd blog post in the growing series of blogpost on the Graph features within SQL Server and Azure SQL Database that started at SQL Graph, part I, and hopefully will carry on for the many years to come with the development of the Graph engine in Azure SQL Database & SQL Server.

This blog post is dedicated to the newest features announced in preview for both SQL Server 2019 CTP 2.1 as well as the Azure SQL Database – the support for the derived tables & views for the Heterogeneous Nodes & Edges.

Heterogeneous Node

The Heterogeneous Node is a Node that is connected to more than one other Node through the same Edge (connection type). Consider the the following example from a possible social network: a Company can have 2 different type of the followers -> the private people & the other commercial businesses. The best way to describe it is to build the following graph as on the image on the left, where both Persons and Business as the followers of the enterprise company.
In my books, this is quite a pompous name for a very simple principle, but I understand that in the scientific space there is a need for understanding and labelling the concepts so that everyone can understand what others are talking about.
Let’s take a more detailed look at the proposed schema:

DROP TABLE IF EXISTS dbo.Follows;
DROP TABLE IF EXISTS dbo.Person;
DROP TABLE IF EXISTS dbo.Business;
DROP TABLE IF EXISTS dbo.Company;

CREATE TABLE dbo.Person(
	PersonID BIGINT NOT NULL PRIMARY KEY CLUSTERED,
	FULLNAME NVARCHAR(250) NOT NULL ) AS NODE;

CREATE TABLE dbo.Business(
	BusinessID BIGINT NOT NULL  PRIMARY KEY CLUSTERED,
	BusinessName NVARCHAR(250) NOT NULL UNIQUE ) AS NODE;

CREATE TABLE dbo.Company(
	CompanyID BIGINT NOT NULL  PRIMARY KEY CLUSTERED,
	CompanyName NVARCHAR(250) NOT NULL UNIQUE ) AS NODE;

CREATE TABLE dbo.Follows(
	Since DATETIME2 NULL,
        CONSTRAINT EDG_Person_Follows CONNECTION (Person TO Company, Business TO Company)
)
AS EDGE;

Let’s load a couple of rows into each of the tables and establish edge connections between Person and Company (1) and between Business and Company (2):

INSERT INTO dbo.Person
	VALUES (1,'John Doe'), (2,'Andres Martins');

INSERT INTO dbo.Business
	VALUES (1,'Foody Company'), (2,'Dressing Inc.');

INSERT INTO dbo.Company
	VALUES (1,'Creative Value Company'), (2,'Real Stuff');


INSERT INTO dbo.Follows
	SELECT	p1.$node_id as from_obj_id,
			(SELECT p2.$node_id as to_obj_id FROM dbo.Company p2 WHERE p2.CompanyID = 1),
			GETDATE()
		FROM dbo.Person p1
		WHERE p1.PersonID = 1


INSERT INTO dbo.Follows
	SELECT	p1.$node_id as from_obj_id,
			bus.to_obj_id,
			GETDATE()
		FROM dbo.Business p1
		CROSS APPLY (SELECT p2.$node_id as to_obj_id FROM dbo.Company p2 WHERE p2.CompanyID = 2) as bus

Looking at the above described schema, you might wonder if you actually could represent the Company and the Business as the very same node – and well, yes you could, but the idea here is not to get too complicated at the concept introduction.

Now, in the next step we shall create a derived view, which shall contain the list with all Persons and Businesses, joining them together:

CREATE OR ALTER VIEW dbo.Followers AS 
	SELECT PersonId as Id, FullName
		FROM dbo.Person
	UNION ALL
	SELECT BusinessId, BusinessName
		FROM dbo.Business;

Now, the real new thing is that we can use such derived tables in SQL Server 2019 CTP 2.1 and Azure SQL Database together with the MATCH clause, in the statements such as the one below where we list all the followers of the “Real Stuff” company:

SELECT Followers.ID, Followers.FullName
	FROM Followers, Follows, Company
	WHERE MATCH(Followers-(Follows)->Company)
		AND CompanyName = 'Real Stuff'

This query works fine, delivering us the expected results while generating a pretty complex execution plan in the background.

Running the same setup code (with the noticeable exception of the Edge Constraint) against the SQL Server 2017 will work fine, and that will include the derived view creation, but the search query will fail with the following error message:

Msg 13901, Level 16, State 1, Line 13
Identifier 'Followers' in a MATCH clause is not a node table or an alias for a node table.

Heterogeneous Edges

The Heterogeneous Edges are 2 or more Edges that serve as a connection between the very same Nodes. that is connected to more than one other Node through the same Edge (connection type). The basic example on the picture on the left side of this text is that a Person can work for the company (Edge 1) and like it as well (Edge 2).

Here is some setup code for creating the respective tables and their relationships:

DROP TABLE IF EXISTS dbo.Likes;
DROP TABLE IF EXISTS dbo.WorksFor;
DROP TABLE IF EXISTS dbo.Person;
DROP TABLE IF EXISTS dbo.Company;

CREATE TABLE dbo.Person(
	PersonID BIGINT NOT NULL PRIMARY KEY CLUSTERED,
	FULLNAME NVARCHAR(250) NOT NULL ) AS NODE;

CREATE TABLE dbo.Company(
	CompanyID BIGINT NOT NULL  PRIMARY KEY CLUSTERED,
	CompanyName NVARCHAR(250) NOT NULL UNIQUE ) AS NODE;

CREATE TABLE dbo.Likes(
	Since DATETIME2 NOT NULL,
    CONSTRAINT EDG_Person_Likes CONNECTION (Person TO Company)
)
AS EDGE;

CREATE TABLE dbo.WorksFor(
	Since DATETIME2 NOT NULL,
    CONSTRAINT EDG_Person_WorksFor CONNECTION (Person TO Company)
)
AS EDGE;

Let’s put some basic data for the people and the companies that they work for and that they like:

INSERT INTO dbo.Person
	VALUES (1,'John Doe'), (2,'Andres Martins');

INSERT INTO dbo.Company
	VALUES (1,'Creative Value Company'), (2,'Real Stuff');


INSERT INTO dbo.Likes
	SELECT	p1.$node_id as from_obj_id,
			(SELECT p2.$node_id as to_obj_id FROM dbo.Company p2 WHERE p2.CompanyID = 2),
			GETDATE()
		FROM dbo.Person p1
		WHERE p1.PersonID = 1;


INSERT INTO dbo.WorksFor
	SELECT	p1.$node_id as from_obj_id,
			bus.to_obj_id,
			GETDATE()
		FROM dbo.Person p1
		CROSS APPLY (SELECT p2.$node_id as to_obj_id FROM dbo.Company p2 WHERE p2.CompanyID = 2) as bus;

We can now create a derived table with Heterogeneous Edgesthat will simplify the usage (query writing) of the edges (connections) between the people and the companies:

CREATE OR ALTER VIEW dbo.RelatesTo AS 
	SELECT Since, 'likes' as opStatus
		FROM dbo.Likes
	UNION ALL
	SELECT Since, 'works for' as opStatus
		FROM dbo.WorksFor;

Now we can really find the people who like the companies they work for by issuing the following query:

SELECT Person.FullName, Company.CompanyName
	FROM Person, Company, RelatesTo
	WHERE MATCH(Person-(RelatesTo)->Company)
		and RelatesTo.opStatus in ('likes','works for')
	GROUP BY Person.FullName, Company.CompanyName
	HAVING COUNT(DISTINCT RelatesTo.opStatus) = 2 

which will function well in SQL Server 2019 CTP 2.1, while producing an error message on the SQL Server 2017:

Msg 13902, Level 16, State 1, Line 3
Identifier 'RelatesTo' in a MATCH clause is not an edge table or an alias for an edge table.

that is totally understandable since the View is definitely not an edge object, and the change to the Query Optimiser is something that I did not manage to find in SQL Server 2019 CTP 2.1 so far – all the data in sys.tables and sys.views does not show this change and it might just be some kind of invisible to the final SQL Server user propagation of the metadata that Microsoft has implemented.
I love how the simplification of much more complex queries will improve the capability of developers to deliver great solutions.
To understand it better, here is moral equivalent of the working query that one would need to write on SQL Server 2017:

SELECT FullName, CompanyName
	FROM
		(SELECT Person.FullName, Company.CompanyName, 'likes' as opStatus
			FROM Person, Company, Likes
			WHERE MATCH(Person-(Likes)->Company)
		UNION ALL
		SELECT Person.FullName, Company.CompanyName, 'works for' as opStatus
			FROM Person, Company, WorksFor
			WHERE MATCH(Person-(WorksFor)->Company)
		) res
	GROUP BY FullName, CompanyName
	HAVING COUNT(DISTINCT opStatus) = 2;  

and if this does not look complicated, just imagine that you have 5 different edge types to write for every single query …
Besides a pretty interesting stuff that in SQL Server 2019 CTP 2.1 the execution plan will be like the one shown below:

while for SQL Server 2017 the execution plan will be … a little bit more complicated:

and will function significantly slower when facing bigger amounts of data, then just a couple of rows. :)

Mixing Derived Views for Heterogeneous Nodes & Edges

It is quite possible to mix the derived Derived Views for Heterogeneous Nodes & Edges as shown below with Followers being the derived view for the Heterogeneous Nodes created in the original block and the RelatesTo being the Derived View for the Heterogeneous Edges, that was created in the previous block :

SELECT Followers.FullName, Company.CompanyName
	FROM Followers, Company, RelatesTo
	WHERE MATCH(Followers-(RelatesTo)->Company)
		and RelatesTo.opStatus in ('likes','works for')
	GROUP BY Followers.FullName, Company.CompanyName
	HAVING COUNT(DISTINCT RelatesTo.opStatus) = 2;

The ability of creating Nested Derived Views that is supported is something that while can be something quite helpful and useful is not the feature that will get me to the next cloud of happiness, since I have seen times and times again what can be achieved with the nested views and how painful it is sometimes to fix just a small error.

There are some interesting limitations that are already known and some that will be discovered in time, when people will start takings this feature into production, and I really hope that the development team will get an opportunity to make it work in most of the scenarios and with the most technologies (Yeah, I do not hold my breath here).

Final Thoughts

I am still kind of wondering – why so many database developers are not getting excited by the Graph features in SQL Server, because clearly we can start solving more complex problems with more ease then before. Having seen in the wild pretty complex solutions, I am wondering what will take for people to start recognising that they can tackle things in almost real time that they would otherwise need to build huge complex cursor solutions before.

Do not take me wrong, I still think the investment is rather small and I truly wish that Microsoft leadership would pour more resources into the development of the graph engine, but it is all about the Chicken & Eggs (usage & development), because nobody wants to invest into building a feature that the world does not intend to use, while the uncertainty of the using an early feature might force one to rebuild their application. I also think that smaller improvements for the Graph engine will fly under the radar for the most of the developers, mostly because compared to the past, they seem rather as the “Service Pack Level Improvements” (TM), while in the reality only the development team and the real fans of the Query Optimisation can appreciate the effort behind the curtain.

I am strongly convinced that the current Graph users might not be the best clients of the current state of the graph engine, but companies building new stuff might simply take advantage of this opportunity to enhance their offerings.

to be continued …

Leave a Reply

Your email address will not be published. Required fields are marked *