Azure Arc for Data Services, part 1 – Intro

Welcome to my small series on Azure Arc for Data Services.
In this series I will explain and explore the offerings of Azure Arc for Data Services, point out some limitations (yes, most probably very soon to be fixed by Microsoft as they churn through the evolution of this wonderful product) and share my thoughts on it.

Do not confuse Azure Arc with the Azure Arc enabled services series of products: even though they share the Azure Arc name, there are major differences between them.
Azure Arc extends the Azure management capabilities to your on-premises servers. I like the name, as it describes a true kind of Azure extension … I always think of it as a visual arc stretching from Microsoft Azure towards the local data centre.
With Azure Arc enabled Data Services, you will be able to host versions of some (at the moment, but hopefully eventually all) Azure Data Services on-premises, on your own infrastructure. Right now you can have Azure SQL Managed Instance and Azure PostgreSQL Hyperscale, and besides them there is the Azure Arc enabled Kubernetes cluster, which is the underlying keystone for the Azure Arc enabled Data Services.

Microsoft made a huge push towards Azure around 2012, declaring that everyone was going straight into the cloud right away and trying to force the eventual migrations. In just a couple of years, the attitude towards the market evolved into a much better and, let me say, much fairer strategy: the hybrid cloud, where the client can choose the place where they run their workload.

The amazing idea behind Azure Arc enabled Data Services is to enable you to have the Azure services anywhere, no matter if it is in your own data centre or even in any other cloud, such as AWS or Google Cloud. All you need to get started is a running Kubernetes cluster. Well, a Kubernetes cluster (aka K8s) that satisfies the requirements for the deployment of Azure Arc enabled Data Services.

Kubernetes itself gives you a very significant abstraction from the underlying infrastructure, be it on-premises with real hardware below, or a cloud environment where, as a matter of fact, you will have a number of virtualisation layers. You can even spread your K8s cluster across multiple clouds, or across on-premises and multiple clouds.
Even though I have not tried it, I can even see someone eventually running Azure Arc enabled Data Services on Edge devices, which must be a different kind of fun. :)

The idea behind a managed services product (Azure Arc enabled Data Services) on your own terms is that you can:
– experiment with the cloud offerings without committing and transferring your precious data to the cloud
– get rid of the unmanaged services offerings by using Microsoft cloud services on your terms
– consolidate your cloud and on-premises assets into a unified management experience
– safely & easily experiment with the latest and greatest features before using them in production
– as an extra point, benefit on the licensing side, where I expect that customers will be able to run developer editions without limitations, hence no more requests such as “we kind of need an S3 Azure SQL Database but without the licensing part and with better IO capabilities”
– scale the service up and down easily, without all the usual hassle of resizing the VMs etc. (well, yes – you better have those CPU cores and RAM modules ready)

In a way (drumroll and yeah, expect this to poke some fun) – I hope and expect it to be a kind of an Azure Stack, but without the hardware requirements. :)

Right now the Azure Arc for Data Services offering is in the preview phase, which was announced in September of 2020 at Microsoft Ignite. On the data services side, right now in January of 2021, we have Azure SQL Managed Instance and Azure Arc enabled PostgreSQL Hyperscale available, but Microsoft promises to bring more and more supported services to this architecture.

Gonna Get Myself Connected

So how should it work, and how should it be connected (or not) to Azure?

Azure Arc enabled Data Services focus on 3 distinct scenarios for connectivity with Azure (and right now only 2 of them are available):
– indirectly connected (the default mode), where the configuration, eventual updates and, most importantly, the reporting on the consumption of the services are done manually (or scripted through cron jobs). It will force you to do some extra lifting on a regular basis, and yes, it means you will want to monitor those jobs as well. You do not have to upload directly from your Kubernetes cluster; you can actually set up a secure environment for indirect reporting, but that is out of the scope of this initial article.
– fully connected (available since December of 2020), where we do not have to do the consumption reporting ourselves, because Azure Arc itself takes care of the job. I suspect this will be the most desired mode for most clients.
– disconnected (not available yet). This mode will address the scenarios where the infrastructure is rarely connected to the internet (critical isolated environments, such as ships) or is totally disconnected (probably military or extremely sensitive environments).
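
To keep the three modes straight in my head, here is a tiny Python sketch that simply encodes the list above as data. Everything in it (the names ConnectivityMode, ModeInfo, MODES and the two helper functions) is made up for illustration and is not part of any Azure SDK or tool; the values only mirror what this article says about the preview as of January 2021, and anything the article does not state is left as None.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ConnectivityMode(Enum):
    # Hypothetical labels, taken from the wording of the list above.
    INDIRECT = "indirectly connected"
    DIRECT = "fully connected"
    DISCONNECTED = "disconnected"


@dataclass(frozen=True)
class ModeInfo:
    available: bool                      # usable in the preview, per this article
    manual_usage_upload: Optional[bool]  # None = not described in this article


# Assumption: this table only restates the article, it is not official data.
MODES = {
    ConnectivityMode.INDIRECT: ModeInfo(available=True, manual_usage_upload=True),
    ConnectivityMode.DIRECT: ModeInfo(available=True, manual_usage_upload=False),
    ConnectivityMode.DISCONNECTED: ModeInfo(available=False, manual_usage_upload=None),
}


def available_modes() -> List[ConnectivityMode]:
    """Modes that can be picked today, in declaration order."""
    return [mode for mode in ConnectivityMode if MODES[mode].available]


def needs_scheduled_upload(mode: ConnectivityMode) -> bool:
    """True when you have to plan (and monitor) your own upload jobs."""
    info = MODES[mode]
    return info.available and info.manual_usage_upload is True


if __name__ == "__main__":
    for mode in available_modes():
        print(mode.value, "| own upload jobs needed:", needs_scheduled_upload(mode))
```

The only point of the sketch is the decision it makes visible: pick the indirectly connected mode and you also sign up for owning the reporting jobs.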

Right now, as per the documentation (more details are to be found in part 2 of this series), it seems that each mode has its own advantages and limitations, so I guess the real choice for a data centre provider or a big organisation will be based on the concrete project needs.
Keep in mind that since this is an initial public preview phase, most things are subject to change without any further notice – the speed of development will be the key here.

During the public preview the service seems to be free of charge, and it’s a wonderful opportunity to look into this upcoming technology, which I expect to have a significant impact around the world in the future. Who would care to run a single instance of SQL Server when they can easily set up a Kubernetes cluster, with or without high availability, in a matter of seconds and easily migrate to the cloud if/when needed?

In the next blog posts we shall look at the technical requirements, the tools and of course the deployment and usage of the Azure Arc enabled Data Services or, as I freely call them, Azure Arc for Data Services. :)

to be continued with Azure Arc enabled Data Services, part 2 – Requirements and Tools

Niko Neugebauer

