Azure Arc for Data Services, part 3 – Configuring on a plain vanilla Kubernetes

This is the 3rd blog post in the series about Azure Arc enabled Data Services.
The initial post can be found at Azure Arc for Data Services, part 1 – Intro, and the whole collection can be found at Azure Arc enabled Data Services.

To advance with the installation of Azure Arc for Data Services, you will need a Kubernetes installation that meets all the requirements indicated in Azure Arc for Data Services, part 2 – Requirements and Tools, plus the respective Storage Classes. By “respective” I mean that they will vary depending on the type of Kubernetes you are using.
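If you are not sure which Storage Classes your cluster actually offers, kubectl can list them for you; a minimal sketch (the class name in the second command is just an illustrative placeholder):

# list the Storage Classes available on the cluster and check which one is marked as default
kubectl get storageclass
# inspect a specific class in detail (replace local-storage with your own class name)
kubectl describe storageclass local-storage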

I won’t be showing you my K8s config file because there is nothing special about it, besides the needed certificate and stuff to access the cluster remotely.

In this concrete example I will be using a plain vanilla Kubernetes 1.20, which can be verified by running the respective KUBECTL command:

kubectl version


Yes, the client is 1.19 and I am running (surprise, surprise – and have been using it for … well over 15 years) Mac OS X.

Azure Arc Data Controller

The installation of Azure Arc for Data Services can be started in a couple of ways, but the initial one is to follow the instructions on the Azure Portal side (which will take you into Azure Data Studio, Azure CLI and KUBECTL anyway). The key element here is called the Azure Arc Data Controller: choose the respective tenant in your Azure Portal, which will lead you to the listing of the existing controllers and the possibility to configure new ones.

Warning: right now this is not actually the execution of the Azure Arc Data Controller creation, but rather just the gathering of a couple of required parameters,
even though more parameters will be asked for when we get to the real controller creation.
I would rather have all the requirements directly in the Azure Data Studio Notebook right away, but I understand that it MIGHT be a deliberate choice to drive the configuration through the portal.
Since this blog post is being written during the preview phase, this is a totally acceptable situation for me.

Step 1 – make sure you have read the requirements:

Step 2 – select the subscription, the resource group, the location (the Azure region), the connection mode (right now either Direct or Indirect, as I have explained in Azure Arc for Data Services, part 1 – Intro) and a configuration profile (I am choosing kubeadm, but as was explained in part 2 of the series – aka Requirements and Tools – you need to choose the right profile for your Kubernetes cluster from the supported ones: AKS with either premium or standard storage, Azure AKS HCI (Stack), GKE, EKS, Kubeadm, OpenShift, Azure OpenShift):
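If you want to double-check the exact profile names on the command line, the azdata CLI can list the built-in configuration profiles; a quick sketch (assuming azdata is already installed, as covered in part 2):

# list the built-in configuration profiles shipped with azdata
azdata arc dc config list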

Step 3 – download the Azure Data Studio Jupyter Notebook. This notebook file will contain some of the configuration selections that you made in the previous step.

After downloading the notebook file we are ready to go, right?
Wait a second, let us take a closer look at the storage configuration.

Storage Configuration

As specified in the documentation of the Storage Configuration for Azure Arc Data Services, we need to force a very powerful setting (one that I hope will be changed in the future), allowRunAsRoot, by setting it to true when using NFS storage – which happens to be our case.

Since right now there is no way of specifying these settings in the Notebook directly, and we are not creating the Azure Arc Data Controller from YAML this time, we should generate a custom control.json profile file and modify our notebook, forcing it to use that custom configuration file.

On the pictures below you will find a couple of screenshots of the actual control.json file that we generated during the installation; marked in red is the allowRunAsRoot setting that was changed with the help of the script:
You do not have to worry about the details right now, since the file is initially generated automatically, but we can go much further by editing the existing properties or adding the other documented properties, if needed. For example, you can force the selection of the image that will be used for the Data Controller installation and, if needed, choose a different one – for instance if, for some exceptional reason, the latest one is incompatible with your installation.
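Just as an illustration (the JSONPath follows the documented docker section of control.json, and the tag value is purely hypothetical), such an override can be applied with the same azdata arc dc config replace mechanism that we shall use below for allowRunAsRoot:

# sketch: pin the Data Controller to a specific image tag instead of the default one
# the tag value is hypothetical – use one that matches your azdata version
azdata arc dc config replace --path ./custom/control.json --json-values "$.spec.docker.imageTag=<your-image-tag>"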

In Step 5: Initialize Creation of data controller, we need to edit the azdata arc dc create command by specifying the location of our JSON configuration file. Hence the step gets a JSON_CONFIG variable with the path where the configuration profile shall be initialised with the defaults, and then we go through the JSONPath and force the spec.security.allowRunAsRoot property to be equal to true:

# folder where the custom configuration profile (control.json) will be generated
JSON_CONFIG = '/Users/niko.neugebauer/MEOCloud/Data\ Arc/'

try:
    # initialise the configuration profile from the azure-arc-kubeadm template into the JSON_CONFIG folder
    run_command(f'azdata arc dc config init --source azure-arc-kubeadm --path {JSON_CONFIG} --force')
    # set allowRunAsRoot to true in the generated control.json (required when using NFS storage)
    run_command(f'azdata arc dc config replace --path {JSON_CONFIG}/control.json --json-values $.spec.security.allowRunAsRoot=true')
except:
    raise
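A quick way to double-check that the replace actually landed is to look at the security section of the generated file (the path is simply the JSON_CONFIG folder from the cell above, written out by hand):

# print the security section of the generated control.json and confirm allowRunAsRoot is true
grep -A 5 '"security"' "/Users/niko.neugebauer/MEOCloud/Data Arc/control.json"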

In the same Step 5, we shall finally indicate the path to our JSON_CONFIG folder with the control.json file, forcing it to be used:

try:
    # create the Data Controller using the custom profile from JSON_CONFIG;
    # the remaining variables (DC_NAME, NAMESPACE, ...) are set in earlier notebook cells
    run_command(f'azdata arc dc create --path {JSON_CONFIG} -n {DC_NAME} --namespace {NAMESPACE} --connectivity-mode {CONNECTION_MODE} --location {LOCATION} --subscription {SUBSCRIPTION_ID} --resource-group {RESOURCE_GROUP} --storage-class {STORAGE_CLASS}')
    print('ok')
except:
    print(sys.exc_info())
    raise

I totally wish this would be easier, but at the moment it is not.

ADS (Azure Data Studio) Notebook

The next step is to simply (haha) run the notebook that we downloaded and edited in the previous steps, which will allow us to create the Azure Arc Data Controller on our Kubernetes cluster.
You can choose either to run each cell step by step or to run all cells at once with the “Run All” option.
During the execution the requirements shall be verified and you will be asked to input the following information:
– Kubernetes context (the default one, or another if you really know what you are doing)
– Namespace where the Azure Arc Data Controller shall be deployed (make sure it is empty, otherwise the installation will abort)
– Storage Class (depending on the type of Kubernetes Cluster and the available storage options, this might be something to prepare in advance)
– Username for the Azure Arc Data Controller
– Password for the Azure Arc Data Controller

Plus, if you are doing a Direct Mode deployment, make sure you input:
– Service principal tenant id
– Service principal client id
– Service principal secret
The steps for configuring the service principal will be covered in other blog posts, or you can find this information in the documentation.
If you are running Indirect Mode, then simply skip them by hitting Enter.
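Before hitting “Run All” it is worth double-checking that kubectl points at the right cluster and that the target namespace is not already occupied; a minimal sketch (arc is simply the namespace I will be using):

# confirm which cluster/context the notebook will deploy into
kubectl config current-context
# make sure the target namespace is not already populated
kubectl get all -n arc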

The deployment of all the pods takes some time (expect around 15 minutes, but your mileage will definitely vary, depending on a number of factors such as the available resources and the connectivity):

You can, and you SHOULD, be checking on the progress with the help of the kubectl command, listing all the current elements in our namespace (we have chosen it to be arc):

kubectl get all -n arc
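If you prefer to follow the rollout continuously instead of re-running the command, kubectl can also watch the pods for you; a small sketch (the pod name in the second command is just an example):

# watch the pods in the arc namespace until they all reach the Running state
kubectl get pods -n arc -w
# inspect a specific pod in detail if it seems stuck
kubectl describe pod controldb-0 -n arc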

You can also see the status of the Azure Arc Data Controller with the help of the status command:

azdata arc dc status show
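If azdata complains that it is not connected to a controller yet, you may first need to log in against the namespace; a hedged sketch (flag names as of the preview azdata CLI, and the credentials are the ones you entered in the notebook):

# connect azdata to the Data Controller deployed in the arc namespace
azdata login --namespace arc
# then query the controller status
azdata arc dc status show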


In my case the Azure Arc Data Controller is successfully deployed and ready, but you can use this command to check the DC status and progress.

Another important thing is to get the list of the connection endpoints for tools such as:
– Cluster Manager
– Metrics Dashboard
– Log Search Dashboard
– Management Proxy
For this purpose you can run the endpoint list command against our Azure Arc Data Controller:

azdata arc dc endpoint list

In the end, after a short wait (which I consider to be blazingly fast, taking into account the complexity of all things involved), we can see the end result of our deployment on the Azure Portal. Notice that it can take up to 24 hours for the first metrics of a Direct Mode deployment to appear on the Azure Portal after a successful synchronisation:

In this blog post we went through a rather non-trivial installation of the Azure Arc Data Controller on our on-premises vanilla Kubernetes cluster, but don’t give up – there is much more information to come, and this is not the end of the easy and not-so-easy things to do.

to be continued with Azure Arc enabled Data Services, part 4 – Configuring on AKS
