Deploy a web service to Azure Kubernetes Service

This tutorial demonstrates how to deploy a model as a web service on Azure Kubernetes Service (AKS). AKS is suited to high-scale production deployments. Use it if you need one or more of the following capabilities:

  • Fast response time
  • Autoscaling of the deployed service
  • Hardware acceleration options such as GPU

You will learn to:

  • Set up your testing environment
  • Register a model
  • Provision an AKS cluster
  • Deploy the model to AKS
  • Test the deployed service

Prerequisites

If you don’t have access to an Azure ML workspace, follow the setup tutorial to configure and create a workspace.

Set up your testing environment

Start by setting up your environment. This includes importing the azuremlsdk package and connecting to your workspace.

Import package

library(azuremlsdk)

Load your workspace

Instantiate a workspace object from your existing workspace. The following code will load the workspace details from a config.json file if you previously wrote one out with write_workspace_config().

ws <- load_workspace_from_config()

Or, you can retrieve a workspace by directly specifying your workspace details:

ws <- get_workspace("<your workspace name>",
                    subscription_id = "<your subscription ID>",
                    resource_group = "<your resource group>")
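
If you have not written a config.json yet, a minimal sketch of doing so once, after retrieving the workspace explicitly, might look like the following. It assumes the default file location is acceptable; later sessions can then call load_workspace_from_config().

```r
library(azuremlsdk)

# Retrieve the workspace explicitly once (placeholders as in the text above).
ws <- get_workspace("<your workspace name>",
                    subscription_id = "<your subscription ID>",
                    resource_group = "<your resource group>")

# Persist the workspace details so later sessions can load them from config.json.
write_workspace_config(ws)
```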

Register the model

In this tutorial we will deploy a model that was trained in one of the samples. The model was trained on the Iris dataset and classifies a flower as one of three Iris species (setosa, versicolor, virginica). We have provided the model file (model.rds) for the tutorial; it is located in the deploy-to-aks subfolder of this vignette.

First, register the model to your workspace with register_model(). A registered model can be any collection of files, but in this case the R model file is sufficient. Azure ML will use the registered model for deployment.

model <- register_model(ws, 
                        model_path = "deploy-to-aks/model.rds", 
                        model_name = "iris_model",
                        description = "Predict an Iris flower type")

Provision an AKS cluster

When deploying a web service to AKS, you deploy to an AKS cluster that is connected to your workspace. There are two ways to connect an AKS cluster to your workspace:

  • Create the AKS cluster. The process automatically connects the cluster to the workspace.
  • Attach an existing AKS cluster to your workspace. You can attach a cluster with the attach_aks_compute() method.

Creating or attaching an AKS cluster is a one-time process for your workspace. You can reuse this cluster for multiple deployments. If you delete the cluster or the resource group that contains it, you must create a new cluster the next time you need to deploy.
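
As a rough sketch of the second option, attaching an existing cluster might look like the following. The resource group and cluster names are placeholders, and the argument names are assumed to match your installed version of attach_aks_compute(); check its reference before relying on them.

```r
library(azuremlsdk)

ws <- load_workspace_from_config()

# Attach an AKS cluster that already exists in your subscription.
# "myresourcegroup" and "myexistingcluster" are hypothetical names.
aks_target <- attach_aks_compute(ws,
                                 resource_group = "myresourcegroup",
                                 cluster_name = "myexistingcluster")

wait_for_provisioning_completion(aks_target, show_output = TRUE)
```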

In this tutorial, we will use the first method and provision a new cluster. See the create_aks_compute() reference for the full set of configurable parameters. If you pick custom values for the agent_count and vm_size parameters, make sure that agent_count multiplied by the number of virtual CPUs of the chosen vm_size is at least 12. For example, a vm_size of "Standard_D3_v2" has 4 virtual CPUs, so agent_count must be 3 or greater.

aks_target <- create_aks_compute(ws, cluster_name = 'myakscluster')

wait_for_provisioning_completion(aks_target, show_output = TRUE)
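
The sizing rule for custom agent_count and vm_size values can be checked before provisioning. The helper below is a hypothetical convenience function, not part of azuremlsdk; the number of virtual CPUs per node has to be looked up for the VM size you choose (Standard_D3_v2, for instance, has 4).

```r
# Hypothetical helper: TRUE when the cluster would provide at least
# `min_vcpus` virtual CPUs in total.
meets_min_vcpus <- function(agent_count, vcpus_per_node, min_vcpus = 12) {
  agent_count * vcpus_per_node >= min_vcpus
}

# Three nodes with 4 vCPUs each reach the minimum; two nodes do not.
three_nodes_ok <- meets_min_vcpus(agent_count = 3, vcpus_per_node = 4)
two_nodes_ok <- meets_min_vcpus(agent_count = 2, vcpus_per_node = 4)
```

A configuration that passes the check can then be requested with, for example, create_aks_compute(ws, cluster_name = "myakscluster", agent_count = 3, vm_size = "Standard_D3_v2").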

The Azure ML SDK does not provide support for scaling an AKS cluster. To scale the nodes in the cluster, use the UI for your AKS cluster in the Azure portal. You can only change the node count, not the VM size of the cluster.

Deploy as a web service

Define the inference dependencies

To deploy a model, you need an inference configuration, which describes the environment needed to host the model and web service. To create an inference config, you will first need a scoring script and an Azure ML environment.

The scoring script (entry_script) is an R script that takes input variable values in JSON format and returns a prediction from your model. For this tutorial, use the provided scoring file score.R. The scoring script must contain an init() function that loads your model and returns a function that uses the model to make a prediction from the input data. See the documentation for more details.
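
To make the required shape concrete, here is a minimal sketch of what such a scoring script could look like. It is an illustration, not necessarily the contents of the provided score.R. It assumes the jsonlite package is available in the container and that the AZUREML_MODEL_DIR environment variable points at the directory holding the registered model.rds file.

```r
library(jsonlite)

init <- function() {
  # Assumption: the registered model files are found under AZUREML_MODEL_DIR.
  model_dir <- Sys.getenv("AZUREML_MODEL_DIR")
  model <- readRDS(file.path(model_dir, "model.rds"))

  # The returned function is called once per request with the JSON payload.
  function(data) {
    plant <- as.data.frame(fromJSON(data))
    prediction <- predict(model, plant)
    toJSON(as.character(prediction))
  }
}
```

Loading the model inside init() means the file is read once when the service starts, while the returned closure handles each incoming request.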

Next, define an Azure ML environment for your script’s package dependencies. With an environment, you specify R packages (from CRAN or elsewhere) that are needed for your script to run. You can also provide the values of environment variables that your script can reference to modify its behavior.

By default, Azure ML builds a Docker image that includes R, the Azure ML SDK, and the additional dependencies required for deployment. See the documentation for the full list of dependencies installed in the default container. Using the other parameters of r_environment(), you can also specify additional packages to be installed at runtime, or a custom Docker image to use instead of the base image that would otherwise be built.

r_env <- r_environment(name = "deploy_env")
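
As an illustration of the environment variables mentioned above, the sketch below passes one variable to r_environment(). The variable name LOG_LEVEL is hypothetical, and the environment_variables argument is assumed to accept a named list; the scoring script could then read the value with Sys.getenv().

```r
library(azuremlsdk)

# LOG_LEVEL is a made-up variable used only for illustration.
r_env_verbose <- r_environment(name = "deploy_env_verbose",
                               environment_variables = list(LOG_LEVEL = "debug"))
```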

Now you have everything you need to create an inference config for encapsulating your scoring script and environment dependencies.

inference_config <- inference_config(
  entry_script = "score.R",
  source_directory = "deploy-to-aks",
  environment = r_env)

Deploy to AKS

Now, define the deployment configuration that describes the compute resources needed, for example, the number of cores and the amount of memory. See the aks_webservice_deployment_config() reference for the full set of configurable parameters.

aks_config <- aks_webservice_deployment_config(cpu_cores = 1, memory_gb = 1)
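
If you need the autoscaling capability mentioned in the introduction, the deployment configuration is also where it would be set. The sketch below assumes the autoscale_* arguments of aks_webservice_deployment_config(); the replica counts and utilization target are arbitrary example values, not recommendations.

```r
library(azuremlsdk)

aks_config_autoscale <- aks_webservice_deployment_config(
  cpu_cores = 1,
  memory_gb = 1,
  autoscale_enabled = TRUE,
  autoscale_min_replicas = 1,          # never scale below one replica
  autoscale_max_replicas = 3,          # cap the number of replicas
  autoscale_target_utilization = 70    # target utilization, in percent
)
```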

Now, deploy your model as a web service to the AKS cluster you created earlier.

aks_service <- deploy_model(ws, 
                            'my-new-aksservice', 
                            models = list(model), 
                            inference_config = inference_config, 
                            deployment_config = aks_config,
                            deployment_target = aks_target)

wait_for_deployment(aks_service, show_output = TRUE)

To inspect the logs from the deployment:

get_webservice_logs(aks_service)

If you encounter any issues deploying the web service, see the troubleshooting guide.

Test the deployed service

Now that your model is deployed as a service, you can test the service from R using invoke_webservice(). Provide a new set of data to predict from, convert it to JSON, and send it to the service.

library(jsonlite)
# versicolor
plant <- data.frame(Sepal.Length = 6.4,
                    Sepal.Width = 2.8,
                    Petal.Length = 4.6,
                    Petal.Width = 1.8)

# setosa
# plant <- data.frame(Sepal.Length = 5.1,
#                    Sepal.Width = 3.5,
#                    Petal.Length = 1.4,
#                    Petal.Width = 0.2)

# virginica
# plant <- data.frame(Sepal.Length = 6.7,
#                    Sepal.Width = 3.3,
#                    Petal.Length = 5.2,
#                    Petal.Width = 2.3)

predicted_val <- invoke_webservice(aks_service, toJSON(plant))
message(predicted_val)

You can also get the web service’s HTTP endpoint, which accepts REST client calls. You can share this endpoint with anyone who wants to test the web service or integrate it into an application.

aks_service$scoring_uri
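
As a sketch of such a REST call from R, the example below posts the same JSON payload with the httr package. It assumes httr is installed, that key-based authentication is enabled (the default), and that the service accepts the key as a bearer token in the Authorization header; it reuses aks_service and plant from the steps above.

```r
library(azuremlsdk)
library(httr)
library(jsonlite)

# Retrieve the authentication keys; the first element is assumed to be the primary key.
keys <- get_webservice_keys(aks_service)

response <- POST(aks_service$scoring_uri,
                 add_headers(Authorization = paste("Bearer", keys[[1]])),
                 content_type_json(),
                 body = as.character(toJSON(plant)))

# Stop with an error if the service returned an HTTP error status.
stop_for_status(response)
message(content(response, as = "text", encoding = "UTF-8"))
```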

Web service authentication

When deploying to AKS, key-based authentication is enabled by default. You can also enable token-based authentication. Token-based authentication requires clients to use an Azure Active Directory account to request an authentication token, which is used to make requests to the deployed service.

To disable key-based authentication, set auth_enabled = FALSE when creating the deployment configuration with aks_webservice_deployment_config(). To enable token-based authentication, set token_auth_enabled = TRUE when creating the deployment configuration.
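
For example, a deployment configuration that turns off key-based authentication and turns on token-based authentication could be sketched as follows, using the two parameters named above together with the resource settings used earlier in this tutorial.

```r
library(azuremlsdk)

aks_config_token <- aks_webservice_deployment_config(cpu_cores = 1,
                                                     memory_gb = 1,
                                                     auth_enabled = FALSE,
                                                     token_auth_enabled = TRUE)
```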

Key-based authentication

If key authentication is enabled, you can use the get_webservice_keys() method to retrieve a primary and secondary authentication key. To generate a new key, use generate_new_webservice_key().
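
A minimal sketch of working with the keys, assuming aks_service is the service deployed above and that generate_new_webservice_key() takes a key_type argument of "Primary" or "Secondary":

```r
library(azuremlsdk)

# Retrieve the current keys (primary and secondary).
keys <- get_webservice_keys(aks_service)

# Regenerate the primary key, for example after it has been exposed.
generate_new_webservice_key(aks_service, key_type = "Primary")

# Fetch the keys again to pick up the regenerated value.
keys <- get_webservice_keys(aks_service)
```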

Token-based authentication

If token authentication is enabled, you can use the get_webservice_token() method to retrieve a JSON Web Token (JWT) and that token’s expiration time. Make sure to request a new token after the current one expires.

Clean up resources

Delete the resources once you no longer need them. Do not delete any resource you still plan to use.

Delete the web service:

delete_webservice(aks_service)

Delete the registered model:

delete_model(model)

Delete the AKS cluster:

delete_compute(aks_target)