--- title: "Deploy a web service to Azure Kubernetes Service" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Deploy a web service to Azure Kubernetes Service} %\VignetteEngine{knitr::rmarkdown} \use_package{UTF-8} --- This tutorial demonstrates how to deploy a model as a web service on [Azure Kubernetes Service](https://azure.microsoft.com/en-us/services/kubernetes-service/) (AKS). AKS is good for high-scale production deployments; use it if you need one or more of the following capabilities: * Fast response time * Autoscaling of the deployed service * Hardware acceleration options such as GPU You will learn to: * Set up your testing environment * Register a model * Provision an AKS cluster * Deploy the model to AKS * Test the deployed service ## Prerequisites If you don’t have access to an Azure ML workspace, follow the [setup tutorial](https://azure.github.io/azureml-sdk-for-r/articles/configuration.html) to configure and create a workspace. ## Set up your testing environment Start by setting up your environment. This includes importing the **azuremlsdk** package and connecting to your workspace. ### Import package ```{r import_package, eval=FALSE} library(azuremlsdk) ``` ### Load your workspace Instantiate a workspace object from your existing workspace. The following code will load the workspace details from a **config.json** file if you previously wrote one out with `write_workspace_config()`. ```{r load_workspace, eval=FALSE} ws <- load_workspace_from_config() ``` Or, you can retrieve a workspace by directly specifying your workspace details: ```{r get_workspace, eval=FALSE} ws <- get_workspace("", "", "") ``` ## Register the model In this tutorial we will deploy a model that was trained in one of the [samples](https://github.com/Azure/azureml-sdk-for-r/blob/master/samples/training/train-on-amlcompute/train-on-amlcompute.R). The model was trained with the Iris dataset and can be used to determine if a flower is one of three Iris flower species (setosa, versicolor, virginica). We have provided the model file (`model.rds`) for the tutorial; it is located in the `deploy-to-aks` subfolder of this vignette. First, register the model to your workspace with [`register_model()`](https://azure.github.io/azureml-sdk-for-r/reference/register_model.html). A registered model can be any collection of files, but in this case the R model file is sufficient. Azure ML will use the registered model for deployment. ```{r register_model, eval=FALSE} model <- register_model(ws, model_path = "deploy-to-aks/model.rds", model_name = "iris_model", description = "Predict an Iris flower type") ``` ## Provision an AKS cluster When deploying a web service to AKS, you deploy to an AKS cluster that is connected to your workspace. There are two ways to connect an AKS cluster to your workspace: * Create the AKS cluster. The process automatically connects the cluster to the workspace. * Attach an existing AKS cluster to your workspace. You can attach a cluster with the [`attach_aks_compute()`](https://azure.github.io/azureml-sdk-for-r/reference/attach_aks_compute.html) method. Creating or attaching an AKS cluster is a one-time process for your workspace. You can reuse this cluster for multiple deployments. If you delete the cluster or the resource group that contains it, you must create a new cluster the next time you need to deploy. In this tutorial, we will go with the first method of provisioning a new cluster. See the [`create_aks_compute()`](https://azure.github.io/azureml-sdk-for-r/reference/create_aks_compute.html) reference for the full set of configurable parameters. If you pick custom values for the `agent_count` and `vm_size` parameters, you need to make sure `agent_count` multiplied by `vm_size` is greater than or equal to `12` virtual CPUs. ``` {r provision_cluster, eval=FALSE} aks_target <- create_aks_compute(ws, cluster_name = 'myakscluster') wait_for_provisioning_completion(aks_target, show_output = TRUE) ``` The Azure ML SDK does not provide support for scaling an AKS cluster. To scale the nodes in the cluster, use the UI for your AKS cluster in the Azure portal. You can only change the node count, not the VM size of the cluster. ## Deploy as a web service ### Define the inference dependencies To deploy a model, you need an **inference configuration**, which describes the environment needed to host the model and web service. To create an inference config, you will first need a scoring script and an Azure ML environment. The scoring script (`entry_script`) is an R script that will take as input variable values (in JSON format) and output a prediction from your model. For this tutorial, use the provided scoring file `score.R`. The scoring script must contain an `init()` method that loads your model and returns a function that uses the model to make a prediction based on the input data. See the [documentation](https://azure.github.io/azureml-sdk-for-r/reference/inference_config.html#details) for more details. Next, define an Azure ML **environment** for your script’s package dependencies. With an environment, you specify R packages (from CRAN or elsewhere) that are needed for your script to run. You can also provide the values of environment variables that your script can reference to modify its behavior. By default Azure ML will build a default Docker image that includes R, the Azure ML SDK, and additional required dependencies for deployment. See the documentation here for the full list of dependencies that will be installed in the default container. You can also specify additional packages to be installed at runtime, or even a custom Docker image to be used instead of the base image that will be built, using the other available parameters to [`r_environment()`](https://azure.github.io/azureml-sdk-for-r/reference/r_environment.html). ```{r create_env, eval=FALSE} r_env <- r_environment(name = "deploy_env") ``` Now you have everything you need to create an inference config for encapsulating your scoring script and environment dependencies. ``` {r create_inference_config, eval=FALSE} inference_config <- inference_config( entry_script = "score.R", source_directory = "deploy-to-aks", environment = r_env) ``` ### Deploy to AKS Now, define the deployment configuration that describes the compute resources needed, for example, the number of cores and memory. See the [`aks_webservice_deployment_config()`](https://azure.github.io/azureml-sdk-for-r/reference/aks_webservice_deployment_config.html) for the full set of configurable parameters. ``` {r deploy_config, eval=FALSE} aks_config <- aks_webservice_deployment_config(cpu_cores = 1, memory_gb = 1) ``` Now, deploy your model as a web service to the AKS cluster you created earlier. ```{r deploy_service, eval=FALSE} aks_service <- deploy_model(ws, 'my-new-aksservice', models = list(model), inference_config = inference_config, deployment_config = aks_config, deployment_target = aks_target) wait_for_deployment(aks_service, show_output = TRUE) ``` To inspect the logs from the deployment: ```{r get_logs, eval=FALSE} get_webservice_logs(aks_service) ``` If you encounter any issue in deploying the web service, please visit the [troubleshooting guide](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-troubleshoot-deployment). ## Test the deployed service Now that your model is deployed as a service, you can test the service from R using [`invoke_webservice()`](https://azure.github.io/azureml-sdk-for-r/reference/invoke_webservice.html). Provide a new set of data to predict from, convert it to JSON, and send it to the service. ``` {r test_service, eval=FALSE} library(jsonlite) # versicolor plant <- data.frame(Sepal.Length = 6.4, Sepal.Width = 2.8, Petal.Length = 4.6, Petal.Width = 1.8) # setosa # plant <- data.frame(Sepal.Length = 5.1, # Sepal.Width = 3.5, # Petal.Length = 1.4, # Petal.Width = 0.2) # virginica # plant <- data.frame(Sepal.Length = 6.7, # Sepal.Width = 3.3, # Petal.Length = 5.2, # Petal.Width = 2.3) predicted_val <- invoke_webservice(aks_service, toJSON(plant)) message(predicted_val) ``` You can also get the web service’s HTTP endpoint, which accepts REST client calls. You can share this endpoint with anyone who wants to test the web service or integrate it into an application. ``` {r eval=FALSE} aks_service$scoring_uri ``` ## Web service authentication When deploying to AKS, key-based authentication is enabled by default. You can also enable token-based authentication. Token-based authentication requires clients to use an Azure Active Directory account to request an authentication token, which is used to make requests to the deployed service. To disable key-based auth, set the `auth_enabled = FALSE` parameter when creating the deployment configuration with [`aks_webservice_deployment_config()`](https://azure.github.io/azureml-sdk-for-r/reference/aks_webservice_deployment_config.html). To enable token-based auth, set `token_auth_enabled = TRUE` when creating the deployment config. ### Key-based authentication If key authentication is enabled, you can use the [`get_webservice_keys()`](https://azure.github.io/azureml-sdk-for-r/reference/get_webservice_keys.html) method to retrieve a primary and secondary authentication key. To generate a new key, use [`generate_new_webservice_key()`](https://azure.github.io/azureml-sdk-for-r/reference/generate_new_webservice_key.html). ### Token-based authentication If token authentication is enabled, you can use the [`get_webservice_token()`](https://azure.github.io/azureml-sdk-for-r/reference/get_webservice_token.html) method to retrieve a JWT token and that token's expiration time. Make sure to request a new token after the token's expiration time. ## Clean up resources Delete the resources once you no longer need them. Do not delete any resource you plan on still using. Delete the web service: ```{r delete_service, eval=FALSE} delete_webservice(aks_service) ``` Delete the registered model: ```{r delete_model, eval=FALSE} delete_model(model) ``` Delete the AKS cluster: ```{r delete_cluster, eval=FALSE} delete_compute(aks_target) ```