This guide covers the details for deploying models as web services on Azure Machine Learning.
A registered model is a logical container for one or more files that make up your model. For example, if you have a model that’s stored in multiple files, you can register them as a single model in the workspace. After you register the files, you can then download or deploy the registered model and receive all the files that you registered.
Machine learning models are registered in your Azure ML workspace. The model can come from Azure ML or from somewhere else. When registering a model, you can optionally provide metadata about the model. The tags and properties that you apply to a model registration can then be used to filter models.
When you use the SDK to train a model, you will have a corresponding
Run
object. If your training script wrote the model file(s)
to the 'outputs'
folder, then you can register a model
directly from that training run.
# Register a single file as the model
model <- register_model_from_run(run,
model_path='outputs/my_model.rds',
model_name='my_model')
For more information, see the documentation for register_model_from_run()
You can register a model by providing the local path of the model. You can provide the path of either a folder or a single file. You can use this method to register models that were trained with Azure ML and then downloaded locally. You can also use this method to register models trained outside of Azure ML.
# Register a single file as the model
model <- register_model(ws,
model_path='my_model.rds',
model_name='my_model')
# Register a folder of files as the model
model <- register_model(ws,
model_path='my_model_dir/',
model_name='my_model')
For more information, see the documentation for register_model()
You can use the following compute targets, or compute resources, to host your web service deployment:
Compute target | Used for | GPU support | Description |
---|---|---|---|
Local web service | Testing/debugging | Use for limited testing and troubleshooting. Hardware acceleration depends on use of libraries in the local system. | |
Azure ML compute instance web service | Testing/debugging | Used for limited testing and troubleshooting. | |
Azure Container Instances (ACI) | Testing or development | Use for low-scale CPU-based workloads that require less than 48 GB of RAM. | |
Azure Kubernetes Service (AKS) | Real-time inference | Yes | Use for high-scale production deployments. Provides fast response time and autoscaling of the deployed service. Cluster autoscaling isn’t supported through the Azure Machine Learning SDK. To change the nodes in the AKS cluster, use the UI for your AKS cluster in the Azure portal. |
NOTE:
Although compute targets like local and Azure Machine Learning compute
instance support GPU for training and experimentation, using GPU for
inference when deployed as a web service is supported only on Azure
Kubernetes Service.
To deploy a model, you need the following:
To deploy a model, you must provide an entry script (also referred to as the scoring script) that accepts requests, scores the requests by using the model, and returns the results. The entry script is specific to your model. It must understand the format of the incoming request data, the format of the data expected by your model, and the format of the data returned to clients. If the request data is in a format that is not usable by your model, the script can transform it into an acceptable format. It can also transform the response before returning it to the client.
The entry script must contain an init()
method that loads your model and then returns a function that uses the
model to make a prediction based on the input data passed to the
function. Azure ML runs the init()
method once, when the
Docker container for your web service is started. The prediction
function returned by init()
will be run every time the
service is invoked to make a prediction on some input data. The inputs
and outputs of this prediction function typically use JSON for
serialization and deserialization.
To locate the registered model(s) in your entry script, use the
AZUREML_MODEL_DIR
environment variable that is created
during the service deployment. This environment variable contains the
path to the directory that contains the the deployed model(s).
The following table describes the value of
AZUREML_MODEL_DIR
depending on the number of models
deployed:
Deployment | Environment variable value |
---|---|
Single model | The path to the folder containing the model. |
Multiple models | The path to the folder containing all models. Models are located by
name and version in this folder ($MODEL_NAME/$VERSION ) |
To get the path to the model file in your entry script, combine the environment variable with the file path you’re looking for.
Single model example
# Example when the model is a file
model_path <- file.path(Sys.getenv('AZUREML_MODEL_DIR'), 'my_model.rds')
# Example when the model is a folder containing a file
model_path <- file.path(Sys.getenv('AZUREML_MODEL_DIR'), 'my_model_dir', 'my_model.rds')
Multiple model example
The following is an example entry script. You can see the full tutorial here.
library(jsonlite)
init <- function()
{
# Get the path to the model location of the registered model in Azure ML
model_path <- Sys.getenv("AZUREML_MODEL_DIR")
# Load the model
model <- readRDS(file.path(model_path, "model.rds"))
message("logistic regression model loaded")
# The following method will be called by Azure ML each time the deployed web service is invoked
function(data)
{
# Deserialize the input data to the service
vars <- as.data.frame(fromJSON(data))
# Evaluate the data on the deployed model
prediction <- as.numeric(predict(model, vars, type="response")*100)
# Return the prediction serialized to JSON
toJSON(prediction)
}
}
You will also need to provide an Azure ML environment (r_environment()
)
that defines all the dependencies required to execute your scoring
script. You can create a new environment for deployment, or use a
previously instatiated environment or registered environment.
Then define the inference configuration, which consists of the entry
script, the environment, and optionally the directory that contains all
the files needed to package and deploy your model (such as helper files
for the entry script). See the reference documentation for inference_config()
.
myenv = get_environment(ws, name = 'myenv', version = '1')
inference_config = inference_config(entry_script = 'score.R',
source_directory = './my_scoring_folder',
environment = myenv)
Note that if you specify the source_directory
parameter,
the entry script file must be located in that directory, and the value
to entry_script
should be the relative path of the file
inside that directory.
Before deploying your model, you must define the deployment configuration. The deployment configuration is specific to the compute target that will host the web service. For example, when you deploy a model locally, you must specify the port where the service accepts requests.
The following table provides examples for creating the deployment configuration for each compute target:
Compute target | Deployment configuration | Example |
---|---|---|
Local | local_webservice_deployment_config() |
deployment_config <- local_webservice_deployment_config(port = 8890) |
Azure Container Instances (ACI) | aci_webservice_deployment_config() |
deployment_config <- aci_webservice_deployment_config(cpu_cores = 1, memory_gb = 1) |
Azure Kubernetes Service (AKS) | aks_webservice_deployment_config() |
deployment_config <- aks_webservice_deployment_config(cpu_cores = 1, memory_gb = 1) |
Finally, deploy your model(s) as a web service to the target of your
choice. To deploy the model(s), you will provide the inference
configuration and deployment configuration you created in the above
steps, in addition to the models you want to deploy, to deploy_model()
.
If you are deploying to AKS, you will also have to provide the AKS
compute target.
To deploy a model locally, you need to have Docker installed on your local machine. If you are deploying locally from a compute instance, Docker will already be installed.
For an example of local deployment, see the deploy-to-local sample.
For an example of deploying to ACI, see the train-and-deploy-to-aci vignette.
To deploy a model to AKS, you will first need an AKS cluster for the deployment compute target. You can either
create_aks_compute()
attach_aks_compute()
You can instead also create or attach an AKS cluster via the CLI or studio UI.
For an example of deploying to AKS, see the deploy-to-aks vignette.
If your service deployment fails, you can use get_webservice_logs()
to inspect the detailed Docker engine log messages from your web service
deployment. Note that if your initial deployment fails and you want to
attempt a new deployment, you will first need to delete the original web
service if you want to use the same web service name. You can use the delete_webservice()
method.
For a more detailed guide on working around or solving common deployment errors, see Troubleshooting AKS and ACI deployments.
The easiest way to authenticate to deployed web services is to use key-based authentication, which generates static bearer-type authentication keys that do not need to be refreshed.
AKS deployments additionally support token-based auth.
The primary difference is that keys are static and can be regenerated manually, and tokens need to be refreshed upon expiration.
Authentication method | ACI | AKS |
---|---|---|
Key | Disabled by default | Enabled by default |
Token | Not available | Disabled by default |
Web services deployed on AKS have key-based auth enabled by default.
ACI-deployed services have key-based auth disabled by default, but you
can enable it by setting auth_enabled = TRUE
when creating
the ACI web service. The following is an example of creating an ACI
deployment configuration with key-based auth enabled.
deployment_config <- aci_webservice_deployment_config(cpu_cores = 1,
memory_gb = 1,
auth_enabled = TRUE)
To fetch the auth keys, use get_webservice_keys()
.
To regenerate a key, use the generate_new_webservice_key()
function:
When you enable token authentication for a web service, users must present an Azure Machine Learning JSON Web Token (JWT) to the web service to access it. The token expires after a specified timeframe and needs to be refreshed to continue making calls.
Token auth is disabled by default when you deploy to AKS. To control
token auth, use the token_auth_enabled
parameter when you
create or update a deployment:
deployment_config <- aks_webservice_deployment_config(cpu_cores = 1,
memory_gb = 1,
token_auth_enabled = TRUE)
If token authentication is enabled, you can use the get_webservice_token()
method to retrieve a JWT. You will need to request a new token by the
token’s refresh_after
time.
aks_service_access_token <- get_webservice_token(service)
# Get the JWT
jwt <- aks_service_access_token$access_token
# Get the time after which token should be refreshed
refresh_after <- aks_service_access_token$refresh_after
We strongly recommend that you create your Azure ML workspace in the same region as your AKS cluster. To authenticate with a token, the web service will make a call to the region in which your workspace is created. If your workspace’s region is unavailable, then you will not be able to fetch a token for your web service, even if your cluster is in a different region than your workspace. This effectively results in token-based auth being unavailable until your workspace’s region is available again. In addition, the greater the distance between your cluster’s region and your workspace’s region, the longer it will take to fetch a token.
For more information on authentication in Azure ML, see Set up authentication for Azure Machine Learning resources and workflows.
Every deployed web service provides a REST endpoint, so you can create client applications in any programming language. If you’ve enabled key-based authentication for your service, you need to provide a service key as a token in your request header. If you’ve enabled token-based authentication for your service, you need to provide an Azure Machine Learning JSON Web Token (JWT) as a bearer token in your request header.
To get the endpoint for the deployed web service, use the scoring_uri property:
service$scoring_uri
You can also retrieve the schema JSON document after you deploy the service. Use the swagger_uri property from the deployed web service to get the URI to the local web service’s Swagger file:
service$swagger_uri
You can then use the scoring URI and a package such as httr to invoke the web service via request-response consumption.
Optionally, you can use the invoke_webservice()
method from azuremlsdk to directly invoke the web
service if you have the web service object:
library(jsonlite)
newdata <- data.frame( # valid values shown below
dvcat="10-24", # "1-9km/h" "10-24" "25-39" "40-54" "55+"
seatbelt="none", # "none" "belted"
frontal="frontal", # "notfrontal" "frontal"
sex="f", # "f" "m"
ageOFocc=22, # age in years, 16-97
yearVeh=2002, # year of vehicle, 1955-2003
airbag="none", # "none" "airbag"
occRole="pass" # "driver" "pass"
)
prob <- invoke_webservice(service, toJSON(newdata))
prob
To update a web service, use the corresponding
update_*()
method. You can update the web service to use a
new model, a new entry script, or new dependencies that can be specified
in an inference configuration.
update_local_webservice()
update_aci_webservice()
update_aks_webservice()
delete_webservice()
delete_local_webservice()
delete_model()
For additional resources on model deployments, you can refer to the following: