# OpenVINO Model Server predictor extension for FuseML workflows

## Overview
The OVMS predictor workflow step can be used to create and manage OpenVINO Model Server inference servers to serve input ML models as part of the execution of FuseML workflows. The OVMS predictor only accepts models in IR (Intermediate Representation) format as input. The OVMS converter workflow extension can be used to convert models to IR format.
The OVMS predictor step expects a model URL to be supplied as input, pointing to the location in a remote artifact store where the model is stored. The protocols supported for the remote artifact store are the same as those supported by the OVMS implementation:
- AWS S3 or S3 compatible
- GCS (authentication credentials not supported)
- Azure Blob Storage (authentication credentials not supported)
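For example (the bucket and path in the first URL below are placeholders; the second is the public GCS model used in the workflow example later in this document), model URLs take forms such as:

```
s3://my-bucket/models/resnet50-ir
gs://ovms-public-eu/resnet50-binary
```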
The predictor performs the following tasks:
- it creates an OVMS prediction service instance to serve the model, as well as an Istio VirtualService that exposes the prediction service
- it registers the OVMS prediction service with FuseML as an Application object. Information about the Application, such as its type and the exposed inference URL, can be retrieved at any time using the FuseML API and CLI.
The OVMS predictor has a single output: the URL where the prediction service can be accessed to process inference requests.
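As an illustration of how this output can be consumed, the sketch below sends a TensorFlow Serving-style `predict` request to the prediction URL; the URL, model name and input tensor are placeholders that must be adapted to the actual deployment and served model:

```python
import requests

# Placeholder values: use the prediction URL reported by the workflow output
# (or by the FuseML Application) and an input that matches the served model.
PREDICTION_URL = "http://resnet50-ovms.example.com/v1/models/default:predict"

# TensorFlow Serving style "predict" request body; a dummy 224x224x3 image.
payload = {"instances": [[[[0.0] * 3] * 224] * 224]}

response = requests.post(PREDICTION_URL, json=payload)
response.raise_for_status()
print(response.json())  # e.g. {"predictions": [...]}
```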
The Dockerfile and associated scripts that implement the OVMS predictor container image are available in the FuseML extensions repository.
The OVMS predictor is featured in a number of FuseML tutorials, such as:
- Benchmarking ML Models on Intel CPUs with Intel OpenVINO
- FuseML Extension Development Use-Case - OpenVINO
## Using the OVMS Predictor Step
The workflow example below shows how to create a FuseML workflow that uses the OVMS predictor step to serve a model from a remote artifact store. The model is already in IR (Intermediate Representation) format. For models that are not in IR format, the OVMS converter step can be used as part of the same workflow to convert the model before it is served with OVMS.
```yaml
name: ovms-workflow
description: |
  Workflow that takes in a URL containing one or more OVMS models in IR format,
  then uses OVMS to deploy a prediction service that can be used to run predictions
  against the model.
inputs:
  - name: model
    description: URL where an IR model is stored
    type: string
    default: gs://ovms-public-eu/resnet50-binary
outputs:
  - name: prediction-url
    description: "The URL where the exposed prediction service endpoint can be contacted to run predictions."
    type: string
steps:
  - name: ovms-predictor
    image: ghcr.io/fuseml/ovms-predictor:v0.3.0
    inputs:
      - name: model
        value: '{{ inputs.model }}'
      - name: app_name
        value: resnet50-ovms
      - name: resources
        value: '{"requests": {"cpu": 1}}'
      - name: verbose
        value: false
    outputs:
      - name: prediction-url
    extensions:
      - name: ovms-operator
        service_resource: ovms-operator
```
Aside from the mandatory `model` input parameter, which must be a URL pointing to the location of a trained ML model in IR format saved in an artifact store, the OVMS predictor workflow step also accepts the following optional input parameters that can be used to customize how the prediction service is created and updated:

- `model_name`: the name used to identify the model in the OVMS prediction API (default value is `default`)
- `ovms_image_tag`: the version of the OVMS container image to use (default value is `2021.4.1`)
- `loglevel`, `nireq`, `plugin_config`, `batch_size`, `shape`, `target_device` and `layout`: parameters that are mapped to the corresponding OVMS deployment parameters. See the OVMS documentation for a description of these parameters.
- `replicas`: the number of replicas for the OVMS prediction service
- `resources`: can be used to configure Kubernetes resource requests and limits for the OVMS prediction service
- `prediction_type`: determines the type of TensorFlow Serving prediction requests that the returned prediction URL is made for: `classify`, `regress` or `predict`. The default value is `predict`.
- `app_name`: use this to explicitly set the name of the FuseML application used to represent the OVMS prediction service. Its value also determines the prediction URL as well as the names of the Kubernetes resources created by the OVMS predictor. If not set explicitly, the application name is constructed by combining the workflow name with the name and project of the input codeset, if one is provided as input. In the absence of an input codeset, the application name is generated by concatenating the workflow name with a randomly generated string.

    Note

    Choose a value for the `app_name` parameter with care, as it is used to uniquely identify a FuseML application and its associated Kubernetes resources (i.e. the name of the OVMS prediction service, the prediction URL etc.). Using the same value in more than one workflow can lead to a situation where the same OVMS prediction service is managed by several FuseML workflows. In this case, the results can be unpredictable, because multiple workflows will compete over managing the same application.

    Warning

    If an `app_name` value is not provided and the predictor step doesn't receive an input codeset, the generated application name will be random, which means that every workflow run will create a new application and prediction service. This should be avoided, as it can easily lead to resource exhaustion.

- `resources_limits`: use to set the Kubernetes resource limits to allocate hardware resources to the prediction service, e.g.:

    ```yaml
    - name: resources_limits
      value: '{nvidia.com/gpu: 1}'
    ```

- `verbose`: set to `true` to enable verbose logging in the predictor workflow step (default is `false`).
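For reference, here is a sketch of how the `ovms-predictor` step from the earlier workflow could be extended with some of these optional parameters; the parameter names are the ones documented above, while the concrete values (model name, replica count, target device, resource requests and limits) are only illustrative:

```yaml
steps:
  - name: ovms-predictor
    image: ghcr.io/fuseml/ovms-predictor:v0.3.0
    inputs:
      - name: model
        value: '{{ inputs.model }}'
      - name: app_name
        value: resnet50-ovms
      - name: model_name
        value: resnet50
      - name: replicas
        value: "2"
      - name: target_device
        value: CPU
      - name: prediction_type
        value: predict
      - name: resources
        value: '{"requests": {"cpu": 1}, "limits": {"cpu": 2}}'
    outputs:
      - name: prediction-url
    extensions:
      - name: ovms-operator
        service_resource: ovms-operator
```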
The OVMS predictor workflow step can also take in a number of environment variables that are used to configure the credentials of the remote artifact store where the input ML model is stored:

Note

Some of these environment variables contain sensitive data, such as keys and passwords, and should not be explicitly configured as workflow step env vars. Instead, they should be registered in the FuseML Extension Registry and only referenced in FuseML workflows as extension requirements.
- `S3_ENDPOINT` - required when the ML model is stored in a custom S3-compatible artifact store such as minio
- `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` - credentials for the AWS S3 or S3-compatible artifact store
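As a sketch of the recommended approach from the note above, assuming the S3-compatible artifact store has already been registered in the FuseML Extension Registry, the predictor step could reference it as an additional extension requirement so that the credentials are resolved at run time instead of being hard-coded in the workflow definition. The `minio-storage` name and `s3` service resource below are hypothetical and must match the actual registration:

```yaml
    extensions:
      - name: ovms-operator
        service_resource: ovms-operator
      # Hypothetical entry: the name and service_resource depend on how the
      # S3-compatible store was registered in the FuseML Extension Registry.
      - name: minio-storage
        service_resource: s3
```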