How to deploy a custom model on AI Platform (Unified)

Archana Inapudi
6 min read · Dec 16, 2020

Introduction

Google’s AI Platform (Unified) brings AutoML and AI Platform (Classic) together into a unified API, client library, and user interface.

AutoML allows you to train models on image, video, and tabular datasets without writing code, while training in AI Platform (Classic) lets you run custom training code. With AI Platform (Unified), both AutoML training and custom training are available options. Whichever option you choose for training, you can save models, deploy models and request predictions with AI Platform (Unified).

*AI Platform in this document refers to the Unified version unless otherwise noted.

AI Platform can be used to manage the following stages in the ML workflow:

1. Define and upload a dataset.

2. Train an ML model:

a. Train the model.

b. Evaluate model accuracy.

c. Tune hyperparameters (custom training only).

3. Upload and store your model in AI Platform.

4. Deploy your trained model and get an endpoint for serving predictions.

5. Send prediction requests to your endpoint.

6. Specify a prediction traffic split in your endpoint.

7. Manage models and endpoints.

This document focuses on stages 3, 4 and 5 above; additional information on the other stages can be found in the AI Platform documentation.

Models can be deployed on AI Platform and served using an endpoint. Models that are not trained on AI Platform can also be deployed for prediction.

For the purposes of this demo, a model was trained in AutoML using the “Mobile Best Trade-Off” model type, and the resulting saved_model.pb was exported to Google Cloud Storage.

Deploying a model to the AI Platform

1. Create a new project.

2. From the Navigation Menu, select AI Platform (Unified) under ARTIFICIAL INTELLIGENCE.
AI Platform (Unified) menu option

3. Enable the AI Platform API.

Enable AI Platform API

The AI Platform Dashboard is displayed after the API is enabled.

AI Platform Dashboard

4. Select Models from the Navigation Menu

Models

5. Click IMPORT to import a new model.

Enter the model name.

Model Name and Region

6. Select model settings.

For this demo, TensorFlow 1.15 was selected. Browse to the GCS path where the saved_model.pb file is stored. The predict schema can be left blank.

Model settings

7. Deploy the model to an endpoint

The model must be deployed to an endpoint in order to serve online predictions; batch predictions can be set up without an endpoint. This demo focuses on setting up online predictions. Select “Deploy to Endpoint”.

Deploy model to endpoint

8. Name the endpoint and select settings.

More than one model can be deployed to an endpoint, and the Traffic split field can be used to split prediction request traffic between the models. Each model can also be deployed to multiple endpoints. For this demo, we deploy one model to one endpoint, leaving the Traffic split at its default.

Minimum compute nodes can be set to 1 (default) or more. These compute nodes run even when there is no traffic. The Deploy and undeploy models section below highlights a way to programmatically deploy models when there is demand and undeploy them when there is none.

Maximum compute nodes is an optional field; setting it enables autoscaling.

Endpoint settings

9. Select “Deploy”

Deploy endpoint

Sending an online prediction request

Deploying the model to an endpoint enables the model for online predictions. A sample request is provided on the console for reference.

Request an online prediction by sending the input data instances as a JSON string in a predict request. Below are gcloud and Python samples for formatting and sending prediction requests.

gcloud command

1. Convert the image file into JSON.

For online prediction, send a JSON request like:

{"instances": [{"key": "first_key", "image_bytes": {"b64": …}}, {"key": "second_key", "image_bytes": {"b64": …}}]}

To format the request as above, base64-encode the image and save the output as JSON.

import base64

# Read the image and base64-encode its contents.
with open("sample_image.jpg", "rb") as f:
    file_content = f.read()
encoded = base64.b64encode(file_content).decode("utf-8")
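
Continuing from the snippet above, a minimal sketch of writing the full request body to sample_image.json (the file name matches the gcloud command below; first_key is a placeholder):

import json

# Wrap the encoded bytes in the instances format shown earlier
# and save the request body for the gcloud predict call.
request = {"instances": [{"key": "first_key", "image_bytes": {"b64": encoded}}]}
with open("sample_image.json", "w") as f:
    json.dump(request, f)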

2. Use the gcloud command to send the request to the endpoint.

gcloud beta ai endpoints predict <endpointID> \
  --region=us-central1 \
  --json-request=sample_image.json

The endpoint ID can be found in the sample request shown on the AI Platform console.

Python

The code below showcases the prediction request in Python. The input image is converted into a payload and passed to the request.
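
A minimal sketch of such a request, assuming the google-cloud-aiplatform client library and the key/image_bytes instance format used above (project ID and endpoint ID are placeholders to fill in):

import base64

from google.cloud import aiplatform
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

def predict_image(project, endpoint_id, filename, location="us-central1"):
    # Regional API endpoints require a matching api_endpoint override.
    client_options = {"api_endpoint": f"{location}-aiplatform.googleapis.com"}
    client = aiplatform.gapic.PredictionServiceClient(client_options=client_options)

    # Base64-encode the image and build the instance payload,
    # mirroring the JSON request format shown earlier.
    with open(filename, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    instance = json_format.ParseDict(
        {"key": "first_key", "image_bytes": {"b64": encoded}}, Value()
    )

    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )
    response = client.predict(endpoint=endpoint, instances=[instance])

    print("deployed_model_id:", response.deployed_model_id)
    for prediction in response.predictions:
        print("prediction:", prediction)
    return response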

Response from online prediction

The response from the above request is shown below. The prediction returns the labels and a confidence score for each label from the model. Unlike AutoML online predictions, the response contains confidence scores for all the labels, which makes it easier to build thresholding or routing logic downstream.

deployed_model_id: 3811910056476147712
predictions:
prediction: {
  'key': 'first_key',
  'labels': [string_value: "sunflowers", string_value: "tulips", string_value: "dandelion",
             string_value: "roses", string_value: "daisy"],
  'scores': [number_value: 0.0357748382, number_value: 0.0445411801, number_value: 0.0346666723,
             number_value: 0.0520200431, number_value: 0.973080397]
}
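
As an illustration only (not part of the original walkthrough), a short sketch that converts the first prediction from the response above into a plain dict and picks the top-scoring label:

from google.protobuf import json_format

# Convert the protobuf Value prediction into a plain Python dict;
# the 'labels' and 'scores' field names follow the sample response above.
pred = json_format.MessageToDict(response.predictions[0])
best_label, best_score = max(zip(pred["labels"], pred["scores"]), key=lambda p: p[1])
print(f"top label: {best_label} ({best_score:.3f})")  # e.g. daisy (0.973)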

Deploy and undeploy models

Deploying models to endpoints consumes compute resources even when there is no traffic demand. To avoid the associated costs, models can be undeployed and redeployed as needed.

Deploy model

An endpoint can be created in the AI Platform console. Deploying the model to the endpoint can then be accomplished by running the following gcloud command:

gcloud beta ai endpoints deploy-model <endpoint_id> \
  --region=us-central1 \
  --model=<model_id> \
  --display-name=<display_name> \
  --machine-type=<machine_type> \
  --min-replica-count=1 \
  --traffic-split=0=100

The endpoint ID and model ID can be found on the console.

Undeploy model

Undeploying the model from the endpoint retains both the model and the endpoint for future deployments. The deployed model ID is not shown on the console; it can be retrieved by running the describe command on the endpoint.

gcloud beta ai endpoints describe <endpoint_id> \
  --region=us-central1

Response from the above command:

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
createTime: '2020-12-09T17:01:58.195128Z'
deployedModels:
- createTime: '2020-12-09T17:01:58.195128Z'
  dedicatedResources:
    machineSpec:
      machineType: n1-standard-4
    maxReplicaCount: 1
    minReplicaCount: 1
  displayName: <model_display_name>
  id: <deployed_model_id>
  model: projects/<project_id>/locations/us-central1/models/<model_id>
displayName: <endpoint_display_name>
etag: AMEw9yNxgpwB03iUKZtjORl5fTuzmrEMHLJeqbTHh-flyKNWcsmoreNQdND9T8HlCwl6
name: projects/<project_id>/locations/us-central1/endpoints/<endpoint_id>
trafficSplit:
  <deployed_model_id>: 100
updateTime: '2020-12-09T17:10:07.852971Z'

Use the <deployed_model_id> from the above response to undeploy the model:

gcloud beta ai endpoints undeploy-model <endpoint_id> \
  --region=us-central1 \
  --deployed-model-id=<deployed_model_id>
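
The same cycle can be scripted for fully programmatic scale-to-zero. A minimal sketch, assuming the google-cloud-aiplatform client library (the undeploy_all helper name is hypothetical):

from google.cloud import aiplatform

def undeploy_all(project, location, endpoint_id):
    """Undeploy every model from an endpoint, keeping model and endpoint intact."""
    client_options = {"api_endpoint": f"{location}-aiplatform.googleapis.com"}
    client = aiplatform.gapic.EndpointServiceClient(client_options=client_options)
    endpoint = client.endpoint_path(
        project=project, location=location, endpoint=endpoint_id
    )

    # Look up the deployed model IDs (the same IDs `describe` returns),
    # then undeploy each one and wait for the long-running operation.
    for deployed in client.get_endpoint(name=endpoint).deployed_models:
        client.undeploy_model(
            endpoint=endpoint, deployed_model_id=deployed.id, traffic_split={}
        ).result()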

Summary

My final takeaway is that AI Platform provides a single pane of glass for defining datasets, training models, importing custom models, deploying models to endpoints, and running online and batch predictions. Having all of these AI tools in one platform makes MLOps much simpler.
