Real-time Serving (Beta)
Deploying a trained model version for real-time serving hosts the model on dedicated serving engines, enabling high-throughput inference through a hosted endpoint.
Creating a Deployment
Clicking the Deploy button brings up the "Create Deployment" modal, which asks for three inputs:
- The deployment name. This should be unique across the organization.
- The serving engine. This is the engine that will host your deployment.
- The comment. Any free-form text that helps describe the deployment in detail.
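The same three inputs map naturally onto an SDK-based workflow. The sketch below is hypothetical — the payload field names and the commented-out client call are assumptions for illustration, not the documented API:

```python
# Hypothetical sketch: field names and the client call below are
# assumptions, not the documented SDK API.
payload = {
    "name": "fraud-model-v3",           # must be unique across the organization
    "serving_engine": "engine-prod-1",  # the engine that will host the deployment
    "comment": "Initial real-time deployment for the fraud model",
}

# A hypothetical SDK call might then look like:
# client.deployments.create(**payload)
```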
Once you have clicked Deploy, the model begins the deployment process. Note that this can take up to 10 minutes, especially the first time you deploy to a particular serving engine.
Viewing Deployments
Model deployments can be viewed through PQL, the SDK, or the UI. To see all deployments for a given Model Repository, click the Deployments tab on the Model Repository page.
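Programmatic listing mirrors what the Deployments tab shows: each deployment record belongs to a Model Repository and carries a state. The record shape and field names below are assumptions for illustration, not the documented SDK or PQL output:

```python
# Hypothetical sketch: the record shape and field names are assumptions,
# not the documented SDK or PQL schema.
deployments = [
    {"name": "fraud-model-v3", "repository": "fraud_model", "state": "READY"},
    {"name": "churn-model-v1", "repository": "churn_model", "state": "DEPLOYING"},
]

def deployments_for_repository(records, repository):
    """Filter deployment records down to a single Model Repository,
    analogous to viewing that repository's Deployments tab."""
    return [r for r in records if r["repository"] == repository]
```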