Predibase supports a wide range of Vision Language Models (VLMs) for image
understanding.
Vision Language Model support is currently in beta. If you encounter any
issues, please reach out at support@predibase.com.
Quick Start
First, install the Predibase Python SDK:
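```bash
pip install -U predibase
```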
Deploying a Vision Model
Vision models require private deployments:
```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<API_TOKEN>")

# Create a production deployment with Llama-3.2 Vision
deployment = pb.deployments.create(
    name="my-vision-model",
    config=DeploymentConfig(
        base_model="llama-3-2-11b-vision-instruct",
        min_replicas=0,  # Scale down to 0 replicas when idle
        max_replicas=1,  # Scale up to 1 replica automatically when requests arrive
        accelerator="a100_80gb_100",  # Use a single A100 80GB GPU
        speculator="disabled",
    ),
)

# Use your private deployment
client = pb.deployments.client("my-vision-model")
```
VLM deployments use the same Generate API as language models, with support for
images in the input. You can provide images as either:
- Public URLs
- Base64-encoded byte strings
Insert one or more images into your prompt using this syntax:
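```
![](<image URL or base64 data URI>)Your prompt text here
```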
Using Public Image URLs
Process images from publicly accessible URLs:
```python
from predibase import Predibase

pb = Predibase(api_token="<API_TOKEN>")

client = pb.deployments.client("my-vision-model")

# Example using a public image URL
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"

# Embed the image in the prompt with the ![](...) syntax shown above
response = client.generate(f"![]({image_url})What is this a picture of?")
print(response.generated_text)

# Stream the response token by token
for response in client.generate_stream(f"![]({image_url})Describe this image in detail:"):
    if not response.token.special:
        print(response.token.text, end="", flush=True)
```
Using Local Images
Process images stored on your local machine:
```python
import base64

from predibase import Predibase

pb = Predibase(api_token="<API_TOKEN>")

client = pb.deployments.client("my-vision-model")

# Load and encode a local image as a base64 data URI
image_path = "/path/to/image.png"
with open(image_path, "rb") as f:
    image = base64.b64encode(f.read()).decode("utf-8")
image_bytes = f"data:image/png;base64,{image}"

# Generate a description for the local image
response = client.generate(f"![]({image_bytes})Describe what you see in this image.")
print(response.generated_text)

# Stream the response token by token
for response in client.generate_stream(f"![]({image_bytes})Analyze this image in detail:"):
    if not response.token.special:
        print(response.token.text, end="", flush=True)
```
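The snippet above hard-codes a PNG data URI. If your images come in mixed formats, a small helper can infer the MIME type from the file extension. This is a convenience sketch using only the Python standard library; `encode_image` is a hypothetical helper name, not part of the Predibase SDK.

```python
import base64
import mimetypes


def encode_image(path: str) -> str:
    """Return a data URI (e.g. data:image/jpeg;base64,...) for a local image file."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Could not determine an image MIME type for {path}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


# Usage: embed any local image in a prompt
# prompt = f"![]({encode_image('/path/to/photo.jpg')})Describe this image."
```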
REST API Integration
Access VLMs through the REST API for language-agnostic integration:
```bash
# Using curl with a public image URL
IMAGE_URL="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"

# Standard request
curl https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
  -X POST \
  -d '{
    "inputs": "![]('"$IMAGE_URL"')What is this a picture of?"
  }' \
  -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
  -H "Content-Type: application/json"

# Streaming request
curl https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate_stream \
  -X POST \
  -d '{
    "inputs": "![]('"$IMAGE_URL"')What is this a picture of?"
  }' \
  -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
  -H "Content-Type: application/json"
```
```bash
# Using curl with a local image
IMAGE_PATH="/path/to/image.png"
IMAGE_BYTES=$(base64 -w 0 "$IMAGE_PATH")  # -w 0 disables line wrapping (GNU coreutils)
IMAGE_BYTES="data:image/png;base64,$IMAGE_BYTES"

curl https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
  -X POST \
  -d '{
    "inputs": "![]('"$IMAGE_BYTES"')What is this a picture of?"
  }' \
  -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
  -H "Content-Type: application/json"
```
Supported Models
The following VLMs are officially supported for deployment on Predibase. Other
VLMs with the same architectures can be deployed on a best-effort basis from
Hugging Face.
| Deployment Name | Parameters | Architecture | License | Context Window | Always-On Shared Endpoint |
|---|---|---|---|---|---|
| llama-3-2-11b-vision | 11B | Llama-3 | Meta (request for commercial use) | 32K | ❌ |
| llama-3-2-11b-vision-instruct | 11B | Llama-3 | Meta (request for commercial use) | 32K | ❌ |
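For a model that is not in the table but shares one of these architectures, a best-effort deployment can be attempted by pointing `base_model` at the model's Hugging Face repository path. This is a hedged sketch reusing only the `DeploymentConfig` fields shown earlier; the repository path is a placeholder, and treating `base_model` as an HF repo path for unlisted models is an assumption rather than a tested configuration.

```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<API_TOKEN>")

# Best-effort deployment of a Hugging Face VLM that shares a supported architecture.
# The repository path below is a placeholder, not a verified model ID.
deployment = pb.deployments.create(
    name="my-custom-vlm",
    config=DeploymentConfig(
        base_model="<huggingface-org>/<vision-model>",  # assumption: HF repo path accepted here
        min_replicas=0,
        max_replicas=1,
        accelerator="a100_80gb_100",
    ),
)
```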
Next Steps
- Fine-tune vision models for your specific use case
- Set up a private deployment for production use