Predibase supports a wide range of Vision Language Models (VLMs) for image
understanding.
Vision Language Model support is currently in beta. If you encounter any
issues, please reach out at support@predibase.com.
Quick Start
First, install the Predibase Python SDK:
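```bash
pip install -U predibase
```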
Deploying a Vision Model
Vision models require private deployments:
```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<API_TOKEN>")

# Create a production deployment with Qwen2.5-VL
deployment = pb.deployments.create(
    name="my-vision-model",
    config=DeploymentConfig(
        base_model="qwen2-5-vl-7b-instruct",
        min_replicas=0,  # Scale down to 0 replicas when idle
        max_replicas=1,  # Scale up to 1 replica automatically when requests arrive
        accelerator="a100_80gb_100",  # Use an A100 80GB GPU
        speculator="disabled",
    )
)

# Use your private deployment
client = pb.deployments.client("my-vision-model")
```
We recommend querying your deployment through the OpenAI-compatible chat completions API. You can provide images as either:
- Public URLs
- Base64-encoded byte strings
Using Public Image URLs
Process images from publicly accessible URLs:
```python
from openai import OpenAI

# Initialize client
api_token = "<PREDIBASE_API_TOKEN>"
tenant_id = "<PREDIBASE_TENANT_ID>"
model_name = "<DEPLOYMENT_NAME>"  # Ex. "my-vision-model"
adapter = "<ADAPTER_REPO_NAME>/<VERSION_NUMBER>"  # Ex. "adapter-repo/1" (optional)

base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"

client = OpenAI(
    api_key=api_token,
    base_url=base_url
)

# Chat completion
completion = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is this an image of?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "<IMAGE_URL_HERE>"
                    }
                }
            ]
        }
    ],
    max_tokens=100
)

print(completion.choices[0].message.content)
```
Using Local Images
To query with local images, base64-encode them first and pass the encoded string to the deployment:
```python
import base64

from openai import OpenAI

# Initialize client
api_token = "<PREDIBASE_API_TOKEN>"
tenant_id = "<PREDIBASE_TENANT_ID>"
model_name = "<DEPLOYMENT_NAME>"  # Ex. "my-vision-model"
adapter = "<ADAPTER_REPO_NAME>/<VERSION_NUMBER>"  # Ex. "adapter-repo/1" (optional)

base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"

client = OpenAI(
    api_key=api_token,
    base_url=base_url
)

def encode_image(image_path):
    """Read a local image file and return its base64-encoded contents."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

encoded_image = encode_image("<PATH_TO_LOCAL_IMAGE>")

# Chat completion
completion = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is this an image of?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encoded_image}"
                    }
                }
            ]
        }
    ],
    max_tokens=100
)

print(completion.choices[0].message.content)
```
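Note that the data URL prefix should match your image's format: for example, use `data:image/png;base64,` for PNG files instead of `data:image/jpeg;base64,`.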
Supported Models
The following VLMs are officially supported for deployment on Predibase. Other VLMs with the same architectures can be deployed from Hugging Face on a best-effort basis (see the sketch after the table).
| Deployment Name | Parameters | Architecture | License | Context Window | Always-On Shared Endpoint |
|---|---|---|---|---|---|
| qwen2-vl-7b-instruct | 7B | Qwen2 | Tongyi Qianwen | 32K | ❌ |
| qwen2-5-vl-3b-instruct | 3B | Qwen2.5 | Tongyi Qianwen | 32K | ❌ |
| qwen2-5-vl-7b-instruct | 7B | Qwen2.5 | Tongyi Qianwen | 32K | ❌ |
| llama-3-2-11b-vision* | 11B | Llama-3 | Meta (request for commercial use) | 32K | ❌ |
| llama-3-2-11b-vision-instruct* | 11B | Llama-3 | Meta (request for commercial use) | 32K | ❌ |

*We only support base model inference for these models.
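As a rough sketch of a best-effort deployment: assuming `DeploymentConfig` accepts a Hugging Face model ID as `base_model` for models with a supported architecture (this format is an assumption, not confirmed by this guide), deploying an unlisted Qwen2-family VLM might look like:

```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<API_TOKEN>")

# Hypothetical best-effort deployment of an unlisted VLM that shares a
# supported architecture; the base_model value below is an assumed
# Hugging Face model ID, not an official Predibase deployment name.
deployment = pb.deployments.create(
    name="my-custom-vision-model",
    config=DeploymentConfig(
        base_model="Qwen/Qwen2-VL-2B-Instruct",  # Assumption: HF model ID
        min_replicas=0,
        max_replicas=1,
        accelerator="a100_80gb_100",
    )
)
```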
Next Steps
- Fine-tune vision models for your specific use case
- Set up a private deployment for production use