Predibase supports a wide range of Vision Language Models (VLMs) for image
understanding.
Vision Language Model support is currently in beta. If you encounter any
issues, please reach out at support@predibase.com.
Quick Start
First, install the Predibase Python SDK:
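pip install -U predibase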
Deploying a Vision Model
Vision models require private deployments:
from predibase import Predibase, DeploymentConfig
pb = Predibase(api_token="<API_TOKEN>")
# Create a production deployment with Qwen2.5-VL
deployment = pb.deployments.create(
    name="my-vision-model",
    config=DeploymentConfig(
        base_model="qwen2-5-vl-7b-instruct",
        min_replicas=0,  # Scale down to 0 replicas
        max_replicas=1,  # Scale up to 1 replica automatically when you get requests
        accelerator="a100_80gb_100",  # Use A100
        speculator="disabled",
    )
)
# Use your private deployment
client = pb.deployments.client("my-vision-model")
We recommend using the OpenAI-compatible chat completions API to query your deployment.
You can provide images as either:
- Public URLs
- Base64-encoded byte strings
Order of Content
Regardless of image format, content items are passed to the model in the order they appear in the request. We strongly recommend placing the image BEFORE the text prompt unless a specific order is required. You can also pass multiple images, as well as interleave images and text. For example:
messages=[
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": <FIRST_IMAGE_URL_HERE>
                }
            },
            {
                "type": "text",
                "text": "Take a look at the image I just provided. Then take a look at this one:"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": <SECOND_IMAGE_URL_HERE>
                }
            },
            {
                "type": "text",
                "text": "What are the differences between the two images?"
            }
        ]
    }
],
Using Public Image URLs
Process images from publicly accessible URLs:
from openai import OpenAI
# Initialize client
api_token = "<PREDIBASE_API_TOKEN>"
tenant_id = "<PREDIBASE_TENANT_ID>"
model_name = "<DEPLOYMENT_NAME>" # Ex. "qwen2-5-vl-7b-instruct"
adapter = "<ADAPTER_REPO_NAME>/<VERSION_NUMBER>" # Ex. "adapter-repo/1" (optional)
base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"
client = OpenAI(
    api_key=api_token,
    base_url=base_url
)

# Chat completion
completion = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": <IMAGE_URL_HERE>
                    }
                },
                {
                    "type": "text",
                    "text": "What is this an image of?"
                }
            ]
        }
    ],
    max_tokens=100
)
print(completion.choices[0].message.content)
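For longer outputs, you may prefer to stream tokens as they are generated. The snippet below is a minimal sketch that reuses the client, adapter, and image URL placeholder from the example above, and it assumes your deployment's OpenAI-compatible endpoint honors stream=True (worth verifying for your deployment):
# Streamed chat completion (assumes the endpoint supports stream=True)
stream = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": <IMAGE_URL_HERE>
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this image in detail."
                }
            ]
        }
    ],
    max_tokens=100,
    stream=True
)
for chunk in stream:
    # Each chunk carries an incremental delta of the response text
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)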
Using Local Images
To query with local images, base64-encode them first and pass them to the deployment as data URIs:
import base64

from openai import OpenAI
# Initialize client
api_token = "<PREDIBASE_API_TOKEN>"
tenant_id = "<PREDIBASE_TENANT_ID>"
model_name = "<DEPLOYMENT_NAME>" # Ex. "qwen2-5-vl-7b-instruct"
adapter = "<ADAPTER_REPO_NAME>/<VERSION_NUMBER>" # Ex. "adapter-repo/1" (optional)
base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"
client = OpenAI(
    api_key=api_token,
    base_url=base_url
)

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

encoded_image = encode_image(<PATH_TO_LOCAL_IMAGE>)

# Chat completion
completion = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encoded_image}"
                    }
                },
                {
                    "type": "text",
                    "text": "What is this an image of?"
                }
            ]
        }
    ],
    max_tokens=100
)
print(completion.choices[0].message.content)
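The example above hard-codes a JPEG data URI. If your local images may also be PNGs or other formats, a small helper can infer the MIME type from the file extension; this is a sketch using only the Python standard library, and build_data_uri is a name introduced here rather than part of the Predibase SDK:
import base64
import mimetypes

def build_data_uri(image_path):
    # Guess the MIME type from the file extension; fall back to JPEG if unknown
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        mime_type = "image/jpeg"
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"

# Pass the result as the "url" field of an image_url content item, e.g.:
# {"type": "image_url", "image_url": {"url": build_data_uri(<PATH_TO_LOCAL_IMAGE>)}}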
Supported Models
The following VLMs are officially supported for deployment on Predibase. Other VLMs with the same architectures can be deployed on a best-effort basis from Hugging Face (see the sketch after the table).
| Deployment Name | Parameters | Architecture | License | Context Window | Always-On Shared Endpoint |
|---|---|---|---|---|---|
| qwen2-vl-7b-instruct | 7B | Qwen2 | Tongyi Qianwen | 32K | ❌ |
| qwen2-5-vl-3b-instruct | 3B | Qwen2.5 | Tongyi Qianwen | 32K | ❌ |
| qwen2-5-vl-7b-instruct | 7B | Qwen2.5 | Tongyi Qianwen | 32K | ❌ |
| llama-3-2-11b-vision* | 11B | Llama-3 | Meta (request for commercial use) | 32K | ❌ |
| llama-3-2-11b-vision-instruct* | 11B | Llama-3 | Meta (request for commercial use) | 32K | ❌ |
*We only support base model inference for these models.
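As a rough sketch, a best-effort deployment follows the same pattern as the Quick Start above. The Hugging Face model ID shown is a hypothetical example, and the exact base_model format accepted for best-effort deployments is an assumption here; confirm the expected value in the Predibase deployment documentation:
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<API_TOKEN>")

# Hypothetical example: deploy a same-architecture VLM directly from Hugging Face.
# The model identifier format for best-effort deployments is an assumption here.
deployment = pb.deployments.create(
    name="my-best-effort-vlm",
    config=DeploymentConfig(
        base_model="Qwen/Qwen2-VL-2B-Instruct",  # hypothetical Hugging Face model ID
        min_replicas=0,
        max_replicas=1,
        accelerator="a100_80gb_100",
    )
)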
Next Steps
- Fine-tune vision models for your specific use case
- Set up a private deployment for production use