Predibase supports a wide range of Vision Language Models (VLMs) for image understanding.

Vision Language Model support is currently in beta. If you encounter any issues, please reach out to us at support@predibase.com.

Quick Start

First, install the Predibase Python SDK:

pip install -U predibase

Deploying a Vision Model

Vision models require private deployments:

from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<API_TOKEN>")

# Create a production deployment with Qwen2.5-VL
deployment = pb.deployments.create(
    name="my-vision-model",
    config=DeploymentConfig(
        base_model="qwen2-5-vl-7b-instruct",
        min_replicas=0,  # Scale down to zero when idle
        max_replicas=1,  # Scale up to 1 replica automatically when requests arrive
        accelerator="a100_80gb_100",  # A100 80GB
        speculator="disabled",
    )
)

# Use your private deployment
client = pb.deployments.client("my-vision-model")
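
Because min_replicas=0 lets the deployment scale to zero, the first request after an idle period will incur a cold start. To confirm the deployment exists before sending traffic, here is a minimal sketch assuming the SDK's pb.deployments.get helper (the exact fields on the returned object may vary by SDK version):

# Optional: fetch deployment metadata to verify it was created
deployment_info = pb.deployments.get("my-vision-model")
print(deployment_info)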

Image Input Format

We recommend querying your deployment through the OpenAI-compatible chat completions API. You can provide images as either:

  • Public URLs
  • Base64-encoded byte strings

Using Public Image URLs

Process images from publicly accessible URLs:

from openai import OpenAI

# Initialize client
api_token = "<PREDIBASE_API_TOKEN>"
tenant_id = "<PREDIBASE_TENANT_ID>"
model_name = "<DEPLOYMENT_NAME>"  # Ex. "qwen2-5-vl-7b-instruct"
adapter = "<ADAPTER_REPO_NAME>/<VERSION_NUMBER>"  # Ex. "adapter-repo/1" (optional)
base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"

client = OpenAI(
    api_key=api_token,
    base_url=base_url
)

# Chat completion
completion = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is this an image of?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": <IMAGE_URL_HERE>
                    }
                }
            ]
        }
    ],
    max_tokens=100
)
print(completion.choices[0].message.content)
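
The chat completions format also accepts multiple image parts in a single message, which is useful for comparison-style prompts. A minimal sketch reusing the client above (how well multi-image prompts work depends on the underlying model):

completion = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What differs between these two images?"},
                {"type": "image_url", "image_url": {"url": "<FIRST_IMAGE_URL>"}},
                {"type": "image_url", "image_url": {"url": "<SECOND_IMAGE_URL>"}},
            ]
        }
    ],
    max_tokens=100
)
print(completion.choices[0].message.content)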

Using Base64-Encoded Images

To query with local images, base64-encode them first and pass them to the deployment as data URLs:

import base64

from openai import OpenAI

# Initialize client
api_token = "<PREDIBASE_API_TOKEN>"
tenant_id = "<PREDIBASE_TENANT_ID>"
model_name = "<DEPLOYMENT_NAME>"  # Ex. "qwen2-5-vl-7b-instruct"
adapter = "<ADAPTER_REPO_NAME>/<VERSION_NUMBER>"  # Ex. "adapter-repo/1" (optional)
base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"

client = OpenAI(
    api_key=api_token,
    base_url=base_url
)

def encode_image(image_path):
    """Read a local image file and return its contents as a base64-encoded string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

encoded_image = encode_image("<PATH_TO_LOCAL_IMAGE>")

# Chat completion
completion = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What is this an image of?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encoded_image}"
                    }
                }
            ]
        }
    ],
    max_tokens=100
)
print(completion.choices[0].message.content)
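
For longer outputs, you can stream tokens as they are generated instead of waiting for the full response. A minimal sketch using the OpenAI client's standard streaming interface (assuming your deployment supports streamed chat completions, as OpenAI-compatible endpoints typically do):

completion = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"}
                }
            ]
        }
    ],
    max_tokens=100,
    stream=True  # Receive the response incrementally
)
for chunk in completion:
    # Each chunk carries a token delta; content may be None on some chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)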

Supported Models

The following VLMs are officially supported for deployment on Predibase. Other VLMs with the same architectures can be deployed on a best-effort basis from Hugging Face.

Deployment Name                | Parameters | Architecture | License                           | Context Window | Always-On Shared Endpoint
qwen2-vl-7b-instruct           | 7B         | Qwen2        | Tongyi Qianwen                    | 32K            | No
qwen2-5-vl-3b-instruct         | 3B         | Qwen2.5      | Tongyi Qianwen                    | 32K            | No
qwen2-5-vl-7b-instruct         | 7B         | Qwen2.5      | Tongyi Qianwen                    | 32K            | No
llama-3-2-11b-vision*          | 11B        | Llama-3      | Meta (request for commercial use) | 32K            | No
llama-3-2-11b-vision-instruct* | 11B        | Llama-3      | Meta (request for commercial use) | 32K            | No

*We only support base model inference for these models
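
As noted above, other VLMs sharing these architectures can be deployed on a best-effort basis from Hugging Face. A minimal sketch, assuming the hf:// scheme used for custom Hugging Face base models (both the scheme and the example model ID Qwen/Qwen2-VL-2B-Instruct are assumptions; verify against the current SDK reference):

# Best-effort deployment of a Hugging Face VLM (hypothetical example)
deployment = pb.deployments.create(
    name="my-custom-vision-model",
    config=DeploymentConfig(
        base_model="hf://Qwen/Qwen2-VL-2B-Instruct",  # assumption: hf:// scheme for HF models
        accelerator="a100_80gb_100",
        min_replicas=0,
        max_replicas=1,
    )
)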

Next Steps

  1. Fine-tune vision models for your specific use case
  2. Set up a private deployment for production use