Predibase supports a wide range of Vision Language Models (VLMs) for image
understanding.
Vision Language Model support is currently in beta. If you encounter any
issues, please reach out at support@predibase.com.
Quick Start
First, install the Predibase Python SDK:
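```bash
pip install -U predibase
```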
Deploying a Vision Model
Vision models require private deployments:
```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<API_TOKEN>")

# Create a production deployment with Llama-3.2 Vision
deployment = pb.deployments.create(
    name="my-vision-model",
    config=DeploymentConfig(
        base_model="llama-3-2-11b-vision-instruct",
        min_replicas=0,  # Scale down to 0 replicas when idle
        max_replicas=1,  # Scale up to 1 replica automatically when requests arrive
        accelerator="a100_80gb_100",  # Use a single A100 80GB GPU
        speculator="disabled",
    ),
)

# Use your private deployment
client = pb.deployments.client("my-vision-model")
```
VLM deployments use the same Generate API as language models, with support for
images in the input. You can provide images as either:
- Public URLs
- Base64-encoded byte strings
Insert one or more images into your prompt using this syntax:
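```
![](<image URL or base64 data URI>)Your prompt text here
```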
Using Public Image URLs
Process images from publicly accessible URLs:
```python
from predibase import Predibase

pb = Predibase(api_token="<API_TOKEN>")

client = pb.deployments.client("my-vision-model")

# Example using a public image URL
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"

# Embed the image in the prompt with the ![](...) syntax shown above
response = client.generate(f"![]({image_url})What is this a picture of?")
print(response.generated_text)

# Stream the response token by token
for response in client.generate_stream(f"![]({image_url})Describe this image in detail:"):
    if not response.token.special:
        print(response.token.text, end="", flush=True)
```
Using Local Images
Process images stored on your local machine:
```python
import base64

from predibase import Predibase

pb = Predibase(api_token="<API_TOKEN>")

client = pb.deployments.client("my-vision-model")

# Load and encode a local image as a base64 data URI
image_path = "/path/to/image.png"
with open(image_path, "rb") as f:
    image = base64.b64encode(f.read()).decode("utf-8")
image_bytes = f"data:image/png;base64,{image}"

# Generate a description for the local image
response = client.generate(f"![]({image_bytes})Describe what you see in this image.")
print(response.generated_text)

# Stream the response token by token
for response in client.generate_stream(f"![]({image_bytes})Analyze this image in detail:"):
    if not response.token.special:
        print(response.token.text, end="", flush=True)
```
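The snippet above hard-codes a PNG data URI. If your images come in mixed formats, a small helper can infer the MIME type from the file extension. This is a convenience sketch using only the Python standard library; `encode_image` is a hypothetical helper name, not part of the Predibase SDK.

```python
import base64
import mimetypes


def encode_image(path: str) -> str:
    """Return a data URI (e.g. data:image/jpeg;base64,...) for a local image file."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Could not determine an image MIME type for {path}")
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


# Usage: embed any local image in a prompt
# prompt = f"![]({encode_image('/path/to/photo.jpg')})Describe this image."
```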
REST API Integration
Access VLMs through the REST API for language-agnostic integration:
```bash
# Using curl with a public image URL
IMAGE_URL="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"

# Standard request
curl https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
  -X POST \
  -d '{
    "inputs": "![]('"$IMAGE_URL"')What is this a picture of?"
  }' \
  -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
  -H "Content-Type: application/json"

# Streaming request
curl https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate_stream \
  -X POST \
  -d '{
    "inputs": "![]('"$IMAGE_URL"')What is this a picture of?"
  }' \
  -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
  -H "Content-Type: application/json"
```
```bash
# Using curl with a local image
IMAGE_PATH="/path/to/image.png"
IMAGE_BYTES=$(base64 -w 0 "$IMAGE_PATH")  # -w 0 disables line wrapping (GNU coreutils)
IMAGE_BYTES="data:image/png;base64,$IMAGE_BYTES"

curl https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
  -X POST \
  -d '{
    "inputs": "![]('"$IMAGE_BYTES"')What is this a picture of?"
  }' \
  -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
  -H "Content-Type: application/json"
```
Supported Models
The following VLMs are officially supported for deployment on Predibase. Other
VLMs with the same architectures can be deployed on a best-effort basis from
Hugging Face.
| Deployment Name | Parameters | Architecture | License | Context Window | Always-On Shared Endpoint |
|---|---|---|---|---|---|
| llama-3-2-11b-vision | 11B | Llama-3 | Meta (request for commercial use) | 32K | ❌ |
| llama-3-2-11b-vision-instruct | 11B | Llama-3 | Meta (request for commercial use) | 32K | ❌ |
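For a model that is not in the table but shares one of these architectures, a best-effort deployment can be attempted by pointing `base_model` at the model's Hugging Face repository path. This is a hedged sketch reusing only the `DeploymentConfig` fields shown earlier; the repository path is a placeholder, and treating `base_model` as an HF repo path for unlisted models is an assumption rather than a tested configuration.

```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<API_TOKEN>")

# Best-effort deployment of a Hugging Face VLM that shares a supported architecture.
# The repository path below is a placeholder, not a verified model ID.
deployment = pb.deployments.create(
    name="my-custom-vlm",
    config=DeploymentConfig(
        base_model="<huggingface-org>/<vision-model>",  # assumption: HF repo path accepted here
        min_replicas=0,
        max_replicas=1,
        accelerator="a100_80gb_100",
    ),
)
```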
Next Steps
- Fine-tune vision models for your specific use case
- Set up a private deployment for production use