Predibase supports a wide range of Vision Language Models (VLMs) for image understanding.

Vision Language Model support is currently in beta. If you encounter any issues, please reach out at support@predibase.com.

Quick Start

First, install the Predibase Python SDK:

pip install -U predibase

Deploying a Vision Model

Vision models require private deployments:

from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<API_TOKEN>")

# Create a production deployment with Qwen2.5-VL
deployment = pb.deployments.create(
    name="my-vision-model",
    config=DeploymentConfig(
        base_model="qwen2-5-vl-7b-instruct",
        min_replicas=0,  # Scale down to 0 replicas
        max_replicas=1,  # Scale up to 1 replica automatically when you get requests
        accelerator="a100_80gb_100", # Use A100
        speculator="disabled",
    )
)

# Use your private deployment
client = pb.deployments.client("my-vision-model")
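
The client returned by pb.deployments.client can be used for a quick text-only sanity check before you start sending images. A minimal sketch, assuming the SDK's LoRAX-style generate method (check the SDK reference for your version; the prompt text is illustrative):

# Optional: text-only sanity check against the private deployment.
# For image inputs, use the OpenAI-compatible endpoint shown in the sections below.
response = client.generate(
    "Briefly describe what kinds of inputs you can analyze.",
    max_new_tokens=64,
)
print(response.generated_text)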

Image Input Format

We suggest using the OpenAI-compatible chat completions endpoint to query your deployment. You can provide images as either:

  • Public URLs
  • Base64-encoded byte strings
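
Both options use the same image_url content item; only the value of the url field changes. A minimal sketch of the two shapes (the URL and the encoded_image variable are placeholders):

# Public URL
{"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}}

# Base64-encoded bytes, wrapped in a data URI
{"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{encoded_image}"}}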

Order of Content

Regardless of the image format, the order of the content items determines the order in which the model receives them. We highly recommend placing the image BEFORE the text prompt unless a specific order is required. You can also pass multiple images, as well as interleave images and text. For example:

messages=[
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": <FIRST_IMAGE_URL_HERE>
                }
            },
            {
                "type": "text",
                "text": "Take a look at the image I just provided. Then take a look at this one:"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": <SECOND_IMAGE_URL_HERE>
                }
            },
            {
                "type": "text",
                "text": "What are the differences between the two images?"
            }
        ]
    }
],

Using Public Image URLs

Process images from publicly accessible URLs:

from openai import OpenAI

# Initialize client
api_token = "<PREDIBASE_API_TOKEN>"
tenant_id = "<PREDIBASE_TENANT_ID>"
model_name = "<DEPLOYMENT_NAME>"  # Ex. "qwen2-5-vl-7b-instruct"
adapter = "<ADAPTER_REPO_NAME>/<VERSION_NUMBER>"  # Ex. "adapter-repo/1" (optional)
base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"

client = OpenAI(
    api_key=api_token,
    base_url=base_url
)

# Chat completion
completion = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": <IMAGE_URL_HERE>
                    }
                },
                {
                    "type": "text",
                    "text": "What is this an image of?"
                }
            ]
        }
    ],
    max_tokens=100
)
print(completion.choices[0].message.content)
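
Streaming also works through the standard OpenAI SDK, assuming your deployment's OpenAI-compatible endpoint accepts stream=True. A sketch that reuses the client from above, where messages is the same image-plus-text content list used in the previous request:

# Stream the response token-by-token instead of waiting for the full completion.
stream = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=messages,
    max_tokens=100,
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta of the assistant's reply.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)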

Using Local Images

To query with local images, base64-encode them first and then pass the encoded data to the deployment:

import base64

from openai import OpenAI

# Initialize client
api_token = "<PREDIBASE_API_TOKEN>"
tenant_id = "<PREDIBASE_TENANT_ID>"
model_name = "<DEPLOYMENT_NAME>"  # Ex. "qwen2-5-vl-7b-instruct"
adapter = "<ADAPTER_REPO_NAME>/<VERSION_NUMBER>"  # Ex. "adapter-repo/1" (optional)
base_url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{model_name}/v1"

client = OpenAI(
    api_key=api_token,
    base_url=base_url
)

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

encoded_image = encode_image(<PATH_TO_LOCAL_IMAGE>)

# Chat completion
completion = client.chat.completions.create(
    model=adapter,  # Use empty string "" for base model
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{encoded_image}"
                    }
                },
                {
                    "type": "text",
                    "text": "What is this an image of?"
                }
            ]
        }
    ],
    max_tokens=100
)
print(completion.choices[0].message.content)
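
The data URI above hardcodes image/jpeg. If your local files may be PNGs or other formats, one option is to derive the MIME type from the file extension using the standard library. The build_image_content helper below is illustrative, not part of the SDK:

import base64
import mimetypes

def build_image_content(image_path):
    # Guess the MIME type from the file extension; fall back to JPEG.
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    # Returns a content item that can be placed directly in the messages list.
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }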

Supported Models

The following VLMs are officially supported for deployment on Predibase. Other VLMs with the same architectures can be deployed on a best-effort basis from Hugging Face.

| Deployment Name | Parameters | Architecture | License | Context Window | Always-On Shared Endpoint |
| --- | --- | --- | --- | --- | --- |
| qwen2-vl-7b-instruct | 7B | Qwen2 | Tongyi Qianwen | 32K | |
| qwen2-5-vl-3b-instruct | 3B | Qwen2.5 | Tongyi Qianwen | 32K | |
| qwen2-5-vl-7b-instruct | 7B | Qwen2.5 | Tongyi Qianwen | 32K | |
| llama-3-2-11b-vision* | 11B | Llama-3 | Meta (request for commercial use) | 32K | |
| llama-3-2-11b-vision-instruct* | 11B | Llama-3 | Meta (request for commercial use) | 32K | |

*We only support base model inference for these models
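
As noted above, unlisted VLMs that share one of these architectures can be deployed on a best-effort basis from Hugging Face. A hedged sketch reusing the DeploymentConfig fields from the Quick Start; the hf:// path format for base_model is an assumption here, so confirm the exact syntax in the deployments documentation:

# Best-effort deployment of an unlisted VLM from Hugging Face.
# NOTE: the "hf://..." base_model format is an assumption; verify it in the deployments docs.
deployment = pb.deployments.create(
    name="my-custom-vlm",
    config=DeploymentConfig(
        base_model="hf://<HF_ORG>/<HF_MODEL_NAME>",  # e.g. another Qwen2.5-VL variant
        min_replicas=0,
        max_replicas=1,
        accelerator="a100_80gb_100",
    )
)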

Next Steps

  1. Fine-tune vision models for your specific use case
  2. Set up a private deployment for production use