Predibase supports a wide range of Vision Language Models (VLMs) for image understanding.

Vision Language Model support is currently in beta. If you encounter any issues, please reach out at support@predibase.com.

Quick Start

First, install the Predibase Python SDK:

pip install -U predibase

Deploying a Vision Model

Vision models require private deployments:

from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<API_TOKEN>")

# Create a production deployment with Llama-3 Vision
deployment = pb.deployments.create(
    name="my-vision-model",
    config=DeploymentConfig(
        base_model="llama-3-2-11b-vision-instruct",
        min_replicas=0,  # Scale to zero when there is no traffic
        max_replicas=1,  # Scale up to 1 replica automatically on incoming requests
        accelerator="a100_80gb_100",  # Run on an A100 80GB
        speculator="disabled",
    )
)

# Use your private deployment
client = pb.deployments.client("my-vision-model")

Image Input Format

VLM deployments use the same Generate API as language models, with support for images in the input. You can provide images as either:

  • Public URLs
  • Base64-encoded byte strings

Insert one or more images into your prompt using this syntax:

![](URL_OR_BYTES)
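
Because the image markers are just part of the prompt string, you can interleave several images with text. A minimal sketch (the URLs below are hypothetical placeholders; any public image URL or base64 data URI works):

# Hypothetical placeholder URLs for illustration only.
url_1 = "https://example.com/cat.png"
url_2 = "https://example.com/dog.png"

# Two images followed by a question that references both.
prompt = f"![]({url_1})![]({url_2})What do these two images have in common?"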

Using Public Image URLs

Process images from publicly accessible URLs:

from predibase import Predibase

pb = Predibase(api_token="<API_TOKEN>")
client = pb.deployments.client("my-vision-model")

# Example using a public image URL
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
response = client.generate(f"![]({image_url})What is this a picture of?")
print(response.generated_text)

# Stream the response token by token
for response in client.generate_stream(f"![]({image_url})Describe this image in detail:"):
    if not response.token.special:
        print(response.token.text, end="", flush=True)
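
The client also accepts the standard Generate API parameters. A minimal sketch capping the response length (max_new_tokens is assumed here; check the Generate API reference for the full parameter list):

# Cap the response length; max_new_tokens is assumed to be supported,
# matching the standard Generate API parameters.
response = client.generate(
    f"![]({image_url})What is this a picture of?",
    max_new_tokens=128,
)
print(response.generated_text)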

Using Local Images

Process images stored on your local machine:

import base64
from predibase import Predibase

pb = Predibase(api_token="<API_TOKEN>")
client = pb.deployments.client("my-vision-model")

# Load and encode a local image
image_path = "/path/to/image.png"
with open(image_path, "rb") as f:
    image = base64.b64encode(f.read()).decode("utf-8")
image_bytes = f"data:image/png;base64,{image}"

# Generate description for the local image
response = client.generate(f"![]({image_bytes})Describe what you see in this image.")
print(response.generated_text)

# Stream the response for local image
for response in client.generate_stream(f"![]({image_bytes})Analyze this image in detail:"):
    if not response.token.special:
        print(response.token.text, end="", flush=True)
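
The data URI's MIME type should match the image format (the example above hardcodes image/png). If you work with mixed formats, a small helper can derive the type from the file extension. A minimal sketch using only the standard library (to_data_uri is a hypothetical helper name, not part of the SDK):

import base64
import mimetypes

def to_data_uri(path: str) -> str:
    # Guess the MIME type from the file extension (e.g. image/jpeg for .jpg);
    # fall back to image/png when the extension is unrecognized.
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "image/png"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"

response = client.generate(f"![]({to_data_uri('/path/to/photo.jpg')})Describe this image.")
print(response.generated_text)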

REST API Integration

Access VLMs through the REST API for language-agnostic integration:

# Using curl with a public image URL
IMAGE_URL="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"

# Standard request
curl https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
    -X POST \
    -d '{
        "inputs": "![]('$IMAGE_URL')What is this a picture of?"
    }' \
    -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
    -H "Content-Type: application/json"

# Streaming request
curl https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate_stream \
    -X POST \
    -d '{
        "inputs": "![]('$IMAGE_URL')What is this a picture of?"
    }' \
    -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
    -H "Content-Type: application/json"

# Using curl with a local image
IMAGE_PATH="/path/to/image.png"
IMAGE_BYTES=$(base64 -w 0 "$IMAGE_PATH")  # -w 0 disables line wrapping (GNU coreutils; macOS base64 does not wrap by default)
IMAGE_BYTES="data:image/png;base64,$IMAGE_BYTES"

curl https://serving.app.predibase.com/$PREDIBASE_TENANT_ID/deployments/v2/llms/$PREDIBASE_DEPLOYMENT/generate \
    -X POST \
    -d '{
        "inputs": "![]('$IMAGE_BYTES')What is this a picture of?"
    }' \
    -H "Authorization: Bearer $PREDIBASE_API_TOKEN" \
    -H "Content-Type: application/json"
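
The same endpoints work from any HTTP client. A minimal Python sketch using the requests library, mirroring the first curl example above (it assumes the response JSON carries a generated_text field, matching the SDK's response object):

import os
import requests

# Same environment variables as the curl examples above.
tenant_id = os.environ["PREDIBASE_TENANT_ID"]
deployment = os.environ["PREDIBASE_DEPLOYMENT"]
api_token = os.environ["PREDIBASE_API_TOKEN"]

url = f"https://serving.app.predibase.com/{tenant_id}/deployments/v2/llms/{deployment}/generate"
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"

resp = requests.post(
    url,
    json={"inputs": f"![]({image_url})What is this a picture of?"},
    headers={"Authorization": f"Bearer {api_token}"},
)
resp.raise_for_status()
# Assumes the generate endpoint returns JSON with a generated_text field.
print(resp.json()["generated_text"])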

Supported Models

The following VLMs are officially supported for deployment on Predibase. Other VLMs with the same architectures can be deployed on a best-effort basis from Hugging Face.

Deployment Name               | Parameters | Architecture | License                           | Context Window | Always-On Shared Endpoint
llama-3-2-11b-vision          | 11B        | Llama-3      | Meta (request for commercial use) | 32K            | No
llama-3-2-11b-vision-instruct | 11B        | Llama-3      | Meta (request for commercial use) | 32K            | No

Next Steps

  1. Fine-tune vision models for your specific use case
  2. Set up a private deployment for production use