Vision Models
Guide to using vision language models for image understanding tasks
Predibase supports a wide range of Vision Language Models (VLMs) for image understanding.
Vision Language Model support is currently in beta. If you encounter any issues, please reach out at support@predibase.com.
Quick Start
First, install the Predibase Python SDK and initialize a client:
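The sketch below assumes the SDK's top-level `Predibase` client; `<PREDIBASE_API_TOKEN>` is a placeholder for your own API token.

```python
# Install the SDK from PyPI first: pip install -U predibase
from predibase import Predibase

# Create a client authenticated with your Predibase API token.
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")
```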
Deploying a Vision Model
Vision models are not available on shared serverless endpoints; they require a private deployment:
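The example below sketches creating a private VLM deployment with the SDK. The deployment name and replica settings are illustrative placeholders, and the exact `DeploymentConfig` fields may vary by SDK version.

```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Create a private deployment of a supported vision model.
pb.deployments.create(
    name="my-vision-deployment",                     # placeholder deployment name
    config=DeploymentConfig(
        base_model="llama-3-2-11b-vision-instruct",  # from the supported models table below
        min_replicas=0,                              # scale to zero when idle
        max_replicas=1,
    ),
)
```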
Image Input Format
VLM deployments use the same Generate API as language models, with support for images in the input. You can provide images as either:
- Public URLs
- Base64-encoded byte strings
Insert one or more images into your prompt using this syntax:
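The format is sketched below, assuming the markdown-style image tag used by LoRAX-compatible deployments: the image reference goes inside `![](...)`, immediately followed by your text prompt.

```
![](<image URL or base64 data URI>)Your question about the image
```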
Using Public Image URLs
Process images from publicly accessible URLs:
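A minimal sketch using the SDK's deployment client; the deployment name, image URL, and generation parameters are placeholders.

```python
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")
client = pb.deployments.client("my-vision-deployment")

# Embed the public image URL in the prompt, followed by the question.
image_url = "https://example.com/photo.jpg"
prompt = f"![]({image_url})What is shown in this image?"

response = client.generate(prompt, max_new_tokens=128)
print(response.generated_text)
```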
Using Local Images
Process images stored on your local machine:
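For local files, encode the image bytes as a base64 data URI before inserting them into the prompt. The file name and MIME type below are placeholders; adjust them to match your image.

```python
import base64

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")
client = pb.deployments.client("my-vision-deployment")

# Read the local image and base64-encode it as a data URI.
with open("photo.jpg", "rb") as f:
    encoded_image = base64.b64encode(f.read()).decode("utf-8")

prompt = f"![](data:image/jpeg;base64,{encoded_image})Describe this image."

response = client.generate(prompt, max_new_tokens=128)
print(response.generated_text)
```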
REST API Integration
Access VLMs through the REST API for language-agnostic integration:
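The sketch below calls the serving endpoint directly using Python's `requests`. The URL pattern (tenant ID plus deployment name) and payload shape follow the standard Predibase generate endpoint; the tenant ID, token, and deployment name are placeholders, so confirm them against your account before use.

```python
import requests

TENANT_ID = "<PREDIBASE_TENANT_ID>"   # placeholder
API_TOKEN = "<PREDIBASE_API_TOKEN>"   # placeholder
DEPLOYMENT = "my-vision-deployment"   # placeholder deployment name

url = f"https://serving.app.predibase.com/{TENANT_ID}/deployments/v2/llms/{DEPLOYMENT}/generate"

payload = {
    "inputs": "![](https://example.com/photo.jpg)What is shown in this image?",
    "parameters": {"max_new_tokens": 128},
}

resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
    json=payload,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```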
Supported Models
The following VLMs are officially supported for deployment on Predibase. Other VLMs with the same architectures can be deployed on a best-effort basis from Hugging Face.
| Deployment Name | Parameters | Architecture | License | Context Window | Always-On Shared Endpoint |
|---|---|---|---|---|---|
| llama-3-2-11b-vision | 11B | Llama-3 | Meta (request for commercial use) | 32K | ❌ |
| llama-3-2-11b-vision-instruct | 11B | Llama-3 | Meta (request for commercial use) | 32K | ❌ |
Next Steps
- Fine-tune vision models for your specific use case
- Set up a private deployment for production use