Vision Language Model support is currently in beta. If you encounter any
issues, please reach out at support@predibase.com.
Quick Start
First, install the Predibase Python SDK:
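A one-line install from PyPI (assuming `pip`; the SDK is published as the `predibase` package):

```bash
pip install -U predibase
```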
Deploying a Vision Model

Vision models require private deployments:
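For example, a minimal sketch of creating a private deployment with the SDK, assuming the standard `pb.deployments.create` / `DeploymentConfig` flow; the deployment name and autoscaling values below are illustrative:

```python
from predibase import Predibase, DeploymentConfig

# Authenticate with your Predibase API token.
pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Create a private deployment for a supported VLM. The name and
# replica settings here are placeholders to adapt to your use case.
pb.deployments.create(
    name="my-qwen2-5-vl-7b-instruct",
    config=DeploymentConfig(
        base_model="qwen2-5-vl-7b-instruct",
        min_replicas=0,
        max_replicas=1,
    ),
)
```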
Image Input Format

We recommend using the OpenAI-compatible chat completions API to query your deployment. You can provide images as either:

- Public URLs
- Base64-encoded byte strings
Order of Content
Regardless of the image format, the order of the content items determines the order in which the model receives them. We strongly recommend placing the image BEFORE the text prompt unless your use case requires a different order. You can also pass multiple images in a single request, as well as interleave images and text. For example:
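As a sketch, here is a single user message whose content list interleaves images and text in the OpenAI chat completions format; the URLs and prompts are placeholders:

```python
# One user message whose content interleaves images and text.
# The model receives the items in exactly this order.
messages = [
    {
        "role": "user",
        "content": [
            # Image placed BEFORE the text prompt, as recommended.
            {"type": "image_url", "image_url": {"url": "https://example.com/page1.png"}},
            {"type": "text", "text": "Summarize the first page."},
            # Multiple, interleaved images and text items are allowed.
            {"type": "image_url", "image_url": {"url": "https://example.com/page2.png"}},
            {"type": "text", "text": "Then compare it with the second page."},
        ],
    }
]
```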
Using Public Image URLs

Process images from publicly accessible URLs:
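A sketch of an end-to-end query, assuming your deployment exposes an OpenAI-compatible endpoint; the base URL pattern, tenant ID, deployment name, and image URL are placeholders to replace with your deployment's actual details:

```python
from openai import OpenAI

# Point the OpenAI client at your deployment's OpenAI-compatible endpoint.
# The base URL below is an assumed pattern; copy the exact endpoint from
# your deployment's details page.
client = OpenAI(
    api_key="<PREDIBASE_API_TOKEN>",
    base_url="https://serving.app.predibase.com/<TENANT_ID>/deployments/v2/llms/my-qwen2-5-vl-7b-instruct/v1",
)

response = client.chat.completions.create(
    model="my-qwen2-5-vl-7b-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                # Image first, then the text prompt.
                {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
                {"type": "text", "text": "What is the total amount on this invoice?"},
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```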
Using Local Images

To query with local images, base64-encode them first and then pass the encoded strings to the deployment:
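A sketch using Python's standard `base64` module and a data URL; the file name and endpoint details are placeholders, and the base URL pattern is the same assumption as above:

```python
import base64

from openai import OpenAI

# Same client setup as above; endpoint details are illustrative.
client = OpenAI(
    api_key="<PREDIBASE_API_TOKEN>",
    base_url="https://serving.app.predibase.com/<TENANT_ID>/deployments/v2/llms/my-qwen2-5-vl-7b-instruct/v1",
)

# Read the local image and base64-encode it.
with open("cat.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="my-qwen2-5-vl-7b-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                # Pass the encoded bytes as a data URL.
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```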
Supported Models

The following VLMs are officially supported for deployment on Predibase. Other VLMs with the same architectures can be deployed on a best-effort basis from Hugging Face.

| Deployment Name | Parameters | Architecture | License | Context Window | Always-On Shared Endpoint |
| --- | --- | --- | --- | --- | --- |
| qwen2-vl-7b-instruct | 7B | Qwen2 | Tongyi Qianwen | 32K | ❌ |
| qwen2-5-vl-3b-instruct | 3B | Qwen2.5 | Tongyi Qianwen | 32K | ❌ |
| qwen2-5-vl-7b-instruct | 7B | Qwen2.5 | Tongyi Qianwen | 32K | ❌ |
| llama-3-2-11b-vision* | 11B | Llama-3 | Meta (request for commercial use) | 32K | ❌ |
| llama-3-2-11b-vision-instruct* | 11B | Llama-3 | Meta (request for commercial use) | 32K | ❌ |
Next Steps
- Fine-tune vision models for your specific use case
- Set up a private deployment for production use