- Find available models for your use case
- Understand model capabilities and requirements
- Choose between different model options
## Quick Start
First, install the Predibase Python SDK:
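```bash
pip install -U predibase
```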
## Creating Private Deployments

For production use cases, create your own private embedding model deployment.

### Base Model Only
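A minimal sketch using the SDK's `pb.deployments.create` API; the exact `DeploymentConfig` fields supported for embedding models may vary by SDK version, so treat the field values as assumptions and consult the API reference:

```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Create a private deployment that serves only the base embedding model.
pb.deployments.create(
    name="my-embeddings",
    config=DeploymentConfig(
        base_model="WhereIsAI/UAE-Large-V1",  # any model from the table below
        min_replicas=0,  # scale to zero when idle to save cost
        max_replicas=1,
    ),
)
```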
### With Adapters
You can also create deployments with adapters; we recommend preloading them to avoid the overhead of dynamically swapping large portions of the model during inference.
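A sketch of a deployment with a preloaded adapter; the `preloaded_adapters` field name is an assumption, so confirm it against the current `DeploymentConfig` reference:

```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Deployment that preloads a fine-tuned adapter alongside the base model,
# avoiding dynamic adapter swaps at inference time.
# NOTE: `preloaded_adapters` is an assumed field name; check the current
# DeploymentConfig reference.
pb.deployments.create(
    name="my-embeddings-adapters",
    config=DeploymentConfig(
        base_model="WhereIsAI/UAE-Large-V1",
        preloaded_adapters=["my-adapter-repo/1"],  # "<adapter repo>/<version>"
        min_replicas=0,
        max_replicas=1,
    ),
)
```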
## Running Inference

### Python SDK
#### Base Model
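A minimal sketch, assuming the deployment client exposes an `embed` method (LoRAX-backed deployments serve an embed route); confirm the exact method name in the SDK reference:

```python
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")
client = pb.deployments.client("my-embeddings")

# `embed` is an assumed method name for embedding deployments.
result = client.embed("The quick brown fox jumps over the lazy dog.")
print(result.embeddings)  # e.g. a 1024-dim vector for UAE-Large-V1
```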
#### Predibase Adapters
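The same call against a Predibase-trained adapter, referenced as `<adapter repo>/<version>`; the `adapter_id` parameter on `embed` is an assumption here:

```python
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")
client = pb.deployments.client("my-embeddings-adapters")

# Reference a Predibase-trained adapter by repo name and version.
result = client.embed(
    "The quick brown fox jumps over the lazy dog.",
    adapter_id="my-adapter-repo/1",  # assumed parameter; see SDK reference
)
print(result.embeddings)
```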
#### HF Adapters
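And against an adapter hosted on Hugging Face Hub; `adapter_source="hub"` mirrors the generation API and is an assumption for embeddings:

```python
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")
client = pb.deployments.client("my-embeddings-adapters")

# Reference an adapter on Hugging Face Hub by its repo id.
result = client.embed(
    "The quick brown fox jumps over the lazy dog.",
    adapter_id="my-org/my-hf-adapter",  # assumed parameters; see SDK reference
    adapter_source="hub",
)
print(result.embeddings)
```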
### REST API
#### Base Model
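A sketch of the raw HTTP call using Python's `requests`; the URL pattern below is an assumption modeled on Predibase serving endpoints, so copy the exact URL from your deployment's details page:

```python
import requests

API_TOKEN = "<PREDIBASE_API_TOKEN>"
TENANT_ID = "<PREDIBASE_TENANT_ID>"

# Assumed URL pattern; copy the real one from your deployment's page.
url = (
    f"https://serving.app.predibase.com/{TENANT_ID}"
    "/deployments/v2/llms/my-embeddings/embed"
)

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"inputs": "The quick brown fox jumps over the lazy dog."},
)
resp.raise_for_status()
print(resp.json())
```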
#### Predibase Adapters
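Passing a Predibase adapter through the request body; the `parameters.adapter_id` field mirrors the LoRAX request schema and is an assumption here:

```python
import requests

API_TOKEN = "<PREDIBASE_API_TOKEN>"
TENANT_ID = "<PREDIBASE_TENANT_ID>"

url = (
    f"https://serving.app.predibase.com/{TENANT_ID}"
    "/deployments/v2/llms/my-embeddings-adapters/embed"  # assumed URL pattern
)

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={
        "inputs": "The quick brown fox jumps over the lazy dog.",
        # adapter_id mirrors the LoRAX request schema; confirm in the docs.
        "parameters": {"adapter_id": "my-adapter-repo/1"},
    },
)
resp.raise_for_status()
print(resp.json())
```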
#### HF Adapters
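The same request against a Hugging Face-hosted adapter; `adapter_source` is likewise an assumption:

```python
import requests

API_TOKEN = "<PREDIBASE_API_TOKEN>"
TENANT_ID = "<PREDIBASE_TENANT_ID>"

url = (
    f"https://serving.app.predibase.com/{TENANT_ID}"
    "/deployments/v2/llms/my-embeddings-adapters/embed"  # assumed URL pattern
)

resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={
        "inputs": "The quick brown fox jumps over the lazy dog.",
        # Assumed fields mirroring the LoRAX request schema.
        "parameters": {
            "adapter_id": "my-org/my-hf-adapter",
            "adapter_source": "hub",
        },
    },
)
resp.raise_for_status()
print(resp.json())
```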
## Supported Models
The following embedding models are officially supported for deployment on Predibase:

| Model Name | Architecture | Output Dimensions | License | Always-On Shared Endpoint |
| --- | --- | --- | --- | --- |
| WhereIsAI/UAE-Large-V1 | BERT | 1024 | MIT | ❌ |
| dunzhang/stella_en_1.5B_v5 | Qwen | 1024 | Apache 2.0 | ❌ |
| distilbert-base-uncased | DistilBERT | 768 | Apache 2.0 | ❌ |
We can add new models to our catalog on a case-by-case basis. If you have a
specific model in mind, please reach out to us at support@predibase.com.
## Model Details
### BERT-based Models
Best for: high-quality embeddings with a proven architecture

- **WhereIsAI/UAE-Large-V1**
  - Strong performance on similarity tasks
  - 1024-dimensional embeddings
  - MIT license
  - Efficient inference
- **distilbert-base-uncased**
  - Compressed BERT architecture
  - 768-dimensional embeddings
  - Apache 2.0 license
  - Fast inference speed
### Qwen-based Models
Best for: state-of-the-art embedding quality

- **dunzhang/stella_en_1.5B_v5**
  - Large model with 1.5B parameters
  - 1024-dimensional embeddings
  - Apache 2.0 license
  - Advanced semantic understanding
## Best Practices
- **Input Size**
  - Check the model documentation for the maximum input length
  - Truncate or chunk long inputs
  - Balance context length against performance
- **Batch Processing**
  - Implement custom batching for large datasets (see the sketch after this list)
  - Monitor memory usage during batch processing
  - Consider async processing for large workloads
- **Deployment Configuration**
  - Use auto-scaling for cost optimization
  - Monitor performance metrics
  - Choose an appropriate GPU for your workload
- **Model Selection**
  - Weigh embedding dimensions against quality
  - Match model size to hardware capabilities
  - Evaluate licensing requirements
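As a concrete illustration of the input-size and batching advice above, here is a minimal sketch. It reuses the assumed `client.embed` call from the inference examples and applies a crude character-based truncation; a tokenizer-aware limit is more precise:

```python
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")
client = pb.deployments.client("my-embeddings")

MAX_CHARS = 2000   # crude stand-in for the model's true token limit
BATCH_SIZE = 32    # tune against throughput and memory

def embed_corpus(texts: list[str]) -> list[list[float]]:
    """Embed a large corpus in fixed-size batches, truncating long inputs."""
    vectors = []
    for start in range(0, len(texts), BATCH_SIZE):
        batch = [t[:MAX_CHARS] for t in texts[start:start + BATCH_SIZE]]
        for text in batch:
            # `embed` is the assumed client method from the examples above;
            # the endpoint may also accept a list of inputs natively.
            vectors.append(client.embed(text).embeddings)
    return vectors
```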