Embedding Models
Explore our catalog of supported embedding models for text embeddings and similarity search
Predibase supports a wide range of embedding models for text embeddings and similarity search. This guide helps you:
- Find available models for your use case
- Understand model capabilities and requirements
- Choose between different model options
Quick Start
First, install the Predibase Python SDK:
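The SDK is published on PyPI; a standard pip install should suffice:

```shell
pip install predibase
```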
Creating Private Deployments
For production use cases, create your own private embedding model deployment:
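As a rough illustration, the snippet below wraps deployment creation in a small helper. The `Predibase` client entry point, the `deployments.create` call, and the config shape are assumptions based on common SDK patterns, not a verbatim API reference; check the Predibase SDK documentation for the exact signatures. The model name comes from the supported-models table below.

```python
def create_embedding_deployment(api_token: str, name: str, base_model: str):
    """Hypothetical wrapper: create a private deployment for an embedding model."""
    # Deferred import so this sketch can be read/loaded without the SDK installed.
    from predibase import Predibase

    pb = Predibase(api_token=api_token)
    # NOTE: `deployments.create` and the config dict are assumed shapes;
    # verify against the official Predibase SDK reference.
    return pb.deployments.create(
        name=name,
        config={"base_model": base_model},
    )


# Example (requires a valid Predibase API token):
# deployment = create_embedding_deployment(
#     api_token="<PREDIBASE_API_TOKEN>",
#     name="my-embedder",
#     base_model="WhereIsAI/UAE-Large-V1",
# )
```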
Supported Models
The following embedding models are officially supported for deployment on Predibase:
| Model Name | Architecture | Output Dimensions | License | Always-On Shared Endpoint |
|---|---|---|---|---|
| WhereIsAI/UAE-Large-V1 | BERT | 1024 | MIT | ❌ |
| dunzhang/stella_en_1.5B_v5 | Qwen | 1024 | Apache 2.0 | ❌ |
| distilbert-base-uncased | DistilBERT | 768 | Apache 2.0 | ❌ |
We can add new models to the catalog on a case-by-case basis. If you have a specific model in mind, please reach out to us at support@predibase.com.
Model Details
BERT-based Models
Best for: High-quality embeddings with proven architecture
- WhereIsAI/UAE-Large-V1
  - Strong performance on similarity tasks
  - 1024-dimensional embeddings
  - MIT license
  - Efficient inference
- distilbert-base-uncased
  - Compressed BERT architecture
  - 768-dimensional embeddings
  - Apache 2.0 license
  - Fast inference speed
Qwen-based Models
Best for: State-of-the-art embedding quality
- dunzhang/stella_en_1.5B_v5
  - Large model with 1.5B parameters
  - 1024-dimensional embeddings
  - Apache 2.0 license
  - Advanced semantic understanding
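Whichever model you pick, similarity search typically ranks documents by the cosine similarity of their embedding vectors. A dependency-free sketch of that comparison (the vectors here are illustrative, not real model outputs):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Identical directions score 1.0; orthogonal vectors score 0.0.
# cosine_similarity([1.0, 0.0], [1.0, 0.0])  -> 1.0
# cosine_similarity([1.0, 0.0], [0.0, 1.0])  -> 0.0
```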
Best Practices
- Input Size
  - Check model documentation for maximum input length
  - Consider truncating or chunking long inputs
  - Balance between context and performance
- Batch Processing
  - Implement custom batching for large datasets
  - Monitor memory usage during batch processing
  - Consider async processing for large workloads
- Deployment Configuration
  - Use auto-scaling for cost optimization
  - Monitor performance metrics
  - Choose appropriate GPU based on workload
- Model Selection
  - Consider embedding dimensions vs. quality
  - Match model size to hardware capabilities
  - Evaluate licensing requirements
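The input-size advice above can be sketched with a simple word-level chunker. The overlap keeps context across chunk boundaries; the word counts are illustrative stand-ins for a model's real token limit, which you should take from its documentation.

```python
def chunk_text(text: str, max_words: int = 256, overlap: int = 32) -> list[str]:
    """Split text into overlapping word-level chunks.

    A tokenizer-free approximation: real limits are in tokens, so treat
    `max_words` as a conservative proxy for the model's max input length.
    """
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]

    chunks = []
    step = max_words - overlap  # advance by less than a full chunk to overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```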
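For batch processing, a generic batching helper like the one below keeps request sizes bounded; you would pass each yielded batch to your deployment's embedding endpoint (the endpoint call itself is omitted here, since its exact name depends on the SDK version).

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield fixed-size batches so large datasets aren't sent in one request."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


# Usage: embed documents 32 at a time, watching memory between batches.
# for batch in batched(documents, batch_size=32):
#     embeddings = my_deployment.embed(batch)  # hypothetical endpoint call
```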