Predibase supports a wide range of embedding models for text embeddings and similarity search. This guide helps you:

  • Find available models for your use case
  • Understand model capabilities and requirements
  • Choose between different model options

Quick Start

First, install the Predibase Python SDK:

pip install -U predibase

Creating Private Deployments

For production use cases, create your own private embedding model deployment:

from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<PREDIBASE_API_TOKEN>")

# Create a production deployment with UAE-Large
deployment = pb.deployments.create(
    name="my-embedding-model",
    config=DeploymentConfig(
        base_model="WhereIsAI/UAE-Large-V1",  # High-performance embedding model
        min_replicas=1,     # Always keep one replica running
        max_replicas=2,     # Scale up to 2 replicas under load
        accelerator="a10_24gb_100", # Uses A10G GPU
        speculator="disabled",
        disable_adapters=True, # Adapters are not enabled on embedding models
        max_total_tokens=512   # This model requires max_total_tokens=512 to start on an A10G GPU
    )
)

# Generate embeddings for a single chunk of data
text = "Generate embeddings using your dedicated deployment."
response = pb.embeddings.create(model="my-embedding-model", input=text)
print(f"Generated {len(response.data[0].embedding)}-dimensional embedding")

# Process a batch of documents
documents = [
    "First document for embedding",
    "Second document with different content",
    "Third document to process in batch"
]
batch_embeddings = [
    pb.embeddings.create(model="my-embedding-model", input=doc).data[0].embedding
    for doc in documents
]
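Once you have embeddings, a common next step is similarity search: score each document embedding against a query embedding and pick the closest match. As a minimal sketch, the helpers below use plain Python with cosine similarity; the short vectors in the usage comment are illustrative stand-ins for the 1024-dimensional embeddings returned by the deployment above, and the function names are our own, not part of the Predibase SDK:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def most_similar(query_vec, doc_vecs):
    """Return the index of the document embedding closest to the query."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

# Usage with the variables from the example above (sketch):
#   query_vec = pb.embeddings.create(model="my-embedding-model", input="query").data[0].embedding
#   best = most_similar(query_vec, batch_embeddings)
#   print(documents[best])
```

For large corpora you would typically hand this off to a vector database rather than a linear scan, but the scoring logic is the same.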

Supported Models

The following embedding models are officially supported for deployment on Predibase:

Model Name                 | Architecture | Output Dimensions | License    | Always-On Shared Endpoint
---------------------------|--------------|-------------------|------------|--------------------------
WhereIsAI/UAE-Large-V1     | BERT         | 1024              | MIT        |
dunzhang/stella_en_1.5B_v5 | Qwen         | 1024              | Apache 2.0 |
distilbert-base-uncased    | DistilBERT   | 768               | Apache 2.0 |

We can add new models to our catalog on a case-by-case basis. If you have a specific model in mind, please reach out to us at support@predibase.com.

Model Details

BERT-based Models

Best for: High-quality embeddings with proven architecture

  • WhereIsAI/UAE-Large-V1

    • Strong performance on similarity tasks
    • 1024-dimensional embeddings
    • MIT license
    • Efficient inference
  • distilbert-base-uncased

    • Compressed BERT architecture
    • 768-dimensional embeddings
    • Apache 2.0 license
    • Fast inference speed

Qwen-based Models

Best for: State-of-the-art embedding quality

  • dunzhang/stella_en_1.5B_v5
    • Large model with 1.5B parameters
    • 1024-dimensional embeddings
    • Apache 2.0 license
    • Advanced semantic understanding

Best Practices

  1. Input Size

    • Check model documentation for maximum input length
    • Consider truncating or chunking long inputs
    • Balance between context and performance
  2. Batch Processing

    • Implement custom batching for large datasets
    • Monitor memory usage during batch processing
    • Consider async processing for large workloads
  3. Deployment Configuration

    • Use auto-scaling for cost optimization
    • Monitor performance metrics
    • Choose appropriate GPU based on workload
  4. Model Selection

    • Consider embedding dimensions vs. quality
    • Match model size to hardware capabilities
    • Evaluate licensing requirements
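The Input Size guidance above suggests chunking long inputs before embedding them. One simple approach is an overlapping sliding window over words; this is a sketch under the assumption that word count is a rough proxy for the model's token limit (e.g. the max_total_tokens=512 setting used earlier), and the function name and defaults are our own, not part of the Predibase SDK:

```python
def chunk_text(text, max_words=350, overlap=50):
    """Split text into overlapping word-window chunks.

    Word count is only a rough proxy for tokens; real tokenizers
    may produce more tokens than words, so leave headroom below
    the model's limit (e.g. 512 tokens for UAE-Large on an A10G).
    """
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks = []
    step = max_words - overlap  # advance by window size minus overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the final window already covers the tail
    return chunks

# Usage sketch: embed each chunk with the deployment created above.
#   for chunk in chunk_text(long_document):
#       emb = pb.embeddings.create(model="my-embedding-model", input=chunk).data[0].embedding
```

The overlap keeps sentences that straddle a chunk boundary represented in both neighboring chunks, at the cost of a few extra embedding calls.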