Overview
Introduction to inference with Predibase
Predibase offers flexible options for serving language models, vision models, and embeddings. This guide will help you choose the right deployment option and access method for your needs.
Deployment Options
Private Deployments
🚀 Best for: Production workloads and enterprise use cases
Private deployments are our recommended solution for production environments, offering:
- Dedicated resources with guaranteed availability
- Production-grade SLAs and support
- Customizable configuration for your specific needs
- Auto-scaling options to handle varying workloads (a minimal creation sketch follows this list)
- Full security and isolation
- Learn about private deployments →
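As a rough sketch of what creating a private deployment looks like with the Python SDK (the base model name, replica counts, and the `DeploymentConfig` fields shown are illustrative assumptions; see the private deployments guide for the supported options):

```python
from predibase import Predibase, DeploymentConfig

pb = Predibase(api_token="<YOUR_API_TOKEN>")

# Illustrative config: the deployment name and base model are placeholders;
# auto-scaling behavior is sketched via the min/max replica settings.
deployment = pb.deployments.create(
    name="my-mistral-7b",
    config=DeploymentConfig(
        base_model="mistral-7b-instruct-v0-2",
        min_replicas=0,  # scale to zero when idle
        max_replicas=1,  # scale up under load
    ),
)
```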
Shared Endpoints
✨ Best for: Quick experimentation and development
Shared endpoints are designed for testing and development purposes only:
- Pre-deployed models for rapid prototyping
- Subject to rate limits
- No infrastructure setup needed
- Support for testing custom adapters
- Not recommended for production workloads
- Try shared endpoints →
Access Methods
Python SDK
The Python SDK is the recommended way to interact with Predibase models, offering a simple, intuitive interface with full feature support. To get started, install the SDK with pip:
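```bash
pip install -U predibase
```

Once installed, a minimal generation request looks roughly like the following sketch (the deployment name is a placeholder, and the client/generate calls should be checked against the SDK reference):

```python
from predibase import Predibase

pb = Predibase(api_token="<YOUR_API_TOKEN>")

# Point the client at a deployment (private or shared); the name is a placeholder.
client = pb.deployments.client("my-mistral-7b")

response = client.generate("What is machine learning?", max_new_tokens=100)
print(response.generated_text)
```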
REST API
The REST API is a language-agnostic HTTP interface for integrating Predibase with any programming language or framework. REST API documentation →
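As an illustration only, a generation request over HTTP might look like the sketch below using Python's `requests`; the endpoint URL, tenant ID, deployment name, and response shape are assumptions to verify against the REST API documentation:

```python
import requests

API_TOKEN = "<YOUR_API_TOKEN>"

# Placeholder endpoint: confirm the exact base URL and path for your tenant
# in the REST API documentation.
url = "https://serving.app.predibase.com/<tenant-id>/deployments/v2/llms/<deployment-name>/generate"

resp = requests.post(
    url,
    headers={
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    },
    json={
        "inputs": "What is machine learning?",
        "parameters": {"max_new_tokens": 100},
    },
    timeout=60,
)
resp.raise_for_status()

# Assumed response shape: a JSON object containing the generated text.
print(resp.json()["generated_text"])
```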
Available Models
Browse our catalog of supported models:
- Language Models - Text generation models (Mistral, Mixtral, Llama 2, etc.)
- Vision Models - Image understanding and generation
- Embedding Models - Text embeddings for search and similarity
Custom Models
- Fine-tuned Adapters - By default, all Predibase deployments support serving LoRAs. You can fine-tune adapters on Predibase or bring an already-trained LoRA for serving (a brief sketch follows this list).
- Custom Base Models - Deploy custom models from Hugging Face
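Here is a rough sketch of serving a fine-tuned adapter through the Python SDK; the adapter reference `my-adapter/1` and the `adapter_id` parameter name are assumptions to check against the adapters documentation:

```python
from predibase import Predibase

pb = Predibase(api_token="<YOUR_API_TOKEN>")
client = pb.deployments.client("my-mistral-7b")  # base model deployment (placeholder name)

# Route this request through a fine-tuned LoRA; "my-adapter/1" is a placeholder
# for an adapter repository and version trained or uploaded on Predibase.
response = client.generate(
    "Summarize the following support ticket: ...",
    adapter_id="my-adapter/1",
    max_new_tokens=200,
)
print(response.generated_text)
```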
Additional Features
- Batch Inference - Process multiple inputs efficiently
- Structured Output - Enforce a JSON schema on model responses (see the sketch after this list)
- OpenAI Migration Guide - Easily switch from OpenAI
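For structured output, the general idea is to attach a JSON schema to the generation request; the `response_format` parameter shape below is an assumption to verify against the Structured Output guide:

```python
from predibase import Predibase

pb = Predibase(api_token="<YOUR_API_TOKEN>")
client = pb.deployments.client("my-mistral-7b")  # placeholder deployment name

# JSON schema the response should conform to.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# Assumed parameter shape for schema-constrained generation.
response = client.generate(
    "Extract the person's name and age: 'Alice is 34 years old.'",
    response_format={"type": "json_object", "schema": schema},
    max_new_tokens=100,
)
print(response.generated_text)  # JSON string matching the schema
```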
Next Steps
- Set up a private deployment for production use
- Learn about fine-tuning your own models
- Try shared endpoints for quick experimentation