Shared Deployments
Use Predibase’s shared model deployments
Shared endpoints provide instant access to popular models for development and testing purposes. They’re designed for:
- Quick experimentation and prototyping
- Development and testing environments
- Learning and evaluation of models
- Proof of concept development
Note: Shared endpoints are not intended for production use. For production workloads, we strongly recommend using private deployments.
Quick Start
First, install the Predibase Python SDK with `pip install -U predibase`.
Then you can start experimenting with shared endpoints using just a few lines of code:
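A minimal sketch of such a call is shown here. It assumes the SDK exposes `Predibase(api_token=...)` and `pb.deployments.client(...)`, and that the returned client has a `generate` method whose result carries a `generated_text` attribute. The environment variable name `PREDIBASE_API_TOKEN` and the deployment name `llama-3-1-8b-instruct` are placeholders chosen for illustration.

```python
import os


def quick_generate(client, prompt, max_new_tokens=64):
    """Send one prompt to a deployment client and return the generated text."""
    response = client.generate(prompt, max_new_tokens=max_new_tokens)
    return response.generated_text


def main():
    # Imported here so quick_generate can be reused without the SDK installed.
    from predibase import Predibase

    # Assumption: the API token is stored in the PREDIBASE_API_TOKEN variable.
    pb = Predibase(api_token=os.environ["PREDIBASE_API_TOKEN"])

    # Assumption: placeholder name; pick one from the supported models list.
    client = pb.deployments.client("llama-3-1-8b-instruct")

    print(quick_generate(client, "What is a shared endpoint?"))
```

Call `main()` once your token is set. Keeping the request logic in `quick_generate` makes it easy to point the same code at a private deployment later.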
Available Models
Predibase offers several popular models as shared endpoints for testing and development. See our supported models for the complete list.
Using Shared Endpoints
With Python SDK
Here’s a detailed example showing both basic text generation and streaming responses for development:
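The streaming half can be sketched as a small generator. This assumes the deployment client offers a `generate_stream` method that yields chunks with a `token` attribute carrying `text` and `special` fields, as in LoRAX-style clients; treat those names as assumptions and check them against the SDK reference.

```python
def stream_text(client, prompt, max_new_tokens=128):
    """Yield pieces of generated text as they arrive, skipping special tokens."""
    # Assumption: generate_stream yields objects shaped like chunk.token.text.
    for chunk in client.generate_stream(prompt, max_new_tokens=max_new_tokens):
        token = chunk.token
        if not token.special:
            yield token.text
```

A typical loop would be `for piece in stream_text(client, "Tell me a story"): print(piece, end="", flush=True)`, which prints text incrementally instead of waiting for the full response.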
With REST API
For testing language-agnostic integration, you can use our REST API:
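As a rough sketch, the request below is built with Python's standard library only, so the same shape carries over to any HTTP client. The base URL, path layout, and JSON body (`inputs` plus `parameters`) are assumptions made for illustration; confirm them against the REST API reference before relying on them.

```python
import json
import urllib.request

# Assumption: illustrative base URL and path; verify in the REST API reference.
BASE_URL = "https://serving.app.predibase.com"


def build_generate_request(tenant_id, deployment, api_token, prompt, max_new_tokens=64):
    """Build (but do not send) a POST request for a text generation call."""
    url = f"{BASE_URL}/{tenant_id}/deployments/v2/llms/{deployment}/generate"
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending lets you inspect the URL, headers, and body in tests without touching the network.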
Testing Custom Models
You can test your fine-tuned adapters on shared endpoints during development:
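One way this might look, assuming the deployment client's `generate` method accepts an `adapter_id` keyword and that adapters are addressed as `repository/version`. The ID `my-adapter-repo/1` used in the usage note is hypothetical.

```python
def generate_with_adapter(client, prompt, adapter_id, max_new_tokens=64):
    """Run a prompt through the base model with a fine-tuned adapter applied."""
    # Assumption: adapter_id is passed as a keyword, e.g. "my-adapter-repo/1".
    response = client.generate(
        prompt,
        adapter_id=adapter_id,
        max_new_tokens=max_new_tokens,
    )
    return response.generated_text
```

For example, `generate_with_adapter(client, "Classify this ticket: ...", "my-adapter-repo/1")` would send the prompt to the shared base model with that adapter loaded, provided the adapter was trained on the same base model.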
For more information about using fine-tuned models, see:
Rate Limits
Shared endpoints are subject to rate limits, which restrict how often you can call the API within a given time period. Requests that exceed a limit are rejected with an HTTP 429 (Too Many Requests) status code.
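A common way to handle 429 responses on the client side is to retry with exponential backoff. The helper below is a generic sketch, not part of the Predibase SDK: `send` is any zero-argument callable that performs the request and raises `urllib.error.HTTPError` on failure, and the attempt count and delays are arbitrary defaults.

```python
import time
import urllib.error


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(); on HTTP 429, wait and retry with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit error,
            # and give up once the final attempt has failed.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so tests can replace the real delay with a stub.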
Rate Limits by Tier
Tier | Request Rate Limit | Daily Token Limit | Monthly Token Limit |
---|---|---|---|
Free | 1 request / sec | 1 million tokens / day | 10 million tokens / month |
Developer & Enterprise | 100 requests / sec | 1 million tokens / day | 10 million tokens / month |
VPC | Does not apply | Does not apply | Does not apply |
Rate Limit Headers
When making API requests, you’ll receive the following headers that help you monitor your rate limit status:
Header | Description |
---|---|
x-envoy-ratelimited | Whether the rate limit has been reached |
x-ratelimit-limit | Maximum number of requests allowed before the rate limit is reached |
x-ratelimit-remaining | Number of requests remaining before the rate limit is reached |
x-ratelimit-reset | Time, in seconds, until you can query again |
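These headers can be read from any HTTP response to decide whether to slow down. The sketch below assumes header values are integers, possibly followed by extra policy details after a comma, and that `x-envoy-ratelimited` carries the value `true` when the limit is hit. Both are assumptions about the wire format, so inspect a real response to confirm.

```python
def _leading_int(value):
    """Return the integer at the start of a header value, or None."""
    if value is None:
        return None
    head = str(value).split(",")[0].strip()
    return int(head) if head.isdigit() else None


def rate_limit_status(headers):
    """Summarize the rate limit headers of a response as a plain dict."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        "limited": str(lowered.get("x-envoy-ratelimited", "")).lower() == "true",
        "limit": _leading_int(lowered.get("x-ratelimit-limit")),
        "remaining": _leading_int(lowered.get("x-ratelimit-remaining")),
        "reset_seconds": _leading_int(lowered.get("x-ratelimit-reset")),
    }
```

A caller could, for instance, pause for `reset_seconds` whenever `remaining` drops to zero rather than waiting for a 429.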
For production use cases, use private deployments, which do not have any rate limits.
Moving to Production
When you’re ready to move your application to production:
- Set up a private deployment for production-grade reliability
- Configure auto-scaling to handle your workload
- Take advantage of dedicated resources and SLAs