Rate Limits on Shared Endpoints
The following rate limits apply to our shared endpoints, a pooled resource we offer for getting started, experimentation, and fast iteration. Once you're ready for production, create a private serverless deployment.
Rate limits restrict how often you can access our services within a given time period. When a request exceeds a limit, the API responds with an HTTP 429 (Too Many Requests) status code.
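One common way to handle a 429 response is to retry with exponential backoff. The sketch below is a minimal illustration, not an official client: `send` is a hypothetical zero-argument callable that performs the request and returns `(status_code, body)`, and `max_retries` and `base_delay` are illustrative defaults rather than documented values.

```python
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or retries run out.

    send is a hypothetical callable returning (status_code, body); wrap
    whatever HTTP client you use so that it matches this shape.
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            # Wait base_delay, then 2x, 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
    # Still rate limited after the final attempt; hand the 429 to the caller.
    return status, body
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.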
Rate Limits by Tier
Tier | Request Rate Limit | Daily Token Limit | Monthly Token Limit |
---|---|---|---|
Free | 1 request / sec | 1 million tokens / day | 10 million tokens / month |
Developer & Enterprise | 100 requests / sec | 1 million tokens / day | 10 million tokens / month |
VPC* | Does not apply | Does not apply | Does not apply |
*VPC users do not have access to shared endpoints.
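
To stay under a per-second request limit, a client can space out its own calls instead of waiting for a 429. The sketch below is one possible approach under stated assumptions: the `Throttle` class and its parameters are hypothetical names chosen for illustration, and it enforces a simple minimum interval between requests in a single thread.

```python
import time


class Throttle:
    """Space calls at least 1 / requests_per_sec seconds apart (single-threaded)."""

    def __init__(self, requests_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / requests_per_sec
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

For example, on the Free tier you might create `throttle = Throttle(requests_per_sec=1)` and call `throttle.wait()` before each request.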
Rate Limit Headers
Header | Explanation |
---|---|
x-envoy-ratelimited | Whether the rate limit has been reached |
x-ratelimit-limit | The maximum number of requests allowed before the rate limit is reached |
x-ratelimit-remaining | The number of requests remaining before the rate limit is reached |
x-ratelimit-reset | Time in seconds until you can send requests again |
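
These headers can be used to decide how long to pause before the next request. The sketch below is a rough illustration, not a definitive parser. It assumes the numeric headers arrive as plain number strings and that the mere presence of `x-envoy-ratelimited` means the limit was hit; neither detail is confirmed above. The function name `seconds_to_wait` and the `default` fallback are hypothetical.

```python
def _to_float(value):
    """Parse a header value as a float, returning None if absent or malformed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def seconds_to_wait(headers, default=1.0):
    """Return how many seconds to pause before the next request (0.0 if none).

    headers is any mapping of response header names to string values.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}
    # Assumption: the header being present signals that the limit was reached.
    limited = "x-envoy-ratelimited" in h
    remaining = _to_float(h.get("x-ratelimit-remaining"))
    reset = _to_float(h.get("x-ratelimit-reset"))
    exhausted = remaining is not None and remaining <= 0
    if not (limited or exhausted):
        return 0.0
    # Fall back to a fixed pause when the reset header is missing or malformed.
    return max(reset, 0.0) if reset is not None else default
```

Pausing as soon as `x-ratelimit-remaining` reaches zero, rather than waiting for a 429, avoids spending a request that would be rejected anyway.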