Rate Limits on Serverless

The following rate limits apply to our serverless deployments, which are a shared resource we offer for getting started, experimentation, and fast iteration. Once you're ready for production, create a dedicated deployment.

Rate limits restrict how often you can call our API within a given time period. When a request exceeds a limit, the API rejects it with an HTTP 429 status code.
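One common way to handle a 429 response is to retry with exponential backoff. The following is a minimal sketch, not an official client: `call_with_backoff`, its parameters, and the `send` callable (which stands in for whatever HTTP call you make and returns a status code and body) are all hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` and retry while it returns HTTP 429.

    `send` is a hypothetical stand-in for your own request function.
    The delay doubles after each rate-limited attempt:
    base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        # Only sleep if another attempt is still allowed.
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)
    # Retries exhausted: hand the last 429 response back to the caller.
    return status, body
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting. In practice you may also want to add random jitter to the delay so that many clients do not retry at the same instant.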

Rate Limits by Tier

| Tier | Rate Limit | Daily | Monthly |
|---|---|---|---|
| Free | 1 request / sec | 1 million tokens / day | 10 million tokens / month |
| Developer & Enterprise | 100 requests / sec | 1 million tokens / day | 10 million tokens / month |
| VPC* | Does not apply | Does not apply | Does not apply |

*VPC users do not have access to serverless deployments.
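To stay under a per-second request limit, a client can space out its own requests rather than wait for a 429. The sketch below is one simple approach, assuming a fixed minimum interval between requests; `MinIntervalThrottle` is a hypothetical helper, not part of any official SDK, and the server may count requests differently from how this client does.

```python
import time
from typing import Callable


class MinIntervalThrottle:
    """Client-side throttle that enforces a minimum gap between requests."""

    def __init__(
        self,
        requests_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next request may start

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot one interval after this request starts.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Usage: create one throttle per API key, for example `throttle = MinIntervalThrottle(1.0)` for the Free tier's 1 request / sec, and call `throttle.wait()` immediately before each request. This sketch is not thread-safe; add a lock if several threads share one throttle.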

Rate Limit Headers

| Header | Explanation |
|---|---|
| x-envoy-ratelimited | Whether the rate limit has been reached |
| x-ratelimit-limit | The max number of requests until the rate limit is reached |
| x-ratelimit-remaining | The remaining number of requests until the rate limit is reached |
| x-ratelimit-reset | Amount of time (seconds) until you can query again |
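The headers above can be read from any response to decide how long to pause before the next request. The following is a rough sketch under stated assumptions: it assumes `x-envoy-ratelimited` carries the value `true` when the limit has been reached, and that the numeric headers start with a plain number (anything after a comma is ignored). `RateLimitStatus`, `parse_rate_limit_headers`, and `seconds_to_wait` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limited: bool                    # from x-envoy-ratelimited
    limit: Optional[int]             # from x-ratelimit-limit
    remaining: Optional[int]         # from x-ratelimit-remaining
    reset_seconds: Optional[float]   # from x-ratelimit-reset


def _leading_number(value: Optional[str]) -> Optional[float]:
    """Parse the number at the start of a header value, or return None."""
    if value is None:
        return None
    try:
        return float(value.split(",")[0].strip())
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # Header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}
    limit = _leading_number(h.get("x-ratelimit-limit"))
    remaining = _leading_number(h.get("x-ratelimit-remaining"))
    return RateLimitStatus(
        # Assumption: the header value is "true" when the limit is reached.
        limited=h.get("x-envoy-ratelimited", "").strip().lower() == "true",
        limit=None if limit is None else int(limit),
        remaining=None if remaining is None else int(remaining),
        reset_seconds=_leading_number(h.get("x-ratelimit-reset")),
    )


def seconds_to_wait(status: RateLimitStatus) -> float:
    """Return how long to pause before the next request (0.0 if no pause)."""
    exhausted = status.limited or status.remaining == 0
    if exhausted and status.reset_seconds is not None:
        return status.reset_seconds
    return 0.0
```

For example, with an HTTP library that exposes response headers as a mapping, you could call `seconds_to_wait(parse_rate_limit_headers(response.headers))` and sleep for that long before sending the next request.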