Rate Limits
This page describes the request and token rate limits for the Venice API.
Failed Request Rate Limits
Failed requests, including 500 errors, 503 capacity errors, and 429 rate limit errors, should be retried with exponential backoff.
For 429 rate limit errors, use the x-ratelimit-reset-requests and x-ratelimit-remaining-requests headers to determine when to retry next.
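The retry guidance above can be sketched as a small helper that computes how long to wait before the next attempt. This is a minimal illustration, not an official client: the function name `retry_delay`, the `base` and `cap` values, and the assumption that headers arrive as a dict with lowercase keys are all hypothetical. It treats x-ratelimit-reset-requests as a Unix timestamp, as described in the header table on this page.

```python
import time

# Status codes this page says should be retried.
RETRYABLE_STATUS = {429, 500, 503}


def retry_delay(status, headers, attempt, now=None, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None if not retryable.

    `attempt` is 0 for the first retry. `base` and `cap` are illustrative
    defaults (assumptions), not values published by the API.
    """
    if status not in RETRYABLE_STATUS:
        return None

    # Plain exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    delay = min(cap, base * 2 ** attempt)

    if status == 429:
        try:
            remaining = int(headers["x-ratelimit-remaining-requests"])
            reset_at = float(headers["x-ratelimit-reset-requests"])
        except (KeyError, TypeError, ValueError):
            # Headers missing or unparsable: fall back to backoff alone.
            return delay
        if remaining <= 0:
            # No requests left in this period: wait at least until the reset.
            now = time.time() if now is None else now
            delay = max(delay, reset_at - now)

    return delay
```

A caller would sleep for the returned number of seconds and then resend the request. Adding random jitter to the delay is a common refinement when many clients retry at once; it is left out here to keep the sketch deterministic.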
To protect our infrastructure from abuse, if a user generates more than 20 failed requests in a 30-second window, the API returns a 429 error indicating that the error rate limit has been reached.
Paid Tier Rate Limits
Rate limits apply to users who have purchased API credits or staked VVV to gain Diem.
Helpful links:
- Real time rate limits
- Rate limit logs - View requests that have hit the rate limiter
We will continue to monitor usage. As we add compute capacity to the network, we will review these limits. If you are consistently hitting rate limits, please contact [email protected] or post in the #API channel in Discord for assistance and we can work with you to raise your limits.
Paid Tier - LLMs
Model | Model ID | Req / Min | Req / Day | Tokens / Min |
---|---|---|---|---|
Llama 3.2 3B | llama-3.2-3b | 500 | 288,000 | 1,000,000 |
Qwen 3 4B | qwen3-4b | 500 | 288,000 | 1,000,000 |
Deepseek Coder V2 | deepseek-coder-v2-lite | 75 | 54,000 | 750,000 |
Qwen 2.5 Coder 32B | qwen-2.5-coder-32b | 75 | 54,000 | 750,000 |
Qwen 2.5 QWQ 32B | qwen-2.5-qwq-32b | 75 | 54,000 | 750,000 |
Dolphin 72B | dolphin-2.9.2-qwen2-72b | 50 | 36,000 | 750,000 |
Llama 3.3 70B | llama-3.3-70b | 50 | 36,000 | 750,000 |
Mistral Small 3.1 24B | mistral-31-24b | 50 | 36,000 | 750,000 |
Qwen 2.5 VL 72B | qwen-2.5-vl | 50 | 36,000 | 750,000 |
Qwen 3 235B | qwen3-235b | 50 | 36,000 | 750,000 |
Llama 3.1 405B | llama-3.1-405b | 20 | 15,000 | 750,000 |
Deepseek R1 671B | deepseek-r1-671b | 15 | 10,000 | 200,000 |
Paid Tier - Image Models
Model | Model ID | Req / Min | Req / Day |
---|---|---|---|
Flux | flux-dev / flux-dev-uncensored | 20 | 14,400 |
All others | All | 20 | 28,800 |
Paid Tier - Audio Models
Model | Model ID | Req / Min | Req / Day |
---|---|---|---|
All Audio Models | All | 60 | 86,400 |
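One way to stay under the Req / Min values in the tables above is to pace requests on the client side. The sketch below is a hypothetical sliding-window limiter. The class name and the 60-second window are assumptions, and this page does not state how the server counts requests, so treat it as a conservative approximation and not as a mirror of the server's logic.

```python
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter: at most `max_requests` per `window_seconds`."""

    def __init__(self, max_requests, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._sent = deque()  # send times of requests still inside the window

    def wait_time(self, now):
        """Seconds to wait at time `now` before another request fits."""
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # Window is full: wait until the oldest request leaves it.
        return self._sent[0] + self.window - now

    def record(self, now):
        """Record that a request was sent at time `now`."""
        self._sent.append(now)
```

For example, `SlidingWindowLimiter(50)` would match the 50 Req / Min listed for llama-3.3-70b. Before each call, the client checks `wait_time(time.monotonic())`, sleeps for that long if it is non-zero, and then calls `record` with the time at which the request is actually sent.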
Rate Limit and Consumption Headers
You can monitor your API utilization and remaining requests by evaluating the following headers:
Header | Description |
---|---|
x-ratelimit-limit-requests | The number of requests you’ve made in the current evaluation period. |
x-ratelimit-remaining-requests | The remaining requests you can make in the current evaluation period. |
x-ratelimit-reset-requests | The Unix timestamp at which the request rate limit resets. |
x-ratelimit-limit-tokens | The number of total (prompt + completion) tokens used within a 1 minute sliding window. |
x-ratelimit-remaining-tokens | The remaining number of total tokens that can be used during the evaluation period. |
x-ratelimit-reset-tokens | The duration of time in seconds until the token rate limit resets. |
x-venice-balance-diem | The user’s Diem balance before the request has been processed. |
x-venice-balance-usd | The user’s USD balance before the request has been processed. |
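As an illustration of reading these headers, the sketch below collects a few of them into a small record. The names `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical, and the code assumes the headers are available as a mapping with lowercase keys and numeric string values. Note the different units described in the table: the request reset is a Unix timestamp, while the token reset is a duration in seconds.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    remaining_requests: Optional[int]
    requests_reset_at: Optional[float]  # Unix timestamp
    remaining_tokens: Optional[int]
    tokens_reset_in: Optional[float]    # seconds until reset
    balance_diem: Optional[float]
    balance_usd: Optional[float]


def _parse(headers: Mapping[str, str], name: str, cast):
    """Return the header converted with `cast`, or None if absent or invalid."""
    value = headers.get(name)
    if value is None:
        return None
    try:
        return cast(value)
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Extract rate limit and balance information from response headers."""
    return RateLimitStatus(
        remaining_requests=_parse(headers, "x-ratelimit-remaining-requests", int),
        requests_reset_at=_parse(headers, "x-ratelimit-reset-requests", float),
        remaining_tokens=_parse(headers, "x-ratelimit-remaining-tokens", int),
        tokens_reset_in=_parse(headers, "x-ratelimit-reset-tokens", float),
        balance_diem=_parse(headers, "x-venice-balance-diem", float),
        balance_usd=_parse(headers, "x-venice-balance-usd", float),
    )
```

Fields come back as `None` when a header is missing or cannot be parsed, so callers can distinguish "no information" from a real zero.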