This page describes the request and token rate limits for the Venice API.
When a request is rate limited, use the x-ratelimit-reset-requests and x-ratelimit-remaining-requests response headers to determine when to next retry.
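As a rough illustration of how those two headers could drive a retry decision, here is a minimal Python sketch. It assumes the headers arrive as a mapping with lowercase keys and that x-ratelimit-reset-requests holds a Unix timestamp in seconds, as the header table on this page describes. The function name `retry_delay` and the `fallback` parameter are hypothetical and not part of the Venice API.

```python
import time


def retry_delay(headers, now=None, fallback=1.0):
    """Return the number of seconds to wait before sending the next request.

    `headers` is any mapping of lowercase header names to string values.
    `fallback` (an assumption) is used when no reset header is present.
    """
    now = time.time() if now is None else now
    remaining = headers.get("x-ratelimit-remaining-requests")
    reset_at = headers.get("x-ratelimit-reset-requests")

    # Requests are still available in this period: no need to wait.
    if remaining is not None and int(remaining) > 0:
        return 0.0

    # No reset information: fall back to a fixed delay.
    if reset_at is None:
        return fallback

    # Wait until the reset timestamp, never a negative duration.
    return max(0.0, float(reset_at) - now)
```

A caller might sleep for the returned duration before retrying, and could combine it with exponential backoff for other kinds of failures.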
To protect our infrastructure from abuse, if a user generates more than 20 failed requests in a 30-second window, the API returns a 429 error indicating that the error rate limit has been reached.
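A client can avoid tripping this limit by counting its own failures. The sketch below is one possible approach, not how the API itself counts failures. It assumes a sliding 30-second window and pauses once 20 failures have been recorded, so the next failure would not exceed the threshold. The class name `FailureWindow` and its methods are hypothetical.

```python
from collections import deque


class FailureWindow:
    """Client-side count of failed requests within a sliding time window."""

    def __init__(self, max_failures=20, window_seconds=30.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = deque()  # timestamps of recent failures, oldest first

    def record_failure(self, now):
        """Record a failed request that happened at time `now` (seconds)."""
        self._failures.append(now)

    def should_pause(self, now):
        """Return True if one more failure could exceed the limit."""
        # Drop failures that have aged out of the window.
        while self._failures and now - self._failures[0] >= self.window_seconds:
            self._failures.popleft()
        return len(self._failures) >= self.max_failures
```

Passing timestamps in explicitly (for example from `time.monotonic()`) keeps the class easy to test. A caller would check `should_pause` before each request and wait when it returns True.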
Text models

Model | Model ID | Req / Min | Req / Day | Tokens / Min |
---|---|---|---|---|
Llama 3.2 3B | llama-3.2-3b | 500 | 288,000 | 1,000,000 |
Qwen 3 4B | qwen3-4b | 500 | 288,000 | 1,000,000 |
Deepseek Coder V2 | deepseek-coder-v2-lite | 75 | 54,000 | 750,000 |
Qwen 2.5 Coder 32B | qwen-2.5-coder-32b | 75 | 54,000 | 750,000 |
Qwen 2.5 QWQ 32B | qwen-2.5-qwq-32b | 75 | 54,000 | 750,000 |
Dolphin 72B | dolphin-2.9.2-qwen2-72b | 50 | 36,000 | 750,000 |
Llama 3.3 70B | llama-3.3-70b | 50 | 36,000 | 750,000 |
Mistral Small 3.1 24B | mistral-31-24b | 50 | 36,000 | 750,000 |
Qwen 2.5 VL 72B | qwen-2.5-vl | 50 | 36,000 | 750,000 |
Qwen 3 235B | qwen3-235b | 50 | 36,000 | 750,000 |
Llama 3.1 405B | llama-3.1-405b | 20 | 15,000 | 750,000 |
Deepseek R1 671B | deepseek-r1-671b | 15 | 10,000 | 200,000 |

Image models

Model | Model ID | Req / Min | Req / Day |
---|---|---|---|
Flux | flux-dev / flux-dev-uncensored | 20 | 14,400 |
All others | All | 20 | 28,800 |

Audio models

Model | Model ID | Req / Min | Req / Day |
---|---|---|---|
All Audio Models | All | 60 | 86,400 |

Rate limit response headers

Header | Description |
---|---|
x-ratelimit-limit-requests | The number of requests you’ve made in the current evaluation period. |
x-ratelimit-remaining-requests | The remaining requests you can make in the current evaluation period. |
x-ratelimit-reset-requests | The Unix timestamp at which the request rate limit resets. |
x-ratelimit-limit-tokens | The number of total (prompt + completion) tokens used within a 1 minute sliding window. |
x-ratelimit-remaining-tokens | The remaining number of total tokens that can be used during the evaluation period. |
x-ratelimit-reset-tokens | The duration of time in seconds until the token rate limit resets. |
x-venice-balance-diem | The user’s Diem balance before the request has been processed. |
x-venice-balance-usd | The user’s USD balance before the request has been processed. |
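The headers above can be collected into one structure for logging or throttling. The following Python sketch assumes every value is a plain numeric string and that header names are lowercase. The exact value formats are not specified on this page, so anything missing or unparseable is returned as `None`. `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    remaining_requests: Optional[int]
    reset_requests_at: Optional[float]  # Unix timestamp
    remaining_tokens: Optional[int]
    reset_tokens_in: Optional[float]    # seconds until the token limit resets
    balance_diem: Optional[float]
    balance_usd: Optional[float]


def _parse(headers: Mapping[str, str], name: str, cast):
    """Return cast(headers[name]), or None if absent or not numeric."""
    value = headers.get(name)
    if value is None:
        return None
    try:
        return cast(value)
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Collect the rate limit and balance headers into one object."""
    return RateLimitInfo(
        remaining_requests=_parse(headers, "x-ratelimit-remaining-requests", int),
        reset_requests_at=_parse(headers, "x-ratelimit-reset-requests", float),
        remaining_tokens=_parse(headers, "x-ratelimit-remaining-tokens", int),
        reset_tokens_in=_parse(headers, "x-ratelimit-reset-tokens", float),
        balance_diem=_parse(headers, "x-venice-balance-diem", float),
        balance_usd=_parse(headers, "x-venice-balance-usd", float),
    )
```

Note the differing units: per the table, x-ratelimit-reset-requests is an absolute timestamp while x-ratelimit-reset-tokens is a duration in seconds, so the two fields are named differently here.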