Rate Limits
This page describes the request and token rate limits for the Venice API.
Paid Tier Rate Limits
The following rate limits apply to paid-tier users, meaning users who have purchased API credits or staked VVV to obtain VCU.
Helpful links:
- Real-time rate limits
- Rate limit logs: view requests that have hit the rate limiter
We will continue to monitor usage and will review these limits as we add compute capacity to the network. If you are consistently hitting rate limits, email Venice support or post in the #API channel on Discord, and we can work with you to raise your limits.
Paid Tier - LLMs
Model | Model ID | Req / Min | Req / Day | Tokens / Min |
---|---|---|---|---|
Llama 3.2 3B | llama-3.2-3b | 500 | 288,000 | 1,000,000 |
Qwen 3 4B | qwen3-4b | 500 | 288,000 | 1,000,000 |
Deepseek Coder V2 | deepseek-coder-v2-lite | 75 | 54,000 | 750,000 |
Qwen 2.5 Coder 32B | qwen-2.5-coder-32b | 75 | 54,000 | 750,000 |
Qwen 2.5 QWQ 32B | qwen-2.5-qwq-32b | 75 | 54,000 | 750,000 |
Dolphin 72B | dolphin-2.9.2-qwen2-72b | 50 | 36,000 | 750,000 |
Llama 3.3 70B | llama-3.3-70b | 50 | 36,000 | 750,000 |
Mistral Small 3.1 24B | mistral-31-24b | 50 | 36,000 | 750,000 |
Qwen 2.5 VL 72B | qwen-2.5-vl | 50 | 36,000 | 750,000 |
Qwen 3 235B | qwen3-235b | 50 | 36,000 | 750,000 |
Llama 3.1 405B | llama-3.1-405b | 20 | 15,000 | 750,000 |
Deepseek R1 671B | deepseek-r1-671b | 15 | 10,000 | 200,000 |
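To stay under these limits on the client side instead of waiting for the server to reject requests, one option is a sliding-window counter. The sketch below is a minimal illustration under stated assumptions, not Venice's implementation. It assumes both requests and tokens are counted over a rolling 60-second window (the headers section describes the token limit this way; applying it to requests is an assumption). It tracks only the per-minute columns, not Req / Day. The caller must estimate token counts before sending. `SlidingWindowLimiter`, `LLM_LIMITS`, and `try_acquire` are hypothetical names.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple

# Per-minute limits copied from the Paid Tier LLM table (subset shown; extend as needed).
LLM_LIMITS = {
    "llama-3.3-70b": {"req_per_min": 50, "tokens_per_min": 750_000},
    "deepseek-r1-671b": {"req_per_min": 15, "tokens_per_min": 200_000},
}


class SlidingWindowLimiter:
    """Hypothetical client-side guard tracking requests and tokens sent in the last 60 s.

    Assumes a rolling 60-second window for both limits; daily caps are not tracked.
    """

    def __init__(self, req_per_min: int, tokens_per_min: int, window: float = 60.0):
        self.req_per_min = req_per_min
        self.tokens_per_min = tokens_per_min
        self.window = window
        # One (timestamp, tokens) entry per request still inside the window.
        self.events: Deque[Tuple[float, int]] = deque()

    def _evict(self, now: float) -> None:
        # Drop requests that have aged out of the sliding window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def try_acquire(self, tokens: int, now: Optional[float] = None) -> bool:
        """Record the request and return True if it fits both limits, else return False."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if len(self.events) >= self.req_per_min:
            return False  # request-per-minute budget exhausted
        used_tokens = sum(t for _, t in self.events)
        if used_tokens + tokens > self.tokens_per_min:
            return False  # token-per-minute budget exhausted
        self.events.append((now, tokens))
        return True
```

For example, with the `deepseek-r1-671b` limits, 15 calls at the same instant succeed, the 16th is refused, and capacity returns once the earlier calls are 60 seconds old.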
Paid Tier - Image Models
Model | Model ID | Req / Min | Req / Day |
---|---|---|---|
Flux | flux-dev / flux-dev-uncensored | 20 | 14,400 |
All others | All | 20 | 28,800 |
Paid Tier - Audio Models
Model | Model ID | Req / Min | Req / Day |
---|---|---|---|
All Audio Models | All | 60 | 86,400 |
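The per-minute and per-day columns interact: for some models, sending at the full per-minute rate reaches the daily cap well before the day ends. The short calculation below uses only figures from the tables above; `minutes_to_daily_cap` and the `LIMITS` labels are illustrative names, and it assumes a sustained, uninterrupted rate.

```python
# (Req / Min, Req / Day) pairs copied from the tables above.
LIMITS = {
    "llama-3.2-3b": (500, 288_000),
    "llama-3.3-70b": (50, 36_000),
    "flux-dev": (20, 14_400),
    "other image models": (20, 28_800),
    "audio models": (60, 86_400),
}

MINUTES_PER_DAY = 24 * 60  # 1,440


def minutes_to_daily_cap(req_per_min: int, req_per_day: int) -> float:
    """Minutes of sending at the full per-minute rate before the daily cap is reached."""
    return req_per_day / req_per_min


for model, (per_min, per_day) in LIMITS.items():
    minutes = minutes_to_daily_cap(per_min, per_day)
    if minutes < MINUTES_PER_DAY:
        note = "daily cap reached before the day ends"
    else:
        note = "per-minute limit is the effective constraint"
    print(f"{model}: {minutes:.0f} min ({note})")
```

For example, flux-dev at 20 requests/min reaches its 14,400 daily cap after 720 minutes (12 hours), and llama-3.2-3b at 500 requests/min reaches 288,000 after 576 minutes, whereas the other image models and the audio models can sustain their per-minute rate for the full 1,440-minute day.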
Rate Limit and Consumption Headers
You can monitor your API utilization and remaining requests by inspecting the following response headers:
Header | Description |
---|---|
x-ratelimit-limit-requests | The number of requests you’ve made in the current evaluation period. |
x-ratelimit-remaining-requests | The remaining requests you can make in the current evaluation period. |
x-ratelimit-reset-requests | The Unix timestamp at which the request rate limit resets. |
x-ratelimit-limit-tokens | The number of total (prompt + completion) tokens used within a 1-minute sliding window. |
x-ratelimit-remaining-tokens | The remaining number of total tokens that can be used during the evaluation period. |
x-ratelimit-reset-tokens | The time, in seconds, until the token rate limit resets. |
x-venice-balance-vcu | The user’s VCU balance before the request is processed. |
x-venice-balance-usd | The user’s USD balance before the request is processed. |
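A client can use these headers to decide how long to pause before sending the next request. The sketch below assumes the header values arrive as plain numeric strings and, following the descriptions above, treats x-ratelimit-reset-requests as a Unix timestamp in seconds and x-ratelimit-reset-tokens as a duration in seconds. `seconds_until_retry` is a hypothetical helper; headers are passed in as a plain mapping so the example does not depend on any particular HTTP library.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Hypothetical helper: seconds to wait before the next request, from rate-limit headers.

    Assumes numeric string values; x-ratelimit-reset-requests is a Unix timestamp and
    x-ratelimit-reset-tokens is a duration in seconds. Missing headers mean "no wait".
    """
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize them to lowercase.
    h = {k.lower(): v for k, v in headers.items()}
    wait = 0.0

    # Request budget exhausted: wait until the reset timestamp.
    if "x-ratelimit-remaining-requests" in h and float(h["x-ratelimit-remaining-requests"]) <= 0:
        reset_at = float(h.get("x-ratelimit-reset-requests", now))
        wait = max(wait, reset_at - now)

    # Token budget exhausted: wait for the stated duration.
    if "x-ratelimit-remaining-tokens" in h and float(h["x-ratelimit-remaining-tokens"]) <= 0:
        wait = max(wait, float(h.get("x-ratelimit-reset-tokens", 0)))

    return max(wait, 0.0)
```

Taking the larger of the two waits ensures the next request respects whichever limit is exhausted for longer.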