Rate Limits
This page describes the request and token rate limits for the Venice API.
Paid Tier Rate Limits
These rate limits apply to users who have purchased USD credits or staked VVV to gain VCU.
We monitor usage and will review these limits as we add compute capacity to the network. If you are consistently hitting rate limits, contact support@venice.ai or post in the #API channel in Discord, and we can work with you to raise your limits.
Paid Tier - LLMs
Model | Model ID | Req / Min | Req / Day | Tokens / Min |
---|---|---|---|---|
Llama 3.2 3B | llama-3.2-3b | 500 | 288,000 | 1,000,000 |
Deepseek Coder V2 | deepseek-coder-v2-lite | 75 | 54,000 | 750,000 |
Qwen 2.5 Coder 32B | qwen-2.5-coder-32b | 75 | 54,000 | 750,000 |
Qwen 2.5 QWQ 32B | qwen-2.5-qwq-32b | 75 | 54,000 | 750,000 |
Dolphin 72B | dolphin-2.9.2-qwen2-72b | 50 | 36,000 | 750,000 |
Llama 3.3 70B | llama-3.3-70b | 50 | 36,000 | 750,000 |
Mistral Small 3.1 24B | mistral-31-24b | 50 | 36,000 | 750,000 |
Qwen 2.5 VL 72B | qwen-2.5-vl | 50 | 36,000 | 750,000 |
Llama 4 Maverick 17B (402B Total Params) | llama-4-maverick-17b | 50 | 36,000 | 750,000 |
Llama 3.1 405B | llama-3.1-405b | 20 | 15,000 | 750,000 |
Deepseek R1 671B | deepseek-r1-671b | 15 | 10,000 | 200,000 |
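To stay under a Req / Min value from the table above, a client can throttle itself before sending. The following is a minimal sketch of a client-side sliding-window limiter, not part of the Venice API: the class name `SlidingWindowLimiter` and its methods are hypothetical, and it assumes the request limit is evaluated over a rolling 60-second window, which this page only states explicitly for tokens.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter: at most `max_requests` calls per `window` seconds.

    Hypothetical helper; assumes a rolling window for request limits.
    """

    def __init__(self, max_requests, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock          # injectable so the limiter can be tested
        self._stamps = deque()      # send times of recent requests

    def wait_time(self):
        """Return seconds to wait before the next request (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            return 0.0
        # The oldest request must age out before another one fits.
        return self.window - (now - self._stamps[0])

    def record(self):
        """Record that a request was just sent."""
        self._stamps.append(self.clock())
```

For example, `SlidingWindowLimiter(50)` matches the 50 Req / Min listed for `llama-3.3-70b`: call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` after sending it. This does not account for the Req / Day or Tokens / Min limits, which would need separate tracking.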
Paid Tier - Image Models
Model | Model ID | Req / Min | Req / Day |
---|---|---|---|
Flux | flux-dev / flux-dev-uncensored | 20 | 14,400 |
All others | All | 20 | 28,800 |
Paid Tier - Audio Models
Model | Model ID | Req / Min | Req / Day |
---|---|---|---|
All Audio Models | All | 60 | 86,400 |
Rate Limit and Consumption Headers
You can monitor your API utilization and remaining requests by inspecting the following response headers:
Header | Description |
---|---|
x-ratelimit-limit-requests | The number of requests you’ve made in the current evaluation period. |
x-ratelimit-remaining-requests | The remaining requests you can make in the current evaluation period. |
x-ratelimit-reset-requests | The Unix timestamp at which the request rate limit will reset. |
x-ratelimit-limit-tokens | The number of total (prompt + completion) tokens used within a 1 minute sliding window. |
x-ratelimit-remaining-tokens | The remaining number of total tokens that can be used during the evaluation period. |
x-ratelimit-reset-tokens | The time in seconds until the token rate limit resets. |
x-venice-balance-vcu | The user’s VCU balance before the request has been processed. |
x-venice-balance-usd | The user’s USD balance before the request has been processed. |
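The headers above can be read after each response to decide whether to pause before the next request. The following is a rough sketch under stated assumptions: the header names come from the table, but the names `RateLimitStatus`, `parse_rate_limit_headers`, and `seconds_to_wait` are hypothetical, and the code assumes each value arrives as a plain numeric string and that the Unix timestamp is in seconds.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    remaining_requests: Optional[int]
    reset_requests_unix: Optional[float]   # assumed to be seconds since epoch
    remaining_tokens: Optional[int]
    reset_tokens_seconds: Optional[float]
    balance_vcu: Optional[float]
    balance_usd: Optional[float]


def _to_float(value: Optional[str]) -> Optional[float]:
    """Parse a header value as a number; return None if absent or malformed."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def _to_int(value: Optional[str]) -> Optional[int]:
    number = _to_float(value)
    return None if number is None else int(number)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    """Extract rate limit and balance headers (names matched case-insensitively)."""
    h = {k.lower(): v for k, v in headers.items()}
    return RateLimitStatus(
        remaining_requests=_to_int(h.get("x-ratelimit-remaining-requests")),
        reset_requests_unix=_to_float(h.get("x-ratelimit-reset-requests")),
        remaining_tokens=_to_int(h.get("x-ratelimit-remaining-tokens")),
        reset_tokens_seconds=_to_float(h.get("x-ratelimit-reset-tokens")),
        balance_vcu=_to_float(h.get("x-venice-balance-vcu")),
        balance_usd=_to_float(h.get("x-venice-balance-usd")),
    )


def seconds_to_wait(status: RateLimitStatus, now: float) -> float:
    """Return how long to pause, given `now` as a Unix time in seconds."""
    wait = 0.0
    # Requests exhausted: wait until the reset timestamp.
    if status.remaining_requests == 0 and status.reset_requests_unix is not None:
        wait = max(wait, status.reset_requests_unix - now)
    # Tokens exhausted: the reset header is already a duration.
    if (status.remaining_tokens is not None and status.remaining_tokens <= 0
            and status.reset_tokens_seconds is not None):
        wait = max(wait, status.reset_tokens_seconds)
    return wait
```

With an HTTP client whose response exposes a header mapping, this could be used as `time.sleep(seconds_to_wait(parse_rate_limit_headers(response.headers), time.time()))`. Missing or unparseable headers become `None` and never trigger a pause, so the sketch degrades to sending immediately.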