Rate Limits
This page describes the request and token rate limits for the Venice API.
The Venice API implements two tiers of rate limits:
Explorer:
This is a trial tier for testing the Venice API. It is included for Pro users.
Paid:
This is a paid tier. Requests are debited from the user's VCU balance first and, once VCU is depleted, from the USD balance. If neither a VCU nor a USD balance is available, requests fall back to the Explorer tier for Pro users and fail for free users.
These rate limits apply at the user level, regardless of how many keys are created.
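The fallback order described above can be sketched as a small decision function. This is only an illustration of the rules as stated on this page, not Venice's server-side logic: the `Account` class, the `resolve_tier` function, and the returned labels are hypothetical names, and the sketch assumes any positive balance is enough to cover a request.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical view of a user's state; not a Venice API object."""
    is_pro: bool
    vcu_balance: float
    usd_balance: float


def resolve_tier(account: Account) -> str:
    """Return which balance or tier would serve the next request."""
    # VCU is debited first.
    if account.vcu_balance > 0:
        return "paid:vcu"
    # USD is debited once VCU is depleted.
    if account.usd_balance > 0:
        return "paid:usd"
    # With no balance, Pro users fall back to the Explorer tier.
    if account.is_pro:
        return "explorer"
    # Free users with no balance cannot make requests.
    raise PermissionError("no VCU or USD balance, and not a Pro user")
```

For example, `resolve_tier(Account(is_pro=True, vcu_balance=0, usd_balance=0))` returns `"explorer"`, while the same balances on a free account raise `PermissionError`.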
Explorer Tier Rate Limits
These rate limits apply to Pro users who are testing the API with their application. These users can upgrade to Paid tier limits by purchasing USD credits or staking VVV to gain VCU.
Explorer Tier - LLMs
Model | Model ID | Req / Min | Req / Day | Tokens / Min |
---|---|---|---|---|
Deepseek Coder V2 | deepseek-coder-v2-lite | 5 | 100 | 30,000 |
Deepseek R1 70B | deepseek-r1-llama-70b | 5 | 100 | 30,000 |
Deepseek R1 671B | deepseek-r1-671b | 1 | 20 | 30,000 |
Dolphin 72B | dolphin-2.9.2-qwen2-72b | 5 | 100 | 30,000 |
Llama 3.2 3B | llama-3.2-3b | 5 | 100 | 30,000 |
Llama 3.3 70B | llama-3.3-70b | 5 | 100 | 30,000 |
Llama 3.1 405B | llama-3.1-405b | 2 | 40 | 30,000 |
Qwen 2.5 Coder 32B | qwen-2.5-coder-32b | 5 | 100 | 30,000 |
Qwen 2.5 QWQ 32B | qwen-2.5-qwq-32b | 5 | 100 | 30,000 |
Qwen 2.5 VL | qwen-2.5-vl | 5 | 100 | 30,000 |
Mistral Small 3.1 24B | mistral-31-24b | 5 | 100 | 30,000 |
Explorer Tier - Image Models
Model | Model ID | Req / Min | Req / Day |
---|---|---|---|
All Image Models | All | 1 | 20 |
Explorer Tier - Audio Models
Model | Model ID | Req / Min | Req / Day |
---|---|---|---|
All Audio Models | All | 5 | 100 |
Paid Tier Rate Limits
These rate limits apply to users who have purchased USD credits or staked VVV to gain VCU.
Paid Tier - LLMs
Model | Model ID | Req / Min | Req / Day | Tokens / Min |
---|---|---|---|---|
Deepseek Coder V2 | deepseek-coder-v2-lite | 75 | 54,000 | 450,000 |
Deepseek R1 671B | deepseek-r1-671b | 15 | 10,000 | 200,000 |
Dolphin 72B | dolphin-2.9.2-qwen2-72b | 50 | 36,000 | 300,000 |
Llama 3.2 3B | llama-3.2-3b | 500 | 288,000 | 100,000 |
Llama 3.3 70B | llama-3.3-70b | 50 | 36,000 | 300,000 |
Llama 3.1 405B | llama-3.1-405b | 20 | 15,000 | 100,000 |
Qwen 2.5 Coder 32B | qwen-2.5-coder-32b | 75 | 54,000 | 450,000 |
Qwen 2.5 QWQ 32B | qwen-2.5-qwq-32b | 75 | 54,000 | 450,000 |
Qwen 2.5 VL | qwen-2.5-vl | 50 | 36,000 | 300,000 |
Mistral Small 3.1 24B | mistral-31-24b | 50 | 36,000 | 300,000 |
Paid Tier - Image Models
Model | Model ID | Req / Min | Req / Day |
---|---|---|---|
Flux | flux-dev / flux-dev-uncensored | 20 | 14,400 |
All others | All | 20 | 28,800 |
Paid Tier - Audio Models
Model | Model ID | Req / Min | Req / Day |
---|---|---|---|
All Audio Models | All | 60 | 86,400 |
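One way to stay under a Req / Min limit from the tables above is to throttle on the client side. The sketch below assumes requests are counted over a 60-second sliding window; this page states a sliding window only for tokens, so treat that as an assumption. `SlidingWindowLimiter` is a hypothetical helper, not part of any Venice SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter: at most max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the limiter can be tested
        self._sent = deque()        # send times of requests still in the window

    def wait_time(self):
        """Seconds to wait before the next request fits in the window."""
        now = self.clock()
        # Drop send times that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # The window is full; wait until the oldest request ages out.
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Record that a request was sent now."""
        self._sent.append(self.clock())
```

A caller would sleep for `limiter.wait_time()` seconds, send the request, and then call `limiter.record()`. For the Paid tier limit of 50 requests per minute on `llama-3.3-70b`, the limiter would be created as `SlidingWindowLimiter(50)`.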
Rate Limit and Consumption Headers
You can monitor your API utilization and remaining requests by inspecting the following response headers:
Header | Description |
---|---|
x-ratelimit-limit-requests | The number of requests you’ve made in the current evaluation period. |
x-ratelimit-remaining-requests | The remaining requests you can make in the current evaluation period. |
x-ratelimit-reset-requests | The Unix timestamp at which the request rate limit will reset. |
x-ratelimit-limit-tokens | The number of total (prompt + completion) tokens used within a 1-minute sliding window. |
x-ratelimit-remaining-tokens | The remaining number of total tokens that can be used during the evaluation period. |
x-ratelimit-reset-tokens | The time in seconds until the token rate limit resets. |
x-venice-balance-vcu | The user’s VCU balance before the request has been processed. |
x-venice-balance-usd | The user’s USD balance before the request has been processed. |
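As a rough illustration of how these headers might be used, the sketch below computes how long a client should pause before its next request. It follows the table as written: `x-ratelimit-reset-requests` is treated as a Unix timestamp and `x-ratelimit-reset-tokens` as a number of seconds. The function name `seconds_until_allowed` is hypothetical, and the sketch assumes header names are lowercase keys with numeric string values.

```python
import time
from typing import Mapping, Optional


def _num(headers: Mapping[str, str], name: str, default: float) -> float:
    """Read a numeric header, falling back to default when it is absent."""
    value = headers.get(name)
    return default if value is None else float(value)


def seconds_until_allowed(headers: Mapping[str, str],
                          now: Optional[float] = None) -> float:
    """Return how many seconds to wait before sending another request."""
    now = time.time() if now is None else now
    wait = 0.0
    # Request limit exhausted: the reset header is an absolute Unix timestamp.
    if _num(headers, "x-ratelimit-remaining-requests", 1) <= 0:
        reset_at = _num(headers, "x-ratelimit-reset-requests", now)
        wait = max(wait, reset_at - now)
    # Token limit exhausted: the reset header is a duration in seconds.
    if _num(headers, "x-ratelimit-remaining-tokens", 1) <= 0:
        wait = max(wait, _num(headers, "x-ratelimit-reset-tokens", 0.0))
    # Never return a negative wait if the reset time is already past.
    return max(wait, 0.0)
```

With an HTTP client whose response headers behave like a mapping, the result can be passed straight to `time.sleep`. If the client does not normalize header names, lowercase the keys first.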