Rate Limits
Venice has two tiers of utilization:
Explorer: A trial tier for testing the Venice API. It is included for PRO users and for users staking VVV who have not yet had VCU allocated.
Paid: The paid tier. Requests are debited from your VCU balance first; once VCU is depleted, they are debited from your USD balance. If neither a VCU nor a USD balance is available, requests will fail.
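The debit order for the Paid tier can be sketched as a small helper. This is only an illustration of the rule stated above: the function name `funding_source` is hypothetical, and the sketch does not model how a single request is split when the VCU balance runs out partway, since that is not described here.

```python
def funding_source(vcu_balance: float, usd_balance: float) -> str:
    """Return which balance a Paid-tier request would be debited from first.

    Hypothetical helper: it mirrors the documented order (VCU, then USD)
    and raises when neither balance is available.
    """
    if vcu_balance > 0:
        return "VCU"
    if usd_balance > 0:
        return "USD"
    raise RuntimeError("No VCU or USD balance available; the request would fail")
```

For example, `funding_source(0, 12.5)` returns `"USD"` because the VCU balance is depleted.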
Explorer Tier Rate Limits
These rate limits apply to PRO users who are not spending VCU or USD on Venice API inference.
Model | ModelID | Req/Min | Req/Day | Tokens/Min |
---|---|---|---|---|
Llama 3.2 3B | llama-3.2-3b | 5 | 100 | 30,000 |
Qwen 2.5 Coder 32B | qwen2.5-coder-32b | 5 | 100 | 30,000 |
Llama 3.3 70B | llama-3.3-70b | 5 | 100 | 30,000 |
Dolphin 72B | dolphin-2.9.2-qwen2-72b | 5 | 100 | 30,000 |
Qwen 2.5 VL | qwen-2.5-vl | 5 | 100 | 30,000 |
Deepseek R1 70B | deepseek-r1-llama-70b | 5 | 100 | 30,000 |
Llama 3.1 405B | llama-3.1-405b | 2 | 40 | 30,000 |
Deepseek R1 671B | deepseek-r1-671b | 1 | 20 | 30,000 |
All Image Models | All | 1 | 20 | N/A |
These rate limits apply at the user level, regardless of how many keys are created.
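Because the limits are counted per user rather than per key, a single client-side limiter shared by all of your keys is one way to stay under the Req/Min column. The following is a minimal sketch, assuming a sliding 60-second window; how the server actually evaluates the window is not stated here, and the class name `SlidingWindowLimiter` is illustrative.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_seconds` span."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False
```

With the Explorer limit of 5 requests per minute for `llama-3.3-70b`, you would create `SlidingWindowLimiter(5)` once and call `try_acquire()` before each request, regardless of which key sends it.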
Paid Tier Rate Limits
These higher rate limits apply to Paid users who are spending VCU or USD on Venice API inference.
Model | ModelID | Req/Min | Req/Day | Tokens/Min |
---|---|---|---|---|
Llama 3.2 3B | llama-3.2-3b | 500 | 288,000 | 100,000 |
Qwen 2.5 Coder 32B | qwen2.5-coder-32b | 75 | 54,000 | 450,000 |
Llama 3.3 70B | llama-3.3-70b | 50 | 36,000 | 300,000 |
Dolphin 72B | dolphin-2.9.2-qwen2-72b | 50 | 36,000 | 300,000 |
Qwen 2.5 VL | qwen-2.5-vl | 50 | 36,000 | 300,000 |
Deepseek R1 70B | deepseek-r1-llama-70b | 50 | 36,000 | 300,000 |
Llama 3.1 405B | llama-3.1-405b | 20 | 15,000 | 100,000 |
Deepseek R1 671B | deepseek-r1-671b | 5 | 10,000 | 200,000 |
All Image Models | All | 20 | 4,000 | N/A |
These rate limits apply at the user level, regardless of how many keys are created.
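The per-minute and per-day columns interact: for most Paid models, sending at the full Req/Min rate all day would exhaust Req/Day early. The sketch below works out the steady rate that satisfies both columns, assuming the daily window spans 24 hours and traffic is spread evenly; the function name `sustained_req_per_min` is illustrative.

```python
MINUTES_PER_DAY = 24 * 60

# (Req/Min, Req/Day) for a few rows copied from the Paid tier table.
PAID_LIMITS = {
    "llama-3.2-3b": (500, 288_000),
    "llama-3.3-70b": (50, 36_000),
    "deepseek-r1-671b": (5, 10_000),
}


def sustained_req_per_min(req_per_min: int, req_per_day: int) -> float:
    """Highest steady request rate that respects both the minute and day limits."""
    return min(float(req_per_min), req_per_day / MINUTES_PER_DAY)


for model_id, (per_min, per_day) in PAID_LIMITS.items():
    print(model_id, sustained_req_per_min(per_min, per_day))
```

Under these assumptions, `llama-3.3-70b` can burst to 50 requests per minute but sustains only 25 per minute over a full day, while `deepseek-r1-671b` is bound by its per-minute limit of 5.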
You can monitor your API utilization and remaining requests by inspecting the following response headers:
Header | Description |
---|---|
x-ratelimit-limit-requests | The number of requests you’ve made in the current evaluation period. |
x-ratelimit-remaining-requests | The remaining requests you can make in the current evaluation period. |
x-ratelimit-reset-requests | The Unix timestamp at which the request rate limit will reset. |
x-ratelimit-limit-tokens | The number of tokens generated during the evaluation period. |
x-ratelimit-remaining-tokens | The remaining number of tokens that can be generated during the evaluation period. |
x-ratelimit-reset-tokens | The duration, in seconds, until the token rate limit resets. |
x-venice-balance-vcu | The user’s VCU balance before the request has been processed. |
x-venice-balance-usd | The user’s USD balance before the request has been processed. |
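As a rough illustration of how these headers might be used, the sketch below computes how long to wait before the next request. It relies on the table's description that the request reset is a Unix timestamp while the token reset is a duration in seconds, and it assumes the header values are numeric strings; the function name `seconds_until_retry` is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_retry(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to wait before the next request (0.0 if no wait is needed)."""
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalise them first.
    h = {key.lower(): value for key, value in headers.items()}
    wait = 0.0

    # Request budget: the reset header is described as a Unix timestamp.
    if float(h.get("x-ratelimit-remaining-requests", 1)) <= 0:
        reset_at = float(h.get("x-ratelimit-reset-requests", now))
        wait = max(wait, reset_at - now)

    # Token budget: the reset header is described as a duration in seconds.
    if float(h.get("x-ratelimit-remaining-tokens", 1)) <= 0:
        wait = max(wait, float(h.get("x-ratelimit-reset-tokens", 0)))

    return max(wait, 0.0)
```

Passing the headers of the most recent response and sleeping for the returned number of seconds keeps a client from retrying before either budget has reset.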
Note: We will continue to monitor usage and will review these limits as we add compute capacity to the network. If you are consistently hitting rate limits, please contact support@venice.ai or post in the #API channel on Discord for support.