The Venice API has two usage tiers:

Explorer: A trial tier for testing the Venice API. It is available to PRO users and to users staking VVV who have not yet been allocated VCU.

Paid: A paid tier. Requests are debited from the user’s VCU balance first; once VCU is depleted, they are debited from the USD balance. If neither a VCU nor a USD balance is available, requests fail.
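The debit order above can be expressed as a small client-side check. This is a sketch, not Venice’s billing logic: the function name balance_to_debit is hypothetical, it only predicts which balance a request will draw from given the two balances (for example, as reported in the x-venice-balance-vcu and x-venice-balance-usd response headers), and it does not model a single request whose cost exceeds the remaining VCU, a case the tier description does not cover.

```python
from typing import Optional


def balance_to_debit(vcu_balance: float, usd_balance: float) -> Optional[str]:
    """Predict which balance a Paid-tier request draws from (hypothetical helper).

    VCU is used first; USD is used only once VCU is depleted. Returns None when
    neither balance is available, in which case the request is expected to fail.
    """
    if vcu_balance > 0:
        return "VCU"
    if usd_balance > 0:
        return "USD"
    return None
```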

Explorer Tier Rate Limits

The following rate limits apply to PRO users who are not spending VCU or USD on Venice API inference:

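| Model | Model ID | Requests/min | Requests/day | Tokens/min |
|---|---|---|---|---|
| Llama 3.2 3B | llama-3.2-3b | 5 | 100 | 30,000 |
| Qwen 2.5 Coder 32B | qwen2.5-coder-32b | 5 | 100 | 30,000 |
| Llama 3.3 70B | llama-3.3-70b | 5 | 100 | 30,000 |
| Dolphin 72B | dolphin-2.9.2-qwen2-72b | 5 | 100 | 30,000 |
| Qwen 2.5 VL | qwen-2.5-vl | 5 | 100 | 30,000 |
| Deepseek R1 70B | deepseek-r1-llama-70b | 5 | 100 | 30,000 |
| Llama 3.1 405B | llama-3.1-405b | 2 | 40 | 30,000 |
| Deepseek R1 671B | deepseek-r1-671b | 1 | 20 | 30,000 |
| All Image Models | All | 1 | 20 | N/A |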

These rate limits apply at the user level, regardless of how many keys are created.
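Because the Explorer limits are small and apply per user rather than per key, it can help to throttle on the client side with one limiter shared by every key you use. Below is a minimal sliding-window log limiter under stated assumptions: the class and constant names are hypothetical, it assumes requests are counted in rolling 60-second and 24-hour windows (the server may use fixed windows instead), and it does not enforce the Tokens/Min column or the image-model limit.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Tuple

# Explorer-tier text-model limits from the table: model ID -> (requests/min, requests/day).
EXPLORER_LIMITS: Dict[str, Tuple[int, int]] = {
    "llama-3.2-3b": (5, 100),
    "qwen2.5-coder-32b": (5, 100),
    "llama-3.3-70b": (5, 100),
    "dolphin-2.9.2-qwen2-72b": (5, 100),
    "qwen-2.5-vl": (5, 100),
    "deepseek-r1-llama-70b": (5, 100),
    "llama-3.1-405b": (2, 40),
    "deepseek-r1-671b": (1, 20),
}

MINUTE = 60.0
DAY = 86_400.0


class ClientSideLimiter:
    """Sliding-window log limiter: one timestamp per accepted request, per model.

    Share a single instance across all API keys, since the limits are per user.
    """

    def __init__(self, limits: Dict[str, Tuple[int, int]],
                 clock: Callable[[], float] = time.time) -> None:
        self.limits = limits
        self.clock = clock
        self.history: Dict[str, Deque[float]] = {}

    def try_acquire(self, model: str) -> bool:
        """Record and allow a request if both windows have room; otherwise refuse it."""
        per_min, per_day = self.limits[model]
        now = self.clock()
        log = self.history.setdefault(model, deque())
        # Drop timestamps older than 24 hours; they no longer count toward either window.
        while log and now - log[0] >= DAY:
            log.popleft()
        in_last_minute = sum(1 for t in log if now - t < MINUTE)
        if in_last_minute >= per_min or len(log) >= per_day:
            return False
        log.append(now)
        return True
```

Call try_acquire(model) before each request and delay the request whenever it returns False.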

Paid Tier Rate Limits

Higher rate limits apply to Paid users who are spending VCU or USD on Venice API inference:

| Model | Model ID | Requests/min | Requests/day | Tokens/min |
|---|---|---|---|---|
| Llama 3.2 3B | llama-3.2-3b | 500 | 288,000 | 100,000 |
| Qwen 2.5 Coder 32B | qwen2.5-coder-32b | 75 | 54,000 | 450,000 |
| Llama 3.3 70B | llama-3.3-70b | 50 | 36,000 | 300,000 |
| Dolphin 72B | dolphin-2.9.2-qwen2-72b | 50 | 36,000 | 300,000 |
| Qwen 2.5 VL | qwen-2.5-vl | 50 | 36,000 | 300,000 |
| Deepseek R1 70B | deepseek-r1-llama-70b | 50 | 36,000 | 300,000 |
| Llama 3.1 405B | llama-3.1-405b | 20 | 15,000 | 100,000 |
| Deepseek R1 671B | deepseek-r1-671b | 5 | 10,000 | 200,000 |
| All Image Models | All | 20 | 4,000 | N/A |

These rate limits apply at the user level, regardless of how many keys are created.
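Comparing the two request columns shows which limit binds under sustained load. The arithmetic below divides each model’s daily cap by the 1,440 minutes in a day and by its per-minute limit; the dictionary and function names are hypothetical, and only the text models from the Paid table are included.

```python
# Paid-tier text-model limits from the table: model ID -> (requests/min, requests/day).
PAID_LIMITS = {
    "llama-3.2-3b": (500, 288_000),
    "qwen2.5-coder-32b": (75, 54_000),
    "llama-3.3-70b": (50, 36_000),
    "dolphin-2.9.2-qwen2-72b": (50, 36_000),
    "qwen-2.5-vl": (50, 36_000),
    "deepseek-r1-llama-70b": (50, 36_000),
    "llama-3.1-405b": (20, 15_000),
    "deepseek-r1-671b": (5, 10_000),
}

MINUTES_PER_DAY = 24 * 60  # 1,440


def sustained_rate(model: str) -> float:
    """Highest average requests/minute that can be held for a full day."""
    per_min, per_day = PAID_LIMITS[model]
    return min(per_min, per_day / MINUTES_PER_DAY)


def minutes_to_exhaust_daily_cap(model: str) -> float:
    """Minutes of sending at the per-minute limit before the daily cap is reached."""
    per_min, per_day = PAID_LIMITS[model]
    return per_day / per_min


for model in PAID_LIMITS:
    print(f"{model}: sustained {sustained_rate(model):.1f} req/min, "
          f"daily cap reached after {minutes_to_exhaust_daily_cap(model):.0f} min")
```

Sending at the per-minute limit, every text model except deepseek-r1-671b reaches its daily cap in under 24 hours (for example, 288,000 / 500 = 576 minutes for llama-3.2-3b), so the daily cap bounds sustained throughput. For deepseek-r1-671b the reverse holds: 5 requests/min × 1,440 min = 7,200, which is below its 10,000 daily cap, so the per-minute limit binds.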

Please note: these limits will change as the beta evolves.

You can monitor your API usage and remaining requests by inspecting the following response headers:

| Header | Description |
|---|---|
| x-ratelimit-limit-requests | The number of requests you’ve made in the current evaluation period. |
| x-ratelimit-remaining-requests | The number of requests you can still make in the current evaluation period. |
| x-ratelimit-reset-requests | The Unix timestamp at which the request rate limit resets. |
| x-ratelimit-limit-tokens | The number of tokens generated during the current evaluation period. |
| x-ratelimit-remaining-tokens | The number of tokens that can still be generated during the current evaluation period. |
| x-ratelimit-reset-tokens | The time, in seconds, until the token rate limit resets. |
| x-venice-balance-vcu | The user’s VCU balance before the request is processed. |
| x-venice-balance-usd | The user’s USD balance before the request is processed. |
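These headers can be read after each response to decide when to send the next request. A minimal Python sketch follows; RateLimitStatus, parse_rate_limit_headers, and seconds_to_wait are hypothetical names, and the sketch assumes header values are plain decimal strings and that x-ratelimit-reset-requests is a Unix timestamp in seconds. Note that the two reset headers use different units: the request reset is an absolute timestamp, while the token reset is a relative duration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class RateLimitStatus:
    remaining_requests: Optional[int]
    reset_requests_at: Optional[float]  # absolute Unix timestamp (assumed seconds)
    remaining_tokens: Optional[int]
    reset_tokens_in: Optional[float]    # relative duration in seconds
    balance_vcu: Optional[float]        # VCU balance before this request
    balance_usd: Optional[float]        # USD balance before this request


def _header(headers: Mapping[str, str], name: str, cast: Callable[[str], float]):
    # HTTP header names are case-insensitive, so compare lowercased names.
    for key, value in headers.items():
        if key.lower() == name:
            return cast(value)
    return None  # header absent


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    return RateLimitStatus(
        remaining_requests=_header(headers, "x-ratelimit-remaining-requests", int),
        reset_requests_at=_header(headers, "x-ratelimit-reset-requests", float),
        remaining_tokens=_header(headers, "x-ratelimit-remaining-tokens", int),
        reset_tokens_in=_header(headers, "x-ratelimit-reset-tokens", float),
        balance_vcu=_header(headers, "x-venice-balance-vcu", float),
        balance_usd=_header(headers, "x-venice-balance-usd", float),
    )


def seconds_to_wait(status: RateLimitStatus, now: Optional[float] = None) -> float:
    """Pause needed before the next request, judged only from these headers."""
    now = time.time() if now is None else now
    wait = 0.0
    if status.remaining_requests == 0 and status.reset_requests_at is not None:
        # Request reset is an absolute timestamp: subtract the current time.
        wait = max(wait, status.reset_requests_at - now)
    if status.remaining_tokens == 0 and status.reset_tokens_in is not None:
        # Token reset is already a duration in seconds.
        wait = max(wait, status.reset_tokens_in)
    return wait
```

Pass the response headers from any HTTP client (for example, the case-insensitive headers mapping on a requests Response), then sleep for seconds_to_wait(...) before the next call.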

Note: We will continue to monitor usage and will review these limits as we add compute capacity to the network. If you are consistently hitting rate limits, contact support@venice.ai or post in the #API channel on Discord.