The Venice API implements two tiers of rate limits:

Explorer:
This is a trial tier for testing the Venice API. It is included for Pro users.

Paid:
This is a paid tier. Users are debited VCU first and, once VCU is depleted, USD. If neither a VCU nor a USD balance is available, requests fall back to the Explorer tier (for Pro users) or fail (for free users).

These rate limits apply at the user level, regardless of how many keys are created.
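The debit order and fallback described above can be sketched as a small decision function. This is only an illustration of the stated rules, not Venice's implementation: the function name `resolve_tier`, its parameters, and the `"rejected"` label are all hypothetical.

```python
from typing import Literal, Optional, Tuple

Tier = Literal["paid", "explorer", "rejected"]


def resolve_tier(
    vcu_balance: float, usd_balance: float, is_pro: bool
) -> Tuple[Tier, Optional[str]]:
    """Return (tier, currency debited) for one request.

    Hypothetical helper that mirrors the rules in the text above.
    """
    if vcu_balance > 0:
        return "paid", "VCU"       # VCU is debited first
    if usd_balance > 0:
        return "paid", "USD"       # USD is used once VCU is depleted
    if is_pro:
        return "explorer", None    # no balance: Pro users fall back to Explorer
    return "rejected", None        # no balance and not Pro: the request fails
```

For example, `resolve_tier(0.0, 5.0, False)` returns `("paid", "USD")`, while `resolve_tier(0.0, 0.0, True)` returns `("explorer", None)`.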

Explorer Tier Rate Limits

These rate limits apply to Pro users who are testing the API with their application. These users can upgrade to Paid tier limits by purchasing USD credits or staking VVV to gain VCU.

Explorer Tier - LLMs


| Model | Model ID | Req / Min | Req / Day | Tokens / Min |
|---|---|---|---|---|
| Deepseek Coder V2 | deepseek-coder-v2-lite | 5 | 100 | 30,000 |
| Deepseek R1 70B | deepseek-r1-llama-70b | 5 | 100 | 30,000 |
| Deepseek R1 671B | deepseek-r1-671b | 1 | 20 | 30,000 |
| Dolphin 72B | dolphin-2.9.2-qwen2-72b | 5 | 100 | 30,000 |
| Llama 3.2 3B | llama-3.2-3b | 5 | 100 | 30,000 |
| Llama 3.3 70B | llama-3.3-70b | 5 | 100 | 30,000 |
| Llama 3.1 405B | llama-3.1-405b | 2 | 40 | 30,000 |
| Qwen 2.5 Coder 32B | qwen-2.5-coder-32b | 5 | 100 | 30,000 |
| Qwen 2.5 QWQ 32B | qwen-2.5-qwq-32b | 5 | 100 | 30,000 |
| Qwen 2.5 VL | qwen-2.5-vl | 5 | 100 | 30,000 |
| Mistral Small 3.1 24B | mistral-31-24b | 5 | 100 | 30,000 |

Explorer Tier - Image Models


| Model | Model ID | Req / Min | Req / Day |
|---|---|---|---|
| All Image Models | All | 1 | 20 |

Explorer Tier - Audio Models


| Model | Model ID | Req / Min | Req / Day |
|---|---|---|---|
| All Audio Models | All | 5 | 100 |
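One way to stay under limits like these is to throttle on the client before sending a request. The sketch below is a hypothetical helper (`ClientRateLimiter` is not part of any Venice SDK). It defaults to the 5 requests per minute and 100 requests per day that apply to most Explorer tier models, and it assumes sliding windows for both caps, since the server's exact windowing is not described here.

```python
import time
from collections import deque


class ClientRateLimiter:
    """Client-side guard for a per-minute and a per-day request cap.

    Hypothetical helper: a sliding window is assumed for both caps.
    """

    def __init__(self, per_minute=5, per_day=100, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock        # injectable so the class can be tested
        self._sent = deque()      # timestamps of requests in the last 24 h

    def try_acquire(self):
        """Record a request and return True if both caps allow it."""
        now = self.clock()
        # Forget requests that are older than one day.
        while self._sent and now - self._sent[0] >= 86_400:
            self._sent.popleft()
        in_last_minute = sum(1 for t in self._sent if now - t < 60)
        if len(self._sent) >= self.per_day or in_last_minute >= self.per_minute:
            return False
        self._sent.append(now)
        return True
```

A caller would check `limiter.try_acquire()` before each request and sleep or queue the request when it returns `False`. Because limits apply per user rather than per key, a single limiter would need to be shared by every process that uses the account.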

Paid Tier Rate Limits

These rate limits apply to users who have purchased USD credits or staked VVV to gain VCU.

We will continue to monitor usage. As we add compute capacity to the network, we will review these limits. If you are consistently hitting rate limits, please contact support@venice.ai or post in the #API channel in Discord for support.

Paid Tier - LLMs

| Model | Model ID | Req / Min | Req / Day | Tokens / Min |
|---|---|---|---|---|
| Deepseek Coder V2 | deepseek-coder-v2-lite | 75 | 54,000 | 450,000 |
| Deepseek R1 671B | deepseek-r1-671b | 15 | 10,000 | 200,000 |
| Dolphin 72B | dolphin-2.9.2-qwen2-72b | 50 | 36,000 | 300,000 |
| Llama 3.2 3B | llama-3.2-3b | 500 | 288,000 | 100,000 |
| Llama 3.3 70B | llama-3.3-70b | 50 | 36,000 | 300,000 |
| Llama 3.1 405B | llama-3.1-405b | 20 | 15,000 | 100,000 |
| Qwen 2.5 Coder 32B | qwen-2.5-coder-32b | 75 | 54,000 | 450,000 |
| Qwen 2.5 QWQ 32B | qwen-2.5-qwq-32b | 75 | 54,000 | 450,000 |
| Qwen 2.5 VL | qwen-2.5-vl | 50 | 36,000 | 300,000 |
| Mistral Small 3.1 24B | mistral-31-24b | 50 | 36,000 | 300,000 |

Paid Tier - Image Models

| Model | Model ID | Req / Min | Req / Day |
|---|---|---|---|
| Flux | flux-dev / flux-dev-uncensored | 20 | 14,400 |
| All others | All | 20 | 28,800 |

Paid Tier - Audio Models

| Model | Model ID | Req / Min | Req / Day |
|---|---|---|---|
| All Audio Models | All | 60 | 86,400 |
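The per-minute and per-day caps interact: a client that sends at the full per-minute rate reaches the daily cap after Req / Day divided by Req / Min minutes. The short calculation below illustrates this for a few rows copied from the tables above. The dictionary keys and the function name are illustrative only.

```python
# (requests per minute, requests per day), copied from the tables above.
LIMITS = {
    "explorer: llama-3.3-70b": (5, 100),
    "paid: llama-3.3-70b": (50, 36_000),
    "paid: deepseek-r1-671b": (15, 10_000),
    "paid: all audio models": (60, 86_400),
}


def minutes_at_full_rate(req_per_min: int, req_per_day: int) -> float:
    """Minutes of sustained maximum-rate traffic before the daily cap is hit."""
    return req_per_day / req_per_min


for name, (rpm, rpd) in LIMITS.items():
    print(f"{name}: {minutes_at_full_rate(rpm, rpd):.0f} min")
```

On the Explorer tier, for instance, 100 / 5 gives 20 minutes of full-rate traffic per day, whereas the paid audio limit of 86,400 / 60 gives 1,440 minutes, which is a full day.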

Rate Limit and Consumption Headers

You can monitor your API utilization and remaining requests by inspecting the following response headers:

| Header | Description |
|---|---|
| x-ratelimit-limit-requests | The number of requests you’ve made in the current evaluation period. |
| x-ratelimit-remaining-requests | The remaining requests you can make in the current evaluation period. |
| x-ratelimit-reset-requests | The Unix timestamp when the rate limit will reset. |
| x-ratelimit-limit-tokens | The number of total (prompt + completion) tokens used within a 1 minute sliding window. |
| x-ratelimit-remaining-tokens | The remaining number of total tokens that can be used during the evaluation period. |
| x-ratelimit-reset-tokens | The duration of time in seconds until the token rate limit resets. |
| x-venice-balance-vcu | The user’s VCU balance before the request has been processed. |
| x-venice-balance-usd | The user’s USD balance before the request has been processed. |
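As a rough illustration of how these headers might be consumed, the sketch below parses them from a plain mapping of response headers and suggests how long to pause once a quota is exhausted. The names `RateLimitStatus`, `parse_rate_limit_headers`, and `seconds_to_wait` are hypothetical. The code also assumes that the Unix timestamp is expressed in seconds and that a missing or malformed header should be treated as unknown.

```python
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class RateLimitStatus:
    remaining_requests: Optional[int]
    remaining_tokens: Optional[int]
    requests_reset_at: Optional[float]  # Unix timestamp (assumed to be in seconds)
    tokens_reset_in: Optional[float]    # seconds until the token limit resets
    balance_vcu: Optional[float]
    balance_usd: Optional[float]


def _num(headers: Mapping[str, str], name: str, cast: Callable):
    """Return the header parsed with `cast`, or None if absent or malformed."""
    try:
        return cast(headers[name])
    except (KeyError, TypeError, ValueError):
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    return RateLimitStatus(
        remaining_requests=_num(h, "x-ratelimit-remaining-requests", int),
        remaining_tokens=_num(h, "x-ratelimit-remaining-tokens", int),
        requests_reset_at=_num(h, "x-ratelimit-reset-requests", float),
        tokens_reset_in=_num(h, "x-ratelimit-reset-tokens", float),
        balance_vcu=_num(h, "x-venice-balance-vcu", float),
        balance_usd=_num(h, "x-venice-balance-usd", float),
    )


def seconds_to_wait(status: RateLimitStatus, now: Optional[float] = None) -> float:
    """Suggested pause before the next call; 0.0 if quota remains."""
    now = time.time() if now is None else now
    wait = 0.0
    if status.remaining_requests == 0 and status.requests_reset_at is not None:
        wait = max(wait, status.requests_reset_at - now)
    if status.remaining_tokens == 0 and status.tokens_reset_in is not None:
        wait = max(wait, status.tokens_reset_in)
    return wait
```

With most HTTP clients the headers of a response can be passed in directly, for example `parse_rate_limit_headers(dict(response.headers))`. The request and token resets are handled separately because the table gives one as an absolute timestamp and the other as a duration.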