Failed Request Rate Limits

Failed requests, including 500 errors, 503 capacity errors, and 429 rate limit errors, should be retried with exponential backoff.

For 429 rate limit errors, use the x-ratelimit-reset-requests and x-ratelimit-remaining-requests headers to determine when to retry next.
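The retry guidance above could be sketched as follows. This is a minimal illustration, not an official client: the function name `retry_delay`, the `base` and `cap` parameters, and the use of random jitter are assumptions, and the headers are assumed to arrive as a mapping with lowercase names and numeric string values.

```python
import random
import time
from typing import Mapping, Optional

# Status codes this page says should be retried.
RETRYABLE_STATUSES = {429, 500, 503}


def retry_delay(status: int, headers: Mapping[str, str], attempt: int,
                now: Optional[float] = None,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not retryable.

    `attempt` is 0 for the first retry. `base` and `cap` are illustrative
    defaults, not values published by the API.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    if now is None:
        now = time.time()
    if status == 429:
        try:
            remaining = int(headers["x-ratelimit-remaining-requests"])
            reset_at = float(headers["x-ratelimit-reset-requests"])
        except (KeyError, ValueError):
            remaining, reset_at = None, None
        # No requests left: wait until the reset timestamp (Unix seconds).
        if remaining is not None and remaining <= 0:
            return max(0.0, reset_at - now)
    # Otherwise: exponential backoff, capped, with random jitter (assumption).
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

A caller would sleep for the returned number of seconds before resending, and give up after a fixed number of attempts.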

To protect our infrastructure from abuse, if a user generates more than 20 failed requests in a 30-second window, the API will return a 429 error indicating that the error rate limit has been reached:

Too many failed attempts (> 20) resulting in a non-success status code. Please wait 30s and try again. See https://docs.venice.ai/api-reference/rate-limiting for more information.
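One way to avoid tripping this limit is to track failures on the client side and pause before the threshold is crossed. The sketch below assumes the server counts failures over a sliding window, which this page does not confirm; the class name `FailureWindow` and its methods are hypothetical.

```python
from collections import deque


class FailureWindow:
    """Client-side count of recent failed requests (sliding window)."""

    def __init__(self, max_failures: int = 20, window_seconds: float = 30.0):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = deque()  # timestamps of failures, oldest first

    def record_failure(self, now: float) -> None:
        self._failures.append(now)

    def wait_time(self, now: float) -> float:
        """Seconds to pause so one more failure stays within the limit."""
        # Drop failures that have aged out of the window.
        while self._failures and now - self._failures[0] >= self.window_seconds:
            self._failures.popleft()
        excess = len(self._failures) - self.max_failures
        if excess < 0:
            return 0.0
        # Wait until enough old failures expire to get back under the limit.
        return self._failures[excess] + self.window_seconds - now
```

Call `record_failure(time.time())` after each non-success response, and sleep for `wait_time(time.time())` before sending the next request.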

Rate limits apply to users who have purchased API credits or staked VVV to gain Diem.

We will continue to monitor usage. As we add compute capacity to the network, we will review these limits. If you are consistently hitting rate limits, please contact [email protected] or post in the #API channel in Discord for assistance and we can work with you to raise your limits.

| Model | Model ID | Req / Min | Req / Day | Tokens / Min |
| --- | --- | --- | --- | --- |
| Llama 3.2 3B | llama-3.2-3b | 500 | 288,000 | 1,000,000 |
| Qwen 3 4B | qwen3-4b | 500 | 288,000 | 1,000,000 |
| Deepseek Coder V2 | deepseek-coder-v2-lite | 75 | 54,000 | 750,000 |
| Qwen 2.5 Coder 32B | qwen-2.5-coder-32b | 75 | 54,000 | 750,000 |
| Qwen 2.5 QWQ 32B | qwen-2.5-qwq-32b | 75 | 54,000 | 750,000 |
| Dolphin 72B | dolphin-2.9.2-qwen2-72b | 50 | 36,000 | 750,000 |
| Llama 3.3 70B | llama-3.3-70b | 50 | 36,000 | 750,000 |
| Mistral Small 3.1 24B | mistral-31-24b | 50 | 36,000 | 750,000 |
| Qwen 2.5 VL 72B | qwen-2.5-vl | 50 | 36,000 | 750,000 |
| Qwen 3 235B | qwen3-235b | 50 | 36,000 | 750,000 |
| Llama 3.1 405B | llama-3.1-405b | 20 | 15,000 | 750,000 |
| Deepseek R1 671B | deepseek-r1-671b | 15 | 10,000 | 200,000 |

| Model | Model ID | Req / Min | Req / Day |
| --- | --- | --- | --- |
| Flux | flux-dev / flux-dev-uncensored | 20 | 14,400 |
| All others | All | 20 | 28,800 |

| Model | Model ID | Req / Min | Req / Day |
| --- | --- | --- | --- |
| All Audio Models | All | 60 | 86,400 |
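The per-minute and per-day limits above imply two different pacing intervals: the shortest gap between requests during a burst, and the average gap needed to stay within the daily cap over a full 24 hours. The arithmetic can be sketched as below; the helper name `pacing_intervals` is hypothetical, and the example values are taken from the llama-3.3-70b row (50 requests per minute, 36,000 per day).

```python
from typing import Tuple

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86_400


def pacing_intervals(req_per_min: int, req_per_day: int) -> Tuple[float, float]:
    """Return (burst_interval, sustained_interval) in seconds per request."""
    # Smallest gap that respects the per-minute limit.
    burst_interval = SECONDS_PER_MINUTE / req_per_min
    # Average gap that spreads the daily allowance over 24 hours.
    sustained_interval = SECONDS_PER_DAY / req_per_day
    return burst_interval, sustained_interval


# llama-3.3-70b: 50 requests per minute, 36,000 requests per day.
burst, sustained = pacing_intervals(50, 36_000)
```

For this row the burst interval is 1.2 seconds and the sustained interval is 2.4 seconds, so a client running at the full per-minute rate would use up its daily allowance in about 12 hours.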

Rate Limit and Consumption Headers

You can monitor your API utilization and remaining requests by evaluating the following headers:

| Header | Description |
| --- | --- |
| x-ratelimit-limit-requests | The number of requests you’ve made in the current evaluation period. |
| x-ratelimit-remaining-requests | The remaining requests you can make in the current evaluation period. |
| x-ratelimit-reset-requests | The Unix timestamp at which the request rate limit will reset. |
| x-ratelimit-limit-tokens | The number of total (prompt + completion) tokens used within a 1-minute sliding window. |
| x-ratelimit-remaining-tokens | The remaining number of total tokens that can be used during the evaluation period. |
| x-ratelimit-reset-tokens | The time in seconds until the token rate limit resets. |
| x-venice-balance-diem | The user’s Diem balance before the request has been processed. |
| x-venice-balance-usd | The user’s USD balance before the request has been processed. |
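To monitor these values in code, the headers can be collected into a small record after each response. The following is a sketch under assumptions: the `RateLimitSnapshot` type and `parse_rate_limit_headers` function are hypothetical names, header names are assumed to be lowercase, and every value is assumed to be a plain numeric string. Missing or non-numeric values become `None`.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class RateLimitSnapshot:
    remaining_requests: Optional[int]
    reset_requests_at: Optional[float]  # Unix timestamp
    remaining_tokens: Optional[int]
    reset_tokens_in: Optional[float]    # seconds until reset
    balance_diem: Optional[float]
    balance_usd: Optional[float]


def _parse(headers: Mapping[str, str], name: str, cast: Callable):
    """Return the header converted with `cast`, or None if absent or invalid."""
    raw = headers.get(name)
    if raw is None:
        return None
    try:
        return cast(raw)
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitSnapshot:
    return RateLimitSnapshot(
        remaining_requests=_parse(headers, "x-ratelimit-remaining-requests", int),
        reset_requests_at=_parse(headers, "x-ratelimit-reset-requests", float),
        remaining_tokens=_parse(headers, "x-ratelimit-remaining-tokens", int),
        reset_tokens_in=_parse(headers, "x-ratelimit-reset-tokens", float),
        balance_diem=_parse(headers, "x-venice-balance-diem", float),
        balance_usd=_parse(headers, "x-venice-balance-usd", float),
    )
```

Note that, as described in the table, the request reset is an absolute timestamp while the token reset is a duration, so the two fields are kept separate rather than normalized.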