Rate Limits

Failed Request Rate Limits

Failed requests including 500 errors, 503 capacity errors, 429 rate limit errors are should be retried with exponential back off. For 429 rate limit errors, please use x-ratelimit-reset-requests and x-ratelimit-remaining-requests to determine when to next retry. To protect our infrastructure from abuse, if an user generates more than 20 failed requests in a 30 second window, the API will return a 429 error indicating the error rate limit has been reached:

Too many failed attempts (> 20) resulting in a non-success status code. Please wait 30s and try again. See https://docs.venice.ai/api-reference/rate-limiting for more information.

Paid Tier Rate Limits

Rate limits apply to users who have purchased API credits or staked VVV to gain Diem. Helpful links:

Real time rate limits
Rate limit logs - View requests that have hit the rate limiter

We will continue to monitor usage. As we add compute capacity to the network, we will review these limits. If you are consistently hitting rate limits, please contact [email protected] or post in the #API channel in Discord for assistance and we can work with you to raise your limits.

Paid Tier - LLMs

Model	Model ID	Req / Min	Req / Day	Tokens / Min
Llama 3.2 3B	llama-3.2-3b	500	288,000	1,000,000
Qwen 3 4B	qwen3-4b	500	288,000	1,000,000
Deepseek Coder V2	deepseek-coder-v2-lite	75	54,000	750,000
Qwen 2.5 Coder 32B	qwen-2.5-coder-32b	75	54,000	750,000
Qwen 2.5 QWQ 32B	qwen-2.5-qwq-32b	75	54,000	750,000
Dolphin 72B	dolphin-2.9.2-qwen2-72b	50	36,000	750,000
Llama 3.3 70B	llama-3.3-70b	50	36,000	750,000
Mistral Small 3.1 24B	mistral-31-24b	50	36,000	750,000
Qwen 2.5 VL 72B	qwen-2.5-vl	50	36,000	750,000
Qwen 3 235B	qwen3-235b	50	36,000	750,000
Llama 3.1 405B	llama-3.1-405b	20	15,000	750,000
Deepseek R1 671B	deepseek-r1-671b	15	10,000	200,000

Paid Tier - Image Models

Model	Model ID	Req / Min	Req / Day
Flux	flux-dev / flux-dev-uncensored	20	14,400
All others	All	20	28,800

Paid Tier - Audio Models

Model	Model ID	Req / Min	Req / Day
All Audio Models	All	60	86,400

Rate Limit and Consumption Headers

You can monitor your API utilization and remaining requests by evaluating the following headers:

Header	Description
x-ratelimit-limit-requests	The number of requests you’ve made in the current evaluation period.
x-ratelimit-remaining-requests	The remaining requests you can make in the current evaluation period.
x-ratelimit-reset-requests	The unix time stamp when the rate limit will reset.
x-ratelimit-limit-tokens	The number of total (prompt + completion) tokens used within a 1 minute sliding window.
x-ratelimit-remaining-tokens	The remaining number of total tokens that can be used during the evaluation period.
x-ratelimit-reset-tokens	The duration of time in seconds until the token rate limit resets.
x-venice-balance-diem	The user’s Diem balance before the request has been processed.
x-venice-balance-usd	The user’s USD balance before the request has been processed.

Venice APIs

​Failed Request Rate Limits

​Paid Tier Rate Limits

​Paid Tier - LLMs

​Paid Tier - Image Models

​Paid Tier - Audio Models

​Rate Limit and Consumption Headers

Failed Request Rate Limits

Paid Tier Rate Limits

Paid Tier - LLMs

Paid Tier - Image Models

Paid Tier - Audio Models

Rate Limit and Consumption Headers