Rate limits vary by model and tier. You can check your exact limits anytime:
curl https://api.venice.ai/api/v1/api_keys/rate_limits \
  -H "Authorization: Bearer $VENICE_API_KEY"
Default Limits
Text Models
Text models are grouped into tiers based on size. Each model card on the Models page displays its tier badge.
Tier   Requests/min   Tokens/min
XS     500            1,000,000
S      75             750,000
M      50             750,000
L      20             500,000
Which models are in each tier?
XS: qwen3-4b, llama-3.2-3b
S: mistral-31-24b, venice-uncensored
M: llama-3.3-70b, qwen3-next-80b, google-gemma-3-27b-it
L: qwen3-235b-a22b-instruct-2507, qwen3-235b-a22b-thinking-2507, deepseek-ai-DeepSeek-R1, grok-41-fast, kimi-k2-thinking, gemini-3-pro-preview, hermes-3-llama-3.1-405b, qwen3-coder-480b-a35b-instruct, zai-org-glm-4.6, openai-gpt-oss-120b
Other Models
Type               Requests/min
Image              20
Audio              60
Embedding          500
Video (queue)      20
Video (retrieve)   120
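One way to stay under a requests-per-minute limit is to pace requests on the client side. The following is a minimal sketch, not part of the Venice API: the class name `SlidingWindowLimiter` and its methods are hypothetical, and it assumes the limit applies to any rolling 60-second span, which the limits above do not state explicitly.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` sends in any `window_seconds` span.

    Hypothetical client-side helper; the server remains the source of truth.
    """

    def __init__(self, max_requests, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = deque()  # timestamps of sends still inside the window

    def seconds_until_allowed(self):
        """Return 0.0 if a request may go out now, else how long to wait."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # The oldest send must leave the window before another is allowed.
        return self.window_seconds - (now - self._sent[0])

    def record(self):
        """Note that a request was just sent."""
        self._sent.append(self.clock())
```

For example, a client calling an L-tier text model could create `SlidingWindowLimiter(20)`, sleep for `seconds_until_allowed()` before each call, and invoke `record()` right after sending. This only tracks request counts; token limits would need separate accounting.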
Handling Errors
Failed requests (500, 503, 429) should be retried with exponential backoff.
For 429 errors specifically, check the x-ratelimit-reset-requests header for the exact Unix timestamp when you can retry. Most HTTP libraries have built-in retry mechanisms that handle this automatically.
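The retry guidance above could be sketched roughly as follows. This is an illustration under stated assumptions, not an official client: `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`, header names are assumed to be lowercase in the `headers` mapping, and the base delay, cap, attempt count, and jitter are arbitrary choices rather than values from the API.

```python
import random
import time

RETRYABLE = {429, 500, 503}


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff: base * 2**attempt, capped at `cap` seconds."""
    return min(cap, base * (2 ** attempt))


def retry_delay(status, headers, attempt, now=None):
    """Pick how long to wait before retrying a failed request."""
    if status == 429:
        reset = headers.get("x-ratelimit-reset-requests")
        if reset is not None:
            try:
                now = time.time() if now is None else now
                # The header holds a Unix timestamp; wait until then.
                return max(0.0, float(reset) - now)
            except ValueError:
                pass  # unparseable header: fall back to backoff
    return backoff_delay(attempt)


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it succeeds or attempts run out.

    `send` is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, headers, body
        delay = retry_delay(status, headers, attempt)
        # A little random jitter keeps many clients from retrying in lockstep.
        sleep(delay + random.uniform(0, 0.25))
```

Keeping `max_attempts` small also limits how many failed requests a single call can generate in a short period.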
Abuse Protection
If you generate more than 20 failed requests in 30 seconds, the API will block further requests for 30 seconds:
Too many failed attempts (> 20) resulting in a non-success status code. Please wait 30s and try again.
Every response includes these headers:
Header                           Description
x-ratelimit-limit-requests       Max requests allowed in current window
x-ratelimit-remaining-requests   Requests remaining in current window
x-ratelimit-reset-requests       Unix timestamp when window resets
x-ratelimit-limit-tokens         Max tokens allowed per minute
x-ratelimit-remaining-tokens     Tokens remaining in current minute
x-ratelimit-reset-tokens         Seconds until token limit resets
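To make these headers easier to work with, a client might collect them into a small structure. The sketch below is illustrative only: `RateLimitStatus` and `parse_rate_limit_headers` are hypothetical names, and it assumes every header value is a plain number encoded as a string. Missing or malformed headers are returned as `None` rather than raising.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit_requests: Optional[int]
    remaining_requests: Optional[int]
    reset_requests_at: Optional[float]  # Unix timestamp
    limit_tokens: Optional[int]
    remaining_tokens: Optional[int]
    reset_tokens_in: Optional[float]  # seconds until reset


def _parse(headers, name, cast):
    """Return cast(headers[name]), or None if missing or malformed."""
    try:
        return cast(headers[name])
    except (KeyError, ValueError):
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalise to lowercase first.
    h = {name.lower(): value for name, value in headers.items()}
    return RateLimitStatus(
        limit_requests=_parse(h, "x-ratelimit-limit-requests", int),
        remaining_requests=_parse(h, "x-ratelimit-remaining-requests", int),
        reset_requests_at=_parse(h, "x-ratelimit-reset-requests", float),
        limit_tokens=_parse(h, "x-ratelimit-limit-tokens", int),
        remaining_tokens=_parse(h, "x-ratelimit-remaining-tokens", int),
        reset_tokens_in=_parse(h, "x-ratelimit-reset-tokens", float),
    )
```

Note that the two reset headers use different units: the request reset is an absolute Unix timestamp, while the token reset is a number of seconds from now, so the field names keep them distinct.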
Partner Tier
Partners get significantly higher rate limits:
Tier   Requests/min   Tokens/min
XS     500            2,000,000
S      150            1,500,000
M      100            1,500,000
L      60             1,000,000

Type        Requests/min
Image       60
Audio       120
Embedding   500
If you’re consistently hitting your rate limits and your usage shows sustained demand over time, reach out to discuss partner access: [email protected].
Partner tier limits can be adjusted based on your specific needs.