# API Spec
Source: https://docs.venice.ai/api-reference/api-spec
## Swagger Configuration
You can find the complete swagger definition for the Venice API here:
[https://api.venice.ai/doc/api/swagger.yaml](https://api.venice.ai/doc/api/swagger.yaml)
***
## OpenAI Compatibility
Venice's API implements the OpenAI API specification, ensuring compatibility with existing OpenAI clients and tools. This document outlines how to integrate with Venice using this familiar interface. The image API supports OpenAI's format, but for the full set of options, we also offer a custom Venice image API you can use.
### Base Configuration
#### Required Base URL
All API requests must use Venice's base URL:
```javascript
const BASE_URL = "https://api.venice.ai/api/v1"
```
### Client Setup
Configure your OpenAI client with Venice's base URL:
```javascript
import OpenAI from "openai";
new OpenAI({
apiKey: "--Your API Key--",
baseURL: "https://api.venice.ai/api/v1",
});
```
## Available Endpoints
### Models
* **Endpoint**: `/api/v1/models`
* **Documentation**: [Models API Reference](/api-reference/endpoint/models/list)
* **Purpose**: Retrieve available models and their capabilities
### Chat Completions
* **Endpoint**: `/api/v1/chat/completions`
* **Documentation**: [Chat Completions API Reference](/api-reference/endpoint/chat/completions)
* **Purpose**: Generate text responses in a chat-like format
### Image Generations
* **Endpoint**: `/api/v1/image/generations`
* **Documentation**: [Image Generations API Reference](/api-reference/endpoint/image/generations)
* **Purpose**: Generate images based on text prompts
## System Prompts
Venice provides default system prompts designed to ensure uncensored and natural model responses. You have two options for handling system prompts:
1. **Default Behavior**: Your system prompts are appended to Venice's defaults
2. **Custom Behavior**: Disable Venice's system prompts entirely
### Disabling Venice System Prompts
Use the `venice_parameters` option to remove Venice's default system prompts:
```javascript
const completionStream = await openAI.chat.completions.create({
model: "default",
messages: [
{
role: "system",
content: "Your system prompt",
},
{
role: "user",
content: "Why is the sky blue?",
},
],
// @ts-expect-error Venice.ai parameters are unique to Venice.
venice_parameters: {
include_venice_system_prompt: false,
},
});
```
## Best Practices
1. **Error Handling**: Implement robust error handling for API responses
2. **Rate Limiting**: Be mindful of rate limits during the beta period
3. **System Prompts**: Test both with and without Venice's system prompts to determine the best fit for your use case
4. **API Keys**: Keep your API keys secure and rotate them regularly
## Differences from OpenAI's API
While Venice maintains high compatibility with the OpenAI API specification, there are some Venice-specific features and parameters:
1. **venice\_parameters**: Venice offers additional configurations not available via OpenAI
2. **System Prompts**: Different default behavior for system prompt handling
3. **Model Names**: Venice maps some common OpenAI model names to comparable Venice models, although we recommend reviewing the models available on Venice directly ([https://docs.venice.ai/api-reference/endpoint/models/list](https://docs.venice.ai/api-reference/endpoint/models/list))
# Create API Key
Source: https://docs.venice.ai/api-reference/endpoint/api_keys/create
POST /api_keys
Create a new API key.
# Delete API Key
Source: https://docs.venice.ai/api-reference/endpoint/api_keys/delete
DELETE /api_keys
Delete an API key.
# Generate API Key with Web3 Wallet
Source: https://docs.venice.ai/api-reference/endpoint/api_keys/generate_web3_key/get
GET /api_keys/generate_web3_key
Returns the token required to generate an API key via a wallet.
## Autonomous Agent API Key Creation
Please see [this guide](/overview/guides/generating-api-key-agent) on how to use this endpoint.
***
# Generate API Key with Web3 Wallet
Source: https://docs.venice.ai/api-reference/endpoint/api_keys/generate_web3_key/post
POST /api_keys/generate_web3_key
Authenticates a wallet holding sVVV and creates an API key.
## Autonomous Agent API Key Creation
Please see [this guide](/overview/guides/generating-api-key-agent) on how to use this endpoint.
***
# Get API Key Details
Source: https://docs.venice.ai/api-reference/endpoint/api_keys/get
GET /api_keys/{id}
Return details about a specific API key, including rate limits and balance data.
# List API Keys
Source: https://docs.venice.ai/api-reference/endpoint/api_keys/list
GET /api_keys
Return a list of API keys.
# Rate Limit Logs
Source: https://docs.venice.ai/api-reference/endpoint/api_keys/rate_limit_logs
GET /api_keys/rate_limits/log
Returns the last 50 rate limits that the account exceeded.
## Experimental Endpoint
This is an experimental endpoint and may be subject to change.
## Postman Collection
For additional examples, please see this [Postman Collection](https://www.postman.com/veniceai/workspace/venice-ai-workspace/folder/38652128-b1bd9f3e-507b-46c5-ad35-be7419ea5ad3?action=share\&creator=38652128\&ctx=documentation\&active-environment=38652128-ef110f4e-d3e1-43b5-8029-4d6877e62041).
# Rate Limits and Balances
Source: https://docs.venice.ai/api-reference/endpoint/api_keys/rate_limits
GET /api_keys/rate_limits
Return details about user balances and rate limits.
# Speech API (Beta)
Source: https://docs.venice.ai/api-reference/endpoint/audio/speech
POST /audio/speech
Converts text to speech using various voice models and formats.
# Billing Usage API (Beta)
Source: https://docs.venice.ai/api-reference/endpoint/billing/usage
GET /billing/usage
Get paginated billing usage data for the authenticated user. NOTE: This is a beta endpoint and may be subject to change.
Exports usage data for a user. Descriptions of response fields can be found below:
* **timestamp**: The timestamp the billing usage entry was created
* **sku**: The product associated with the billing usage entry
* **pricePerUnitUsd**: The price per unit in USD
* **unit**: The number of units consumed
* **amount**: The total amount charged for the billing usage entry
* **currency**: The currency charged for the billing usage entry
* **notes**: Notes about the billing usage entry
* **inferenceDetails.requestId**: The request ID associated with the inference
* **inferenceDetails.inferenceExecutionTime**: Time taken for inference execution in milliseconds
* **inferenceDetails.promptTokens**: Number of tokens requested in the prompt. Only present for LLM usage.
* **inferenceDetails.completionTokens**: Number of tokens used in the completion. Only present for LLM usage.
# List Characters
Source: https://docs.venice.ai/api-reference/endpoint/characters/list
GET /characters
This is a preview API and may change. Returns a list of characters supported in the API.
## Experimental Endpoint
This is an experimental endpoint and may be subject to change.
## Postman Collection
For additional examples, please see this [Postman Collection](https://www.postman.com/veniceai/workspace/venice-ai-workspace/folder/38652128-b1bd9f3e-507b-46c5-ad35-be7419ea5ad3?action=share\&creator=38652128\&ctx=documentation\&active-environment=38652128-ef110f4e-d3e1-43b5-8029-4d6877e62041).
# Chat Completions
Source: https://docs.venice.ai/api-reference/endpoint/chat/completions
POST /chat/completions
Run text inference based on the supplied parameters. Long-running requests should use the streaming API by setting `stream=true` in your request.
## Postman Collection
For additional examples, please see this [Postman Collection](https://www.postman.com/veniceai/workspace/venice-ai-workspace/folder/38652128-5a71391b-5dd8-4fe8-80be-197a958907fe?action=share\&creator=38652128\&ctx=documentation\&active-environment=38652128-ef110f4e-d3e1-43b5-8029-4d6877e62041).
***
# Model Feature Suffix
Source: https://docs.venice.ai/api-reference/endpoint/chat/model_feature_suffix
Venice supports additional capabilities within its models that can be powered by the `venice_parameters` input on the chat completions endpoint.
In certain circumstances, you may be using a client that does not let you modify the request body. For those platforms, you can utilize Venice's Model Feature Suffix offering to pass flags in via the model ID.
## Instructions
You can append any valid `venice_parameters` value to the end of the model ID. Each feature suffix follows the model name after a `:`, and multiple features can be chained together with `&`:
### To Set Web Search to Auto
```
default:enable_web_search=auto
```
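Because the flags ride along in the model ID, no request-body changes are needed. A small helper can assemble the suffix; `withFeatures` below is a hypothetical convenience function for illustration, not part of any SDK:

```javascript
// Hypothetical helper (not part of any SDK): appends venice_parameters
// flags to a model ID using the suffix syntax described above.
function withFeatures(modelId, features) {
  const flags = Object.entries(features)
    .map(([key, value]) => `${key}=${value}`)
    .join("&");
  return flags ? `${modelId}:${flags}` : modelId;
}

// Chain multiple features together with "&":
const model = withFeatures("default", {
  enable_web_search: "auto",
  include_venice_system_prompt: "false",
});
console.log(model);
// → "default:enable_web_search=auto&include_venice_system_prompt=false"
```

The resulting string can be passed as the `model` field in any OpenAI-compatible client.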
### To Enable Web Search and Disable System Prompt
```
default:enable_web_search=on&include_venice_system_prompt=false
```
### To Enable Web Search and Add Citations to the Response
```
default:enable_web_search=on&enable_web_citations=true
```
### To Use a Character
```
default:character_slug=alan-watts
```
### To Hide Thinking Blocks on a Reasoning Model Response
```
qwen3-4b:strip_thinking_response=true
```
### To Disable Thinking on Supported Reasoning Models
Certain reasoning models (like Qwen 3) support disabling the thinking process. You can activate this using the suffix below:
```
qwen3-4b:disable_thinking=true
```
### To Add Web Search Results to a Streaming Response
This will enable web search, add citations to the response body and include the search results in the stream as the final response message.
You can see an example of this in our [Postman Collection here](https://www.postman.com/veniceai/workspace/venice-ai-workspace/request/38652128-ceef3395-451c-4391-bc7e-a40377e0357b?action=share\&source=copy-link\&creator=38652128\&active-environment=ef110f4e-d3e1-43b5-8029-4d6877e62041).
```
qwen3-4b:enable_web_search=on&enable_web_citations=true&include_search_results_in_stream=true
```
### To Add Web Search Results to a Non-Streaming Response
You can view an example of this feature in our [Postman Collection here](https://www.postman.com/veniceai/workspace/venice-ai-workspace/request/38652128-857f29ff-ee70-4c7c-beba-ef884bdc93be?action=share\&creator=38652128\&ctx=documentation\&active-environment=38652128-ef110f4e-d3e1-43b5-8029-4d6877e62041).
# Generate Embeddings
Source: https://docs.venice.ai/api-reference/endpoint/embeddings/generate
POST /embeddings
Create embeddings for the supplied input.
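As a sketch of how this endpoint is typically consumed, assuming the standard OpenAI embeddings shape (`{ model, input }` in, `data[i].embedding` out) — the model ID below is a placeholder; pick an embeddings-capable model from the `/models` endpoint:

```javascript
// Sketch: request embeddings, then compare two vectors. The request/response
// shape is assumed to follow the OpenAI embeddings format; the model ID is a
// placeholder -- choose a real one from the /models endpoint.
async function embed(texts, apiKey) {
  const res = await fetch("https://api.venice.ai/api/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "your-embedding-model", input: texts }),
  });
  if (!res.ok) throw new Error(`Embeddings request failed: ${res.status}`);
  const { data } = await res.json();
  return data.map((d) => d.embedding);
}

// Embeddings are usually compared with cosine similarity:
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```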
# Edit (aka Inpaint)
Source: https://docs.venice.ai/api-reference/endpoint/image/edit
POST /image/edit
Edit or modify an image based on the supplied prompt. The image can be provided either as a multipart form-data file upload or as a base64-encoded string in a JSON request.
## Postman Collection
For additional examples, please see this [Postman Collection](https://www.postman.com/veniceai/workspace/venice-ai-workspace/folder/38652128-2d156cd6-a9bc-4586-8a8b-98e4b5c4435d?action=share\&source=copy-link\&creator=38652128\&ctx=documentation).
***
Venice's image editor runs on the Flux Kontext Dev model, which blocks any request that attempts to generate or add explicit sexual imagery, sexualize minors or make adults look child-like, or depict real-world violence or gore.
# Generate Images
Source: https://docs.venice.ai/api-reference/endpoint/image/generate
POST /image/generate
Generate an image based on input parameters
## Postman Collection
For additional examples, please see this [Postman Collection](https://www.postman.com/veniceai/workspace/venice-ai-workspace/folder/38652128-0adc004d-2edf-4b88-a3bb-0f868c791c9c?action=share\&source=copy-link\&creator=38652128\&ctx=documentation).
***
# Generate Images (OpenAI Compatible API)
Source: https://docs.venice.ai/api-reference/endpoint/image/generations
POST /images/generations
Generate an image based on input parameters using an OpenAI compatible endpoint. This endpoint does not support the full feature set of the Venice Image Generation endpoint, but is compatible with the existing OpenAI endpoint.
# Image Styles
Source: https://docs.venice.ai/api-reference/endpoint/image/styles
GET /image/styles
List available image styles that can be used with the generate API.
## Postman Collection
For additional examples, please see this [Postman Collection](https://www.postman.com/veniceai/workspace/venice-ai-workspace/folder/38652128-04b32328-197f-4548-b15e-79d4ab0728b1?action=share\&source=copy-link\&creator=38652128\&ctx=documentation).
***
# Upscale and Enhance
Source: https://docs.venice.ai/api-reference/endpoint/image/upscale
POST /image/upscale
Upscale or enhance an image based on the supplied parameters. Using a scale of 1 with enhance enabled will only run the enhancer. The image can be provided either as a multipart form-data file upload or as a base64-encoded string in a JSON request.
## Postman Collection
For additional examples, please see this [Postman Collection](https://www.postman.com/veniceai/workspace/venice-ai-workspace/folder/38652128-8c268e3a-614f-4e49-9816-e4b8d1597818?action=share\&source=copy-link\&creator=38652128\&ctx=documentation).
***
# Compatibility Mapping
Source: https://docs.venice.ai/api-reference/endpoint/models/compatibility_mapping
GET /models/compatibility_mapping
Returns a list of model compatibility mappings and the associated model.
## Postman Collection
For additional examples, please see this [Postman Collection](https://www.postman.com/veniceai/workspace/venice-ai-workspace/folder/38652128-59dfa959-7038-4cd8-b8ba-80cf09f2f026?action=share\&source=copy-link\&creator=38652128\&ctx=documentation).
***
# List Models
Source: https://docs.venice.ai/api-reference/endpoint/models/list
GET /models
Returns a list of available models supported by the Venice.ai API for both text and image inference.
## Postman Collection
For additional examples, please see this [Postman Collection](https://www.postman.com/veniceai/workspace/venice-ai-workspace/folder/38652128-59dfa959-7038-4cd8-b8ba-80cf09f2f026?action=share\&source=copy-link\&creator=38652128\&ctx=documentation).
***
# Traits
Source: https://docs.venice.ai/api-reference/endpoint/models/traits
GET /models/traits
Returns a list of model traits and the associated model.
## Postman Collection
For additional examples, please see this [Postman Collection](https://www.postman.com/veniceai/workspace/venice-ai-workspace/folder/38652128-59dfa959-7038-4cd8-b8ba-80cf09f2f026?action=share\&source=copy-link\&creator=38652128\&ctx=documentation).
***
# Error Codes
Source: https://docs.venice.ai/api-reference/error-codes
Predictable error codes for the Venice API
When an error occurs in the API, we return a consistent error response format that includes an error code, HTTP status code, and a descriptive message. This reference lists all possible error codes that you might encounter while using our API, along with their corresponding HTTP status codes and messages.
| Error Code | HTTP Status | Message | Log Level |
| ------------------------------------ | ----------- | ----------------------------------------------------------------------------------------------------------------- | --------- |
| `AUTHENTICATION_FAILED` | 401 | Authentication failed | - |
| `AUTHENTICATION_FAILED_INACTIVE_KEY` | 401 | Authentication failed - Pro subscription is inactive. Please upgrade your subscription to continue using the API. | - |
| `INVALID_API_KEY` | 401 | Invalid API key provided | - |
| `UNAUTHORIZED` | 403 | Unauthorized access | - |
| `INVALID_REQUEST` | 400 | Invalid request parameters | - |
| `INVALID_MODEL` | 400 | Invalid model specified | - |
| `CHARACTER_NOT_FOUND` | 404 | No character could be found from the provided character\_slug | - |
| `INVALID_CONTENT_TYPE` | 415 | Invalid content type | - |
| `INVALID_FILE_SIZE` | 413 | File size exceeds maximum limit | - |
| `INVALID_IMAGE_FORMAT` | 400 | Invalid image format | - |
| `CORRUPTED_IMAGE` | 400 | The image file is corrupted or unreadable | - |
| `RATE_LIMIT_EXCEEDED` | 429 | Rate limit exceeded | - |
| `MODEL_NOT_FOUND` | 404 | Specified model not found | - |
| `INFERENCE_FAILED` | 500 | Inference processing failed | error |
| `UPSCALE_FAILED` | 500 | Image upscaling failed | error |
| `UNKNOWN_ERROR` | 500 | An unknown error occurred | error |
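When handling these responses in a client, the HTTP status is usually enough to decide whether a retry makes sense; a minimal sketch:

```javascript
// Sketch: classify errors from the table above. 429 (rate limit) and 5xx
// (inference/upscale/unknown failures) are transient and worth retrying;
// other 4xx codes indicate a problem with the request or credentials.
function isRetryable(status) {
  return status === 429 || status >= 500;
}

console.log(isRetryable(429)); // true  (RATE_LIMIT_EXCEEDED)
console.log(isRetryable(500)); // true  (INFERENCE_FAILED, UNKNOWN_ERROR)
console.log(isRetryable(401)); // false (INVALID_API_KEY)
```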
# Rate Limits
Source: https://docs.venice.ai/api-reference/rate-limiting
This page describes the request and token rate limits for the Venice API.
## Failed Request Rate Limits
Failed requests, including 500 errors, 503 capacity errors, and 429 rate limit errors, should be retried with exponential backoff.
For 429 rate limit errors, please use `x-ratelimit-reset-requests` and `x-ratelimit-remaining-requests` to determine when to next retry.
To protect our infrastructure from abuse, if a user generates more than 20 failed requests in a 30-second window, the API will return a 429 error indicating the error rate limit has been reached:
```
Too many failed attempts (> 20) resulting in a non-success status code. Please wait 30s and try again. See https://docs.venice.ai/api-reference/rate-limiting for more information.
```
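Putting the guidance above into code, a retry loop might look like the following sketch (`fetchWithBackoff` and `backoffDelay` are illustrative names, not part of any SDK):

```javascript
// Sketch: retry failed requests with exponential backoff, honoring the
// x-ratelimit-reset-requests header (a unix timestamp) on 429 responses.
async function fetchWithBackoff(url, options, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch(url, options);
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt === maxAttempts - 1) return res;
    let delayMs = backoffDelay(attempt);
    if (res.status === 429) {
      const reset = Number(res.headers.get("x-ratelimit-reset-requests"));
      if (reset) delayMs = Math.max(delayMs, reset * 1000 - Date.now());
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}

// Exponential delay with jitter: roughly 1s, 2s, 4s, ... capped at 30s.
function backoffDelay(attempt) {
  const base = Math.min(1000 * 2 ** attempt, 30_000);
  return base / 2 + Math.random() * (base / 2);
}
```

The jitter spreads retries out so that many clients recovering at once do not hit the API in lockstep.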
## Paid Tier Rate Limits
Rate limits apply to users who have purchased API credits or staked VVV to gain Diem.
Helpful links:
* [Real time rate limits](https://docs.venice.ai/api-reference/endpoint/api_keys/rate_limits?playground=open)
* [Rate limit logs](https://docs.venice.ai/api-reference/endpoint/api_keys/rate_limit_logs?playground=open) - View requests that have hit the rate limiter
We will continue to monitor usage. As we add compute capacity to the network, we will review these limits. If you are consistently hitting rate limits, please contact [**support@venice.ai**](mailto:support@venice.ai) or post in the #API channel on Discord, and we can work with you to raise your limits.
### Paid Tier - LLMs
***
| Model | Model ID | Req / Min | Req / Day | Tokens / Min |
| --------------------- | ----------------------- | :-------: | :-------- | :----------: |
| Llama 3.2 3B | llama-3.2-3b | 500 | 288,000 | 1,000,000 |
| Qwen 3 4B | qwen3-4b | 500 | 288,000 | 1,000,000 |
| Deepseek Coder V2 | deepseek-coder-v2-lite | 75 | 54,000 | 750,000 |
| Qwen 2.5 Coder 32B | qwen-2.5-coder-32b | 75 | 54,000 | 750,000 |
| Qwen 2.5 QWQ 32B | qwen-2.5-qwq-32b | 75 | 54,000 | 750,000 |
| Dolphin 72B | dolphin-2.9.2-qwen2-72b | 50 | 36,000 | 750,000 |
| Llama 3.3 70B | llama-3.3-70b | 50 | 36,000 | 750,000 |
| Mistral Small 3.1 24B | mistral-31-24b | 50 | 36,000 | 750,000 |
| Qwen 2.5 VL 72B | qwen-2.5-vl | 50 | 36,000 | 750,000 |
| Qwen 3 235B | qwen3-235b | 50 | 36,000 | 750,000 |
| Llama 3.1 405B | llama-3.1-405b | 20 | 15,000 | 750,000 |
| Deepseek R1 671B | deepseek-r1-671b | 15 | 10,000 | 200,000 |
### Paid Tier - Image Models
***
| Model | Model ID | Req / Min | Req / Day |
| ---------- | ------------------------------ | --------- | :-------- |
| Flux | flux-dev / flux-dev-uncensored | 20 | 14,400 |
| All others | All | 20 | 28,800 |
### Paid Tier - Audio Models
***
| Model | Model ID | Req / Min | Req / Day |
| ---------------- | -------- | :-------: | :-------: |
| All Audio Models | All | 60 | 86,400 |
## Rate Limit and Consumption Headers
You can monitor your API utilization and remaining requests by evaluating the following headers:
| Header | Description |
| ---------------------------------------------------------------------------- | --------------------------------------------------------------------------------------- |
| `x-ratelimit-limit-requests` | The number of requests you've made in the current evaluation period. |
| `x-ratelimit-remaining-requests` | The remaining requests you can make in the current evaluation period. |
| `x-ratelimit-reset-requests` | The unix timestamp when the rate limit will reset. |
| `x-ratelimit-limit-tokens` | The number of total (prompt + completion) tokens used within a 1-minute sliding window. |
| `x-ratelimit-remaining-tokens` | The remaining number of total tokens that can be used during the evaluation period. |
| `x-ratelimit-reset-tokens` | The duration of time in seconds until the token rate limit resets. |
| `x-venice-balance-diem` | The user's Diem balance before the request has been processed. |
| `x-venice-balance-usd` | The user's USD balance before the request has been processed. |
# About Venice
Source: https://docs.venice.ai/overview/about-venice
Welcome to Venice.ai's API documentation! Our API enables you to harness the power of advanced AI models for text and image generation while maintaining the highest standards of privacy and performance.
Venice's API is rapidly evolving. Please help us improve our offering by providing feedback. Join our [Discord](https://discord.gg/askvenice) to interact with our community or request new features.
* Features and endpoints may evolve
* Model availability may change
* Your feedback shapes our development. We take it seriously and work quickly to ensure we are providing you with the best possible product.
## Venice's Values
* **Privacy-First Architecture**: Built from the ground up with user privacy as a core principle. Venice does not utilize or store user data for any purposes whatsoever.
* **Open-Source**: Venice only utilizes open-source models to ensure users have full transparency into the models they are interacting with.
* **OpenAI API Compatible**: Seamless integration with existing OpenAI clients using the Venice API base URL.
## What Can I do with Venice API?
* **Chat**: Prompt any of the supported models directly for simple chat applications with custom parameters and configurations. Use default settings, or customize as deeply as you prefer.
* **Generate Images**: Use the image models to generate new images from a simple prompt, or modify images with "inpainting".
* **Assist with Coding**: Prompt models for coding related outputs or integrate the Venice API into your preferred IDE or Visual Studio Code plugin.
* **Generate Speech (BETA)**: Use Venice's new voice models to convert text into speech using your preferred "speaker".
* **Analyze Documents**: Send images or PDF documents for interpretation, analysis or summarization.
* **Interact with Characters**: Chat with your favorite Venice characters through the API.
* **Anything you can imagine**: The Venice API has no bounds. Tie the API into your preferred integration using the API base URL and build anything you can imagine and code.
## Accessing the API
Venice users can access the API in 3 ways:
1. **Pro Account:** Users with a PRO account are issued a one-time \$10 API credit to experiment with the Venice API.
2. **Diem:** With Venice’s [launch of the VVV token](https://venice.ai/blog/introducing-the-venice-token-vvv), users who stake tokens within the Venice protocol gain access to a daily AI inference allocation (as well as ongoing staking yield). When staking, users receive Diem, which represent a portion of the overall Venice compute capacity. You can stake VVV tokens and [see your Diem allocation here](https://venice.ai/token).
3. **USD:** Users can also opt to deposit USD into their account to pay for API inference the same way that they would on other platforms, like OpenAI or Anthropic. Users with positive USD balance are entitled to “Paid Tier” rate limits.
## API settings
Venice recognizes that users may be integrating with various applications and require API key separation and usage limitation. Venice now offers the following settings for API Keys:
1. **Administrator Settings**: Users can create new API keys directly through the API, reducing the need for UI interactions.
2. **Expiration Time**: Users can set a date for API keys expiration.
3. **Usage Limits**: Users can set daily Diem or USD limits per API key.
## Resources
* Learn more about how our API handles your data and privacy.
* Learn more about our pricing.
* Learn more about how our API handles rate limits and usage.
* Explore our API reference.
## Start Building
Ready to begin? Head to our Getting Started Guide for a step-by-step walk-through of making your first API call.
These docs are open source and can be contributed to on [Github](https://github.com/veniceai/api-docs) by submitting a pull request. Here is a simple reference guide for ["How to use Venice API"](https://venice.ai/blog/how-to-use-venice-api)
# Deprecations
Source: https://docs.venice.ai/overview/deprecations
Model inclusion and lifecycle policy and deprecations for the Venice API
## Model inclusion and lifecycle policy for the Venice API
The Venice API exists to give developers unrestricted private access to production-grade models free from hidden filters or black-box decisions.
As models improve, we occasionally retire older ones in favor of smarter, faster, or more capable alternatives. We design these transitions to be predictable and low‑friction.
## Model Deprecations
We know deprecations can be disruptive. That’s why we aim to deprecate only when necessary, and we design features like traits and Venice-branded models to minimize disruption.
We may deprecate a model when:
* A newer model offers a clear improvement for the same use case
* The model no longer meets our standards for performance or reliability
* It sees consistently low usage, and continuing to support it would fragment the experience for everyone else
## Deprecation Process
When a model meets deprecation criteria, we announce the change with 30–60 days' notice. Deprecation notices are published via the [changelog](https://featurebase.venice.ai/changelog) and our [Discord server](https://discord.gg/askvenice). When you call a deprecated model during the notice period, the API response will include a deprecation warning.
During the notice period, the model remains available, though in some cases we may reduce infrastructure capacity. We always provide a recommended replacement, and when needed, offer migration guidance to help the transition.
After the sunset date, requests to the model will automatically route to a model of similar processing power at the same or lower price. If routing is not possible for technical or safety reasons, the API will return a 410 Gone response. If a deprecated model was selected via a trait (such as `default_code`, `default_vision`, or `fastest`) that trait will be reassigned to a compatible replacement.
We never remove models silently or alter behavior without versioning. You’ll always know what’s running and how to prepare for what’s next.
Performance-only upgrades: We may roll out improvements that preserve model behavior while improving performance, latency, or cost efficiency. These updates are backward-compatible and require no customer action.
See the [Model Deprecation Tracker](#model-deprecation-tracker) below. For earlier announcements, consult the [changelog](https://featurebase.venice.ai/changelog) and our [Discord server](https://discord.gg/askvenice).
## How models are selected for the Venice API
We carefully select which models to make available based on performance, reliability, and real-world developer needs. To be included, a model must demonstrate strong performance, behave consistently under OpenAI-compatible endpoints, and offer a clear improvement over at least one of the models we already support.
Models we’re evaluating may first be released in beta to gather feedback and validate performance at scale.
We don’t expose models that are redundant, unproven, or not ready for consistent production use. Our goal is to keep the Venice API clean, capable, and optimized for what developers actually build.
Learn more in [Model Deprecations](/overview/deprecations#model-deprecations) and Current Model List.
## Versioning and Aliases
All Venice models are identified by a unique, permanent ID. For example:
`venice-uncensored`
`qwen3-235b`
`llama-3.3-70b`
`deepseek-r1-671b`
Model IDs are stable. If there's a breaking change, we will release a new model ID (for example, add a version like v2). If there are no breaking changes, we may update the existing model and will communicate significant changes.
To provide flexibility, Venice also maintains symbolic aliases — implemented through traits — that point to the recommended default model for a given task. Examples include:
* `default` → currently routes to `llama-3.3-70b`
* `default_code` → currently routes to `qwen-2.5-coder-32b`
* `default_vision` → currently routes to `mistral-31-24b`
* `default_reasoning` → currently routes to `deepseek-r1-671b`
Traits offer a stable abstraction for selecting models while giving Venice the flexibility to improve the underlying implementation. Developers who prefer automatic access to the latest recommended models can rely on trait-based aliases.
For applications that require strict consistency and predictable behavior, we recommend referencing fixed model IDs.
## Beta Models
We sometimes release models in beta to gather feedback and confirm their performance before a full production rollout. Beta status does not guarantee promotion to production. A beta model may be removed if it is too costly to run, performs poorly at scale, or raises safety concerns. Beta models can change without notice and may have limited documentation or support. Models that prove stable, broadly useful, and aligned with our standards are promoted to general availability.
To request early access, join us on [Discord](https://discord.gg/askvenice) and let us know why you’d like to join the beta tester group.
## Feedback
You can submit your feedback or request through our [Featurebase portal](https://featurebase.venice.ai). We maintain a public [changelog](https://featurebase.venice.ai/changelog), roadmap tracker, and transparent rationale for adding, upgrading, or removing models, and we encourage continuous community participation.
## Model Deprecation Tracker
The following models are scheduled for deprecation. We recommend migrating to the suggested replacements before the removal date.
| Deprecated Model | Replacement | Removal by | Status | Reason |
| ------------------------- | ------------------------------- | ------------ | --------- | --------------------------------- |
| `deepseek-r1-671b` | `qwen3-235b` | Sep 22, 2025 | Available | Better model available, low usage |
| `llama-3.1-405b` | `qwen3-235b` | Sep 22, 2025 | Available | Better model available, low usage |
| `dolphin-2.9.2-qwen2-72b` | `venice-uncensored` | Sep 22, 2025 | Available | Better model available, low usage |
| `qwen-2.5-vl` | `mistral-31-24b` | Sep 22, 2025 | Available | Low usage |
| `qwen-2.5-qwq-32b` | `qwen3-235b` (disable thinking) | Sep 22, 2025 | Available | Low usage |
| `qwen-2.5-coder-32b` | `qwen3-235b` | Sep 22, 2025 | Available | Low usage |
| `deepseek-coder-v2-lite` | `qwen3-235b` | Sep 22, 2025 | Available | Low usage |
| `pony-realism` | `lustify-sdxl` | Sep 22, 2025 | Available | Better model available |
| `stable-diffusion-3.5` | `qwen-image` | Sep 22, 2025 | Available | Low usage |
| `flux-dev` | `qwen-image` | Oct 22, 2025 | Available | Better model available |
| `flux-dev-uncensored` | `lustify-sdxl` | Oct 22, 2025 | Available | Better model available |
# Quickstart
Source: https://docs.venice.ai/overview/getting-started
## Step-by-step guide
To get started with Venice quickly, you'll need to:
**Generate an API key.** Navigate to your [Venice API Settings](https://venice.ai/settings/api) and generate a new API key. For a more detailed guide, check out the [API Key](/overview/guides/generating-api-key) page.
**List the available models.** Go to the ["List Models"](https://docs.venice.ai/api-reference/endpoint/models/list) API reference page and enter your API key to output a list of all models, or use the following command in a terminal:
```bash Curl
# Open a terminal, replace with your actual API key, and run the following command
curl --request GET \
--url https://api.venice.ai/api/v1/models \
--header 'Authorization: Bearer '
```
```go Go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	url := "https://api.venice.ai/api/v1/models"

	client := &http.Client{}
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		fmt.Println(err)
		return
	}
	req.Header.Add("Authorization", "Bearer ")

	res, err := client.Do(req)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer res.Body.Close()

	body, err := io.ReadAll(res.Body)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(body))
}
```
```python Python
import http.client
conn = http.client.HTTPSConnection("api.venice.ai")
payload = ''
headers = {
'Authorization': 'Bearer '
}
conn.request("GET", "/api/v1/models", payload, headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))
```
```js Javascript
/**
* Keep in mind that you will likely run into CORS issues when making requests from the browser.
* You can get around this by using a proxy service like
* https://corsproxy.io/
*
* If you're looking for a React/NextJS example, check out:
* https://codesandbox.io/p/devbox/adoring-cori-6skflx
**/
const myHeaders = new Headers();
myHeaders.append("Authorization", "Bearer ");
const requestOptions = {
method: "GET",
headers: myHeaders,
redirect: "follow"
};
fetch("https://api.venice.ai/api/v1/models", requestOptions)
.then((response) => response.text())
.then((result) => console.log(result))
.catch((error) => console.error(error));
```
Go to the ["Chat Completions"](https://docs.venice.ai/api-reference/endpoint/chat/completions) API reference page and enter your API key as well as text prompt configuration options, or modify the command below in a terminal
```bash Curl
# Open a terminal, replace with your actual API key, edit the information to your needs and run the following command
curl --request POST \
--url https://api.venice.ai/api/v1/chat/completions \
--header 'Authorization: Bearer ' \
--header 'Content-Type: application/json' \
--data '{
"model": "llama-3.3-70b",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant"
},
{
"role": "user",
"content": "Tell me about AI"
}
],
"venice_parameters": {
"enable_web_search": "on",
"include_venice_system_prompt": true
},
"frequency_penalty": 0,
"presence_penalty": 0,
"max_tokens": 1000,
"max_completion_tokens": 998,
"temperature": 1,
"top_p": 0.1,
"stream": false
}'
```
Go to the ["Generate Images"](https://docs.venice.ai/api-reference/endpoint/image/generate) API reference page and enter your API key as well as image prompt configuration options, or modify the command below in a terminal
```bash Curl
# Open a terminal, replace with your actual API key, edit the information to your needs and run the following command
curl --request POST \
--url https://api.venice.ai/api/v1/image/generate \
--header 'Authorization: Bearer ' \
--header 'Content-Type: application/json' \
--data '{
"model": "fluently-xl",
"prompt": "A beautiful sunset over a mountain range",
"negative_prompt": "Clouds, Rain, Snow",
"style_preset": "3D Model",
"height": 1024,
"width": 1024,
"steps": 30,
"cfg_scale": 7.5,
"seed": 123456789,
"lora_strength": 50,
"safe_mode": false,
"return_binary": false,
"hide_watermark": false
}'
```
# AI Agents
Source: https://docs.venice.ai/overview/guides/ai-agents
Venice is supported by the following AI agent communities.
* [Coinbase Agentkit](https://www.coinbase.com/developer-platform/discover/launches/introducing-agentkit)
* [Eliza](https://github.com/ai16z/eliza) - Venice support introduced via this [PR](https://github.com/ai16z/eliza/pull/1008).
## Eliza Instructions
To set up Eliza with Venice, follow these instructions. A full blog post with more detail can be found [here](https://venice.ai/blog/how-to-build-a-social-media-ai-agent-with-elizaos-venice-api).
* Clone the Eliza repository:
```bash
# Clone the repository
git clone https://github.com/ai16z/eliza.git
```
* Copy `.env.example` to `.env`
* Update `.env`, specifying your `VENICE_API_KEY` and model selections for `SMALL_VENICE_MODEL`, `MEDIUM_VENICE_MODEL`, `LARGE_VENICE_MODEL`, and `IMAGE_VENICE_MODEL`. Instructions on generating your key can be found [here](/overview/guides/generating-api-key).
* Create a new character in the `/characters/` folder with a filename similar to `your_character.character.json` to specify the character profile, tools/functions, and Venice.ai as the model provider:
```typescript
modelProvider: "venice"
```
* Build the repo:
```bash
pnpm i
pnpm build
pnpm start
```
* Start your character
```bash
pnpm start --characters="characters/.character.json"
```
* Start the local UI to chat with the agent
# Generating an API Key
Source: https://docs.venice.ai/overview/guides/generating-api-key
Venice's API is protected via API keys. To begin using the Venice API, you'll first need to generate a new key. Follow these steps to get started.
Navigate to the API settings page by visiting [https://venice.ai/settings/api](https://venice.ai/settings/api). This page is also accessible by clicking "API" in the left-hand toolbar, or by clicking “API” within your user settings.
Within this dashboard, you're able to view your Diem and USD balances, your API Tier, your API Usage, and your API Keys.
Scroll down the dashboard and select "Generate New API Key". You'll be presented with a list of options.
* **Description:** This is used to name your API key
* **API Key Type:**
* “Admin” keys have the ability to delete or generate additional API keys programmatically.
* “Inference Only” keys are only permitted to run inference.
* **Expires at:** You can choose to set an expiration date for the API key after which it will cease to function. By default, a date will not be set, and the key will work in perpetuity.
* **Epoch Consumption Limits:** This allows you to create limits for API usage from the individual API key. You can choose to limit the Diem or USD amount allowable within a given epoch (24hrs).
Clicking Generate will show you the API key.
**Important:** This key is only shown once. Make sure to copy it and store it in a safe place. If you lose it, you'll need to delete it and create a new one.
# Autonomous Agent API Key Creation
Source: https://docs.venice.ai/overview/guides/generating-api-key-agent
Autonomous AI Agents can programmatically access Venice.ai's APIs without any human interaction using the "api\_keys" endpoint. AI Agents are now able to manage their own wallets on the BASE blockchain, allowing them to programmatically acquire and stake VVV token to earn a daily Diem inference allocation. Venice's new API endpoint allows them to automate further by generating their own API key.
To autonomously generate an API key within an agent, you must:
The agent will need VVV token to complete this process. This can be achieved by sending tokens directly to the agent wallet, or having the agent swap on a Decentralized Exchange (DEX), like [Aerodrome](https://aerodrome.finance/swap?from=eth\&to=0xacfe6019ed1a7dc6f7b508c02d1b04ec88cc21bf\&chain0=8453\&chain1=8453) or [Uniswap](https://app.uniswap.org/swap?chain=base\&inputCurrency=NATIVE\&outputCurrency=0xacfe6019ed1a7dc6f7b508c02d1b04ec88cc21bf).
Once funded, the agent will need to stake the VVV tokens within the [Venice Staking Smart Contract](https://basescan.org/address/0x321b7ff75154472b18edb199033ff4d116f340ff#code). To accomplish this you first must approve VVV tokens for staking, then execute a "stake" transaction.
When the transaction is complete, you will see the VVV tokens exit the wallet and sVVV tokens returned to your wallet. This indicates a successful stake.
To generate an API key, you first need to obtain your validation token. You can get this by calling the [API endpoint](https://docs.venice.ai/api-reference/endpoint/api_keys/generate_web3_key/get) `https://api.venice.ai/api/v1/api_keys/generate_web3_key`. The API response will provide you with a "token".
Here is an example request:
```bash
curl --request GET \
--url https://api.venice.ai/api/v1/api_keys/generate_web3_key
```
Sign the token with the wallet holding VVV to complete the association between the wallet and token.
Now you can call this same [API endpoint](https://docs.venice.ai/api-reference/endpoint/api_keys/generate_web3_key/get) `https://api.venice.ai/api/v1/api_keys/generate_web3_key` to create your API key.
You will need the following information to proceed, which is described further within the "[Generating API Key Guide](https://docs.venice.ai/overview/guides/generating-api-key)":
* API Key Type: Inference or Admin
* ConsumptionLimit: To be used if you want to limit the API key usage
* Signature: The signed token from step 4
* Token: The unsigned token from step 3
* Address: The agent's wallet address
* Description: String to describe your API Key
* ExpiresAt: Option to set an expiration date for the API key (empty for no expiration)
Here is an example request:
```bash
curl --request POST \
--url https://api.venice.ai/api/v1/api_keys/generate_web3_key \
--header 'Authorization: Bearer ' \
--header 'Content-Type: application/json' \
--data '{
"description": "Web3 API Key",
"apiKeyType": "INFERENCE",
"signature": "",
"token": "",
"address": "",
"consumptionLimit": {
"diem": 1
}
}'
```
Example code to interact with this API can be found below:
```js
import { ethers } from "ethers";

// NOTE: This is an example. To successfully generate a key, your address must be
// holding and staking VVV.
const wallet = ethers.Wallet.createRandom();
const address = wallet.address;
console.log("Created address:", address);

// Request a JWT from Venice's API
const response = await fetch('https://api.venice.ai/api/v1/api_keys/generate_web3_key');
const token = (await response.json()).data.token;
console.log("Validation Token:", token);

// Sign the token with your wallet and pass it back to the API to generate an API key
const signature = await wallet.signMessage(token);
const postResponse = await fetch('https://api.venice.ai/api/v1/api_keys/generate_web3_key', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    address,
    signature,
    token,
    apiKeyType: 'ADMIN'
  })
});
console.log(await postResponse.json());
```
# Integrations
Source: https://docs.venice.ai/overview/guides/integrations
Here is a list of third party tools with Venice.ai integrations.
[How to use Venice API](https://venice.ai/blog/how-to-use-venice-api) reference guide.
## Venice Confirmed Integrations
* Agents
* [ElizaOS](https://venice.ai/blog/how-to-build-a-social-media-ai-agent-with-elizaos-venice-api) (local build)
* [ElizaOS](https://venice.ai/blog/how-to-launch-an-elizaos-agent-on-akash-using-venice-api-in-less-than-10-minutes) (via [Akash Template](https://console.akash.network/templates/akash-network-awesome-akash-Venice-ElizaOS))
* Coding
* [Cursor IDE](https://venice.ai/blog/how-to-code-with-the-venice-api-in-cursor-a-quick-guide)
* [Cline](https://venice.ai/blog/how-to-use-the-venice-api-with-cline-in-vscode-a-developers-guide) (VSC Extension)
* [ROO Code](https://venice.ai/blog/how-to-use-the-roo-ai-coding-assistant-in-private-with-venice-api-a-quick-guide) (VSC Extension)
* [VOID IDE](https://venice.ai/blog/how-to-use-open-source-ai-code-editor-void-in-private-with-venice-api)
* Assistants
* [Brave Leo Browser](https://venice.ai/blog/how-to-use-brave-leo-ai-with-venice-api-a-privacy-first-browser-ai-assistant)
## Community Confirmed
These integrations have been confirmed by the community. Venice is in the process of confirming these integrations and creating how-to guides for each of the following:
* Agents/Bots
* [Coinbase Agentkit](https://www.coinbase.com/developer-platform/discover/launches/introducing-agentkit)
* [Eliza\_Starter](https://github.com/Baidis/eliza-Venice) - Simplified Eliza setup
* [Venice AI Discord Bot](https://bobbiebeach.space/blog/venice-ai-discord-bot-full-setup-guide-features/)
* [JanitorAI](https://janitorai.com/)
* Coding
* [Aider](https://github.com/Aider-AI/aider), AI pair programming in your terminal
* [Alexcodes.app](https://alexcodes.app/)
* Assistants
* [Jan - Local AI Assistant](https://github.com/janhq/jan)
* [llm-venice](https://github.com/ar-jan/llm-venice)
* [unOfficial PHP SDK for Venice](https://github.com/georgeglarson/venice-ai-php)
* [Msty](https://msty.app)
* [Open WebUI](https://github.com/open-webui/open-webui)
* [Librechat](https://www.librechat.ai/)
* [ScreenSnapAI](https://screensnap.ai/)
## Venice API Raw Data
Many users have requested access to Venice API docs and data in a format acceptable for use with RAG (Retrieval-Augmented Generation) for various purposes. The full API specification is available within the "API Swagger" document below, in yaml format. The Venice API documents included throughout this API Reference webpage are available from the link below, with most documents in .mdx format.
[API Swagger](https://api.venice.ai/doc/api/swagger.yaml)
[API Docs](https://github.com/veniceai/api-docs/archive/refs/heads/main.zip)
# Using Postman
Source: https://docs.venice.ai/overview/guides/postman
## Overview
Venice provides a comprehensive Postman collection that allows developers to explore and test the full capabilities of our API. This collection includes pre-configured requests, examples, and environment variables to help you get started quickly with Venice's AI services.
## Accessing the Collection
Our official Postman collection is available in the Venice AI Workspace:
* [Venice AI Postman Workspace](https://www.postman.com/veniceai/workspace/venice-ai-workspace)
* [Venice AI Postman Examples](https://postman.venice.ai/)
## Collection Features
* **Ready-to-Use Requests**: Pre-configured API calls for all Venice endpoints
* **Environment Templates**: Properly structured environment variables
* **Request Examples**: Real-world usage examples for each endpoint
* **Response Samples**: Example responses to help you understand the API's output
* **Documentation**: Inline documentation for each request
## Getting Started
* Navigate to the Venice AI Workspace
* Click "Fork" to create your own copy of the collection
* Choose your workspace destination
* Create a new environment in Postman
* Add your Venice API key
* Configure the base URL: `https://api.venice.ai/api/v1`
* Select any request from the collection
* Ensure your environment is selected
* Click "Send" to test the API
## Available Endpoints
The collection includes examples for all Venice API endpoints:
* Text Generation
* Image Generation
* Model Information
* Image Upscaling
* System Prompt Configuration
## Best Practices
* Keep your API key secure and never share it
* Use environment variables for sensitive information
* Test responses in the Postman console before implementation
* Review the example responses for expected data structures
*Note: The Postman collection is regularly updated to reflect the latest API changes and features.*
# Structured Responses
Source: https://docs.venice.ai/overview/guides/structured-responses
Using structured responses within the Venice API
Venice now supports structured outputs via “response\_format”, an available field in the API. This field enables you to generate responses to your prompts that follow a specific pre-defined format. With this method, the models are less likely to hallucinate incorrect keys or values in the response, which was more common when attempting the same through system prompt manipulation or function calling.
The structured output “response\_format” field follows the OpenAI API format, and is further described in the OpenAI guide [here](https://platform.openai.com/docs/guides/structured-outputs). OpenAI also published an introductory article on using structured outputs in the API [here](https://openai.com/index/introducing-structured-outputs-in-the-api/). As this is advanced functionality, there are a handful of “gotchas” at the bottom of this page that should be followed.
This functionality is not natively available for all models. Please refer to the models section [here](https://docs.venice.ai/api-reference/endpoint/models/list?playground=open), and look for “supportsResponseSchema” for applicable models.
```json
{
"id": "dolphin-2.9.2-qwen2-72b",
"type": "text",
"object": "model",
"created": 1726869022,
"owned_by": "venice.ai",
"model_spec": {
"availableContextTokens": 32768,
"capabilities": {
"supportsFunctionCalling": true,
"supportsResponseSchema": true,
"supportsWebSearch": true
},
```
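When selecting a model programmatically, you can check that capability flag in the `/models` response before sending a structured request. A minimal Python sketch, filtering an abridged response (the second model entry is hypothetical, added for contrast):

```python
# Abridged /api/v1/models response; a live response lists many more models
# and fields. The "example-model-no-schema" entry is hypothetical.
models_response = {
    "data": [
        {
            "id": "dolphin-2.9.2-qwen2-72b",
            "type": "text",
            "model_spec": {
                "capabilities": {
                    "supportsFunctionCalling": True,
                    "supportsResponseSchema": True,
                    "supportsWebSearch": True,
                }
            },
        },
        {
            "id": "example-model-no-schema",
            "type": "text",
            "model_spec": {"capabilities": {"supportsResponseSchema": False}},
        },
    ]
}

def supports_response_schema(model: dict) -> bool:
    """True if the model advertises structured-output support."""
    return model.get("model_spec", {}).get("capabilities", {}).get(
        "supportsResponseSchema", False
    )

schema_models = [m["id"] for m in models_response["data"] if supports_response_schema(m)]
print(schema_models)  # ['dolphin-2.9.2-qwen2-72b']
```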
### How to use Structured Responses
To use “response\_format”, define your schema with various “properties” representing categories of outputs, each with an individually configured data type. These objects can be nested to create more advanced output structures.
Here is an example of an API call using response\_format to explain the step-by-step process of solving a math equation.
You can see that the properties were configured to require both “steps” and “final\_answer” within the response. Through nesting, the “steps” category consists of both an “explanation” and an “output”, each as strings.
```bash
curl --request POST \
--url https://api.venice.ai/api/v1/chat/completions \
--header 'Authorization: Bearer ' \
--header 'Content-Type: application/json' \
--data '{
"model": "dolphin-2.9.2-qwen2-72b",
"messages": [
{
"role": "system",
"content": "You are a helpful math tutor."
},
{
"role": "user",
"content": "solve 8x + 31 = 2"
}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "math_response",
"strict": true,
"schema": {
"type": "object",
"properties": {
"steps": {
"type": "array",
"items": {
"type": "object",
"properties": {
"explanation": {
"type": "string"
},
"output": {
"type": "string"
}
},
"required": ["explanation", "output"],
"additionalProperties": false
}
},
"final_answer": {
"type": "string"
}
},
"required": ["steps", "final_answer"],
"additionalProperties": false
}
}
}
}
```
Here is the response that was received from the model. You can see that the structure followed the requirements, first providing the “steps” with the “explanation” and “output” of each step, and then the “final\_answer”.
```json
{
"steps": [
{
"explanation": "Subtract 31 from both sides to isolate the term with x.",
"output": "8x + 31 - 31 = 2 - 31"
},
{
"explanation": "This simplifies to 8x = -29.",
"output": "8x = -29"
},
{
"explanation": "Divide both sides by 8 to solve for x.",
"output": "x = -29 / 8"
}
],
"final_answer": "x = -29 / 8"
}
```
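Because `strict` mode guarantees the shape, the JSON string returned in `choices[0].message.content` can be parsed and consumed without defensive key checks. A small Python sketch using the response above:

```python
import json

# The structured content arrives as a JSON string; json.loads yields a dict
# matching the schema. This is the example response shown above.
content = """
{
  "steps": [
    {"explanation": "Subtract 31 from both sides to isolate the term with x.",
     "output": "8x + 31 - 31 = 2 - 31"},
    {"explanation": "This simplifies to 8x = -29.", "output": "8x = -29"},
    {"explanation": "Divide both sides by 8 to solve for x.", "output": "x = -29 / 8"}
  ],
  "final_answer": "x = -29 / 8"
}
"""

parsed = json.loads(content)
for i, step in enumerate(parsed["steps"], start=1):
    print(f"Step {i}: {step['explanation']}")
print("Answer:", parsed["final_answer"])  # Answer: x = -29 / 8
```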
Although this is a simple example, this can be extrapolated into more advanced use cases like: Data Extraction, Chain of Thought Exercises, UI Generation, Data Categorization and many others.
### Gotchas
Here are some key requirements to keep in mind when using Structured Outputs via response\_format:
* Initial requests using response\_format may take longer to generate a response. Subsequent requests will not experience the same latency as the initial request.
* For larger queries, the model can fail to complete if either `max_tokens` or model timeout are reached, or if any rate limits are violated
* Incorrect schema format will result in errors on completion, usually due to timeout
* Although response\_format ensures the model will output a particular way, it does not guarantee that the model provided the correct information within. The content is driven by the prompt and the model performance.
* Structured Outputs via response\_format are not compatible with parallel function calls
* Important: All fields must be included in the `required` array. To make a field effectively optional, add a `null` option to the field's `type`, like this: `"type": ["string", "null"]`. This allows the model to return an empty value for that field.
* Important: `additionalProperties` must be set to false for response\_format to work properly
* Important: `strict` must be set to true for response\_format to work properly
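The rules above can be combined into a checklist-style sketch. The `user_profile` schema below is hypothetical; note that the nullable `nickname` field still appears in `required`, `additionalProperties` is `false`, and `strict` is `true`:

```python
# Hypothetical schema illustrating the gotchas: every field is listed in
# "required", the optional field is made nullable via ["string", "null"],
# "additionalProperties" is false, and "strict" is true.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "user_profile",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                # Effectively optional: the model may return null here.
                "nickname": {"type": ["string", "null"]},
            },
            "required": ["name", "nickname"],
            "additionalProperties": False,
        },
    },
}

schema = response_format["json_schema"]["schema"]
# Sanity-check the gotchas before sending the request.
assert response_format["json_schema"]["strict"] is True
assert schema["additionalProperties"] is False
assert set(schema["required"]) == set(schema["properties"])
print("schema ok")
```

This dict would be sent as the `response_format` field of a chat completions request body.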
# Current Models
Source: https://docs.venice.ai/overview/models
Complete list of available models on Venice AI platform
## Text Models
| Model Name | Model ID | Price (in/out) | Context Limit | Capabilities | Traits |
| ------------------------ | ------------------------- | ---------------- | ------------- | --------------------------- | ------------------ |
| Venice Uncensored 1.1 | `venice-uncensored` | `$0.50 / $2.00` | 32,768 | — | — |
| Venice Reasoning | `qwen-2.5-qwq-32b` | `$0.50 / $2.00` | 32,768 | Reasoning | — |
| Venice Small | `qwen3-4b` | `$0.15 / $0.60` | 40,960 | Function Calling, Reasoning | — |
| Venice Medium (3.2 beta) | `mistral-32-24b` | `$0.50 / $2.00` | 131,072 | Function Calling, Vision | — |
| Venice Medium (3.1) | `mistral-31-24b` | `$0.50 / $2.00` | 131,072 | Function Calling, Vision | default\_vision |
| Venice Large 1.1 | `qwen3-235b` | `$1.50 / $6.00` | 131,072 | Function Calling, Reasoning | — |
| Llama 3.2 3B | `llama-3.2-3b` | `$0.15 / $0.60` | 131,072 | Function Calling | fastest |
| Llama 3.3 70B | `llama-3.3-70b` | `$0.70 / $2.80` | 65,536 | Function Calling | default |
| Llama 3.1 405B (D) | `llama-3.1-405b` | `$1.50 / $6.00` | 65,536 | — | most\_intelligent |
| Dolphin 72B (D) | `dolphin-2.9.2-qwen2-72b` | `$0.70 / $2.80` | 32,768 | — | most\_uncensored |
| Qwen 2.5 VL 72B (D) | `qwen-2.5-vl` | `$0.70 / $2.80` | 32,768 | Vision | — |
| Qwen 2.5 Coder 32B (D) | `qwen-2.5-coder-32b` | `$0.50 / $2.00` | 32,768 | — | default\_code |
| DeepSeek R1 671B (D) | `deepseek-r1-671b` | `$3.50 / $14.00` | 131,072 | Reasoning | default\_reasoning |
| DeepSeek Coder V2 Lite | `deepseek-coder-v2-lite` | `$0.50 / $2.00` | 131,072 | — | — |
*Pricing is per 1M tokens (input / output). Models with reasoning capabilities support advanced reasoning via thinking mode*.
### Popular Text Models
`qwen3-235b` Venice Large 1.1 - Most powerful flagship model\
`mistral-31-24b` Venice Medium (3.1) - Vision + function calling\
`qwen3-4b` Venice Small - Fast, affordable for most tasks\
`llama-3.3-70b` Llama 3.3 70B - Balanced high-performance model
### Text Model Categories
**Reasoning Models**
`qwen3-235b` Venice Large 1.1 - Advanced reasoning capabilities\
`qwen3-4b` Venice Small - Efficient reasoning model
**Vision-Capable Models**
`mistral-31-24b` Venice Medium (3.1) - Vision-capable model
**Cost-Optimized Models**
`qwen3-4b` Venice Small - Best balance of speed and cost\
`llama-3.2-3b` Llama 3.2 3B - Fastest for simple tasks
**Uncensored Models**
`venice-uncensored` Venice Uncensored 1.1 - No content filtering
**High-Intelligence Models**
`llama-3.3-70b` Llama 3.3 70B - Balanced high-intelligence\
`qwen3-235b` Venice Large 1.1 - Most powerful flagship model
***
## Image Models
| Model Name | Model ID | Price | Model Source | Traits |
| ------------------------ | ---------------------- | ------- | -------------------------- | ---------------------- |
| Venice SD35 | `venice-sd35` | `$0.01` | Stable Diffusion 3.5 Large | default, eliza-default |
| HiDream | `hidream` | `$0.01` | HiDream I1 Dev | — |
| Qwen Image | `qwen-image` | `$0.01` | Qwen Image | — |
| FLUX Standard (D) | `flux-dev` | `$0.01` | FLUX.1 Dev | highest\_quality |
| FLUX Custom (D) | `flux-dev-uncensored` | `$0.01` | FLUX.1 Dev | — |
| Lustify SDXL | `lustify-sdxl` | `$0.01` | Lustify SDXL | — |
| Pony Realism (D) | `pony-realism` | `$0.01` | Pony Realism | most\_uncensored |
| Stable Diffusion 3.5 (D) | `stable-diffusion-3.5` | `$0.01` | Stable Diffusion 3.5 Large | — |
| Anime (WAI) | `wai-Illustrious` | `$0.01` | WAI-Illustrious | — |
### Popular Image Models
`qwen-image` Qwen Image - Highest quality image generation\
`venice-sd35` Venice SD35 - Default choice with Eliza integration\
`lustify-sdxl` Lustify SDXL - Uncensored image generation\
`hidream` HiDream - Production-ready generation
### Image Model Categories
**High-Quality Models**
`qwen-image` Qwen Image - Highest quality output\
`hidream` HiDream - Production-ready generation
**Default Models**
`venice-sd35` Venice SD35 - Default choice, Eliza-optimized
**Uncensored Models**
`lustify-sdxl` Lustify SDXL - Adult content generation\
`wai-Illustrious` Anime (WAI) - Anime-focused generation, NSFW capable
***
## Audio Models
### Text-to-Speech Models
`tts-kokoro` Kokoro TTS - 60+ multilingual voices for natural speech
| Model Name | Model ID | Price | Voices Available | Model Source |
| --------------------- | ------------ | -------------------- | ---------------- | ------------ |
| Kokoro Text to Speech | `tts-kokoro` | `$3.50` per 1M chars | 60+ voices | Kokoro-82M |
The tts-kokoro model supports a wide range of multilingual and stylistic voices (including af\_nova, am\_liam, bf\_emma, zf\_xiaobei, and jm\_kumo). Voice is selected using the voice parameter in the request payload.
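As a sketch, a speech request body selecting a voice might look like the following. The `/api/v1/audio/speech` path is assumed from the API's OpenAI-compatible conventions; confirm it against the API reference.

```python
import json

# Hypothetical request body for tts-kokoro; "voice" selects one of the 60+
# voices (af_nova here). The endpoint path is an assumption:
# POST https://api.venice.ai/api/v1/audio/speech
payload = {
    "model": "tts-kokoro",
    "input": "Hello from Venice.",
    "voice": "af_nova",
}
body = json.dumps(payload)
print(body)
```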
***
## Embedding Models
`text-embedding-bge-m3` BGE-M3 - Versatile embedding model for text similarity
| Model Name | Model ID | Price | Model Source |
| ---------- | ----------------------- | ----------------------------- | ------------------- |
| BGE-M3 | `text-embedding-bge-m3` | `$0.15 / $0.60` per 1K tokens | KimChen/bge-m3-GGUF |
## Image Processing Models
`upscaler` Image Upscaler - Enhance image resolution up to 4x\
`flux-kontext-dev` Flux Kontext DEV - Multimodal image editing model
### Image Upscaler
| Model Name | Model ID | Price | Upscale Options |
| ---------- | ---------- | ------- | ------------------------ |
| Upscaler | `upscaler` | `$0.01` | `2x ($0.02), 4x ($0.08)` |
### Image Editing (Inpaint)
| Model Name | Model ID | Price | Model Source | Traits |
| ---------------- | ------------------ | ------- | ------------ | -------------------- |
| Flux Kontext DEV | `flux-kontext-dev` | `$0.04` | Flux Kontext | specialized\_editing |
## Model Features
* **Vision**: Ability to process and understand images
* **Reasoning**: Advanced logical reasoning capabilities
* **Function Calling**: Support for calling external functions and tools
* **Traits**: Special characteristics or optimizations (e.g., fastest, most\_intelligent, most\_uncensored)
## Usage Notes
* Input pricing refers to tokens sent to the model
* Output pricing refers to tokens generated by the model
* Context limits define the maximum number of tokens the model can process in a single request
* (D) Scheduled for deprecation. For timelines and migration guidance, see the [Deprecation Tracker](/overview/deprecations#model-deprecation-tracker).
# API Pricing
Source: https://docs.venice.ai/overview/pricing
### Pro Users
Pro subscribers automatically receive a one-time \$10 API credit upon upgrading to Pro – double the credit amount compared to competitors. This credit provides capacity for testing and small applications, with seamless pathways to scale via VVV staking or direct USD payments for larger implementations.
### Paid Tier
Paid access to the Venice API can be obtained in two ways:
Users can purchase API credits via the [API Dashboard](https://venice.ai/settings/api).
Users can [stake VVV](https://venice.ai/blog/how-to-stake-and-claim-your-venice-tokens-vvv), which in return provides proportional access to Venice's compute pool in units called Diem. A Diem is worth \$1 of API credit per day. The more you stake, the higher your Diem allocation, and it renews daily. You also earn staking rewards while staked. Visit the [Token Dashboard](https://venice.ai/token) to stake VVV and to see how much Diem you control.
## Model Pricing
### Chat Models
Chat models are priced per million tokens, with separate pricing for input and output tokens. While the price is per million tokens, you will only be charged for the tokens you use.
You can estimate the token count of a chat request using [this calculator](https://quizgecko.com/tools/token-counter).
| Model | Input Tokens (Diem per M.) | Input Tokens (USD per M.) | Output Tokens (Diem per M.) | Output Tokens (USD per M.) |
| --------------------------------------------------------------------------------------------------------- | :-------: | :----: | :-------: | :-----: |
| Venice Small (Qwen 3 4B)<br />Llama 3.2 3B<br />BGE 3 Embeddings | 0.15 Diem | \$0.15 | 0.6 Diem | \$0.60 |
| Venice Medium (Mistral Small 3.1 24B)<br />Venice Uncensored<br />Qwen 2.5 Coder 32B<br />Qwen 2.5 QWQ 32B | 0.5 Diem | \$0.50 | 2.0 Diem | \$2.00 |
| Llama 3.3 70B<br />Dolphin 72B<br />Qwen 2.5 VL 72B | 0.7 Diem | \$0.70 | 2.8 Diem | \$2.80 |
| Venice Large (Qwen 3 235B)<br />Llama 3.1 405B | 1.5 Diem | \$1.50 | 6.0 Diem | \$6.00 |
| DeepSeek R1 671B | 3.5 Diem | \$3.50 | 14.0 Diem | \$14.00 |
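Since billing is per token at these rates, the cost of a single request can be estimated directly. A small Python sketch using a few of the USD rates from the table (per 1M tokens):

```python
# USD rates per 1M tokens (input, output), taken from the pricing table above.
RATES_USD = {
    "qwen3-4b": (0.15, 0.60),
    "llama-3.3-70b": (0.70, 2.80),
    "qwen3-235b": (1.50, 6.00),
    "deepseek-r1-671b": (3.50, 14.00),
}

def chat_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one chat request; only used tokens are billed."""
    rate_in, rate_out = RATES_USD[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# e.g. a 1,200-token prompt with an 800-token reply on llama-3.3-70b:
print(round(chat_cost_usd("llama-3.3-70b", 1200, 800), 6))  # 0.00308
```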
### Image Models
Venice Image models are currently priced at the following rates:
| Model | Diem Pricing | USD Pricing |
| ---------------------- | :----------: | :---------: |
| Generation | 0.01 Diem | \$0.01 USD |
| Upscale / Enhance (2x) | 0.02 Diem | \$0.02 USD |
| Upscale / Enhance (4x) | 0.08 Diem | \$0.08 USD |
| Edit (aka Inpaint) | 0.04 Diem | \$0.04 USD |
### Audio Models
All Venice Audio models are currently priced at the following rates:
| Model | Input Characters (Diem per M.) | Input Characters (USD per M.) |
| ----- | :----------------------------: | :---------------------------: |
| All | 3.5 Diem | \$3.50 USD |
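At this rate, the cost of a text-to-speech job is proportional to the character count of the input. A quick Python sketch:

```python
# $3.50 (or 3.5 Diem) per 1M input characters, per the table above.
RATE_USD_PER_M_CHARS = 3.50

def tts_cost_usd(text: str) -> float:
    """Estimate the USD cost of synthesizing the given text."""
    return len(text) / 1_000_000 * RATE_USD_PER_M_CHARS

# e.g. an 18,000-character script:
print(round(tts_cost_usd("x" * 18_000), 4))  # 0.063
```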
# Privacy
Source: https://docs.venice.ai/overview/privacy
Nearly all AI apps and services collect user data (personal information, prompt text, and AI text and image responses) in central servers, which they can access, and which they can (and do) share with third parties, ranging from ad networks to governments. Even if a company wants to keep this data safe, data breaches happen [all the time](https://www.wired.com/story/wired-guide-to-data-breaches/), often unreported.
> The only way to achieve reasonable user privacy is to avoid collecting this information in the first place. This is harder to do from an engineering perspective, but we believe it’s the correct approach.
### Privacy as a principle
One of Venice’s guiding principles is user privacy. The platform's architecture flows from this philosophical principle, and every component is designed with this objective in mind.
#### Architecture
The Venice API replicates the same technical architecture as the Venice platform from a backend perspective.
**Venice does not store or log any prompt or model responses on our servers.** API calls are forwarded directly to GPUs running across a collection of decentralized providers over encrypted HTTPS paths.