Authentication
The Venice API uses API keys for authentication. Create and manage your API keys in your API settings. All API requests require HTTP Bearer authentication:Your API key is a secret. Do not share it or expose it in any client-side code.
OpenAI Compatibility
Venice’s API implements the OpenAI API specification, ensuring compatibility with existing OpenAI clients and tools. This allows you to integrate with Venice using the familiar OpenAI interface while accessing Venice’s unique features and uncensored models.Setup
Configure your client to use Venice’s base URL (https://api.venice.ai/api/v1
) and make your first request:
Venice-Specific Features
System Prompts
Venice provides default system prompts designed to ensure uncensored and natural model responses. You have two options for handling system prompts:- Default Behavior: Your system prompts are appended to Venice’s defaults
- Custom Behavior: Disable Venice’s system prompts entirely
Disabling Venice System Prompts
Use thevenice_parameters
option to remove Venice’s default system prompts:
Venice Parameters
Thevenice_parameters
object allows you to access Venice-specific features not available in the standard OpenAI API:
Parameter | Type | Description | Default |
---|---|---|---|
character_slug | string | The character slug of a public Venice character (discoverable as “Public ID” on the published character page) | - |
strip_thinking_response | boolean | Strip <think></think> blocks from the response (applicable to reasoning/thinking models) | false |
disable_thinking | boolean | On supported reasoning models, disable thinking and strip the <think></think> blocks from the response | false |
enable_web_search | string | Enable web search for this request (off , on , auto - auto enables based on model’s discretion) | off |
enable_web_citations | boolean | When web search is enabled, request that the LLM cite its sources using [REF]0[/REF] format | false |
include_search_results_in_stream | boolean | Experimental: Include search results in the stream as the first emitted chunk | false |
return_search_results_as_documents | boolean | Surface search results in an OpenAI-compatible tool call named venice_web_search_documents for LangChain integration | false |
include_venice_system_prompt | boolean | Whether to include Venice’s default system prompts alongside specified system prompts | true |
These parameters can also be specified as model suffixes appended to the model name (e.g.,
qwen3-235b:enable_web_search=auto
). See Model Feature Suffixes for details.Response Headers Reference
All Venice API responses include HTTP headers that provide metadata about the request, rate limits, model information, and account balance. In addition to error codes returned from API responses, you can inspect these headers to get the unique ID of a particular API request, monitor rate limiting, and track your account balance. Venice recommends logging request IDs (CF-RAY
header) in production deployments for more efficient troubleshooting with our support team, should the need arise.
The table below provides a comprehensive reference of all headers you may encounter:
Header | Type | Purpose | When Returned |
---|---|---|---|
Standard HTTP Headers | |||
Content-Type | string | MIME type of the response body (application/json , text/csv , image/png , etc.) | Always |
Content-Encoding | string | Encoding used to compress the response body (gzip , br ) | When client sends Accept-Encoding header |
Content-Disposition | string | How content should be displayed (e.g., attachment; filename=export.csv ) | When downloading files or exports |
Date | string | RFC 7231 formatted timestamp when the response was generated | Always |
Request Identification | |||
CF-RAY | string | Unique identifier for this API request, used for troubleshooting and support requests | Always |
x-venice-version | string | Current version/revision of the Venice API service (e.g., 20250828.222653 ) | Always |
x-venice-timestamp | string | Server timestamp when the request was processed (ISO 8601 format) | When timestamp tracking is enabled |
x-venice-host-name | string | Hostname of the server that processed the request | Error responses and debugging scenarios |
Model Information | |||
x-venice-model-id | string | Unique identifier of the AI model used for the request (e.g., venice-01-lite ) | Inference endpoints using AI models |
x-venice-model-name | string | Friendly/display name of the AI model used (e.g., Venice Lite ) | Inference endpoints using AI models |
x-venice-model-router | string | Router/backend service that handled the model inference | Inference endpoints when routing info available |
x-venice-model-deprecation-warning | string | Warning message for models scheduled for deprecation | When using a deprecated model |
x-venice-model-deprecation-date | string | Date when the model will be deprecated (ISO 8601 date) | When using a deprecated model |
Rate Limiting Information | |||
x-ratelimit-limit-requests | number | Maximum number of requests allowed in the current time window | All authenticated requests |
x-ratelimit-remaining-requests | number | Number of requests remaining in the current time window | All authenticated requests |
x-ratelimit-reset-requests | number | Unix timestamp when the request rate limit resets | All authenticated requests |
x-ratelimit-limit-tokens | number | Maximum number of tokens (prompt + completion) allowed in the time window | All authenticated requests |
x-ratelimit-remaining-tokens | number | Number of tokens remaining in the current time window | All authenticated requests |
x-ratelimit-reset-tokens | number | Duration in seconds until the token rate limit resets | All authenticated requests |
x-ratelimit-type | string | Type of rate limit applied (user , api_key , global ) | When rate limiting is enforced |
Pagination Headers | |||
x-pagination-limit | number | Number of items per page | Paginated endpoints |
x-pagination-page | number | Current page number (1-based) | Paginated endpoints |
x-pagination-total | number | Total number of items across all pages | Paginated endpoints |
x-pagination-total-pages | number | Total number of pages | Paginated endpoints |
Account Balance Information | |||
x-venice-balance-diem | string | Your DIEM token balance before the request was processed | All authenticated requests |
x-venice-balance-usd | string | Your USD credit balance before the request was processed | All authenticated requests |
x-venice-balance-vcu | string | Your Venice Compute Unit (VCU) balance before the request was processed | All authenticated requests |
Content Safety Headers | |||
x-venice-is-blurred | string | Indicates if generated image was blurred due to content policies (true /false ) | Image generation with Safe Venice enabled |
x-venice-is-content-violation | string | Indicates if content violates Venice’s content policies (true /false ) | Content generation endpoints |
x-venice-is-adult-model-content-violation | string | Indicates if content violates adult model content policies (true /false ) | Image generation endpoints |
x-venice-contains-minor | string | Indicates if image contains minors (true /false ) | Image analysis endpoints with age detection |
Client Information | |||
x-venice-middleface-version | string | Version of the Venice middleface client | Requests from Venice middleface clients |
x-venice-mobile-version | string | Version of the Venice mobile app client | Requests from mobile applications |
x-venice-request-timestamp-ms | number | Client-provided request timestamp in milliseconds | When client provides timestamp in request |
x-venice-control-instance | string | Control instance identifier for debugging | Image generation endpoints for debugging |
Authentication Headers | |||
x-auth-refreshed | string | Indicates authentication token was refreshed during request (true /false ) | When authentication tokens are auto-refreshed |
x-retry-count | number | Number of retry attempts for the request | When request retries occur |
Important Notes
- Header Name Case: HTTP headers are case-insensitive, but Venice uses lowercase with hyphens for consistency
- String Values: Boolean values in headers are returned as strings (
"true"
or"false"
) - Numeric Values: Large numbers and balance values may be returned as strings to prevent precision loss
- Optional Headers: Not all headers are returned in every response; presence depends on the endpoint and request context
- Compression: Use
Accept-Encoding: gzip, br
in requests to receive compressed responses where supported
Example: Accessing Response Headers
Best Practices
- Rate Limiting: Monitor
x-ratelimit-remaining-requests
andx-ratelimit-remaining-tokens
headers and implement exponential backoff - Balance Monitoring: Track
x-venice-balance-usd
andx-venice-balance-diem
headers to avoid service interruptions - System Prompts: Test with and without Venice’s system prompts to find the best fit for your use case
- API Keys: Keep your API keys secure and rotate them regularly
- Request Logging: Log
CF-RAY
header values for troubleshooting with support - Model Deprecation: Check for
x-venice-model-deprecation-warning
headers when using models
Differences from OpenAI’s API
While Venice maintains high compatibility with the OpenAI API specification, there are some key differences:- venice_parameters: Additional configurations like
enable_web_search
,character_slug
, andstrip_thinking_response
for extended functionality - System Prompts: Venice appends your system prompts to defaults that optimize for uncensored responses (disable with
include_venice_system_prompt: false
) - Model Ecosystem: Venice offers its own model lineup including uncensored and reasoning models - use Venice model IDs rather than OpenAI mappings
- Response Headers: Unique headers for balance tracking (
x-venice-balance-usd
,x-venice-balance-diem
), model deprecation warnings, and content safety flags - Content Policies: More permissive policies with dedicated uncensored models and optional content filtering