The Venice API offers HTTP-based REST and streaming interfaces for building AI applications with uncensored models and private inference. You can build with text generation, image creation, embeddings, and more, all without restrictive content policies. Integration examples and SDKs are available in the documentation.

Authentication

The Venice API uses API keys for authentication. Create and manage your API keys in your API settings. All API requests require HTTP Bearer authentication:
Authorization: Bearer VENICE_API_KEY
Your API key is a secret. Do not share it or expose it in any client-side code.

OpenAI Compatibility

Venice’s API implements the OpenAI API specification, ensuring compatibility with existing OpenAI clients and tools. This allows you to integrate with Venice using the familiar OpenAI interface while accessing Venice’s unique features and uncensored models.

Setup

Configure your client to use Venice’s base URL (https://api.venice.ai/api/v1) and make your first request:
curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "venice-uncensored",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
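
The same request can be sketched in JavaScript with the built-in fetch API (Node.js 18 or later is assumed). The helper names `buildChatRequest` and `sendChat` are illustrative and not part of any Venice SDK. The key is read from a `VENICE_API_KEY` environment variable on the server, so it never appears in client-side code.

```javascript
// Minimal sketch: build and send a chat completion request to the Venice API.
// buildChatRequest and sendChat are hypothetical helper names, not Venice SDK functions.
const VENICE_BASE_URL = 'https://api.venice.ai/api/v1';

// Pure function: returns the URL and fetch options without touching the network.
function buildChatRequest(apiKey, model, messages) {
  if (!apiKey) {
    throw new Error('Missing API key: set VENICE_API_KEY on the server');
  }
  return {
    url: `${VENICE_BASE_URL}/chat/completions`,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Sends the request. Requires network access and a valid key.
async function sendChat(model, messages) {
  const { url, options } = buildChatRequest(process.env.VENICE_API_KEY, model, messages);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Venice API request failed with status ${response.status}`);
  }
  return response.json();
}
```

Calling `sendChat('venice-uncensored', [{ role: 'user', content: 'Hello!' }])` should resolve to the parsed JSON body; assuming the OpenAI-style response shape, the reply text would be found under `choices[0].message.content`.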

Venice-Specific Features

System Prompts

Venice provides default system prompts designed to ensure uncensored and natural model responses. You have two options for handling system prompts:
  1. Default Behavior: Your system prompts are appended to Venice’s defaults
  2. Custom Behavior: Disable Venice’s system prompts entirely

Disabling Venice System Prompts

Use the venice_parameters option to remove Venice’s default system prompts:
curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "venice-uncensored",
    "messages": [
      {"role": "system", "content": "Your custom system prompt"},
      {"role": "user", "content": "Why is the sky blue?"}
    ],
    "venice_parameters": {
      "include_venice_system_prompt": false
    }
  }'

Venice Parameters

The venice_parameters object allows you to access Venice-specific features not available in the standard OpenAI API:
| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| character_slug | string | The character slug of a public Venice character (discoverable as “Public ID” on the published character page) | - |
| strip_thinking_response | boolean | Strip <think></think> blocks from the response (applicable to reasoning/thinking models) | false |
| disable_thinking | boolean | On supported reasoning models, disable thinking and strip the <think></think> blocks from the response | false |
| enable_web_search | string | Enable web search for this request (off, on, auto - auto enables based on model’s discretion) | off |
| enable_web_citations | boolean | When web search is enabled, request that the LLM cite its sources using [REF]0[/REF] format | false |
| include_search_results_in_stream | boolean | Experimental: Include search results in the stream as the first emitted chunk | false |
| return_search_results_as_documents | boolean | Surface search results in an OpenAI-compatible tool call named venice_web_search_documents for LangChain integration | false |
| include_venice_system_prompt | boolean | Whether to include Venice’s default system prompts alongside specified system prompts | true |
These parameters can also be specified as model suffixes appended to the model name (e.g., qwen3-235b:enable_web_search=auto). See Model Feature Suffixes for details.
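
As a rough illustration, the parameter table above can be turned into a small client-side check that runs before a request is sent. The names `VENICE_PARAMETER_TYPES`, `validateVeniceParameters`, and `withModelSuffix` are hypothetical, and the suffix helper only covers the single-parameter form shown in the example above; how several suffixes are combined is not assumed here.

```javascript
// Minimal sketch: validate venice_parameters against the documented table.
// All names below are hypothetical helpers, not part of the Venice API.
const VENICE_PARAMETER_TYPES = {
  character_slug: 'string',
  strip_thinking_response: 'boolean',
  disable_thinking: 'boolean',
  enable_web_search: 'string',
  enable_web_citations: 'boolean',
  include_search_results_in_stream: 'boolean',
  return_search_results_as_documents: 'boolean',
  include_venice_system_prompt: 'boolean',
};
const WEB_SEARCH_MODES = ['off', 'on', 'auto'];

// Throws on unknown keys, wrong types, or an invalid web search mode.
function validateVeniceParameters(params) {
  for (const [name, value] of Object.entries(params)) {
    if (!Object.prototype.hasOwnProperty.call(VENICE_PARAMETER_TYPES, name)) {
      throw new Error(`Unknown venice_parameters key: ${name}`);
    }
    const expected = VENICE_PARAMETER_TYPES[name];
    if (typeof value !== expected) {
      throw new Error(`${name} must be a ${expected}`);
    }
  }
  if ('enable_web_search' in params && !WEB_SEARCH_MODES.includes(params.enable_web_search)) {
    throw new Error('enable_web_search must be one of: off, on, auto');
  }
  return params;
}

// Builds the single-parameter suffix form, e.g. qwen3-235b:enable_web_search=auto
function withModelSuffix(model, name, value) {
  validateVeniceParameters({ [name]: value });
  return `${model}:${name}=${value}`;
}
```

The validated object can then be placed under the `venice_parameters` key of the request body, or a single setting can be folded into the model name with `withModelSuffix`.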

Response Headers Reference

All Venice API responses include HTTP headers that provide metadata about the request, rate limits, model information, and account balance. In addition to error codes returned from API responses, you can inspect these headers to get the unique ID of a particular API request, monitor rate limiting, and track your account balance. Venice recommends logging request IDs (CF-RAY header) in production deployments for more efficient troubleshooting with our support team, should the need arise. The table below provides a comprehensive reference of all headers you may encounter:
Standard HTTP Headers
| Header | Type | Purpose | When Returned |
| --- | --- | --- | --- |
| Content-Type | string | MIME type of the response body (application/json, text/csv, image/png, etc.) | Always |
| Content-Encoding | string | Encoding used to compress the response body (gzip, br) | When client sends Accept-Encoding header |
| Content-Disposition | string | How content should be displayed (e.g., attachment; filename=export.csv) | When downloading files or exports |
| Date | string | RFC 7231 formatted timestamp when the response was generated | Always |

Request Identification
| Header | Type | Purpose | When Returned |
| --- | --- | --- | --- |
| CF-RAY | string | Unique identifier for this API request, used for troubleshooting and support requests | Always |
| x-venice-version | string | Current version/revision of the Venice API service (e.g., 20250828.222653) | Always |
| x-venice-timestamp | string | Server timestamp when the request was processed (ISO 8601 format) | When timestamp tracking is enabled |
| x-venice-host-name | string | Hostname of the server that processed the request | Error responses and debugging scenarios |

Model Information
| Header | Type | Purpose | When Returned |
| --- | --- | --- | --- |
| x-venice-model-id | string | Unique identifier of the AI model used for the request (e.g., venice-01-lite) | Inference endpoints using AI models |
| x-venice-model-name | string | Friendly/display name of the AI model used (e.g., Venice Lite) | Inference endpoints using AI models |
| x-venice-model-router | string | Router/backend service that handled the model inference | Inference endpoints when routing info available |
| x-venice-model-deprecation-warning | string | Warning message for models scheduled for deprecation | When using a deprecated model |
| x-venice-model-deprecation-date | string | Date when the model will be deprecated (ISO 8601 date) | When using a deprecated model |

Rate Limiting Information
| Header | Type | Purpose | When Returned |
| --- | --- | --- | --- |
| x-ratelimit-limit-requests | number | Maximum number of requests allowed in the current time window | All authenticated requests |
| x-ratelimit-remaining-requests | number | Number of requests remaining in the current time window | All authenticated requests |
| x-ratelimit-reset-requests | number | Unix timestamp when the request rate limit resets | All authenticated requests |
| x-ratelimit-limit-tokens | number | Maximum number of tokens (prompt + completion) allowed in the time window | All authenticated requests |
| x-ratelimit-remaining-tokens | number | Number of tokens remaining in the current time window | All authenticated requests |
| x-ratelimit-reset-tokens | number | Duration in seconds until the token rate limit resets | All authenticated requests |
| x-ratelimit-type | string | Type of rate limit applied (user, api_key, global) | When rate limiting is enforced |

Pagination Headers
| Header | Type | Purpose | When Returned |
| --- | --- | --- | --- |
| x-pagination-limit | number | Number of items per page | Paginated endpoints |
| x-pagination-page | number | Current page number (1-based) | Paginated endpoints |
| x-pagination-total | number | Total number of items across all pages | Paginated endpoints |
| x-pagination-total-pages | number | Total number of pages | Paginated endpoints |

Account Balance Information
| Header | Type | Purpose | When Returned |
| --- | --- | --- | --- |
| x-venice-balance-diem | string | Your DIEM token balance before the request was processed | All authenticated requests |
| x-venice-balance-usd | string | Your USD credit balance before the request was processed | All authenticated requests |
| x-venice-balance-vcu | string | Your Venice Compute Unit (VCU) balance before the request was processed | All authenticated requests |

Content Safety Headers
| Header | Type | Purpose | When Returned |
| --- | --- | --- | --- |
| x-venice-is-blurred | string | Indicates if generated image was blurred due to content policies (true/false) | Image generation with Safe Venice enabled |
| x-venice-is-content-violation | string | Indicates if content violates Venice’s content policies (true/false) | Content generation endpoints |
| x-venice-is-adult-model-content-violation | string | Indicates if content violates adult model content policies (true/false) | Image generation endpoints |
| x-venice-contains-minor | string | Indicates if image contains minors (true/false) | Image analysis endpoints with age detection |

Client Information
| Header | Type | Purpose | When Returned |
| --- | --- | --- | --- |
| x-venice-middleface-version | string | Version of the Venice middleface client | Requests from Venice middleface clients |
| x-venice-mobile-version | string | Version of the Venice mobile app client | Requests from mobile applications |
| x-venice-request-timestamp-ms | number | Client-provided request timestamp in milliseconds | When client provides timestamp in request |
| x-venice-control-instance | string | Control instance identifier for debugging | Image generation endpoints for debugging |

Authentication Headers
| Header | Type | Purpose | When Returned |
| --- | --- | --- | --- |
| x-auth-refreshed | string | Indicates authentication token was refreshed during request (true/false) | When authentication tokens are auto-refreshed |
| x-retry-count | number | Number of retry attempts for the request | When request retries occur |

Important Notes

  • Header Name Case: HTTP header names are case-insensitive; Venice’s own x-venice-* and x-ratelimit-* headers are written in lowercase with hyphens for consistency
  • String Values: Boolean values in headers are returned as strings ("true" or "false")
  • Numeric Values: Large numbers and balance values may be returned as strings to prevent precision loss
  • Optional Headers: Not all headers are returned in every response; presence depends on the endpoint and request context
  • Compression: Use Accept-Encoding: gzip, br in requests to receive compressed responses where supported

Example: Accessing Response Headers

// After making an API request with fetch(), read metadata from the returned
// Response object (`response`); header lookups via get() are case-insensitive
const requestId = response.headers.get('CF-RAY');
const remainingRequests = response.headers.get('x-ratelimit-remaining-requests');
const remainingTokens = response.headers.get('x-ratelimit-remaining-tokens');
const usdBalance = response.headers.get('x-venice-balance-usd');

// Check for model deprecation warnings
const deprecationWarning = response.headers.get('x-venice-model-deprecation-warning');
if (deprecationWarning) {
  console.warn(`Model Deprecation: ${deprecationWarning}`);
}
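
Because boolean and some numeric header values arrive as strings, as the notes above point out, a thin conversion layer can keep the rest of an application typed. The sketch below is one possible approach; `parseBooleanHeader`, `parseNumericHeader`, and `readVeniceMetadata` are hypothetical names. It accepts any object with a `get()` method (such as a Fetch `Headers` instance) and deliberately leaves the balance as a string to avoid precision loss.

```javascript
// Minimal sketch: convert string-valued Venice headers into typed values.
// All function names here are hypothetical helpers.

// Returns true/false for "true"/"false" strings, or undefined if the header is absent.
function parseBooleanHeader(value) {
  if (value === null || value === undefined) return undefined;
  return value.toLowerCase() === 'true';
}

// Returns a number, or undefined if the header is absent or not numeric.
function parseNumericHeader(value) {
  if (value === null || value === undefined || value === '') return undefined;
  const parsed = Number(value);
  return Number.isNaN(parsed) ? undefined : parsed;
}

// headers: any object with a get(name) method, e.g. response.headers from fetch().
function readVeniceMetadata(headers) {
  return {
    requestId: headers.get('CF-RAY') ?? undefined,
    remainingRequests: parseNumericHeader(headers.get('x-ratelimit-remaining-requests')),
    remainingTokens: parseNumericHeader(headers.get('x-ratelimit-remaining-tokens')),
    // Kept as a string on purpose: balances may exceed safe numeric precision.
    usdBalance: headers.get('x-venice-balance-usd') ?? undefined,
    isBlurred: parseBooleanHeader(headers.get('x-venice-is-blurred')),
  };
}
```

Since not every header is present on every response, each field may be `undefined`, and callers should check for that before acting on a value.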

Best Practices

  1. Rate Limiting: Monitor x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens headers and implement exponential backoff
  2. Balance Monitoring: Track x-venice-balance-usd and x-venice-balance-diem headers to avoid service interruptions
  3. System Prompts: Test with and without Venice’s system prompts to find the best fit for your use case
  4. API Keys: Keep your API keys secure and rotate them regularly
  5. Request Logging: Log CF-RAY header values for troubleshooting with support
  6. Model Deprecation: Check responses for the x-venice-model-deprecation-warning header and plan migration before the date given in x-venice-model-deprecation-date
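
The exponential backoff recommended in the first practice above could look roughly like this. It is a sketch under stated assumptions: that rate-limited requests come back with HTTP status 429, and that a 500 ms base delay capped at 30 seconds is acceptable. `backoffDelayMs` and `fetchWithBackoff` are hypothetical names, and the random jitter is a common addition rather than something Venice prescribes.

```javascript
// Minimal sketch: retry rate-limited requests with exponential backoff.
// Assumption: the API signals rate limiting with HTTP status 429.

// Delay doubles with each attempt (0 -> baseMs, 1 -> 2 * baseMs, ...) up to maxMs.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retries up to maxRetries times on 429, then returns the last response as-is.
async function fetchWithBackoff(url, options, maxRetries = 5) {
  for (let attempt = 0; ; attempt += 1) {
    const response = await fetch(url, options);
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    // Random jitter spreads out retries from clients that were limited together.
    const delay = Math.random() * backoffDelayMs(attempt);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

A caller can still inspect x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens on the returned response to slow down before a limit is actually hit.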

Differences from OpenAI’s API

While Venice maintains high compatibility with the OpenAI API specification, there are some key differences:
  1. venice_parameters: Additional configurations like enable_web_search, character_slug, and strip_thinking_response for extended functionality
  2. System Prompts: Venice appends your system prompts to defaults that optimize for uncensored responses (disable with include_venice_system_prompt: false)
  3. Model Ecosystem: Venice offers its own model lineup including uncensored and reasoning models - use Venice model IDs rather than OpenAI mappings
  4. Response Headers: Unique headers for balance tracking (x-venice-balance-usd, x-venice-balance-diem), model deprecation warnings, and content safety flags
  5. Content Policies: More permissive policies with dedicated uncensored models and optional content filtering

API Stability

Venice maintains backward compatibility for v1 endpoints and parameters. For model lifecycle policy, deprecation notices, and migration guidance, see Deprecations.

Swagger Configuration

You can find the complete swagger definition for the Venice API here: https://api.venice.ai/doc/api/swagger.yaml