# Introduction

> Reference documentation for the Venice API

The Venice API offers HTTP-based REST and streaming interfaces for building AI applications with uncensored models and private inference. It covers text generation, image creation, embeddings, and more, all without restrictive content policies. Integration examples and SDKs are available in the [documentation](/overview/getting-started). The API reference is also available as an [OpenAPI YAML spec](https://api.venice.ai/doc/api/swagger.yaml).

## Authentication

The Venice API uses API keys for authentication. Create and manage your API keys in your [API settings](https://venice.ai/settings/api).

All API requests require HTTP Bearer authentication:

```
Authorization: Bearer VENICE_API_KEY
```

<Note>
  Your API key is a secret. Do not share it or expose it in any client-side code.
</Note>
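
For clients that do not use an OpenAI SDK, the header above can be assembled by hand. The following is a minimal sketch, assuming the key is stored in a `VENICE_API_KEY` environment variable; the helper name `venice_auth_headers` is hypothetical and not part of any Venice library.

```python
import os


def venice_auth_headers(api_key=None):
    """Build HTTP headers for a Venice API request.

    Hypothetical helper: falls back to the VENICE_API_KEY environment
    variable when no key is passed explicitly.
    """
    key = api_key or os.environ.get("VENICE_API_KEY")
    if not key:
        raise RuntimeError("Set the VENICE_API_KEY environment variable")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control and out of any code shipped to a browser.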

## OpenAI Compatibility

Venice's API implements the OpenAI API specification, ensuring compatibility with existing OpenAI clients and tools. This allows you to integrate with Venice using the familiar OpenAI interface while accessing Venice's unique features and uncensored models.

### Setup

Configure your client to use Venice's base URL (`https://api.venice.ai/api/v1`) and make your first request:

<CodeGroup>
  ```bash curl theme={"system"}
  curl https://api.venice.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "venice-uncensored",
      "messages": [{"role": "user", "content": "Hello!"}]
    }'
  ```

  ```javascript JavaScript theme={"system"}
  import OpenAI from "openai";

  const client = new OpenAI({
    apiKey: process.env.VENICE_API_KEY,
    baseURL: "https://api.venice.ai/api/v1",
  });

  const response = await client.chat.completions.create({
    model: "venice-uncensored",
    messages: [{ role: "user", content: "Hello!" }]
  });

  console.log(response.choices[0].message.content);
  ```

  ```python Python theme={"system"}
  import os
  from openai import OpenAI

  client = OpenAI(
      api_key=os.environ.get("VENICE_API_KEY"),
      base_url="https://api.venice.ai/api/v1"
  )

  response = client.chat.completions.create(
      model="venice-uncensored",
      messages=[{"role": "user", "content": "Hello!"}]
  )

  print(response.choices[0].message.content)
  ```
</CodeGroup>

## Venice-Specific Features

### System Prompts

Venice provides default system prompts designed to ensure uncensored and natural model responses. You have two options for handling system prompts:

1. **Default Behavior**: Your system prompts are appended to Venice's defaults
2. **Custom Behavior**: Disable Venice's system prompts entirely

#### Disabling Venice System Prompts

Use the `venice_parameters` option to remove Venice's default system prompts:

<CodeGroup>
  ```bash curl theme={"system"}
  curl https://api.venice.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "venice-uncensored",
      "messages": [
        {"role": "system", "content": "Your custom system prompt"},
        {"role": "user", "content": "Why is the sky blue?"}
      ],
      "venice_parameters": {
        "include_venice_system_prompt": false
      }
    }'
  ```

  ```javascript JavaScript theme={"system"}
  const completion = await client.chat.completions.create({
    model: "venice-uncensored",
    messages: [
      {
        role: "system",
        content: "Your custom system prompt",
      },
      {
        role: "user",
        content: "Why is the sky blue?",
      },
    ],
    venice_parameters: {
      include_venice_system_prompt: false,
    },
  });
  ```

  ```python Python theme={"system"}
  response = client.chat.completions.create(
      model="venice-uncensored",
      messages=[
          {"role": "system", "content": "Your custom system prompt"},
          {"role": "user", "content": "Why is the sky blue?"}
      ],
      extra_body={
          "venice_parameters": {
              "include_venice_system_prompt": False
          }
      }
  )
  ```
</CodeGroup>

### Venice Parameters

The `venice_parameters` object allows you to access Venice-specific features not available in the standard OpenAI API:

| Parameter                            | Type    | Description                                                                                                                                                                                                                                                                                                                                  | Default |
| ------------------------------------ | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
| `character_slug`                     | string  | The character slug of a public Venice character (discoverable as "Public ID" on the published character page)                                                                                                                                                                                                                                | -       |
| `strip_thinking_response`            | boolean | Strip `<think></think>` blocks from the response (models using legacy `<think>` tag format). See [Reasoning Models](/guides/features/reasoning-models).                                                                                                                                                                                      | `false` |
| `disable_thinking`                   | boolean | On supported reasoning models, disable thinking and strip the `<think></think>` blocks from the response                                                                                                                                                                                                                                     | `false` |
| `enable_web_search`                  | string  | Enable web search for this request: `off`, `on`, or `auto` (the model decides when to search).<br />Additional usage-based pricing applies, see [pricing](/overview/pricing#web-search-and-scraping).                                                                                                                                        | `off`   |
| `enable_web_scraping`                | boolean | Enable web scraping of up to 5 URLs detected in the user message. Scraped content augments responses and bypasses web search. Only successfully scraped URLs are billed.<br />Additional usage-based pricing applies, see [pricing](/overview/pricing#web-search-and-scraping).                                                              | `false` |
| `enable_x_search`                    | boolean | Enable xAI's native search (web + X/Twitter) for supported Grok models (e.g., `grok-4-20-beta`). Provides higher quality search results by using xAI's search infrastructure. When enabled, Venice's standard web search is bypassed.<br />Additional usage-based pricing applies, see [pricing](/overview/pricing#web-search-and-scraping). | `false` |
| `enable_web_citations`               | boolean | When web search is enabled, request that the LLM cite its sources using `[REF]0[/REF]` format                                                                                                                                                                                                                                                | `false` |
| `include_search_results_in_stream`   | boolean | Experimental: Include search results in the stream as the first emitted chunk                                                                                                                                                                                                                                                                | `false` |
| `return_search_results_as_documents` | boolean | Surface search results in an OpenAI-compatible tool call named `venice_web_search_documents` for LangChain integration                                                                                                                                                                                                                       | `false` |
| `include_venice_system_prompt`       | boolean | Whether to include Venice's default system prompts alongside specified system prompts                                                                                                                                                                                                                                                        | `true`  |

<Note>
  These parameters can also be specified as model suffixes appended to the model name (e.g., `zai-org-glm-5:enable_web_search=auto`). See [Model Feature Suffixes](/api-reference/endpoint/chat/model_feature_suffix) for details.
</Note>
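
As an illustration of the two styles, the sketch below builds a request body carrying `venice_parameters` and, separately, a model name with a feature suffix. Both helper names are hypothetical. The suffix helper reproduces only the single `name=value` form shown in the note above; how boolean values or multiple suffixes are encoded is not covered here.

```python
def build_chat_request(model, user_message, **venice_parameters):
    """Return a chat completions body, adding venice_parameters only if given."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    if venice_parameters:
        body["venice_parameters"] = venice_parameters
    return body


def with_model_suffix(model, parameter, value):
    """Append one feature suffix in the model:parameter=value form."""
    return f"{model}:{parameter}={value}"


# Web search at the model's discretion, with citations requested.
body = build_chat_request(
    "venice-uncensored",
    "What changed in the latest Python release?",
    enable_web_search="auto",
    enable_web_citations=True,
)

# With the OpenAI Python SDK, the same object is passed through extra_body:
# client.chat.completions.create(
#     model=body["model"],
#     messages=body["messages"],
#     extra_body={"venice_parameters": body["venice_parameters"]},
# )
```

The suffix form can be convenient when a tool only lets you set the model name and offers no way to add extra body fields.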

### Prompt Caching

Venice supports prompt caching on select models to reduce latency and costs for repeated content. For supported models, Venice automatically caches system prompts—no code changes required. You can also manually mark content for caching using the `cache_control` property on message content.

| Parameter          | Type   | Description                                                                                                                                                                                          |
| ------------------ | ------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `prompt_cache_key` | string | Optional routing hint to improve cache hit rates. When supplied, Venice routes requests to the same backend infrastructure, increasing the likelihood of cache hits across multi-turn conversations. |

See [Prompt Caching](/guides/features/prompt-caching) for details on how caching works, billing, and best practices.
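
One way to use `prompt_cache_key` is to derive a stable key per conversation so that every turn carries the same routing hint. The sketch below makes two assumptions that the table above does not confirm: that the key is sent as a top-level field of the request body, and that any opaque string is acceptable. The helper names and the `conv-` prefix are illustrative only.

```python
import hashlib


def cache_key_for(conversation_id):
    """Derive a stable, opaque key from a conversation identifier."""
    digest = hashlib.sha256(conversation_id.encode("utf-8")).hexdigest()
    return f"conv-{digest[:16]}"


def chat_body(conversation_id, messages, model="venice-uncensored"):
    """Build a request body whose cache key is the same for every turn."""
    return {
        "model": model,
        "messages": messages,
        # Assumption: prompt_cache_key is a top-level request field.
        "prompt_cache_key": cache_key_for(conversation_id),
    }
```

Hashing the identifier avoids sending internal IDs verbatim while still giving each conversation a consistent key.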

## Response Headers Reference

All Venice API responses include HTTP headers that provide metadata about the request, rate limits, model information, and account balance. In addition to error codes returned from API responses, you can inspect these headers to get the unique ID of a particular API request, monitor rate limiting, and track your account balance.

Venice recommends logging request IDs (`CF-RAY` header) in production deployments for more efficient troubleshooting with our support team, should the need arise.

The table below provides a comprehensive reference of all headers you may encounter:

| Header                                      | Type   | Purpose                                                                               | When Returned                                   |
| ------------------------------------------- | ------ | ------------------------------------------------------------------------------------- | ----------------------------------------------- |
| **Standard HTTP Headers**                   |        |                                                                                       |                                                 |
| `Content-Type`                              | string | MIME type of the response body (`application/json`, `text/csv`, `image/png`, etc.)    | Always                                          |
| `Content-Encoding`                          | string | Encoding used to compress the response body (`gzip`, `br`)                            | When client sends `Accept-Encoding` header      |
| `Content-Disposition`                       | string | How content should be displayed (e.g., `attachment; filename=export.csv`)             | When downloading files or exports               |
| `Date`                                      | string | RFC 7231 formatted timestamp when the response was generated                          | Always                                          |
| **Request Identification**                  |        |                                                                                       |                                                 |
| `CF-RAY`                                    | string | Unique identifier for this API request, used for troubleshooting and support requests | Always                                          |
| `x-venice-version`                          | string | Current version/revision of the Venice API service (e.g., `20250828.222653`)          | Always                                          |
| `x-venice-timestamp`                        | string | Server timestamp when the request was processed (ISO 8601 format)                     | When timestamp tracking is enabled              |
| `x-venice-host-name`                        | string | Hostname of the server that processed the request                                     | Error responses and debugging scenarios         |
| **Model Information**                       |        |                                                                                       |                                                 |
| `x-venice-model-id`                         | string | Unique identifier of the AI model used for the request (e.g., `venice-01-lite`)       | Inference endpoints using AI models             |
| `x-venice-model-name`                       | string | Friendly/display name of the AI model used (e.g., `Venice Lite`)                      | Inference endpoints using AI models             |
| `x-venice-model-router`                     | string | Router/backend service that handled the model inference                               | Inference endpoints when routing info available |
| `x-venice-model-deprecation-warning`        | string | Warning message for models scheduled for deprecation                                  | When using a deprecated model                   |
| `x-venice-model-deprecation-date`           | string | Date when the model will be deprecated (ISO 8601 date)                                | When using a deprecated model                   |
| **Rate Limiting Information**               |        |                                                                                       |                                                 |
| `x-ratelimit-limit-requests`                | number | Maximum number of requests allowed in the current time window                         | All authenticated requests                      |
| `x-ratelimit-remaining-requests`            | number | Number of requests remaining in the current time window                               | All authenticated requests                      |
| `x-ratelimit-reset-requests`                | number | Unix timestamp when the request rate limit resets                                     | All authenticated requests                      |
| `x-ratelimit-limit-tokens`                  | number | Maximum number of tokens (prompt + completion) allowed in the time window             | All authenticated requests                      |
| `x-ratelimit-remaining-tokens`              | number | Number of tokens remaining in the current time window                                 | All authenticated requests                      |
| `x-ratelimit-reset-tokens`                  | number | Duration in seconds until the token rate limit resets                                 | All authenticated requests                      |
| `x-ratelimit-type`                          | string | Type of rate limit applied (`user`, `api_key`, `global`)                              | When rate limiting is enforced                  |
| **Pagination Headers**                      |        |                                                                                       |                                                 |
| `x-pagination-limit`                        | number | Number of items per page                                                              | Paginated endpoints                             |
| `x-pagination-page`                         | number | Current page number (1-based)                                                         | Paginated endpoints                             |
| `x-pagination-total`                        | number | Total number of items across all pages                                                | Paginated endpoints                             |
| `x-pagination-total-pages`                  | number | Total number of pages                                                                 | Paginated endpoints                             |
| **Account Balance Information**             |        |                                                                                       |                                                 |
| `x-venice-balance-diem`                     | string | Your DIEM token balance before the request was processed                              | All authenticated requests                      |
| `x-venice-balance-usd`                      | string | Your USD credit balance before the request was processed                              | All authenticated requests                      |
| **Content Safety Headers**                  |        |                                                                                       |                                                 |
| `x-venice-is-blurred`                       | string | Indicates if generated image was blurred due to content policies (`true`/`false`)     | Image generation with Safe Venice enabled       |
| `x-venice-is-content-violation`             | string | Indicates if content violates Venice's content policies (`true`/`false`)              | Content generation endpoints                    |
| `x-venice-is-adult-model-content-violation` | string | Indicates if content violates adult model content policies (`true`/`false`)           | Image generation endpoints                      |
| `x-venice-contains-minor`                   | string | Indicates if image contains minors (`true`/`false`)                                   | Image analysis endpoints with age detection     |
| **Client Information**                      |        |                                                                                       |                                                 |
| `x-venice-middleface-version`               | string | Version of the Venice middleface client                                               | Requests from Venice middleface clients         |
| `x-venice-mobile-version`                   | string | Version of the Venice mobile app client                                               | Requests from mobile applications               |
| `x-venice-request-timestamp-ms`             | number | Client-provided request timestamp in milliseconds                                     | When client provides timestamp in request       |
| `x-venice-control-instance`                 | string | Control instance identifier for debugging                                             | Image generation endpoints for debugging        |
| **Authentication Headers**                  |        |                                                                                       |                                                 |
| `x-auth-refreshed`                          | string | Indicates authentication token was refreshed during request (`true`/`false`)          | When authentication tokens are auto-refreshed   |
| `x-retry-count`                             | number | Number of retry attempts for the request                                              | When request retries occur                      |

### Important Notes

* **Header Name Case**: HTTP header names are case-insensitive; Venice-specific headers are written in lowercase with hyphens for consistency
* **String Values**: Boolean values in headers are returned as strings (`"true"` or `"false"`)
* **Numeric Values**: Large numbers and balance values may be returned as strings to prevent precision loss
* **Optional Headers**: Not all headers are returned in every response; presence depends on the endpoint and request context
* **Compression**: Use `Accept-Encoding: gzip, br` in requests to receive compressed responses where supported
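
The notes above suggest normalizing header values before using them. A minimal sketch, assuming the headers have been copied into a plain `dict` of strings; the helper names are hypothetical:

```python
from decimal import Decimal, InvalidOperation


def get_header(headers, name):
    """Case-insensitive lookup; returns None when the header is absent."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def header_bool(headers, name):
    """Convert a "true"/"false" header string to a bool, or None if absent."""
    value = get_header(headers, name)
    return None if value is None else value.strip().lower() == "true"


def header_decimal(headers, name):
    """Parse a numeric header as Decimal to avoid float precision loss."""
    value = get_header(headers, name)
    if value is None:
        return None
    try:
        return Decimal(value)
    except InvalidOperation:
        return None
```

Returning `None` for a missing header keeps "absent" distinct from "false" or zero, which matters because not every header appears on every response.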

### Example: Accessing Response Headers

```javascript theme={"system"}
// The parsed completion object does not carry HTTP headers.
// With the OpenAI SDK, .withResponse() also returns the raw HTTP response.
const { data: completion, response } = await client.chat.completions
  .create({
    model: "venice-uncensored",
    messages: [{ role: "user", content: "Hello!" }],
  })
  .withResponse();

// Access headers from the raw response object
const requestId = response.headers.get('CF-RAY');
const remainingRequests = response.headers.get('x-ratelimit-remaining-requests');
const remainingTokens = response.headers.get('x-ratelimit-remaining-tokens');
const usdBalance = response.headers.get('x-venice-balance-usd');

// Check for model deprecation warnings
const deprecationWarning = response.headers.get('x-venice-model-deprecation-warning');
if (deprecationWarning) {
  console.warn(`Model Deprecation: ${deprecationWarning}`);
}
```

## Best Practices

1. **Rate Limiting**: Monitor `x-ratelimit-remaining-requests` and `x-ratelimit-remaining-tokens` headers and implement exponential backoff
2. **Balance Monitoring**: Track `x-venice-balance-usd` and `x-venice-balance-diem` headers to avoid service interruptions
3. **System Prompts**: Test with and without Venice's system prompts to find the best fit for your use case
4. **API Keys**: Keep your API keys secure and rotate them regularly
5. **Request Logging**: Log `CF-RAY` header values for troubleshooting with support
6. **Model Deprecation**: Check for `x-venice-model-deprecation-warning` headers when using models
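
The rate limiting advice in item 1 can be sketched as follows. This is an illustrative outline rather than a prescribed policy: the thresholds, base delay, cap, and function names are all assumptions, and the headers are expected as a mapping with lowercase names.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, jitter=True):
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`.

    With jitter enabled, a random delay in [0, computed delay] is returned
    so that many clients do not retry at the same moment.
    """
    delay = min(cap, base * (2 ** attempt))
    return random.uniform(0, delay) if jitter else delay


def remaining(headers, name):
    """Read a numeric rate limit header; None if missing or unparsable."""
    try:
        return float(headers.get(name))
    except (TypeError, ValueError):
        return None


def should_throttle(headers, min_requests=1, min_tokens=1):
    """True when either remaining-quota header is below its threshold."""
    requests_left = remaining(headers, "x-ratelimit-remaining-requests")
    tokens_left = remaining(headers, "x-ratelimit-remaining-tokens")
    return (requests_left is not None and requests_left < min_requests) or (
        tokens_left is not None and tokens_left < min_tokens
    )
```

A caller would typically sleep for `backoff_delay(attempt)` seconds before each retry, and check `should_throttle` after each successful response to slow down before the limit is hit.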

## Differences from OpenAI's API

While Venice maintains high compatibility with the OpenAI API specification, there are some key differences:

1. **venice\_parameters**: Additional configurations like `enable_web_search`, `character_slug`, and `strip_thinking_response` for extended functionality
2. **System Prompts**: Venice appends your system prompts to defaults that optimize for uncensored responses (disable with `include_venice_system_prompt: false`)
3. **Model Ecosystem**: Venice offers its own [model lineup](/overview/models), including uncensored and reasoning models; use Venice model IDs rather than OpenAI mappings
4. **Response Headers**: Unique headers for balance tracking (`x-venice-balance-usd`, `x-venice-balance-diem`), model deprecation warnings, and content safety flags
5. **Content Policies**: More permissive policies with dedicated uncensored models and optional content filtering

## API Stability

Venice maintains backward compatibility for v1 endpoints and parameters. For model lifecycle policy, deprecation notices, and migration guidance, see [Deprecations](/overview/deprecations).

## OpenAPI Specification & Raw Data

For programmatic access to Venice API docs and data — including use with RAG (Retrieval-Augmented Generation) — the following resources are available:

* [OpenAPI Spec (YAML)](https://api.venice.ai/doc/api/swagger.yaml) — the full API specification in YAML format
* [API Docs Source](https://github.com/veniceai/api-docs/archive/refs/heads/main.zip) — all documentation pages (`.mdx` format) as a downloadable archive

***

<sub>Request fields not listed in this documentation may be passed through but are not validated or guaranteed to work.</sub>