# Reasoning Models

> Call Venice reasoning models that expose chain-of-thought tokens, control thinking effort, and surface step-by-step answers in chat completions.

Some models think out loud before answering. They work through problems step by step, then give you a final answer. This makes them stronger at math, code, and logic-heavy tasks.

<div id="reasoning-models-placeholder" />

See the full list of models, pricing and context limits on the [Models page](/overview/models). Not all reasoning models support the [`reasoning_effort`](#reasoning-effort) parameter. See [model support](#model-support) for details.

## Reading the output

Reasoning models return their thinking in a separate `reasoning_content` field, keeping `content` clean:

<CodeGroup>
  ```python Python theme={"system"}
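  import os
  from openai import OpenAI

  # Assumes the OpenAI Python SDK pointed at Venice's OpenAI-compatible endpoint
  client = OpenAI(api_key=os.environ["VENICE_API_KEY"], base_url="https://api.venice.ai/api/v1")

  response = client.chat.completions.create(
      model="zai-org-glm-5-1",
      messages=[{"role": "user", "content": "What is 15% of 240?"}]
  )

  thinking = response.choices[0].message.reasoning_content  # step-by-step reasoning
  answer = response.choices[0].message.content  # final answer only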
  ```

  ```javascript Node.js theme={"system"}
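  import OpenAI from "openai";

  // Assumes the OpenAI Node.js SDK pointed at Venice's OpenAI-compatible endpoint
  const client = new OpenAI({
      apiKey: process.env.VENICE_API_KEY,
      baseURL: "https://api.venice.ai/api/v1"
  });

  const response = await client.chat.completions.create({
      model: "zai-org-glm-5-1",
      messages: [{ role: "user", content: "What is 15% of 240?" }]
  });

  const thinking = response.choices[0].message.reasoning_content;
  const answer = response.choices[0].message.content;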
  ```

  ```bash cURL theme={"system"}
  curl https://api.venice.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "zai-org-glm-5-1",
      "messages": [{"role": "user", "content": "What is 15% of 240?"}]
    }'
  ```
</CodeGroup>

<Info>
  Some providers (Anthropic, Google, OpenAI, Qwen) return encrypted or summarized reasoning tokens. When this happens, `reasoning_content` contains a `"[Some reasoning content is encrypted]"` placeholder.
</Info>
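
If your application displays the reasoning, it may be worth detecting the placeholder first. The helper below is a minimal sketch: the function name is hypothetical, and it assumes the placeholder appears verbatim somewhere inside `reasoning_content`, as quoted in the note above.

```python
from typing import Optional

# Placeholder text as quoted in the note above.
ENCRYPTED_PLACEHOLDER = "[Some reasoning content is encrypted]"


def reasoning_is_encrypted(reasoning_content: Optional[str]) -> bool:
    """Return True when the reasoning field carries the encrypted-content placeholder.

    Hypothetical helper: assumes the placeholder appears verbatim in the field.
    """
    return bool(reasoning_content) and ENCRYPTED_PLACEHOLDER in reasoning_content


# Example: decide whether the reasoning is worth showing to a user.
for text in ["[Some reasoning content is encrypted]", "0.15 * 240 = 36", None]:
    label = "hidden" if reasoning_is_encrypted(text) else "readable or absent"
    print(repr(text), "->", label)
```

A missing or empty field is treated as "not encrypted" so callers can handle the empty case separately.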

### Streaming

When streaming, `reasoning_content` arrives in the delta before the final answer:

<CodeGroup>
  ```python Python theme={"system"}
  stream = client.chat.completions.create(
      model="zai-org-glm-5-1",
      messages=[{"role": "user", "content": "Explain photosynthesis"}],
      stream=True
  )

  for chunk in stream:
      if chunk.choices:
          delta = chunk.choices[0].delta
          # getattr guards against chunks whose delta omits reasoning_content
          reasoning = getattr(delta, "reasoning_content", None)
          if reasoning:
              print(reasoning, end="")
          if delta.content:
              print(delta.content, end="")
  ```

  ```javascript Node.js theme={"system"}
  const stream = await client.chat.completions.create({
      model: "zai-org-glm-5-1",
      messages: [{ role: "user", content: "Explain photosynthesis" }],
      stream: true
  });

  for await (const chunk of stream) {
      if (chunk.choices?.[0]?.delta) {
          const delta = chunk.choices[0].delta;
          if (delta.reasoning_content) process.stdout.write(delta.reasoning_content);
          if (delta.content) process.stdout.write(delta.content);
      }
  }
  ```

  ```bash cURL theme={"system"}
  curl https://api.venice.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "zai-org-glm-5-1",
      "messages": [{"role": "user", "content": "Explain photosynthesis"}],
      "stream": true
    }'
  ```
</CodeGroup>
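
If you need the reasoning and the answer as two separate strings rather than printed as they arrive, you can accumulate the deltas. The following is a sketch, not part of any SDK: `collect_stream` is a hypothetical helper, and the demo chunks are stand-in objects that only mimic the attribute shape used in the streaming examples above.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Split streamed chat-completion chunks into (reasoning, answer) strings."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta
        # getattr keeps this safe if a delta omits either field
        reasoning = getattr(delta, "reasoning_content", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)


def _fake_chunk(**delta_fields):
    """Build a stand-in chunk for the demo (not a real API object)."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_chunks = [
    _fake_chunk(reasoning_content="Plants need light. "),
    _fake_chunk(reasoning_content="Start there."),
    SimpleNamespace(choices=[]),
    _fake_chunk(content="Photosynthesis converts "),
    _fake_chunk(content="light into chemical energy."),
]

reasoning, answer = collect_stream(demo_chunks)
print("reasoning:", reasoning)
print("answer:", answer)
```

With a real stream, pass the object returned by `client.chat.completions.create(..., stream=True)` in place of `demo_chunks`.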

## Reasoning effort

The `reasoning_effort` parameter controls how much thinking a model does before responding. Higher effort means deeper reasoning but more tokens and latency.

### Accepted values

| Value     | Description                                |
| --------- | ------------------------------------------ |
| `none`    | Disables reasoning entirely                |
| `minimal` | Basic reasoning with minimal effort        |
| `low`     | Light reasoning for simple problems        |
| `medium`  | Balanced reasoning for moderate complexity |
| `high`    | Deep reasoning for complex problems        |
| `xhigh`   | Extra-high reasoning depth                 |
| `max`     | Maximum reasoning capability               |

<Warning>
  Not all models support all values. Venice does **not** auto-map to the nearest supported level. Unsupported values return a 400 error from the upstream provider. For example, sending `xhigh` to Claude or `max` to GPT-5.2 will fail.

  When in doubt, use `low`, `medium`, or `high`. These are the most widely supported values.
</Warning>

### Model support

#### OpenAI

| Model                        | Supported values                         |
| ---------------------------- | ---------------------------------------- |
| GPT-5.2                      | `none`, `low`, `medium`, `high`, `xhigh` |
| GPT-5.2 Codex, GPT-5.3 Codex | `low`, `medium`, `high`, `xhigh`         |

#### Anthropic

| Model                                   | Supported values               |
| --------------------------------------- | ------------------------------ |
| Claude Opus 4.6, Opus 4.6 Fast          | `low`, `medium`, `high`, `max` |
| Claude Opus 4.5, Sonnet 4.5, Sonnet 4.6 | `low`, `medium`, `high`        |

#### Google

| Model                  | Supported values                   |
| ---------------------- | ---------------------------------- |
| Gemini 3 Pro Preview   | `low`, `high`                      |
| Gemini 3.1 Pro Preview | `low`, `medium`, `high`            |
| Gemini 3 Flash Preview | `minimal`, `low`, `medium`, `high` |

#### xAI

Grok models (Grok 4.1 Fast, Grok Code Fast) do **not** support `reasoning_effort`. Specifying it will result in an error.

#### Other models

| Model                                       | Supported values                          |
| ------------------------------------------- | ----------------------------------------- |
| Qwen 3 235B A22B Thinking, Qwen 3.5 35B A3B | `low`, `medium`, `high`                   |
| Kimi K2.5                                   | `low`, `medium`, `high`                   |
| MiniMax M2.5, M2.1                          | `low`, `medium`, `high`                   |
| GLM 5.1 series                              | Effort not configurable; can be disabled  |
| DeepSeek R1                                 | Built-in reasoning only, not configurable |

### Usage

Set the effort level with the nested `reasoning.effort` format, as in the examples below:

<CodeGroup>
  ```python Python theme={"system"}
  response = client.chat.completions.create(
      model="minimax-m25",
      messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
      extra_body={"reasoning": {"effort": "high"}}
  )
  ```

  ```javascript Node.js theme={"system"}
  const response = await client.chat.completions.create({
      model: "minimax-m25",
      messages: [{ role: "user", content: "Prove that there are infinitely many primes" }],
      reasoning: { effort: "high" }
  });
  ```

  ```bash cURL theme={"system"}
  curl https://api.venice.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "minimax-m25",
      "messages": [{"role": "user", "content": "Prove that there are infinitely many primes"}],
      "reasoning": {"effort": "high"}
    }'
  ```
</CodeGroup>

The flat format `"reasoning_effort": "high"` is also accepted.
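
To make the two shapes concrete, the sketch below builds the JSON request body in either format. `build_body` is a hypothetical helper written for illustration; it only constructs a dictionary and does not call the API.

```python
import json


def build_body(model: str, prompt: str, effort: str, nested: bool = True) -> dict:
    """Build a chat-completions request body with the effort in either format."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if nested:
        body["reasoning"] = {"effort": effort}  # nested format
    else:
        body["reasoning_effort"] = effort  # flat format
    return body


nested_body = build_body("minimax-m25", "Prove that there are infinitely many primes", "high")
flat_body = build_body("minimax-m25", "Prove that there are infinitely many primes", "high", nested=False)

print(json.dumps(nested_body, indent=2))
print(json.dumps(flat_body, indent=2))
```

With the Python client from the examples above, either shape can presumably be sent through `extra_body`, since that argument is merged into the request JSON.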

## Disabling reasoning

There are two ways to disable reasoning:

| Method                     | Syntax                            | How it works                                                                                             |
| -------------------------- | --------------------------------- | -------------------------------------------------------------------------------------------------------- |
| `reasoning.enabled: false` | `"reasoning": {"enabled": false}` | Venice-level toggle that prevents reasoning parameters from being sent to the provider. **Recommended.** |
| `reasoning.effort: "none"` | `"reasoning": {"effort": "none"}` | Passed to the provider, which decides how to handle it. Only supported by some models (e.g. GPT-5.2).    |

For models that support it, `reasoning.enabled: false` is the more reliable option. The table below shows which models allow reasoning to be disabled:

| Model                                        | Can disable?                          |
| -------------------------------------------- | ------------------------------------- |
| GPT-5.2                                      | Yes                                   |
| GPT-5.2 Codex, GPT-5.3 Codex                 | Yes (but `none` effort not supported) |
| Qwen 3 235B A22B Thinking, Qwen 3.5 35B A3B  | Yes                                   |
| GLM 5.1 series                               | Yes                                   |
| Claude Opus 4.5/4.6/4.6 Fast, Sonnet 4.5/4.6 | No (always reasons)                   |
| Gemini 3 Pro, 3.1 Pro, 3 Flash               | No (always reasons)                   |
| DeepSeek R1                                  | No (always reasons)                   |

<CodeGroup>
  ```python Python theme={"system"}
  response = client.chat.completions.create(
      model="openai-gpt-52",
      messages=[{"role": "user", "content": "What's the capital of France?"}],
      extra_body={"reasoning": {"enabled": False}}
  )
  ```

  ```javascript Node.js theme={"system"}
  const response = await client.chat.completions.create({
      model: "openai-gpt-52",
      messages: [{ role: "user", content: "What's the capital of France?" }],
      reasoning: { enabled: false }
  });
  ```

  ```bash cURL theme={"system"}
  curl https://api.venice.ai/api/v1/chat/completions \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "openai-gpt-52",
      "messages": [{"role": "user", "content": "What is the capital of France?"}],
      "reasoning": {"enabled": false}
    }'
  ```
</CodeGroup>

## Token limits

Reasoning models generate visible answer tokens (in `content`) and reasoning tokens (in `reasoning_content`). Both count toward your token budget.

### Setting a token cap

Use `max_completion_tokens` to cap the total number of tokens the model generates, including reasoning:

```json theme={"system"}
{
  "model": "deepseek-v4-flash",
  "messages": [...],
  "max_completion_tokens": 500
}
```

`max_tokens` is also accepted and behaves the same way. If both are set, `max_completion_tokens` takes precedence.

To get more visible output, raise the cap, lower `reasoning_effort`, or [disable reasoning](#disabling-reasoning).

### Reading the breakdown

The `usage` object shows how your budget was spent:

```json theme={"system"}
"usage": {
  "completion_tokens": 501,
  "completion_tokens_details": { "reasoning_tokens": 169 },
  "prompt_tokens": 13,
  "total_tokens": 514
}
```

In this example, 169 tokens were spent on reasoning and 332 on the visible answer. When the cap is reached, `finish_reason` is `length`.
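
The same arithmetic can be done in code. This is a small sketch that works on the `usage` object as a plain dictionary; `token_breakdown` is a hypothetical helper, and it assumes `reasoning_tokens` may be absent (treated as 0) for models that do not reason.

```python
def token_breakdown(usage: dict) -> dict:
    """Split completion tokens into reasoning and visible-answer tokens."""
    completion = usage["completion_tokens"]
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)  # assumed 0 when not reported
    return {
        "reasoning": reasoning,
        "visible": completion - reasoning,
        "prompt": usage["prompt_tokens"],
    }


usage = {
    "completion_tokens": 501,
    "completion_tokens_details": {"reasoning_tokens": 169},
    "prompt_tokens": 13,
    "total_tokens": 514,
}

breakdown = token_breakdown(usage)
print(breakdown)  # {'reasoning': 169, 'visible': 332, 'prompt': 13}
```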

Each model's upper bound is available as `maxCompletionTokens` on the [`/v1/models`](/api-reference/endpoint/models/list) endpoint.

### Non-reasoning models

`max_tokens` and `max_completion_tokens` behave the same on non-reasoning models, capping visible output directly.

## Capability discovery

Check what a model supports via the [`/v1/models`](/api-reference/endpoint/models/list) endpoint:

| Field                     | Meaning                                                             |
| ------------------------- | ------------------------------------------------------------------- |
| `supportsReasoning`       | Model has reasoning capability (chain-of-thought)                   |
| `supportsReasoningEffort` | Model accepts the `reasoning_effort` / `reasoning.effort` parameter |
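
As a sketch of how these flags might be used, the function below filters a parsed `/v1/models` response for models that accept an effort setting. The response layout is an assumption: it supposes a top-level `data` list whose entries carry an `id` and nest the flags under `model_spec.capabilities`. Check the endpoint reference for the actual shape. The sample data is illustrative, not real API output.

```python
def models_with_effort(models_response: dict) -> list:
    """Return IDs of models that report both reasoning and effort support.

    Assumed layout (not confirmed by this page):
    {"data": [{"id": ..., "model_spec": {"capabilities": {...}}}]}
    """
    ids = []
    for model in models_response.get("data", []):
        capabilities = model.get("model_spec", {}).get("capabilities", {})
        if capabilities.get("supportsReasoning") and capabilities.get("supportsReasoningEffort"):
            ids.append(model["id"])
    return ids


# Illustrative sample only; flag values follow the model support tables above.
sample_response = {
    "data": [
        {
            "id": "minimax-m25",
            "model_spec": {
                "capabilities": {"supportsReasoning": True, "supportsReasoningEffort": True}
            },
        },
        {
            "id": "zai-org-glm-5-1",
            "model_spec": {
                "capabilities": {"supportsReasoning": True, "supportsReasoningEffort": False}
            },
        },
    ]
}

print(models_with_effort(sample_response))  # ['minimax-m25']
```

Checking the flags at runtime avoids hard-coding a support table that can drift as models are added or retired.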

## Best practices

* Default to `medium` for general use
* Use `high` (or `xhigh` where the model supports it) for complex tasks (math, code, analysis)
* Use `low` for latency-sensitive applications
* Use `reasoning.enabled: false` or set effort to `none` to disable reasoning
* When in doubt, use `low`, `medium`, or `high`. These are the most widely supported values
