Some models think out loud before answering. They work through problems step by step, then give you a final answer. This makes them stronger at math, code, and logic-heavy tasks.
| Model | ID |
|---|---|
| Claude Opus 4.5 | claude-opus-45 |
| Claude Opus 4.6 | claude-opus-4-6 |
| Claude Sonnet 4.5 | claude-sonnet-45 |
| Claude Sonnet 4.6 | claude-sonnet-4-6 |
| DeepSeek V3.2 | deepseek-v3.2 |
| Gemini 3 Flash Preview | gemini-3-flash-preview |
| Gemini 3 Pro Preview | gemini-3-pro-preview |
| Gemini 3.1 Pro Preview | gemini-3-1-pro-preview |
| GLM 4.7 | zai-org-glm-4.7 |
| GLM 4.7 Flash | zai-org-glm-4.7-flash |
| GLM 4.7 Flash Heretic | olafangensan-glm-4.7-flash-heretic |
| GLM 5 | zai-org-glm-5 |
| GPT-5.2 | openai-gpt-52 |
| GPT-5.2 Codex | openai-gpt-52-codex |
| Grok 4.1 Fast | grok-41-fast |
| Grok Code Fast 1 | grok-code-fast-1 |
| Kimi K2 Thinking | kimi-k2-thinking |
| Kimi K2.5 | kimi-k2-5 |
| MiniMax M2.1 | minimax-m21 |
| MiniMax M2.5 | minimax-m25 |
| Qwen 3 235B A22B Thinking 2507 | qwen3-235b-a22b-thinking-2507 |
| Venice Small | qwen3-4b |
See the full list of models, pricing and context limits on the Models page.
## Reading the output
Reasoning models return their thinking in one of two ways.
### The `reasoning_content` field
Models like `zai-org-glm-4.7` return thinking in a separate `reasoning_content` field, keeping `content` clean:

```python
from openai import OpenAI

# Assumes Venice's OpenAI-compatible endpoint and your API key.
client = OpenAI(base_url="https://api.venice.ai/api/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "What is 15% of 240?"}]
)

thinking = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content
```
### Streaming
When streaming, `reasoning_content` arrives in the delta before the final answer:

```python
stream = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
    stream=True
)

for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        # reasoning_content is a Venice extension the SDK delta may not
        # define, so read it defensively. It streams before content.
        if getattr(delta, "reasoning_content", None):
            print(delta.reasoning_content, end="")
        if delta.content:
            print(delta.content, end="")
```
## Reasoning effort

Reasoning models spend tokens "thinking" before they answer. The `reasoning_effort` parameter controls how much thinking the model does.
| Value | Behavior |
|---|---|
| `low` | Minimal thinking. Fast and cheap. Best for simple factual questions. |
| `medium` | Balanced thinking. The default for most tasks. |
| `high` | Deep thinking. Slower and uses more tokens, but produces better answers on complex problems like math proofs or debugging. |
```python
response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
    extra_body={"reasoning_effort": "high"}
)
```

`reasoning_effort` works on all supported models listed above.
Venice also accepts the format `"reasoning": {"effort": "high"}`. Same behavior, different syntax.
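As a sketch, the nested form maps onto the same request shape; only `extra_body` changes from the `reasoning_effort` example above. The commented-out call assumes the same OpenAI-compatible client used in the earlier examples.

```python
# Same request as the reasoning_effort example, using the nested
# "reasoning" object instead of the flat string parameter.
request = {
    "model": "zai-org-glm-4.7",
    "messages": [{"role": "user", "content": "Prove that there are infinitely many primes"}],
    "extra_body": {"reasoning": {"effort": "high"}},
}
# response = client.chat.completions.create(**request)
```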
## Disabling reasoning
Skip reasoning entirely for faster, cheaper responses:
```python
response = client.chat.completions.create(
    model="qwen3-4b",
    messages=[{"role": "user", "content": "What's the capital of France?"}],
    extra_body={"venice_parameters": {"disable_thinking": True}}
)
```
Or use an instruct model like `qwen3-235b-a22b-instruct-2507` instead.
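A sketch of the two approaches side by side, as request dicts (the commented-out calls assume the client configured earlier). The instruct variant never emits reasoning, so it needs no extra parameter:

```python
# 1. Keep the thinking model, but disable thinking for this request.
with_flag = {
    "model": "qwen3-4b",
    "messages": [{"role": "user", "content": "What's the capital of France?"}],
    "extra_body": {"venice_parameters": {"disable_thinking": True}},
}

# 2. Use the instruct variant, which skips reasoning by design.
with_instruct = {
    "model": "qwen3-235b-a22b-instruct-2507",
    "messages": [{"role": "user", "content": "What's the capital of France?"}],
}
# response = client.chat.completions.create(**with_flag)
# response = client.chat.completions.create(**with_instruct)
```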
## Parameters
| Parameter | Values | Description |
|---|---|---|
| `reasoning_effort` | `low`, `medium`, `high` | Controls thinking depth |
| `reasoning.effort` | `low`, `medium`, `high` | Alternative format |
| `disable_thinking` | boolean | Skips reasoning entirely |
Pass `disable_thinking` in `venice_parameters`, or use it as a model suffix.