Some models think out loud before answering. They work through problems step by step, then give you a final answer. This makes them stronger at math, code, and logic-heavy tasks.
See the full list of models, pricing, and context limits on the Models page. Not all reasoning models support the `reasoning_effort` parameter; see the Model support section below for details.
## Reading the output

Reasoning models return their thinking in a separate `reasoning_content` field, keeping `content` clean:

```python
response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "What is 15% of 240?"}]
)

thinking = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content
```
Some providers (Anthropic, Google, OpenAI, Qwen) return encrypted or summarized reasoning tokens. When this happens, reasoning_content contains a "[Some reasoning content is encrypted]" placeholder.
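Before rendering reasoning to end users, it can help to filter out that placeholder. A minimal sketch; the helper name and constant are ours, not part of the API:

```python
ENCRYPTED_PLACEHOLDER = "[Some reasoning content is encrypted]"

def visible_reasoning(message):
    """Return reasoning text suitable for display, or None if it is
    missing or was encrypted upstream."""
    reasoning = getattr(message, "reasoning_content", None)
    if not reasoning or ENCRYPTED_PLACEHOLDER in reasoning:
        return None
    return reasoning
```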
## Streaming

When streaming, `reasoning_content` arrives in the delta before the final answer:

```python
stream = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
    stream=True
)

for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if delta.reasoning_content:
            print(delta.reasoning_content, end="")
        if delta.content:
            print(delta.content, end="")
```
## Reasoning effort

The `reasoning_effort` parameter controls how much thinking a model does before responding. Higher effort means deeper reasoning but more tokens and latency.
### Accepted values

| Value | Description |
|---|---|
| `none` | Disables reasoning entirely |
| `minimal` | Basic reasoning with minimal effort |
| `low` | Light reasoning for simple problems |
| `medium` | Balanced reasoning for moderate complexity |
| `high` | Deep reasoning for complex problems |
| `xhigh` | Extra-high reasoning depth |
| `max` | Maximum reasoning capability |
Not all models support all values, and Venice does not auto-map to the nearest supported level: unsupported values return a 400 error from the upstream provider. For example, sending `xhigh` to Claude or `max` to GPT-5.2 will fail. When in doubt, use `low`, `medium`, or `high`; these are the most widely supported values.
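To avoid the 400 entirely, one option is to validate the effort client-side before sending the request. A sketch; the model IDs and value sets below are an illustrative subset mirroring the tables in this section, not an exhaustive registry:

```python
# Illustrative subset of supported reasoning_effort values per model.
SUPPORTED_EFFORT = {
    "openai-gpt-52": {"none", "low", "medium", "high", "xhigh"},
    "zai-org-glm-4.7": {"low", "medium", "high"},
}

def check_effort(model, effort):
    """Raise locally instead of triggering a 400 from the upstream provider.
    Unknown models pass through unchecked."""
    supported = SUPPORTED_EFFORT.get(model)
    if supported is not None and effort not in supported:
        raise ValueError(f"{model} does not support reasoning_effort={effort!r}")
    return effort
```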
### Model support

#### OpenAI

| Model | Supported values |
|---|---|
| GPT-5.2 | `none`, `low`, `medium`, `high`, `xhigh` |
| GPT-5.2 Codex, GPT-5.3 Codex | `low`, `medium`, `high`, `xhigh` |
#### Anthropic

| Model | Supported values |
|---|---|
| Claude Opus 4.6 | `low`, `medium`, `high`, `max` |
| Claude Opus 4.5, Sonnet 4.5, Sonnet 4.6 | `low`, `medium`, `high` |
#### Google

| Model | Supported values |
|---|---|
| Gemini 3 Pro Preview | `low`, `high` |
| Gemini 3.1 Pro Preview | `low`, `medium`, `high` |
| Gemini 3 Flash Preview | `minimal`, `low`, `medium`, `high` |
#### xAI

Grok models (Grok 4.1 Fast, Grok Code Fast) do not support `reasoning_effort`. Specifying it will result in an error.
#### Other models

| Model | Supported values |
|---|---|
| Qwen 3 235B A22B Thinking, Qwen 3.5 35B A3B | `low`, `medium`, `high` |
| Kimi K2.5 | `low`, `medium`, `high` |
| MiniMax M2.5, M2.1 | `low`, `medium`, `high` |
| GLM 4.7 series | `low`, `medium`, `high` |
| DeepSeek R1 | Built-in reasoning only, not configurable |
## Usage

Pass `reasoning_effort` as a top-level parameter or use the nested `reasoning.effort` format:

```python
response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
    extra_body={"reasoning": {"effort": "high"}}
)
```

The flat format `"reasoning_effort": "high"` is also accepted.
## Disabling reasoning

There are two ways to disable reasoning:

| Method | Syntax | How it works |
|---|---|---|
| `reasoning.enabled: false` | `"reasoning": {"enabled": false}` | Venice-level toggle that prevents reasoning parameters from being sent to the provider. Recommended. |
| `reasoning.effort: "none"` | `"reasoning": {"effort": "none"}` | Passed to the provider, which decides how to handle it. Only supported by some models (e.g. GPT-5.x). |
For models that support it, `reasoning.enabled: false` is the more reliable option:

| Model | Can disable? |
|---|---|
| GPT-5.2 | Yes |
| GPT-5.2 Codex, GPT-5.3 Codex | Yes (but `none` effort not supported) |
| Qwen 3 235B A22B Thinking, Qwen 3.5 35B A3B | Yes |
| GLM 4.7 series | Yes |
| Claude Opus 4.5/4.6, Sonnet 4.5/4.6 | No (always reasons) |
| Gemini 3 Pro, 3.1 Pro, 3 Flash | No (always reasons) |
| DeepSeek R1 | No (always reasons) |
```python
response = client.chat.completions.create(
    model="openai-gpt-52",
    messages=[{"role": "user", "content": "What's the capital of France?"}],
    extra_body={"reasoning": {"enabled": False}}
)
```
## Capability discovery

Check what a model supports via the `/v1/models` endpoint:

| Field | Meaning |
|---|---|
| `supportsReasoning` | Model has reasoning capability (chain-of-thought) |
| `supportsReasoningEffort` | Model accepts the `reasoning_effort` / `reasoning.effort` parameter |
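As a sketch of consuming that endpoint, the filter below assumes these flags live under `model_spec.capabilities` in each entry of the response's `data` array; verify the exact field paths against a live response:

```python
def reasoning_capable(models_payload):
    """Return IDs of models whose /v1/models entry advertises reasoning.

    The model_spec.capabilities path is an assumption about the response
    shape; adjust it if the live payload differs."""
    return [
        m["id"]
        for m in models_payload.get("data", [])
        if m.get("model_spec", {}).get("capabilities", {}).get("supportsReasoning")
    ]
```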
## Best practices

- Default to `medium` for general use
- Use `high` or `xhigh` for complex tasks (math, code, analysis)
- Use `low` for latency-sensitive applications
- Use `reasoning.enabled: false` or set effort to `none` to disable reasoning
- When in doubt, use `low`, `medium`, or `high`; these are the most widely supported values
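The rules of thumb above can be collapsed into a small default-picker; the task categories and the mapping here are illustrative, not part of any API:

```python
# Illustrative mapping from task type to a reasoning_effort default,
# following the best practices above.
EFFORT_BY_TASK = {
    "chat": "medium",         # general use
    "math": "high",           # complex analysis
    "code": "high",
    "autocomplete": "low",    # latency-sensitive
}

def pick_effort(task):
    """Fall back to the widely supported "medium" for unknown task types."""
    return EFFORT_BY_TASK.get(task, "medium")
```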