Some models think out loud before answering. They work through problems step by step, then give you a final answer. This makes them stronger at math, code, and logic-heavy tasks.
See the full list of models, pricing, and context limits on the Models page. Not all reasoning models support the reasoning_effort parameter; see model support for details.

Reading the output

Reasoning models return their thinking in a separate reasoning_content field, keeping content clean:
from openai import OpenAI

# Point the OpenAI SDK at the Venice API.
client = OpenAI(base_url="https://api.venice.ai/api/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "What is 15% of 240?"}]
)

# reasoning_content is a non-standard field; use getattr so this also
# works when a model returns no reasoning.
thinking = getattr(response.choices[0].message, "reasoning_content", None)
answer = response.choices[0].message.content
Some providers (Anthropic, Google, OpenAI, Qwen) return encrypted or summarized reasoning tokens. When this happens, reasoning_content contains a "[Some reasoning content is encrypted]" placeholder.
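Because of this, code that displays reasoning should not assume the field is readable. A small guard, using the placeholder string documented above:

```python
# The placeholder string comes straight from the docs; treat it as opaque.
ENCRYPTED_PLACEHOLDER = "[Some reasoning content is encrypted]"

def visible_reasoning(reasoning_content):
    """Return readable reasoning text, or None when it is missing or encrypted."""
    if not reasoning_content or ENCRYPTED_PLACEHOLDER in reasoning_content:
        return None
    return reasoning_content
```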

Streaming

When streaming, reasoning_content arrives in the delta before the final answer:
stream = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
    stream=True
)

for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        # Not every chunk carries both fields; guard with getattr.
        if getattr(delta, "reasoning_content", None):
            print(delta.reasoning_content, end="")
        if getattr(delta, "content", None):
            print(delta.content, end="")

Reasoning effort

The reasoning_effort parameter controls how much thinking a model does before responding. Higher effort means deeper reasoning but more tokens and latency.

Accepted values

| Value | Description |
| --- | --- |
| `none` | Disables reasoning entirely |
| `minimal` | Basic reasoning with minimal effort |
| `low` | Light reasoning for simple problems |
| `medium` | Balanced reasoning for moderate complexity |
| `high` | Deep reasoning for complex problems |
| `xhigh` | Extra-high reasoning depth |
| `max` | Maximum reasoning capability |
Not all models support all values. Venice does not auto-map to the nearest supported level; unsupported values return a 400 error from the upstream provider. For example, sending xhigh to Claude or max to GPT-5.2 will fail. When in doubt, use low, medium, or high: these are the most widely supported values.

Model support

OpenAI

| Model | Supported values |
| --- | --- |
| GPT-5.2 | none, low, medium, high, xhigh |
| GPT-5.2 Codex, GPT-5.3 Codex | low, medium, high, xhigh |

Anthropic

| Model | Supported values |
| --- | --- |
| Claude Opus 4.6 | low, medium, high, max |
| Claude Opus 4.5, Sonnet 4.5, Sonnet 4.6 | low, medium, high |

Google

| Model | Supported values |
| --- | --- |
| Gemini 3 Pro Preview | low, high |
| Gemini 3.1 Pro Preview | low, medium, high |
| Gemini 3 Flash Preview | minimal, low, medium, high |

xAI

Grok models (Grok 4.1 Fast, Grok Code Fast) do not support reasoning_effort. Specifying it will result in an error.

Other models

| Model | Supported values |
| --- | --- |
| Qwen 3 235B A22B Thinking, Qwen 3.5 35B A3B | low, medium, high |
| Kimi K2.5 | low, medium, high |
| MiniMax M2.5, M2.1 | low, medium, high |
| GLM 4.7 series | low, medium, high |
| DeepSeek R1 | Built-in reasoning only, not configurable |
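Since Venice does not auto-map unsupported values, it can be useful to validate client-side before spending a request on a guaranteed 400. A minimal sketch; the support table below mirrors this page, and model IDs other than zai-org-glm-4.7 and openai-gpt-52 would need to be confirmed against /v1/models:

```python
# Partial client-side mirror of the model-support tables above.
SUPPORTED_EFFORT = {
    "openai-gpt-52": {"none", "low", "medium", "high", "xhigh"},
    "zai-org-glm-4.7": {"low", "medium", "high"},
}

def check_effort(model: str, effort: str) -> str:
    """Raise early instead of letting the upstream provider return a 400."""
    supported = SUPPORTED_EFFORT.get(model)
    if supported is None or effort in supported:
        return effort  # unknown model: defer to the provider
    raise ValueError(
        f"{model} does not support reasoning_effort={effort!r}; "
        f"supported: {sorted(supported)}"
    )
```

The table here will go stale as models change, so treat capability discovery via /v1/models as the source of truth.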

Usage

Pass reasoning_effort as a top-level parameter or use the nested reasoning.effort format:
response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
    extra_body={"reasoning": {"effort": "high"}}
)
The flat format "reasoning_effort": "high" is also accepted.
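The two forms produce equivalent request bodies. A sketch of the raw JSON payloads, which may help when calling the HTTP API directly rather than through an SDK:

```python
# Flat format: reasoning_effort as a top-level field.
flat = {
    "model": "zai-org-glm-4.7",
    "messages": [{"role": "user", "content": "Prove that there are infinitely many primes"}],
    "reasoning_effort": "high",
}

# Nested format: the same setting under reasoning.effort.
nested = {
    "model": "zai-org-glm-4.7",
    "messages": [{"role": "user", "content": "Prove that there are infinitely many primes"}],
    "reasoning": {"effort": "high"},
}
```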

Disabling reasoning

There are two ways to disable reasoning:
| Method | Syntax | How it works |
| --- | --- | --- |
| reasoning.enabled: false | `"reasoning": {"enabled": false}` | Venice-level toggle that prevents reasoning parameters from being sent to the provider. Recommended. |
| reasoning.effort: "none" | `"reasoning": {"effort": "none"}` | Passed to the provider, which decides how to handle it. Only supported by some models (e.g. GPT-5.x). |
For models that support it, reasoning.enabled: false is the more reliable option:
| Model | Can disable? |
| --- | --- |
| GPT-5.2 | Yes |
| GPT-5.2 Codex, GPT-5.3 Codex | Yes (but `none` effort not supported) |
| Qwen 3 235B A22B Thinking, Qwen 3.5 35B A3B | Yes |
| GLM 4.7 series | Yes |
| Claude Opus 4.5/4.6, Sonnet 4.5/4.6 | No (always reasons) |
| Gemini 3 Pro, 3.1 Pro, 3 Flash | No (always reasons) |
| DeepSeek R1 | No (always reasons) |
response = client.chat.completions.create(
    model="openai-gpt-52",
    messages=[{"role": "user", "content": "What's the capital of France?"}],
    extra_body={"reasoning": {"enabled": False}}
)

Capability discovery

Check what a model supports via the /v1/models endpoint:
| Field | Meaning |
| --- | --- |
| `supportsReasoning` | Model has reasoning capability (chain-of-thought) |
| `supportsReasoningEffort` | Model accepts the `reasoning_effort` / `reasoning.effort` parameter |
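A sketch of filtering a parsed /v1/models response for models that accept the parameter. The placement of the flags under `model_spec.capabilities` is an assumption here; verify it against the live response shape:

```python
def models_with_effort_support(models_response: dict) -> list[str]:
    """Return IDs of models that advertise supportsReasoningEffort.

    Assumes capability flags sit under model_spec.capabilities in each
    entry of the /v1/models `data` array; check the live API schema.
    """
    return [
        m["id"]
        for m in models_response.get("data", [])
        if m.get("model_spec", {}).get("capabilities", {}).get("supportsReasoningEffort")
    ]
```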

Best practices

  • Default to medium for general use
  • Use high or xhigh for complex tasks (math, code, analysis)
  • Use low for latency-sensitive applications
  • Use reasoning.enabled: false or set effort to none to disable reasoning
  • When in doubt, use low, medium, or high. These are the most widely supported values