Reasoning models think out loud before answering: they work through a problem step by step, then give you a final answer. This makes them stronger at math, code, and other logic-heavy tasks.
Model                             ID
Claude Opus 4.5                   claude-opus-45
Claude Opus 4.6                   claude-opus-4-6
Claude Sonnet 4.5                 claude-sonnet-45
Claude Sonnet 4.6                 claude-sonnet-4-6
DeepSeek V3.2                     deepseek-v3.2
Gemini 3 Flash Preview            gemini-3-flash-preview
Gemini 3 Pro Preview              gemini-3-pro-preview
Gemini 3.1 Pro Preview            gemini-3-1-pro-preview
GLM 4.7                           zai-org-glm-4.7
GLM 4.7 Flash                     zai-org-glm-4.7-flash
GLM 4.7 Flash Heretic             olafangensan-glm-4.7-flash-heretic
GLM 5                             zai-org-glm-5
GPT-5.2                           openai-gpt-52
GPT-5.2 Codex                     openai-gpt-52-codex
Grok 4.1 Fast                     grok-41-fast
Grok Code Fast 1                  grok-code-fast-1
Kimi K2 Thinking                  kimi-k2-thinking
Kimi K2.5                         kimi-k2-5
MiniMax M2.1                      minimax-m21
MiniMax M2.5                      minimax-m25
Qwen 3 235B A22B Thinking 2507    qwen3-235b-a22b-thinking-2507
Venice Small                      qwen3-4b
See the full list of models, pricing and context limits on the Models page.

Reading the output

Reasoning models return their thinking in one of two ways: in a dedicated reasoning_content field, or embedded in the content itself (typically wrapped in <think> tags).

The reasoning_content field

Models like zai-org-glm-4.7 return thinking in a separate reasoning_content field, keeping content clean:
from openai import OpenAI

# Venice is OpenAI-compatible; point the SDK at its base URL.
client = OpenAI(api_key="YOUR_VENICE_API_KEY", base_url="https://api.venice.ai/api/v1")

response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "What is 15% of 240?"}]
)

thinking = response.choices[0].message.reasoning_content  # step-by-step reasoning
answer = response.choices[0].message.content              # final answer only

Streaming

When streaming, reasoning_content arrives in the delta before the final answer:
stream = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
    stream=True
)

for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        # Use getattr: deltas that carry only content may omit reasoning_content.
        if getattr(delta, "reasoning_content", None):
            print(delta.reasoning_content, end="")
        if delta.content:
            print(delta.content, end="")
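
If you want to handle the two streams differently, for example showing thinking in a collapsible panel and the answer in the main view, collect them into separate buffers. A minimal sketch, reusing the client from the first example:

stream = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
    stream=True
)

thinking_parts, answer_parts = [], []
for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            thinking_parts.append(delta.reasoning_content)
        if delta.content:
            answer_parts.append(delta.content)

thinking = "".join(thinking_parts)  # full reasoning trace
answer = "".join(answer_parts)      # final answer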

Reasoning effort

Reasoning models spend tokens “thinking” before they answer. The reasoning_effort parameter controls how much thinking the model does.
Value   Behavior
low     Minimal thinking. Fast and cheap. Best for simple factual questions.
medium  Balanced thinking. The default for most tasks.
high    Deep thinking. Slower and uses more tokens, but produces better answers on complex problems like math proofs or debugging.
response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
    extra_body={"reasoning_effort": "high"}  # extra_body passes non-standard params through the OpenAI SDK
)
Works on all supported models listed above.
Venice also accepts a nested format, "reasoning": {"effort": "high"}, with the same behavior.
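For example, the same high-effort request using the nested form; only the parameter shape changes:

response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
    extra_body={"reasoning": {"effort": "high"}}  # equivalent to reasoning_effort: "high"
)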

Disabling reasoning

Skip reasoning entirely for faster, cheaper responses:
response = client.chat.completions.create(
    model="qwen3-4b",
    messages=[{"role": "user", "content": "What's the capital of France?"}],
    extra_body={"venice_parameters": {"disable_thinking": True}}  # Venice-specific options
)
Or use an instruct model like qwen3-235b-a22b-instruct-2507 instead.
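Instruct variants answer directly, so no extra parameters are needed:

response = client.chat.completions.create(
    model="qwen3-235b-a22b-instruct-2507",  # instruct variant: no thinking phase
    messages=[{"role": "user", "content": "What's the capital of France?"}]
)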

Parameters

Parameter         Values             Description
reasoning_effort  low, medium, high  Controls thinking depth
reasoning.effort  low, medium, high  Alternative format
disable_thinking  boolean            Skips reasoning entirely
Pass disable_thinking in venice_parameters, or use it as a model suffix.
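The suffix form would look something like this (a sketch assuming a model:parameter=value suffix syntax; check the Models page for the exact format):

response = client.chat.completions.create(
    # Assumed suffix syntax; verify against the Venice docs before relying on it.
    model="qwen3-4b:disable_thinking=true",
    messages=[{"role": "user", "content": "What's the capital of France?"}]
)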