Some models think out loud before answering. They work through problems step by step, then give you a final answer. This makes them stronger at math, code, and logic-heavy tasks.

Supported models: claude-opus-45, grok-41-fast, kimi-k2-thinking, gemini-3-pro-preview, qwen3-235b-a22b-thinking-2507, qwen3-4b, deepseek-ai-DeepSeek-R1

Reading the output

Reasoning models return their thinking in one of two ways.

The reasoning_content field

Models like qwen3-235b-a22b-thinking-2507 return thinking in a separate reasoning_content field, keeping content clean:
```python
from openai import OpenAI

# Client setup is assumed; point the base URL at your Venice API endpoint.
client = OpenAI(base_url="https://api.venice.ai/api/v1", api_key="YOUR_VENICE_API_KEY")

response = client.chat.completions.create(
    model="qwen3-235b-a22b-thinking-2507",
    messages=[{"role": "user", "content": "What is 15% of 240?"}]
)

thinking = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content
```

<think> tags

Other models (qwen3-4b, deepseek-ai-DeepSeek-R1) wrap thinking in <think> tags within the content field:
```text
<think>
The user wants 15% of 240.
15% = 0.15
0.15 × 240 = 36
</think>

15% of 240 is **36**.
```
Parse or strip as needed, or use strip_thinking_response to have Venice remove them server-side.
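If you parse client-side, a minimal sketch might look like this (assumes the full response, including the <think> block, is in a single string):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a <think>-tagged response into (thinking, answer)."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()  # no tags: the whole string is the answer
    thinking = match.group(1).strip()
    answer = text[match.end():].strip()
    return thinking, answer

raw = "<think>\n15% = 0.15\n0.15 × 240 = 36\n</think>\n\n15% of 240 is **36**."
thinking, answer = split_thinking(raw)
```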

Streaming

When streaming, reasoning_content arrives in the delta before the final answer:
```python
stream = client.chat.completions.create(
    model="qwen3-235b-a22b-thinking-2507",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
    stream=True
)

for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        # getattr guards against delta objects that lack a reasoning_content attribute
        if getattr(delta, "reasoning_content", None):
            print(delta.reasoning_content, end="")
        if delta.content:
            print(delta.content, end="")
```
For models using <think> tags, the thinking streams before the answer. Collect the full response, then parse.
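For the tag-based models, that collect-then-parse step can be sketched as follows (the list of strings here simulates the content deltas you would accumulate from the stream):

```python
def parse_streamed_response(chunks) -> tuple[str, str]:
    """Join streamed content chunks, then split on the closing </think> tag."""
    buffer = "".join(chunks)
    thinking, sep, answer = buffer.partition("</think>")
    if not sep:  # no <think> block appeared in the stream
        return "", buffer.strip()
    return thinking.replace("<think>", "").strip(), answer.strip()

# Simulated chunks; in practice these come from chunk.choices[0].delta.content
chunks = ["<think>\nPhoto", "synthesis converts light to sugar.\n</think>", "\n\nPlants use sunlight to..."]
thinking, answer = parse_streamed_response(chunks)
```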

Reasoning effort

Reasoning models spend tokens “thinking” before they answer. The reasoning_effort parameter controls how much thinking the model does.
| Value | Behavior |
| --- | --- |
| low | Minimal thinking. Fast and cheap. Best for simple factual questions. |
| medium | Balanced thinking. The default for most tasks. |
| high | Deep thinking. Slower and uses more tokens, but produces better answers on complex problems like math proofs or debugging. |
```python
response = client.chat.completions.create(
    model="qwen3-235b-a22b-thinking-2507",
    messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
    extra_body={"reasoning_effort": "high"}
)
```
Works on: claude-opus-45, grok-41-fast, kimi-k2-thinking, gemini-3-pro-preview, qwen3-235b-a22b-thinking-2507
Venice also accepts the OpenRouter format: "reasoning": {"effort": "high"}. Same behavior, different syntax.

Disabling reasoning

Skip reasoning entirely for faster, cheaper responses:
```python
response = client.chat.completions.create(
    model="qwen3-4b",
    messages=[{"role": "user", "content": "What's the capital of France?"}],
    extra_body={"venice_parameters": {"disable_thinking": True}}
)
```
Or use an instruct model like qwen3-235b-a22b-instruct-2507 instead.

Stripping thinking from responses

For models using <think> tags, have Venice remove them server-side:
```python
response = client.chat.completions.create(
    model="qwen3-4b",
    messages=[{"role": "user", "content": "What is 15% of 240?"}],
    extra_body={"venice_parameters": {"strip_thinking_response": True}}
)
```
Or use a model suffix: qwen3-4b:strip_thinking_response=true

Parameters

| Parameter | Values | Description |
| --- | --- | --- |
| reasoning_effort | low, medium, high | Controls thinking depth |
| reasoning.effort | low, medium, high | OpenRouter format |
| disable_thinking | boolean | Skips reasoning entirely |
| strip_thinking_response | boolean | Removes <think> tags |
Pass disable_thinking and strip_thinking_response in venice_parameters, or use them as model suffixes.

Deprecations

qwen3-235b → qwen3-235b-a22b-thinking-2507

Starting December 14, 2025, qwen3-235b routes to qwen3-235b-a22b-thinking-2507.

What changes:
  • disable_thinking gets ignored
  • <think> tags no longer appear in content
  • Thinking moves to reasoning_content instead
What stays the same:
  • strip_thinking_response still works
Action required: If you parse <think> tags, switch to reading reasoning_content. If you use disable_thinking=true, switch to qwen3-235b-a22b-instruct-2507 before December 14.
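During the migration, a reader that handles both formats avoids a hard cutover. A minimal sketch (the message argument stands in for response.choices[0].message):

```python
import re

def get_thinking_and_answer(message) -> tuple[str, str]:
    """Prefer reasoning_content; fall back to parsing <think> tags from content."""
    content = message.content or ""
    reasoning = getattr(message, "reasoning_content", None)
    if reasoning:
        return reasoning, content.strip()
    match = re.search(r"<think>(.*?)</think>", content, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), content[match.end():].strip()
    return "", content.strip()

from types import SimpleNamespace  # stand-in for the SDK message object

msg = SimpleNamespace(content="<think>0.15 × 240 = 36</think>\n\n36", reasoning_content=None)
thinking, answer = get_thinking_and_answer(msg)
```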
<think> tags will eventually be deprecated across all models in favor of the reasoning_content field.
For pricing and context limits, see Current Models.