Some models think out loud before answering. They work through the problem step by step, then give you a final answer. This makes them better at math, code, and logic-heavy tasks. Current models that support reasoning: qwen3-235b, qwen3-235b-a22b-thinking-2507, qwen3-4b, deepseek-ai-DeepSeek-R1

The reasoning_content field

qwen3-235b-a22b-thinking-2507 returns thinking in a separate reasoning_content field, keeping the content field clean:
# assumes `client` is an OpenAI SDK client configured for Venice
response = client.chat.completions.create(
    model="qwen3-235b-a22b-thinking-2507",
    messages=[{"role": "user", "content": "What is 15% of 240?"}]
)

thinking = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content

<think> tags

Other reasoning models (qwen3-235b, qwen3-4b, deepseek-ai-DeepSeek-R1) wrap their thinking in <think> tags within the content field:
<think>
The user wants 15% of 240.
15% = 0.15
0.15 × 240 = 36
</think>

15% of 240 is **36**.
Parse or strip as needed, or use strip_thinking_response to have Venice remove them server-side.
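To parse client-side, a small helper is enough (a sketch; `split_thinking` is illustrative, not part of any SDK):

```python
import re

def split_thinking(content: str) -> tuple[str, str]:
    """Split a <think>-tagged response into (thinking, answer).

    Returns ("", content) for responses with no <think> block.
    """
    match = re.match(r"\s*<think>(.*?)</think>\s*", content, re.DOTALL)
    if match:
        return match.group(1).strip(), content[match.end():].strip()
    return "", content.strip()
```

For example, `split_thinking(response.choices[0].message.content)` on the response above would return the reasoning and the final "15% of 240 is **36**." separately.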

Skip thinking

If you don’t need the model to reason, use an instruct model like qwen3-235b-a22b-instruct-2507, or disable thinking where supported; both options are faster and cheaper:
response = client.chat.completions.create(
    model="qwen3-4b",
    messages=[{"role": "user", "content": "What's the capital of France?"}],
    extra_body={
        "venice_parameters": {
            "disable_thinking": True
        }
    }
)

Strip thinking

For models that use <think> tags, set strip_thinking_response to have Venice remove the thinking block server-side:
response = client.chat.completions.create(
    model="qwen3-235b",
    messages=[{"role": "user", "content": "What is 15% of 240?"}],
    extra_body={
        "venice_parameters": {
            "strip_thinking_response": True
        }
    }
)
# Returns just the answer, no <think> block
Or use a model suffix: qwen3-235b:strip_thinking_response=true
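If you build model strings programmatically, the suffix form can be assembled with a helper (hypothetical `with_param_suffix`; the single-parameter suffix format follows the example above):

```python
def with_param_suffix(model: str, name: str, value) -> str:
    """Append one Venice parameter as a model-name suffix,
    e.g. "qwen3-235b:strip_thinking_response=true".
    (Hypothetical helper, not part of any SDK.)"""
    return f"{model}:{name}={str(value).lower()}"
```

Then pass the result as the model, e.g. `model=with_param_suffix("qwen3-235b", "strip_thinking_response", True)`.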

Streaming

When streaming with qwen3-235b-a22b-thinking-2507, reasoning_content arrives in the delta before the final answer:
stream = client.chat.completions.create(
    model="qwen3-235b-a22b-thinking-2507",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
    stream=True
)

for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        if getattr(delta, "reasoning_content", None):
            print(delta.reasoning_content, end="")
        if delta.content:
            print(delta.content, end="")
For models that use <think> tags, collect the full response first then parse. The <think> content streams before the answer.
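That collect-then-parse step can be sketched like this (assumes the <think>-at-start shape shown earlier; `parse_streamed_think` is illustrative, not an SDK function):

```python
import re

def parse_streamed_think(chunks) -> tuple[str, str]:
    """Join streamed content deltas, then split out the <think> block.

    `chunks` is any iterable of content strings. Tags split across
    chunk boundaries are handled correctly because parsing happens
    only after the full response is joined.
    """
    full = "".join(chunks)
    match = re.match(r"\s*<think>(.*?)</think>\s*", full, re.DOTALL)
    if match:
        return match.group(1).strip(), full[match.end():].strip()
    return "", full.strip()
```

In the streaming loop, append each non-empty `delta.content` to a list and call this once the stream ends.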

Deprecations

qwen3-235b → qwen3-235b-a22b-thinking-2507

Starting December 14, 2025, qwen3-235b routes to qwen3-235b-a22b-thinking-2507.

What changes:
  • disable_thinking gets ignored
  • <think> tags no longer appear in content
  • Thinking moves to the reasoning_content field instead
What stays the same:
  • strip_thinking_response still works
Action required: If you parse <think> tags from the response, switch to reading reasoning_content instead. If you use disable_thinking=true, switch to qwen3-235b-a22b-instruct-2507 before December 14.
<think> tags will eventually be deprecated across all models in favor of the reasoning_content field.

Parameters

| Parameter | What it does |
| --- | --- |
| strip_thinking_response | Remove <think> tags, return only the answer |
| disable_thinking | Skip reasoning entirely (faster) |
Both go in venice_parameters or as model suffixes. For pricing and context limits, see Current Models.