Some models think out loud before answering. They work through the problem step by step, then give you a final answer. This makes them better at math, code, and logic-heavy tasks.
Current models that support reasoning: qwen3-235b, qwen3-235b-a22b-thinking-2507, qwen3-4b, deepseek-ai-DeepSeek-R1
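The snippets below use the OpenAI Python SDK against Venice's OpenAI-compatible API and assume a configured client. A minimal setup might look like this sketch; the base URL and the VENICE_API_KEY environment variable are assumptions, so adjust them to your account:

```python
import os

from openai import OpenAI

# Assumed Venice endpoint and API-key variable; adjust to your setup.
client = OpenAI(
    base_url="https://api.venice.ai/api/v1",
    api_key=os.environ["VENICE_API_KEY"],
)
```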
The reasoning_content field
qwen3-235b-a22b-thinking-2507 returns thinking in a separate reasoning_content field, keeping the content field clean:
```python
response = client.chat.completions.create(
    model="qwen3-235b-a22b-thinking-2507",
    messages=[{"role": "user", "content": "What is 15% of 240?"}]
)

thinking = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content
```
Other reasoning models (qwen3-235b, qwen3-4b, deepseek-ai-DeepSeek-R1) wrap their thinking in <think> tags within the content field:
```
<think>
The user wants 15% of 240.
15% = 0.15
0.15 × 240 = 36
</think>
15% of 240 is **36**.
```
Parse or strip as needed, or use strip_thinking_response to have Venice remove them server-side.
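If you want to handle the tags client-side, here is a minimal sketch using Python's re module (the split_thinking helper is illustrative, not part of any SDK):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a response into (thinking, answer). Illustrative helper."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

response = client.chat.completions.create(
    model="qwen3-235b",
    messages=[{"role": "user", "content": "What is 15% of 240?"}]
)
thinking, answer = split_thinking(response.choices[0].message.content)
```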
Skip thinking
If you don’t need the model to reason, skipping it is faster and cheaper: use an instruct model like qwen3-235b-a22b-instruct-2507, or disable thinking where supported:
```python
response = client.chat.completions.create(
    model="qwen3-4b",
    messages=[{"role": "user", "content": "What's the capital of France?"}],
    extra_body={
        "venice_parameters": {
            "disable_thinking": True
        }
    }
)
```
Strip thinking
For models that use <think> tags, set strip_thinking_response to have Venice remove the thinking block server-side:
```python
response = client.chat.completions.create(
    model="qwen3-235b",
    messages=[{"role": "user", "content": "What is 15% of 240?"}],
    extra_body={
        "venice_parameters": {
            "strip_thinking_response": True
        }
    }
)

# Returns just the answer, no <think> block
```
Or use a model suffix: qwen3-235b:strip_thinking_response=true
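For example, the same strip request as above, expressed through the suffix instead of venice_parameters:

```python
# The suffix sets strip_thinking_response without needing extra_body.
response = client.chat.completions.create(
    model="qwen3-235b:strip_thinking_response=true",
    messages=[{"role": "user", "content": "What is 15% of 240?"}]
)
answer = response.choices[0].message.content  # no <think> block
```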
Streaming
When streaming with qwen3-235b-a22b-thinking-2507, reasoning_content arrives in the delta before the final answer:
```python
stream = client.chat.completions.create(
    model="qwen3-235b-a22b-thinking-2507",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
    stream=True
)

for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        # reasoning_content arrives first, then the final answer in content
        if getattr(delta, "reasoning_content", None):
            print(delta.reasoning_content, end="")
        if delta.content:
            print(delta.content, end="")
```
For models that use <think> tags, collect the full response first, then parse it; the <think> content streams before the answer.
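A sketch of that pattern, accumulating the streamed text and then stripping the thinking block with a regex:

```python
import re

stream = client.chat.completions.create(
    model="qwen3-235b",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
    stream=True
)

# Collect the full text first; the <think> block streams before the answer.
full_text = "".join(
    chunk.choices[0].delta.content
    for chunk in stream
    if chunk.choices and chunk.choices[0].delta.content
)

# Keep only the answer.
answer = re.sub(r"<think>.*?</think>", "", full_text, flags=re.DOTALL).strip()
```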
Deprecations
qwen3-235b → qwen3-235b-a22b-thinking-2507

Starting December 14, 2025, qwen3-235b routes to qwen3-235b-a22b-thinking-2507.

What changes:

- disable_thinking gets ignored
- <think> tags no longer appear in content
- Thinking moves to the reasoning_content field instead

What stays the same:

- strip_thinking_response still works
Action required: If you parse <think> tags from the response, switch to reading reasoning_content instead. If you use disable_thinking=true, switch to qwen3-235b-a22b-instruct-2507 before December 14.
<think> tags will eventually be deprecated across all models in favor of the reasoning_content field.
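During the transition you can read thinking defensively: prefer reasoning_content when it is set, and fall back to parsing <think> tags otherwise. A minimal sketch (the helper name is illustrative, and message is a response.choices[0].message object like those above):

```python
import re

def extract_thinking(message) -> tuple[str, str]:
    # Prefer the reasoning_content field when the model provides it.
    reasoning = getattr(message, "reasoning_content", None)
    if reasoning:
        return reasoning, (message.content or "").strip()
    # Otherwise fall back to parsing <think> tags out of content.
    text = message.content or ""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match:
        return match.group(1).strip(), text[match.end():].strip()
    return "", text.strip()

thinking, answer = extract_thinking(response.choices[0].message)
```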
Parameters
| Parameter | What it does |
|---|---|
| strip_thinking_response | Remove <think> tags, return only the answer |
| disable_thinking | Skip reasoning entirely (faster) |
Both can be set in venice_parameters or as model suffixes.
For pricing and context limits, see Current Models.