Some models think out loud before answering. They work through problems step by step, then give you a final answer. This makes them stronger at math, code, and logic-heavy tasks.
| Model | ID |
|---|---|
| Claude Opus 4.5 | claude-opus-45 |
| Claude Opus 4.6 | claude-opus-4-6 |
| Claude Sonnet 4.5 | claude-sonnet-45 |
| Claude Sonnet 4.6 | claude-sonnet-4-6 |
| DeepSeek V3.2 | deepseek-v3.2 |
| Gemini 3 Flash Preview | gemini-3-flash-preview |
| Gemini 3 Pro Preview | gemini-3-pro-preview |
| Gemini 3.1 Pro Preview | gemini-3-1-pro-preview |
| GLM 4.7 | zai-org-glm-4.7 |
| GLM 4.7 Flash | zai-org-glm-4.7-flash |
| GLM 4.7 Flash Heretic | olafangensan-glm-4.7-flash-heretic |
| GLM 5 | zai-org-glm-5 |
| GPT-5.2 | openai-gpt-52 |
| GPT-5.2 Codex | openai-gpt-52-codex |
| Grok 4.1 Fast | grok-41-fast |
| Grok Code Fast 1 | grok-code-fast-1 |
| Kimi K2 Thinking | kimi-k2-thinking |
| Kimi K2.5 | kimi-k2-5 |
| MiniMax M2.1 | minimax-m21 |
| MiniMax M2.5 | minimax-m25 |
| Qwen 3 235B A22B Thinking 2507 | qwen3-235b-a22b-thinking-2507 |
| Venice Small | qwen3-4b |
See the full list of models, pricing and context limits on the Models page.
## Reading the output
Reasoning models return their thinking in one of two ways.
### The `reasoning_content` field
Models like `zai-org-glm-4.7` return thinking in a separate `reasoning_content` field, keeping `content` clean:

```python
from openai import OpenAI

# Assumes Venice's OpenAI-compatible endpoint and your API key.
client = OpenAI(base_url="https://api.venice.ai/api/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "What is 15% of 240?"}]
)

thinking = response.choices[0].message.reasoning_content
answer = response.choices[0].message.content
```
### Streaming
When streaming, `reasoning_content` arrives in the delta before the final answer:

```python
stream = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
    stream=True
)

for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        # reasoning_content is a Venice extension the SDK delta may not
        # define, so read it defensively. It streams before content.
        if getattr(delta, "reasoning_content", None):
            print(delta.reasoning_content, end="")
        if delta.content:
            print(delta.content, end="")
```
## Reasoning effort

Reasoning models spend tokens "thinking" before they answer. The `reasoning_effort` parameter controls how much thinking the model does.
| Value | Behavior |
|---|---|
| `low` | Minimal thinking. Fast and cheap. Best for simple factual questions. |
| `medium` | Balanced thinking. The default for most tasks. |
| `high` | Deep thinking. Slower and uses more tokens, but produces better answers on complex problems like math proofs or debugging. |
```python
response = client.chat.completions.create(
    model="zai-org-glm-4.7",
    messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
    extra_body={"reasoning_effort": "high"}
)
```

`reasoning_effort` works on all supported models listed above.
Venice also accepts the format `"reasoning": {"effort": "high"}`. Same behavior, different syntax.
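As a sketch, the nested form maps onto the same request shape; only `extra_body` changes from the `reasoning_effort` example above. The commented-out call assumes the same OpenAI-compatible client used in the earlier examples.

```python
# Same request as the reasoning_effort example, using the nested
# "reasoning" object instead of the flat string parameter.
request = {
    "model": "zai-org-glm-4.7",
    "messages": [{"role": "user", "content": "Prove that there are infinitely many primes"}],
    "extra_body": {"reasoning": {"effort": "high"}},
}
# response = client.chat.completions.create(**request)
```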
## Disabling reasoning
Skip reasoning entirely for faster, cheaper responses:
```python
response = client.chat.completions.create(
    model="qwen3-4b",
    messages=[{"role": "user", "content": "What's the capital of France?"}],
    extra_body={"venice_parameters": {"disable_thinking": True}}
)
```
Or use an instruct model like `qwen3-235b-a22b-instruct-2507` instead.
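A sketch of the two approaches side by side, as request dicts (the commented-out calls assume the client configured earlier). The instruct variant never emits reasoning, so it needs no extra parameter:

```python
# 1. Keep the thinking model, but disable thinking for this request.
with_flag = {
    "model": "qwen3-4b",
    "messages": [{"role": "user", "content": "What's the capital of France?"}],
    "extra_body": {"venice_parameters": {"disable_thinking": True}},
}

# 2. Use the instruct variant, which skips reasoning by design.
with_instruct = {
    "model": "qwen3-235b-a22b-instruct-2507",
    "messages": [{"role": "user", "content": "What's the capital of France?"}],
}
# response = client.chat.completions.create(**with_flag)
# response = client.chat.completions.create(**with_instruct)
```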
## Parameters
| Parameter | Values | Description |
|---|---|---|
| `reasoning_effort` | `low`, `medium`, `high` | Controls thinking depth |
| `reasoning.effort` | `low`, `medium`, `high` | Alternative format |
| `disable_thinking` | boolean | Skips reasoning entirely |
Pass `disable_thinking` in `venice_parameters`, or use it as a model suffix.