Reasoning Models | Venice API Docs

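Some models think out loud before answering. They reason through the problem step by step and then give a final answer, which makes them stronger on math, code, and logic-heavy tasks.

See the Models page for the full list of models, pricing, and context limits. Not all reasoning models support the reasoning_effort parameter; see Model support for details.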

Reading the output

Reasoning models return their thinking in a separate reasoning_content field, which keeps content clean:

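import os
from openai import OpenAI

# Venice is OpenAI-compatible, so the official SDK works with a custom base_url
client = OpenAI(api_key=os.environ["VENICE_API_KEY"], base_url="https://api.venice.ai/api/v1")

response = client.chat.completions.create(
    model="zai-org-glm-5-1",
    messages=[{"role": "user", "content": "What is 15% of 240?"}]
)

message = response.choices[0].message
thinking = getattr(message, "reasoning_content", None)  # non-standard field, absent on some models
answer = message.content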

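import OpenAI from "openai";

const client = new OpenAI({
    apiKey: process.env.VENICE_API_KEY,
    baseURL: "https://api.venice.ai/api/v1"
});

const response = await client.chat.completions.create({
    model: "zai-org-glm-5-1",
    messages: [{ role: "user", content: "What is 15% of 240?" }]
});

const thinking = response.choices[0].message.reasoning_content;
const answer = response.choices[0].message.content;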

curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org-glm-5-1",
    "messages": [{"role": "user", "content": "What is 15% of 240?"}]
  }'

Some providers (Anthropic, Google, OpenAI, Qwen) return encrypted or summarized reasoning tokens. When this happens, reasoning_content contains a "[Some reasoning content is encrypted]" placeholder.
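If you call the endpoint without an SDK (for example with the curl request above), you get plain JSON back. The following minimal sketch assumes the OpenAI-compatible choices[0].message layout used in the examples above; it returns both fields and flags the encrypted placeholder. split_reasoning is a hypothetical helper, and matching the placeholder by substring is an assumption about how it appears inside reasoning_content.

```python
from typing import Optional, Tuple

# Placeholder text quoted in the docs; matched by substring (assumption:
# it may be surrounded by other reasoning text).
ENCRYPTED_MARKER = "[Some reasoning content is encrypted]"


def split_reasoning(response: dict) -> Tuple[Optional[str], Optional[str], bool]:
    """Return (reasoning, answer, reasoning_is_encrypted) from a raw
    chat-completions JSON body, such as the one curl prints."""
    message = response["choices"][0]["message"]
    reasoning = message.get("reasoning_content")  # absent on non-reasoning models
    answer = message.get("content")
    encrypted = bool(reasoning) and ENCRYPTED_MARKER in reasoning
    return reasoning, answer, encrypted


# Hypothetical response body, trimmed to the fields used here
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "reasoning_content": "[Some reasoning content is encrypted]",
            "content": "15% of 240 is 36.",
        }
    }]
}

reasoning, answer, encrypted = split_reasoning(sample)
print(answer)     # 15% of 240 is 36.
print(encrypted)  # True
```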

Streaming

When streaming, reasoning_content arrives in deltas before the final answer:

stream = client.chat.completions.create(
    model="zai-org-glm-5-1",
    messages=[{"role": "user", "content": "Explain photosynthesis"}],
    stream=True
)

for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta
        # reasoning_content is a non-standard field, so read it defensively
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            print(reasoning, end="")
        if delta.content:
            print(delta.content, end="")

const stream = await client.chat.completions.create({
    model: "zai-org-glm-5-1",
    messages: [{ role: "user", content: "Explain photosynthesis" }],
    stream: true
});

for await (const chunk of stream) {
    if (chunk.choices?.[0]?.delta) {
        const delta = chunk.choices[0].delta;
        if (delta.reasoning_content) process.stdout.write(delta.reasoning_content);
        if (delta.content) process.stdout.write(delta.content);
    }
}

curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org-glm-5-1",
    "messages": [{"role": "user", "content": "Explain photosynthesis"}],
    "stream": true
  }'
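When streaming over raw HTTP (as in the curl example), the response arrives as server-sent events. The sketch below assumes the OpenAI-style framing, where each event is a data: {json} line and the stream ends with data: [DONE]; it collects reasoning and answer deltas into separate strings. collect_stream and the sample events are hypothetical.

```python
import json
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Split an OpenAI-style SSE stream into (reasoning, answer) strings."""
    reasoning_parts, answer_parts = [], []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and SSE comments carry no payload
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel (OpenAI convention, assumed here)
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a trailing chunk without choices
        delta = choices[0].get("delta") or {}
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            answer_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(answer_parts)


# Hypothetical stream: reasoning deltas arrive before the answer deltas
events = [
    'data: {"choices": [{"delta": {"reasoning_content": "Plants use light..."}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Photosynthesis converts "}}]}',
    'data: {"choices": [{"delta": {"content": "light into chemical energy."}}]}',
    "data: [DONE]",
]
reasoning, answer = collect_stream(events)
print(answer)  # Photosynthesis converts light into chemical energy.
```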

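Reasoning effort

The reasoning_effort parameter controls how much thinking the model does before responding. Higher effort means deeper reasoning, at the cost of more tokens and higher latency.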

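Accepted values

Value	Description
`none`	Disables reasoning entirely
`minimal`	Bare-minimum reasoning
`low`	Light reasoning for simple questions
`medium`	Balanced reasoning for moderately complex questions
`high`	Deep reasoning for complex problems
`xhigh`	Extra-high reasoning depth
`max`	Maximum reasoning capability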

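Not every model supports every value. Venice does not automatically map an unsupported value to the nearest supported level; instead, the upstream provider returns a 400 error. For example, sending xhigh to Claude or max to GPT-5.2 fails. When in doubt, use low, medium, or high, which are the most widely supported values.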
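Because Venice does not pick the nearest supported level for you, a client can do that itself before sending the request. A minimal sketch, assuming the order of the Accepted values table is a meaningful depth scale; nearest_supported and its tie-breaking rule (prefer the cheaper level) are our own choices, not Venice behavior. The example sets are copied from the Model support tables.

```python
from typing import Set

# Effort levels in increasing order of reasoning depth
LEVELS = ["none", "minimal", "low", "medium", "high", "xhigh", "max"]


def nearest_supported(requested: str, supported: Set[str]) -> str:
    """Pick the supported level closest to `requested`.

    Ties go to the lower (cheaper) level. Raises ValueError if `requested`
    is not a known level or `supported` contains no known level.
    """
    if requested in supported:
        return requested
    target = LEVELS.index(requested)
    candidates = [lvl for lvl in LEVELS if lvl in supported]
    if not candidates:
        raise ValueError("model supports no known reasoning_effort value")
    # min() returns the first of several equal minima, and candidates are in
    # ascending order, so a tie resolves to the lower level
    return min(candidates, key=lambda lvl: abs(LEVELS.index(lvl) - target))


claude_opus_46 = {"low", "medium", "high", "max"}
gpt_52 = {"none", "low", "medium", "high", "xhigh"}
print(nearest_supported("xhigh", claude_opus_46))  # high
print(nearest_supported("max", gpt_52))            # xhigh
```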

Model support

OpenAI

Model	Supported values
GPT-5.2	`none`, `low`, `medium`, `high`, `xhigh`
GPT-5.2 Codex, GPT-5.3 Codex	`low`, `medium`, `high`, `xhigh`

Anthropic

Model	Supported values
Claude Opus 4.6, Opus 4.6 Fast	`low`, `medium`, `high`, `max`
Claude Opus 4.5, Sonnet 4.5, Sonnet 4.6	`low`, `medium`, `high`

Google

Model	Supported values
Gemini 3 Pro Preview	`low`, `high`
Gemini 3.1 Pro Preview	`low`, `medium`, `high`
Gemini 3 Flash Preview	`minimal`, `low`, `medium`, `high`

xAI

Grok models (Grok 4.1 Fast, Grok Code Fast) do not support reasoning_effort. Specifying it results in an error.

Other models

Model	Supported values
Qwen 3 235B A22B Thinking, Qwen 3.5 35B A3B	`low`, `medium`, `high`
Kimi K2.5	`low`, `medium`, `high`
MiniMax M2.5, M2.1	`low`, `medium`, `high`
GLM 5.1 series	Built-in reasoning only; not configurable
DeepSeek R1	Built-in reasoning only; not configurable
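To catch unsupported values before they reach the provider and come back as a 400, a client can check them against the tables above. In this sketch the keys are the display names from the tables, not API model IDs (those would have to be mapped separately), and effort_param is a hypothetical helper that emits the nested reasoning.effort format.

```python
from typing import Optional

# Transcribed from the Model support tables. An empty set means the
# parameter is rejected (Grok) or not configurable (GLM 5.1, DeepSeek R1).
SUPPORTED_EFFORT = {
    "GPT-5.2": {"none", "low", "medium", "high", "xhigh"},
    "GPT-5.2 Codex": {"low", "medium", "high", "xhigh"},
    "GPT-5.3 Codex": {"low", "medium", "high", "xhigh"},
    "Claude Opus 4.6": {"low", "medium", "high", "max"},
    "Claude Opus 4.6 Fast": {"low", "medium", "high", "max"},
    "Claude Opus 4.5": {"low", "medium", "high"},
    "Claude Sonnet 4.5": {"low", "medium", "high"},
    "Claude Sonnet 4.6": {"low", "medium", "high"},
    "Gemini 3 Pro Preview": {"low", "high"},
    "Gemini 3.1 Pro Preview": {"low", "medium", "high"},
    "Gemini 3 Flash Preview": {"minimal", "low", "medium", "high"},
    "Grok 4.1 Fast": set(),
    "Grok Code Fast": set(),
    "Qwen 3 235B A22B Thinking": {"low", "medium", "high"},
    "Qwen 3.5 35B A3B": {"low", "medium", "high"},
    "Kimi K2.5": {"low", "medium", "high"},
    "MiniMax M2.5": {"low", "medium", "high"},
    "MiniMax M2.1": {"low", "medium", "high"},
    "GLM 5.1": set(),
    "DeepSeek R1": set(),
}


def effort_param(model: str, effort: Optional[str]) -> dict:
    """Return the extra request fields for `effort`, raising locally instead
    of letting the request fail upstream."""
    if effort is None:
        return {}  # omit the parameter entirely
    allowed = SUPPORTED_EFFORT.get(model)
    if allowed is None:
        raise KeyError(f"no support data for {model!r}")
    if effort not in allowed:
        raise ValueError(f"{model} does not accept reasoning_effort={effort!r}")
    return {"reasoning": {"effort": effort}}


print(effort_param("Gemini 3 Pro Preview", "high"))  # {'reasoning': {'effort': 'high'}}
```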

Usage

Pass reasoning_effort as a top-level parameter, or use the nested reasoning.effort format:

response = client.chat.completions.create(
    model="minimax-m25",
    messages=[{"role": "user", "content": "Prove that there are infinitely many primes"}],
    extra_body={"reasoning": {"effort": "high"}}
)

const response = await client.chat.completions.create({
    model: "minimax-m25",
    messages: [{ role: "user", content: "Prove that there are infinitely many primes" }],
    reasoning: { effort: "high" }
});

curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "minimax-m25",
    "messages": [{"role": "user", "content": "Prove that there are infinitely many primes"}],
    "reasoning": {"effort": "high"}
  }'

The flat format "reasoning_effort": "high" is also accepted.
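When building request bodies by hand, for example for curl or a plain HTTP client, a small helper can produce either accepted shape. A minimal sketch; with_effort is a hypothetical name. This page does not say what happens if both forms appear in one request, so the helper removes whichever form it is not writing.

```python
import json


def with_effort(body: dict, effort: str, nested: bool = True) -> dict:
    """Return a copy of `body` with reasoning effort set in exactly one form:
    nested {"reasoning": {"effort": ...}} or flat "reasoning_effort"."""
    out = dict(body)
    # Keep other "reasoning" keys, but drop any existing effort in either form
    reasoning = {k: v for k, v in out.pop("reasoning", {}).items() if k != "effort"}
    out.pop("reasoning_effort", None)
    if nested:
        reasoning["effort"] = effort
    else:
        out["reasoning_effort"] = effort
    if reasoning:
        out["reasoning"] = reasoning
    return out


base = {
    "model": "minimax-m25",
    "messages": [{"role": "user", "content": "Prove that there are infinitely many primes"}],
}
print(json.dumps(with_effort(base, "high", nested=False)))
```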

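Disabling reasoning

There are two ways to disable reasoning:

Method	Syntax	How it works
`reasoning.enabled: false`	`"reasoning": {"enabled": false}`	A Venice-level switch that prevents reasoning parameters from being sent to the provider. Recommended.
`reasoning.effort: "none"`	`"reasoning": {"effort": "none"}`	Passed through to the provider, which decides how to handle it. Supported only by some models (e.g. GPT-5.x).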

Where disabling is supported, reasoning.enabled: false is the more reliable option:

Model	Can be disabled?
GPT-5.2	Yes
GPT-5.2 Codex, GPT-5.3 Codex	Yes (but `none` effort is not supported)
Qwen 3 235B A22B Thinking, Qwen 3.5 35B A3B	Yes
GLM 5.1 series	Yes
Claude Opus 4.5/4.6/4.6 Fast, Sonnet 4.5/4.6	No (always reasons)
Gemini 3 Pro, 3.1 Pro, 3 Flash	No (always reasons)
DeepSeek R1	No (always reasons)

response = client.chat.completions.create(
    model="openai-gpt-52",
    messages=[{"role": "user", "content": "What's the capital of France?"}],
    extra_body={"reasoning": {"enabled": False}}
)

const response = await client.chat.completions.create({
    model: "openai-gpt-52",
    messages: [{ role: "user", content: "What's the capital of France?" }],
    reasoning: { enabled: false }
});

curl https://api.venice.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai-gpt-52",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
    "reasoning": {"enabled": false}
  }'
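A client can also consult the table of which models can disable reasoning before sending. This sketch transcribes that table using display names (not API model IDs); treating models missing from the table as non-disableable is our own conservative choice, and disable_reasoning is a hypothetical helper that uses the recommended reasoning.enabled: false switch.

```python
# Transcribed from the "can be disabled" table, keyed by display name
CAN_DISABLE = {
    "GPT-5.2": True,
    "GPT-5.2 Codex": True,
    "GPT-5.3 Codex": True,
    "Qwen 3 235B A22B Thinking": True,
    "Qwen 3.5 35B A3B": True,
    "GLM 5.1": True,
    "Claude Opus 4.5": False,
    "Claude Opus 4.6": False,
    "Claude Opus 4.6 Fast": False,
    "Claude Sonnet 4.5": False,
    "Claude Sonnet 4.6": False,
    "Gemini 3 Pro": False,
    "Gemini 3.1 Pro": False,
    "Gemini 3 Flash": False,
    "DeepSeek R1": False,
}


def disable_reasoning(model: str, body: dict) -> dict:
    """Return a copy of `body` with reasoning switched off, or raise if the
    model always reasons (unknown models are treated the same way)."""
    if not CAN_DISABLE.get(model, False):
        raise ValueError(f"{model} cannot have reasoning disabled")
    out = dict(body)
    out.pop("reasoning_effort", None)      # drop any flat effort setting
    out["reasoning"] = {"enabled": False}  # also replaces any nested effort
    return out


print(disable_reasoning("GPT-5.2", {"model": "openai-gpt-52", "reasoning_effort": "high"}))
# {'model': 'openai-gpt-52', 'reasoning': {'enabled': False}}
```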

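Token limits

Reasoning models generate visible answer tokens (in content) and reasoning tokens (in reasoning_content). Both count toward your token budget.

Setting a token cap

Use max_completion_tokens to cap the total number of tokens the model generates, including reasoning: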

{
  "model": "deepseek-v4-flash",
  "messages": [...],
  "max_completion_tokens": 500
}

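max_tokens is also accepted and behaves the same way. If both are set, max_completion_tokens takes precedence. To get more visible output, raise the cap, lower reasoning_effort, or disable reasoning.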
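The precedence rule can be written down directly. A minimal sketch with a hypothetical effective_cap helper; it only resolves which request field applies and does not account for any per-model ceiling enforced by the server.

```python
from typing import Optional


def effective_cap(body: dict) -> Optional[int]:
    """Return the cap a request sets for itself: max_completion_tokens wins
    when both are present; None means the request sets no cap."""
    if body.get("max_completion_tokens") is not None:
        return body["max_completion_tokens"]
    return body.get("max_tokens")


print(effective_cap({"max_tokens": 1000, "max_completion_tokens": 500}))  # 500
```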

Reading the breakdown

The usage object shows how your budget was spent:

"usage": {
  "completion_tokens": 501,
  "completion_tokens_details": { "reasoning_tokens": 169 },
  "prompt_tokens": 13,
  "total_tokens": 514
}

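In this example, 169 tokens went to reasoning and 332 (501 - 169) to the visible answer. When the cap is reached, finish_reason is length. Each model's cap is available from the /v1/models endpoint in the maxCompletionTokens field.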
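The subtraction above generalizes: visible tokens are completion_tokens minus reasoning_tokens. A minimal sketch over the usage object shown above; token_breakdown is hypothetical, treating a missing completion_tokens_details as zero reasoning tokens is an assumption, and the finish_reason passed in the example is made up for illustration.

```python
def token_breakdown(usage: dict, finish_reason: str) -> dict:
    """Split completion tokens into reasoning vs. visible, and report whether
    generation stopped at the token cap."""
    completion = usage["completion_tokens"]
    details = usage.get("completion_tokens_details") or {}
    reasoning = details.get("reasoning_tokens", 0)
    return {
        "reasoning": reasoning,
        "visible": completion - reasoning,
        "hit_cap": finish_reason == "length",
    }


usage = {
    "completion_tokens": 501,
    "completion_tokens_details": {"reasoning_tokens": 169},
    "prompt_tokens": 13,
    "total_tokens": 514,
}
# "length" is a hypothetical finish_reason for this illustration
print(token_breakdown(usage, "length"))  # {'reasoning': 169, 'visible': 332, 'hit_cap': True}
```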

Non-reasoning models

On non-reasoning models, max_tokens and max_completion_tokens behave identically and limit the visible output directly.

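Capability discovery

Check what a model supports via the /v1/models endpoint:

Field	Meaning
`supportsReasoning`	The model is capable of reasoning (chain-of-thought)
`supportsReasoningEffort`	The model accepts the `reasoning_effort` / `reasoning.effort` parameter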
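A sketch of capability discovery using only the Python standard library. It assumes /v1/models returns an OpenAI-style {"data": [...]} envelope and accepts the same Bearer header as the chat examples; since this page does not say where supportsReasoningEffort sits inside each model entry, find_key searches each entry recursively rather than assuming a path. All function names and the sample entries are hypothetical.

```python
import json
import os
import urllib.request
from typing import Any, List

MODELS_URL = "https://api.venice.ai/api/v1/models"


def fetch_models() -> List[dict]:
    """GET /v1/models and return the entries under "data" (envelope assumed)."""
    req = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {os.environ['VENICE_API_KEY']}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]


def find_key(obj: Any, key: str) -> Any:
    """Depth-first search for `key` anywhere in a nested JSON value.
    Returns None when the key is absent."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def effort_capable(models: List[dict]) -> List[str]:
    """IDs of entries that report supportsReasoningEffort == True."""
    return [m.get("id") for m in models if find_key(m, "supportsReasoningEffort") is True]


# Hypothetical entries; real ones would come from fetch_models()
sample = [
    {"id": "model-a", "spec": {"capabilities": {"supportsReasoning": True, "supportsReasoningEffort": True}}},
    {"id": "model-b", "spec": {"capabilities": {"supportsReasoning": True, "supportsReasoningEffort": False}}},
]
print(effort_capable(sample))  # ['model-a']
```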

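Best practices

Default to medium for general use
Use high, or xhigh where supported, for complex tasks (math, code, analysis)
Use low for latency-sensitive applications
Disable reasoning with reasoning.enabled: false, or set effort to none on models that accept it
When in doubt, use low, medium, or high; these are the most widely supported values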
