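Seedance 2.0 is a flagship multimodal video model. Venice exposes it as a family of three variants for text-driven, image-driven, and reference-driven video generation. The reference-to-video variant is unusually capable: a single endpoint and a single model ID handle four distinct workflows (Reference, Edit, Extend, Stitch), and the workflow is inferred from the shape of your prompt.
This guide covers the variants, the four workflows and their canonical prompts, multimodal input limits, pricing, and complete curl examples.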
| Model ID | Variant | Output resolution | Notes |
|---|---|---|---|
| seedance-2-0-text-to-video | T2V | 480p / 720p / 1080p | Text prompt only |
| seedance-2-0-image-to-video | I2V | 480p / 720p / 1080p | Grounded on a first-frame image (and an optional last-frame image) |
| seedance-2-0-reference-to-video | R2V | 480p / 720p / 1080p | Up to 9 reference images + 3 reference videos + 3 reference audio donors. Drives Reference / Edit / Extend / Stitch |
| seedance-2-0-fast-text-to-video | Fast T2V | 480p / 720p | Faster, lower-fidelity tier |
| seedance-2-0-fast-image-to-video | Fast I2V | 480p / 720p | Faster, lower-fidelity tier |
| seedance-2-0-fast-reference-to-video | Fast R2V | 480p / 720p | Faster, lower-fidelity tier; same set of workflows |
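The table implies that Fast models do not offer 1080p. A minimal Python sketch of a client-side lookup follows; the helper name and the "-fast-" substring test are assumptions drawn from the six IDs listed above, not part of the Venice API.

```python
# Resolutions per tier, copied from the model table.
STANDARD_RESOLUTIONS = ("480p", "720p", "1080p")
FAST_RESOLUTIONS = ("480p", "720p")


def supported_resolutions(model_id: str) -> tuple[str, ...]:
    """Return the output resolutions listed for a Seedance 2.0 model ID.

    Assumption: Fast-tier IDs are recognised by the "-fast-" substring,
    which matches the six IDs in the table but is not an official rule.
    """
    if not model_id.startswith("seedance-2-0-"):
        raise ValueError(f"not a Seedance 2.0 model ID: {model_id}")
    return FAST_RESOLUTIONS if "-fast-" in model_id else STANDARD_RESOLUTIONS
```

Checking the requested resolution against this tuple before submitting avoids a round trip for a request the table already rules out.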
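All variants are asynchronous. Submit via POST /api/v1/video/queue, then poll POST /api/v1/video/retrieve until the response body is video/mp4. For the general queue flow, see Video Generation.
The "one model, four workflows" model
The reference-to-video variant (seedance-2-0-reference-to-video and its Fast sibling) is a single underlying model serving four different tasks. The model infers the task from the prompt prefix and the shape of your inputs. There is no task or workflow field: the prompt syntax is the routing.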
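| Workflow | What it does | Prompt prefix | Inputs |
|---|---|---|---|
| Reference | Generates a new video using uploaded reference files as donors for subject / motion / style / audio | Refer to ... in <Image\|Video\|Audio N> to generate ... | Text + ≥1 image or video reference (0-9 images, 0-3 videos), optionally up to 3 audio donors |
| Edit | Modifies a single input video while preserving everything else | Strictly edit <Video 1>, changing its ... | 1 input video + text (image grounding optional) |
| Extend | Extends a single clip forward or backward | Extend <Video 1>, generate ... | 1 input video + text |
| Stitch | Joins 2-3 clips with auto-generated transitions | <Video 1> + <transition description> + followed by <Video 2> + ... | 2-3 input videos + text |
The prompt syntax is canonical and case-sensitive: angle brackets, a capitalized first letter, and a single space before the number, as in <Video 1>, <Image 1>, <Audio 1>.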
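Because a miscased or mis-spaced tag can silently change routing, it may help to lint prompts before submitting. The sketch below is a hypothetical client-side check, not something the API provides. It flags bracketed tags that mention Image, Video, Audio, or Subject but do not match the canonical form. It does not check the index against the input limits.

```python
import re

# Canonical form: <Kind N> with a capitalised kind, one space, and a positive
# integer. "Subject" is included because the Reference patterns use it.
CANONICAL_TAG = re.compile(r"<(Image|Video|Audio|Subject) [1-9][0-9]*>")
# Anything in angle brackets that starts with one of the kinds, in any casing.
LOOSE_TAG = re.compile(r"<\s*(?:image|video|audio|subject)[^<>]*>", re.IGNORECASE)


def malformed_tags(prompt: str) -> list[str]:
    """Return bracketed tags that look like references but are not canonical."""
    return [
        match.group(0)
        for match in LOOSE_TAG.finditer(prompt)
        if not CANONICAL_TAG.fullmatch(match.group(0))
    ]
```

An empty list means every tag the check recognised is in canonical form; anything returned is worth fixing before the request is queued.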
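Workflow patterns
Reference workflow
Use uploaded reference files as donors (subject, scene, motion, style, vocal timbre) to generate an entirely new video.
Canonical prompt patterns: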
Refer to <Subject N> in <Image N> to generate ...
Refer to the [action | camera scene | style | sound effect] in <Video N> to generate ...
Refer to the [tone | timbre] in <Audio N> to generate ...
Examples:
Refer to <Subject 1> in <Image 1> to generate a 5-second clip of the same character riding a horse through snow.
Refer to the camera scene in <Video 1> to generate a similar establishing shot of a futuristic city at dawn.
Refer to <Subject 1> in <Image 1> and use the timbre in <Audio 1> for the narrator describing the scene.
(Audio donors must be paired with at least one image or video reference; audio-only requests are rejected.)
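Edit workflow
Modify a single input video. Anything not explicitly named in the prompt is preserved. Use this when you want a localized change (a subject swap, a weather or color change, adding or removing an element) rather than an entirely new video.
Canonical prompt pattern: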
Strictly edit <Video 1>, changing its [original feature] to [new feature] ...
Sub-patterns for finer control:
Add Elements:
At [timestamp / timing] and [spatial location] of <Video 1>, add [description of intended element].
Remove Elements:
Remove [element to be deleted] from <Video 1>, keeping the rest of the video content unchanged.
Modify Elements:
Replace [description of element to be changed] in <Video 1> with [description of intended element].
Examples:
Strictly edit <Video 1>, changing its weather from sunny to a heavy rainstorm.
Add snacks such as fried chicken and pizza to the countertop in <Video 1>.
Remove the red car from <Video 1>, keeping the rest of the video content unchanged.
Replace the perfume featured in <Video 1> with the face cream from <Image 1>, with all original motions and camera work preserved.
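The last example combines Edit with an image reference. This is fully valid: the model uses <Image 1> as the visual donor for the replacement.
Extend workflow
Continue a single clip forward or backward in time. By default, Seedance returns only the new content, not the original input joined to the extension. This is by design, for transition continuity. If you want the input clip kept together with the extension, say so explicitly: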
Extend <Video 1>, generate [description of extended content]
Extend <Video 1> backward, [description of extended content]
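Extend <Video 1>, start with <Video 1>, then [description of extended content] ← keeps the input at the start
Extend <Video 1> backward, [description], and then end with <Video 1> ← keeps the input at the end
Transition handling: the model automatically extracts transition frames for seamless blending, and the original footage of the input video is not regenerated.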
Examples:
Extend <Video 1>, generate a dramatic chase scene through narrow alleys at dusk.
Extend <Video 1> backward, the same character walking toward the camera before the original shot begins.
Extend <Video 1>, start with <Video 1>, then the camera pulls back to reveal a vast landscape.
Stitch workflow (Track Completion)
Join 2-3 input clips with AI-generated transitions. The total combined input duration must be ≤ 15 s.
Canonical prompt pattern:
<Video 1> + [transition description] + followed by <Video 2> [+ [transition description] + followed by <Video 3>]
Examples:
<Video 1> + a smooth seamless cut + followed by <Video 2>
<Video 1>. The moment a leaf falls to the ground, it sets off a special effect of golden particles. A gust of wind blows by, leading into <Video 2>.
<Video 1> + a wisp of smoke transforms into a flock of birds + followed by <Video 2> + a slow dolly-in + followed by <Video 3>
The model automatically trims the joined segments at each junction to preserve continuity.
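The canonical Stitch pattern is regular enough to assemble programmatically. The helper below is a hypothetical sketch: it takes one transition description per junction and emits the "+ ... + followed by" form from this section.

```python
def build_stitch_prompt(transitions: list[str]) -> str:
    """Join 2-3 clips using the '+ ... + followed by' pattern.

    transitions[i] describes the junction between <Video i+1> and <Video i+2>,
    so one transition stitches two clips and two transitions stitch three.
    """
    if len(transitions) not in (1, 2):
        raise ValueError("Stitch takes 2-3 clips, i.e. 1 or 2 transitions")
    parts = ["<Video 1>"]
    for index, transition in enumerate(transitions, start=2):
        text = transition.strip()
        if not text:
            raise ValueError("transition descriptions must not be empty")
        parts.append(f"{text} + followed by <Video {index}>")
    return " + ".join(parts)
```

Called with the two transitions "a wisp of smoke transforms into a flock of birds" and "a slow dolly-in", it reproduces the three-clip example above. The order of entries in reference_video_urls is assumed to determine which clip is <Video 1>, <Video 2>, and <Video 3>.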
General prompt formula
Across all four workflows, the recommended composition formula is:
Subject + Motion + Environment (Optional)
+ Camera Movement / Cut (Optional)
+ Aesthetic Description (Optional)
+ Audio (Optional)
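- Subject + Motion: the logical foundation, defining who performs what action
- Environment + Aesthetics: spatial context, lighting, visual style
- Camera: an explicit shot type or movement
- Audio: ambient sound effects or vocal direction for immersive output
Layering this on top of a workflow prefix (for example, Strictly edit <Video 1>, changing its <subject + motion + environment + ...>) produces the highest-quality output.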
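Multimodal input limits
The values below are what the Venice API accepts. Requests outside these ranges are rejected with a 400 at the schema layer, before they reach inference.
Image inputs
| Constraint | Value |
|---|---|
| Input method | URL (http://, https://) or Base64 data URL (data:image/...) |
| Formats | .jpeg, .png, .webp, .bmp, .tiff, .gif, .heic, .heif |
| Aspect ratio (W / H) | Open interval (0.4, 2.5) |
| Shortest side | ≥ 300 px |
| Image count: I2V first frame | 1 |
| Image count: I2V first + last frame | 2 |
| Image count: R2V (V2 / Fast) | 1 – 9 |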
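The two dimensional limits are easy to check locally. This is a minimal sketch assuming you already know the pixel width and height (how you read them is left to your image library). The function name is hypothetical.

```python
def image_dimension_problems(width: int, height: int) -> list[str]:
    """Check pixel dimensions against the image limits in the table."""
    if width <= 0 or height <= 0:
        return ["width and height must be positive"]
    problems = []
    ratio = width / height
    # Open interval: exactly 0.4 or 2.5 is outside the accepted range.
    if not 0.4 < ratio < 2.5:
        problems.append(f"aspect ratio {ratio:.3f} is outside (0.4, 2.5)")
    if min(width, height) < 300:
        problems.append(f"shortest side {min(width, height)} px is below 300 px")
    return problems
```

The boundary cases are the ones to watch: a 1000 x 400 image has a ratio of exactly 2.5 and fails, because the interval is open.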
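Video inputs
| Constraint | Value |
|---|---|
| Input method | URL (http://, https://) or Base64 data URL (data:video/...) |
| Formats | .mp4, .mov |
| Video codecs | H.264 / AVC, H.265 / HEVC |
| Audio codecs (in container) | AAC, MP3 |
| Per-clip duration | [2, 15] s (inclusive) |
| Maximum clip count | 3 (R2V / Stitch / Extend) |
| Total combined duration | ≤ 15 s across all clips |
| Per-clip size | ≤ 50 MB |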
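The per-clip, count, and combined-duration limits interact: two clips that are each valid can still fail together. A small sketch of a pre-flight check follows, assuming durations are already measured in seconds. The function is illustrative and not part of any SDK.

```python
def clip_duration_problems(durations_s: list[float], max_clips: int = 3) -> list[str]:
    """Check reference clip durations (seconds) against the documented limits."""
    problems = []
    if len(durations_s) > max_clips:
        problems.append(f"{len(durations_s)} clips exceeds the maximum of {max_clips}")
    for index, duration in enumerate(durations_s, start=1):
        # Closed interval: 2 s and 15 s are both accepted.
        if not 2 <= duration <= 15:
            problems.append(f"clip {index} is {duration} s, outside [2, 15] s")
    total = sum(durations_s)
    if total > 15:
        problems.append(f"combined duration {total} s exceeds 15 s")
    return problems
```

For example, clips of 2 s and 15 s each pass the per-clip rule but total 17 s, so the call reports the combined-duration problem only.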
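Audio inputs
| Constraint | Value |
|---|---|
| Input method | URL (http://, https://) or Base64 data URL (data:audio/...) |
| Formats | .wav, .mp3 |
| Per-clip duration | [2, 15] s |
| Maximum clip count | 3 |
| Total combined duration | ≤ 15 s across all clips |
| Per-clip size | ≤ 15 MB |
Reference audio is supported only on the R2V variants. Each entry is forwarded to the model as a role: "reference_audio" content item and addressed in the prompt as <Audio 1>, <Audio 2>, <Audio 3>. The model uses each clip for vocal timbre, sound effects, or background music, depending on how the prompt frames it. The legacy single audio_url field maps to the same content shape and is now equivalent to passing a one-element reference_audio_urls.
reference_audio_urls cannot be the only reference input. The model requires at least one image or video reference in addition to any audio donor. Pair reference_audio_urls with reference_image_urls, reference_video_urls, image_url, or video_url; audio-only submissions are rejected.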
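The pairing rule can be checked on the request body before it is sent. The sketch below uses the field names given in this section. The function itself is hypothetical, and it treats the legacy audio_url field as an audio donor too.

```python
AUDIO_FIELDS = ("reference_audio_urls", "audio_url")
VISUAL_REFERENCE_FIELDS = (
    "reference_image_urls",
    "reference_video_urls",
    "image_url",
    "video_url",
)


def audio_is_paired(body: dict) -> bool:
    """True unless the body has audio donors and no image or video reference."""
    has_audio = any(body.get(field) for field in AUDIO_FIELDS)
    has_visual = any(body.get(field) for field in VISUAL_REFERENCE_FIELDS)
    return has_visual or not has_audio
```

A body with only reference_audio_urls returns False. Adding a single entry to reference_image_urls makes it True.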
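Request size
The queue endpoint accepts JSON bodies up to 35 MB. Inline data URLs for large videos can exceed this, especially for multi-clip Stitch requests, so prefer URLs over inline Base64.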
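Base64 encodes every 3 bytes as 4 characters, so inline files grow by about a third. The sketch below estimates whether a set of files would fit in one request. It reads "35 MB" as 35,000,000 bytes and reserves a rough 2 KB for the rest of the JSON; both figures are assumptions, not documented values.

```python
MAX_BODY_BYTES = 35 * 1000 * 1000  # assumption: "35 MB" read as decimal megabytes


def base64_length(raw_bytes: int) -> int:
    """Length of the padded Base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_inline(raw_sizes: list[int], overhead_bytes: int = 2048) -> bool:
    """Estimate whether files sent as data URLs fit in one request body.

    overhead_bytes is a rough allowance for the prompt, field names, and
    data-URL prefixes; it is a guess, not a documented figure.
    """
    encoded = sum(base64_length(size) for size in raw_sizes)
    return encoded + overhead_bytes <= MAX_BODY_BYTES
```

Under these assumptions, three 10 MB clips encode to roughly 40 MB and do not fit, even though each is well under the 50 MB per-clip limit.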
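Pricing
Before submitting to /video/queue, call POST /api/v1/video/quote to get a quote for a given request shape. The quote endpoint is the only authoritative source; pricing details may change and should not be cached or duplicated on the client.
When the request includes reference videos, also pass reference_video_total_duration (the sum, in seconds, of all reference clip durations) so that the quote matches what /video/queue actually charges: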
curl -X POST https://api.venice.ai/api/v1/video/quote \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"duration": "5s",
"resolution": "1080p",
"aspect_ratio": "16:9",
"reference_video_total_duration": 5
}'
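One way to keep the quote and queue requests aligned is to derive the quote body from the queue body. The sketch below copies only the fields shown in the curl example above. Whether /video/quote accepts or needs other fields is not stated here, so treat the field list as an assumption.

```python
QUOTE_FIELDS = ("model", "duration", "resolution", "aspect_ratio")


def build_quote_body(queue_body: dict, clip_durations_s: list[float]) -> dict:
    """Derive a /video/quote body from the body planned for /video/queue.

    clip_durations_s holds the length in seconds of each reference video;
    measuring them is left to the caller.
    """
    quote = {key: queue_body[key] for key in QUOTE_FIELDS if key in queue_body}
    if queue_body.get("reference_video_urls"):
        if len(clip_durations_s) != len(queue_body["reference_video_urls"]):
            raise ValueError("need one duration per reference video")
        quote["reference_video_total_duration"] = sum(clip_durations_s)
    return quote
```

Requiring one duration per reference URL makes it harder to forget a clip when summing, which is the usual cause of a quote that undershoots the charge.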
Complete examples
All examples assume VENICE_API_KEY is set in the environment.
Text-to-video
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-text-to-video",
"prompt": "A golden retriever frolicking through a sunlit meadow at sunset, slow camera dolly-in, shallow depth of field, warm cinematic lighting.",
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Image-to-video (first frame)
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-image-to-video",
"prompt": "The lighthouse keeper turns toward the storm, lantern raised, waves crashing against the rocks.",
"image_url": "https://example.com/lighthouse.jpg",
"duration": "5s",
"resolution": "720p"
}'
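seedance-2-0-image-to-video (and its Fast variant) does not accept aspect_ratio: the output aspect ratio is derived automatically from the dimensions of the input image. Passing the field returns a 400 error with the message "This model does not support aspect_ratio". If you need explicit aspect-ratio control, use the T2V or R2V variants.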
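If one code path builds request bodies for several variants, it can strip the field for I2V models rather than fail at the API. This is a hypothetical helper. It identifies I2V models by the "image-to-video" suffix, which matches the IDs in this guide but is an assumption, not an API guarantee.

```python
def drop_unsupported_aspect_ratio(body: dict) -> dict:
    """Return a copy of the request body without aspect_ratio for I2V models.

    Assumption: I2V model IDs are recognised by the "image-to-video" suffix,
    which covers both the standard and Fast IDs listed in this guide.
    """
    if body.get("model", "").endswith("image-to-video"):
        return {key: value for key, value in body.items() if key != "aspect_ratio"}
    return dict(body)
```

The function returns a copy in both branches, so the caller's original body is never mutated.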
Reference workflow: subject donor
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Refer to <Subject 1> in <Image 1> to generate a 5-second clip of the same character walking through a neon-lit Tokyo street at night.",
"reference_image_urls": ["https://example.com/character.png"],
"duration": "5s",
"aspect_ratio": "9:16",
"resolution": "1080p"
}'
Reference workflow: subject + audio donor
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Refer to <Subject 1> in <Image 1> to generate a 5-second clip of the same character walking through a neon-lit Tokyo street at night. Refer to the timbre in <Audio 1> for a soft female voiceover describing the scene.",
"reference_image_urls": ["https://example.com/character.png"],
"reference_audio_urls": ["https://example.com/voice-sample.mp3"],
"duration": "5s",
"aspect_ratio": "9:16",
"resolution": "1080p"
}'
Edit workflow
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Strictly edit <Video 1>, changing its weather from sunny to a heavy rainstorm, with all original motions and camera work preserved.",
"reference_video_urls": ["https://example.com/sunny-scene.mp4"],
"reference_video_total_duration": 5,
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Edit workflow with image grounding
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Replace the perfume featured in <Video 1> with the face cream from <Image 1>, with all original motions and camera work preserved.",
"reference_video_urls": ["https://example.com/perfume-ad.mp4"],
"reference_image_urls": ["https://example.com/face-cream.png"],
"reference_video_total_duration": 4,
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Extend forward
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Extend <Video 1>, generate a dramatic chase scene through narrow alleys at dusk, with neon signs flickering and rain on the pavement.",
"reference_video_urls": ["https://example.com/alley-intro.mp4"],
"reference_video_total_duration": 4,
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Stitch (3 clips)
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "<Video 1> + a wisp of smoke transforms into a flock of birds + followed by <Video 2> + a slow dolly-in + followed by <Video 3>",
"reference_video_urls": [
"https://example.com/clip-1.mp4",
"https://example.com/clip-2.mp4",
"https://example.com/clip-3.mp4"
],
"reference_video_total_duration": 12,
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Polling for completion
After each queue submission, save the returned queue_id and poll /video/retrieve until the response body is video/mp4:
curl -X POST https://api.venice.ai/api/v1/video/retrieve \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"queue_id": "123e4567-e89b-12d3-a456-426614174000"
}' \
-o output.mp4
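Until the job completes, the response is JSON ({ "status": "queued" | "running" | "failed", ... }); on completion, the response body switches to video/mp4 bytes. For the full polling pattern, see Video Generation.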
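A single curl call with -o writes whatever comes back, including a JSON status body, so a loop that inspects the content type is safer. The sketch below is transport-agnostic. The retrieve argument is a hypothetical callable you supply that POSTs the body to /video/retrieve and returns the content type and raw bytes. The interval and attempt cap are arbitrary defaults, not recommendations from Venice.

```python
import json
import time


def poll_for_video(retrieve, model, queue_id, interval_s=5.0, max_attempts=120,
                   sleep=time.sleep):
    """Poll until the response is video/mp4 and return the video bytes.

    retrieve(body) is a caller-supplied function returning
    (content_type, raw_bytes) for one POST to /video/retrieve.
    """
    body = {"model": model, "queue_id": queue_id}
    for _ in range(max_attempts):
        content_type, payload = retrieve(body)
        if content_type.startswith("video/mp4"):
            return payload
        # Any other response is treated as the JSON status document.
        status = json.loads(payload).get("status")
        if status == "failed":
            raise RuntimeError(f"job {queue_id} failed")
        sleep(interval_s)
    raise TimeoutError(f"job {queue_id} not finished after {max_attempts} polls")
```

Write the returned bytes to disk only after the function returns, so a status document never ends up saved as output.mp4.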
Troubleshooting
At least one reference is required for this model
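Reference-to-video submissions must include at least one of reference_image_urls, reference_video_urls, image_references, or video_references. Pure text generation is not a valid R2V workflow; use seedance-2-0-text-to-video instead. reference_audio_urls alone is not enough (see the audio section above).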
reference_video_urls must have at most 3 videos
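The model caps reference videos at 3. If you need more clips, run one Stitch first (3 → 1), then use the output as a reference for the next request.
Per clip must be 2–15s / aggregate > 15s
Each clip must be within [2, 15] seconds inclusive, and the sum of all reference videos is also capped at 15 seconds. Trim clips on the client before submitting.
Prompt routed to the wrong workflow
The workflow is inferred from the prompt syntax. Common misroutes:
- Wanted Extend but wrote Refer to ... → the model treats your video as a donor, not as a canvas to continue
- Wanted Stitch but wrote Refer to ... → the model picks one clip as the donor and ignores the rest
- Wanted Edit but wrote Generate a video based on <Video 1> → ambiguous; the model may default to Reference
Use the canonical prefixes exactly as written: Strictly edit <Video 1>, ..., Extend <Video 1>, ..., <Video 1> + ... + followed by <Video 2>.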
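A quick pre-flight check can catch a misroute before it costs a generation. The sketch below maps a prompt to the workflow its prefix suggests, using only the primary canonical prefixes from this guide. It is a hypothetical client-side heuristic and does not reproduce the model's actual routing. The Edit sub-patterns (Add, Remove, Replace) are not covered and return None.

```python
from typing import Optional


def expected_workflow(prompt: str) -> Optional[str]:
    """Guess the workflow a prompt's canonical prefix points to, if any."""
    text = prompt.strip()
    if text.startswith("Strictly edit <Video 1>"):
        return "edit"
    if text.startswith("Extend <Video 1>"):
        return "extend"
    # Stitch prompts open with the first clip and mention a second one.
    if text.startswith("<Video 1>") and "<Video 2>" in text:
        return "stitch"
    if text.startswith("Refer to "):
        return "reference"
    return None
```

Compare the result with the workflow you intend. A None for a prompt such as "Generate a video based on <Video 1>" signals that the prefix is ambiguous.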
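Quote and queue amounts do not match
If you include reference videos but do not pass reference_video_total_duration to /video/quote, the quoted and charged amounts may differ. Whenever reference videos are present, always pass reference_video_total_duration (the sum, in seconds, of all reference clip durations).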