Seedance 2.0 is a flagship multimodal video model exposed on Venice as a family of three variants for text-, image-, and reference-driven video generation. The reference-to-video variant is unusually powerful: a single endpoint and a single model ID handle four distinct workflows (Reference, Edit, Extend, Stitch) — the workflow is inferred from the shape of your prompt.
This guide walks through the variants, the four workflows with their canonical prompts, the multimodal input limits, pricing, and complete curl examples.
Variants
| Model ID | Variant | Output resolutions | Notes |
|---|---|---|---|
| seedance-2-0-text-to-video | T2V | 480p / 720p / 1080p | Text prompt only |
| seedance-2-0-image-to-video | I2V | 480p / 720p / 1080p | First-frame (and optionally last-frame) image grounding |
| seedance-2-0-reference-to-video | R2V | 480p / 720p / 1080p | Up to 9 reference images + 3 reference videos + 3 reference audio donors. Powers Reference / Edit / Extend / Stitch |
| seedance-2-0-fast-text-to-video | Fast T2V | 480p / 720p | Faster, lower-fidelity tier |
| seedance-2-0-fast-image-to-video | Fast I2V | 480p / 720p | Faster, lower-fidelity tier |
| seedance-2-0-fast-reference-to-video | Fast R2V | 480p / 720p | Faster, lower-fidelity tier; same workflow set |
All variants are async. Submit via POST /api/v1/video/queue, then poll POST /api/v1/video/retrieve until the response body is video/mp4. See Video Generation for the general queue flow.
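A minimal end-to-end sketch of that flow, assuming jq is installed and that the queue response carries the job ID in a queue_id field (the polling section below refers to "the returned queue_id"):

QUEUE_ID=$(curl -s -X POST https://api.venice.ai/api/v1/video/queue \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "seedance-2-0-text-to-video", "prompt": "A paper boat drifting down a rain gutter.", "duration": "5s", "resolution": "720p"}' \
  | jq -r '.queue_id')
# Poll until the body switches from JSON status to video/mp4 bytes
# (see "Polling for completion" below for a full loop):
curl -s -X POST https://api.venice.ai/api/v1/video/retrieve \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"seedance-2-0-text-to-video\", \"queue_id\": \"$QUEUE_ID\"}" \
  -o output.mp4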
The “one model, four workflows” design
The reference-to-video variant (seedance-2-0-reference-to-video and its Fast sibling) is the same underlying model serving four different tasks. The model infers the task from the prompt prefix and the shape of your inputs. There is no task or workflow field — the prompt syntax is the routing.
| Workflow | What it does | Prompt prefix | Inputs |
|---|---|---|---|
| Reference | Generate a new video using uploaded reference files as donors for subject / motion / style / audio | Refer to ... in <Image \| Video \| Audio N> to generate ... | Text + ≥1 image or video reference (0-9 images, 0-3 videos), plus optionally up to 3 audio donors |
| Edit | Modify a single input video while preserving the rest | Strictly edit <Video 1>, changing its ... | 1 input video + text (images optional grounding) |
| Extend | Forward / backward extension of one clip | Extend <Video 1>, generate ... | 1 input video + text |
| Stitch | Stitch 2-3 clips with auto-generated transitions | <Video 1> + <transition description> + followed by <Video 2> + ... | 2-3 input videos + text |
The prompt syntax is canonical and case-sensitive: angle brackets, capital first letter, single space before the number — <Video 1>, <Image 1>, <Audio 1>.
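Because the syntax is exact, a cheap client-side lint can catch malformed tokens before a generation is spent. A sketch using grep; the accepted token set (Subject, Image, Video, Audio) is taken from the patterns in this guide:

# Flag any angle-bracket token that deviates from the canonical form.
PROMPT='Refer to <Subject 1> in <image 1> to generate ...'
echo "$PROMPT" | grep -oE '<[A-Za-z]+ *[0-9]+>' \
  | grep -vE '^<(Subject|Image|Video|Audio) [1-9]>$'
# Any output is a malformed token (here: <image 1>, lowercase i).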
Workflow patterns
Reference workflow
Use the uploaded reference files as donors — subject, scene, motion, style, vocal timbre — to generate a brand-new video.
Canonical prompt patterns:
Refer to <Subject N> in <Image N> to generate ...
Refer to the [action | camera scene | style | sound effect] in <Video N> to generate ...
Refer to the [tone | timbre] in <Audio N> to generate ...
Examples:
Refer to <Subject 1> in <Image 1> to generate a 5-second clip of the same character riding a horse through snow.
Refer to the camera scene in <Video 1> to generate a similar establishing shot of a futuristic city at dawn.
Refer to <Subject 1> in <Image 1> and use the timbre in <Audio 1> for the narrator describing the scene. (audio donors must be paired with at least one image or video reference — audio alone is rejected)
Edit workflow
Modify a single input video. Anything not explicitly named in the prompt is preserved. Use this when you want a localized change (subject swap, weather/color change, element add/remove) rather than a wholly new video.
Canonical prompt pattern:
Strictly edit <Video 1>, changing its [original feature] to [new feature] ...
Sub-patterns for finer control:
Add Elements:
At [timestamp / timing] and [spatial location] of <Video 1>, add [description of intended element].
Remove Elements:
Remove [element to be deleted] from <Video 1>, keeping the rest of the video content unchanged.
Modify Elements:
Replace [description of element to be changed] in <Video 1> with [description of intended element].
Examples:
Strictly edit <Video 1>, changing its weather from sunny to a heavy rainstorm.
Add snacks such as fried chicken and pizza to the countertop in <Video 1>.
Remove the red car from <Video 1>, keeping the rest of the video content unchanged.
Replace the perfume featured in <Video 1> with the face cream from <Image 1>, with all original motions and camera work preserved.
The last example combines Edit with an image reference. This is perfectly valid; the model uses <Image 1> as a visual donor for the replacement.
Extend workflow
Continue a single clip forward or backward in time. By default, Seedance returns only the new content, not the original input concatenated with the extension. This is by design for transition continuity; if you want the input clip preserved alongside the extension, say so explicitly:
Extend <Video 1>, generate [description of extended content]
Extend <Video 1> backward, [description of extended content]
Extend <Video 1>, start with <Video 1>, then [description of extended content] ← preserves input at start
Extend <Video 1> backward, [description], and then end with <Video 1> ← preserves input at end
Transition handling: the model automatically extracts the transition frames for seamless blending, and the original segments of the input video are not re-generated.
Examples:
Extend <Video 1>, generate a dramatic chase scene through narrow alleys at dusk.
Extend <Video 1> backward, the same character walking toward the camera before the original shot begins.
Extend <Video 1>, start with <Video 1>, then the camera pulls back to reveal a vast landscape.
Stitch workflow (Track Completion)
Connect 2-3 input clips with AI-generated transitions. Total combined input duration must be ≤ 15 s.
Canonical prompt pattern:
<Video 1> + [transition description] + followed by <Video 2> [+ [transition description] + followed by <Video 3>]
Examples:
<Video 1> + a smooth seamless cut + followed by <Video 2>
<Video 1>. The moment a leaf falls to the ground, it sets off a special effect of golden particles. A gust of wind blows by, leading into <Video 2>.
<Video 1> + a wisp of smoke transforms into a flock of birds + followed by <Video 2> + a slow dolly-in + followed by <Video 3>
The model auto-trims connecting segments at the join points for continuity.
Across all four workflows, the recommended authoring formula is:
Subject + Motion + Environment (Optional)
+ Camera Movement / Cut (Optional)
+ Aesthetic Description (Optional)
+ Audio (Optional)
- Subject + Motion: the logical foundation — define “Who” is performing “What action”
- Environment + Aesthetics: spatial background, lighting, visual style
- Camera: explicit shot type or movement
- Audio: ambient sound effects or vocal direction for immersive output
Layering this on top of a workflow prefix (e.g., Strictly edit <Video 1>, changing its <subject + motion + environment + ...>) produces the highest-quality outputs.
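As an illustration (not a canonical example from the model card), layering the formula onto the Edit prefix might look like:

Strictly edit <Video 1>, changing its weather from sunny to a heavy rainstorm. Rain-slicked cobblestone street at dusk, slow dolly-in with shallow depth of field, warm practical lighting, with distant thunder and soft rain ambience.

Here the edit clause supplies subject + motion, the street at dusk is the environment, the dolly-in is the camera direction, the lighting terms are the aesthetic description, and the thunder is the audio layer.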
Multimodal input limits
The values below are what the Venice API accepts. Requests outside these ranges are rejected at the schema layer with a 400 before reaching inference.
Images
| Constraint | Value |
|---|---|
| Input methods | URL (http://, https://) or Base64 data URL (data:image/...) |
| Formats | .jpeg, .png, .webp, .bmp, .tiff, .gif, .heic, .heif |
| Aspect ratio (W / H) | exclusive (0.4, 2.5) |
| Minimum side | ≥ 300 px |
| Image count: I2V first-frame | 1 |
| Image count: I2V first + last frame | 2 |
| Image count: R2V (standard / Fast) | 1 – 9 |
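A client-side pre-flight against these limits, sketched with ImageMagick's identify (assumed installed; the file name is illustrative):

read W H < <(identify -format '%w %h' character.png)
awk -v w="$W" -v h="$H" 'BEGIN {
  r = w / h
  if (r <= 0.4 || r >= 2.5) print "aspect ratio " r " outside (0.4, 2.5)"
  if (w < 300 || h < 300)   print "minimum side below 300 px"
}'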
Videos
| Constraint | Value |
|---|---|
| Input methods | URL (http://, https://) or Base64 data URL (data:video/...) |
| Formats | .mp4, .mov |
| Video codecs | H.264 / AVC, H.265 / HEVC |
| Audio codecs (in container) | AAC, MP3 |
| Duration per clip | [2, 15] s (inclusive) |
| Max clip count | 3 (R2V / Stitch / Extend) |
| Total combined duration | ≤ 15 s across all clips |
| Per-clip size | ≤ 50 MB |
Audio
| Constraint | Value |
|---|---|
| Input methods | URL (http://, https://) or Base64 data URL (data:audio/...) |
| Formats | .wav, .mp3 |
| Duration per clip | [2, 15] s |
| Max clip count | 3 |
| Total combined duration | ≤ 15 s across all clips |
| Per-clip size | ≤ 15 MB |
Reference audio is supported on the R2V variants only. Each entry is forwarded to the model as a role: "reference_audio" content item that the prompt addresses as <Audio 1>, <Audio 2>, <Audio 3> — the model uses each clip for vocal timbre, sound effects, or background music depending on how the prompt frames it. The legacy singular audio_url field maps to the same content shape and is now equivalent to passing a one-element reference_audio_urls.
reference_audio_urls cannot be the only reference input. The model requires at least one image or video reference alongside any audio donor. Pair reference_audio_urls with reference_image_urls, reference_video_urls, image_url, or video_url — audio-only submissions are rejected.
Request size
The queue endpoint accepts JSON bodies up to 35 MB. Inline data URLs for large videos can push past this — for multi-clip Stitch in particular, prefer URLs over inline base64.
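When inlining is unavoidable, for example a small local reference image with no hosting, a data URL can be built in the shell. A sketch (the file name is illustrative):

B64=$(base64 < character.png | tr -d '\n')
curl -X POST https://api.venice.ai/api/v1/video/queue \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"seedance-2-0-reference-to-video\",
    \"prompt\": \"Refer to <Subject 1> in <Image 1> to generate a 5-second clip of the same character at sunrise.\",
    \"reference_image_urls\": [\"data:image/png;base64,$B64\"],
    \"duration\": \"5s\",
    \"resolution\": \"720p\"
  }'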
Pricing
Call POST /api/v1/video/quote to get a quote for a given request shape before submitting it to /video/queue. The quote endpoint is the only authoritative source; pricing details may change and shouldn’t be cached or duplicated client-side.
When reference video(s) are part of the request, also pass reference_video_total_duration (the sum of all reference clip durations in seconds) so the quote matches what /video/queue will charge:
curl -X POST https://api.venice.ai/api/v1/video/quote \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"duration": "5s",
"resolution": "1080p",
"aspect_ratio": "16:9",
"reference_video_total_duration": 5
}'
Complete examples
All examples assume VENICE_API_KEY is set in the environment.
Text-to-video
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-text-to-video",
"prompt": "A golden retriever frolicking through a sunlit meadow at sunset, slow camera dolly-in, shallow depth of field, warm cinematic lighting.",
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Image-to-video (first frame)
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-image-to-video",
"prompt": "The lighthouse keeper turns toward the storm, lantern raised, waves crashing against the rocks.",
"image_url": "https://example.com/lighthouse.jpg",
"duration": "5s",
"resolution": "720p"
}'
seedance-2-0-image-to-video (and its Fast variant) do not accept aspect_ratio — the output aspect ratio is auto-derived from the input image’s dimensions. Passing the field returns a 400 with “This model does not support aspect_ratio”. Use the T2V or R2V variants if you need explicit aspect-ratio control.
Reference workflow — subject donor
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Refer to <Subject 1> in <Image 1> to generate a 5-second clip of the same character walking through a neon-lit Tokyo street at night.",
"reference_image_urls": ["https://example.com/character.png"],
"duration": "5s",
"aspect_ratio": "9:16",
"resolution": "1080p"
}'
Reference workflow — subject + audio donor
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Refer to <Subject 1> in <Image 1> to generate a 5-second clip of the same character walking through a neon-lit Tokyo street at night. Refer to the timbre in <Audio 1> for a soft female voiceover describing the scene.",
"reference_image_urls": ["https://example.com/character.png"],
"reference_audio_urls": ["https://example.com/voice-sample.mp3"],
"duration": "5s",
"aspect_ratio": "9:16",
"resolution": "1080p"
}'
Edit workflow
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Strictly edit <Video 1>, changing its weather from sunny to a heavy rainstorm, with all original motions and camera work preserved.",
"reference_video_urls": ["https://example.com/sunny-scene.mp4"],
"reference_video_total_duration": 5,
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Edit workflow with image grounding
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Replace the perfume featured in <Video 1> with the face cream from <Image 1>, with all original motions and camera work preserved.",
"reference_video_urls": ["https://example.com/perfume-ad.mp4"],
"reference_image_urls": ["https://example.com/face-cream.png"],
"reference_video_total_duration": 4,
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Extend forward
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Extend <Video 1>, generate a dramatic chase scene through narrow alleys at dusk, with neon signs flickering and rain on the pavement.",
"reference_video_urls": ["https://example.com/alley-intro.mp4"],
"reference_video_total_duration": 4,
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Stitch (3 clips)
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "<Video 1> + a wisp of smoke transforms into a flock of birds + followed by <Video 2> + a slow dolly-in + followed by <Video 3>",
"reference_video_urls": [
"https://example.com/clip-1.mp4",
"https://example.com/clip-2.mp4",
"https://example.com/clip-3.mp4"
],
"reference_video_total_duration": 12,
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Polling for completion
After every queue submission, save the returned queue_id and poll /video/retrieve until the response body is video/mp4:
curl -X POST https://api.venice.ai/api/v1/video/retrieve \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"queue_id": "123e4567-e89b-12d3-a456-426614174000"
}' \
-o output.mp4
The response is JSON ({ "status": "queued" | "running" | "failed", ... }) until the job completes, at which point the response body switches to video/mp4 bytes. See Video Generation for the full polling pattern.
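A minimal polling loop, sketched in shell. It keys off the Content-Type curl reports instead of sniffing the body; the 5-second interval is an arbitrary choice:

QUEUE_ID="123e4567-e89b-12d3-a456-426614174000"
while :; do
  CTYPE=$(curl -s -X POST https://api.venice.ai/api/v1/video/retrieve \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"seedance-2-0-reference-to-video\", \"queue_id\": \"$QUEUE_ID\"}" \
    -o response.bin -w '%{content_type}')
  case "$CTYPE" in
    video/mp4*) mv response.bin output.mp4; break ;;
  esac
  # Still JSON: stop on a failed job, otherwise wait and retry.
  grep -q '"failed"' response.bin && { cat response.bin; break; }
  sleep 5
done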
Troubleshooting
At least one reference is required for this model
Reference-to-video submissions must include at least one of reference_image_urls, reference_video_urls, image_references, or video_references. Pure text-only generation isn’t a valid R2V workflow — use seedance-2-0-text-to-video instead. reference_audio_urls alone is not sufficient (see the Audio section above).
reference_video_urls must have at most 3 videos
The model caps reference videos at 3. If you need more clips, run a Stitch first (3 → 1), then use the output as a reference for a follow-up.
Per clip must be 2–15s / aggregate > 15s
Per-clip duration is [2, 15] seconds inclusive; the sum across all reference videos is also capped at 15 seconds. Trim clips client-side before submission.
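A client-side check and trim, sketched with FFmpeg's ffprobe and ffmpeg (assumed installed; file names are illustrative):

# Print each clip's duration in seconds:
for f in clip-1.mp4 clip-2.mp4 clip-3.mp4; do
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$f"
done
# Trim a too-long clip to 15 s (stream copy is fast but cuts on keyframes;
# drop "-c copy" for a frame-accurate re-encode):
ffmpeg -i long-clip.mp4 -t 15 -c copy trimmed.mp4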
Prompt routes to the wrong workflow
Workflow is inferred from prompt syntax. Common misroutings:
- Wanting to Extend but writing Refer to ... → the model treats your video as a donor, not a canvas to continue
- Wanting to Stitch but writing Refer to ... → the model picks one clip as the donor and ignores the others
- Wanting to Edit but writing Generate a video based on <Video 1> → ambiguous; the model may default to Reference
Use the canonical prefixes exactly as written: Strictly edit <Video 1>, ..., Extend <Video 1>, ..., <Video 1> + ... + followed by <Video 2>.
Quote doesn’t match the queued amount
If you included a reference video but didn’t pass reference_video_total_duration to /video/quote, the quote and the queued amount may differ. Always pass reference_video_total_duration (sum of all reference clip durations, in seconds) when reference videos are present.
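If you don't already know the clip lengths, they can be summed from the files themselves. A sketch assuming FFmpeg's ffprobe; it rounds up to a whole second, since the examples in this guide pass integers:

TOTAL=$(for f in clip-1.mp4 clip-2.mp4; do
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$f"
done | awk '{s += $1} END {printf "%d\n", s + 0.999}')
echo "reference_video_total_duration: $TOTAL"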