Seedance 2.0 is a flagship multimodal video model exposed on Venice as a family of three variants for text-, image-, and reference-driven video generation. The reference-to-video variant is unusually powerful: a single endpoint and a single model ID handle four distinct workflows (Reference, Edit, Extend, Stitch) — the workflow is inferred from the shape of your prompt.
This guide walks through the variants, the four workflows with their canonical prompts, the multimodal input limits, pricing, and complete curl examples.
Variants
| Model ID | Variant | Output resolutions | Notes |
|---|---|---|---|
| seedance-2-0-text-to-video | T2V | 480p / 720p / 1080p | Text prompt only |
| seedance-2-0-image-to-video | I2V | 480p / 720p / 1080p | First-frame (and optionally last-frame) image grounding |
| seedance-2-0-reference-to-video | R2V | 480p / 720p / 1080p | Up to 9 reference images + 3 reference videos + 3 reference audio donors. Powers Reference / Edit / Extend / Stitch |
| seedance-2-0-fast-text-to-video | Fast T2V | 480p / 720p | Faster, lower-fidelity tier |
| seedance-2-0-fast-image-to-video | Fast I2V | 480p / 720p | Faster, lower-fidelity tier |
| seedance-2-0-fast-reference-to-video | Fast R2V | 480p / 720p | Faster, lower-fidelity tier; same workflow set |
All variants are async. Submit via POST /api/v1/video/queue, then poll POST /api/v1/video/retrieve until the response body is video/mp4. See Video Generation for the general queue flow.
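A minimal end-to-end sketch of that flow, assuming jq is installed and that the queue response carries the job ID in a queue_id field (the polling section below refers to "the returned queue_id"):

QUEUE_ID=$(curl -s -X POST https://api.venice.ai/api/v1/video/queue \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "seedance-2-0-text-to-video", "prompt": "A paper boat drifting down a rain gutter.", "duration": "5s", "resolution": "720p"}' \
  | jq -r '.queue_id')
# Poll until the body switches from JSON status to video/mp4 bytes
# (see "Polling for completion" below for a full loop):
curl -s -X POST https://api.venice.ai/api/v1/video/retrieve \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"seedance-2-0-text-to-video\", \"queue_id\": \"$QUEUE_ID\"}" \
  -o output.mp4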
The “one model, four workflows” design
The reference-to-video variant (seedance-2-0-reference-to-video and its Fast sibling) is the same underlying model serving four different tasks. The model infers the task from the prompt prefix and the shape of your inputs. There is no task or workflow field — the prompt syntax is the routing.
| Workflow | What it does | Prompt prefix | Inputs |
|---|---|---|---|
| Reference | Generate a new video using uploaded reference files as donors for subject / motion / style / audio | Refer to ... in <Image \| Video \| Audio N> to generate ... | Text + ≥1 image or video reference (0-9 images, 0-3 videos), plus optionally up to 3 audio donors |
| Edit | Modify a single input video while preserving the rest | Strictly edit <Video 1>, changing its ... | 1 input video + text (images optional grounding) |
| Extend | Forward / backward extension of one clip | Extend <Video 1>, generate ... | 1 input video + text |
| Stitch | Stitch 2-3 clips with auto-generated transitions | <Video 1> + <transition description> + followed by <Video 2> + ... | 2-3 input videos + text |
The prompt syntax is canonical and case-sensitive: angle brackets, capital first letter, single space before the number — <Video 1>, <Image 1>, <Audio 1>.
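Because the syntax is exact, a cheap client-side lint can catch malformed tokens before a generation is spent. A sketch using grep; the accepted token set (Subject, Image, Video, Audio) is taken from the patterns in this guide:

# Flag any angle-bracket token that deviates from the canonical form.
PROMPT='Refer to <Subject 1> in <image 1> to generate ...'
echo "$PROMPT" | grep -oE '<[A-Za-z]+ *[0-9]+>' \
  | grep -vE '^<(Subject|Image|Video|Audio) [1-9]>$'
# Any output is a malformed token (here: <image 1>, lowercase i).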
Workflow patterns
Reference workflow
Use the uploaded reference files as donors — subject, scene, motion, style, vocal timbre — to generate a brand-new video.
Canonical prompt patterns:
Refer to <Subject N> in <Image N> to generate ...
Refer to the [action | camera scene | style | sound effect] in <Video N> to generate ...
Refer to the [tone | timbre] in <Audio N> to generate ...
Examples:
Refer to <Subject 1> in <Image 1> to generate a 5-second clip of the same character riding a horse through snow.
Refer to the camera scene in <Video 1> to generate a similar establishing shot of a futuristic city at dawn.
Refer to <Subject 1> in <Image 1> and use the timbre in <Audio 1> for the narrator describing the scene. (audio donors must be paired with at least one image or video reference — audio alone is rejected)
Edit workflow
Modify a single input video. Anything not explicitly named in the prompt is preserved. Use this when you want a localized change (subject swap, weather/color change, element add/remove) rather than a wholly new video.
Canonical prompt pattern:
Strictly edit <Video 1>, changing its [original feature] to [new feature] ...
Sub-patterns for finer control:
Add Elements:
At [timestamp / timing] and [spatial location] of <Video 1>, add [description of intended element].
Remove Elements:
Remove [element to be deleted] from <Video 1>, keeping the rest of the video content unchanged.
Modify Elements:
Replace [description of element to be changed] in <Video 1> with [description of intended element].
Examples:
Strictly edit <Video 1>, changing its weather from sunny to a heavy rainstorm.
Add snacks such as fried chicken and pizza to the countertop in <Video 1>.
Remove the red car from <Video 1>, keeping the rest of the video content unchanged.
Replace the perfume featured in <Video 1> with the face cream from <Image 1>, with all original motions and camera work preserved.
The last example combines Edit with an image reference. This is perfectly valid; the model uses <Image 1> as a visual donor for the replacement.
Extend workflow
Continue a single clip forward or backward in time. By default, Seedance returns only the new content, not the original input concatenated with the extension. This is by design for transition continuity; if you want the input clip preserved alongside the extension, say so explicitly:
Extend <Video 1>, generate [description of extended content]
Extend <Video 1> backward, [description of extended content]
Extend <Video 1>, start with <Video 1>, then [description of extended content] ← preserves input at start
Extend <Video 1> backward, [description], and then end with <Video 1> ← preserves input at end
Transition handling: the model automatically extracts the transition frames for seamless blending, and the original segments of the input video are not re-generated.
Examples:
Extend <Video 1>, generate a dramatic chase scene through narrow alleys at dusk.
Extend <Video 1> backward, the same character walking toward the camera before the original shot begins.
Extend <Video 1>, start with <Video 1>, then the camera pulls back to reveal a vast landscape.
Stitch workflow (Track Completion)
Connect 2-3 input clips with AI-generated transitions. Total combined input duration must be ≤ 15 s.
Canonical prompt pattern:
<Video 1> + [transition description] + followed by <Video 2> [+ [transition description] + followed by <Video 3>]
Examples:
<Video 1> + a smooth seamless cut + followed by <Video 2>
<Video 1>. The moment a leaf falls to the ground, it sets off a special effect of golden particles. A gust of wind blows by, leading into <Video 2>.
<Video 1> + a wisp of smoke transforms into a flock of birds + followed by <Video 2> + a slow dolly-in + followed by <Video 3>
The model auto-trims connecting segments at the join points for continuity.
Across all four workflows, the recommended authoring formula is:
Subject + Motion + Environment (Optional)
+ Camera Movement / Cut (Optional)
+ Aesthetic Description (Optional)
+ Audio (Optional)
- Subject + Motion: the logical foundation — define “Who” is performing “What action”
- Environment + Aesthetics: spatial background, lighting, visual style
- Camera: explicit shot type or movement
- Audio: ambient sound effects or vocal direction for immersive output
Layering this on top of a workflow prefix (e.g., Strictly edit <Video 1>, changing its <subject + motion + environment + ...>) produces the highest-quality outputs.
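As an illustration (not a canonical example from the model card), layering the formula onto the Edit prefix might look like:

Strictly edit <Video 1>, changing its weather from sunny to a heavy rainstorm. Rain-slicked cobblestone street at dusk, slow dolly-in with shallow depth of field, warm practical lighting, with distant thunder and soft rain ambience.

Here the edit clause supplies subject + motion, the street at dusk is the environment, the dolly-in is the camera direction, the lighting terms are the aesthetic description, and the thunder is the audio layer.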
Multimodal input limits
The values below are what the Venice API accepts. Requests outside these ranges are rejected at the schema layer with a 400 before reaching inference.
Images
| Constraint | Value |
|---|---|
| Input methods | URL (http://, https://) or Base64 data URL (data:image/...) |
| Formats | .jpeg, .png, .webp, .bmp, .tiff, .gif, .heic, .heif |
| Aspect ratio (W / H) | exclusive (0.4, 2.5) |
| Minimum side | ≥ 300 px |
| Image count: I2V first-frame | 1 |
| Image count: I2V first + last frame | 2 |
| Image count: R2V (standard / Fast) | 1 – 9 |
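A client-side pre-flight against these limits, sketched with ImageMagick's identify (assumed installed; the file name is illustrative):

read W H < <(identify -format '%w %h' character.png)
awk -v w="$W" -v h="$H" 'BEGIN {
  r = w / h
  if (r <= 0.4 || r >= 2.5) print "aspect ratio " r " outside (0.4, 2.5)"
  if (w < 300 || h < 300)   print "minimum side below 300 px"
}'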
Videos
| Constraint | Value |
|---|---|
| Input methods | URL (http://, https://) or Base64 data URL (data:video/...) |
| Formats | .mp4, .mov |
| Video codecs | H.264 / AVC, H.265 / HEVC |
| Audio codecs (in container) | AAC, MP3 |
| Duration per clip | [2, 15] s (inclusive) |
| Max clip count | 3 (R2V / Stitch / Extend) |
| Total combined duration | ≤ 15 s across all clips |
| Per-clip size | ≤ 50 MB |
Audio
| Constraint | Value |
|---|---|
| Input methods | URL (http://, https://) or Base64 data URL (data:audio/...) |
| Formats | .wav, .mp3 |
| Duration per clip | [2, 15] s |
| Max clip count | 3 |
| Total combined duration | ≤ 15 s across all clips |
| Per-clip size | ≤ 15 MB |
Reference audio is supported on the R2V variants only. Each entry is forwarded to the model as a role: "reference_audio" content item that the prompt addresses as <Audio 1>, <Audio 2>, <Audio 3> — the model uses each clip for vocal timbre, sound effects, or background music depending on how the prompt frames it. The legacy singular audio_url field maps to the same content shape and is now equivalent to passing a one-element reference_audio_urls.
reference_audio_urls cannot be the only reference input. The model requires at least one image or video reference alongside any audio donor. Pair reference_audio_urls with reference_image_urls, reference_video_urls, image_url, or video_url — audio-only submissions are rejected.
Request size
The queue endpoint accepts JSON bodies up to 35 MB. Inline data URLs for large videos can push past this — for multi-clip Stitch in particular, prefer URLs over inline base64.
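When inlining is unavoidable, for example a small local reference image with no hosting, a data URL can be built in the shell. A sketch (the file name is illustrative):

B64=$(base64 < character.png | tr -d '\n')
curl -X POST https://api.venice.ai/api/v1/video/queue \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"seedance-2-0-reference-to-video\",
    \"prompt\": \"Refer to <Subject 1> in <Image 1> to generate a 5-second clip of the same character at sunrise.\",
    \"reference_image_urls\": [\"data:image/png;base64,$B64\"],
    \"duration\": \"5s\",
    \"resolution\": \"720p\"
  }'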
Pricing
Call POST /api/v1/video/quote to get a quote for a given request shape before submitting it to /video/queue. The quote endpoint is the only authoritative source; pricing details may change and shouldn’t be cached or duplicated client-side.
When reference video(s) are part of the request, also pass reference_video_total_duration (the sum of all reference clip durations in seconds) so the quote matches what /video/queue will charge:
curl -X POST https://api.venice.ai/api/v1/video/quote \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"duration": "5s",
"resolution": "1080p",
"aspect_ratio": "16:9",
"reference_video_total_duration": 5
}'
Complete examples
All examples assume VENICE_API_KEY is set in the environment.
Text-to-video
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-text-to-video",
"prompt": "A golden retriever frolicking through a sunlit meadow at sunset, slow camera dolly-in, shallow depth of field, warm cinematic lighting.",
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Image-to-video (first frame)
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-image-to-video",
"prompt": "The lighthouse keeper turns toward the storm, lantern raised, waves crashing against the rocks.",
"image_url": "https://example.com/lighthouse.jpg",
"duration": "5s",
"resolution": "720p"
}'
seedance-2-0-image-to-video (and its Fast variant) do not accept aspect_ratio — the output aspect ratio is auto-derived from the input image’s dimensions. Passing the field returns a 400 with “This model does not support aspect_ratio”. Use the T2V or R2V variants if you need explicit aspect-ratio control.
Reference workflow — subject donor
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Refer to <Subject 1> in <Image 1> to generate a 5-second clip of the same character walking through a neon-lit Tokyo street at night.",
"reference_image_urls": ["https://example.com/character.png"],
"duration": "5s",
"aspect_ratio": "9:16",
"resolution": "1080p"
}'
Reference workflow — subject + audio donor
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Refer to <Subject 1> in <Image 1> to generate a 5-second clip of the same character walking through a neon-lit Tokyo street at night. Refer to the timbre in <Audio 1> for a soft female voiceover describing the scene.",
"reference_image_urls": ["https://example.com/character.png"],
"reference_audio_urls": ["https://example.com/voice-sample.mp3"],
"duration": "5s",
"aspect_ratio": "9:16",
"resolution": "1080p"
}'
Edit workflow
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Strictly edit <Video 1>, changing its weather from sunny to a heavy rainstorm, with all original motions and camera work preserved.",
"reference_video_urls": ["https://example.com/sunny-scene.mp4"],
"reference_video_total_duration": 5,
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Edit workflow with image grounding
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Replace the perfume featured in <Video 1> with the face cream from <Image 1>, with all original motions and camera work preserved.",
"reference_video_urls": ["https://example.com/perfume-ad.mp4"],
"reference_image_urls": ["https://example.com/face-cream.png"],
"reference_video_total_duration": 4,
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Extend forward
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "Extend <Video 1>, generate a dramatic chase scene through narrow alleys at dusk, with neon signs flickering and rain on the pavement.",
"reference_video_urls": ["https://example.com/alley-intro.mp4"],
"reference_video_total_duration": 4,
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Stitch (3 clips)
curl -X POST https://api.venice.ai/api/v1/video/queue \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"prompt": "<Video 1> + a wisp of smoke transforms into a flock of birds + followed by <Video 2> + a slow dolly-in + followed by <Video 3>",
"reference_video_urls": [
"https://example.com/clip-1.mp4",
"https://example.com/clip-2.mp4",
"https://example.com/clip-3.mp4"
],
"reference_video_total_duration": 12,
"duration": "5s",
"aspect_ratio": "16:9",
"resolution": "1080p"
}'
Polling for completion
After every queue submission, save the returned queue_id and poll /video/retrieve until the response body is video/mp4:
curl -X POST https://api.venice.ai/api/v1/video/retrieve \
-H "Authorization: Bearer $VENICE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "seedance-2-0-reference-to-video",
"queue_id": "123e4567-e89b-12d3-a456-426614174000"
}' \
-o output.mp4
The response is JSON ({ "status": "queued" | "running" | "failed", ... }) until the job completes, at which point the response body switches to video/mp4 bytes. See Video Generation for the full polling pattern.
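A minimal polling loop, sketched in shell. It keys off the Content-Type curl reports instead of sniffing the body; the 5-second interval is an arbitrary choice:

QUEUE_ID="123e4567-e89b-12d3-a456-426614174000"
while :; do
  CTYPE=$(curl -s -X POST https://api.venice.ai/api/v1/video/retrieve \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"seedance-2-0-reference-to-video\", \"queue_id\": \"$QUEUE_ID\"}" \
    -o response.bin -w '%{content_type}')
  case "$CTYPE" in
    video/mp4*) mv response.bin output.mp4; break ;;
  esac
  # Still JSON: stop on a failed job, otherwise wait and retry.
  grep -q '"failed"' response.bin && { cat response.bin; break; }
  sleep 5
done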
Troubleshooting
At least one reference is required for this model
Reference-to-video submissions must include at least one of reference_image_urls, reference_video_urls, image_references, or video_references. Pure text-only generation isn’t a valid R2V workflow — use seedance-2-0-text-to-video instead. reference_audio_urls alone is not sufficient (see the Audio section above).
reference_video_urls must have at most 3 videos
The model caps reference videos at 3. If you need more clips, run a Stitch first (3 → 1), then use the output as a reference for a follow-up.
Per clip must be 2–15s / aggregate > 15s
Per-clip duration is [2, 15] seconds inclusive; the sum across all reference videos is also capped at 15 seconds. Trim clips client-side before submission.
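A client-side check and trim, sketched with FFmpeg's ffprobe and ffmpeg (assumed installed; file names are illustrative):

# Print each clip's duration in seconds:
for f in clip-1.mp4 clip-2.mp4 clip-3.mp4; do
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$f"
done
# Trim a too-long clip to 15 s (stream copy is fast but cuts on keyframes;
# drop "-c copy" for a frame-accurate re-encode):
ffmpeg -i long-clip.mp4 -t 15 -c copy trimmed.mp4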
Prompt routes to the wrong workflow
Workflow is inferred from prompt syntax. Common misroutings:
- Wanting to Extend but writing Refer to ... → the model treats your video as a donor, not a canvas to continue
- Wanting to Stitch but writing Refer to ... → the model picks one clip as the donor and ignores the others
- Wanting to Edit but writing Generate a video based on <Video 1> → ambiguous; the model may default to Reference
Use the canonical prefixes exactly as written: Strictly edit <Video 1>, ..., Extend <Video 1>, ..., <Video 1> + ... + followed by <Video 2>.
Quote doesn’t match the queued amount
If you included a reference video but didn’t pass reference_video_total_duration to /video/quote, the quote and the queued amount may differ. Always pass reference_video_total_duration (sum of all reference clip durations, in seconds) when reference videos are present.
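If you don't already know the clip lengths, they can be summed from the files themselves. A sketch assuming FFmpeg's ffprobe; it rounds up to a whole second, since the examples in this guide pass integers:

TOTAL=$(for f in clip-1.mp4 clip-2.mp4; do
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$f"
done | awk '{s += $1} END {printf "%d\n", s + 0.999}')
echo "reference_video_total_duration: $TOTAL"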