The Venice Video Harness is a community, agent-first, Venice-optimized toolkit for consistency-first video creation at any length. It turns an IDE agent (Claude Code, Cursor, Codex, etc.) into an operator of a reusable Venice production system covering 50+ Venice video, image, audio, and music models.
GitHub: venice-video-harness
MIT licensed. Community-maintained.
- Character-consistent video: lock characters, voices, and aesthetics across an entire series
- Storyboard-to-video: two-pass panel generation with Venice multi-edit refinement
- Text-first editing: transcribe locally with whisper.cpp, cut from a 12KB pack, self-eval at every boundary
What this is
Most Venice integrations are thin wrappers around API calls. The Venice Video Harness is the higher-level layer that sits between your agent and the Venice API:
- Orchestration rules in CLAUDE.md
- Reusable playbooks in .claude/commands/ (19 workflow commands)
- Specialized agents in .claude/agents/ (art-director, prompt-engineer, cut-qa, and more)
- Venice production skills in .claude/skills/ (compatible with the Agent Skills format)
- TypeScript execution layer in src/
- Comprehensive model registry covering 50+ Venice video, image, audio, and music models
Typical uses:
- Character-consistent video projects (any genre, any length)
- Visual-style-locked series or campaigns
- Storyboard-to-video workflows
- Short-form and long-form narrative content
- Branded cinematic sequences, trailers, and teasers
- Recurring-character social series
Getting started
Requirements
- Node.js 20+ (latest LTS recommended)
- ffmpeg + ffprobe, on your PATH
- Venice API key
- whisper-cpp for local transcription
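The requirements above can be verified before a first run. The following is a minimal sketch, not part of the harness: the `Environment` shape and `missingRequirements` are hypothetical names, and the caller is assumed to gather the Node version, the tools found on PATH, and the API key however it prefers. The whisper-cpp check is left out because it only matters for local transcription.

```typescript
// Hypothetical pre-flight check; none of these names come from the harness itself.
interface Environment {
  nodeVersion: string;    // e.g. "20.11.0" or "v20.11.0"
  toolsOnPath: string[];  // executables the caller found on PATH
  veniceApiKey?: string;  // however the key is stored locally
}

function missingRequirements(env: Environment): string[] {
  const problems: string[] = [];

  // parseInt stops at the first ".", so "20.11.0" yields 20.
  const major = Number.parseInt(env.nodeVersion.replace(/^v/, ""), 10);
  if (Number.isNaN(major) || major < 20) {
    problems.push(`Node.js 20+ required, found ${env.nodeVersion}`);
  }

  for (const tool of ["ffmpeg", "ffprobe"]) {
    if (!env.toolsOnPath.includes(tool)) {
      problems.push(`${tool} not found on PATH`);
    }
  }

  if (!env.veniceApiKey) {
    problems.push("Venice API key is not set");
  }
  return problems;
}
```

An empty result means the three hard requirements are met. In Node, `process.versions.node` is a natural source for `nodeVersion`.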
Setup
Open in your agent
Open the project in Cursor, Claude Code, or any IDE with agentic chat. The agent reads CLAUDE.md and the playbooks automatically.
Try one of these first messages:
- “Set up this Venice video harness for first use”
- “Create a new character-consistent video series”
- “Generate a 30-second branded video sequence”
- “Build a multi-episode narrative with locked characters”
- “Create a product launch trailer with consistent visual style”
What’s Venice-optimized about it
- Image prompts tuned for Venice image models like seedream-v5-lite, nano-banana-pro, flux-2-pro/max, and more
- Two-pass panel generation with Venice multi-edit refinement for character correction
- Model-routing logic for action, atmosphere, and character-consistency tiers
- Reference-aware video generation that uses elements, reference_image_urls, and scene_image_urls correctly per model
- Environment-aware prompt adaptation for daytime vs night scene handling
- Venice-native audio paths for TTS (Kokoro, Qwen3, ElevenLabs), SFX, and music
- Cost estimation before generation via /video/quote and /audio/quote
- Model-aware parameter building that auto-skips parameters the target model doesn’t support
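Model-aware parameter building can be pictured as filtering a request against a per-model capability list. This is a minimal sketch under stated assumptions: the `SUPPORTED` map is illustrative and only reflects the reference parameters this page attributes to two R2V models, and `buildParams` is a hypothetical helper, not the harness's actual builder.

```typescript
type ReferenceParam =
  | "elements"
  | "reference_image_urls"
  | "scene_image_urls"
  | "end_image_url";

// Illustrative capability map (assumption), not the harness's model registry.
const SUPPORTED: Record<string, ReferenceParam[]> = {
  "seedance-2-0-reference-to-video": ["reference_image_urls"],
  "kling-o3-standard-reference-to-video": [
    "elements",
    "reference_image_urls",
    "scene_image_urls",
  ],
};

interface BuiltRequest {
  params: Record<string, unknown>;
  skipped: string[]; // parameters dropped because the model does not accept them
}

function buildParams(
  model: string,
  requested: Partial<Record<ReferenceParam, unknown>>,
): BuiltRequest {
  const allowed = new Set<string>(SUPPORTED[model] ?? []);
  const params: Record<string, unknown> = { model };
  const skipped: string[] = [];

  for (const [key, value] of Object.entries(requested)) {
    if (value === undefined) continue;
    if (allowed.has(key)) {
      params[key] = value;
    } else {
      skipped.push(key);
    }
  }
  return { params, skipped };
}
```

Returning the skipped keys alongside the request makes the silent drop visible, which helps when a shot looks wrong because a reference was never sent.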
Model routing defaults
The harness defaults are opinionated because consistency is the point. The current routing (April 2026) uses Seedance 2.0 R2V by default, falls back to Kling O3 R2V for scenes with 3+ characters, and uses Seedance 2.0 i2v for establishing shots.
| Role | Default model | When used |
|---|---|---|
| Character shots (1-2 characters) | seedance-2-0-reference-to-video | Default R2V with flat reference_image_urls, @Image tags, up to 15s, native stereo audio |
| Character shots (3+ characters) | kling-o3-standard-reference-to-video | Auto-fallback with structured elements for multi-character identity |
| Establishing / mood / action | seedance-2-0-image-to-video | No characters; epic cinematic quality, up to 15s |
These defaults are set in series.json → videoDefaults. To target a non-Seedance family (e.g. for accounts that lack Seedance access), set videoDefaults to kling-o3-standard-reference-to-video and veo3.1-fast-image-to-video.
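The routing table above reduces to a decision on how many characters appear in a shot. The sketch below assumes that reading; the `Routing` field names and `routeShot` are hypothetical and do not mirror the real videoDefaults schema, which this page does not show.

```typescript
// Hypothetical shape; the real videoDefaults keys may differ.
interface Routing {
  character: string;      // 1-2 characters
  multiCharacter: string; // 3+ characters
  establishing: string;   // no characters
}

const DEFAULT_ROUTING: Routing = {
  character: "seedance-2-0-reference-to-video",
  multiCharacter: "kling-o3-standard-reference-to-video",
  establishing: "seedance-2-0-image-to-video",
};

function routeShot(characterCount: number, routing: Routing = DEFAULT_ROUTING): string {
  if (characterCount <= 0) return routing.establishing;
  if (characterCount <= 2) return routing.character;
  return routing.multiCharacter;
}

// A non-Seedance override, using the two model ids named in the text above.
const NON_SEEDANCE: Routing = {
  character: "kling-o3-standard-reference-to-video",
  multiCharacter: "kling-o3-standard-reference-to-video",
  establishing: "veo3.1-fast-image-to-video",
};
```

Passing `NON_SEEDANCE` as the second argument keeps the same decision logic while swapping every model it can return.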
Seedance face rule: Seedance 2.0 blocks face-bearing input images that weren’t produced by seedream-v5-lite or seedream-v5-lite-edit. The harness handles this automatically by routing character-bearing image work through Seedream and running a pre-flight gate before every Seedance call.
Supported Venice models
Video (April 2026)
| Family | i2v | t2v | Max duration | Audio | Notes |
|---|---|---|---|---|---|
| Seedance 2.0 | i2v, R2V | t2v | 15s | Yes (stereo, lip-sync 8+ langs) | #1 ranked. R2V: flat reference_image_urls, @Image tags. |
| Kling V3 | Pro, Standard | Pro, Standard | 15s | Yes | end_image_url for frame targeting |
| Kling O3 | Pro, Std, Pro R2V, Std R2V | Pro, Standard | 15s | Yes | R2V: elements, reference_image_urls, scene_image_urls |
| Kling 2.6 / 2.5 Turbo | Pro | Pro | 10s | 2.6: Yes / 2.5: No | end_image_url |
| Veo 3.1 | Fast, Full | Fast, Full | 8s | Yes | Up to 4K resolution |
| Sora 2 | Standard, Pro | Standard, Pro | 12s | Yes | Up to 1080p |
| Wan 2.6 / 2.5 | Std, Flash / Yes | Std / Yes | 15s / 10s | Yes | audio_url input |
| LTX Video 2.0 | Fast, Full, v2.3, 19B | Fast, Full, v2.3, 19B | 20s | Yes | Up to 4K, longest synced |
| Longcat | Std, Distilled | Std, Distilled | 30s | No | Longest single-shot |
| Vidu Q3 | Yes | Yes | 16s | Yes | reference_image_urls |
| PixVerse v5.6 | Std, Transition | Standard | 8s | Yes | Transition: end_image_url |
| Grok Imagine | Yes | Yes | 15s | Yes | Wide aspect ratio support |
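The maximum durations in the table determine how a longer piece must be split into shots. As a rough sketch, the registry below is transcribed from the table above (April 2026 values, with combined rows split per version), while `familiesFor` and `shotsNeeded` are hypothetical helpers rather than harness functions.

```typescript
interface VideoFamily {
  name: string;
  maxSeconds: number;
  audio: boolean;
}

// Transcribed from the table above; not the harness's model registry.
const FAMILIES: VideoFamily[] = [
  { name: "Seedance 2.0", maxSeconds: 15, audio: true },
  { name: "Kling V3", maxSeconds: 15, audio: true },
  { name: "Kling O3", maxSeconds: 15, audio: true },
  { name: "Kling 2.6", maxSeconds: 10, audio: true },
  { name: "Kling 2.5 Turbo", maxSeconds: 10, audio: false },
  { name: "Veo 3.1", maxSeconds: 8, audio: true },
  { name: "Sora 2", maxSeconds: 12, audio: true },
  { name: "Wan 2.6", maxSeconds: 15, audio: true },
  { name: "Wan 2.5", maxSeconds: 10, audio: true },
  { name: "LTX Video 2.0", maxSeconds: 20, audio: true },
  { name: "Longcat", maxSeconds: 30, audio: false },
  { name: "Vidu Q3", maxSeconds: 16, audio: true },
  { name: "PixVerse v5.6", maxSeconds: 8, audio: true },
  { name: "Grok Imagine", maxSeconds: 15, audio: true },
];

// Families able to produce a single shot of the given length.
function familiesFor(seconds: number, needsAudio: boolean): string[] {
  return FAMILIES
    .filter((f) => f.maxSeconds >= seconds && (!needsAudio || f.audio))
    .map((f) => f.name);
}

// Minimum number of shots needed to cover a total runtime.
function shotsNeeded(totalSeconds: number, maxSeconds: number): number {
  return Math.ceil(totalSeconds / maxSeconds);
}
```

For example, a 60-second sequence on a 15-second family needs `shotsNeeded(60, 15)` = 4 shots, and an 18-second single shot with audio leaves only LTX Video 2.0.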
Image, audio, and music
- Image (22+ models): nano-banana-pro/2, gpt-image-2, flux-2-pro/max, grok-imagine, qwen-image-2-pro, recraft-v4-pro, seedream-v4/v5-lite, lustify-sdxl/v7, wai-Illustrious, and more
- Multi-edit: qwen-edit, flux-2-max-edit, nano-banana-pro-edit, seedream-v5-lite-edit, gpt-image-2-edit, and more
- TTS: tts-kokoro (50+ voices), tts-qwen3-0-6b/1-7b, elevenlabs-tts-v3, elevenlabs-tts-multilingual-v2
- Music: elevenlabs-music, minimax-music-v2, ace-step-15, stable-audio-25
- SFX: elevenlabs-sound-effects-v2, mmaudio-v2-text-to-audio
Production pipelines
Generation pipeline
End-to-end narrative video (script → storyboard → video → audio → assembly). src/mini-drama/ covers:
- Series / character / episode management
- LLM-powered script workshopping
- Two-pass storyboard generation (generate + multi-edit refine)
- Vision-based panel QA
- Video generation with frame chaining
- Layered audio post-production
- Subtitle burn-in and final assembly
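One plausible reading of "frame chaining" in the list above is that each shot starts from the final frame of the previous one. The sketch below assumes that interpretation; the `Shot` type, `chainShots`, and `lastFrameArgs` are hypothetical, and the harness's own implementation may work differently. The ffmpeg arguments use the common recipe of seeking near the end with `-sseof` and letting `-update 1` overwrite a single image until the last frame remains.

```typescript
// Hypothetical shot record, for illustration only.
interface Shot {
  id: string;
  prompt: string;
  startImage?: string;
}

// ffmpeg arguments that leave the final frame of a clip in framePath.
function lastFrameArgs(videoPath: string, framePath: string): string[] {
  return ["-y", "-sseof", "-1", "-i", videoPath, "-update", "1", "-q:v", "1", framePath];
}

// Give every shot after the first the previous shot's last frame as its start
// image, unless a start image was set explicitly.
function chainShots(shots: Shot[], frameFor: (shotId: string) => string): Shot[] {
  const chained: Shot[] = [];
  let previousId: string | undefined;
  for (const shot of shots) {
    const startImage =
      shot.startImage ?? (previousId !== undefined ? frameFor(previousId) : undefined);
    chained.push(startImage === undefined ? shot : { ...shot, startImage });
    previousId = shot.id;
  }
  return chained;
}
```

Because each start image depends on a rendered predecessor, shots chained this way have to be generated in order rather than in parallel.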
Editing pipeline
Cut already-existing media (Venice-generated shots or real raw footage). Text-first: the LLM reads a compact takes_packed.md (~12KB per 40 min of audio) rather than frame-dumping video.
The five steps:
Render the EDL
JSON cut list → ffmpeg concat with 30ms audio fades. Archive-first, so originals are never overwritten.
cut-qa checks catch aspect-ratio regressions, frame-hash jumps inside a word, VO truncation, lighting discontinuity, audio peaks above -6 dBFS, and caption overlap with in-frame text.
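Two of the mechanics above lend themselves to a short worked example: the 30ms audio fades applied at each cut, and the -6 dBFS peak check. This is a sketch under assumptions: the `Cut` shape is hypothetical because the EDL schema is not shown here, the fade filter assumes each segment has already been trimmed so its timestamps start at 0, and peak amplitude is assumed to be normalized so that full scale is 1.0.

```typescript
// Hypothetical EDL entry; the harness's real cut-list schema may differ.
interface Cut {
  source: string;
  start: number; // seconds into the source
  end: number;   // seconds into the source
}

const FADE_SECONDS = 0.03; // 30ms, as described above

// ffmpeg afade filter chain: fade in at the head, fade out at the tail.
function audioFadeFilter(cut: Cut): string {
  const duration = cut.end - cut.start;
  const fadeOutStart = Math.max(0, duration - FADE_SECONDS);
  return (
    `afade=t=in:st=0:d=${FADE_SECONDS},` +
    `afade=t=out:st=${fadeOutStart.toFixed(3)}:d=${FADE_SECONDS}`
  );
}

// Convert a linear peak amplitude (full scale = 1.0) to dBFS.
function peakDbfs(peakAmplitude: number): number {
  return 20 * Math.log10(peakAmplitude);
}

function exceedsPeakLimit(peakAmplitude: number, limitDbfs: number = -6): boolean {
  return peakDbfs(peakAmplitude) > limitDbfs;
}
```

A peak of 0.5 is about -6.02 dBFS and passes the check, while a peak of 0.6 is about -4.44 dBFS and would be flagged.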
Commands, agents, and skills
The harness exposes 19 workflow commands, 10 specialized agents, and 7 production skills. Highlights:
| Workflow command | Purpose |
|---|---|
| new-series | Create a new series with locked aesthetics |
| add-character / lock-character | Character + voice locking |
| workshop-episode | Collaborative episode scripting |
| storyboard-episode | Storyboard one episode |
| produce-episode | Full pipeline in one command |
| generate-trailer | Full trailer pipeline |
| edit-footage | Text-first editing pipeline for existing media |
| ingest-screenplay | Ingest a Fountain or PDF screenplay |
| Specialized agent | Role |
|---|---|
| art-director | Aesthetic, palette, lighting, composition decisions |
| prompt-engineer | Venice image prompts, character consistency |
| storyboard-qa | Panel QA for continuity and character checks |
| cut-qa | Post-render quality gate (6 checks per cut, max 3 iterations) |
| overlay-designer | Branded motion graphics, parallel sub-agents |
| trailer-curator | Trailer shot selection and anti-spoiler rules |
| Production skill | Purpose |
|---|---|
| venice-api | Venice REST API usage and defaults |
| venice-video-model-routing | R2V-first routing, decision trees |
| character-consistency | Multi-shot character consistency guidance |
| shot-composition | Shot composition and camera guidance |
| screenplay-parsing | Screenplay parsing workflows |
| video-editing | Text-first editing philosophy, EDL format, cut-qa loop |
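The cut-qa gate listed above (a fixed set of checks per cut, at most 3 iterations) follows a bounded check-and-revise pattern. The sketch below illustrates that pattern generically; `runQaLoop`, the `Check` type, and the `revise` callback are hypothetical names, and how the real agent revises a cut between iterations is an assumption.

```typescript
// A check returns a failure description, or null when the cut passes.
type Check<T> = (cut: T) => string | null;

interface QaResult<T> {
  cut: T;
  iterations: number;
  failures: string[]; // empty when the final cut passed every check
}

function runQaLoop<T>(
  initial: T,
  checks: Check<T>[],
  revise: (cut: T, failures: string[]) => T,
  maxIterations: number = 3,
): QaResult<T> {
  let cut = initial;
  let failures: string[] = [];

  for (let i = 1; i <= maxIterations; i++) {
    failures = checks
      .map((check) => check(cut))
      .filter((f): f is string => f !== null);
    if (failures.length === 0) {
      return { cut, iterations: i, failures };
    }
    // Only revise when another evaluation pass remains.
    if (i < maxIterations) {
      cut = revise(cut, failures);
    }
  }
  return { cut, iterations: maxIterations, failures };
}
```

Capping the iterations means a cut that never converges is returned with its remaining failures instead of looping indefinitely, which leaves the final call to a human.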
NLE round-trip
After rendering, export the assembled timeline as XML for fine-tuning in your editor of choice. Every video segment, dialogue clip, SFX clip, and music cue lands on its own track.
Programmatic usage
You can also call into the harness’s modules directly from your own TypeScript.
Resources
- GitHub: source code, issues, and releases
- Venice Video Generation: the underlying API the harness drives
- Reference-to-Video: R2V guide for character consistency
- Seedance 2.0: the harness’s default video family
Community-maintained and provided as-is. For harness-specific issues, file them on the project’s GitHub repo.