Venice Video Harness

The Venice Video Harness is a community, agent-first, Venice-optimized toolkit for consistency-first video creation at any length. It turns an IDE agent (Claude Code, Cursor, Codex, etc.) into an operator of a reusable Venice production system covering 50+ Venice video, image, audio, and music models.

GitHub: venice-video-harness

MIT licensed. Community-maintained.

Character-consistent video

Lock characters, voices, and aesthetics across an entire series

Storyboard-to-video

Two-pass panel generation with Venice multi-edit refinement

Text-first editing

Transcribe locally with whisper.cpp, cut from a 12KB pack, self-eval at every boundary

What this is

Most Venice integrations are thin wrappers around API calls. The Venice Video Harness is the higher-level layer that sits between your agent and the Venice API:

Orchestration rules in CLAUDE.md
Reusable playbooks in .claude/commands/ (19 workflow commands)
Specialized agents in .claude/agents/ (art-director, prompt-engineer, cut-qa, and more)
Venice production skills in .claude/skills/ (compatible with the Agent Skills format)
TypeScript execution layer in src/
Comprehensive model registry covering 50+ Venice video, image, audio, and music models

Built for creators producing:

Character-consistent video projects (any genre, any length)
Visual-style-locked series or campaigns
Storyboard-to-video workflows
Short-form and long-form narrative content
Branded cinematic sequences, trailers, and teasers
Recurring-character social series

Getting started

Requirements

Node.js 20+

Latest LTS recommended

ffmpeg + ffprobe

On your PATH

Venice API key

From venice.ai/settings/api

Optional, for the editing pipeline: install whisper-cpp for local transcription.

brew install whisper-cpp
mkdir -p ~/.cache/whisper.cpp
curl -L -o ~/.cache/whisper.cpp/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin

Setup

Clone the harness

git clone https://github.com/jordanurbs/venice-video-harness.git
cd venice-video-harness

Configure your API key

cp .env.example .env
# Add VENICE_API_KEY to .env

Install and build

npm install
npm run build

Open in your agent

Open the project in Cursor, Claude Code, or any IDE with agentic chat. The agent reads CLAUDE.md and the playbooks automatically. Try one of these first messages:

“Set up this Venice video harness for first use”
“Create a new character-consistent video series”
“Generate a 30-second branded video sequence”
“Build a multi-episode narrative with locked characters”
“Create a product launch trailer with consistent visual style”

What’s Venice-optimized about it

Image prompts tuned for Venice image models like seedream-v5-lite, nano-banana-pro, flux-2-pro/max, and more
Two-pass panel generation with Venice multi-edit refinement for character correction
Model-routing logic for action, atmosphere, and character-consistency tiers
Reference-aware video generation that uses elements, reference_image_urls, and scene_image_urls correctly per model
Environment-aware prompt adaptation for daytime vs night scene handling
Venice-native audio paths for TTS (Kokoro, Qwen3, ElevenLabs), SFX, and music
Cost estimation before generation via /video/quote and /audio/quote
Model-aware parameter building that auto-skips parameters the target model doesn’t support
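
As a rough illustration of the last point, model-aware parameter building can be thought of as filtering a request against a per-model capability table. The sketch below is not the harness's implementation: the model ids, the SUPPORTED_PARAMS table, and the buildParams helper are all hypothetical, and only the parameter names end_image_url and reference_image_urls come from this page.

```typescript
// Hypothetical request shape; the parameter names mirror ones mentioned in this page.
type VideoParams = {
  prompt: string;
  duration?: string;
  audio?: boolean;
  end_image_url?: string;
  reference_image_urls?: string[];
};

type ParamName = keyof VideoParams;

// Hypothetical capability table. The real registry lives in the harness and
// will differ; these two entries exist only to make the sketch runnable.
const SUPPORTED_PARAMS: Record<string, ReadonlySet<ParamName>> = {
  "example-r2v-model": new Set<ParamName>(["prompt", "duration", "audio", "reference_image_urls"]),
  "example-i2v-model": new Set<ParamName>(["prompt", "duration", "end_image_url"]),
};

// Keep only the parameters the target model supports and report what was dropped.
function buildParams(
  model: string,
  requested: VideoParams,
): { params: Partial<VideoParams>; skipped: ParamName[] } {
  const supported = SUPPORTED_PARAMS[model];
  if (!supported) throw new Error(`Unknown model: ${model}`);

  const params: Partial<VideoParams> = {};
  const skipped: ParamName[] = [];
  for (const key of Object.keys(requested) as ParamName[]) {
    if (requested[key] === undefined) continue; // nothing was requested for this key
    if (supported.has(key)) {
      (params as Record<string, unknown>)[key] = requested[key];
    } else {
      skipped.push(key); // unsupported on this model, so it is left out of the request
    }
  }
  return { params, skipped };
}
```

Returning the skipped names alongside the filtered parameters lets the caller log what was dropped instead of failing silently.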

Model routing defaults

The harness defaults are opinionated because consistency is the point. The current routing (April 2026) uses Seedance 2.0 R2V by default, falls back to Kling O3 R2V for scenes with three or more characters, and uses Seedance 2.0 i2v for establishing shots.

Role	Default model	When used
Character shots (1-2 characters)	`seedance-2-0-reference-to-video`	Default R2V with flat `reference_image_urls`, `@Image` tags, up to 15s, native stereo audio
Character shots (3+ characters)	`kling-o3-standard-reference-to-video`	Auto-fallback with structured `elements` for multi-character identity
Establishing / mood / action	`seedance-2-0-image-to-video`	No characters; epic cinematic quality, up to 15s

These defaults can be overridden per project via series.json → videoDefaults. To target a non-Seedance family (for example, on accounts that lack Seedance access), set videoDefaults to kling-o3-standard-reference-to-video and veo3.1-fast-image-to-video.
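
The routing table above can be sketched as a small decision function. This is a minimal illustration, not the harness's actual code: the VideoDefaults field names (character, multiCharacter, establishing) and the pickVideoModel helper are assumptions, and the real shape of videoDefaults in series.json may differ. Only the model ids are taken from this page.

```typescript
// Hypothetical shape for per-project overrides; the real videoDefaults keys may differ.
type VideoDefaults = {
  character: string;      // shots with 1-2 characters
  multiCharacter: string; // shots with 3+ characters
  establishing: string;   // shots with no characters
};

// Default model ids as listed in the routing table above.
const DEFAULTS: VideoDefaults = {
  character: "seedance-2-0-reference-to-video",
  multiCharacter: "kling-o3-standard-reference-to-video",
  establishing: "seedance-2-0-image-to-video",
};

// Pick a model from the number of characters in the shot, applying any overrides.
function pickVideoModel(characterCount: number, overrides: Partial<VideoDefaults> = {}): string {
  const defaults = { ...DEFAULTS, ...overrides };
  if (characterCount <= 0) return defaults.establishing;
  if (characterCount <= 2) return defaults.character;
  return defaults.multiCharacter;
}

// Example: a project without Seedance access overrides two of the three roles.
const noSeedance: Partial<VideoDefaults> = {
  character: "kling-o3-standard-reference-to-video",
  establishing: "veo3.1-fast-image-to-video",
};
console.log(pickVideoModel(2));             // default routing for a two-character shot
console.log(pickVideoModel(0, noSeedance)); // overridden establishing model
```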

Seedance face rule: Seedance 2.0 blocks face-bearing input images that weren’t produced by seedream-v5-lite or seedream-v5-lite-edit. The harness handles this automatically by routing character-bearing image work through Seedream and running a pre-flight gate before every Seedance call.
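
A pre-flight gate of this kind could look roughly like the sketch below. It assumes each input image carries a hasFace flag and a producedBy model id; how the harness actually detects faces and tracks image provenance is not described here, so the InputImage type and the seedancePreflight helper are hypothetical.

```typescript
// Hypothetical metadata attached to each input image.
type InputImage = {
  url: string;
  hasFace: boolean;
  producedBy?: string; // model id that generated the image, if known
};

// Producers whose face-bearing output Seedance 2.0 accepts, per the rule above.
const FACE_SAFE_PRODUCERS = new Set(["seedream-v5-lite", "seedream-v5-lite-edit"]);

// Return the images that would be blocked, so the caller can re-route them
// through Seedream before making the Seedance call.
function seedancePreflight(images: InputImage[]): { ok: boolean; blocked: string[] } {
  const blocked = images
    .filter((img) => {
      if (!img.hasFace) return false; // images without faces are not restricted
      const safe = img.producedBy !== undefined && FACE_SAFE_PRODUCERS.has(img.producedBy);
      return !safe; // unknown provenance is treated as unsafe
    })
    .map((img) => img.url);
  return { ok: blocked.length === 0, blocked };
}
```

Treating unknown provenance as unsafe is a conservative choice for this sketch: it costs an extra Seedream pass but avoids a rejected Seedance request.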

Supported Venice models

Video (April 2026)

Family	i2v	t2v	Max duration	Audio	Notes
Seedance 2.0	i2v, R2V	t2v	15s	Yes (stereo, lip-sync 8+ langs)	#1 ranked. R2V: flat `reference_image_urls`, `@Image` tags.
Kling V3	Pro, Standard	Pro, Standard	15s	Yes	`end_image_url` for frame targeting
Kling O3	Pro, Std, Pro R2V, Std R2V	Pro, Standard	15s	Yes	R2V: `elements`, `reference_image_urls`, `scene_image_urls`
Kling 2.6 / 2.5 Turbo	Pro	Pro	10s	2.6: Yes / 2.5: No	`end_image_url`
Veo 3.1	Fast, Full	Fast, Full	8s	Yes	Up to 4K resolution
Sora 2	Standard, Pro	Standard, Pro	12s	Yes	Up to 1080p
Wan 2.6 / 2.5	Std, Flash / Yes	Std / Yes	15s / 10s	Yes	`audio_url` input
LTX Video 2.0	Fast, Full, v2.3, 19B	Fast, Full, v2.3, 19B	20s	Yes	Up to 4K, longest synced
Longcat	Std, Distilled	Std, Distilled	30s	No	Longest single-shot
Vidu Q3	Yes	Yes	16s	Yes	`reference_image_urls`
PixVerse v5.6	Std, Transition	Standard	8s	Yes	Transition: `end_image_url`
Grok Imagine	Yes	Yes	15s	Yes	Wide aspect ratio support

Image, audio, and music

Image (22+ models): nano-banana-pro/2, gpt-image-2, flux-2-pro/max, grok-imagine, qwen-image-2-pro, recraft-v4-pro, seedream-v4 / v5-lite, lustify-sdxl/v7, wai-Illustrious, and more
Multi-edit: qwen-edit, flux-2-max-edit, nano-banana-pro-edit, seedream-v5-lite-edit, gpt-image-2-edit, and more
TTS: tts-kokoro (50+ voices), tts-qwen3-0-6b/1-7b, elevenlabs-tts-v3, elevenlabs-tts-multilingual-v2
Music: elevenlabs-music, minimax-music-v2, ace-step-15, stable-audio-25
SFX: elevenlabs-sound-effects-v2, mmaudio-v2-text-to-audio

Production pipelines

Generation pipeline

End-to-end narrative video (script → storyboard → video → audio → assembly):

npm run dev -- produce-episode -p output/my-series -e 1

Reference implementation in src/mini-drama/ covers:

Series / character / episode management
LLM-powered script workshopping
Two-pass storyboard generation (generate + multi-edit refine)
Vision-based panel QA
Video generation with frame chaining
Layered audio post-production
Subtitle burn-in and final assembly

Editing pipeline

Cut existing media (Venice-generated shots or raw real-world footage). The process is text-first: the LLM reads a compact takes_packed.md (~12KB per 40 min of audio) instead of working from dumped video frames. The five steps:

Transcribe

Local whisper.cpp produces per-source *.words.json + takes_packed.md.

Read the pack

The LLM forms a cut strategy from text alone.

Confirm

The agent proposes the strategy and waits for “yes / revise / cancel”.

Render the EDL

JSON cut list → ffmpeg concat with 30ms audio fades. Archive-first, so originals are never overwritten.

Self-eval

The cut-qa agent runs 6 programmatic checks at every cut boundary; max 3 fix iterations.
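
The "Render the EDL" step can be sketched as follows. The EDL schema is not shown on this page, so the EdlSegment type and both helpers are assumptions. The ffmpeg pieces used (the -ss and -to input options, the afade filter with t, st, and d, and the concat demuxer's file '...' list format) are standard ffmpeg features, but this is only one plausible way to wire them together.

```typescript
// Hypothetical EDL entry: one kept range from one source file, in seconds.
type EdlSegment = { source: string; start: number; end: number };

const FADE_SEC = 0.03; // 30ms audio fade at each cut boundary

// Build ffmpeg arguments that extract one segment with a short fade-in and fade-out.
// With -ss placed before -i, output timestamps start at 0, so the fade-in starts at 0.
function segmentArgs(seg: EdlSegment, outPath: string): string[] {
  const duration = seg.end - seg.start;
  if (duration <= 2 * FADE_SEC) {
    throw new Error(`Segment too short for fades: ${duration}s`);
  }
  const fadeOutStart = (duration - FADE_SEC).toFixed(3);
  const audioFilter =
    `afade=t=in:st=0:d=${FADE_SEC},afade=t=out:st=${fadeOutStart}:d=${FADE_SEC}`;
  return ["-ss", String(seg.start), "-to", String(seg.end), "-i", seg.source, "-af", audioFilter, outPath];
}

// Build the list file consumed by `ffmpeg -f concat -safe 0 -i list.txt -c copy out.mp4`.
// Simplification: paths containing single quotes would need escaping.
function concatList(paths: string[]): string {
  return paths.map((p) => `file '${p}'`).join("\n") + "\n";
}

console.log(segmentArgs({ source: "take1.mp4", start: 10, end: 12.5 }, "seg-000.mp4").join(" "));
console.log(concatList(["seg-000.mp4", "seg-001.mp4"]));
```

The sketch only builds arguments and never passes an overwrite flag, which fits the archive-first rule: writing each segment to a new path leaves the originals untouched.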

The cut-qa checks catch aspect-ratio regressions, frame-hash jumps inside a word, VO truncation, lighting discontinuity, audio peaks above -6 dBFS, and caption overlap with in-frame text.
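
To make one of these checks concrete, here is a minimal sketch of the audio-peak test. It assumes decoded PCM samples normalized to the range [-1, 1]; how cut-qa actually decodes and measures audio is not documented here, and the helper names are hypothetical. The conversion itself is the standard one: peak level in dBFS is 20 · log10(|peak|), so -6 dBFS corresponds to an amplitude of roughly 0.5.

```typescript
// Peak level of normalized samples in dBFS (0 dBFS = full scale).
function peakDbfs(samples: Float32Array): number {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    const magnitude = Math.abs(samples[i]);
    if (magnitude > peak) peak = magnitude;
  }
  return peak === 0 ? -Infinity : 20 * Math.log10(peak);
}

// True when the loudest sample is above the limit (default: -6 dBFS).
function exceedsPeakLimit(samples: Float32Array, limitDbfs = -6): boolean {
  return peakDbfs(samples) > limitDbfs;
}

const quiet = new Float32Array([0.25, -0.5, 0.1]); // peak 0.5, just under -6 dBFS
const loud = new Float32Array([0.2, -0.9, 0.4]);   // peak 0.9, well above -6 dBFS
console.log(exceedsPeakLimit(quiet), exceedsPeakLimit(loud));
```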

The editing pipeline is inspired by browser-use/video-use. Their core insight, “the LLM never watches the video, it reads it”, is what makes agent-driven editing work without drowning in frame-dump tokens.

Commands, agents, and skills

The harness exposes 19 workflow commands, 10 specialized agents, and 7 production skills. Highlights:

Workflow command	Purpose
`new-series`	Create a new series with locked aesthetics
`add-character` / `lock-character`	Character + voice locking
`workshop-episode`	Collaborative episode scripting
`storyboard-episode`	Storyboard one episode
`produce-episode`	Full pipeline in one command
`generate-trailer`	Full trailer pipeline
`edit-footage`	Text-first editing pipeline for existing media
`ingest-screenplay`	Ingest a Fountain or PDF screenplay

Specialized agent	Role
`art-director`	Aesthetic, palette, lighting, composition decisions
`prompt-engineer`	Venice image prompts, character consistency
`storyboard-qa`	Panel QA for continuity and character checks
`cut-qa`	Post-render quality gate (6 checks per cut, max 3 iterations)
`overlay-designer`	Branded motion graphics, parallel sub-agents
`trailer-curator`	Trailer shot selection and anti-spoiler rules

Production skill	Purpose
`venice-api`	Venice REST API usage and defaults
`venice-video-model-routing`	R2V-first routing, decision trees
`character-consistency`	Multi-shot character consistency guidance
`shot-composition`	Shot composition and camera guidance
`screenplay-parsing`	Screenplay parsing workflows
`video-editing`	Text-first editing philosophy, EDL format, cut-qa loop

NLE round-trip

After rendering, export the assembled timeline as XML for fine-tuning in your editor of choice. Every video segment, dialogue clip, SFX clip, and music cue lands on its own track.

mini-drama export-timeline -p output/<project> -e 1 --format fcpxml      # Final Cut Pro X
mini-drama export-timeline -p output/<project> -e 1 --format premiere    # Premiere Pro
mini-drama export-timeline -p output/<project> -e 1 --format davinci     # DaVinci Resolve

Programmatic usage

You can also call into the harness’s modules directly from your own TypeScript:

import { VeniceClient } from './src/venice/client.js';
import { generateVideo, quoteVideo } from './src/venice/video.js';
import { listVideoModels } from './src/venice/models.js';

const client = new VeniceClient();

// Ask for a cost estimate before committing to a generation.
const quote = await quoteVideo(client, {
  model: 'kling-v3-pro-image-to-video',
  duration: '8s',
  audio: true,
});
console.log(`Estimated cost: $${quote.quote}`);

// Generate the clip with the same model and settings, writing it to outputPath.
const result = await generateVideo(client, {
  model: 'kling-v3-pro-image-to-video',
  prompt: 'A slow dolly shot pushes forward...',
  duration: '8s',
  imageUrl: 'data:image/png;base64,...',
  audio: true,
  outputPath: 'output/shot-001.mp4',
});

// Query the model registry for models that support clips of 20 seconds or longer.
const longModels = listVideoModels({ minDurationSec: 20 });

Resources

GitHub

Source code, issues, and releases

Venice Video Generation

The underlying API the harness drives

Reference-to-Video

R2V guide for character consistency

Seedance 2.0

The harness’s default video family

Community-maintained and provided as-is. For harness-specific issues, file them on the project’s GitHub repo.
