The Venice Video Harness is a community, agent-first, Venice-optimized toolkit for consistency-first video creation at any length. It turns an IDE agent (Claude Code, Cursor, Codex, etc.) into an operator of a reusable Venice production system covering 50+ Venice video, image, audio, and music models.

GitHub: venice-video-harness

MIT licensed. Community-maintained.

Character-consistent video

Lock characters, voices, and aesthetics across an entire series

Storyboard-to-video

Two-pass panel generation with Venice multi-edit refinement

Text-first editing

Transcribe locally with whisper.cpp, cut from a ~12KB transcript pack, and self-evaluate at every cut boundary

What this is

Most Venice integrations are thin wrappers around API calls. The Venice Video Harness is the higher-level layer that sits between your agent and the Venice API:
  • Orchestration rules in CLAUDE.md
  • Reusable playbooks in .claude/commands/ (19 workflow commands)
  • Specialized agents in .claude/agents/ (art-director, prompt-engineer, cut-qa, and more)
  • Venice production skills in .claude/skills/ (compatible with the Agent Skills format)
  • TypeScript execution layer in src/
  • Comprehensive model registry covering 50+ Venice video, image, audio, and music models
Built for creators producing:
  • Character-consistent video projects (any genre, any length)
  • Visual-style-locked series or campaigns
  • Storyboard-to-video workflows
  • Short-form and long-form narrative content
  • Branded cinematic sequences, trailers, and teasers
  • Recurring-character social series

Getting started

Requirements

Node.js 20+

Latest LTS recommended

ffmpeg + ffprobe

On your PATH

Venice API key

Optional, for the editing pipeline: install whisper-cpp for local transcription.
brew install whisper-cpp
mkdir -p ~/.cache/whisper.cpp
curl -L -o ~/.cache/whisper.cpp/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin

Setup

1. Clone the harness

git clone https://github.com/jordanurbs/venice-video-harness.git
cd venice-video-harness

2. Configure your API key

cp .env.example .env
# Add VENICE_API_KEY to .env

3. Install and build

npm install
npm run build

4. Open in your agent

Open the project in Cursor, Claude Code, or any IDE with agentic chat. The agent reads CLAUDE.md and the playbooks automatically.

Try one of these first messages:
  • “Set up this Venice video harness for first use”
  • “Create a new character-consistent video series”
  • “Generate a 30-second branded video sequence”
  • “Build a multi-episode narrative with locked characters”
  • “Create a product launch trailer with consistent visual style”

What’s Venice-optimized about it

  • Image prompts tuned for Venice image models like seedream-v5-lite, nano-banana-pro, flux-2-pro/max, and more
  • Two-pass panel generation with Venice multi-edit refinement for character correction
  • Model-routing logic for action, atmosphere, and character-consistency tiers
  • Reference-aware video generation that uses elements, reference_image_urls, and scene_image_urls correctly per model
  • Environment-aware prompt adaptation for daytime vs night scene handling
  • Venice-native audio paths for TTS (Kokoro, Qwen3, ElevenLabs), SFX, and music
  • Cost estimation before generation via /video/quote and /audio/quote
  • Model-aware parameter building that auto-skips parameters the target model doesn’t support
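
The last bullet, model-aware parameter building, can be pictured as filtering a request against a per-model capability list. The sketch below is illustrative only: `OptionalParam`, `CAPABILITIES`, and `buildParams` are hypothetical names rather than the harness's API, and the capability entries are simplified assumptions based on the parameters named on this page, not a copy of the real model registry.

```typescript
// Hypothetical optional request fields; names follow parameters mentioned on this page.
type OptionalParam =
  | "elements"
  | "reference_image_urls"
  | "scene_image_urls"
  | "end_image_url";

// Assumed capability table, NOT the harness's real registry.
const CAPABILITIES = new Map<string, ReadonlySet<OptionalParam>>([
  ["seedance-2-0-reference-to-video", new Set<OptionalParam>(["reference_image_urls"])],
  [
    "kling-o3-standard-reference-to-video",
    new Set<OptionalParam>(["elements", "reference_image_urls", "scene_image_urls"]),
  ],
  ["kling-v3-pro-image-to-video", new Set<OptionalParam>(["end_image_url"])],
]);

interface BuildResult {
  params: Record<string, unknown>; // what would be sent to the API
  skipped: OptionalParam[];        // optional fields dropped for this model
}

function buildParams(
  model: string,
  required: Record<string, unknown>,
  optional: Partial<Record<OptionalParam, unknown>>,
): BuildResult {
  const supported = CAPABILITIES.get(model);
  if (!supported) {
    throw new Error(`Unknown model: ${model}`);
  }
  const params: Record<string, unknown> = { model, ...required };
  const skipped: OptionalParam[] = [];
  for (const key of Object.keys(optional) as OptionalParam[]) {
    const value = optional[key];
    if (value === undefined) continue;
    if (supported.has(key)) {
      params[key] = value; // the model accepts this field
    } else {
      skipped.push(key);   // silently dropped instead of failing the request
    }
  }
  return { params, skipped };
}

// Usage: `elements` is dropped for Seedance R2V in this assumed table.
const built = buildParams(
  "seedance-2-0-reference-to-video",
  { prompt: "A slow dolly shot pushes forward...", duration: "8s" },
  { reference_image_urls: ["https://example.com/hero.png"], elements: [] },
);
```

Returning the skipped keys alongside the payload is a design choice of this sketch: it lets the caller log what was dropped instead of discovering it from the rendered shot.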

Model routing defaults

The harness defaults are opinionated because consistency is the point. The current routing (April 2026) uses Seedance 2.0 R2V by default, falls back to Kling O3 R2V for scenes with 3+ characters, and uses Seedance 2.0 i2v for establishing shots.

| Role | Default model | When used |
| --- | --- | --- |
| Character shots (1-2 characters) | seedance-2-0-reference-to-video | Default R2V with flat reference_image_urls, @Image tags, up to 15s, native stereo audio |
| Character shots (3+ characters) | kling-o3-standard-reference-to-video | Auto-fallback with structured elements for multi-character identity |
| Establishing / mood / action | seedance-2-0-image-to-video | No characters; epic cinematic quality, up to 15s |

These are overridable per-project via series.json → videoDefaults. To target a non-Seedance family (e.g. accounts that lack Seedance access), set videoDefaults to kling-o3-standard-reference-to-video and veo3.1-fast-image-to-video.
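
As a rough illustration of the routing rules above, the decision can be written as a small function keyed on character count. This is a sketch under assumptions, not the harness's implementation: `VideoDefaults`, its field names, and `pickVideoModel` are hypothetical, and the actual shape of `videoDefaults` in series.json is not documented on this page. Only the model IDs come from the text.

```typescript
// Hypothetical shape for per-project overrides; the real series.json keys may differ.
interface VideoDefaults {
  character: string;      // 1-2 character shots
  multiCharacter: string; // 3+ character shots
  establishing: string;   // no characters
}

// Defaults as described in the routing table.
const DEFAULT_ROUTING: VideoDefaults = {
  character: "seedance-2-0-reference-to-video",
  multiCharacter: "kling-o3-standard-reference-to-video",
  establishing: "seedance-2-0-image-to-video",
};

function pickVideoModel(
  characterCount: number,
  overrides: Partial<VideoDefaults> = {},
): string {
  const routing: VideoDefaults = { ...DEFAULT_ROUTING, ...overrides };
  if (characterCount <= 0) return routing.establishing;
  if (characterCount >= 3) return routing.multiCharacter;
  return routing.character;
}

// A non-Seedance project, using the two model IDs suggested above.
const nonSeedance: Partial<VideoDefaults> = {
  character: "kling-o3-standard-reference-to-video",
  establishing: "veo3.1-fast-image-to-video",
};

const defaultDuo = pickVideoModel(2);                 // "seedance-2-0-reference-to-video"
const defaultCrowd = pickVideoModel(4);               // "kling-o3-standard-reference-to-video"
const overriddenWide = pickVideoModel(0, nonSeedance); // "veo3.1-fast-image-to-video"
```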
Seedance face rule: Seedance 2.0 blocks face-bearing input images that weren’t produced by seedream-v5-lite or seedream-v5-lite-edit. The harness handles this automatically by routing character-bearing image work through Seedream and running a pre-flight gate before every Seedance call.
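
A pre-flight gate of this kind might look like the following. The sketch assumes each input image carries a `hasFace` flag and a `producedBy` model ID; both fields, the `ImageAsset` type, and `seedancePreflight` are hypothetical, since this page does not describe how the harness detects faces or tracks image provenance.

```typescript
// The two image models named in the Seedance face rule.
const SEEDANCE_SAFE_SOURCES: ReadonlySet<string> = new Set([
  "seedream-v5-lite",
  "seedream-v5-lite-edit",
]);

// Hypothetical asset record; field names are assumptions for illustration.
interface ImageAsset {
  url: string;
  hasFace: boolean;
  producedBy?: string; // model ID that generated the image, if known
}

// Returns the URLs that would be blocked; an empty array means the call may proceed.
function seedancePreflight(model: string, images: ImageAsset[]): string[] {
  if (!model.startsWith("seedance-2-0")) return []; // rule applies to Seedance 2.0 only
  return images
    .filter((img) => {
      const safeSource =
        img.producedBy !== undefined && SEEDANCE_SAFE_SOURCES.has(img.producedBy);
      return img.hasFace && !safeSource;
    })
    .map((img) => img.url);
}

// Usage: one Seedream portrait, one portrait from another model, one faceless plate.
const blocked = seedancePreflight("seedance-2-0-reference-to-video", [
  { url: "hero.png", hasFace: true, producedBy: "seedream-v5-lite" },
  { url: "villain.png", hasFace: true, producedBy: "flux-2-pro" },
  { url: "skyline.png", hasFace: false },
]);
// blocked contains only "villain.png"
```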

Supported Venice models

Video (April 2026)

| Family | i2v | t2v | Max duration | Audio | Notes |
| --- | --- | --- | --- | --- | --- |
| Seedance 2.0 | i2v, R2V | t2v | 15s | Yes (stereo, lip-sync 8+ langs) | #1 ranked. R2V: flat reference_image_urls, @Image tags. |
| Kling V3 | Pro, Standard | Pro, Standard | 15s | Yes | end_image_url for frame targeting |
| Kling O3 | Pro, Std, Pro R2V, Std R2V | Pro, Standard | 15s | Yes | R2V: elements, reference_image_urls, scene_image_urls |
| Kling 2.6 / 2.5 Turbo | Pro | Pro | 10s | 2.6: Yes / 2.5: No | end_image_url |
| Veo 3.1 | Fast, Full | Fast, Full | 8s | Yes | Up to 4K resolution |
| Sora 2 | Standard, Pro | Standard, Pro | 12s | Yes | Up to 1080p |
| Wan 2.6 / 2.5 | Std, Flash / Yes | Std / Yes | 15s / 10s | Yes | audio_url input |
| LTX Video 2.0 | Fast, Full, v2.3, 19B | Fast, Full, v2.3, 19B | 20s | Yes | Up to 4K, longest synced |
| Longcat | Std, Distilled | Std, Distilled | 30s | No | Longest single-shot |
| Vidu Q3 | Yes | Yes | 16s | Yes | reference_image_urls |
| PixVerse v5.6 | Std, Transition | Standard | 8s | Yes | Transition: end_image_url |
| Grok Imagine | Yes | Yes | 15s | Yes | Wide aspect ratio support |
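
To show how a table like this can drive model selection, the sketch below encodes a few of its rows as data and filters by minimum duration and audio support. The `VideoFamily` type and `findFamilies` helper are hypothetical and exist only for this example; the durations and audio flags are copied from the rows above, and only a subset of families is included.

```typescript
interface VideoFamily {
  family: string;
  maxDurationSec: number;
  audio: boolean;
}

// A subset of the table rows, transcribed by hand.
const FAMILIES: VideoFamily[] = [
  { family: "Seedance 2.0", maxDurationSec: 15, audio: true },
  { family: "Veo 3.1", maxDurationSec: 8, audio: true },
  { family: "Sora 2", maxDurationSec: 12, audio: true },
  { family: "LTX Video 2.0", maxDurationSec: 20, audio: true },
  { family: "Longcat", maxDurationSec: 30, audio: false },
  { family: "Vidu Q3", maxDurationSec: 16, audio: true },
];

// Families that can render a single shot of at least `minDurationSec`,
// optionally restricted to those with audio output.
function findFamilies(minDurationSec: number, needAudio: boolean): string[] {
  return FAMILIES
    .filter((f) => f.maxDurationSec >= minDurationSec)
    .filter((f) => !needAudio || f.audio)
    .map((f) => f.family);
}

const longWithAudio = findFamilies(16, true); // ["LTX Video 2.0", "Vidu Q3"]
const longAny = findFamilies(20, false);      // ["LTX Video 2.0", "Longcat"]
```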

Image, audio, and music

  • Image (22+ models): nano-banana-pro/2, gpt-image-2, flux-2-pro/max, grok-imagine, qwen-image-2-pro, recraft-v4-pro, seedream-v4 / v5-lite, lustify-sdxl/v7, wai-Illustrious, and more
  • Multi-edit: qwen-edit, flux-2-max-edit, nano-banana-pro-edit, seedream-v5-lite-edit, gpt-image-2-edit, and more
  • TTS: tts-kokoro (50+ voices), tts-qwen3-0-6b/1-7b, elevenlabs-tts-v3, elevenlabs-tts-multilingual-v2
  • Music: elevenlabs-music, minimax-music-v2, ace-step-15, stable-audio-25
  • SFX: elevenlabs-sound-effects-v2, mmaudio-v2-text-to-audio

Production pipelines

Generation pipeline

End-to-end narrative video (script → storyboard → video → audio → assembly):
npm run dev -- produce-episode -p output/my-series -e 1
Reference implementation in src/mini-drama/ covers:
  • Series / character / episode management
  • LLM-powered script workshopping
  • Two-pass storyboard generation (generate + multi-edit refine)
  • Vision-based panel QA
  • Video generation with frame chaining
  • Layered audio post-production
  • Subtitle burn-in and final assembly

Editing pipeline

Cut existing media (Venice-generated shots or real raw footage). The pipeline is text-first: the LLM reads a compact takes_packed.md (~12KB per 40 min of audio) instead of frame dumps of the video. The five steps:
1. Transcribe

Local whisper.cpp produces per-source *.words.json + takes_packed.md.

2. Read the pack

The LLM forms a cut strategy from text alone.

3. Confirm

The agent proposes the strategy and waits for “yes / revise / cancel”.

4. Render the EDL

JSON cut list → ffmpeg concat with 30ms audio fades. Archive-first, so originals are never overwritten.

5. Self-eval

The cut-qa agent runs 6 programmatic checks at every cut boundary; max 3 fix iterations.
The cut-qa checks catch aspect-ratio regressions, frame-hash jumps inside a word, VO truncation, lighting discontinuity, audio peaks above -6 dBFS, and caption overlap with in-frame text.
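
The check-and-fix loop can be sketched generically, with the audio-peak rule as one concrete check. Everything named here (`QaCheck`, `runQaLoop`, `peakDbfs`, and the 3 dB attenuation fix) is a hypothetical illustration; this page does not document how cut-qa measures peaks or repairs a failing cut. The peak formula itself is the standard one: for samples normalized to [-1, 1], peak level in dBFS is 20·log10 of the largest absolute sample.

```typescript
// Peak level in dBFS for samples normalized to [-1, 1]; full scale is 0 dBFS.
function peakDbfs(samples: number[]): number {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  return peak === 0 ? -Infinity : 20 * Math.log10(peak);
}

interface QaCheck<T> {
  name: string;
  passes: (state: T) => boolean;
  fix: (state: T) => T; // hypothetical repair step for a failing check
}

interface QaResult<T> {
  state: T;
  failures: string[]; // checks still failing when the loop stopped
  iterations: number; // fix passes actually used
}

// Re-run all checks after each fix pass, stopping at maxFixIterations.
function runQaLoop<T>(initial: T, checks: QaCheck<T>[], maxFixIterations = 3): QaResult<T> {
  let state = initial;
  let iterations = 0;
  let failing = checks.filter((c) => !c.passes(state));
  while (failing.length > 0 && iterations < maxFixIterations) {
    for (const c of failing) state = c.fix(state);
    iterations++;
    failing = checks.filter((c) => !c.passes(state));
  }
  return { state, failures: failing.map((c) => c.name), iterations };
}

// Example check: fail when the peak is above -6 dBFS, fix by attenuating 3 dB.
const peakCheck: QaCheck<number[]> = {
  name: "audio-peak",
  passes: (samples) => peakDbfs(samples) <= -6,
  fix: (samples) => samples.map((s) => s * Math.pow(10, -3 / 20)),
};

const qaResult = runQaLoop([0.9, -0.4], [peakCheck]);
// The peak starts near -0.9 dBFS and passes after two 3 dB fix passes.
```

Capping the loop keeps a stubborn failure from consuming unbounded render time; whatever is still failing after the last pass is reported back rather than hidden.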
The editing pipeline is inspired by browser-use/video-use. Their core insight, “the LLM never watches the video, it reads it”, is what makes agent-driven editing work without drowning in frame-dump tokens.
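
For step 4, one way to turn a JSON cut list into an ffmpeg invocation is to build a `filter_complex` string that trims each cut, applies 30ms fades, and concatenates the results. The sketch below is an assumption-heavy illustration: the `EdlCut` shape and `buildAudioConcat` are hypothetical (the harness's EDL format is not shown on this page), it handles audio only, and it uses ffmpeg's `atrim`, `asetpts`, `afade`, and `concat` filters, which may not be the approach the harness takes.

```typescript
// Hypothetical EDL entry: a time range within one source file.
interface EdlCut {
  source: string;
  startSec: number;
  endSec: number;
}

interface AudioConcatPlan {
  inputs: string[]; // unique sources, in `-i` order
  filter: string;   // value for -filter_complex
}

function buildAudioConcat(cuts: EdlCut[], fadeSec = 0.03): AudioConcatPlan {
  if (cuts.length === 0) throw new Error("EDL has no cuts");
  const inputs: string[] = [];
  const chains = cuts.map((cut, i) => {
    // Reuse one ffmpeg input per distinct source file.
    let inputIndex = inputs.indexOf(cut.source);
    if (inputIndex === -1) {
      inputs.push(cut.source);
      inputIndex = inputs.length - 1;
    }
    const duration = cut.endSec - cut.startSec;
    if (duration <= 2 * fadeSec) {
      throw new Error(`Cut ${i} is too short for ${fadeSec}s fades`);
    }
    const fadeOutStart = (duration - fadeSec).toFixed(3);
    // Trim, reset timestamps to zero, then fade in and out at the cut edges.
    return (
      `[${inputIndex}:a]atrim=start=${cut.startSec}:end=${cut.endSec},` +
      `asetpts=PTS-STARTPTS,` +
      `afade=t=in:st=0:d=${fadeSec},` +
      `afade=t=out:st=${fadeOutStart}:d=${fadeSec}[a${i}]`
    );
  });
  const labels = cuts.map((_, i) => `[a${i}]`).join("");
  const concat = `${labels}concat=n=${cuts.length}:v=0:a=1[aout]`;
  return { inputs, filter: [...chains, concat].join(";") };
}

const plan = buildAudioConcat([
  { source: "a.mp4", startSec: 1, endSec: 3 },
  { source: "b.mp4", startSec: 0, endSec: 2.5 },
]);
// Pass plan.inputs as `-i` arguments, plan.filter to -filter_complex,
// and map the "[aout]" label to a NEW output file so originals stay untouched.
```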

Commands, agents, and skills

The harness exposes 19 workflow commands, 10 specialized agents, and 7 production skills. Highlights:

| Workflow command | Purpose |
| --- | --- |
| new-series | Create a new series with locked aesthetics |
| add-character / lock-character | Character + voice locking |
| workshop-episode | Collaborative episode scripting |
| storyboard-episode | Storyboard one episode |
| produce-episode | Full pipeline in one command |
| generate-trailer | Full trailer pipeline |
| edit-footage | Text-first editing pipeline for existing media |
| ingest-screenplay | Ingest a Fountain or PDF screenplay |

| Specialized agent | Role |
| --- | --- |
| art-director | Aesthetic, palette, lighting, composition decisions |
| prompt-engineer | Venice image prompts, character consistency |
| storyboard-qa | Panel QA for continuity and character checks |
| cut-qa | Post-render quality gate (6 checks per cut, max 3 iterations) |
| overlay-designer | Branded motion graphics, parallel sub-agents |
| trailer-curator | Trailer shot selection and anti-spoiler rules |

| Production skill | Purpose |
| --- | --- |
| venice-api | Venice REST API usage and defaults |
| venice-video-model-routing | R2V-first routing, decision trees |
| character-consistency | Multi-shot character consistency guidance |
| shot-composition | Shot composition and camera guidance |
| screenplay-parsing | Screenplay parsing workflows |
| video-editing | Text-first editing philosophy, EDL format, cut-qa loop |

NLE round-trip

After rendering, export the assembled timeline as XML for fine-tuning in your editor of choice. Every video segment, dialogue clip, SFX clip, and music cue lands on its own track.
mini-drama export-timeline -p output/<project> -e 1 --format fcpxml      # Final Cut Pro X
mini-drama export-timeline -p output/<project> -e 1 --format premiere    # Premiere Pro
mini-drama export-timeline -p output/<project> -e 1 --format davinci     # DaVinci Resolve

Programmatic usage

You can also call into the harness’s modules directly from your own TypeScript:
import { VeniceClient } from './src/venice/client.js';
import { generateVideo, quoteVideo } from './src/venice/video.js';
import { listVideoModels } from './src/venice/models.js';

const client = new VeniceClient();

const quote = await quoteVideo(client, {
  model: 'kling-v3-pro-image-to-video',
  duration: '8s',
  audio: true,
});
console.log(`Estimated cost: $${quote.quote}`);

const result = await generateVideo(client, {
  model: 'kling-v3-pro-image-to-video',
  prompt: 'A slow dolly shot pushes forward...',
  duration: '8s',
  imageUrl: 'data:image/png;base64,...',
  audio: true,
  outputPath: 'output/shot-001.mp4',
});

const longModels = listVideoModels({ minDurationSec: 20 });
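
Because quoting and generating are separate calls, they compose naturally into a budget guard. The helper below is a sketch, not part of the harness: `generateWithinBudget` is a hypothetical name, and it assumes the quote result exposes a numeric `quote` field in USD, as the `console.log` line above suggests. The quote and generate functions are injected so the helper does not depend on the harness's exact signatures.

```typescript
// Assumed quote shape: a numeric USD estimate under `quote`.
interface CostQuote {
  quote: number;
}

// Quote first, and only generate when the estimate fits the budget.
async function generateWithinBudget<P, R>(
  params: P,
  maxUsd: number,
  quoteFn: (params: P) => Promise<CostQuote>,
  generateFn: (params: P) => Promise<R>,
): Promise<R> {
  const { quote } = await quoteFn(params);
  if (quote > maxUsd) {
    throw new Error(`Quote $${quote} exceeds budget $${maxUsd}`);
  }
  return generateFn(params);
}

// With the imports from the snippet above, a call might look like:
//
//   const result = await generateWithinBudget(
//     { model: 'kling-v3-pro-image-to-video', duration: '8s', audio: true /* ... */ },
//     2.0,
//     (p) => quoteVideo(client, p),
//     (p) => generateVideo(client, p),
//   );
```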

Resources

GitHub

Source code, issues, and releases

Venice Video Generation

The underlying API the harness drives

Reference-to-Video

R2V guide for character consistency

Seedance 2.0

The harness’s default video family
Community-maintained and provided as-is. For harness-specific issues, file them on the project’s GitHub repo.