# Venice Video Harness

> Drive Venice from Claude Code or Cursor with a Venice-optimized harness for character-consistent videos, storyboards, trailers, and long-form narrative.

The [Venice Video Harness](https://github.com/jordanurbs/venice-video-harness) is a community, agent-first, Venice-optimized toolkit for **consistency-first video creation at any length**. It turns an IDE agent (Claude Code, Cursor, Codex, etc.) into an operator of a reusable Venice production system covering 50+ Venice video, image, audio, and music models.

<Card title="GitHub: venice-video-harness" icon="github" href="https://github.com/jordanurbs/venice-video-harness">
  MIT licensed. Community-maintained.
</Card>

<CardGroup cols={3}>
  <Card title="Character-consistent video" icon="users">
    Lock characters, voices, and aesthetics across an entire series
  </Card>

  <Card title="Storyboard-to-video" icon="film">
    Two-pass panel generation with Venice multi-edit refinement
  </Card>

  <Card title="Text-first editing" icon="scissors">
    Transcribe locally with whisper.cpp, cut from a compact text pack (\~12KB per 40 min of audio), and self-evaluate at every cut boundary
  </Card>
</CardGroup>

## What this is

Most Venice integrations are thin wrappers around API calls. The Venice Video Harness is the **higher-level layer** that sits between your agent and the Venice API:

* **Orchestration rules** in `CLAUDE.md`
* **Reusable playbooks** in `.claude/commands/` (19 workflow commands)
* **Specialized agents** in `.claude/agents/` (art-director, prompt-engineer, cut-qa, and more)
* **Venice production skills** in `.claude/skills/` (compatible with the [Agent Skills](/guides/integrations/venice-skills) format)
* **TypeScript execution layer** in `src/`
* **Comprehensive model registry** covering 50+ Venice video, image, audio, and music models

Built for creators producing:

* Character-consistent video projects (any genre, any length)
* Visual-style-locked series or campaigns
* Storyboard-to-video workflows
* Short-form and long-form narrative content
* Branded cinematic sequences, trailers, and teasers
* Recurring-character social series

## Getting started

### Requirements

<CardGroup cols={3}>
  <Card title="Node.js 20+" icon="node-js" href="https://nodejs.org/">
    Latest LTS recommended
  </Card>

  <Card title="ffmpeg + ffprobe" icon="terminal" href="https://ffmpeg.org/">
    On your PATH
  </Card>

  <Card title="Venice API key" icon="key" href="/guides/getting-started/generating-api-key">
    From [venice.ai/settings/api](https://venice.ai/settings/api)
  </Card>
</CardGroup>

Optional, for the editing pipeline: install `whisper-cpp` for local transcription and download a model. On macOS with Homebrew:

```bash theme={"system"}
brew install whisper-cpp
mkdir -p ~/.cache/whisper.cpp
curl -L -o ~/.cache/whisper.cpp/ggml-base.en.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```

### Setup

<Steps>
  <Step title="Clone the harness">
    ```bash theme={"system"}
    git clone https://github.com/jordanurbs/venice-video-harness.git
    cd venice-video-harness
    ```
  </Step>

  <Step title="Configure your API key">
    ```bash theme={"system"}
    cp .env.example .env
    # Add VENICE_API_KEY to .env
    ```
  </Step>

  <Step title="Install and build">
    ```bash theme={"system"}
    npm install
    npm run build
    ```
  </Step>

  <Step title="Open in your agent">
    Open the project in Cursor, Claude Code, or any IDE with agentic chat. The agent reads `CLAUDE.md` and the playbooks automatically.

    Try one of these first messages:

    * "Set up this Venice video harness for first use"
    * "Create a new character-consistent video series"
    * "Generate a 30-second branded video sequence"
    * "Build a multi-episode narrative with locked characters"
    * "Create a product launch trailer with consistent visual style"
  </Step>
</Steps>

## What's Venice-optimized about it

* **Image prompts tuned for Venice image models** like `seedream-v5-lite`, `nano-banana-pro`, `flux-2-pro/max`, and more
* **Two-pass panel generation** with Venice multi-edit refinement for character correction
* **Model-routing logic** for action, atmosphere, and character-consistency tiers
* **Reference-aware video generation** that uses `elements`, `reference_image_urls`, and `scene_image_urls` correctly per model
* **Environment-aware prompt adaptation** for daytime vs. nighttime scenes
* **Venice-native audio paths** for TTS (Kokoro, Qwen3, ElevenLabs), SFX, and music
* **Cost estimation** before generation via `/video/quote` and `/audio/quote`
* **Model-aware parameter building** that auto-skips parameters the target model doesn't support
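As an illustration of model-aware parameter building, the sketch below filters a request against a per-model capability table. It is not the harness's implementation: the `ModelCaps` flags, `buildParams`, and the table entries are hypothetical. The entries are keyed by family name (taken from the "Audio" and "Notes" columns of the video table on this page), where a real registry would presumably key on model IDs.

```typescript
// Hypothetical capability flags per model family (a sketch, not the harness's registry).
interface ModelCaps {
  audio: boolean;       // accepts an audio on/off flag
  endImageUrl: boolean; // accepts an end-frame target
}

// Illustrative entries following the video model table on this page.
const CAPS: Record<string, ModelCaps> = {
  "Kling V3": { audio: true, endImageUrl: true },
  "Kling 2.5 Turbo": { audio: false, endImageUrl: true },
  "Longcat": { audio: false, endImageUrl: false },
};

interface ShotParams {
  prompt: string;
  duration: string;
  audio?: boolean;
  end_image_url?: string;
}

// Copy only the parameters the target family supports and report what was dropped.
function buildParams(
  family: string,
  params: ShotParams,
): { body: Partial<ShotParams>; skipped: string[] } {
  const caps = CAPS[family];
  if (!caps) throw new Error(`No capability entry for ${family}`);
  const body: Partial<ShotParams> = { prompt: params.prompt, duration: params.duration };
  const skipped: string[] = [];
  if (params.audio !== undefined) {
    if (caps.audio) body.audio = params.audio;
    else skipped.push("audio");
  }
  if (params.end_image_url !== undefined) {
    if (caps.endImageUrl) body.end_image_url = params.end_image_url;
    else skipped.push("end_image_url");
  }
  return { body, skipped };
}
```

Returning the `skipped` list, rather than silently dropping fields, lets an agent log or surface which requested features a model could not honor.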

## Model routing defaults

The harness defaults are opinionated because consistency is the point. The current routing (April 2026):

**Seedance 2.0 R2V by default. Kling O3 R2V fallback for 3+ character scenes. Seedance 2.0 i2v for establishing shots.**

| Role                             | Default model                          | When used                                                                                   |
| -------------------------------- | -------------------------------------- | ------------------------------------------------------------------------------------------- |
| Character shots (1-2 characters) | `seedance-2-0-reference-to-video`      | Default R2V with flat `reference_image_urls`, `@Image` tags, up to 15s, native stereo audio |
| Character shots (3+ characters)  | `kling-o3-standard-reference-to-video` | Auto-fallback with structured `elements` for multi-character identity                       |
| Establishing / mood / action     | `seedance-2-0-image-to-video`          | No characters; epic cinematic quality, up to 15s                                            |

Each default can be overridden per project via `series.json → videoDefaults`. If your account lacks Seedance access, point `videoDefaults` at a non-Seedance pair instead: `kling-o3-standard-reference-to-video` for character shots and `veo3.1-fast-image-to-video` for establishing shots.
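The routing table reduces to a character-count switch with per-project overrides. This is a minimal sketch, assuming a `videoDefaults` object with three hypothetical fields (`characterModel`, `multiCharacterModel`, `establishingModel`); the real `series.json` schema may name or structure them differently.

```typescript
// Hypothetical shape of the per-project override; the real series.json schema may differ.
interface VideoDefaults {
  characterModel?: string;      // shots with 1-2 characters
  multiCharacterModel?: string; // shots with 3+ characters
  establishingModel?: string;   // shots with no characters
}

// Defaults from the routing table above (April 2026).
const DEFAULTS = {
  characterModel: "seedance-2-0-reference-to-video",
  multiCharacterModel: "kling-o3-standard-reference-to-video",
  establishingModel: "seedance-2-0-image-to-video",
};

// Pick a model by how many locked characters appear in the shot,
// preferring any project-level override for that role.
function routeShot(characterCount: number, overrides: VideoDefaults = {}): string {
  if (characterCount <= 0) return overrides.establishingModel ?? DEFAULTS.establishingModel;
  if (characterCount <= 2) return overrides.characterModel ?? DEFAULTS.characterModel;
  return overrides.multiCharacterModel ?? DEFAULTS.multiCharacterModel;
}
```

Using `??` per role means a partial override (for example, only swapping the establishing model) leaves the other defaults in place.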

<Note>
  **Seedance face rule:** Seedance 2.0 blocks face-bearing input images that weren't produced by `seedream-v5-lite` or `seedream-v5-lite-edit`. The harness handles this automatically by routing character-bearing image work through Seedream and running a pre-flight gate before every Seedance call.
</Note>
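A pre-flight gate like the one described can be as simple as a provenance check. This sketch assumes each image carries hypothetical `sourceModel` and `hasFace` fields; how the harness actually records provenance and detects faces is not documented here.

```typescript
// The two models whose face-bearing outputs Seedance 2.0 accepts (per the note above).
const SEEDANCE_SAFE_SOURCES = ["seedream-v5-lite", "seedream-v5-lite-edit"];

// Hypothetical asset record; the harness's provenance tracking may differ.
interface ImageAsset {
  path: string;
  sourceModel: string; // model that produced the image
  hasFace: boolean;    // e.g. from a vision check or storyboard metadata
}

// Return the assets Seedance would reject, so the caller can re-render them
// through Seedream before spending a Seedance call.
function seedancePreflight(assets: ImageAsset[]): ImageAsset[] {
  return assets.filter(
    (a) => a.hasFace && SEEDANCE_SAFE_SOURCES.indexOf(a.sourceModel) === -1,
  );
}
```

An empty result means the batch is safe to send; any returned asset is a candidate for a `seedream-v5-lite-edit` pass.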

## Supported Venice models

### Video (April 2026)

| Family                    | i2v / R2V variants         | t2v variants          | Max duration        | Audio                           | Notes                                                       |
| ------------------------- | -------------------------- | --------------------- | ------------------- | ------------------------------- | ----------------------------------------------------------- |
| **Seedance 2.0**          | i2v, R2V                   | Yes                   | 15s                 | Yes (stereo, lip-sync 8+ langs) | #1 ranked. R2V: flat `reference_image_urls`, `@Image` tags. |
| **Kling V3**              | Pro, Standard              | Pro, Standard         | 15s                 | Yes                             | `end_image_url` for frame targeting                         |
| **Kling O3**              | Pro, Std, Pro R2V, Std R2V | Pro, Standard         | 15s                 | Yes                             | R2V: `elements`, `reference_image_urls`, `scene_image_urls` |
| **Kling 2.6 / 2.5 Turbo** | Pro                        | Pro                   | 10s                 | 2.6: Yes / 2.5: No              | `end_image_url`                                             |
| **Veo 3.1**               | Fast, Full                 | Fast, Full            | 8s                  | Yes                             | Up to 4K resolution                                         |
| **Sora 2**                | Standard, Pro              | Standard, Pro         | 12s                 | Yes                             | Up to 1080p                                                 |
| **Wan 2.6 / 2.5**         | 2.6: Std, Flash / 2.5: Yes | 2.6: Std / 2.5: Yes   | 2.6: 15s / 2.5: 10s | Yes                             | `audio_url` input                                           |
| **LTX Video 2.0**         | Fast, Full, v2.3, 19B      | Fast, Full, v2.3, 19B | 20s                 | Yes                             | Up to 4K; longest clips with synced audio                   |
| **Longcat**               | Std, Distilled             | Std, Distilled        | **30s**             | No                              | Longest single-shot                                         |
| **Vidu Q3**               | Yes                        | Yes                   | 16s                 | Yes                             | `reference_image_urls`                                      |
| **PixVerse v5.6**         | Std, Transition            | Standard              | 8s                  | Yes                             | Transition: `end_image_url`                                 |
| **Grok Imagine**          | Yes                        | Yes                   | 15s                 | Yes                             | Wide aspect ratio support                                   |
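For shot planning, the duration and audio columns can be queried directly. This sketch hard-codes the table above (using the newer version where a row covers two), and `familiesFor` is a hypothetical helper, not the harness's `listVideoModels`.

```typescript
// Max single-shot duration and native audio per family, transcribed from the table above.
// Where a row covers two versions, the newer one (Kling 2.6, Wan 2.6) is used.
interface FamilySpec { family: string; maxSec: number; audio: boolean }

const FAMILIES: FamilySpec[] = [
  { family: "Seedance 2.0", maxSec: 15, audio: true },
  { family: "Kling V3", maxSec: 15, audio: true },
  { family: "Kling O3", maxSec: 15, audio: true },
  { family: "Kling 2.6", maxSec: 10, audio: true },
  { family: "Veo 3.1", maxSec: 8, audio: true },
  { family: "Sora 2", maxSec: 12, audio: true },
  { family: "Wan 2.6", maxSec: 15, audio: true },
  { family: "LTX Video 2.0", maxSec: 20, audio: true },
  { family: "Longcat", maxSec: 30, audio: false },
  { family: "Vidu Q3", maxSec: 16, audio: true },
  { family: "PixVerse v5.6", maxSec: 8, audio: true },
  { family: "Grok Imagine", maxSec: 15, audio: true },
];

// Families that can cover a shot of `seconds` in one generation, optionally with audio.
function familiesFor(seconds: number, needAudio = false): string[] {
  return FAMILIES
    .filter((f) => f.maxSec >= seconds && (!needAudio || f.audio))
    .map((f) => f.family);
}
```

For example, `familiesFor(20, true)` leaves only LTX Video 2.0, and `familiesFor(21)` leaves only Longcat.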

### Image, audio, and music

* **Image (22+ models):** `nano-banana-pro/2`, `gpt-image-2`, `flux-2-pro/max`, `grok-imagine`, `qwen-image-2-pro`, `recraft-v4-pro`, `seedream-v4` / `v5-lite`, `lustify-sdxl/v7`, `wai-Illustrious`, and more
* **Multi-edit:** `qwen-edit`, `flux-2-max-edit`, `nano-banana-pro-edit`, `seedream-v5-lite-edit`, `gpt-image-2-edit`, and more
* **TTS:** `tts-kokoro` (50+ voices), `tts-qwen3-0-6b/1-7b`, `elevenlabs-tts-v3`, `elevenlabs-tts-multilingual-v2`
* **Music:** `elevenlabs-music`, `minimax-music-v2`, `ace-step-15`, `stable-audio-25`
* **SFX:** `elevenlabs-sound-effects-v2`, `mmaudio-v2-text-to-audio`

## Production pipelines

### Generation pipeline

End-to-end narrative video (script → storyboard → video → audio → assembly):

```bash theme={"system"}
npm run dev -- produce-episode -p output/my-series -e 1
```

Reference implementation in `src/mini-drama/` covers:

* Series / character / episode management
* LLM-powered script workshopping
* Two-pass storyboard generation (generate + multi-edit refine)
* Vision-based panel QA
* Video generation with frame chaining
* Layered audio post-production
* Subtitle burn-in and final assembly
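The source does not spell out how frame chaining works. One common approach, assumed here, is to seed each image-to-video call with the last frame of the previous clip. The sketch plans that hand-off and builds ffmpeg arguments for the extraction using a common `-sseof` / `-update` idiom; `Shot`, `chainPlan`, and the `frames/` and `clips/` paths are hypothetical.

```typescript
// Hypothetical shot record; field names are illustrative.
interface Shot { id: string; panelImage: string }

// ffmpeg arguments that grab the final frame of a rendered clip:
// seek to 1s before the end, then keep overwriting one PNG so the last decoded frame wins.
function lastFrameArgs(clip: string, outPng: string): string[] {
  return ["-y", "-sseof", "-1", "-i", clip, "-update", "1", outPng];
}

interface ChainStep { id: string; startImage: string; extract?: string[] }

// The first shot starts from its storyboard panel; each later shot starts
// from the previous clip's last frame, which must be extracted first.
function chainPlan(shots: Shot[]): ChainStep[] {
  return shots.map((shot, i) => {
    if (i === 0) return { id: shot.id, startImage: shot.panelImage };
    const prev = shots[i - 1];
    const frame = `frames/${prev.id}-last.png`;
    return {
      id: shot.id,
      startImage: frame,
      extract: lastFrameArgs(`clips/${prev.id}.mp4`, frame),
    };
  });
}
```

To run an extraction, pass the arguments to Node's `child_process.execFile("ffmpeg", args)`. A real pipeline would likely chain only shots meant to be continuous, since a storyboard panel carries the intended framing for a new shot.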

### Editing pipeline

Cut existing media (Venice-generated shots or real raw footage). **Text-first**: the LLM reads a compact `takes_packed.md` (\~12KB per 40 min of audio) instead of frame-dumping the video.
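The packing step is what keeps the LLM's input small. The sketch below shows the idea under stated assumptions: a hypothetical `Word` shape (whisper.cpp's actual JSON output is structured differently), a 0.5s pause threshold for splitting phrases, and a `[start-end] text` line format that is illustrative, not the real `takes_packed.md` layout.

```typescript
// Hypothetical word record; whisper.cpp's JSON output has a different, richer layout.
interface Word { text: string; start: number; end: number } // seconds

// Collapse word timings into one line per phrase, breaking on pauses.
// Each line keeps just enough timing for the LLM to cite cut points.
function packTranscript(source: string, words: Word[], maxGapSec = 0.5): string {
  const lines: string[] = [`## ${source}`];
  let phrase: Word[] = [];
  const flush = () => {
    if (phrase.length === 0) return;
    const start = phrase[0].start.toFixed(2);
    const end = phrase[phrase.length - 1].end.toFixed(2);
    lines.push(`[${start}-${end}] ${phrase.map((w) => w.text).join(" ")}`);
    phrase = [];
  };
  for (const w of words) {
    const last = phrase[phrase.length - 1];
    // A pause longer than the threshold ends the current phrase.
    if (last && w.start - last.end > maxGapSec) flush();
    phrase.push(w);
  }
  flush();
  return lines.join("\n");
}
```

Per-word timings stay in the `*.words.json` files, so the pack only needs phrase-level timestamps; exact cut points can be resolved back against the word list at render time.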

The five steps:

<Steps>
  <Step title="Transcribe">
    Local whisper.cpp produces per-source `*.words.json` + `takes_packed.md`.
  </Step>

  <Step title="Read the pack">
    The LLM forms a cut strategy from text alone.
  </Step>

  <Step title="Confirm">
    Proposes the strategy and waits for "yes / revise / cancel".
  </Step>

  <Step title="Render the EDL">
    JSON cut list → ffmpeg concat with 30ms audio fades. Archive-first, so originals are never overwritten.
  </Step>

  <Step title="Self-eval">
    The `cut-qa` agent runs 6 programmatic checks at every cut boundary; max 3 fix iterations.
  </Step>
</Steps>
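The render step can be sketched as an ffmpeg `filter_complex` graph: trim each cut from its source, apply a 30ms fade-in and fade-out to its audio, and concatenate. This is an assumed reconstruction, not the harness's renderer; the `Cut` schema and `renderArgs` are hypothetical, and refusing to write over a source file stands in for the archive-first rule.

```typescript
// Hypothetical EDL entry; the harness's JSON cut-list schema may differ.
interface Cut { source: string; start: number; end: number } // seconds

const FADE = 0.03; // 30ms audio fade at each cut edge

// Build ffmpeg arguments that trim, fade, and concatenate every cut in order.
function renderArgs(edl: Cut[], output: string): string[] {
  // One -i per distinct source; repeated sources reuse the same input index.
  const sources: string[] = [];
  for (const c of edl) if (sources.indexOf(c.source) === -1) sources.push(c.source);
  // Archive-first: never write over an original.
  if (sources.indexOf(output) !== -1) throw new Error("Output would overwrite a source");

  const parts: string[] = [];
  const labels: string[] = [];
  edl.forEach((c, i) => {
    const n = sources.indexOf(c.source);
    const s = c.start.toFixed(3);
    const e = c.end.toFixed(3);
    const fadeOutAt = (c.end - c.start - FADE).toFixed(3);
    parts.push(`[${n}:v]trim=start=${s}:end=${e},setpts=PTS-STARTPTS[v${i}]`);
    parts.push(
      `[${n}:a]atrim=start=${s}:end=${e},asetpts=PTS-STARTPTS,` +
        `afade=t=in:st=0:d=${FADE},afade=t=out:st=${fadeOutAt}:d=${FADE}[a${i}]`,
    );
    labels.push(`[v${i}][a${i}]`);
  });
  // concat expects each segment's video and audio pads interleaved, in order.
  parts.push(`${labels.join("")}concat=n=${edl.length}:v=1:a=1[outv][outa]`);

  const inputs: string[] = [];
  for (const src of sources) inputs.push("-i", src);
  return ["-y", ...inputs, "-filter_complex", parts.join(";"), "-map", "[outv]", "-map", "[outa]", output];
}
```

The short fades remove clicks at cut points without audibly clipping words, which is presumably why the pipeline uses a fixed 30ms value rather than a crossfade.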

The `cut-qa` checks catch aspect-ratio regressions, frame-hash jumps inside a word, VO truncation, lighting discontinuity, audio peaks above -6 dBFS, and caption overlap with in-frame text.
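To make the self-eval loop concrete, here is one of the six checks (the -6 dBFS peak ceiling) plus a bounded fix loop. It is a sketch: it measures sample peak on float samples in [-1, 1] rather than oversampled true peak, and `selfEval` with its `check` and `fix` callbacks is hypothetical scaffolding, not the `cut-qa` agent's interface.

```typescript
// Peak level in dBFS for float samples in [-1, 1]; 0 dBFS is full scale.
function peakDbfs(samples: number[]): number {
  let peak = 0;
  for (const s of samples) peak = Math.max(peak, Math.abs(s));
  return peak === 0 ? -Infinity : (20 * Math.log(peak)) / Math.LN10;
}

// One of the six checks: a boundary passes if its audio peaks at or below -6 dBFS.
const PEAK_CEILING_DBFS = -6;
function peakCheck(samples: number[]): boolean {
  return peakDbfs(samples) <= PEAK_CEILING_DBFS;
}

// Hypothetical self-eval loop: re-run the check after each fix, stop after maxFixes rounds.
function selfEval(
  check: () => boolean,
  fix: () => void,
  maxFixes = 3,
): { passed: boolean; fixes: number } {
  let fixes = 0;
  while (!check()) {
    if (fixes === maxFixes) return { passed: false, fixes };
    fix();
    fixes++;
  }
  return { passed: true, fixes };
}
```

Capping the loop keeps a check that can never pass (for example, clipping baked into the source) from stalling the render; the failure is reported instead.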

<Tip>
  The editing pipeline is inspired by [browser-use/video-use](https://github.com/browser-use/video-use). Their core insight, *"the LLM never watches the video, it reads it"*, is what makes agent-driven editing work without drowning in frame-dump tokens.
</Tip>

## Commands, agents, and skills

The harness exposes 19 workflow commands, 10 specialized agents, and 7 production skills. Highlights:

| Workflow command                   | Purpose                                        |
| ---------------------------------- | ---------------------------------------------- |
| `new-series`                       | Create a new series with locked aesthetics     |
| `add-character` / `lock-character` | Character + voice locking                      |
| `workshop-episode`                 | Collaborative episode scripting                |
| `storyboard-episode`               | Storyboard one episode                         |
| `produce-episode`                  | Full pipeline in one command                   |
| `generate-trailer`                 | Full trailer pipeline                          |
| `edit-footage`                     | Text-first editing pipeline for existing media |
| `ingest-screenplay`                | Ingest a Fountain or PDF screenplay            |

| Specialized agent  | Role                                                          |
| ------------------ | ------------------------------------------------------------- |
| `art-director`     | Aesthetic, palette, lighting, composition decisions           |
| `prompt-engineer`  | Venice image prompts, character consistency                   |
| `storyboard-qa`    | Panel QA for continuity and character checks                  |
| `cut-qa`           | Post-render quality gate (6 checks per cut, max 3 iterations) |
| `overlay-designer` | Branded motion graphics, parallel sub-agents                  |
| `trailer-curator`  | Trailer shot selection and anti-spoiler rules                 |

| Production skill             | Purpose                                                |
| ---------------------------- | ------------------------------------------------------ |
| `venice-api`                 | Venice REST API usage and defaults                     |
| `venice-video-model-routing` | R2V-first routing, decision trees                      |
| `character-consistency`      | Multi-shot character consistency guidance              |
| `shot-composition`           | Shot composition and camera guidance                   |
| `screenplay-parsing`         | Screenplay parsing workflows                           |
| `video-editing`              | Text-first editing philosophy, EDL format, cut-qa loop |

## NLE round-trip

After rendering, export the assembled timeline as XML for final adjustments in your editor of choice. Every video segment, dialogue clip, SFX clip, and music cue lands on its own track.

```bash theme={"system"}
mini-drama export-timeline -p output/<project> -e 1 --format fcpxml      # Final Cut Pro X
mini-drama export-timeline -p output/<project> -e 1 --format premiere    # Premiere Pro
mini-drama export-timeline -p output/<project> -e 1 --format davinci     # DaVinci Resolve
```
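Assuming "its own track" means one track per clip kind, with overlapping clips of the same kind (two SFX at once, say) pushed onto an extra lane, the layout can be computed before any XML is written. The `Clip` and `Track` types and `toTracks` are hypothetical; the actual FCPXML, Premiere, and DaVinci writers will have their own models.

```typescript
// Hypothetical clip record; the exporter's internal model may differ.
type ClipKind = "video" | "dialogue" | "sfx" | "music";
interface Clip { kind: ClipKind; file: string; start: number; duration: number } // seconds
interface Track { kind: ClipKind; lane: number; clips: Clip[] }

// Fixed stacking order for the exported tracks.
const TRACK_ORDER: ClipKind[] = ["video", "dialogue", "sfx", "music"];

// Group clips by kind; within a kind, place each clip on the first lane that is free.
function toTracks(clips: Clip[]): Track[] {
  const tracks: Track[] = [];
  for (const kind of TRACK_ORDER) {
    const lanes: Clip[][] = [];
    const sorted = clips.filter((c) => c.kind === kind).sort((a, b) => a.start - b.start);
    for (const clip of sorted) {
      let lane = 0;
      while (lane < lanes.length) {
        const prev = lanes[lane][lanes[lane].length - 1];
        if (prev.start + prev.duration <= clip.start) break; // lane is free again
        lane++;
      }
      if (lane === lanes.length) lanes.push([]);
      lanes[lane].push(clip);
    }
    lanes.forEach((laneClips, i) => tracks.push({ kind, lane: i, clips: laneClips }));
  }
  return tracks;
}
```

Kinds with no clips produce no track, so a music-free episode exports without an empty music track.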

## Programmatic usage

You can also call into the harness's modules directly from your own TypeScript:

```typescript theme={"system"}
// Run as an ES module from the repo root: top-level await and the .js import paths require it.
import { VeniceClient } from './src/venice/client.js';
import { generateVideo, quoteVideo } from './src/venice/video.js';
import { listVideoModels } from './src/venice/models.js';

const client = new VeniceClient();

// Estimate cost before generating (the harness's cost estimation uses Venice's /video/quote).
const quote = await quoteVideo(client, {
  model: 'kling-v3-pro-image-to-video',
  duration: '8s',
  audio: true,
});
console.log(`Estimated cost: $${quote.quote}`);

// Generate an 8s clip with audio from a start image and save it to outputPath.
const result = await generateVideo(client, {
  model: 'kling-v3-pro-image-to-video',
  prompt: 'A slow dolly shot pushes forward...',
  duration: '8s',
  imageUrl: 'data:image/png;base64,...',
  audio: true,
  outputPath: 'output/shot-001.mp4',
});

// Query the model registry for models that can run shots of at least 20s.
const longModels = listVideoModels({ minDurationSec: 20 });
```
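Since every generation can be quoted first, a small budget gate avoids surprise spend. This sketch takes the quote and generate steps as callbacks, so it makes no assumptions about the harness's function signatures; `generateWithinBudget` and the USD budget parameter are hypothetical.

```typescript
// Hypothetical wrapper: quote first, generate only if the estimate fits the budget.
async function generateWithinBudget<T>(
  quote: () => Promise<number>, // resolves to an estimated cost in USD
  generate: () => Promise<T>,   // performs the actual (billed) generation
  budgetUsd: number,
): Promise<{ ok: true; cost: number; result: T } | { ok: false; cost: number }> {
  const cost = await quote();
  // Over budget: skip generation entirely and report the quote.
  if (cost > budgetUsd) return { ok: false, cost };
  const result = await generate();
  return { ok: true, cost, result };
}
```

With the example above, the callbacks could wrap `quoteVideo` and `generateVideo`, assuming `quote.quote` is a numeric USD estimate.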

## Resources

<CardGroup cols={2}>
  <Card title="GitHub" icon="github" href="https://github.com/jordanurbs/venice-video-harness">
    Source code, issues, and releases
  </Card>

  <Card title="Venice Video Generation" icon="film" href="/guides/media/video-generation">
    The underlying API the harness drives
  </Card>

  <Card title="Reference-to-Video" icon="image" href="/guides/media/reference-to-video">
    R2V guide for character consistency
  </Card>

  <Card title="Seedance 2.0" icon="bolt" href="/guides/media/seedance-2-0">
    The harness's default video family
  </Card>
</CardGroup>

<Note>
  Community-maintained and provided as-is. For harness-specific issues, file them on the [project's GitHub repo](https://github.com/jordanurbs/venice-video-harness/issues).
</Note>
