Voice Cloning Guide | Venice API Docs

Voice cloning lets you generate speech in a voice derived from a short reference audio sample. With tts-chatterbox-hd, you upload a sample to /audio/voices, save the returned vv_... voice handle, then pass that handle to /audio/speech.

Voice handles are model-specific. A handle created with tts-chatterbox-hd must be used with tts-chatterbox-hd.

How it works

Upload - Send a clean reference audio file to POST /audio/voices
Save - Store the id field from the response; this is the voice handle
Generate - Send the handle as voice in POST /audio/speech

Prerequisites

A Venice API key
A clean reference sample in MP3, WAV, FLAC, or M4A format
At least 5 seconds of clear speech from one speaker, ideally 10 seconds or more

Set your API key:

export VENICE_API_KEY="your-api-key"
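
Before uploading, it can help to check the sample's file extension locally against the formats listed above. The sketch below is a hypothetical helper (is_supported_sample is not part of the Venice API) and it inspects only the file name, not the audio encoding:

```shell
# Hypothetical helper: succeeds if the file name ends in one of the
# formats this guide lists (MP3, WAV, FLAC, M4A), case-insensitively.
# It checks the extension only, not the actual audio content.
is_supported_sample() {
  # A name without a dot has no extension to check.
  case "$1" in
    *.*) ;;
    *) return 1 ;;
  esac
  # Take the text after the last dot and lowercase it.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|flac|m4a) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, is_supported_sample ./reference-voice.wav succeeds, while a .txt file is rejected before any request is sent.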

Step 1: Upload a voice sample

Create a voice handle by uploading the reference audio as multipart form data:

curl https://api.venice.ai/api/v1/audio/voices \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -F "model=tts-chatterbox-hd" \
  -F "file=@./reference-voice.wav"

Response (200):

{
  "id": "vv_voice_abc123xyz",
  "model": "tts-chatterbox-hd"
}

Save the id for speech generation:

export VENICE_VOICE_ID="vv_voice_abc123xyz"
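
A quick sanity check on the saved value can catch a failed upload early. For instance, extracting a missing field with jq -r prints the literal string null. The helper below is a hypothetical sketch that relies only on the vv_ prefix described in this guide:

```shell
# Hypothetical helper: succeeds if the value looks like a voice handle,
# meaning it starts with "vv_" and has at least one more character.
# It does not confirm that the handle exists or is still valid.
is_voice_handle() {
  case "$1" in
    vv_?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A script could run is_voice_handle "$VENICE_VOICE_ID" and stop before calling /audio/speech if the check fails.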

Step 2: Generate speech

Pass the cloned voice handle as voice in the speech request:

curl https://api.venice.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-chatterbox-hd",
    "voice": "'"$VENICE_VOICE_ID"'",
    "input": "Hello from Venice. This audio is generated with a cloned Chatterbox HD voice.",
    "response_format": "mp3"
  }' \
  --output chatterbox-clone.mp3

The response body is binary audio in the requested format.
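
The inline JSON in the request above is fine for fixed text, but it produces invalid JSON if the input contains a double quote or a newline. One way around this, sketched here with a hypothetical build_speech_payload helper and assuming jq is installed, is to let jq assemble and escape the body:

```shell
# Hypothetical helper: prints a JSON body for POST /audio/speech.
#   $1 = voice handle, $2 = text to synthesize
# jq --arg passes each value as a string and escapes quotes,
# backslashes, and newlines when it writes the JSON.
build_speech_payload() {
  jq -n --arg voice "$1" --arg input "$2" \
    '{model: "tts-chatterbox-hd", voice: $voice, input: $input, response_format: "mp3"}'
}
```

The output can then replace the literal body, for example -d "$(build_speech_payload "$VENICE_VOICE_ID" 'She said "hello".')".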

Complete example

This example uploads a reference sample, extracts the voice handle with jq, and writes the generated audio to chatterbox-clone.mp3:

VOICE_ID=$(
  curl -s https://api.venice.ai/api/v1/audio/voices \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -F "model=tts-chatterbox-hd" \
    -F "file=@./reference-voice.wav" | jq -r '.id'
)

curl https://api.venice.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-chatterbox-hd",
    "voice": "'"$VOICE_ID"'",
    "input": "This is a complete Chatterbox HD voice cloning example.",
    "response_format": "mp3",
    "speed": 1
  }' \
  --output chatterbox-clone.mp3

Voice sample tips

Use a sample with one speaker, minimal background noise, and no music. Natural speech works better than whispered, sung, or heavily processed audio. Longer samples can help when the voice has distinctive pacing, accent, or tone, but keep the sample focused on the target speaker.

Handle expiration

Chatterbox HD cloning is zero-shot: Venice stores the uploaded reference audio temporarily, and the model reads it when you synthesize speech. No persistent voice template is created. Voice handles expire automatically after 7 days. After a handle expires, upload the reference sample again to create a new vv_... handle.
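
The upload response shown in this guide contains only id and model, so tracking a handle's age is left to the client. The sketch below assumes you record the upload time yourself as epoch seconds. The handle_expired helper is hypothetical, and treating a handle as expired at exactly 7 days is a conservative assumption rather than a documented boundary:

```shell
# 7-day retention window from this guide, in seconds.
RETENTION_SECONDS=$((7 * 24 * 60 * 60))

# Hypothetical helper: succeeds if a handle created at epoch second $1
# is at least 7 days old at epoch second $2.
handle_expired() {
  created=$1
  now=$2
  [ $((now - created)) -ge "$RETENTION_SECONDS" ]
}
```

A script might call handle_expired "$CREATED_AT" "$(date +%s)" and upload the reference sample again when the check succeeds.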

Discover cloning support

Models that support cloning include a voice_cloning object in the model spec. Query TTS models to check supported formats, minimum sample length, and retention:

curl "https://api.venice.ai/api/v1/models?type=tts" \
  -H "Authorization: Bearer $VENICE_API_KEY"

tts-chatterbox-hd advertises:

{
  "voice_cloning": {
    "mode": "zero_shot",
    "accepted_formats": ["mp3", "wav", "flac", "m4a"],
    "min_sample_seconds": 5,
    "retention_days": 7
  }
}
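
To pull these objects out of the models response without depending on its exact layout, a recursive jq filter is one option. This is a sketch, assuming jq is installed. The find_voice_cloning name is hypothetical, and the overall shape of the response is not shown in this guide, which is why the filter searches every nesting level:

```shell
# Hypothetical helper: reads a JSON document on stdin and prints every
# voice_cloning object found at any depth, one compact object per line.
find_voice_cloning() {
  jq -c '.. | objects | select(has("voice_cloning")) | .voice_cloning'
}
```

Piping the output of the models request above into find_voice_cloning prints one line per model that advertises cloning support.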

API parameters

Create voice

Field	Type	Required	Description
`model`	string	Yes	Must be `tts-chatterbox-hd`
`file`	file	Yes	Reference audio sample. Supported formats are MP3, WAV, FLAC, and M4A.

Generate speech

Field	Type	Required	Default	Description
`model`	string	Yes	-	Must match the model used to create the voice handle
`voice`	string	Yes	-	The `vv_...` handle returned by `POST /audio/voices`
`input`	string	Yes	-	Text to synthesize, up to 4096 characters
`response_format`	string	No	`mp3`	`mp3`, `opus`, `aac`, `flac`, `wav`, or `pcm`
`speed`	number	No	`1`	Speech speed from `0.25` to `4.0`
`temperature`	number	No	-	Sampling temperature from `0` to `2`. Higher values can add variation.
`streaming`	boolean	No	`false`	Stream audio sentence by sentence
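
The limits in the table above can be checked locally before sending a request. The helpers below are hypothetical sketches, not part of the API, and they cover only the input length and speed range:

```shell
# Hypothetical helper: succeeds if the text is within the 4096-character
# limit. ${#1} counts characters in bash under a UTF-8 locale; some
# shells count bytes instead, which is stricter for non-ASCII text.
valid_input_length() {
  [ "${#1}" -le 4096 ]
}

# Hypothetical helper: succeeds if speed is between 0.25 and 4.0.
# awk does the decimal comparison that shell arithmetic cannot;
# non-numeric values evaluate to 0 and are rejected.
valid_speed() {
  awk -v s="$1" 'BEGIN { exit !(s + 0 >= 0.25 && s + 0 <= 4.0) }'
}
```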

Common errors

Status	Cause	Fix
`400`	Unsupported audio container or incompatible voice handle	Use MP3, WAV, FLAC, or M4A and pair the handle with the same model used to create it.
`401`	Missing or invalid API key	Send `Authorization: Bearer $VENICE_API_KEY`.
`402`	Insufficient balance	Top up your Venice balance.
`413`	Uploaded file is too large	Use a shorter or more compressed reference sample.
`429`	Rate limit exceeded	Retry after the rate limit window resets.
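
In a script, the status codes above can be turned into short hints. The explain_status helper below is a hypothetical sketch that simply mirrors the table. It does not parse the error body, whose format is not described here:

```shell
# Hypothetical helper: maps an HTTP status code to a hint based on the
# error table in this guide.
explain_status() {
  case "$1" in
    200) echo "ok" ;;
    400) echo "check the audio format and that the handle matches the model" ;;
    401) echo "check the API key" ;;
    402) echo "top up your balance" ;;
    413) echo "use a shorter or more compressed sample" ;;
    429) echo "rate limited; retry after the window resets" ;;
    *) echo "unexpected status $1" ;;
  esac
}
```

With curl, the status can be captured separately from the audio by adding -s -o chatterbox-clone.mp3 -w '%{http_code}' to the request and passing the printed code to explain_status.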
