Voice cloning lets you generate speech in a voice cloned from a short reference audio sample. With tts-chatterbox-hd, upload a sample to /audio/voices, save the returned vv_... voice handle, then pass that handle to /audio/speech.
Voice handles are model-specific. A handle created with tts-chatterbox-hd must be used with tts-chatterbox-hd.

How it works

  1. Upload - Send a clean reference audio file to POST /audio/voices
  2. Save - Store the id from the response; this is the voice handle
  3. Generate - Send the handle as voice in POST /audio/speech

Prerequisites

  • A Venice API key
  • A clean reference sample in MP3, WAV, FLAC, or M4A format
  • At least 5 seconds of clear speech from one speaker, and preferably 10 seconds or more
Set your API key:
export VENICE_API_KEY="your-api-key"
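Before uploading, it can help to confirm locally that the sample exists and uses one of the listed formats. The following is a minimal sketch: is_supported_sample is a hypothetical helper name, and it inspects only the file name, not the actual audio container or the duration.

```shell
# Hypothetical helper: succeeds if the file exists and its name ends in one
# of the formats listed in the prerequisites (MP3, WAV, FLAC, M4A).
# This checks the extension only; it does not decode the audio.
is_supported_sample() {
  sample_path=$1
  [ -f "$sample_path" ] || return 1
  # Lowercase the path so "VOICE.WAV" is treated like "voice.wav".
  lower=$(printf '%s' "$sample_path" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    *.mp3|*.wav|*.flac|*.m4a) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_sample ./reference-voice.wav || echo "unsupported sample" >&2` prints a warning when the file is missing or has a different extension.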

Step 1: Upload a voice sample

Create a voice handle by uploading the reference audio as multipart form data:
curl https://api.venice.ai/api/v1/audio/voices \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -F "model=tts-chatterbox-hd" \
  -F "file=@./reference-voice.wav"
Response (200):
{
  "id": "vv_voice_abc123xyz",
  "model": "tts-chatterbox-hd"
}
Save the id for speech generation:
export VENICE_VOICE_ID="vv_voice_abc123xyz"

Step 2: Generate speech

Pass the cloned voice handle as voice in the speech request:
curl https://api.venice.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-chatterbox-hd",
    "voice": "'"$VENICE_VOICE_ID"'",
    "input": "Hello from Venice. This audio is generated with a cloned Chatterbox HD voice.",
    "response_format": "mp3"
  }' \
  --output chatterbox-clone.mp3
The response body is binary audio in the requested format.
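Because curl writes whatever the server returns to the output file, a failed request can leave an error message saved under an .mp3 name. The sketch below is a rough local sanity check. It assumes error responses are JSON objects (the source does not state this), and audio_output_ok is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if the output file is non-empty and does not
# look like a JSON error body. Assumption: errors are returned as JSON objects.
audio_output_ok() {
  out_file=$1
  # A missing or empty file means no audio was written.
  [ -s "$out_file" ] || return 1
  # A leading "{" suggests a JSON body rather than binary audio.
  if head -c 1 "$out_file" | grep -qF '{'; then
    return 1
  fi
  return 0
}
```

Alternatively, adding curl's --fail flag makes curl exit with a non-zero status on HTTP errors instead of saving the error body to the output file.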

Complete example

This example uploads a reference sample, extracts the voice handle with jq, and writes the generated audio to chatterbox-clone.mp3:
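# Upload the sample and capture the handle.
# --fail makes curl exit non-zero on HTTP errors; -sS hides progress but keeps error messages.
VOICE_ID=$(
  curl -sS --fail https://api.venice.ai/api/v1/audio/voices \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -F "model=tts-chatterbox-hd" \
    -F "file=@./reference-voice.wav" | jq -r '.id // empty'
)
# In a script, abort here if no handle came back (failed upload or unexpected response).
: "${VOICE_ID:?upload did not return a voice handle}"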

curl https://api.venice.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-chatterbox-hd",
    "voice": "'"$VOICE_ID"'",
    "input": "This is a complete Chatterbox HD voice cloning example.",
    "response_format": "mp3",
    "speed": 1
  }' \
  --output chatterbox-clone.mp3

Voice sample tips

Use a sample with one speaker, minimal background noise, and no music. Natural speech works better than whispered, sung, or heavily processed audio. Longer samples can help when the voice has distinctive pacing, accent, or tone, but keep the sample focused on the target speaker.

Handle expiration

Chatterbox HD cloning is zero-shot: Venice stores the uploaded reference audio temporarily, and the model reads it when you synthesize speech. No persistent voice template is created. Voice handles expire automatically after 7 days. After a handle expires, upload the reference sample again to create a new vv_... handle.
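One way to avoid sending an expired handle is to record when it was created and compare that against the 7-day window before each request. This sketch assumes the window starts at upload time and that the local clock is close enough to the server's; handle_expired and HANDLE_TTL_SECONDS are hypothetical names.

```shell
# 7 days in seconds, matching the documented retention period.
HANDLE_TTL_SECONDS=$((7 * 24 * 60 * 60))

# Hypothetical helper: succeeds if a handle created at $1 (Unix seconds)
# has reached the 7-day limit at time $2 (defaults to the current time).
handle_expired() {
  created_at=$1
  now=${2:-$(date +%s)}
  [ $((now - created_at)) -ge "$HANDLE_TTL_SECONDS" ]
}
```

A script could store `date +%s` next to the handle at upload time, then upload the sample again whenever `handle_expired "$created_at"` succeeds. The server remains the authority on expiry, so a request can still fail shortly before the local check flips.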

Discover cloning support

Models that support cloning include a voice_cloning object in the model spec. Query TTS models to check supported formats, minimum sample length, and retention:
curl "https://api.venice.ai/api/v1/models?type=tts" \
  -H "Authorization: Bearer $VENICE_API_KEY"
tts-chatterbox-hd advertises:
{
  "voice_cloning": {
    "mode": "zero_shot",
    "accepted_formats": ["mp3", "wav", "flac", "m4a"],
    "min_sample_seconds": 5,
    "retention_days": 7
  }
}
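To list only the models that advertise cloning, the models response can be filtered with jq. This is a sketch under assumptions: it presumes the response wraps models in a data array and that each entry has an id and a model_spec object. The source confirms only that voice_cloning lives in the model spec, so adjust the paths to the actual response. cloning_models is a hypothetical helper name.

```shell
# Hypothetical helper: reads a models response on stdin and prints one line
# per model that has a voice_cloning object.
# Assumed shape: {"data":[{"id":...,"model_spec":{"voice_cloning":{...}}}]}
cloning_models() {
  jq -r '.data[]
    | select(.model_spec.voice_cloning != null)
    | .model_spec.voice_cloning as $vc
    | "\(.id)\tmin \($vc.min_sample_seconds)s\tretained \($vc.retention_days)d"'
}

# Usage (requires network access and a valid key):
# curl -s "https://api.venice.ai/api/v1/models?type=tts" \
#   -H "Authorization: Bearer $VENICE_API_KEY" | cloning_models
```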

API parameters

Create voice

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Must be tts-chatterbox-hd |
| file | file | Yes | Reference audio sample. Supported formats are MP3, WAV, FLAC, and M4A. |

Generate speech

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | - | Must match the model used to create the voice handle |
| voice | string | Yes | - | The vv_... handle returned by POST /audio/voices |
| input | string | Yes | - | Text to synthesize, up to 4096 characters |
| response_format | string | No | mp3 | mp3, opus, aac, flac, wav, or pcm |
| speed | number | No | 1 | Speech speed from 0.25 to 4.0 |
| temperature | number | No | - | Sampling temperature from 0 to 2. Higher values can add variation. |
| streaming | boolean | No | false | Stream audio sentence by sentence |
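The documented limits on input and speed can be checked before a request is sent. The sketch below is illustrative: valid_speech_params is a hypothetical helper, it counts characters the way the shell does in the current locale (which may differ from how the API counts), and it does not cover the other fields.

```shell
# Hypothetical helper: succeeds if the input text is non-empty and at most
# 4096 characters, and the speed (default 1) is a number from 0.25 to 4.0.
valid_speech_params() {
  input_text=$1
  speed=${2:-1}
  [ -n "$input_text" ] || return 1
  # ${#var} counts characters in the current locale.
  [ "${#input_text}" -le 4096 ] || return 1
  # Shell arithmetic is integer-only, so validate and compare the speed in awk.
  awk -v s="$speed" 'BEGIN {
    ok = (s ~ /^[0-9]+(\.[0-9]+)?$/) && (s + 0 >= 0.25) && (s + 0 <= 4.0)
    exit !ok
  }'
}
```

For example, `valid_speech_params "Hello from Venice." 1.25` succeeds, while a speed of 4.5 or an empty input makes the function return a non-zero status.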

Common errors

| Status | Cause | Fix |
| --- | --- | --- |
| 400 | Unsupported audio container or incompatible voice handle | Use MP3, WAV, FLAC, or M4A and pair the handle with the same model used to create it. |
| 401 | Missing or invalid API key | Send Authorization: Bearer $VENICE_API_KEY. |
| 402 | Insufficient balance | Top up your Venice balance. |
| 413 | Uploaded file is too large | Use a shorter or more compressed reference sample. |
| 429 | Rate limit exceeded | Retry after the rate limit window resets. |
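Of the statuses above, only 429 is worth retrying with the same request; the others need a different file, key, or balance first. The following sketch retries a speech request with exponential backoff. The backoff schedule is an assumption rather than documented Venice behavior, since the source says only to retry after the rate limit window resets. is_retryable and speech_with_retry are hypothetical helper names.

```shell
# Hypothetical helper: succeeds only for the status the table marks as retryable.
is_retryable() {
  [ "$1" = "429" ]
}

# Hypothetical helper: send a JSON payload ($1) to /audio/speech and write the
# audio to $2, retrying on 429 up to $3 attempts (default 4).
speech_with_retry() {
  payload=$1
  out_file=$2
  max_attempts=${3:-4}
  attempt=1
  delay=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -w '%{http_code}' prints the status code; the body goes to the output file.
    status=$(curl -s -o "$out_file" -w '%{http_code}' \
      https://api.venice.ai/api/v1/audio/speech \
      -H "Authorization: Bearer $VENICE_API_KEY" \
      -H "Content-Type: application/json" \
      -d "$payload")
    if [ "$status" = "200" ]; then
      return 0
    fi
    if ! is_retryable "$status"; then
      echo "request failed with HTTP $status" >&2
      return 1
    fi
    # Assumed schedule: wait 1s, 2s, 4s, ... between attempts.
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
  echo "still rate limited after $max_attempts attempts" >&2
  return 1
}
```

On a non-200 result the output file holds the last response body rather than audio, so inspect or delete it before retrying by hand.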