Voice cloning lets you generate speech in a voice taken from a short reference audio sample. With tts-chatterbox-hd, upload a sample to POST /audio/voices, save the returned vv_... voice handle, then pass that handle as voice to POST /audio/speech.
Voice handles are model-specific. A handle created with tts-chatterbox-hd must be used with tts-chatterbox-hd.
How it works
- Upload - Send a clean reference audio file to POST /audio/voices
- Save - Store the returned id, which is the voice handle
- Generate - Send the handle as voice in POST /audio/speech
Prerequisites
- A Venice API key
- A clean reference sample in MP3, WAV, FLAC, or M4A format
- At least 5 seconds of clear speech from one speaker (the minimum the model advertises); 10 seconds or more is a better target
Set your API key:
export VENICE_API_KEY="your-api-key"
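Before uploading, it can help to check a sample locally against the prerequisites above. The following is only a sketch: is_supported_sample and is_long_enough are hypothetical helper names, not part of any Venice tooling, and the format check looks at the file name only, not at the actual audio container.

```shell
# Hypothetical helper: succeeds if the file name ends in an accepted extension.
# It inspects the name only; it does not verify the audio container itself.
is_supported_sample() {
  # Lowercase the name so "CLIP.WAV" is treated like "clip.wav".
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    *.mp3|*.wav|*.flac|*.m4a) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: succeeds if a duration in seconds meets the
# 5-second minimum listed in the prerequisites.
is_long_enough() {
  awk -v d="$1" 'BEGIN { exit !(d >= 5) }'
}
```

For example, `is_supported_sample ./reference-voice.wav && is_long_enough 8.2` succeeds. The duration value has to come from an audio tool of your choice; measuring it is outside the scope of this sketch.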
Step 1: Upload a voice sample
Create a voice handle by uploading the reference audio as multipart form data:
curl https://api.venice.ai/api/v1/audio/voices \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -F "model=tts-chatterbox-hd" \
  -F "file=@./reference-voice.wav"
Response (200):
{
  "id": "vv_voice_abc123xyz",
  "model": "tts-chatterbox-hd"
}
Save the id for speech generation:
export VENICE_VOICE_ID="vv_voice_abc123xyz"
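If the upload fails, the response will not contain a usable id, and a tool such as jq prints the literal string null for a missing field. A small guard can catch that before the handle is used. This is a sketch; is_voice_handle is a hypothetical helper name, and it checks only the vv_ prefix shown in the response above.

```shell
# Hypothetical helper: succeeds only for a value that starts with "vv_"
# and has at least one more character after the prefix.
is_voice_handle() {
  case "$1" in
    vv_?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_voice_handle "$VENICE_VOICE_ID" || echo "no voice handle returned" >&2` reports a failed upload instead of passing an empty or null handle to the speech request.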
Step 2: Generate speech
Pass the cloned voice handle as voice in the speech request:
curl https://api.venice.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-chatterbox-hd",
    "voice": "'"$VENICE_VOICE_ID"'",
    "input": "Hello from Venice. This audio is generated with a cloned Chatterbox HD voice.",
    "response_format": "mp3"
  }' \
  --output chatterbox-clone.mp3
The response body is binary audio in the requested format.
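Splicing a shell variable into the JSON body works for the handle, but input text that contains double quotes or backslashes would produce invalid JSON. One way around this, sketched here under the assumption that jq is installed, is to let jq build the body. build_speech_payload is a hypothetical helper name.

```shell
# Hypothetical helper: build_speech_payload MODEL VOICE TEXT
# Prints a JSON request body; jq escapes quotes and backslashes in TEXT.
build_speech_payload() {
  jq -n \
    --arg model "$1" \
    --arg voice "$2" \
    --arg input "$3" \
    '{model: $model, voice: $voice, input: $input, response_format: "mp3"}'
}
```

The output can be piped to the request by passing `-d @-` to curl, which reads the body from standard input.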
Complete example
This example uploads a reference sample, extracts the voice handle with jq, and writes the generated audio to chatterbox-clone.mp3:
VOICE_ID=$(
  curl -s https://api.venice.ai/api/v1/audio/voices \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -F "model=tts-chatterbox-hd" \
    -F "file=@./reference-voice.wav" | jq -r '.id'
)
curl https://api.venice.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-chatterbox-hd",
    "voice": "'"$VOICE_ID"'",
    "input": "This is a complete Chatterbox HD voice cloning example.",
    "response_format": "mp3",
    "speed": 1
  }' \
  --output chatterbox-clone.mp3
Voice sample tips
Use a sample with one speaker, minimal background noise, and no music. Natural speech works better than whispered, sung, or heavily processed audio.
Longer samples can help when the voice has distinctive pacing, accent, or tone, but keep the sample focused on the target speaker.
Handle expiration
Chatterbox HD cloning is zero-shot: Venice stores the uploaded reference audio temporarily, and the model reads it when you synthesize speech. No persistent voice template is created.
Voice handles expire automatically after 7 days. After a handle expires, upload the reference sample again to create a new vv_... handle.
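Because handles expire, a client may want to record when each handle was created and re-upload once the window has passed. The sketch below assumes the 7-day window starts at upload time, which this page does not state explicitly; handle_expired is a hypothetical helper name.

```shell
# Retention window for tts-chatterbox-hd handles, in days.
RETENTION_DAYS=7

# Hypothetical helper: handle_expired CREATED_EPOCH [NOW_EPOCH]
# Succeeds when at least RETENTION_DAYS have passed since CREATED_EPOCH.
# NOW_EPOCH defaults to the current time. Assumes the window starts at upload.
handle_expired() {
  [ $(( ${2:-$(date +%s)} - $1 )) -ge $(( RETENTION_DAYS * 86400 )) ]
}
```

Storing the output of `date +%s` next to the handle at upload time gives the CREATED_EPOCH value to pass in later.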
Discover cloning support
Models that support cloning include a voice_cloning object in the model spec. Query TTS models to check supported formats, minimum sample length, and retention:
curl "https://api.venice.ai/api/v1/models?type=tts" \
  -H "Authorization: Bearer $VENICE_API_KEY"
tts-chatterbox-hd advertises:
{
  "voice_cloning": {
    "mode": "zero_shot",
    "accepted_formats": ["mp3", "wav", "flac", "m4a"],
    "min_sample_seconds": 5,
    "retention_days": 7
  }
}
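To pick out the cloning-capable models from the listing, the response can be filtered with jq. The sketch below assumes the response wraps models in a data array and that each entry exposes its spec under model_spec; those field paths are assumptions and should be checked against an actual response. list_cloning_models is a hypothetical helper name.

```shell
# Hypothetical helper: reads a /models response on stdin and prints one
# tab-separated line per model that has a voice_cloning object.
# Assumed layout: .data[].id and .data[].model_spec.voice_cloning
list_cloning_models() {
  jq -r '.data[]
    | select(.model_spec.voice_cloning != null)
    | "\(.id)\t\(.model_spec.voice_cloning.min_sample_seconds)s min\t\(.model_spec.voice_cloning.retention_days)d retention"'
}
```

Piping the output of the curl command above into list_cloning_models prints the model id, minimum sample length, and retention period for each match.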
API parameters
Create voice
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Must be tts-chatterbox-hd |
| file | file | Yes | Reference audio sample. Supported formats are MP3, WAV, FLAC, and M4A. |
Generate speech
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | - | Must match the model used to create the voice handle |
| voice | string | Yes | - | The vv_... handle returned by POST /audio/voices |
| input | string | Yes | - | Text to synthesize, up to 4096 characters |
| response_format | string | No | mp3 | mp3, opus, aac, flac, wav, or pcm |
| speed | number | No | 1 | Speech speed from 0.25 to 4.0 |
| temperature | number | No | - | Sampling temperature from 0 to 2. Higher values can add variation. |
| streaming | boolean | No | false | Stream audio sentence by sentence |
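The documented limits on input and speed can be checked on the client before a request is sent. This is a sketch with hypothetical helper names (input_fits, speed_ok). Note that the shell counts characters according to the current locale, and how the API itself counts characters is not stated here.

```shell
# Hypothetical helper: succeeds when the text is at most 4096 characters.
# ${#1} counts characters in the current locale (bytes in the C locale).
input_fits() {
  [ "${#1}" -le 4096 ]
}

# Hypothetical helper: succeeds when a numeric speed is within 0.25 to 4.0.
speed_ok() {
  awk -v s="$1" 'BEGIN { exit !(s >= 0.25 && s <= 4.0) }'
}
```

For example, `input_fits "$TEXT" && speed_ok 1.5` succeeds for any text within the limit.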
Common errors
| Status | Cause | Fix |
|---|---|---|
| 400 | Unsupported audio container or incompatible voice handle | Use MP3, WAV, FLAC, or M4A and pair the handle with the same model used to create it. |
| 401 | Missing or invalid API key | Send Authorization: Bearer $VENICE_API_KEY. |
| 402 | Insufficient balance | Top up your Venice balance. |
| 413 | Uploaded file is too large | Use a shorter or more compressed reference sample. |
| 429 | Rate limit exceeded | Retry after the rate limit window resets. |
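In a script, the status codes above can be turned into readable messages. The sketch below mirrors the table; explain_status is a hypothetical helper name, and any status not listed here is reported as unexpected rather than interpreted.

```shell
# Hypothetical helper: maps an HTTP status code to the cause listed in the table.
explain_status() {
  case "$1" in
    200) echo "ok" ;;
    400) echo "unsupported audio format or handle/model mismatch" ;;
    401) echo "missing or invalid API key" ;;
    402) echo "insufficient balance" ;;
    413) echo "reference file too large" ;;
    429) echo "rate limit exceeded" ;;
    *)   echo "unexpected status $1" ;;
  esac
}
```

curl can report the status code through its `-w '%{http_code}'` option while `-o` writes the body to a file, so the captured code can be passed straight to explain_status.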