# Voice Cloning

> Clone a voice from a short reference audio sample with Chatterbox HD, then generate speech through the Venice Audio API.

Voice cloning generates speech in the voice captured in a short reference audio sample. With `tts-chatterbox-hd`, you upload a sample to `/audio/voices`, save the returned `vv_...` voice handle, and then pass that handle to `/audio/speech`.

<Note>
  Voice handles are model-specific. A handle created with `tts-chatterbox-hd` must be used with `tts-chatterbox-hd`.
</Note>

## How it works

1. **Upload** - Send a clean reference audio file to `POST /audio/voices`
2. **Save** - Store the `id` from the response; this is the voice handle
3. **Generate** - Send the handle as `voice` in `POST /audio/speech`

## Prerequisites

* A Venice API key
* A clean reference sample in MP3, WAV, FLAC, or M4A format
* At least 5 seconds of clear speech from one speaker (the model's minimum); 10 seconds or more is preferable

Set your API key:

```bash theme={"system"}
export VENICE_API_KEY="your-api-key"
```
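Before uploading, it can help to confirm the prerequisites locally. The sketch below is a minimal pre-flight check, not part of the Venice API: `is_supported_sample` and `preflight` are hypothetical helper names, and the format check looks only at the file extension, not at the audio encoding itself.

```bash
# Hypothetical helper: succeeds when the file extension is one of the
# accepted formats (MP3, WAV, FLAC, M4A). It inspects the name only.
is_supported_sample() {
  local ext
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|flac|m4a) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: reports the first problem found with the key or sample.
preflight() {
  if [ -z "${VENICE_API_KEY:-}" ]; then
    echo "VENICE_API_KEY is not set" >&2
    return 1
  fi
  if [ ! -f "$1" ]; then
    echo "sample not found: $1" >&2
    return 1
  fi
  if ! is_supported_sample "$1"; then
    echo "unsupported sample format: $1" >&2
    return 1
  fi
}
```

Running `preflight ./reference-voice.wav` before Step 1 catches a missing key or a mistyped path without spending a request.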

## Step 1: Upload a voice sample

Create a voice handle by uploading the reference audio as multipart form data:

```bash theme={"system"}
curl https://api.venice.ai/api/v1/audio/voices \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -F "model=tts-chatterbox-hd" \
  -F "file=@./reference-voice.wav"
```

**Response (200):**

```json theme={"system"}
{
  "id": "vv_voice_abc123xyz",
  "model": "tts-chatterbox-hd"
}
```

Save the `id` for speech generation:

```bash theme={"system"}
export VENICE_VOICE_ID="vv_voice_abc123xyz"
```
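In a script, the upload and the save can be combined so that a failed upload never leaves an empty or malformed handle in the environment. The following is a sketch under assumptions: `create_voice` and `is_voice_handle` are hypothetical helpers, `jq` is assumed to be installed, and the only thing checked about the handle is its `vv_` prefix. The `-w '%{http_code}'` option is standard curl and prints the status code after the transfer.

```bash
# Hypothetical helper: succeeds when the value looks like a vv_... handle.
is_voice_handle() {
  case "$1" in
    vv_?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper around the upload call from this step.
# Prints the handle on success; prints the response body to stderr on failure.
create_voice() {
  local sample body status id
  sample=$1
  body=$(mktemp) || return 1
  status=$(curl -sS -o "$body" -w '%{http_code}' \
    https://api.venice.ai/api/v1/audio/voices \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -F "model=tts-chatterbox-hd" \
    -F "file=@$sample")
  if [ "$status" != "200" ]; then
    echo "upload failed with HTTP $status" >&2
    cat "$body" >&2
    rm -f "$body"
    return 1
  fi
  id=$(jq -r '.id // empty' "$body")
  rm -f "$body"
  if ! is_voice_handle "$id"; then
    echo "unexpected response: no vv_ handle found" >&2
    return 1
  fi
  printf '%s\n' "$id"
}
```

With these defined, `VENICE_VOICE_ID=$(create_voice ./reference-voice.wav) && export VENICE_VOICE_ID` sets the variable only when the upload succeeded.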

## Step 2: Generate speech

Pass the cloned voice handle as `voice` in the speech request:

```bash theme={"system"}
curl https://api.venice.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-chatterbox-hd",
    "voice": "'"$VENICE_VOICE_ID"'",
    "input": "Hello from Venice. This audio is generated with a cloned Chatterbox HD voice.",
    "response_format": "mp3"
  }' \
  --output chatterbox-clone.mp3
```

The response body is binary audio in the requested format.
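Splicing shell variables into a quoted JSON string works for simple text, but an `input` containing double quotes, backslashes, or newlines would produce invalid JSON. One possible approach, assuming `jq` is available, is to let `jq` build the body. `build_speech_body` is a hypothetical helper name; the fields are the same ones used in the request above.

```bash
# Hypothetical helper: prints a JSON body for POST /audio/speech.
# Arguments: voice handle, text to synthesize.
# jq escapes quotes, backslashes, and newlines in both arguments.
build_speech_body() {
  jq -cn --arg voice "$1" --arg input "$2" \
    '{model: "tts-chatterbox-hd", voice: $voice, input: $input, response_format: "mp3"}'
}
```

The result can be passed straight to curl, for example `-d "$(build_speech_body "$VENICE_VOICE_ID" 'She said "hello" and left.')"`.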

***

## Complete example

This example uploads a reference sample, extracts the voice handle with `jq`, and writes the generated audio to `chatterbox-clone.mp3`:

```bash theme={"system"}
VOICE_ID=$(
  curl -s https://api.venice.ai/api/v1/audio/voices \
    -H "Authorization: Bearer $VENICE_API_KEY" \
    -F "model=tts-chatterbox-hd" \
    -F "file=@./reference-voice.wav" | jq -r '.id'
)

curl https://api.venice.ai/api/v1/audio/speech \
  -H "Authorization: Bearer $VENICE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-chatterbox-hd",
    "voice": "'"$VOICE_ID"'",
    "input": "This is a complete Chatterbox HD voice cloning example.",
    "response_format": "mp3",
    "speed": 1
  }' \
  --output chatterbox-clone.mp3
```

## Voice sample tips

Use a sample with one speaker, minimal background noise, and no music. Natural speech works better than whispered, sung, or heavily processed audio.

Longer samples can help when the voice has distinctive pacing, accent, or tone, but keep the sample focused on the target speaker.

## Handle expiration

Chatterbox HD cloning is zero-shot: Venice stores the uploaded reference audio temporarily, and the model reads it when you synthesize speech. No persistent voice template is created.

Voice handles expire automatically after 7 days. After a handle expires, upload the reference sample again to create a new `vv_...` handle.
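A client that caches handles can record when each one was created and re-upload once the retention window has passed. The sketch below assumes the 7 days are counted from the upload time as exactly 7 × 24 hours, which this page does not specify; `handle_expired` is a hypothetical helper.

```bash
# 7 days in seconds, matching the retention period stated above.
RETENTION_SECONDS=$((7 * 24 * 60 * 60))

# Hypothetical helper: succeeds when a handle created at $1 (Unix seconds)
# should be treated as expired at time $2 (defaults to the current time).
handle_expired() {
  local created now
  created=$1
  now=${2:-$(date +%s)}
  [ $((now - created)) -ge "$RETENTION_SECONDS" ]
}
```

Storing `date +%s` next to the handle at upload time is enough to drive this check. Because the exact cutoff is an assumption, refreshing the handle somewhat before the 7 days are up is the safer choice.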

## Discover cloning support

Models that support cloning include a `voice_cloning` object in the model spec. Query TTS models to check supported formats, minimum sample length, and retention:

```bash theme={"system"}
curl "https://api.venice.ai/api/v1/models?type=tts" \
  -H "Authorization: Bearer $VENICE_API_KEY"
```

`tts-chatterbox-hd` advertises:

```json theme={"system"}
{
  "voice_cloning": {
    "mode": "zero_shot",
    "accepted_formats": ["mp3", "wav", "flac", "m4a"],
    "min_sample_seconds": 5,
    "retention_days": 7
  }
}
```
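To pull just the cloning details out of the models response, a `jq` filter can search for `voice_cloning` objects wherever they occur. This is a sketch that assumes `jq` is installed; `cloning_specs` is a hypothetical helper, and the recursive search is used specifically because the overall shape of the models response is not shown on this page.

```bash
# Hypothetical helper: reads a models response on stdin and prints every
# voice_cloning object it finds, one compact JSON object per line.
# The recursive descent (..) avoids assuming where the object is nested.
cloning_specs() {
  jq -c '.. | objects | select(has("voice_cloning")) | .voice_cloning'
}
```

Piping the `curl` command above into `cloning_specs` (with `-s` added to curl to silence progress output) prints one line per model that supports cloning.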

***

## API parameters

### Create voice

| Field   | Type   | Required | Description                                                            |
| ------- | ------ | -------- | ---------------------------------------------------------------------- |
| `model` | string | Yes      | Must be `tts-chatterbox-hd`                                            |
| `file`  | file   | Yes      | Reference audio sample. Supported formats are MP3, WAV, FLAC, and M4A. |

### Generate speech

| Field             | Type    | Required | Default | Description                                                            |
| ----------------- | ------- | -------- | ------- | ---------------------------------------------------------------------- |
| `model`           | string  | Yes      | -       | Must match the model used to create the voice handle                   |
| `voice`           | string  | Yes      | -       | The `vv_...` handle returned by `POST /audio/voices`                   |
| `input`           | string  | Yes      | -       | Text to synthesize, up to 4096 characters                              |
| `response_format` | string  | No       | `mp3`   | `mp3`, `opus`, `aac`, `flac`, `wav`, or `pcm`                          |
| `speed`           | number  | No       | `1`     | Speech speed from `0.25` to `4.0`                                      |
| `temperature`     | number  | No       | -       | Sampling temperature from `0` to `2`. Higher values can add variation. |
| `streaming`       | boolean | No       | `false` | Stream audio sentence by sentence                                      |
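The ranges in the table can be checked locally before a request is sent. The helpers below are hypothetical and only mirror the limits listed above: `speed` from `0.25` to `4.0`, `temperature` from `0` to `2`, and `input` up to 4096 characters. The length is counted by the shell in the current locale, which may differ from how the API counts characters.

```bash
# Hypothetical helper: succeeds when $1 is a plain decimal number in [$2, $3].
in_range() {
  awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN {
    if (v !~ /^[0-9]+(\.[0-9]+)?$/) exit 1
    exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0)
  }'
}

valid_speed()       { in_range "$1" 0.25 4.0; }
valid_temperature() { in_range "$1" 0 2; }

# Succeeds when the text is at most 4096 characters long.
valid_input() {
  [ "${#1}" -le 4096 ]
}
```

For example, `valid_speed "$SPEED" || echo "speed out of range" >&2` flags a bad value before it reaches the API.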

## Common errors

| Status | Cause                                                    | Fix                                                                                   |
| ------ | -------------------------------------------------------- | ------------------------------------------------------------------------------------- |
| `400`  | Unsupported audio container or incompatible voice handle | Use MP3, WAV, FLAC, or M4A and pair the handle with the same model used to create it. |
| `401`  | Missing or invalid API key                               | Send `Authorization: Bearer $VENICE_API_KEY`.                                         |
| `402`  | Insufficient balance                                     | Top up your Venice balance.                                                           |
| `413`  | Uploaded file is too large                               | Use a shorter or more compressed reference sample.                                    |
| `429`  | Rate limit exceeded                                      | Retry after the rate limit window resets.                                             |
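The table above can be turned into simple error handling in a script. The following sketch is one possible arrangement, not an official client: `explain_status`, `should_retry`, and `speech_with_retry` are hypothetical helpers, and the attempt count and backoff delays are arbitrary choices. Only `429` is retried, since the other statuses need a change to the request or the account.

```bash
# Hypothetical helper: maps a status code from the table above to a hint.
explain_status() {
  case "$1" in
    400) echo "check the audio format and that the handle matches the model" ;;
    401) echo "check the Authorization header and API key" ;;
    402) echo "top up your Venice balance" ;;
    413) echo "use a shorter or more compressed reference sample" ;;
    429) echo "rate limited; retry later" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Succeeds only for statuses worth retrying; here that is 429 alone.
should_retry() { [ "$1" = "429" ]; }

# Hypothetical wrapper: POSTs a JSON body to /audio/speech, retrying on 429.
# Arguments: JSON body, output file. On failure the output file holds the
# error response rather than audio.
speech_with_retry() {
  local body out attempt delay status
  body=$1; out=$2; attempt=1; delay=2
  while [ "$attempt" -le 4 ]; do
    status=$(curl -sS -o "$out" -w '%{http_code}' \
      https://api.venice.ai/api/v1/audio/speech \
      -H "Authorization: Bearer $VENICE_API_KEY" \
      -H "Content-Type: application/json" \
      -d "$body")
    [ "$status" = "200" ] && return 0
    echo "HTTP $status: $(explain_status "$status")" >&2
    should_retry "$status" || return 1
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
  return 1
}
```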
