POST /api/v1/audio/speech
curl --request POST \
  --url https://api.venice.ai/api/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "input": "Hello, welcome to Venice Voice.",
  "model": "tts-kokoro",
  "response_format": "mp3",
  "speed": 1,
  "streaming": false,
  "voice": "af_sky"
}
'
"<string>"

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
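
As a minimal sketch of the header described above, the following Python builds the same two headers the curl example sends. The environment variable name VENICE_API_KEY and the helper name build_headers are assumptions made for illustration; they are not defined by the API.

```python
import os


def build_headers(token: str) -> dict:
    """Return the request headers used by the curl example for a given token."""
    if not token:
        raise ValueError("an API token is required")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# VENICE_API_KEY is a hypothetical variable name chosen for this sketch;
# the "<token>" placeholder is used when it is unset.
headers = build_headers(os.environ.get("VENICE_API_KEY") or "<token>")
```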

Body

application/json

Request to generate audio from text.

input
string
required

The text to generate audio for. The maximum length is 4096 characters.

Required string length: 1 - 4096
Example:

"Hello, this is a test of the text to speech system."

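Because input is capped at 4096 characters, longer text has to be sent as several requests. The sketch below shows one possible way to split text on sentence boundaries before calling the endpoint. The helper split_input is hypothetical, and it assumes the limit counts characters the same way Python's len does, which the reference does not state.

```python
import re

MAX_INPUT_CHARS = 4096  # documented maximum length of `input`


def split_input(text: str, limit: int = MAX_INPUT_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as the input of its own request, and the resulting audio files played or concatenated in order.
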
language
enum<string>

The language of the input text. Only supported by Qwen 3 TTS models. If not specified, the language is auto-detected.

Available options:
Auto,
English,
Chinese,
Spanish,
French,
German,
Italian,
Japanese,
Korean,
Portuguese,
Russian
Example:

"English"

model
enum<string>
default:tts-kokoro

The model ID of a Venice TTS model.

Available options:
tts-kokoro,
tts-qwen3-0-6b,
tts-qwen3-1-7b
Example:

"tts-kokoro"

prompt
string

A style prompt to control the emotion and delivery of the speech. Only supported by Qwen 3 TTS models. Examples: "Very happy.", "Sad and slow.", "Excited and energetic."

Maximum string length: 500
Example:

"Very happy."

response_format
enum<string>
default:mp3

The format to return the audio in.

Available options:
mp3,
opus,
aac,
flac,
wav,
pcm
Example:

"mp3"

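When saving the returned audio, the chosen response_format usually determines the file name. The sketch below validates the value against the six documented options and derives an output path. Using the format name as the file extension is an assumption of this sketch, and output_path is a hypothetical helper.

```python
from pathlib import Path

# The six documented `response_format` values.
RESPONSE_FORMATS = ("mp3", "opus", "aac", "flac", "wav", "pcm")


def output_path(stem: str, response_format: str = "mp3") -> Path:
    """Return a file path for the audio, rejecting undocumented formats."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    # Assumption: the format name doubles as the file extension.
    return Path(f"{stem}.{response_format}")
```
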
speed
number
default:1

The speed of the generated audio, from 0.25 to 4.0. The default is 1.0.

Required range: 0.25 <= x <= 4
Example:

1
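
A client can check the documented range before sending a request. The following is a small sketch of such a check; validate_speed is a hypothetical helper, and the reference does not say how the server responds to out-of-range values.

```python
MIN_SPEED = 0.25  # documented lower bound
MAX_SPEED = 4.0   # documented upper bound


def validate_speed(speed: float = 1.0) -> float:
    """Return the speed as a float, or raise if it is outside 0.25 to 4.0."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}")
    return float(speed)
```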

streaming
boolean
default:false

Whether to stream the audio back sentence by sentence (true) or to process the full input and return a complete audio file (false).

Example:

true
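
With streaming set to true, a client would typically read the response body incrementally instead of waiting for the whole file. The sketch below does this with Python's standard urllib. It assumes the streamed audio arrives as an ordinary HTTP response body that can be read in chunks, which this reference does not spell out. The function name stream_speech and its opener parameter (included so the network call can be swapped out) are hypothetical. Fields left out of the payload fall back to their documented defaults.

```python
import json
import urllib.request

API_URL = "https://api.venice.ai/api/v1/audio/speech"


def stream_speech(text, token, out_path, opener=urllib.request.urlopen, chunk_size=8192):
    """POST a streaming request and write the audio to out_path as it arrives.

    Returns the number of bytes written.
    """
    payload = {"input": text, "streaming": True}
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    written = 0
    with opener(request) as response, open(out_path, "wb") as out:
        # Read until the server closes the body; each chunk is written immediately.
        while chunk := response.read(chunk_size):
            out.write(chunk)
            written += len(chunk)
    return written
```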

temperature
number

Sampling temperature for speech generation. Higher values produce more varied output. Only supported by Qwen 3 TTS models. Default is 0.9.

Required range: 0 <= x <= 2
Example:

0.9

top_p
number

Nucleus sampling parameter. Only supported by Qwen 3 TTS models. Default is 1.0.

Required range: 0 <= x <= 1
Example:

1
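
Four of the fields above (language, prompt, temperature, top_p) are documented as supported only by Qwen 3 TTS models. The sketch below builds a request body and rejects those fields for other models before anything is sent. The reference does not say whether the server ignores or rejects them for tts-kokoro, so the client-side rejection is an assumption of this sketch, and build_payload is a hypothetical helper.

```python
QWEN3_MODELS = {"tts-qwen3-0-6b", "tts-qwen3-1-7b"}
QWEN3_ONLY_FIELDS = ("language", "prompt", "temperature", "top_p")


def build_payload(text: str, model: str = "tts-kokoro", **options) -> dict:
    """Build a request body, checking Qwen 3-only fields and documented ranges."""
    qwen_only = [name for name in QWEN3_ONLY_FIELDS if name in options]
    if qwen_only and model not in QWEN3_MODELS:
        # Assumption: fail early rather than rely on undocumented server behavior.
        raise ValueError(f"{qwen_only} require a Qwen 3 TTS model, got {model!r}")
    if len(options.get("prompt", "")) > 500:
        raise ValueError("prompt must be at most 500 characters")
    if not 0 <= options.get("temperature", 0.9) <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= options.get("top_p", 1.0) <= 1:
        raise ValueError("top_p must be between 0 and 1")
    return {"input": text, "model": model, **options}


payload = build_payload(
    "Hello, welcome to Venice Voice.",
    model="tts-qwen3-1-7b",
    voice="Vivian",  # a Qwen 3 voice; the default af_sky is a Kokoro voice
    language="English",
    prompt="Very happy.",
    temperature=0.9,
    top_p=1.0,
)
```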

voice
enum<string>
default:af_sky

The voice to use when generating the audio. Voices are model-specific: Kokoro voices (e.g. af_sky, af_bella, am_adam) work with tts-kokoro; Qwen 3 voices (e.g. Vivian, Serena, Dylan, Eric, Ryan, Aiden) work with tts-qwen3-0-6b and tts-qwen3-1-7b. Using an incompatible voice returns a 400 error.

Available options:
af_alloy,
af_aoede,
af_bella,
af_heart,
af_jadzia,
af_jessica,
af_kore,
af_nicole,
af_nova,
af_river,
af_sarah,
af_sky,
am_adam,
am_echo,
am_eric,
am_fenrir,
am_liam,
am_michael,
am_onyx,
am_puck,
am_santa,
bf_alice,
bf_emma,
bf_lily,
bm_daniel,
bm_fable,
bm_george,
bm_lewis,
zf_xiaobei,
zf_xiaoni,
zf_xiaoxiao,
zf_xiaoyi,
zm_yunjian,
zm_yunxi,
zm_yunxia,
zm_yunyang,
ff_siwis,
hf_alpha,
hf_beta,
hm_omega,
hm_psi,
if_sara,
im_nicola,
jf_alpha,
jf_gongitsune,
jf_nezumi,
jf_tebukuro,
jm_kumo,
pf_dora,
pm_alex,
pm_santa,
ef_dora,
em_alex,
em_santa,
Vivian,
Serena,
Ono_Anna,
Sohee,
Uncle_Fu,
Dylan,
Eric,
Ryan,
Aiden
Example:

"af_sky"

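Since an incompatible voice returns a 400 error, a client may want to check the pairing before sending. The sketch below is one way to do that. The grouping is partly inferred: the reference names only some voices per model, so treating every lowercase prefixed name (such as af_sky) as a Kokoro voice and the nine capitalized names at the end of the list as Qwen 3 voices is an assumption. The helper voice_matches_model is hypothetical.

```python
import re

QWEN3_MODELS = {"tts-qwen3-0-6b", "tts-qwen3-1-7b"}

# Assumption: the capitalized names at the end of the voice list are the Qwen 3 voices.
QWEN3_VOICES = {
    "Vivian", "Serena", "Ono_Anna", "Sohee", "Uncle_Fu",
    "Dylan", "Eric", "Ryan", "Aiden",
}

# Assumption: Kokoro voices follow the "<letter><f|m>_<name>" pattern, e.g. af_sky.
KOKORO_VOICE_PATTERN = re.compile(r"^[a-z][fm]_[a-z]+$")


def voice_matches_model(voice: str, model: str) -> bool:
    """Return True if the voice looks compatible with the given model."""
    if model in QWEN3_MODELS:
        return voice in QWEN3_VOICES
    if model == "tts-kokoro":
        return bool(KOKORO_VOICE_PATTERN.match(voice))
    return False
```

Note that the names are case-sensitive in this sketch, so am_eric (Kokoro) and Eric (Qwen 3) are treated as different voices.
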
Response

Audio content generated successfully

The response is of type file.
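
As a rough Python counterpart to the curl example, the sketch below posts a JSON body and returns the raw audio bytes so they can be written to a file. It uses only the standard library. The function name synthesize and its opener parameter are hypothetical, and the format of error response bodies is not documented here, so the sketch simply surfaces whatever the server returned.

```python
import json
import urllib.error
import urllib.request

API_URL = "https://api.venice.ai/api/v1/audio/speech"


def synthesize(payload: dict, token: str, opener=urllib.request.urlopen) -> bytes:
    """POST the payload and return the audio file content as bytes."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with opener(request) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        # For example, a 400 when the voice does not match the model.
        detail = error.read().decode("utf-8", errors="replace")
        raise RuntimeError(f"speech request failed with HTTP {error.code}: {detail}") from error
```

A typical call would pass a body such as {"input": "Hello, welcome to Venice Voice.", "voice": "af_sky"} and write the returned bytes to a file opened in binary mode, for example hello.mp3.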