Usage
Speech-to-text models transcribe spoken audio into written text. They are accessed via the Audio Transcriptions API.
Supported audio formats
mp3, mp4, mpeg, mpga, m4a, wav, webm, flac, ogg
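A client can check a file's extension against this list before uploading, which avoids a round trip for an obviously unsupported file. The following is a minimal sketch, not part of the API: the helper name `is_supported_audio` is hypothetical, and it inspects only the file name, not the actual audio encoding.

```python
from pathlib import Path

# Extensions copied from the supported-formats list above.
SUPPORTED_AUDIO_EXTENSIONS = frozenset(
    {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "flac", "ogg"}
)


def is_supported_audio(filename: str) -> bool:
    """Hypothetical helper: True if the file extension is in the supported list.

    The comparison is case-insensitive. A file with no extension is rejected.
    """
    suffix = Path(filename).suffix.lower().lstrip(".")
    return suffix in SUPPORTED_AUDIO_EXTENSIONS
```

For example, `is_supported_audio("meeting.MP3")` is true, while `is_supported_audio("meeting.aac")` is false because `aac` is not in the list.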
Response formats
| Format | Description |
|---|---|
| json | Default. Returns { "text": "..." }. |
| text | Plain transcribed text. |
| srt | SubRip subtitle format with timestamps. |
| vtt | WebVTT subtitle format with timestamps. |
| verbose_json | Full response with segment-level timestamps and metadata. |
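Because the body's shape depends on the chosen format, client code usually branches on it. The sketch below handles only the two formats whose shape the table fully specifies (`json` and `text`). The function name `extract_transcript` is hypothetical, and the other formats are deliberately left unparsed because their exact layout is not described here.

```python
import json

# Format names copied from the table above.
RESPONSE_FORMATS = ("json", "text", "srt", "vtt", "verbose_json")


def extract_transcript(body: str, response_format: str = "json") -> str:
    """Hypothetical helper: pull the transcript text out of a raw response body."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unknown response format: {response_format!r}")
    if response_format == "json":
        # The default format is documented as { "text": "..." }.
        return json.loads(body)["text"]
    if response_format == "text":
        # Plain text: the body is the transcript (surrounding whitespace trimmed).
        return body.strip()
    # srt, vtt, and verbose_json carry timestamps; parsing them is out of scope
    # for this sketch.
    raise NotImplementedError(f"no parser for {response_format!r} in this sketch")
```

Rejecting unknown format names early keeps a typo such as `"verbose-json"` from being sent to the API.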
Transcription is billed per second of input audio. See the Audio Transcriptions API for request examples and parameter details.