
Usage

Speech-to-text models transcribe spoken audio into written text. They are accessed via the Audio Transcriptions API.

Supported audio formats

mp3, mp4, mpeg, mpga, m4a, wav, webm, flac, ogg
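
As a rough illustration, a client could check a file's extension against the list above before uploading it. This is only a sketch: the names SUPPORTED_FORMATS and is_supported_audio are hypothetical and not part of the API, and the check looks at the extension only, not at the actual audio encoding.

```python
from pathlib import Path

# Extensions copied from the "Supported audio formats" list above.
SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "flac", "ogg"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats.

    Hypothetical helper: it inspects the extension only and does not
    verify that the file contents are valid audio.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    return ext in SUPPORTED_FORMATS


if __name__ == "__main__":
    for name in ("meeting.MP3", "notes.txt"):
        print(name, is_supported_audio(name))
```

Lowercasing the extension is a design choice in this sketch so that files such as meeting.MP3 are accepted; whether the service itself is case-sensitive is not stated here.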

Response formats

Format        Description
json          Default. Returns { "text": "..." }.
text          Plain transcribed text.
srt           SubRip subtitle format with timestamps.
vtt           WebVTT subtitle format with timestamps.
verbose_json  Full response with segment-level timestamps and metadata.
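
The two simplest formats can be handled as in the sketch below, which pulls the transcript out of a response body that has already been received as a string. The function name extract_text is hypothetical, and stripping surrounding whitespace from the text format is an assumption of this example. The srt, vtt, and verbose_json formats are deliberately left out because their exact structure is not described in this section.

```python
import json


def extract_text(body: str, response_format: str = "json") -> str:
    """Return the transcript from a raw response body.

    Hypothetical helper covering only the "json" and "text" formats.
    """
    if response_format == "json":
        # The default format is documented as { "text": "..." }.
        return json.loads(body)["text"]
    if response_format == "text":
        # Assumption: surrounding whitespace is not part of the transcript.
        return body.strip()
    raise ValueError(f"unhandled response format: {response_format}")


if __name__ == "__main__":
    print(extract_text('{"text": "hello world"}'))
    print(extract_text("hello world\n", "text"))
```
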
Usage is billed per second of input audio. See the Audio Transcriptions API for request examples and parameter details.