Speech-to-Text (Synchronous)

Convert Cantonese audio files into text transcriptions. The endpoint accepts several audio formats and can optionally return timestamped segments and speaker labels. A free-form context hint can steer the transcript toward your domain's vocabulary.

Two modes are available. Pick the one that fits your audio:
  • Synchronous (this page): a single blocking POST. Best for clips under about 10 minutes, such as short voice notes, message replies, and most podcast snippets. The request is bound by Cloudflare's 100-second proxy timeout.
  • Async (/speech-to-text-async): submit, then poll for the result. Required for long-form audio (interviews, lectures, meetings, sermons) and for anything that needs more than about 90 seconds of server-side processing.
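To make the rule of thumb above concrete, here is a minimal Python sketch that reads a WAV file's length with the standard `wave` module and picks a mode. The helper names (`wav_duration_seconds`, `choose_mode`) are hypothetical, and the 10-minute cutoff is taken from the guidance above. Because the real limit is server-side processing time rather than audio length, treat the result as a heuristic only. The sketch handles WAV files only; other formats would need a different duration probe.

```python
import wave

# Heuristic cutoff taken from the guidance above (about 10 minutes of audio).
SYNC_MAX_SECONDS = 10 * 60


def wav_duration_seconds(path: str) -> float:
    """Hypothetical helper: duration of a WAV file in seconds, stdlib only."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def choose_mode(duration_seconds: float) -> str:
    """Hypothetical helper: 'sync' for short clips, 'async' for long-form audio."""
    return "sync" if duration_seconds < SYNC_MAX_SECONDS else "async"
```

For example, a 30-second voice note maps to "sync", while a one-hour lecture maps to "async".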

Request Parameters

This endpoint requires multipart/form-data for file uploads.

  • api_key (string, required): Your API key for authentication.
  • data (file, required): Audio file to transcribe. Supported formats: wav, mp3, m4a, flac, ogg.
  • with_timestamp (boolean, optional): Include timestamps in the response. The text field is then returned as SRT-formatted segments, each with a start and end time. Defaults to false.
  • with_diarization (boolean, optional): Enable speaker diarization to label different speakers. Defaults to false.
  • context (string, optional): Free-form text describing the recording's topic, speakers, or domain (e.g. “quarterly earnings call for HSBC, speakers are CFO and analysts”). Used as a hint during the post-ASR fusion step to disambiguate homophones and prefer domain-specific vocabulary. Has no effect when fusion is disabled (skip_fusion=true or corpus_ids="none").
  • wait_for_completion (boolean, optional): Defaults to false, in which case the response contains only { request_id, status }. On this synchronous endpoint, set wait_for_completion=true to receive the full transcription in the response. For long-form audio, prefer the dedicated async endpoint, because synchronous calls hit Cloudflare's 100-second timeout whenever processing takes longer.
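A client-side sketch of assembling these fields, assuming the server accepts booleans as the lowercase strings "true" and "false" used in the curl example. `build_form_fields` is a hypothetical helper, not part of any official SDK. It defaults `wait_for_completion` to true (unlike the API's default of false) so that calls to this synchronous endpoint return the transcript. The extension check only mirrors the documented format list. The server remains the authority and reports unsupported audio with 415.

```python
from __future__ import annotations

from pathlib import Path

# Formats listed in the parameter descriptions above.
SUPPORTED_FORMATS = {"wav", "mp3", "m4a", "flac", "ogg"}


def build_form_fields(
    api_key: str,
    audio_path: str,
    with_timestamp: bool = False,
    with_diarization: bool = False,
    context: str | None = None,
    wait_for_completion: bool = True,  # assumption: callers of the sync endpoint want the full result
) -> dict[str, str]:
    """Hypothetical helper: validate inputs and return the non-file form fields.

    Booleans are serialized as lowercase "true"/"false", as in the curl example.
    """
    if not api_key:
        raise ValueError("api_key is required")
    ext = Path(audio_path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext!r}")

    fields = {
        "api_key": api_key,
        "with_timestamp": str(with_timestamp).lower(),
        "with_diarization": str(with_diarization).lower(),
        "wait_for_completion": str(wait_for_completion).lower(),
    }
    # context is optional, so only send it when the caller provides one.
    if context:
        fields["context"] = context
    return fields
```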

Example Request

This request blocks until transcription and fusion finish, so the result arrives in a single round trip with no polling. Cloudflare cuts off requests at 100 seconds, so use the async endpoint for any audio whose total processing time is likely to exceed that.

curl -X POST "https://cantonese.ai/api/stt" \
  -F "api_key=YOUR_API_KEY" \
  -F "with_timestamp=false" \
  -F "with_diarization=false" \
  -F "context=Quarterly earnings call for HSBC, speakers are CFO and analysts" \
  -F "wait_for_completion=true" \
  -F "data=@/path/to/audio.wav;type=audio/wav"
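The same request can be built in Python with only the standard library. This is a sketch, not an official client. The helpers `encode_multipart` and `build_stt_request` are hypothetical. Following the parameter list, the file field is named `data` and `wait_for_completion` is sent as "true". The function builds the request without sending it, which keeps the multipart encoding easy to inspect.

```python
from __future__ import annotations

import mimetypes
import urllib.request
import uuid
from pathlib import Path

STT_URL = "https://cantonese.ai/api/stt"


def encode_multipart(fields: dict[str, str], file_field: str, file_path: str) -> tuple[bytes, str]:
    """Hypothetical helper: encode text fields plus one file as multipart/form-data.

    Returns (body, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    path = Path(file_path)
    # Fall back to a generic type when the extension is unknown to mimetypes.
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{path.name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    parts.append(header + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_stt_request(api_key: str, audio_path: str, **options: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST equivalent to the curl example."""
    fields = {"api_key": api_key, "wait_for_completion": "true", **options}
    body, content_type = encode_multipart(fields, "data", audio_path)
    return urllib.request.Request(
        STT_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
```

To send the request, something like `json.load(urllib.request.urlopen(req, timeout=110))` should work. The 110-second client timeout is an assumed value. It sits just above Cloudflare's 100-second cutoff, so the proxy, not the client, ends a request that runs too long.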

Response

On success, the endpoint returns a JSON object containing the transcription. Note that duration and process_time are returned as strings, not numbers.

Default response format:

{
  "text": "When you call someone who is thousands of miles away, you're using a satellite.",
  "duration": "6.540000",
  "process_time": "0.19"
}
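The sample response returns `duration` and `process_time` as strings, so a client should convert them before doing arithmetic. A minimal parsing sketch follows. `Transcript` and `parse_transcript` are hypothetical names, and the unit (presumably seconds) is an assumption this page does not state.

```python
import json
from dataclasses import dataclass


@dataclass
class Transcript:
    text: str
    duration: float      # parsed from the string "duration" field (assumed seconds)
    process_time: float  # parsed from the string "process_time" field (assumed seconds)


def parse_transcript(raw: str) -> Transcript:
    """Hypothetical helper: parse the default JSON response, converting numeric strings."""
    data = json.loads(raw)
    return Transcript(
        text=data["text"],
        duration=float(data["duration"]),
        process_time=float(data["process_time"]),
    )
```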

with_timestamp = true

{
  "text": "1\n00:00:01,032 --> 00:00:04,083\nWhen you call someone who is thousands of\n\n2\n00:00:04,083 --> 00:00:04,868\n miles away, you're using a satellite.\n\n",
  "duration": "6.540000",
  "process_time": "1.86"
}
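With `with_timestamp=true`, the `text` field holds SRT-style cues. Each cue has an index line, a `start --> end` line, and the cue text, and cues are separated by blank lines. The sketch below splits that string into segments with times in seconds. `Segment` and `parse_srt` are hypothetical names, and the sketch assumes every cue follows the layout shown in the sample.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches SRT timestamps such as 00:00:01,032 (hours:minutes:seconds,milliseconds).
_TS = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Segment:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_time_to_seconds(ts: str) -> float:
    """Convert 'HH:MM:SS,mmm' to seconds; raises AttributeError if malformed."""
    h, m, s, ms = (int(x) for x in _TS.fullmatch(ts.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt: str) -> list[Segment]:
    """Hypothetical helper: split blank-line-separated SRT cues into Segments."""
    segments = []
    for block in srt.strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 3:
            continue  # skip empty or malformed cues
        start, end = (srt_time_to_seconds(t) for t in lines[1].split("-->"))
        text = " ".join(line.strip() for line in lines[2:])
        segments.append(Segment(int(lines[0]), start, end, text))
    return segments
```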

with_diarization = true

{
  "text": "When you call someone who is thousands of miles away, you're using a satellite.",
  "diarization": "SPEAKER_00: When you call someone who is thousands of miles away, you're using a satellite.",
  "duration": "6.540000",
  "process_time": "0.19"
}
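The `diarization` field in the sample is a single `SPEAKER_00: ...` line. This sketch assumes that multi-speaker recordings put each turn on its own line with the same `SPEAKER_NN:` prefix, which this page does not confirm. `parse_diarization` is a hypothetical name. Any unlabeled line is appended to the previous turn.

```python
from __future__ import annotations

import re

# Assumption: each speaker turn is on its own line as "SPEAKER_NN: text".
_TURN = re.compile(r"^(SPEAKER_\d+):\s*(.*)$")


def parse_diarization(diarization: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (speaker, text) pairs from the diarization string."""
    turns: list[tuple[str, str]] = []
    for line in diarization.splitlines():
        stripped = line.strip()
        match = _TURN.match(stripped)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns and stripped:
            # Continuation line without a label: attach it to the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {stripped}")
    return turns
```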

with_timestamp = true and with_diarization = true

{
  "text": "1\n00:00:01,032 --> 00:00:04,083\nSPEAKER_00: When you call someone who is thousands of\n\n2\n00:00:04,083 --> 00:00:04,868\nSPEAKER_00:  miles away, you're using a satellite.\n\n",
  "duration": "6.540000",
  "process_time": "3.22"
}
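When both options are enabled, the text of each cue begins with a speaker label. The sketch below matches each cue with one regular expression and merges consecutive cues from the same speaker into turns with start and end times. `speaker_turns` is a hypothetical helper, and the cue layout is assumed from the sample. Cues without a label are attributed to "UNKNOWN".

```python
from __future__ import annotations

import re

# One SRT cue whose text may begin with a "SPEAKER_NN:" label (layout assumed from the sample).
_CUE = re.compile(
    r"^\d+\n(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})\n"
    r"(?:(SPEAKER_\d+):)?\s*(.*?)\s*$",
    re.DOTALL,
)


def _secs(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def speaker_turns(srt: str) -> list[dict]:
    """Hypothetical helper: merge consecutive cues from the same speaker into turns."""
    turns: list[dict] = []
    for block in srt.strip().split("\n\n"):
        match = _CUE.match(block)
        if not match:
            continue  # skip cues that do not fit the assumed layout
        g = match.groups()
        start, end = _secs(*g[0:4]), _secs(*g[4:8])
        speaker, text = g[8] or "UNKNOWN", g[9]
        if turns and turns[-1]["speaker"] == speaker:
            turns[-1]["end"] = end
            turns[-1]["text"] += " " + text
        else:
            turns.append({"speaker": speaker, "start": start, "end": end, "text": text})
    return turns
```

On the sample above, this yields a single SPEAKER_00 turn from 1.032 s to 4.868 s containing the full sentence.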

Status Codes

The API returns standard HTTP status codes to indicate the success or failure of requests.

  • 200 Success: audio transcribed successfully.
  • 400 Bad Request: invalid parameters or malformed request.
  • 401 Unauthorized: invalid or missing API key.
  • 403 Forbidden: API key doesn't have permission for this endpoint.
  • 413 Payload Too Large: audio file exceeds the maximum size limit.
  • 415 Unsupported Media Type: audio format not supported.
  • 422 Unprocessable Entity: audio file corrupted or invalid parameter values.
  • 429 Too Many Requests: rate limit exceeded.
  • 500 Internal Server Error: server encountered an unexpected condition.
  • 503 Service Unavailable: server is temporarily unable to handle the request.
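Here is one way a client might act on these codes. It is a sketch, not documented behavior. It backs off and retries on 429, 500, and 503, and treats the other 4xx codes as a signal to fix the request first. It also assumes that a synchronous call exceeding Cloudflare's 100-second limit surfaces as Cloudflare's 524 status, which is not listed above. Retrying a 524 will not help, so the sketch points to the async endpoint instead. `next_action` and `backoff_delays` are hypothetical names, and the delay schedule is an arbitrary example.

```python
from __future__ import annotations

# Codes from the list above that usually indicate a transient condition.
TRANSIENT = {429, 500, 503}
# Assumption: Cloudflare reports the 100-second proxy timeout as 524 ("A Timeout Occurred").
CLOUDFLARE_TIMEOUT = 524


def next_action(status: int) -> str:
    """Hypothetical helper: map an HTTP status to a client action."""
    if status == 200:
        return "ok"
    if status in TRANSIENT:
        return "retry"        # back off, then resend the same request
    if status == CLOUDFLARE_TIMEOUT:
        return "use_async"    # processing is too long for the sync endpoint
    return "fix_request"      # 400, 401, 403, 413, 415, 422: change the request first


def backoff_delays(max_attempts: int = 4, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule (seconds) between attempts, capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_attempts - 1)]
```

Adding random jitter to each delay helps avoid synchronized retries when many clients hit the rate limit at once. If a 429 response carries a standard Retry-After header, honoring it is generally preferable to a fixed schedule.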