cantonese.ai API Reference

Speech-to-Text

Convert Cantonese audio files into accurate text transcriptions. The endpoint accepts several audio formats and can optionally return timestamps and speaker diarization.

Request Parameters

This endpoint requires multipart/form-data for file uploads.

| Parameter | Type | Required | Description |
|---|---|---|---|
| api_key | string | Yes | Your API key for authentication. |
| data | file | Yes | Audio file to transcribe. Supported formats: wav, mp3, m4a, flac, ogg. |
| with_timestamp | boolean | No | Include segment timestamps in the response; the transcript in text is then returned in SRT subtitle format. Defaults to false. |
| with_diarization | boolean | No | Enable speaker diarization to label different speakers. Defaults to false. |
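A client can check its inputs against this table before uploading anything. The helper below is a hypothetical sketch (build_form_fields is not part of any cantonese.ai SDK); it assumes the file extension reflects the audio format and that booleans are sent as the strings "true"/"false", as in the curl example.

```python
from pathlib import Path

# Formats listed in the parameter table; checked here by file extension only.
SUPPORTED_FORMATS = {"wav", "mp3", "m4a", "flac", "ogg"}


def build_form_fields(api_key: str, audio_path: str,
                      with_timestamp: bool = False,
                      with_diarization: bool = False) -> dict[str, str]:
    """Validate the inputs and return the text form fields (everything but data)."""
    if not api_key:
        raise ValueError("api_key is required")
    extension = Path(audio_path).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {extension or '(none)'}")
    # Booleans are sent as lowercase strings, matching the curl example.
    return {
        "api_key": api_key,
        "with_timestamp": str(with_timestamp).lower(),
        "with_diarization": str(with_diarization).lower(),
    }
```

The server may detect the format from the file contents rather than the name, so this check only catches obvious mistakes early.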

Example Request

The following example transcribes an audio file with curl.

curl -X POST "https://paid-api.cantonese.ai" \
  -F "api_key=YOUR_API_KEY" \
  -F "with_timestamp=false" \
  -F "with_diarization=false" \
  -F "data=@/path/to/audio.wav;type=audio/wav"
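The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: it assumes the endpoint accepts the form fields exactly as in the curl example, and the helper names encode_multipart and transcribe are hypothetical.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "https://paid-api.cantonese.ai"  # endpoint taken from the curl example


def encode_multipart(fields: dict[str, str], file_field: str, filename: str,
                     file_bytes: bytes, content_type: str) -> tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts: list[bytes] = []
    for name, value in fields.items():
        parts.append((f"--{boundary}\r\n"
                      f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                      f"{value}\r\n").encode("utf-8"))
    parts.append((f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{file_field}"; '
                  f'filename="{filename}"\r\n'
                  f"Content-Type: {content_type}\r\n\r\n").encode("utf-8"))
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(api_key: str, audio_path: str, with_timestamp: bool = False,
               with_diarization: bool = False) -> dict:
    """Upload an audio file in the "data" field and return the parsed JSON."""
    path = Path(audio_path)
    fields = {
        "api_key": api_key,
        "with_timestamp": str(with_timestamp).lower(),
        "with_diarization": str(with_diarization).lower(),
    }
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    body, content_type = encode_multipart(fields, "data", path.name,
                                          path.read_bytes(), mime)
    request = urllib.request.Request(API_URL, data=body, method="POST",
                                     headers={"Content-Type": content_type})
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: result = transcribe("YOUR_API_KEY", "audio.wav", with_timestamp=True), then read result["text"]. A third-party HTTP client would hide the multipart encoding; the standard library is used here only to keep the sketch dependency-free.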

Response

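On success, the endpoint returns a JSON object containing the transcription. Which fields appear depends on the options enabled, as the examples below show. Note that duration is returned as a string, whereas process_time is a number.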

Default response format:

{
  "text": "When you call someone who is thousands of miles away, you're using a satellite.",
  "duration": "6.540000",
  "is_cached": false,
  "process_time": 0.18551874160766602
}
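A client will typically turn this JSON into a typed object. The sketch below assumes duration and process_time are in seconds (the unit is not documented) and treats is_cached and diarization as optional, since they do not appear in every example response. Transcription and parse_response are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    text: str
    duration_s: float            # sent as a string, e.g. "6.540000"; unit assumed seconds
    process_time_s: float        # sent as a number; unit assumed seconds
    is_cached: Optional[bool]    # missing from some example responses
    diarization: Optional[str]   # only present when with_diarization=true


def parse_response(raw: str) -> Transcription:
    """Convert the JSON body into a typed object, normalizing numeric fields."""
    data = json.loads(raw)
    return Transcription(
        text=data["text"],
        duration_s=float(data["duration"]),
        process_time_s=float(data["process_time"]),
        is_cached=data.get("is_cached"),
        diarization=data.get("diarization"),
    )
```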

with_timestamp = true

{
  "text": "1\n00:00:01,032 --> 00:00:04,083\nWhen you call someone who is thousands of\n\n2\n00:00:04,083 --> 00:00:04,868\n miles away, you're using a satellite.\n\n",
  "duration": "6.540000",
  "process_time": 1.863849401473999
}
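With timestamps enabled, text holds SRT subtitles: numbered blocks separated by blank lines, each with an HH:MM:SS,mmm --> HH:MM:SS,mmm time range followed by the transcript. A minimal parser, assuming every block follows that shape, might look like this; Segment and parse_srt are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    index: int
    start_s: float
    end_s: float
    text: str


_STAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _to_seconds(stamp: str) -> float:
    """Convert an SRT time stamp such as 00:00:04,083 to seconds."""
    match = _STAMP.fullmatch(stamp)
    if match is None:
        raise ValueError(f"bad SRT time stamp: {stamp!r}")
    h, m, s, ms = map(int, match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(srt: str) -> list[Segment]:
    """Split SRT text into segments; blocks are separated by blank lines."""
    segments = []
    for block in srt.strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip anything that is not index / time range / text
        start, end = (part.strip() for part in lines[1].split("-->"))
        segments.append(Segment(
            index=int(lines[0]),
            start_s=_to_seconds(start),
            end_s=_to_seconds(end),
            text=" ".join(line.strip() for line in lines[2:]),
        ))
    return segments
```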

with_diarization = true

{
  "text": "When you call someone who is thousands of miles away, you're using a satellite.",
  "diarization": "SPEAKER_00: When you call someone who is thousands of miles away, you're using a satellite.",
  "is_cached": false,
  "duration": "6.540000",
  "process_time": 0.18898367881774902
}
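The diarization field prefixes speech with a speaker label. The example contains only one speaker, so how multiple turns are separated is not documented; the sketch below assumes every turn starts with a label of the form SPEAKER_NN: and splits on those labels. split_speakers is a hypothetical helper.

```python
import re

# Assumption: each turn begins with a label such as "SPEAKER_00:".
_SPEAKER = re.compile(r"(SPEAKER_\d+):\s*")


def split_speakers(diarization: str) -> list[tuple[str, str]]:
    """Return (speaker, text) pairs from a diarization string."""
    parts = _SPEAKER.split(diarization)
    # With one capture group, re.split yields [prefix, label, text, label, text, ...];
    # any text before the first label is ignored.
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]
```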

with_timestamp = true and with_diarization = true

{
  "text": "1\n00:00:01,032 --> 00:00:04,083\nSPEAKER_00: When you call someone who is thousands of\n\n2\n00:00:04,083 --> 00:00:04,868\nSPEAKER_00:  miles away, you're using a satellite.\n\n",
  "is_cached": true,
  "duration": "6.540000",
  "process_time": 3.2193245887756348
}
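When both options are enabled, each SRT block begins its text with a speaker label. The sketch below, again assuming the SPEAKER_NN: prefix shown in these examples, extracts the speaker for each block and merges consecutive blocks from the same speaker into a single turn. Timestamps are kept as strings for brevity; Turn, parse_diarized_srt and merge_by_speaker are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    start: str               # SRT time stamp, e.g. "00:00:01,032"
    end: str
    speaker: Optional[str]   # None if a block carries no speaker label
    text: str


_LABEL = re.compile(r"\s*(SPEAKER_\d+):\s*(.*)", re.DOTALL)


def parse_diarized_srt(srt: str) -> list[Turn]:
    """Parse SRT blocks whose text starts with a SPEAKER_NN: label."""
    turns = []
    for block in srt.strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start, end = (part.strip() for part in lines[1].split("-->"))
        body = " ".join(lines[2:])
        match = _LABEL.match(body)
        speaker, text = (match.group(1), match.group(2)) if match else (None, body)
        turns.append(Turn(start, end, speaker, text.strip()))
    return turns


def merge_by_speaker(turns: list[Turn]) -> list[Turn]:
    """Join consecutive blocks from the same speaker into a single turn."""
    merged: list[Turn] = []
    for turn in turns:
        if merged and merged[-1].speaker == turn.speaker:
            prev = merged[-1]
            merged[-1] = Turn(prev.start, turn.end, turn.speaker,
                              f"{prev.text} {turn.text}")
        else:
            merged.append(turn)
    return merged
```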

Status Codes

The API returns standard HTTP status codes to indicate the success or failure of requests.

| Status Code | Description |
|---|---|
| 200 | Success - Audio transcribed successfully |
| 400 | Bad Request - Invalid parameters or malformed request |
| 401 | Unauthorized - Invalid or missing API key |
| 403 | Forbidden - API key doesn't have permission for this endpoint |
| 413 | Payload Too Large - Audio file exceeds maximum size limit |
| 415 | Unsupported Media Type - Audio format not supported |
| 422 | Unprocessable Entity - Audio file corrupted or invalid parameter values |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error - Server encountered an unexpected condition |
| 503 | Service Unavailable - Server is temporarily unable to handle the request |
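Of these codes, 429, 500 and 503 are the ones most likely to be transient, while the other 4xx codes point to a problem with the request itself that retrying will not fix. The sketch below retries only those three with capped exponential backoff and jitter; the retry set, the delays and the helper names are assumptions, not documented API behavior.

```python
import random
import time
import urllib.error
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Assumption: only these codes from the table are worth retrying.
RETRYABLE_STATUSES = {429, 500, 503}


def should_retry(status: int) -> bool:
    return status in RETRYABLE_STATUSES


def backoff_delays(max_retries: int = 4, base: float = 0.5, cap: float = 8.0,
                   jitter: bool = True) -> list[float]:
    """Capped exponential backoff in seconds, with optional full jitter."""
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * 2 ** attempt)
        delays.append(random.uniform(0, delay) if jitter else delay)
    return delays


def with_retries(send: Callable[[], T], max_retries: int = 4,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(), retrying on retryable HTTP errors; re-raise anything else."""
    schedule: list[Optional[float]] = [*backoff_delays(max_retries), None]
    for delay in schedule:
        try:
            return send()
        except urllib.error.HTTPError as err:
            if delay is None or not should_retry(err.code):
                raise
            sleep(delay)
    raise AssertionError("unreachable")
```

Pass any zero-argument function that performs the request and raises urllib.error.HTTPError on failure. If the service sends a Retry-After header with 429 responses, honoring it would be preferable to the fixed schedule.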