Speech-to-Text
Convert Cantonese audio files into accurate text transcriptions. This endpoint accepts several audio formats and can optionally return timestamps and speaker diarization.
Request Parameters
Send requests as POST with a multipart/form-data body, which is required for the file upload. Boolean options are sent as the strings true or false.
Parameter | Type | Required | Description |
---|---|---|---|
api_key | string | Yes | Your API key for authentication |
data | file | Yes | Audio file to transcribe. Supported formats: wav, mp3, m4a, flac, ogg. |
with_timestamp | boolean | No | Include timestamps in the response. When true, text is returned as SRT-style segments, each with a start and end time. Defaults to false. |
with_diarization | boolean | No | Enable speaker diarization to label different speakers (for example, SPEAKER_00). Defaults to false. |
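An unsupported format is rejected with a 415 error, so a client may want to check the file extension before uploading. The Python sketch below checks a path against the formats listed in the table and serializes boolean options as the lowercase strings true/false used in the example request. The helper names check_audio_format and form_bool are hypothetical, and the extension check is only a convenience: it cannot detect a mislabeled or corrupted file.

```python
from pathlib import Path

# Formats listed in the parameter table.
SUPPORTED_FORMATS = {"wav", "mp3", "m4a", "flac", "ogg"}


def check_audio_format(path: str) -> str:
    """Return the lowercase extension of `path`, or raise ValueError if it is
    not a documented format. Only the file name is inspected; the server may
    still reject a mislabeled or corrupted file (415 / 422)."""
    ext = Path(path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    return ext


def form_bool(value: bool) -> str:
    """Serialize a boolean option as a lowercase form-field string."""
    return "true" if value else "false"
```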
Example Request
The following example transcribes a WAV file with curl.
```bash
curl -X POST "https://paid-api.cantonese.ai" \
  -F "api_key=YOUR_API_KEY" \
  -F "with_timestamp=false" \
  -F "with_diarization=false" \
  -F "data=@YOUR_AUDIO_FILE.wav;type=audio/wav"
```
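The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client. It assumes the endpoint URL from the curl example and builds the multipart body by hand with a random boundary. The function names build_multipart and transcribe are hypothetical.

```python
import json
import uuid
import urllib.request
from pathlib import Path

API_URL = "https://paid-api.cantonese.ai"  # URL taken from the curl example


def build_multipart(fields: dict, file_field: str, filename: str,
                    file_bytes: bytes, content_type: str):
    """Encode text fields plus one file as a multipart/form-data body.
    Returns (body_bytes, content_type_header)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode())
    # The audio file goes in its own part, with a filename and content type.
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, api_key: str, with_timestamp: bool = False,
               with_diarization: bool = False,
               content_type: str = "audio/wav") -> dict:
    """POST an audio file as the `data` field and return the parsed JSON."""
    fields = {
        "api_key": api_key,
        "with_timestamp": "true" if with_timestamp else "false",
        "with_diarization": "true" if with_diarization else "false",
    }
    audio = Path(path)
    body, ctype = build_multipart(fields, "data", audio.name,
                                  audio.read_bytes(), content_type)
    req = urllib.request.Request(API_URL, data=body, method="POST",
                                 headers={"Content-Type": ctype})
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, transcribe("clip.wav", "YOUR_API_KEY", with_timestamp=True) would return the parsed response. Error responses surface as urllib.error.HTTPError, whose code attribute holds one of the status codes listed in the Status Codes table.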
Response
On success, the endpoint returns a JSON object containing the transcription results.
Default response (with_timestamp and with_diarization both false):
```json
{
  "text": "When you call someone who is thousands of miles away, you're using a satellite.",
  "duration": "6.540000",
  "is_cached": false,
  "process_time": 0.18551874160766602
}
```
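In these examples duration is returned as a string, while process_time is a number. The Python sketch below normalizes both to floats. It treats is_cached and diarization as optional because is_cached is missing from the timestamp example and diarization appears only when diarization is enabled. The names Transcription and parse_response are hypothetical, and reading both values as seconds is an assumption.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcription:
    text: str
    duration: float              # sent as a string, e.g. "6.540000"; presumably seconds
    process_time: float          # presumably seconds
    is_cached: Optional[bool]    # absent from some example responses
    diarization: Optional[str]   # present only when with_diarization=true


def parse_response(body: str) -> Transcription:
    """Parse a response body into typed fields, tolerating optional keys."""
    data = json.loads(body)
    return Transcription(
        text=data["text"],
        duration=float(data["duration"]),
        process_time=float(data["process_time"]),
        is_cached=data.get("is_cached"),
        diarization=data.get("diarization"),
    )
```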
with_timestamp = true
```json
{
  "text": "1\n00:00:01,032 --> 00:00:04,083\nWhen you call someone who is thousands of\n\n2\n00:00:04,083 --> 00:00:04,868\n miles away, you're using a satellite.\n\n",
  "duration": "6.540000",
  "process_time": 1.863849401473999
}
```
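With timestamps enabled, text holds SRT-style cues. Each cue has an index line, an HH:MM:SS,mmm --> HH:MM:SS,mmm range, and the cue text, and cues are separated by blank lines. The Python sketch below parses that layout into segments with times in seconds. It assumes the layout seen in this example, and the names Segment and parse_srt are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" as in the example response.
TIME_RANGE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Segment:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt: str) -> List[Segment]:
    segments = []
    # Cues are separated by a blank line; strip() drops the trailing "\n\n".
    for block in srt.strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 3:
            continue
        match = TIME_RANGE.fullmatch(lines[1].strip())
        if match is None:
            continue
        g = match.groups()
        segments.append(Segment(
            index=int(lines[0]),
            start=_to_seconds(*g[:4]),
            end=_to_seconds(*g[4:]),
            # Cue text may start with a space, as in the second cue above.
            text="\n".join(lines[2:]).strip(),
        ))
    return segments
```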
with_diarization = true
```json
{
  "text": "When you call someone who is thousands of miles away, you're using a satellite.",
  "diarization": "SPEAKER_00: When you call someone who is thousands of miles away, you're using a satellite.",
  "is_cached": false,
  "duration": "6.540000",
  "process_time": 0.18898367881774902
}
```
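The diarization field prefixes each turn with a speaker label such as SPEAKER_00. The Python sketch below splits it into (speaker, text) pairs. It assumes one SPEAKER_NN: turn per line. The example contains only a single speaker, so the multi-speaker layout is a guess, and the name parse_diarization is hypothetical.

```python
import re
from typing import List, Tuple

# "SPEAKER_00: text": the label format seen in the example response.
SPEAKER_LINE = re.compile(r"^(SPEAKER_\d+):\s*(.*)$")


def parse_diarization(diarization: str) -> List[Tuple[str, str]]:
    """Split the diarization string into (speaker, text) pairs,
    assuming one labeled turn per line."""
    turns = []
    for line in diarization.splitlines():
        stripped = line.strip()
        match = SPEAKER_LINE.match(stripped)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns and stripped:
            # Unlabeled continuation line: append it to the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {stripped}")
    return turns
```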
with_timestamp = true and with_diarization = true
```json
{
  "text": "1\n00:00:01,032 --> 00:00:04,083\nSPEAKER_00: When you call someone who is thousands of\n\n2\n00:00:04,083 --> 00:00:04,868\nSPEAKER_00: miles away, you're using a satellite.\n\n",
  "is_cached": true,
  "duration": "6.540000",
  "process_time": 3.2193245887756348
}
```
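When both options are enabled, each SRT cue's text starts with a speaker label, so one speaker's sentence can span several cues. The Python sketch below merges consecutive cues from the same speaker into (speaker, start, end, text) turns, with times in seconds. It assumes the cue layout shown in this example, and the name speaker_turns is hypothetical.

```python
import re
from typing import List, Tuple

TIME = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
TIME_RANGE = re.compile(TIME + " --> " + TIME)
LABEL = re.compile(r"^(SPEAKER_\d+):\s*(.*)$", re.DOTALL)


def _secs(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def speaker_turns(srt: str) -> List[Tuple[str, float, float, str]]:
    """Merge consecutive labeled SRT cues from the same speaker."""
    turns: List[Tuple[str, float, float, str]] = []
    for block in srt.strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 3:
            continue
        times = TIME_RANGE.fullmatch(lines[1].strip())
        label = LABEL.match("\n".join(lines[2:]).strip())
        if times is None or label is None:
            continue
        g = times.groups()
        speaker, text = label.group(1), label.group(2).strip()
        start, end = _secs(*g[:4]), _secs(*g[4:])
        if turns and turns[-1][0] == speaker:
            # Same speaker as the previous cue: extend that turn.
            prev = turns[-1]
            turns[-1] = (speaker, prev[1], end, f"{prev[3]} {text}")
        else:
            turns.append((speaker, start, end, text))
    return turns
```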
Status Codes
The API returns standard HTTP status codes to indicate the success or failure of requests.
Status Code | Description |
---|---|
200 | Success - Audio transcribed successfully |
400 | Bad Request - Invalid parameters or malformed request |
401 | Unauthorized - Invalid or missing API key |
403 | Forbidden - API key doesn't have permission for this endpoint |
413 | Payload Too Large - Audio file exceeds maximum size limit |
415 | Unsupported Media Type - Audio format not supported |
422 | Unprocessable Entity - Audio file corrupted or invalid parameter values |
429 | Too Many Requests - Rate limit exceeded |
500 | Internal Server Error - Server encountered an unexpected condition |
503 | Service Unavailable - Server is temporarily unable to handle the request |
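Of these codes, 429 and 503 describe conditions that may clear on their own, while the other 4xx codes point to a problem with the request that retrying will not fix. The Python sketch below retries with exponential backoff and jitter. It assumes that 429, 503, and (more tentatively) 500 are worth retrying, and it takes a hypothetical send callable that returns (status_code, body). The name call_with_retry is also hypothetical.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: these statuses are transient; other codes are returned immediately.
RETRYABLE = {429, 500, 503}


def call_with_retry(send: Callable[[], Tuple[int, str]],
                    max_attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE or attempt == max_attempts - 1:
            return status, body
        # Backoff: between half and all of base_delay * 1, 2, 4, ... (random jitter).
        sleep(base_delay * (2 ** attempt) * (0.5 + random.random() / 2))
    raise ValueError("max_attempts must be at least 1")
```

Passing sleep as a parameter keeps the function easy to test without real delays.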