Speech-to-Text
Transcribe Cantonese audio files to text. The endpoint accepts several audio formats and can optionally return timestamps and speaker diarization.
Request Parameters
This endpoint requires multipart/form-data for file uploads.
| Parameter | Type | Required | Description |
|---|---|---|---|
| api_key | string | Yes | Your API key for authentication |
| data | file | Yes | Audio file to transcribe. Supported formats: wav, mp3, m4a, flac, ogg. |
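| with_timestamp | boolean | No | Include timestamps in the response. The text field is then returned as SRT-style cues (index, start --> end time, text), as in the response examples. Defaults to false. |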
| with_diarization | boolean | No | Enable speaker diarization to identify different speakers. Defaults to false. |
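
The parameters in the table can be assembled into a multipart/form-data body using only the Python standard library. This is a minimal sketch, not an official client: the helper names (`build_form_fields`, `check_audio_format`, `encode_multipart`) are hypothetical, and sending booleans as lowercase "true"/"false" is an assumption based on the "false" values in the curl example.

```python
import mimetypes
import uuid
from pathlib import Path

# Formats listed in the parameter table.
SUPPORTED_FORMATS = {"wav", "mp3", "m4a", "flac", "ogg"}


def build_form_fields(api_key, with_timestamp=False, with_diarization=False):
    """Return the text fields of the form.

    Assumption: booleans are sent as lowercase "true"/"false".
    """
    return {
        "api_key": api_key,
        "with_timestamp": str(with_timestamp).lower(),
        "with_diarization": str(with_diarization).lower(),
    }


def check_audio_format(filename):
    """Return the lowercase extension, or raise ValueError if unsupported."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: {ext or '(none)'}")
    return ext


def encode_multipart(fields, filename, file_bytes, boundary=None):
    """Encode the text fields plus one file part named "data".

    Returns (body_bytes, headers) suitable for an HTTP POST.
    """
    boundary = boundary or uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="data"; '
            f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return b"".join(parts), headers
```

The returned `body` and `headers` can be passed to `urllib.request.Request(url, data=body, headers=headers, method="POST")`, with `url` set to the endpoint address used in the curl example.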
Example Request
The following curl command uploads an audio file for transcription.
curl -X POST "https://paid-api.cantonese.ai" \
-F "api_key=YOUR_API_KEY" \
-F "with_timestamp=false" \
-F "with_diarization=false" \
-F "data=@YOUR_AUDIO_FILE.wav;type=audio/wav"
Response
On success, the API returns a JSON object with the transcription results. The shape of the text field depends on the options set in the request.
Default response format:
{
"text": "When you call someone who is thousands of miles away, you're using a satellite.",
"duration": "6.540000",
"is_cached": false,
"process_time": 0.18551874160766602
}
with_timestamp = true
{
"text": "1\n00:00:01,032 --> 00:00:04,083\nWhen you call someone who is thousands of\n\n2\n00:00:04,083 --> 00:00:04,868\n miles away, you're using a satellite.\n\n",
"duration": "6.540000",
"process_time": 1.863849401473999
}
with_diarization = true
{
"text": "When you call someone who is thousands of miles away, you're using a satellite.",
"diarization": "SPEAKER_00: When you call someone who is thousands of miles away, you're using a satellite.",
"is_cached": false,
"duration": "6.540000",
"process_time": 0.18898367881774902
}
with_timestamp = true and with_diarization = true
{
"text": "1\n00:00:01,032 --> 00:00:04,083\nSPEAKER_00: When you call someone who is thousands of\n\n2\n00:00:04,083 --> 00:00:04,868\nSPEAKER_00: miles away, you're using a satellite.\n\n",
"is_cached": true,
"duration": "6.540000",
"process_time": 3.2193245887756348
}
Status Codes
The API returns standard HTTP status codes to indicate the success or failure of requests.
| Status Code | Description |
|---|---|
| 200 | Success - Audio transcribed successfully |
| 400 | Bad Request - Invalid parameters or malformed request |
| 401 | Unauthorized - Invalid or missing API key |
| 403 | Forbidden - API key doesn't have permission for this endpoint |
| 413 | Payload Too Large - Audio file exceeds maximum size limit |
| 415 | Unsupported Media Type - Audio format not supported |
| 422 | Unprocessable Entity - Audio file corrupted or invalid parameter values |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error - Server encountered an unexpected condition |
| 503 | Service Unavailable - Server is temporarily unable to handle the request |
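
A client usually has to do two things with a reply: decide from the status code whether the request succeeded, and, when timestamps were requested, split the SRT-style text field into individual cues. The sketch below illustrates both under stated assumptions: the function names are hypothetical, treating 429 and 503 as retryable is a client-side choice rather than documented API behaviour, and the cue layout is inferred from the response examples on this page.

```python
import json
import re

# Assumption: 429 and 503 describe temporary conditions worth retrying.
RETRYABLE = {429, 503}

# One cue: index line, "start --> end" line, then text up to a blank line.
CUE_RE = re.compile(
    r"(\d+)\n(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\n"
    r"(.*?)(?:\n\n|\Z)",
    re.DOTALL,
)


def to_seconds(stamp):
    """Convert an "HH:MM:SS,mmm" timestamp to seconds."""
    hms, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds + int(millis) / 1000


def parse_cues(text):
    """Split SRT-style text into cues with times and an optional speaker."""
    cues = []
    for index, start, end, body in CUE_RE.findall(text):
        speaker, sep, rest = body.partition(": ")
        # Only treat the prefix as a speaker label if it looks like one.
        if not (sep and speaker.startswith("SPEAKER_")):
            speaker, rest = None, body
        cues.append({
            "index": int(index),
            "start": to_seconds(start),
            "end": to_seconds(end),
            "speaker": speaker,
            "text": rest.strip(),
        })
    return cues


def handle_response(status, body):
    """Return ("ok", payload) or ("retry", None); raise on other errors."""
    if status == 200:
        return "ok", json.loads(body)
    if status in RETRYABLE:
        return "retry", None
    raise RuntimeError(f"request failed with HTTP {status}")
```

Calling `parse_cues(payload["text"])` only makes sense when the request set with_timestamp to true; in the default format the text field is a plain transcript and the function returns an empty list.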