Text-to-Speech
Convert text to natural-sounding Cantonese speech. This endpoint supports multiple voice options, audio formats, and customization parameters.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| api_key | string | Yes | Your API key for authentication. |
| text | string | Yes | The text to convert to speech. Maximum 5000 characters. |
| frame_rate | string | No | Audio frame rate in Hz. Common values: "16000", "24000", "44100". Defaults to "24000". |
| speed | number | No | Speech speed multiplier. Range: 0.5-3.0. Defaults to 1.0. |
| duration | number | No | Target duration of the generated audio, in seconds. |
| pitch | number | No | Pitch adjustment in semitones. Range: -12 to +12. Defaults to 0. |
| language | string | No | Language code. Defaults to "cantonese". Options: "cantonese", "english", "mandarin". |
| output_extension | string | No | Audio output format. Defaults to "wav". Options: "wav", "mp3". |
| voice_id | string | No | Unique identifier for the voice to use. Defaults to system default voice. |
| should_enhance | boolean | No | Whether to apply audio enhancement. Defaults to false. |
| should_convert_from_simplified_to_traditional | boolean | No | Whether to convert simplified Chinese to traditional Chinese before synthesis. Defaults to false. |
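| should_return_timestamp | boolean | No | Whether to return a JSON response containing base64-encoded audio and timestamps instead of a direct audio file. Defaults to false. |
| should_use_turbo_model | boolean | No | Whether to use the Turbo model for faster synthesis. Defaults to false. |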
Turbo Model v1
Setting should_use_turbo_model to true enables faster speech synthesis.
Supported voices are listed in the voice library.
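The limits in the parameter table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `build_tts_payload` is hypothetical, the ranges are assumed to be inclusive, and the 5000-character limit is assumed to count Unicode code points, which may differ from how the server counts.

```python
from typing import Any

# Limits and options copied from the parameter table.
MAX_TEXT_LENGTH = 5000
LANGUAGES = {"cantonese", "english", "mandarin"}
OUTPUT_EXTENSIONS = {"wav", "mp3"}


def build_tts_payload(api_key: str, text: str, **options: Any) -> dict:
    """Validate the documented limits and return a request body dict.

    Hypothetical helper: the API itself does not provide this function.
    """
    if not api_key:
        raise ValueError("api_key is required")
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(f"text exceeds {MAX_TEXT_LENGTH} characters")

    # Ranges are assumed to be inclusive at both ends.
    if not 0.5 <= options.get("speed", 1.0) <= 3.0:
        raise ValueError("speed must be between 0.5 and 3.0")
    if not -12 <= options.get("pitch", 0) <= 12:
        raise ValueError("pitch must be between -12 and 12 semitones")

    if options.get("language", "cantonese") not in LANGUAGES:
        raise ValueError(f"language must be one of {sorted(LANGUAGES)}")
    if options.get("output_extension", "wav") not in OUTPUT_EXTENSIONS:
        raise ValueError(f"output_extension must be one of {sorted(OUTPUT_EXTENSIONS)}")

    # Only send what the caller set; the server applies its own defaults.
    return {"api_key": api_key, "text": text, **options}
```

For example, `build_tts_payload("YOUR_API_KEY", "你今日食咗飯未?", speed=1.5)` returns a dict with only `api_key`, `text`, and `speed`, leaving every other parameter at its server-side default.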
Response Types
The API supports two different response formats depending on the should_return_timestamp parameter:
🎵 Audio File Response
Direct Audio File
When should_return_timestamp = false (default), the API returns a direct audio file.
curl -X POST "https://cantonese.ai/api/tts" \
-H "Content-Type: application/json" \
-d '{
"api_key": "YOUR_API_KEY",
"text": "你今日食咗飯未?",
"frame_rate": "24000",
"speed": 1,
"duration": 2,
"pitch": 0,
"language": "cantonese",
"output_extension": "wav",
"voice_id": "2725cf0f-efe2-4132-9e06-62ad84b2973d",
"should_enhance": false,
"should_convert_from_simplified_to_traditional": true,
"should_return_timestamp": false,
"should_use_turbo_model": false
}' \
--output output.wav
📁 Output Format
Direct audio file in the requested format: .wav, .mp3
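The same request can be made from Python with only the standard library. This is a rough sketch that mirrors the curl example above; the helper names `make_tts_request` and `synthesize_to_file` and the 60-second timeout are assumptions for illustration, not part of the API.

```python
import json
import urllib.request

TTS_URL = "https://cantonese.ai/api/tts"  # endpoint taken from the curl example


def make_tts_request(payload: dict) -> urllib.request.Request:
    """Build a POST request with a JSON body (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(payload: dict, path: str, timeout: float = 60.0) -> int:
    """Send the request and write the returned audio bytes to `path`.

    Intended for should_return_timestamp = false, where the response body
    is the audio file itself. Returns the number of bytes written.
    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    with urllib.request.urlopen(make_tts_request(payload), timeout=timeout) as resp:
        audio = resp.read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Calling `synthesize_to_file({"api_key": "YOUR_API_KEY", "text": "你今日食咗飯未?"}, "output.wav")` would save the audio in the same way as `--output output.wav` does in the curl example.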
📊 JSON Response with Timestamps
JSON with Base64 Audio + Timestamps
When should_return_timestamp = true, the API returns a JSON response with base64-encoded audio and timing data.
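Once such a response has been received, the audio is recovered by base64-decoding the `file` field. The sketch below assumes the field names documented in the JSON response structure in this section (`file`, `srt_timestamp`, `timestamps`); the function name `decode_tts_response` is illustrative and not part of the API.

```python
import base64
import json


def decode_tts_response(raw: str):
    """Parse a timestamped TTS response body.

    Returns a tuple (audio_bytes, srt_text, timestamps), where
    timestamps is the list of {"start", "end", "text"} entries.
    """
    data = json.loads(raw)
    audio = base64.b64decode(data["file"])  # raw bytes of the audio file
    return audio, data["srt_timestamp"], data["timestamps"]
```

The returned bytes can be written to a file opened in binary mode, and the SRT text can be saved alongside it as a subtitle file. A request that produces this response format looks like the following: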
curl -X POST "https://cantonese.ai/api/tts" \
-H "Content-Type: application/json" \
-d '{
"api_key": "YOUR_API_KEY",
"text": "你今日食咗飯未?",
"frame_rate": "24000",
"speed": 1,
"duration": 2,
"pitch": 0,
"language": "cantonese",
"voice_id": "2725cf0f-efe2-4132-9e06-62ad84b2973d",
"should_enhance": false,
"should_convert_from_simplified_to_traditional": true,
"should_return_timestamp": true,
"should_use_turbo_model": false
}'
📋 JSON Response Structure
| Field | Description |
|---|---|
| file | Base64-encoded audio file in the requested format |
| request_id | Unique identifier for this request |
| srt_timestamp | Subtitle timestamps in SRT format |
| timestamps | Array of word-level timing data with start/end times and text |
{
"file": "Z+AOEA4wHjAXf7s/Qw7uXoYuwz8LD22PVH8gzwR+os6zrq...",
"request_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
"srt_timestamp": "1\n00:00:00,000 --> 00:00:01,984\n你今日食咗飯未\n\n",
"timestamps": [
{
"start": 0,
"end": 1.984,
"text": "你今日食咗飯未"
}
]
}
Status Codes
The API returns standard HTTP status codes to indicate the success or failure of requests.
| Status Code | Description |
|---|---|
| 200 | Success - Audio file generated successfully |
| 400 | Bad Request - Invalid parameters or malformed request |
| 401 | Unauthorized - Invalid or missing API key |
| 403 | Forbidden - API key doesn't have permission for this endpoint |
| 413 | Payload Too Large - Text exceeds maximum length (5000 characters) |
| 422 | Unprocessable Entity - Invalid parameter values or unsupported voice/format |
| 429 | Too Many Requests - Rate limit exceeded |
| 500 | Internal Server Error - Server encountered an unexpected condition |
| 503 | Service Unavailable - Server is temporarily unable to handle the request |
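A client can map these status codes to a simple decision: use the result, retry later, or give up. The sketch below is one possible policy, not something the API prescribes. Treating 429 and 503 as retryable is an assumption, and the names `classify_status` and `describe_status` are hypothetical.

```python
# Descriptions condensed from the status code table.
STATUS_MESSAGES = {
    200: "Success",
    400: "Bad Request: invalid parameters or malformed request",
    401: "Unauthorized: invalid or missing API key",
    403: "Forbidden: API key lacks permission for this endpoint",
    413: "Payload Too Large: text exceeds 5000 characters",
    422: "Unprocessable Entity: invalid values or unsupported voice/format",
    429: "Too Many Requests: rate limit exceeded",
    500: "Internal Server Error",
    503: "Service Unavailable",
}

# Assumption: only rate limiting and temporary unavailability are retried.
RETRYABLE = {429, 503}


def classify_status(status: int) -> str:
    """Return "ok", "retry", or "fail" for an HTTP status code."""
    if status == 200:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    return "fail"


def describe_status(status: int) -> str:
    """Return a readable message for a status code."""
    return STATUS_MESSAGES.get(status, f"Unexpected status {status}")
```

With this policy, a 413 or 422 response is reported to the caller immediately, since resending the same request would fail again, while a 429 or 503 response can be retried after a delay.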