Text-to-Speech
Convert text to natural-sounding Cantonese speech. This endpoint supports multiple voice options, audio formats, and customization parameters.
Request Body
Parameter | Type | Required | Description |
---|---|---|---|
api_key | string | Yes | Your API key for authentication. |
text | string | Yes | The text to convert to speech. Maximum 5000 characters. |
frame_rate | string | No | Audio frame rate in Hz. Common values: "16000", "24000", "44100". Defaults to "24000". |
speed | number | No | Speech speed multiplier. Range: 0.5-3.0. Defaults to 1.0. |
duration | number | No | Target duration (seconds). |
pitch | number | No | Pitch adjustment in semitones. Range: -12 to +12. Defaults to 0. |
language | string | No | Language code. Defaults to "cantonese". Options: "cantonese", "english", "mandarin". |
output_extension | string | No | Audio output format. Defaults to "mp3". Options: "mp3", "wav", "ogg", "flac". |
voice_id | string | No | Unique identifier for the voice to use. Defaults to system default voice. |
should_enhance | boolean | No | Whether to apply audio enhancement. Defaults to false. |
should_convert_from_simplified_to_traditional | boolean | No | Whether to convert simplified Chinese to traditional Chinese before synthesis. Defaults to false. |
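should_return_timestamp | boolean | No | Whether to return a JSON response with base64-encoded audio and timing data instead of a direct audio file. Defaults to false. |
should_use_turbo_model | boolean | No | Whether to use the Turbo model for faster speech synthesis. Defaults to false. |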
Turbo Model v1
Enables faster speech synthesis for improved performance.
Supported voices are available in the voice library.
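The limits in the parameter table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_tts_payload` is a hypothetical helper, and it assumes the 5000-character limit counts Unicode code points, which the table does not specify. Only options passed explicitly are included, so the server applies its own defaults for the rest.

```python
from typing import Any

# Limits copied from the parameter table above.
MAX_TEXT_CHARS = 5000
SPEED_RANGE = (0.5, 3.0)
PITCH_RANGE = (-12, 12)
LANGUAGES = {"cantonese", "english", "mandarin"}
OUTPUT_EXTENSIONS = {"mp3", "wav", "ogg", "flac"}


def build_tts_payload(api_key: str, text: str, **options: Any) -> dict:
    """Return a request body dict, raising ValueError on documented violations.

    Hypothetical helper: the server remains the authority on what it accepts.
    """
    if not api_key:
        raise ValueError("api_key is required")
    if not text:
        raise ValueError("text is required")
    # Assumption: the limit counts code points, which is what len() measures.
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")

    if "speed" in options:
        low, high = SPEED_RANGE
        if not low <= options["speed"] <= high:
            raise ValueError(f"speed must be between {low} and {high}")
    if "pitch" in options:
        low, high = PITCH_RANGE
        if not low <= options["pitch"] <= high:
            raise ValueError(f"pitch must be between {low} and {high}")
    if "language" in options and options["language"] not in LANGUAGES:
        raise ValueError(f"language must be one of {sorted(LANGUAGES)}")
    if ("output_extension" in options
            and options["output_extension"] not in OUTPUT_EXTENSIONS):
        raise ValueError(
            f"output_extension must be one of {sorted(OUTPUT_EXTENSIONS)}"
        )

    # Required fields first, then whatever optional fields the caller passed.
    return {"api_key": api_key, "text": text, **options}
```

Calling `build_tts_payload("YOUR_API_KEY", "你今日食咗飯未?", speed=1, output_extension="wav")` yields a dict that can be serialized as the JSON request body.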
Response Types
The API supports two different response formats depending on the should_return_timestamp parameter:
🎵 Audio File Response
Direct Audio File
When should_return_timestamp = false (default), the API returns a direct audio file.
curl -X POST "https://cantonese.ai/api/tts" \
-H "Content-Type: application/json" \
-d '{
"api_key": "YOUR_API_KEY",
"text": "你今日食咗飯未?",
"frame_rate": "24000",
"speed": 1,
"duration": 2,
"pitch": 0,
"language": "cantonese",
"output_extension": "wav",
"voice_id": "2725cf0f-efe2-4132-9e06-62ad84b2973d",
"should_enhance": false,
"should_convert_from_simplified_to_traditional": true,
"should_return_timestamp": false,
"should_use_turbo_model": false
}' \
--output output.wav
📁 Output Format
Direct audio file in the requested format: .mp3, .wav, .ogg, or .flac
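The same direct-audio request can be sketched in Python with only the standard library. The function names and the 30-second timeout are assumptions for illustration; the URL, the Content-Type header, and the body fields come from the curl example above.

```python
import json
import urllib.request

TTS_URL = "https://cantonese.ai/api/tts"  # endpoint from the curl example


def build_request(payload: dict) -> urllib.request.Request:
    """Build the POST request without sending it."""
    # ensure_ascii=False keeps Chinese text readable in the encoded body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(payload: dict, path: str, timeout: float = 30.0) -> int:
    """Send the request and write the returned audio bytes to `path`.

    Assumes should_return_timestamp is false, so the body is raw audio.
    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    Returns the number of bytes written.
    """
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as resp:
        audio = resp.read()
    with open(path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

With a payload such as `{"api_key": "YOUR_API_KEY", "text": "你今日食咗飯未?", "output_extension": "wav"}`, calling `synthesize_to_file(payload, "output.wav")` mirrors the `--output output.wav` flag in the curl command.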
📊 JSON Response with Timestamps
JSON with Base64 Audio + Timestamps
When should_return_timestamp = true, the API returns a JSON response with base64-encoded audio and timing data.
curl -X POST "https://cantonese.ai/api/tts" \
-H "Content-Type: application/json" \
-d '{
"api_key": "YOUR_API_KEY",
"text": "你今日食咗飯未?",
"frame_rate": "24000",
"speed": 1,
"duration": 2,
"pitch": 0,
"language": "cantonese",
"voice_id": "2725cf0f-efe2-4132-9e06-62ad84b2973d",
"should_enhance": false,
"should_convert_from_simplified_to_traditional": true,
"should_return_timestamp": true,
"should_use_turbo_model": false
}'
📋 JSON Response Structure
Field | Description |
---|---|
file | Base64-encoded audio file in the requested format |
request_id | Unique identifier for this request |
srt_timestamp | Subtitle timestamps in SRT format |
timestamps | Array of word-level timing data with start/end times and text |

{
"file": "Z+AOEA4wHjAXf7s/Qw7uXoYuwz8LD22PVH8gzwR+os6zrq...",
"request_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
"srt_timestamp": "1\n00:00:00,000 --> 00:00:01,984\n你今日食咗飯未\n\n",
"timestamps": [
{
"start": 0,
"end": 1.984,
"text": "你今日食咗飯未"
}
]
}
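A client receiving this JSON has to decode the base64 `file` field before it can write the audio to disk. The sketch below shows one way to unpack the documented fields; `TtsResult`, `parse_tts_response`, and `segment_durations` are hypothetical names, and the code assumes all four fields are present as in the example response.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class TtsResult:
    audio: bytes        # decoded contents of "file"
    request_id: str
    srt: str            # "srt_timestamp", ready to save as a .srt file
    timestamps: list    # list of {"start", "end", "text"} dicts


def parse_tts_response(raw_json: str) -> TtsResult:
    """Decode a timestamped TTS response into bytes and timing data."""
    data = json.loads(raw_json)
    return TtsResult(
        audio=base64.b64decode(data["file"]),
        request_id=data["request_id"],
        srt=data["srt_timestamp"],
        timestamps=data["timestamps"],
    )


def segment_durations(result: TtsResult) -> list:
    """Return the length in seconds of each timed segment."""
    return [round(seg["end"] - seg["start"], 3) for seg in result.timestamps]
```

The `audio` bytes can then be written with `open(path, "wb")`, and the `srt` string can be saved alongside it as a subtitle file.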
Status Codes
The API returns standard HTTP status codes to indicate the success or failure of requests.
Status Code | Description |
---|---|
200 | Success - Audio file generated successfully |
400 | Bad Request - Invalid parameters or malformed request |
401 | Unauthorized - Invalid or missing API key |
403 | Forbidden - API key doesn't have permission for this endpoint |
413 | Payload Too Large - Text exceeds maximum length (5000 characters) |
422 | Unprocessable Entity - Invalid parameter values or unsupported voice/format |
429 | Too Many Requests - Rate limit exceeded |
500 | Internal Server Error - Server encountered an unexpected condition |
503 | Service Unavailable - Server is temporarily unable to handle the request |
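Client code often needs to decide whether a failed request is worth retrying. The sketch below maps the status codes in the table to a simple outcome. Treating 429, 500, and 503 as retryable is a common convention and an assumption here, not something this API states; `classify_status` is a hypothetical helper.

```python
# Short names taken from the status code table above.
STATUS_MEANINGS = {
    200: "Success",
    400: "Bad Request",
    401: "Unauthorized",
    403: "Forbidden",
    413: "Payload Too Large",
    422: "Unprocessable Entity",
    429: "Too Many Requests",
    500: "Internal Server Error",
    503: "Service Unavailable",
}

# Assumption: these are transient and may succeed on a later attempt.
RETRYABLE_STATUSES = frozenset({429, 500, 503})


def classify_status(code: int) -> str:
    """Return 'ok', 'retry', or 'fail' for a TTS response status code."""
    if code == 200:
        return "ok"
    if code in RETRYABLE_STATUSES:
        return "retry"
    # 400, 401, 403, 413, 422 and anything unlisted need a changed request.
    return "fail"
```

A caller would typically fix the request on `fail` (for example, shorten the text after a 413) and wait before resending on `retry`.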