cantonese.ai API Reference

Text-to-Speech

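Convert text to natural-sounding speech, in Cantonese by default (English and Mandarin are also available through the language parameter). This endpoint supports multiple voice options, audio formats, and customization parameters.

Endpoint: POST https://cantonese.ai/api/tts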

Request Body

Send parameters as a JSON object with the header Content-Type: application/json.

| Parameter | Type | Required | Description |
|---|---|---|---|
| api_key | string | Yes | Your API key for authentication. |
| text | string | Yes | The text to convert to speech. Maximum 5,000 characters. |
| frame_rate | string | No | Audio sample rate in Hz, sent as a string. Common values: "16000", "24000", "44100". Defaults to "24000". |
| speed | number | No | Speech speed multiplier. Range: 0.5 to 3.0. Defaults to 1.0. |
| duration | number | No | Target duration of the generated audio, in seconds. |
| pitch | number | No | Pitch adjustment in semitones. Range: -12 to +12. Defaults to 0. |
| language | string | No | Language code: "cantonese", "english", or "mandarin". Defaults to "cantonese". |
| output_extension | string | No | Audio output format: "mp3", "wav", "ogg", or "flac". Defaults to "mp3". |
| voice_id | string | No | Unique identifier of the voice to use. If omitted, the system default voice is used. |
| should_enhance | boolean | No | Whether to apply audio enhancement. Defaults to false. |
| should_convert_from_simplified_to_traditional | boolean | No | Whether to convert Simplified Chinese to Traditional Chinese before synthesis. Defaults to false. |
| should_return_timestamp | boolean | No | If true, return JSON containing base64-encoded audio and timestamps instead of a raw audio file. Defaults to false. |
| should_use_turbo_model | boolean | No | If true, use Turbo Model v1 for faster synthesis (supported voices only). Defaults to false. |
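The limits in the table can be checked on the client before a request is sent, which turns a round trip and a 4xx response into an immediate local error. The following is a minimal Python sketch, not an official SDK: build_tts_payload and the constant names are hypothetical, and the server remains the authority on what it accepts. It only includes options the caller actually sets, so anything omitted falls back to the documented defaults.

```python
from typing import Any

# Hypothetical client-side mirror of the documented limits; the server is authoritative.
MAX_TEXT_CHARS = 5000
LANGUAGES = {"cantonese", "english", "mandarin"}
OUTPUT_EXTENSIONS = {"mp3", "wav", "ogg", "flac"}
KNOWN_OPTIONS = {
    "frame_rate", "speed", "duration", "pitch", "language", "output_extension",
    "voice_id", "should_enhance", "should_convert_from_simplified_to_traditional",
    "should_return_timestamp", "should_use_turbo_model",
}


def build_tts_payload(api_key: str, text: str, **options: Any) -> dict[str, Any]:
    """Return a JSON-ready request body for POST /api/tts after checking documented ranges."""
    if not api_key:
        raise ValueError("api_key is required")
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_TEXT_CHARS}")

    # Catch misspelled parameter names instead of silently sending them.
    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    if not 0.5 <= options.get("speed", 1.0) <= 3.0:
        raise ValueError("speed must be between 0.5 and 3.0")
    if not -12 <= options.get("pitch", 0) <= 12:
        raise ValueError("pitch must be between -12 and +12 semitones")
    if options.get("language", "cantonese") not in LANGUAGES:
        raise ValueError(f"unsupported language: {options['language']!r}")
    if options.get("output_extension", "mp3") not in OUTPUT_EXTENSIONS:
        raise ValueError(f"unsupported output_extension: {options['output_extension']!r}")

    # frame_rate is documented as a string, so coerce e.g. 24000 to "24000".
    if "frame_rate" in options:
        options["frame_rate"] = str(options["frame_rate"])

    # Only caller-supplied options are sent; omitted ones use server defaults.
    return {"api_key": api_key, "text": text, **options}
```

For example, build_tts_payload("YOUR_API_KEY", "你今日食咗飯未?", output_extension="wav") produces a body equivalent to the curl example below with most options left at their defaults.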

Turbo Model v1

Setting should_use_turbo_model to true enables Turbo Model v1, which synthesizes speech faster. Not every voice supports it; the voice library lists the supported voices.

Response Types

The API supports two different response formats depending on the should_return_timestamp parameter:

🎵 Audio File Response

Direct Audio File

When should_return_timestamp = false (default), the API returns the audio file directly as the response body.

```bash
curl -X POST "https://cantonese.ai/api/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "api_key": "YOUR_API_KEY",
    "text": "你今日食咗飯未?",
    "frame_rate": "24000",
    "speed": 1,
    "duration": 2,
    "pitch": 0,
    "language": "cantonese",
    "output_extension": "wav",
    "voice_id": "2725cf0f-efe2-4132-9e06-62ad84b2973d",
    "should_enhance": false,
    "should_convert_from_simplified_to_traditional": true,
    "should_return_timestamp": false,
    "should_use_turbo_model": false
  }' \
  --output output.wav
```

📁 Output Format

Binary audio in the format set by output_extension: .mp3, .wav, .ogg, or .flac.
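For clients that don't shell out to curl, here is a minimal Python sketch of the same request using only the standard library (urllib). make_tts_request and synthesize_to_file are hypothetical helper names; the URL, the JSON body, and the Content-Type header come from the curl example. Note that urllib.request.urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so error statuses surface as exceptions rather than as an audio file.

```python
import json
import urllib.request

TTS_URL = "https://cantonese.ai/api/tts"


def make_tts_request(payload: dict) -> urllib.request.Request:
    """Build a POST request with a UTF-8 JSON body, equivalent to the curl example."""
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(payload: dict, path: str, timeout: float = 60.0) -> int:
    """Send the request and write the returned audio bytes to `path`.

    Only valid when should_return_timestamp is false, because then the
    response body is the audio file itself. Returns the number of bytes written.
    """
    if payload.get("should_return_timestamp"):
        raise ValueError("should_return_timestamp=true returns JSON, not raw audio")
    with urllib.request.urlopen(make_tts_request(payload), timeout=timeout) as resp:
        audio = resp.read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

A call such as synthesize_to_file({"api_key": "YOUR_API_KEY", "text": "你今日食咗飯未?", "output_extension": "wav"}, "output.wav") mirrors the curl command; match the file extension to output_extension so players recognize the format.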

📊 JSON Response with Timestamps

JSON with Base64 Audio + Timestamps

When should_return_timestamp = true, the API returns a JSON response with base64-encoded audio and timing data. The example below omits output_extension, so the encoded audio is MP3, the default.

```bash
curl -X POST "https://cantonese.ai/api/tts" \
  -H "Content-Type: application/json" \
  -d '{
    "api_key": "YOUR_API_KEY",
    "text": "你今日食咗飯未?",
    "frame_rate": "24000",
    "speed": 1,
    "duration": 2,
    "pitch": 0,
    "language": "cantonese",
    "voice_id": "2725cf0f-efe2-4132-9e06-62ad84b2973d",
    "should_enhance": false,
    "should_convert_from_simplified_to_traditional": true,
    "should_return_timestamp": true,
    "should_use_turbo_model": false
  }'
```

📋 JSON Response Structure

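| Field | Type | Description |
|---|---|---|
| file | string | Base64-encoded audio file in the requested format (MP3 unless output_extension is set). |
| request_id | string | Unique identifier for this request. |
| srt_timestamp | string | Subtitle timestamps in SRT format. |
| timestamps | array | Timing segments, each an object with start and end times (in seconds) and text. A segment can cover a whole phrase, as in the example. |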
```json
{
  "file": "Z+AOEA4wHjAXf7s/Qw7uXoYuwz8LD22PVH8gzwR+os6zrq...",
  "request_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef",
  "srt_timestamp": "1\n00:00:00,000 --> 00:00:01,984\n你今日食咗飯未\n\n",
  "timestamps": [
    {
      "start": 0,
      "end": 1.984,
      "text": "你今日食咗飯未"
    }
  ]
}
```
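In timestamp mode the client has to base64-decode file before the audio is usable, while the timestamps array can be consumed directly. The Python sketch below decodes a response body and rebuilds an SRT string shaped like srt_timestamp. parse_tts_json, Segment, srt_time, and to_srt are hypothetical names, and the sketch assumes start and end are in seconds, which matches the example (an end of 1.984 corresponds to 00:00:01,984). The file value in the example is truncated, so it cannot be decoded verbatim.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def parse_tts_json(body: str) -> tuple[bytes, list[Segment]]:
    """Decode the audio bytes and timing segments from a timestamp-mode response."""
    data = json.loads(body)
    audio = base64.b64decode(data["file"])
    segments = [Segment(t["start"], t["end"], t["text"]) for t in data.get("timestamps", [])]
    return audio, segments


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm), rounding to the millisecond."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Rebuild an SRT string in the same shape as the example srt_timestamp field."""
    return "".join(
        f"{i}\n{srt_time(s.start)} --> {srt_time(s.end)}\n{s.text}\n\n"
        for i, s in enumerate(segments, start=1)
    )
```

Write the decoded bytes to a file whose extension matches the requested format, for example output.mp3 when output_extension is left at its default. Rebuilding SRT locally is mainly useful after editing or merging segments; otherwise srt_timestamp can be saved as-is.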

Status Codes

The API returns standard HTTP status codes to indicate the success or failure of requests.

| Status Code | Name | Description |
|---|---|---|
| 200 | OK | Speech generated successfully. The body is an audio file, or JSON when should_return_timestamp is true. |
| 400 | Bad Request | Invalid parameters or malformed request. |
| 401 | Unauthorized | Missing or invalid API key. |
| 403 | Forbidden | The API key doesn't have permission to use this endpoint. |
| 413 | Payload Too Large | Text exceeds the maximum length of 5,000 characters. |
| 422 | Unprocessable Entity | Invalid parameter values, or an unsupported voice or format. |
| 429 | Too Many Requests | Rate limit exceeded. |
| 500 | Internal Server Error | The server encountered an unexpected condition. |
| 503 | Service Unavailable | The server is temporarily unable to handle the request. |
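Of these codes, 429 and 503 describe temporary conditions, while 400, 401, 403, 413, and 422 point at a problem in the request that retrying won't fix. The Python sketch below retries only 429 and 503, with exponential backoff. This is an assumed policy: the reference doesn't specify a retry strategy, backoff timing, or a Retry-After header. call_with_retry, TTSError, and urllib_sender are hypothetical names, and send is injected as a plain function so the retry logic can be tested without a network.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Assumed policy: treat 429 and 503 as transient. The reference does not
# document retry behavior or a Retry-After header.
RETRYABLE_STATUSES = {429, 503}

Sender = Callable[[], tuple[int, bytes]]


class TTSError(Exception):
    """Raised for a non-200 status that is not retried (or ran out of retries)."""

    def __init__(self, status: int, body: bytes):
        super().__init__(f"TTS request failed with HTTP {status}")
        self.status = status
        self.body = body


def urllib_sender(request: urllib.request.Request, timeout: float = 60.0) -> Sender:
    """Wrap a prepared request so HTTP errors come back as (status, body) instead of raising."""
    def send() -> tuple[int, bytes]:
        try:
            with urllib.request.urlopen(request, timeout=timeout) as resp:
                return resp.status, resp.read()
        except urllib.error.HTTPError as err:
            return err.code, err.read()
    return send


def call_with_retry(
    send: Sender,
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call send() until it returns 200, backing off on retryable statuses.

    Waits base_delay * 2**attempt seconds between tries (1 s, 2 s, 4 s, ...).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
            # Client errors won't succeed on retry; 500 is not retried here either.
            raise TTSError(status, body)
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable: the loop always returns or raises")
```

In use, build a urllib.request.Request for https://cantonese.ai/api/tts with the JSON body and Content-Type header, then call call_with_retry(urllib_sender(request)). TTSError.body carries the server's response, which is worth logging when diagnosing 400 and 422 errors.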