cantonese.ai API Reference

Create Voice

Create a custom voice by uploading an audio sample. The audio is validated for speech content, uploaded to storage, and a new voice record is created that can be used with the Text-to-Speech and Voice Conversion endpoints.

Request Parameters

This endpoint accepts multipart/form-data requests because it takes a file upload. Authentication uses the signed-in session; an API key is not accepted.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| data | file | Yes | Audio sample file for the voice. Must contain clear speech. Duration must be between the configured minimum and maximum seconds. |
| name | string | No | A display name for the voice. |
| language | string | No | Language of the voice. Defaults to "cantonese". |
| description | string | No | A description of the voice. |
| gender | string | No | Gender of the voice. Options: "male", "female", "unknown". |
| age | string | No | Age group of the voice. Options: "youth", "young_adult", "adult", "middle_aged", "senior", "unknown". |
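
As an illustration of the parameters above, the optional text fields can be assembled and checked locally before the upload. This is a minimal sketch, not the service's own client: the helper name build_voice_fields is hypothetical, and only the field names, default, and allowed values come from the table.

```python
from typing import Dict, Optional

# Allowed values copied from the parameter table.
GENDERS = {"male", "female", "unknown"}
AGES = {"youth", "young_adult", "adult", "middle_aged", "senior", "unknown"}


def build_voice_fields(
    name: Optional[str] = None,
    language: str = "cantonese",
    description: Optional[str] = None,
    gender: Optional[str] = None,
    age: Optional[str] = None,
) -> Dict[str, str]:
    """Return the text fields of the multipart form (hypothetical helper).

    The audio file itself is sent separately under the "data" field.
    """
    if gender is not None and gender not in GENDERS:
        raise ValueError(f"gender must be one of {sorted(GENDERS)}")
    if age is not None and age not in AGES:
        raise ValueError(f"age must be one of {sorted(AGES)}")
    fields = {
        "name": name,
        "language": language,
        "description": description,
        "gender": gender,
        "age": age,
    }
    # Omit fields the caller did not set, since all of them are optional.
    return {key: value for key, value in fields.items() if value is not None}
```

The returned dictionary would be sent as the form's text fields alongside the audio file in the "data" field. The endpoint URL is not stated in this section, so it is left out here.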

How It Works

1. Validate Audio Duration

The uploaded audio file is checked to ensure its duration falls within the allowed range.

2. Validate Speech Content

The audio is transcribed to verify it contains clear speech. Files with no detected speech are rejected.

3. Upload & Create Voice

The audio sample is uploaded to storage and a new voice record is created with a unique voice ID.
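
The three steps can be sketched as a single function. This is an assumption-laden outline of the flow, not the server's actual code: MIN_SECONDS and MAX_SECONDS are placeholder limits (the configured values are not given here), the transcript is passed in rather than produced by a real transcription service, the storage upload is skipped, and a UUID is used only because the sample voice_id has that shape.

```python
import uuid

# Hypothetical limits; the real configured minimum and maximum are not documented here.
MIN_SECONDS = 5.0
MAX_SECONDS = 60.0


class VoiceValidationError(Exception):
    """Raised when the audio sample fails one of the validation steps."""


def create_voice(duration_seconds: float, transcript: str) -> dict:
    # Step 1: the duration must fall within the allowed range.
    if not (MIN_SECONDS <= duration_seconds <= MAX_SECONDS):
        raise VoiceValidationError("audio duration out of range")

    # Step 2: an empty transcript is treated as "no speech detected".
    if not transcript.strip():
        raise VoiceValidationError("no speech detected")

    # Step 3: the storage upload is omitted; only the new ID is generated.
    return {"success": True, "voice_id": str(uuid.uuid4())}
```

Ordering the checks this way means a sample of the wrong length is rejected before any transcription work is done.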

Response

On success, the API returns a JSON response with the new voice ID.

Success Response

{
  "success": true,
  "voice_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
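
A client might read the voice ID out of this body as follows. This is a small sketch using only the two fields shown above; the function name extract_voice_id is hypothetical.

```python
import json


def extract_voice_id(body: str) -> str:
    """Return voice_id from a success response body, or raise ValueError."""
    payload = json.loads(body)
    if payload.get("success") is not True or "voice_id" not in payload:
        raise ValueError("response does not report a created voice")
    return payload["voice_id"]
```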

Status Codes

The API returns standard HTTP status codes to indicate the success or failure of requests.

| Status Code | Description |
| --- | --- |
| 200 | Success - Voice created successfully, returns voice_id |
| 400 | Bad Request - Audio duration out of range or no speech detected in the audio file |
| 401 | Unauthorized - User is not authenticated or is blocked |
| 403 | Forbidden - Custom voice quota exceeded. Upgrade your TTS plan to create more voices. |
| 500 | Internal Server Error - Failed to upload file, create voice record, or validate audio content |
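
One way a client could act on these codes is to map each to a next step. The codes are taken from the table; the action labels and the function name next_action are illustrative assumptions, not part of the API.

```python
# Suggested client-side reactions; the labels are hypothetical.
ACTIONS = {
    200: "use_voice_id",     # voice created, read voice_id from the body
    400: "fix_audio",        # duration out of range or no speech detected
    401: "sign_in",          # not authenticated, or the user is blocked
    403: "upgrade_plan",     # custom voice quota exceeded
    500: "try_again_later",  # upload, record creation, or validation failed
}


def next_action(status_code: int) -> str:
    """Return a suggested reaction for a status code, or 'unknown'."""
    return ACTIONS.get(status_code, "unknown")
```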