Create Voice
Create a custom voice by uploading an audio sample. The audio is validated for duration and speech content, then uploaded to storage, and a new voice record is created that can be used with the Text-to-Speech and Voice Conversion endpoints.
Request Parameters
Requests must be sent as multipart/form-data because the endpoint accepts a file upload. Authentication is required and uses the user's session, not an API key.
| Parameter | Type | Required | Description |
|---|---|---|---|
| data | file | Yes | Audio sample file for the voice. Must contain clear speech. Its duration, in seconds, must fall between the configured minimum and maximum. |
| name | string | No | A display name for the voice. |
| language | string | No | Language of the voice. Defaults to "cantonese". |
| description | string | No | A description of the voice. |
| gender | string | No | Gender of the voice. Options: "male", "female", "unknown". |
| age | string | No | Age group of the voice. Options: "youth", "young_adult", "adult", "middle_aged", "senior", "unknown". |
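As a rough illustration of the table above, the optional text fields can be assembled and checked on the client before the upload. This is only a sketch: `build_voice_fields` is a hypothetical helper, not part of the API, and the option sets are copied from the table. The server may apply additional rules that are not documented here.

```python
from typing import Dict, Optional

# Option lists copied from the parameter table; the server remains the authority.
GENDER_OPTIONS = {"male", "female", "unknown"}
AGE_OPTIONS = {"youth", "young_adult", "adult", "middle_aged", "senior", "unknown"}


def build_voice_fields(
    name: Optional[str] = None,
    language: Optional[str] = None,
    description: Optional[str] = None,
    gender: Optional[str] = None,
    age: Optional[str] = None,
) -> Dict[str, str]:
    """Return the optional text fields for the form, omitting any that are unset."""
    if gender is not None and gender not in GENDER_OPTIONS:
        raise ValueError(f"gender must be one of {sorted(GENDER_OPTIONS)}")
    if age is not None and age not in AGE_OPTIONS:
        raise ValueError(f"age must be one of {sorted(AGE_OPTIONS)}")
    candidates = {
        "name": name,
        "language": language,  # omitted when unset so the server default applies
        "description": description,
        "gender": gender,
        "age": age,
    }
    return {key: value for key, value in candidates.items() if value is not None}
```

The returned dictionary would accompany the audio file, which goes in the `data` form field. With the `requests` library, for example, the dictionary could be passed as `data=` and the open file as `files={"data": file_handle}` on an authenticated session. The endpoint URL is not given in this section, so it is left out here.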
How It Works
Validate Audio Duration
The uploaded audio file is checked to ensure its duration falls within the allowed range.
Validate Speech Content
The audio is transcribed to verify it contains clear speech. Files with no detected speech are rejected.
Upload & Create Voice
The audio sample is uploaded to storage and a new voice record is created with a unique voice ID.
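The duration check in the first step can be approximated on the client to avoid uploading a file that would be rejected. The sketch below assumes a WAV file and uses Python's standard `wave` module. The limits `MIN_SECONDS` and `MAX_SECONDS` are placeholder values, since the real minimum and maximum are server-side configuration not stated here, and the server's own check may measure duration differently.

```python
import io
import wave

# Placeholder limits (assumption); the actual range is configured on the server.
MIN_SECONDS = 5.0
MAX_SECONDS = 60.0


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Compute the duration of a WAV file as frame count divided by frame rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_in_range(
    wav_bytes: bytes,
    min_seconds: float = MIN_SECONDS,
    max_seconds: float = MAX_SECONDS,
) -> bool:
    """Return True when the duration falls inside the inclusive range."""
    return min_seconds <= wav_duration_seconds(wav_bytes) <= max_seconds


def make_silent_wav(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory, used only to exercise the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()
```

A silent file passes this duration check but would still fail the speech validation step, so this only covers the first of the three steps.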
Response
On success, the API returns a JSON response with the new voice ID.
Success Response
{
  "success": true,
  "voice_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Status Codes
The API returns standard HTTP status codes to indicate the success or failure of requests.
| Status Code | Description |
|---|---|
| 200 | Success - Voice created successfully, returns voice_id |
| 400 | Bad Request - Audio duration out of range or no speech detected in the audio file |
| 401 | Unauthorized - User is not authenticated or is blocked |
| 403 | Forbidden - Custom voice quota exceeded. Upgrade your TTS plan to create more voices. |
| 500 | Internal Server Error - Failed to upload file, create voice record, or validate audio content |
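One way a client might act on these status codes is sketched below. `CreateVoiceError` and `parse_create_voice_response` are hypothetical names introduced for illustration. The sketch reads only the documented success body, because the format of error response bodies is not described in this section.

```python
import json

# Short meanings paraphrased from the status code table.
ERROR_MEANINGS = {
    400: "audio duration out of range or no speech detected",
    401: "user is not authenticated or is blocked",
    403: "custom voice quota exceeded",
    500: "upload, voice record creation, or audio validation failed",
}


class CreateVoiceError(Exception):
    """Raised when the create-voice request does not yield a voice ID."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


def parse_create_voice_response(status_code: int, body: str) -> str:
    """Return the voice_id from a 200 response, or raise CreateVoiceError."""
    if status_code == 200:
        payload = json.loads(body)
        if payload.get("success") is True and "voice_id" in payload:
            return payload["voice_id"]
        raise CreateVoiceError(status_code, "unexpected success payload")
    message = ERROR_MEANINGS.get(status_code, "unexpected status code")
    raise CreateVoiceError(status_code, message)
```

Keeping the status code on the exception lets callers distinguish a retryable failure such as 500 from one that needs user action, such as 403.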