Models
More detailed information about the models is available here.- Multilingual v2 (default): Supports 28 languages, known for its accuracy and stability, especially when using high-quality samples.
- Turbo v2.5: Generates speech in 32 languages with low latency, ideal for real-time applications.
- Turbo v2: Optimized for low-latency English text-to-speech, similar in performance to Turbo v2.5.
- English v1: The oldest and fastest model, best for audiobooks but less accurate.
- Multilingual v1: Experimental, surpassed by Multilingual v2, recommended for short text chunks.
Voice Settings
More detailed information about the voice settings is available here.- Stability: Adjusts the emotional range and consistency of the voice. Lower settings result in more variation and emotion, while higher settings produce a more stable, monotone voice.
- Similarity: Controls how closely the AI matches the original voice. High settings may replicate artifacts from low-quality audio.
- Style Exaggeration: Enhances the speaker’s style, but can affect stability.
- Speaker Boost: Increases the likeness to the original speaker, useful for weaker voices.
Text to Speech (TTS)
Step by step to Generate Text to Speech Audio- Text Input: Type or paste your text into the input box on the Speech Synthesis page.
- Select Voice: Select the voice you wish to use from your Voices at the bottom left of the screen.
- Adjust Settings: Modify the voice settings for the desired output.

Speech to Speech (STS)
Steps to Generate Speech to Speech Audio- Audio Input: Upload or record audio via the input box on the Speech Synthesis page.
- Select Voice: Select the voice you wish to use from your Voices at the bottom left of the screen.
- Adjust Settings: Modify the voice settings for the desired output.

Speech to Speech is great for getting the right emotion across when the Text
to Speech can’t get it right.