Speech Synthesis allows you to generate lifelike speech from text (Text to Speech) or audio (Speech to Speech) inputs. This section also shows your generation history, so you can retrieve past generations.
Selecting Advanced Mode lets you choose the model for your generation and adjust the voice settings (Stability, Similarity, Style, and Speaker Boost), in addition to the options available in Standard mode.
Let’s touch on models and voice settings briefly before generating our audio clip.
More detailed information about the voice settings is available here.
Stability: Adjusts the emotional range and consistency of the voice. Lower settings result in more variation and emotion, while higher settings produce a more stable, monotone voice.
Similarity: Controls how closely the AI matches the original voice. High settings may replicate artifacts from low-quality audio.
Style Exaggeration: Enhances the speaker’s style, but can affect stability.
Speaker Boost: Increases the likeness to the original speaker, useful for weaker voices.
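The four settings above can be thought of as a small configuration object. The sketch below is only an illustration: the class name `VoiceSettings`, the field names, the default values, and the 0.0 to 1.0 range for the sliders are all assumptions made for this example, not values confirmed by this page.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the four voice settings described above."""
    stability: float = 0.5      # lower = more variation and emotion (assumed default)
    similarity: float = 0.75    # higher = closer match to the original voice (assumed default)
    style: float = 0.0          # style exaggeration; may reduce stability (assumed default)
    speaker_boost: bool = True  # boosts likeness to the original speaker (assumed default)

    def __post_init__(self):
        # Assumption: each slider is normalised to the range 0.0 to 1.0.
        for name in ("stability", "similarity", "style"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")


# A more emotional, varied delivery versus a steadier, more monotone one.
expressive = VoiceSettings(stability=0.3, style=0.4)
steady = VoiceSettings(stability=0.9)

print(asdict(expressive))
print(asdict(steady))
```

Keeping the settings in one validated object makes it easy to try a few combinations side by side and to catch out-of-range values before generating audio.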
Now that we understand models and voice settings a bit better, let’s jump into generating audio!
Audio Input: Upload or record audio via the input box on the Speech Synthesis page.
Select Voice: Select the voice you wish to use from your Voices at the bottom left of the screen.
Adjust Settings: Modify the voice settings for the desired output.
Generate: Click the ‘Generate’ button to create your audio file.
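The same four steps can also be scripted rather than clicked through. The following is a minimal sketch that only assembles the pieces of a Speech to Speech request and does not send anything. The base URL, the `speech-to-speech/{voice_id}` path, the `xi-api-key` header, the `model_id` and `voice_settings` form fields, and the model name are assumptions based on the public ElevenLabs API and should be checked against the current API reference; `build_speech_to_speech_request` is a hypothetical helper.

```python
import json

# Assumed base URL of the public API; verify against the API reference.
API_BASE = "https://api.elevenlabs.io/v1"


def build_speech_to_speech_request(voice_id, audio_path, settings, api_key,
                                   model_id="eleven_english_sts_v2"):
    """Hypothetical helper: collect everything needed for one generation.

    Mirrors the UI steps: audio input, voice selection, settings, generate.
    The model_id default is an assumed model name.
    """
    return {
        # Select Voice: the chosen voice is part of the URL (assumed path).
        "url": f"{API_BASE}/speech-to-speech/{voice_id}",
        # Authentication header name is an assumption.
        "headers": {"xi-api-key": api_key},
        # Adjust Settings: sent as a JSON-encoded form field (assumed format).
        "data": {"model_id": model_id, "voice_settings": json.dumps(settings)},
        # Audio Input: path of the recording to upload.
        "audio_path": audio_path,
    }


request = build_speech_to_speech_request(
    voice_id="YOUR_VOICE_ID",
    audio_path="cellar_door_take.mp3",
    settings={"stability": 0.3, "similarity_boost": 0.75},
    api_key="YOUR_API_KEY",
)
print(request["url"])
```

To actually generate audio, the resulting pieces could be passed to an HTTP client, for example `requests.post(request["url"], headers=request["headers"], data=request["data"], files={"audio": open(request["audio_path"], "rb")})`, where the `audio` field name is again an assumption.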
Speech to Speech is great for getting the right emotion across when Text to Speech can’t get it right.
Exercise: Record yourself saying “The cellar door is open, revealing a world of hidden treasures.” and generate it using Brian’s voice (or a voice of your choosing).