wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input?model_id={model}
" "
. In the first message, the text should be a space " "
.chunk_length_schedule
. Unlike flush, try_trigger_generation
will only generate audio if our buffer contains more than a minimum threshold of characters; this is to ensure a higher quality response from our model. Note that overriding the chunk schedule to generate small amounts of text may result in lower quality audio, so only use this parameter if you really need text to be processed immediately. We generally recommend keeping the default value of false
and adjusting the chunk_length_schedule
in the generation_config
instead.
true
when you have finished sending text, but want to keep the websocket connection open. This is useful when you want to ensure that the last chunk of audio is generated even when the length of text sent is smaller than the value set in chunk_length_schedule
(e.g. 120 or 50). To understand more about how our websockets buffer text before audio is generated, please refer to this section.
""
.True
, audio
will be null.
Value | Description |
---|---|
0 | default mode (no latency optimizations) |
1 | normal latency optimizations (about 50% of possible latency improvement of option 3) |
2 | strong latency optimizations (about 75% of possible latency improvement of option 3) |
3 | max latency optimizations |
4 | max latency optimizations, but also with the text normalizer turned off for even more latency savings (best latency, but can mispronounce e.g. numbers and dates) |
0
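As a rough illustration of how the latency option might be attached to the endpoint URL shown at the top of this page, here is a minimal sketch. The query parameter name `optimize_streaming_latency` and the helper `build_stream_url` are assumptions made for this example and do not appear in the text above; only the base URL, `model_id`, and the value range 0 to 4 come from this page.

```python
from urllib.parse import urlencode

# Base endpoint taken from the URL at the top of this page.
BASE_URL = "wss://api.elevenlabs.io/v1/text-to-speech/{voice_id}/stream-input"


def build_stream_url(voice_id: str, model_id: str, latency: int = 0) -> str:
    """Build the websocket URL with a latency optimization level (0 to 4)."""
    if latency not in range(5):
        raise ValueError("latency must be an integer between 0 and 4")
    # ASSUMPTION: the query parameter name below is illustrative only.
    query = urlencode({"model_id": model_id, "optimize_streaming_latency": latency})
    return BASE_URL.format(voice_id=voice_id) + "?" + query
```

Validating the value locally keeps an out-of-range level from reaching the server; level 4 also disables the text normalizer, so it is best reserved for input without numbers or dates.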
Value | Description |
---|---|
mp3_44100 | default output format, mp3 with 44.1kHz sample rate |
pcm_16000 | PCM format (S16LE) with 16kHz sample rate |
pcm_22050 | PCM format (S16LE) with 22.05kHz sample rate |
pcm_24000 | PCM format (S16LE) with 24kHz sample rate |
pcm_44100 | PCM format (S16LE) with 44.1kHz sample rate |
ulaw_8000 | μ-law format (mulaw) with 8kHz sample rate. (Note that this format is commonly used for Twilio audio inputs.) |
mp3_44100
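The format names in the table encode both the codec and the sample rate, which is handy when sizing playback buffers. The sketch below splits a format name and estimates the duration of a raw PCM chunk. It relies on S16LE meaning 2 bytes per sample, and it assumes mono audio, which this page does not state. The helper names are hypothetical.

```python
PCM_BYTES_PER_SAMPLE = 2  # S16LE: signed 16-bit little-endian samples


def parse_output_format(name: str) -> tuple[str, int]:
    """Split a format name such as 'pcm_16000' into (codec, sample_rate_hz)."""
    codec, _, rate = name.partition("_")
    return codec, int(rate)


def pcm_duration_seconds(num_bytes: int, output_format: str, channels: int = 1) -> float:
    """Estimate the playback duration of a raw PCM chunk.

    ASSUMPTION: audio is mono (channels=1) unless the caller says otherwise.
    """
    codec, rate = parse_output_format(output_format)
    if codec != "pcm":
        raise ValueError("duration can only be derived from byte count for PCM formats")
    return num_bytes / (PCM_BYTES_PER_SAMPLE * channels * rate)
```

For example, 32,000 bytes of `pcm_16000` audio correspond to one second under the mono assumption. Compressed formats such as `mp3_44100` need a decoder before their duration can be measured.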
20
seconds, with a maximum allowed value of 180
seconds.
flush
command. We have advanced settings for changing the chunk schedule, which can improve latency at the cost of quality by generating audio more frequently with smaller text inputs.
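The message flow described on this page (an initial message whose text is a single space, followed by text chunks, with an optional flush) can be sketched as JSON payload builders. This is only an illustration under stated assumptions: the helper names are hypothetical, the exact JSON shape is inferred from the field names mentioned above (`text`, `flush`, `generation_config`, `chunk_length_schedule`), and closing the stream with an empty string is an assumption rather than something this page confirms. No network connection is made.

```python
import json
from typing import Optional, Sequence


def initial_message(chunk_length_schedule: Optional[Sequence[int]] = None) -> str:
    """First message: the text is a single space, optionally with a chunk schedule."""
    msg: dict = {"text": " "}
    if chunk_length_schedule is not None:
        msg["generation_config"] = {"chunk_length_schedule": list(chunk_length_schedule)}
    return json.dumps(msg)


def text_message(text: str, flush: bool = False) -> str:
    """A text chunk; flush=True asks for audio even if the buffer is below the schedule."""
    msg: dict = {"text": text}
    if flush:
        msg["flush"] = True
    return json.dumps(msg)


def close_message() -> str:
    """ASSUMPTION: an empty string signals the end of input."""
    return json.dumps({"text": ""})
```

A client would send `initial_message([50, 120])` once, then one `text_message(...)` per chunk, setting `flush=True` on the last chunk when the connection should stay open.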