This documentation is for developers integrating directly with the ElevenLabs
WebSocket API. For convenience, consider using the official SDKs provided by
ElevenLabs.
Endpoint:
wss://api.elevenlabs.io/v1/convai/conversation?agent_id={agent_id}
Authentication
Using Agent ID
For public agents, you can use the agent_id directly in the WebSocket URL without additional authentication.
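A minimal sketch of building that URL with only the Python standard library. `my-agent-id` is a placeholder, and opening the socket itself is left to whichever WebSocket client library you use.

```python
from urllib.parse import urlencode

# Endpoint from this page; agent_id travels as a query parameter.
BASE_URL = "wss://api.elevenlabs.io/v1/convai/conversation"


def conversation_url(agent_id: str) -> str:
    """Return the WebSocket URL for a public agent.

    urlencode percent-escapes characters that are unsafe in a query string.
    """
    return f"{BASE_URL}?{urlencode({'agent_id': agent_id})}"


print(conversation_url("my-agent-id"))
# → wss://api.elevenlabs.io/v1/convai/conversation?agent_id=my-agent-id
```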
Using a Signed URL
For private agents or conversations that require authorization, have your server request a signed URL from the ElevenLabs API using your API key, then hand that signed URL to the client, which uses it to open the WebSocket connection.
Never expose your ElevenLabs API key on the client side.
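The server-side half of this flow might look like the sketch below. It assumes the signed-URL endpoint is `GET https://api.elevenlabs.io/v1/convai/conversation/get_signed_url?agent_id=<id>`, authenticated with an `xi-api-key` header and returning JSON with a `signed_url` field, and that the key is stored in a hypothetical `ELEVENLABS_API_KEY` environment variable. Confirm all of these against the current API reference before relying on them.

```python
import json
import os
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# ASSUMPTION: endpoint path, header name, and response field are taken from
# the public API reference at the time of writing; verify before use.
SIGNED_URL_ENDPOINT = "https://api.elevenlabs.io/v1/convai/conversation/get_signed_url"


def build_signed_url_request(agent_id: str, api_key: str) -> Request:
    """Build (but do not send) the server-side request for a signed URL."""
    query = urlencode({"agent_id": agent_id})
    return Request(
        f"{SIGNED_URL_ENDPOINT}?{query}",
        headers={"xi-api-key": api_key},
        method="GET",
    )


def parse_signed_url(body: bytes) -> str:
    """Extract the signed WebSocket URL from the JSON response body."""
    return json.loads(body)["signed_url"]


def fetch_signed_url(agent_id: str) -> str:
    """Run on your server only, so the API key never reaches the client."""
    # Hypothetical variable name; the key is read from the environment.
    api_key = os.environ["ELEVENLABS_API_KEY"]
    with urlopen(build_signed_url_request(agent_id, api_key)) as resp:
        return parse_signed_url(resp.read())
```

Your own HTTP endpoint would call `fetch_signed_url` and return only the resulting URL to the browser or app, which then connects to it.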
Communication
Client-to-Server Messages
User Audio Chunk
Send audio data from the user to the server.
Audio Format Requirements:
- PCM 16-bit mono format
- Base64 encoded
- Sample rate of 16,000 Hz
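One way to produce chunks in this format from floating-point microphone samples is sketched below. It assumes little-endian sample order and a JSON frame keyed by `user_audio_chunk`; both are assumptions to check against the API reference, since the message format example is not reproduced here. Resampling to 16,000 Hz mono is assumed to have happened already.

```python
import base64
import json
import struct


def float_to_pcm16(samples: list[float]) -> bytes:
    """Convert mono float samples in [-1.0, 1.0] to 16-bit signed PCM.

    ASSUMPTION: little-endian byte order. Out-of-range values are clipped.
    """
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def user_audio_chunk_message(pcm: bytes) -> str:
    """Base64-encode a PCM chunk and wrap it in a JSON text frame.

    ASSUMPTION: the key name "user_audio_chunk" is illustrative; check the
    current API reference for the exact message shape.
    """
    return json.dumps({"user_audio_chunk": base64.b64encode(pcm).decode("ascii")})
```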
Recommended Chunk Duration:
- Send audio chunks approximately every 250 milliseconds (0.25 seconds)
- At a 16,000 Hz sample rate, this equates to chunks of about 4,000 samples, or 8,000 bytes of 16-bit PCM before Base64 encoding
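The chunk-size arithmetic above, as code: 16,000 samples/s × 0.25 s = 4,000 samples, and at 2 bytes per sample that is 8,000 bytes. The helper names are hypothetical.

```python
SAMPLE_RATE = 16_000   # Hz, per the format requirements
BYTES_PER_SAMPLE = 2   # 16-bit mono PCM


def chunk_bytes(chunk_ms: int = 250, sample_rate: int = SAMPLE_RATE) -> int:
    """Bytes of raw PCM in one chunk: 250 ms -> 4,000 samples -> 8,000 bytes."""
    samples = sample_rate * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE


def split_pcm(pcm: bytes, chunk_ms: int = 250) -> list[bytes]:
    """Split a PCM buffer into fixed-duration chunks (the last may be shorter)."""
    size = chunk_bytes(chunk_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]
```

Base64 encoding enlarges each chunk by about a third, so an 8,000-byte chunk becomes a 10,668-character string on the wire. When streaming live audio you would send each chunk as it is captured rather than splitting a finished buffer.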
Optimizing Latency and Efficiency:
- Balance Latency and Efficiency: Sending audio chunks every 250 milliseconds offers a good trade-off between responsiveness and network overhead.
- Adjust Based on Needs:
  - Lower Latency Requirements: Decrease the chunk duration to send smaller chunks more frequently.
  - Higher Efficiency Requirements: Increase the chunk duration to send larger chunks less frequently.
  - Network Conditions: Adapt the chunk size if you experience network constraints or variability.
Pong Message
Respond to server ping messages by sending a pong message whose event_id matches the one received in the ping message.
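A sketch of the pong reply. It assumes the server nests `event_id` inside a `ping_event` object (an assumption based on the public API reference); the code also accepts a top-level `event_id` defensively.

```python
import json


def pong_for(ping_frame: str) -> str:
    """Build the pong reply for a server ping, echoing its event_id.

    ASSUMPTION: event_id sits under "ping_event"; falling back to a
    top-level event_id is defensive, not documented behavior.
    """
    ping = json.loads(ping_frame)
    event = ping.get("ping_event", ping)
    return json.dumps({"type": "pong", "event_id": event["event_id"]})
```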
Conversation Initiation Client Data Override
Send the initial conversation configuration to the server with a conversation_initiation_client_data message. It can optionally include agent prompt overrides, a preferred language, TTS voice settings, and custom LLM parameters to use for the conversation.
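A sketch of building this message. Only the `type` value comes from this page; the nested keys (`conversation_config_override`, `agent`, `prompt`, `language`, `tts`, `voice_id`, `custom_llm_extra_body`) are assumptions modelled on the public API reference and should be verified. Arguments left as `None` are omitted so the agent's configured defaults apply.

```python
import json
from typing import Any, Optional


def initiation_message(
    prompt: Optional[str] = None,
    language: Optional[str] = None,
    voice_id: Optional[str] = None,
    llm_params: Optional[dict[str, Any]] = None,
) -> str:
    """Build a conversation_initiation_client_data message.

    ASSUMPTION: every nested key name below is illustrative.
    Only the overrides you pass are included.
    """
    agent: dict[str, Any] = {}
    if prompt is not None:
        agent["prompt"] = {"prompt": prompt}
    if language is not None:
        agent["language"] = language
    override: dict[str, Any] = {}
    if agent:
        override["agent"] = agent
    if voice_id is not None:
        override["tts"] = {"voice_id": voice_id}
    msg: dict[str, Any] = {"type": "conversation_initiation_client_data"}
    if override:
        msg["conversation_config_override"] = override
    if llm_params:
        msg["custom_llm_extra_body"] = llm_params
    return json.dumps(msg)
```

Send the result as a text frame at the start of the conversation.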
Client Tool Result
Respond to server client_tool_call messages by sending a client_tool_result message whose tool call ID matches the one in the received client_tool_call message.
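A sketch of answering a tool call. The `TOOLS` registry and the `get_local_time` stub are hypothetical, and the field names (`client_tool_call`, `tool_name`, `tool_call_id`, `parameters`, `result`, `is_error`) are assumptions based on the public API reference. Failures are reported back with `is_error` set rather than silently dropped, so the agent receives an answer either way.

```python
import json
from typing import Any, Callable

# Hypothetical registry of client-side tools, keyed by tool name.
TOOLS: dict[str, Callable[..., Any]] = {
    "get_local_time": lambda timezone: f"12:00 in {timezone}",  # stub
}


def handle_tool_call(frame: str) -> str:
    """Run the requested tool and build a client_tool_result reply.

    ASSUMPTION: all field names are illustrative; verify them against
    the current API reference.
    """
    call = json.loads(frame)["client_tool_call"]
    reply: dict[str, Any] = {
        "type": "client_tool_result",
        # Must match the id in the received call.
        "tool_call_id": call["tool_call_id"],
    }
    try:
        tool = TOOLS[call["tool_name"]]
        reply["result"] = str(tool(**call.get("parameters", {})))
        reply["is_error"] = False
    except Exception as exc:  # report failures instead of dropping the call
        reply["result"] = f"{type(exc).__name__}: {exc}"
        reply["is_error"] = True
    return json.dumps(reply)
```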
Server-to-Client Messages
conversation_initiation_metadata
Provides initial metadata about the conversation.
Other Server-to-Client Messages
Type | Purpose |
---|---|
user_transcript | Transcriptions of the user’s speech |
agent_response | Agent’s textual response |
audio | Chunks of the agent’s audio response |
interruption | Indicates that the agent’s response was interrupted |
ping | Server pings to measure latency |
client_tool_call | Asks the client to run a client tool; answer with a client_tool_result |
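Assuming each server frame is a JSON object whose `type` field matches the table above, a client can route frames through a small dispatcher like the one below. This is a generic sketch, not part of any SDK; unknown types are recorded rather than raising, so a new server event does not crash the client.

```python
import json
from typing import Callable

Handler = Callable[[dict], None]


class MessageRouter:
    """Route incoming server frames to handlers by their "type" field."""

    def __init__(self) -> None:
        self._handlers: dict[str, Handler] = {}
        self.unhandled: list[str] = []

    def on(self, msg_type: str, handler: Handler) -> None:
        """Register the handler for one message type (replaces any previous one)."""
        self._handlers[msg_type] = handler

    def dispatch(self, frame: str) -> None:
        msg = json.loads(frame)
        handler = self._handlers.get(msg.get("type", ""))
        if handler is None:
            # Record unknown types so new server events are noticed, not lost.
            self.unhandled.append(msg.get("type", "<missing>"))
        else:
            handler(msg)
```

For example, the handler registered for `ping` would send back a pong echoing the `event_id`, and the one for `audio` would queue the decoded chunk for playback.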
Latency Management
To keep conversations smooth, consider these strategies:
- Adaptive Buffering: Adjust audio buffering based on network conditions.
- Jitter Buffer: Implement a jitter buffer to smooth out variations in packet arrival times.
- Ping-Pong Monitoring: Use ping and pong events to measure round-trip time and adjust accordingly.
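A minimal jitter buffer, sketched under the assumption that received audio arrives as discrete byte chunks. Playback waits until `target_depth` chunks are queued and re-primes after an underrun; the default depth of 3 is arbitrary. Adaptive buffering could build on this by raising `target_depth` when round-trip times measured through ping and pong grow, and lowering it when they shrink.

```python
from collections import deque
from typing import Optional


class JitterBuffer:
    """Hold incoming audio chunks until a target depth is reached.

    Playback starts only after `target_depth` chunks are queued, so short
    gaps in arrival are absorbed by the backlog instead of causing dropouts.
    If the buffer runs dry, it waits to refill before playing again.
    """

    def __init__(self, target_depth: int = 3) -> None:
        self.target_depth = target_depth
        self._queue: deque[bytes] = deque()
        self._primed = False

    def push(self, chunk: bytes) -> None:
        self._queue.append(chunk)
        if len(self._queue) >= self.target_depth:
            self._primed = True

    def pop(self) -> Optional[bytes]:
        """Return the next chunk to play, or None to play silence."""
        if not self._primed or not self._queue:
            self._primed = False  # underrun: refill to target depth first
            return None
        return self._queue.popleft()
```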
Security Best Practices
- Rotate API keys regularly and use environment variables to store them.
- Implement rate limiting, for example on the server endpoint that issues signed URLs, to prevent abuse.
- When prompting users for microphone access, clearly explain why your application needs it.
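Rate limiting can be applied to your signed-URL endpoint with a token bucket, sketched below as one common technique. The injectable `clock` exists for testing; in practice you would keep one bucket per client (for example per user or IP address), which this sketch omits.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, then `rate` per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request may proceed."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```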