Generate Text-to-Speech Audio
A promptfoo example workflow that demonstrates text-to-speech (TTS) integration with the ElevenLabs provider, for testing and evaluating TTS prompt configurations.
Why it matters
Use the ElevenLabs API to convert text into natural-sounding speech. This prompt chain turns written material into audio content.
Outcomes
What it gets done
Convert text input into spoken audio using ElevenLabs.
Integrate with the ElevenLabs API for high-quality voice generation.
Facilitate the creation of audio assets for various media applications.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/pfoo-tts | bash
Capabilities
What this chain does
Converts text input to spoken audio through the ElevenLabs provider.
Compares TTS models, streaming modes, and voice settings on cost and latency.
Overview
TTS
What it does
This is a runnable example workflow from the promptfoo repository that demonstrates text-to-speech integration with the ElevenLabs provider. It provides a reference configuration for testing TTS prompt chains, allowing developers to evaluate voice synthesis outputs within the promptfoo testing framework.
How it connects
Use this example when you need to set up TTS testing in promptfoo, want to understand how to configure the ElevenLabs provider for voice synthesis, or need a starting point for building audio generation evaluation workflows. It's ideal for developers integrating text-to-speech capabilities into their AI applications.
Source README
provider-elevenlabs/tts (ElevenLabs Text-to-Speech)
You can run this example with:
npx promptfoo@latest init --example provider-elevenlabs/tts
cd provider-elevenlabs/tts
Test and compare ElevenLabs TTS models and voice settings.
What this tests
- Model comparison: Flash v2.5, Turbo v2.5, Multilingual v2
- Streaming vs. non-streaming performance
- Voice quality across different text inputs
- Cost and latency metrics
Setup
Set your ElevenLabs API key:
export ELEVENLABS_API_KEY=your_api_key_here
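A missing or blank key is an easy mistake to make before the first run. The sketch below is one way to check for it in Python before starting an evaluation. The helper name api_key_configured is hypothetical and not part of promptfoo or ElevenLabs; only the ELEVENLABS_API_KEY variable name comes from this README.

```python
import os
from typing import Mapping


def api_key_configured(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when ELEVENLABS_API_KEY is present and not blank.

    Hypothetical helper for illustration; it only inspects the environment
    and does not contact ElevenLabs to verify that the key is valid.
    """
    return bool(env.get("ELEVENLABS_API_KEY", "").strip())


if __name__ == "__main__":
    if api_key_configured():
        print("ELEVENLABS_API_KEY is set")
    else:
        print("ELEVENLABS_API_KEY is missing; export it before running the eval")
```

Passing the environment as a parameter keeps the check easy to test without touching the real shell environment.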
Run the example
npx promptfoo@latest eval -c ./promptfooconfig.yaml
Or view in the UI:
npx promptfoo@latest eval -c ./promptfooconfig.yaml
npx promptfoo@latest view
What to look for
- Model differences: Flash v2.5 has the lowest latency (~200ms); Multilingual v2 has the best quality
- Streaming benefits: First chunk arrives in ~75ms for real-time feel
- Cost tracking: ~$0.02 per 1000 characters
- Audio metadata: Duration, size, format info in response
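To put the cost figure above in context, here is a rough estimate of what a given test input might cost. It assumes the approximate rate of $0.02 per 1000 characters quoted in the list and simple linear pricing; actual billing depends on your ElevenLabs plan and model, so treat the result as a ballpark. The function name estimate_tts_cost is hypothetical.

```python
def estimate_tts_cost(text: str, usd_per_1k_chars: float = 0.02) -> float:
    """Approximate the USD cost of synthesizing `text`.

    Assumes linear per-character pricing at the rate quoted in this README
    (~$0.02 per 1000 characters); real pricing may differ.
    """
    return len(text) / 1000 * usd_per_1k_chars


if __name__ == "__main__":
    sample = "Hello from the promptfoo ElevenLabs example. " * 50
    print(f"{len(sample)} characters, about ${estimate_tts_cost(sample):.4f}")
```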
Available voices
This example uses Rachel (21m00Tcm4TlvDq8ikWAM). Try other popular voices:
- Rachel: Calm, clear female voice (default)
- Clyde: Warm, grounded male voice (2EiwWnXFnvU5JabPnv8n)
- Drew: Well-rounded male voice (29vD33N1CtxCmqQRPOHJ)
- Paul: Casual male voice (5Q0t7uMcjvnagumLfvZi)
Voice settings
Customize the voice output:
voiceSettings:
  stability: 0.5           # 0 (more variable) to 1 (more stable)
  similarity_boost: 0.75   # 0 (low) to 1 (high)
  style: 0.0               # 0 to 1 (only for v2 models)
  use_speaker_boost: true  # Enhance clarity
  speed: 1.0               # 0.25 to 4.0
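Out-of-range values are easier to catch before an eval run than after. The following sketch checks a settings dictionary against the ranges given in the comments above. The ranges are copied from this README rather than from the ElevenLabs API reference, and check_voice_settings is a hypothetical helper, not a promptfoo function.

```python
# Ranges taken from the comments in the voiceSettings snippet above
# (an assumption; the provider may enforce different limits).
RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.25, 4.0),
}


def check_voice_settings(settings: dict) -> list:
    """Return a list of human-readable problems; empty means no issues found."""
    problems = []
    for key, (low, high) in RANGES.items():
        if key not in settings:
            continue  # unset keys fall back to provider defaults
        value = settings[key]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key}: expected a number, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{key}: {value} is outside {low} to {high}")
    boost = settings.get("use_speaker_boost")
    if boost is not None and not isinstance(boost, bool):
        problems.append(f"use_speaker_boost: expected true/false, got {boost!r}")
    return problems


if __name__ == "__main__":
    print(check_voice_settings({"stability": 1.5, "speed": 1.0}))
```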
Output formats
Available formats:
- mp3_22050_32: Smallest size, lower quality
- mp3_44100_128: Balanced (default)
- mp3_44100_192: High quality
- pcm_16000: Raw PCM for processing
- pcm_44100: High quality PCM
- ulaw_8000: Phone quality
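The format names above appear to follow a codec_samplerate[_bitrate] pattern. As a minimal sketch under that assumption (sample rate in Hz, optional bitrate in kbps), the hypothetical parse_output_format helper below splits a name into its parts, which can be handy when comparing file sizes across formats.

```python
from typing import NamedTuple, Optional


class AudioFormat(NamedTuple):
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # None when the name carries no bitrate


def parse_output_format(name: str) -> AudioFormat:
    """Split a name such as 'mp3_44100_128' into codec, sample rate, bitrate.

    Assumes the codec_samplerate[_bitrate] naming pattern seen in the list
    above; raises ValueError for names that do not fit it.
    """
    parts = name.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unrecognized output format: {name!r}")
    codec = parts[0]
    sample_rate = int(parts[1])  # int() raises ValueError on non-numeric text
    bitrate = int(parts[2]) if len(parts) == 3 else None
    return AudioFormat(codec, sample_rate, bitrate)


if __name__ == "__main__":
    for fmt in ("mp3_22050_32", "mp3_44100_128", "pcm_16000", "ulaw_8000"):
        print(fmt, parse_output_format(fmt))
```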