Generate Text-to-Speech Audio

Name: Generate Text-to-Speech Audio
Availability: OnlineOnly
Author: Promptfoo

A promptfoo workflow example demonstrating text-to-speech integration with ElevenLabs provider for testing and evaluating TTS prompt configurations.

Works with elevenlabs

Version code-scan-action-0.1

Models

universal

Why it matters

Use the ElevenLabs API to convert text into natural-sounding speech. This prompt chain turns written material into audio content.

Outcomes

What it gets done

Convert text input into spoken audio using ElevenLabs.

Integrate with the ElevenLabs API for high-quality voice generation.

Facilitate the creation of audio assets for various media applications.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-tts | bash

Capabilities

What this chain does

Generate speech

Produces spoken audio from text input using ElevenLabs voices.

Overview

TTS

What it does

This is a runnable example workflow from the promptfoo repository that demonstrates text-to-speech integration with the ElevenLabs provider. It provides a reference configuration for testing TTS prompt chains, allowing developers to evaluate voice synthesis outputs within the promptfoo testing framework.

How it connects

Use this example when you need to set up TTS testing in promptfoo, want to understand how to configure the ElevenLabs provider for voice synthesis, or need a starting point for building audio generation evaluation workflows. It's ideal for developers integrating text-to-speech capabilities into their AI applications.

Source README

provider-elevenlabs/tts (ElevenLabs Text-to-Speech)

You can run this example with:

npx promptfoo@latest init --example provider-elevenlabs/tts
cd provider-elevenlabs/tts

Test and compare ElevenLabs TTS models and voice settings.

What this tests

Model comparison: Flash v2.5, Turbo v2.5, Multilingual v2
Streaming vs. non-streaming performance
Voice quality across different text inputs
Cost and latency metrics

Setup

Set your ElevenLabs API key:

export ELEVENLABS_API_KEY=your_api_key_here
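
Before running the evaluation, it can help to fail early when the key is missing. The following is a minimal sketch for a POSIX shell; `require_api_key` is a hypothetical helper name chosen for illustration, not part of promptfoo or ElevenLabs.

```shell
# Hypothetical helper: stop early if ELEVENLABS_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${ELEVENLABS_API_KEY:-}" ]; then
    echo "ELEVENLABS_API_KEY is not set; export it before running the eval" >&2
    return 1
  fi
  echo "ELEVENLABS_API_KEY is set"
}
```

One way to use it is to chain it in front of the eval command, for example `require_api_key && npx promptfoo@latest eval -c ./promptfooconfig.yaml`.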

Run the example

npx promptfoo@latest eval -c ./promptfooconfig.yaml

Or view in the UI:

npx promptfoo@latest eval -c ./promptfooconfig.yaml
npx promptfoo@latest view

What to look for

Model differences: Flash v2.5 has the lowest latency (~200ms); Multilingual v2 has the best quality
Streaming benefits: The first chunk arrives in ~75ms for a real-time feel
Cost tracking: ~$0.02 per 1000 characters
Audio metadata: Duration, size, and format info in the response
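
The cost figure above can be turned into a quick back-of-the-envelope estimate. This is a rough sketch that assumes the approximate rate of $0.02 per 1000 characters quoted in this README; actual billing may differ. `estimate_tts_cost` is a hypothetical helper name.

```shell
# Hypothetical helper: estimate cost in USD from a character count,
# using the approximate rate of $0.02 per 1000 characters quoted above.
estimate_tts_cost() {
  # $1 = number of characters to synthesize
  awk -v chars="$1" 'BEGIN { printf "%.4f\n", chars / 1000 * 0.02 }'
}

estimate_tts_cost 2500   # prints 0.0500
```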

Available voices

This example uses Rachel (21m00Tcm4TlvDq8ikWAM). Try other popular voices:

Rachel: Calm, clear female voice (default)
Clyde: Warm, grounded male voice (2EiwWnXFnvU5JabPnv8n)
Drew: Well-rounded male voice (29vD33N1CtxCmqQRPOHJ)
Paul: Casual male voice (5Q0t7uMcjvnagumLfvZi)

Voice settings

Customize the voice output:

voiceSettings:
  stability: 0.5 # 0 (more variable) to 1 (more stable)
  similarity_boost: 0.75 # 0 (low) to 1 (high)
  style: 0.0 # 0 to 1 (only for v2 models)
  use_speaker_boost: true # Enhance clarity
  speed: 1.0 # 0.25 to 4.0
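
To catch out-of-range values before running an eval, the ranges in the comments above can be checked with a small script. This is only a sketch: `in_range` is a hypothetical helper, the bounds are the ones listed in this README, and inputs are assumed to be numeric.

```shell
# Hypothetical helper: succeed if value lies in [min, max].
# Inputs are assumed to be numeric; no other validation is done.
in_range() {
  # $1 = value, $2 = min, $3 = max
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}

in_range 0.5 0 1 && echo "stability ok"
in_range 1.5 0 1 || echo "similarity_boost out of range"
```

The same check applies to `style` (0 to 1) and `speed` (0.25 to 4.0, as listed above).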

Output formats

Available formats:

mp3_22050_32 - Smallest size, lower quality
mp3_44100_128 - Balanced (default)
mp3_44100_192 - High quality
pcm_16000 - Raw PCM for processing
pcm_44100 - High quality PCM
ulaw_8000 - Phone quality
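
The format names above follow a codec, sample rate, optional bitrate pattern, so `mp3_44100_128` reads as MP3 at 44100 Hz and 128 kbps. A minimal sketch that splits a name into those parts, assuming a POSIX shell; `describe_format` is a hypothetical helper:

```shell
# Hypothetical helper: split a format name such as mp3_44100_128.
# Assumes the codec_sampleRate[_bitrate] naming seen in the list above.
describe_format() {
  (
    IFS=_
    # Word-split the name on underscores into $1, $2, $3.
    set -- $1
    if [ -n "${3:-}" ]; then
      echo "codec=$1 sample_rate_hz=$2 bitrate_kbps=$3"
    else
      echo "codec=$1 sample_rate_hz=$2"
    fi
  )
}

describe_format mp3_44100_128   # prints codec=mp3 sample_rate_hz=44100 bitrate_kbps=128
describe_format pcm_16000       # prints codec=pcm sample_rate_hz=16000
```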
