Prompt Chain

Generate Text-to-Speech Audio

A promptfoo workflow example that demonstrates text-to-speech integration with the ElevenLabs provider, for testing and evaluating TTS prompt configurations.

Works with ElevenLabs

Spark score: 54 out of 100
Updated yesterday
Version code-scan-action-0.1

Why it matters

Use the ElevenLabs API to convert text into natural-sounding speech. This prompt chain turns written material into audio content and lets you evaluate the results in promptfoo.

Outcomes

What it gets done

01

Convert text input into spoken audio using ElevenLabs.

02

Integrate with the ElevenLabs API for high-quality voice generation.

03

Create audio assets for a range of media applications.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-tts | bash

Capabilities

What this chain does

Generate speech

Converts written text into spoken audio.

Overview

TTS

What it does

This is a runnable example workflow from the promptfoo repository that demonstrates text-to-speech integration with the ElevenLabs provider. It provides a reference configuration for testing TTS prompt chains, allowing developers to evaluate voice synthesis outputs within the promptfoo testing framework.

How it connects

Use this example when you need to set up TTS testing in promptfoo, want to understand how to configure the ElevenLabs provider for voice synthesis, or need a starting point for building audio generation evaluation workflows. It's ideal for developers integrating text-to-speech capabilities into their AI applications.

Source README

provider-elevenlabs/tts (ElevenLabs Text-to-Speech)

You can run this example with:

npx promptfoo@latest init --example provider-elevenlabs/tts
cd provider-elevenlabs/tts

Test and compare ElevenLabs TTS models and voice settings.

What this tests

  • Model comparison: Flash v2.5, Turbo v2.5, Multilingual v2
  • Streaming vs. non-streaming performance
  • Voice quality across different text inputs
  • Cost and latency metrics
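The streaming vs. non-streaming comparison comes down to two numbers: how long until the first audio chunk arrives, and how long until the whole clip has arrived. A minimal sketch of measuring both from any iterable of byte chunks follows. `measure_stream` and `fake_stream` are hypothetical helpers, not part of promptfoo or any ElevenLabs SDK, and `fake_stream` only simulates a network response with sleeps.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(chunks: Iterable[bytes]) -> Tuple[float, float, int]:
    """Consume a stream of audio chunks and time it.

    Returns (seconds until the first chunk, total seconds, total bytes).
    The clock starts right before iteration begins, so pass the stream
    in before any chunks have been read from it.
    """
    start = time.perf_counter()
    first_chunk_s = None
    total_bytes = 0
    for chunk in chunks:
        if first_chunk_s is None:
            first_chunk_s = time.perf_counter() - start
        total_bytes += len(chunk)
    total_s = time.perf_counter() - start
    if first_chunk_s is None:
        # Empty stream: there is no first chunk, so report the total time instead.
        first_chunk_s = total_s
    return first_chunk_s, total_s, total_bytes


def fake_stream(n_chunks: int, delay_s: float, chunk_size: int = 1024) -> Iterator[bytes]:
    """Simulated streaming response: sleeps delay_s before yielding each chunk."""
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield b"\x00" * chunk_size


if __name__ == "__main__":
    ttfc, total, size = measure_stream(fake_stream(5, 0.05))
    print(f"first chunk after {ttfc * 1000:.0f} ms, all {size} bytes after {total * 1000:.0f} ms")
```

With a non-streaming response the two numbers are effectively the same, because the whole body arrives at once. If the chunks come from an HTTP client, the client may already have waited for the response headers before `measure_stream` is called, so start your own clock before sending the request if you want end-to-end latency.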

Setup

Set your ElevenLabs API key:

export ELEVENLABS_API_KEY=your_api_key_here

Run the example

npx promptfoo@latest eval -c ./promptfooconfig.yaml

Or view in the UI:

npx promptfoo@latest eval -c ./promptfooconfig.yaml
npx promptfoo@latest view

What to look for

  1. Model differences: Flash v2.5 has the lowest latency (~200 ms); Multilingual v2 has the best quality
  2. Streaming benefits: the first chunk arrives in ~75 ms, which makes playback feel real-time
  3. Cost tracking: roughly $0.02 per 1,000 characters
  4. Audio metadata: duration, size, and format information are included in the response
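Because the cost figure is a per-character rate, a rough estimate is simple arithmetic. A minimal sketch, where `estimate_cost` is a hypothetical helper and the default rate is the approximate number quoted in this README rather than a live ElevenLabs price:

```python
def estimate_cost(text: str, usd_per_1k_chars: float = 0.02) -> float:
    """Estimate TTS cost in USD from the character count of the input text.

    The default rate is the approximate figure from this example's README;
    substitute the rate for your own ElevenLabs plan.
    """
    return len(text) / 1000 * usd_per_1k_chars


if __name__ == "__main__":
    sample = "Hello from ElevenLabs! " * 100  # 2,300 characters
    print(f"{len(sample)} chars -> ${estimate_cost(sample):.4f}")
```

Keeping the rate as a parameter makes it easy to compare the estimate against the cost metrics the eval reports.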

Available voices

This example uses Rachel (21m00Tcm4TlvDq8ikWAM). Try other popular voices:

  • Rachel: Calm, clear female voice (default)
  • Clyde: Warm, grounded male voice (2EiwWnXFnvU5JabPnv8n)
  • Drew: Well-rounded male voice (29vD33N1CtxCmqQRPOHJ)
  • Paul: Casual male voice (5Q0t7uMcjvnagumLfvZi)
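ElevenLabs identifies voices by ID rather than by display name, so when you swap voices it is the ID that changes. A small lookup table, built only from the IDs listed above, keeps names and IDs together. `VOICES` and `voice_id` are illustrative names, not part of promptfoo's configuration schema.

```python
# Voice IDs listed in this example's README.
VOICES = {
    "rachel": "21m00Tcm4TlvDq8ikWAM",  # calm, clear female (default)
    "clyde": "2EiwWnXFnvU5JabPnv8n",   # warm, grounded male
    "drew": "29vD33N1CtxCmqQRPOHJ",    # well-rounded male
    "paul": "5Q0t7uMcjvnagumLfvZi",    # casual male
}


def voice_id(name: str) -> str:
    """Look up a voice ID by case-insensitive name; raise KeyError if unknown."""
    try:
        return VOICES[name.strip().lower()]
    except KeyError:
        known = ", ".join(sorted(VOICES))
        raise KeyError(f"unknown voice {name!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(voice_id("Drew"))
```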

Voice settings

Customize the voice output:

voiceSettings:
  stability: 0.5 # 0 (more variable) to 1 (more stable)
  similarity_boost: 0.75 # 0 (low) to 1 (high)
  style: 0.0 # 0 to 1 (only for v2 models)
  use_speaker_boost: true # Enhance clarity
  speed: 1.0 # 0.25 to 4.0
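Out-of-range values are easy to introduce when tuning these knobs by hand. A minimal sketch of a pre-flight check, assuming the ranges in the comments of the `voiceSettings` example above are the limits you want to enforce. `validate_voice_settings` is hypothetical, not promptfoo's own validation, and the ElevenLabs API may accept or reject a different range.

```python
from typing import Any, Dict, List, Tuple

# Ranges taken from the comments in the voiceSettings example.
RANGES: Dict[str, Tuple[float, float]] = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.25, 4.0),
}


def validate_voice_settings(settings: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    for key, value in settings.items():
        if key == "use_speaker_boost":
            if not isinstance(value, bool):
                problems.append(f"{key} must be true or false, got {value!r}")
        elif key in RANGES:
            low, high = RANGES[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{key} must be a number, got {value!r}")
            elif not low <= value <= high:
                problems.append(f"{key}={value} is outside [{low}, {high}]")
        else:
            # Flags typos relative to the five keys shown in the example.
            problems.append(f"unknown setting {key!r}")
    return problems


if __name__ == "__main__":
    print(validate_voice_settings({"stability": 0.5, "speed": 5.0}))
```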

Output formats

Available formats:

  • mp3_22050_32 - Smallest size, lower quality
  • mp3_44100_128 - Balanced (default)
  • mp3_44100_192 - High quality
  • pcm_16000 - Raw PCM for processing
  • pcm_44100 - High quality PCM
  • ulaw_8000 - Phone quality
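The format names encode the codec, the sample rate in Hz and, for MP3, the bitrate in kbps. That is enough for a back-of-the-envelope file-size estimate. A sketch, assuming constant-bitrate MP3, 16-bit mono PCM, and 8-bit mono μ-law, and ignoring any header or container overhead. `parse_format` and `estimated_size_bytes` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutputFormat:
    codec: str
    sample_rate_hz: int
    bitrate_kbps: Optional[int]  # only present in the MP3 format names

    def bytes_per_second(self) -> float:
        if self.codec == "mp3":
            if self.bitrate_kbps is None:
                raise ValueError("MP3 format name has no bitrate")
            return self.bitrate_kbps * 1000 / 8  # assumes constant bitrate
        if self.codec == "pcm":
            return self.sample_rate_hz * 2  # assumes 16-bit mono samples
        if self.codec == "ulaw":
            return self.sample_rate_hz * 1  # mu-law: 8 bits per sample, mono
        raise ValueError(f"no size model for codec {self.codec!r}")


def parse_format(name: str) -> OutputFormat:
    """Split names like 'mp3_44100_128' or 'pcm_16000' into their parts."""
    parts = name.split("_")
    if len(parts) == 3:
        codec, rate, kbps = parts
        return OutputFormat(codec, int(rate), int(kbps))
    if len(parts) == 2:
        codec, rate = parts
        return OutputFormat(codec, int(rate), None)
    raise ValueError(f"unrecognized format name {name!r}")


def estimated_size_bytes(name: str, seconds: float) -> int:
    """Approximate audio payload size for a clip of the given duration."""
    return round(parse_format(name).bytes_per_second() * seconds)


if __name__ == "__main__":
    for fmt in ["mp3_22050_32", "mp3_44100_128", "pcm_16000", "ulaw_8000"]:
        print(fmt, estimated_size_bytes(fmt, 10), "bytes for 10 s")
```

For example, `estimated_size_bytes("mp3_44100_128", 10)` gives 160000 bytes (about 156 KiB), which you can compare against the size field in the response metadata.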
