Prompt Chain

Generate Advanced Text-to-Speech Audio

Name: Generate Advanced Text-to-Speech Audio
Availability: OnlineOnly
Author: Promptfoo

A prompt chain example demonstrating advanced text-to-speech capabilities, using the ElevenLabs TTS provider for multi-step audio generation workflows.

Works with elevenlabs

Updated 2 days ago

Version code-scan-action-0.1

Why it matters

Generate audio from text with ElevenLabs features that go beyond basic synthesis: pronunciation dictionaries, voice design, voice remixing, and streaming. The example shows how to configure each feature and check the results in a promptfoo eval.

Outcomes

What it gets done

Process text input for advanced TTS generation.

Utilize ElevenLabs for high-fidelity voice synthesis.

Generate audio output suitable for media production.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-tts-advanced | bash

Capabilities

What this chain does

Transcribe

Converts audio or video speech to written text.

Translate

Converts text between languages while preserving meaning.

Write copy

Drafts marketing, email, or product copy on demand.

Overview

TTS Advanced

What it does

This is a working example that demonstrates advanced text-to-speech capabilities through a multi-step prompt chain workflow. It shows how to configure and use the ElevenLabs TTS provider within a prompt chain architecture, illustrating patterns for complex audio generation tasks that go beyond single-step synthesis.

How it connects

Use this example when you need to understand how to build sophisticated TTS workflows that require multiple processing steps, advanced provider configuration, or sequential audio generation tasks. It's particularly relevant when evaluating or implementing ElevenLabs integration within a prompt chain framework.

Source README

provider-elevenlabs/tts-advanced (ElevenLabs Advanced TTS Features)

This example demonstrates advanced TTS capabilities:

Pronunciation Dictionaries - Custom pronunciation for technical terms
Voice Design - Generate voices from text descriptions
Voice Remixing - Modify existing voices (style, pacing, gender, age)
Streaming with Advanced Features - Combine streaming with pronunciation control

Quick Start

npx promptfoo@latest init --example provider-elevenlabs/tts-advanced
cd provider-elevenlabs/tts-advanced
export ELEVENLABS_API_KEY=your_api_key_here
npx promptfoo@latest eval

Features Demonstrated

1. Pronunciation Dictionaries

Control how technical terms, acronyms, and brand names are pronounced.

Use Case: Technical documentation, product demos, brand-specific content

providers:
  - id: elevenlabs:tts
    config:
      pronunciationRules:
        # Spell out acronyms
        - word: API
          pronunciation: A-P-I

        # Custom pronunciation
        - word: SQL
          pronunciation: sequel

        # Compound terms
        - word: PostgreSQL
          pronunciation: post-gres-Q-L

        # Brand names
        - word: OpenAI
          pronunciation: open-A-I

Common Use Cases:

Technical Content

pronunciationRules:
  - word: JavaScript
    pronunciation: java-script
  - word: TypeScript
    pronunciation: type-script
  - word: Python
    pronunciation: pie-thon
  - word: Node.js
    pronunciation: node-jay-ess
  - word: GraphQL
    pronunciation: graph-Q-L

Medical/Scientific Terms

pronunciationRules:
  - word: COVID-19
    pronunciation: covid-nineteen
  - word: mRNA
    pronunciation: messenger-R-N-A
  - word: DNA
    pronunciation: D-N-A

Brand Names & Products

pronunciationRules:
  - word: Anthropic
    pronunciation: an-throw-pick
  - word: Llama
    pronunciation: lama
  - word: ChatGPT
    pronunciation: chat-G-P-T
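To sanity-check a rule set before spending API characters, it can help to preview which terms would be rewritten. The sketch below is a local approximation only: `previewPronunciation` and `PreviewRule` are hypothetical names, not part of promptfoo or ElevenLabs, and the real service may match and apply rules differently. It assumes exact, case-sensitive matching of whole terms, as described in the troubleshooting notes.

```typescript
// Hypothetical preview helper; not part of promptfoo or the ElevenLabs API.
interface PreviewRule {
  word: string;          // term to match (exact, case-sensitive)
  pronunciation: string; // spelling to substitute
}

// Escape regex metacharacters so terms like "Node.js" match literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function previewPronunciation(text: string, rules: PreviewRule[]): string {
  // Apply longer terms first so a short rule (e.g. "COVID") cannot
  // rewrite part of a longer term (e.g. "COVID-19").
  const ordered = rules.slice().sort((a, b) => b.word.length - a.word.length);
  let result = text;
  for (const rule of ordered) {
    // Only match the term when it is not embedded in a longer alphanumeric token.
    const pattern = new RegExp(
      "(?<![A-Za-z0-9])" + escapeRegExp(rule.word) + "(?![A-Za-z0-9])",
      "g"
    );
    // A replacer function avoids special handling of "$" in the replacement.
    result = result.replace(pattern, () => rule.pronunciation);
  }
  return result;
}

const demoRules: PreviewRule[] = [
  { word: "API", pronunciation: "A-P-I" },
  { word: "SQL", pronunciation: "sequel" },
  { word: "PostgreSQL", pronunciation: "post-gres-Q-L" },
];
console.log(previewPronunciation("The API uses PostgreSQL and SQL.", demoRules));
// prints "The A-P-I uses post-gres-Q-L and sequel."
```

Because matching is case-sensitive, a lowercase "api" in the input is left unchanged, which makes missing rules easy to spot in the preview.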

2. Voice Design

Generate custom voices from natural language descriptions.

Use Case: Create unique voices for specific content types or brand identities

providers:
  - id: elevenlabs:tts
    config:
      voiceDesign:
        description: A warm, professional voice with excellent clarity and a slight smile in the tone, perfect for technical documentation
        gender: female
        age: middle_aged
        accent: american
        accentStrength: 0.5 # 0-2, subtle to strong

Voice Design Templates:

Professional Voices

# Corporate Presenter
voiceDesign:
  description: A confident, authoritative voice with clear articulation, perfect for business presentations
  gender: male
  age: middle_aged
  accent: american

# Educational Instructor
voiceDesign:
  description: A warm, patient voice with excellent clarity, ideal for educational content
  gender: female
  age: middle_aged
  accent: british

Friendly & Conversational

# Customer Service
voiceDesign:
  description: A friendly, approachable voice with a smile in the tone, great for customer interactions
  gender: female
  age: young
  accent: american

# Podcast Host
voiceDesign:
  description: A casual, engaging voice with natural conversational flow, perfect for podcasts
  gender: male
  age: young
  accent: australian

Narrative & Storytelling

# Audiobook Narrator
voiceDesign:
  description: A deep, resonant voice with storytelling quality and emotional range
  gender: male
  age: middle_aged
  accent: british

# Meditation Guide
voiceDesign:
  description: A soothing, tranquil voice with calming tones and gentle pacing
  gender: female
  age: middle_aged
  accent: american
  accentStrength: 0.3

3. Voice Remixing

Modify existing voices to change their characteristics.

Use Case: Adapt pre-made voices for different contexts or emotions

providers:
  # Make a voice more energetic
  - id: elevenlabs:tts:energetic
    config:
      voiceId: 21m00Tcm4TlvDq8ikWAM # Rachel
      voiceRemix:
        style: energetic
        pacing: fast
        promptStrength: medium # low, medium, high, max

  # Make a voice calmer and slower
  - id: elevenlabs:tts:calm
    config:
      voiceId: 21m00Tcm4TlvDq8ikWAM
      voiceRemix:
        style: calm
        pacing: slow
        promptStrength: high

Remix Parameters:

Parameter	Options	Use Case
`style`	energetic, calm, professional, casual, dramatic	Match voice to content mood
`pacing`	slow, normal, fast	Adjust speech speed
`gender`	male, female	Change voice gender
`age`	young, middle_aged, old	Adjust perceived age
`accent`	american, british, australian, etc.	Change accent
`promptStrength`	low, medium, high, max	How strongly to apply changes

Common Remix Scenarios:

# Sports Commentary (Energetic & Fast)
voiceRemix:
  style: energetic
  pacing: fast
  promptStrength: max

# ASMR Content (Calm & Slow)
voiceRemix:
  style: calm
  pacing: slow
  promptStrength: high

# News Anchor (Professional & Measured)
voiceRemix:
  style: professional
  pacing: normal
  promptStrength: medium

# Storytelling (Dramatic & Expressive)
voiceRemix:
  style: dramatic
  pacing: normal
  promptStrength: high

Advanced Combinations

Streaming + Pronunciation

Combine real-time streaming with custom pronunciation:

providers:
  - id: elevenlabs:tts
    config:
      streaming: true
      pronunciationRules:
        - word: API
          pronunciation: A-P-I
        - word: WebSocket
          pronunciation: web-socket

Benefits:

~75ms first chunk latency
Custom pronunciation for technical terms
Ideal for live demos and interactive applications

Voice Design + Pronunciation

Create a custom voice with domain-specific pronunciation:

providers:
  - id: elevenlabs:tts
    config:
      voiceDesign:
        description: A friendly tech educator with clear pronunciation
        gender: female
        age: middle_aged
      pronunciationRules:
        - word: Python
          pronunciation: pie-thon
        - word: JavaScript
          pronunciation: java-script

Cost Optimization

All advanced features use the same character-based pricing as basic TTS:

~$0.00002 per character (~$0.02 per 1,000 characters)
Free tier: 10,000 characters/month

Cost Tracking:

tests:
  - assert:
      - type: cost
        threshold: 0.05 # Max $0.05 per test
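The per-character rate above makes it easy to estimate cost before running an eval. The following is a rough sketch: the helper names are hypothetical, the rate is the approximate figure quoted in this README (check current ElevenLabs pricing before relying on it), and `text.length` counts UTF-16 code units, which may differ slightly from how the service counts billable characters.

```typescript
// Approximate rate taken from this README; treat it as an assumption.
const APPROX_USD_PER_CHAR = 0.00002;

// Estimate the cost of synthesizing one piece of text.
function estimateTtsCostUsd(text: string): number {
  return text.length * APPROX_USD_PER_CHAR;
}

// Mirror the `cost` assertion: true when the estimate stays within the threshold.
function fitsCostThreshold(text: string, thresholdUsd: number): boolean {
  return estimateTtsCostUsd(text) <= thresholdUsd;
}

const sampleScript = "Welcome to the API overview. Today we cover PostgreSQL and GraphQL.";
console.log(estimateTtsCostUsd(sampleScript).toFixed(5));
console.log(fitsCostThreshold(sampleScript, 0.05));
```

At this rate, the 0.05 threshold in the assertion corresponds to roughly 2,500 characters per test.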

Testing Assertions

Pronunciation Accuracy

tests:
  - description: Verify tech terms are included
    vars:
      expectedTerms:
        - API
        - SQL
        - JavaScript
    assert:
      - type: javascript
        value: |
          const terms = context.vars.expectedTerms;
          // Multi-line assertions need an explicit return
          return terms.every(term => output.includes(term));

Voice Quality Comparison

tests:
  - description: Compare baseline vs custom pronunciation
    vars:
      baseline: '{{providers[0].output}}'
      custom: '{{providers[1].output}}'
    assert:
      - type: javascript
        value: |
          // Both should succeed (multi-line assertions need an explicit return)
          return !context.vars.baseline.includes('error') &&
            !context.vars.custom.includes('error');

Latency with Advanced Features

tests:
  - description: Ensure advanced features don't slow generation
    assert:
      - type: latency
        threshold: 8000 # 8 seconds max

Real-World Use Cases

1. Technical Documentation

config:
  voiceDesign:
    description: Clear, professional voice for technical content
    gender: female
    age: middle_aged
  pronunciationRules:
    - word: API
      pronunciation: A-P-I
    - word: REST
      pronunciation: rest
    - word: GraphQL
      pronunciation: graph-Q-L
    - word: WebSocket
      pronunciation: web-socket
    - word: JSON
      pronunciation: jay-sawn
    - word: YAML
      pronunciation: yam-mel

2. Brand-Specific Content

config:
  voiceId: your-brand-voice-id
  voiceRemix:
    style: professional
    pacing: normal
  pronunciationRules:
    - word: YourProduct
      pronunciation: your-product
    - word: YourCompany
      pronunciation: your-company

3. Multi-Language Support

# English with British accent
providers:
  - id: elevenlabs:tts:en-gb
    config:
      voiceDesign:
        description: British English speaker
        accent: british
        accentStrength: 1.5

  # English with American accent
  - id: elevenlabs:tts:en-us
    config:
      voiceDesign:
        description: American English speaker
        accent: american
        accentStrength: 1.0

4. Dynamic Content Adaptation

# Morning news (Energetic)
providers:
  - id: elevenlabs:tts:morning
    config:
      voiceId: news-anchor-voice
      voiceRemix:
        style: energetic
        pacing: fast

  # Evening news (Calm)
  - id: elevenlabs:tts:evening
    config:
      voiceId: news-anchor-voice
      voiceRemix:
        style: calm
        pacing: normal
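If the choice between the two providers above is made in application code, a small selector could look like the sketch below. Everything here is illustrative: `pickNewsProvider` is a hypothetical helper, the provider labels are copied from the config above, and the noon cutoff is an assumption rather than anything promptfoo or ElevenLabs defines.

```typescript
// Provider labels copied from the example config above.
const MORNING_PROVIDER = "elevenlabs:tts:morning";
const EVENING_PROVIDER = "elevenlabs:tts:evening";

// Assumption: "morning" means any hour before 12:00 local time.
function pickNewsProvider(hour: number): string {
  if (hour < 0 || hour > 23 || Math.floor(hour) !== hour) {
    throw new RangeError("hour must be an integer from 0 to 23");
  }
  return hour < 12 ? MORNING_PROVIDER : EVENING_PROVIDER;
}

console.log(pickNewsProvider(new Date().getHours()));
```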

Troubleshooting

Voice Design Not Working

Error: Voice design failed

Solutions:

Ensure description is detailed (minimum 10 characters)
Specify gender and age for better results
Check API quota (voice design uses generation credits)

Pronunciation Not Applied

Warning: Pronunciation dictionary not found

Solutions:

Verify pronunciation rules syntax
Ensure words match exactly (case-sensitive)
Check that you're not using both pronunciationDictionaryId and pronunciationRules
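The checks in this list can be approximated locally before running an eval. This is a minimal sketch with hypothetical names (`findPronunciationConfigIssues`, `findCaseMismatches`); it only encodes the three points above and does not reflect how promptfoo itself validates the config.

```typescript
// Hypothetical shape covering only the two options discussed above.
interface PronunciationConfigSketch {
  pronunciationDictionaryId?: string;
  pronunciationRules?: { word: string; pronunciation: string }[];
}

// Report syntax problems and the dictionary-ID/rules conflict.
function findPronunciationConfigIssues(config: PronunciationConfigSketch): string[] {
  const issues: string[] = [];
  const rules = config.pronunciationRules || [];
  if (config.pronunciationDictionaryId && rules.length > 0) {
    issues.push("Both pronunciationDictionaryId and pronunciationRules are set; use only one.");
  }
  rules.forEach((rule, i) => {
    if (!rule.word || !rule.pronunciation) {
      issues.push("Rule " + i + " is missing word or pronunciation.");
    }
  });
  return issues;
}

// Return rule words that appear in the text only with different casing,
// which an exact, case-sensitive match would miss.
function findCaseMismatches(text: string, words: string[]): string[] {
  const lowerText = text.toLowerCase();
  return words.filter(
    (w) => text.indexOf(w) === -1 && lowerText.indexOf(w.toLowerCase()) !== -1
  );
}

console.log(findCaseMismatches("Our api supports SQL.", ["API", "SQL"]));
```

In the usage line, "API" is reported because the text only contains the lowercase form, while "SQL" matches exactly and is not reported.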

Remix Changes Too Subtle

Issue: Voice sounds the same after remix

Solutions:

Increase promptStrength from medium to high or max
Make more significant parameter changes
Some voices have limited remix range - try a different base voice
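The first suggestion, stepping `promptStrength` up one level at a time, can be expressed as a tiny helper. The names below are hypothetical; only the four strength values and their order come from the remix parameters documented here.

```typescript
type RemixStrength = "low" | "medium" | "high" | "max";

// Ordered from weakest to strongest, as listed in the remix parameters.
const STRENGTH_ORDER: RemixStrength[] = ["low", "medium", "high", "max"];

// Return the next stronger setting, or null when already at "max"
// (at that point, try a different base voice instead).
function nextPromptStrength(current: RemixStrength): RemixStrength | null {
  const i = STRENGTH_ORDER.indexOf(current);
  return i < STRENGTH_ORDER.length - 1 ? STRENGTH_ORDER[i + 1] : null;
}

console.log(nextPromptStrength("medium")); // prints "high"
```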

API Reference

Pronunciation Dictionary Options

Option	Type	Description
`pronunciationRules`	`PronunciationRule[]`	Array of pronunciation rules
`pronunciationDictionaryId`	string	Use existing dictionary by ID

PronunciationRule:

{
  word: string;           // Word to customize
  pronunciation: string;  // Phonetic pronunciation
  phoneme?: string;       // IPA/CMU phoneme (advanced)
  alphabet?: 'ipa' | 'cmu';  // Phonetic alphabet
}

Voice Design Options

{
  description: string;    // Natural language description
  gender?: 'male' | 'female';
  age?: 'young' | 'middle_aged' | 'old';
  accent?: string;        // e.g., 'british', 'american'
  accentStrength?: number;  // 0-2, default 1.0
  sampleText?: string;    // Optional sample for preview
}
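A pre-flight check can catch the most common voice design mistakes before any generation credits are used. The sketch below is illustrative: `checkVoiceDesign` is a hypothetical helper, and the limits it enforces (a description of at least 10 characters, `accentStrength` between 0 and 2) are taken from this README rather than verified against the ElevenLabs API.

```typescript
// Mirrors the option shape documented above.
interface VoiceDesignSketch {
  description: string;
  gender?: "male" | "female";
  age?: "young" | "middle_aged" | "old";
  accent?: string;
  accentStrength?: number;
  sampleText?: string;
}

interface VoiceDesignReport {
  errors: string[]; // likely to make the request fail
  hints: string[];  // optional improvements
}

function checkVoiceDesign(design: VoiceDesignSketch): VoiceDesignReport {
  const errors: string[] = [];
  const hints: string[] = [];
  // Minimum length as stated in the troubleshooting section.
  if (design.description.trim().length < 10) {
    errors.push("description should be at least 10 characters");
  }
  // Range as stated in the option comments (0-2).
  const strength = design.accentStrength;
  if (strength !== undefined && (!isFinite(strength) || strength < 0 || strength > 2)) {
    errors.push("accentStrength should be between 0 and 2");
  }
  if (!design.gender || !design.age) {
    hints.push("specify gender and age for better results");
  }
  return { errors: errors, hints: hints };
}

console.log(checkVoiceDesign({ description: "British English speaker", accent: "british", accentStrength: 1.5 }));
```

Separating errors from hints keeps optional advice, such as adding gender and age, from blocking a config that would otherwise work.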

Voice Remix Options

{
  style?: string;         // e.g., 'energetic', 'calm'
  pacing?: 'slow' | 'normal' | 'fast';
  gender?: 'male' | 'female';
  age?: 'young' | 'middle_aged' | 'old';
  accent?: string;
  promptStrength?: 'low' | 'medium' | 'high' | 'max';
}
