Prompt Chain

Generate Advanced Text-to-Speech Audio

A prompt chain example demonstrating advanced text-to-speech capabilities using ElevenLabs TTS provider for multi-step audio generation workflows.

Works with elevenlabs

54
Spark score
out of 100
Updated 2 days ago
Version code-scan-action-0.1

Add to Favorites

Why it matters

Leverage advanced text-to-speech capabilities to generate high-quality audio from text. This asset enables sophisticated audio content creation for various media applications.

Outcomes

What it gets done

01

Process text input for advanced TTS generation.

02

Utilize ElevenLabs for high-fidelity voice synthesis.

03

Generate audio output suitable for media production.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-tts-advanced | bash

Capabilities

What this chain does

Transcribe

Converts audio or video speech to written text.

Translate

Converts text between languages while preserving meaning.

Write copy

Drafts marketing, email, or product copy on demand.

Overview

Tts Advanced

What it does

This is a working example that demonstrates advanced text-to-speech capabilities through a multi-step prompt chain workflow. It shows how to configure and use the ElevenLabs TTS provider within a prompt chain architecture, illustrating patterns for complex audio generation tasks that go beyond single-step synthesis.

How it connects

Use this example when you need to understand how to build sophisticated TTS workflows that require multiple processing steps, advanced provider configuration, or sequential audio generation tasks. It's particularly relevant when evaluating or implementing ElevenLabs integration within a prompt chain framework.

Source README

provider-elevenlabs/tts-advanced (ElevenLabs Advanced TTS Features)

This example demonstrates advanced TTS capabilities:

  • Pronunciation Dictionaries - Custom pronunciation for technical terms
  • Voice Design - Generate voices from text descriptions
  • Voice Remixing - Modify existing voices (style, pacing, gender, age)
  • Streaming with Advanced Features - Combine streaming with pronunciation control

Quick Start

npx promptfoo@latest init --example provider-elevenlabs/tts-advanced
cd provider-elevenlabs/tts-advanced
export ELEVENLABS_API_KEY=your_api_key_here
npx promptfoo@latest eval

Features Demonstrated

1. Pronunciation Dictionaries

Control how technical terms, acronyms, and brand names are pronounced.

Use Case: Technical documentation, product demos, brand-specific content

providers:
  - id: elevenlabs:tts
    config:
      pronunciationRules:
        # Spell out acronyms
        - word: API
          pronunciation: A-P-I

        # Custom pronunciation
        - word: SQL
          pronunciation: sequel

        # Multi-word terms
        - word: PostgreSQL
          pronunciation: post-gres-Q-L

        # Brand names
        - word: OpenAI
          pronunciation: open-A-I

Common Use Cases:

  1. Technical Content

    pronunciationRules:
      - word: JavaScript
        pronunciation: java-script
      - word: TypeScript
        pronunciation: type-script
      - word: Python
        pronunciation: pie-thon
      - word: Node.js
        pronunciation: node-jay-ess
      - word: GraphQL
        pronunciation: graph-Q-L
    
  2. Medical/Scientific Terms

    pronunciationRules:
      - word: COVID-19
        pronunciation: covid-nineteen
      - word: mRNA
        pronunciation: messenger-R-N-A
      - word: DNA
        pronunciation: D-N-A
    
  3. Brand Names & Products

    pronunciationRules:
      - word: Anthropic
        pronunciation: an-throw-pick
      - word: Llama
        pronunciation: lama
      - word: ChatGPT
        pronunciation: chat-G-P-T
    

2. Voice Design

Generate custom voices from natural language descriptions.

Use Case: Create unique voices for specific content types or brand identities

providers:
  - id: elevenlabs:tts
    config:
      voiceDesign:
        description: A warm, professional voice with excellent clarity and a slight smile in the tone, perfect for technical documentation
        gender: female
        age: middle_aged
        accent: american
        accentStrength: 0.5 # 0-2, subtle to strong

Voice Design Templates:

Professional Voices
# Corporate Presenter
voiceDesign:
  description: A confident, authoritative voice with clear articulation, perfect for business presentations
  gender: male
  age: middle_aged
  accent: american

# Educational Instructor
voiceDesign:
  description: A warm, patient voice with excellent clarity, ideal for educational content
  gender: female
  age: middle_aged
  accent: british
Friendly & Conversational
# Customer Service
voiceDesign:
  description: A friendly, approachable voice with a smile in the tone, great for customer interactions
  gender: female
  age: young
  accent: american

# Podcast Host
voiceDesign:
  description: A casual, engaging voice with natural conversational flow, perfect for podcasts
  gender: male
  age: young
  accent: australian
Narrative & Storytelling
# Audiobook Narrator
voiceDesign:
  description: A deep, resonant voice with storytelling quality and emotional range
  gender: male
  age: middle_aged
  accent: british

# Meditation Guide
voiceDesign:
  description: A soothing, tranquil voice with calming tones and gentle pacing
  gender: female
  age: middle_aged
  accent: american
  accentStrength: 0.3

3. Voice Remixing

Modify existing voices to change their characteristics.

Use Case: Adapt pre-made voices for different contexts or emotions

providers:
  # Make a voice more energetic
  - id: elevenlabs:tts:energetic
    config:
      voiceId: 21m00Tcm4TlvDq8ikWAM # Rachel
      voiceRemix:
        style: energetic
        pacing: fast
        promptStrength: medium # low, medium, high, max

  # Make a voice calmer and slower
  - id: elevenlabs:tts:calm
    config:
      voiceId: 21m00Tcm4TlvDq8ikWAM
      voiceRemix:
        style: calm
        pacing: slow
        promptStrength: high

Remix Parameters:

Parameter Options Use Case
style energetic, calm, professional, casual, dramatic Match voice to content mood
pacing slow, normal, fast Adjust speech speed
gender male, female Change voice gender
age young, middle_aged, old Adjust perceived age
accent american, british, australian, etc. Change accent
promptStrength low, medium, high, max How strongly to apply changes

Common Remix Scenarios:

# Sports Commentary (Energetic & Fast)
voiceRemix:
  style: energetic
  pacing: fast
  promptStrength: max

# ASMR Content (Calm & Slow)
voiceRemix:
  style: calm
  pacing: slow
  promptStrength: high

# News Anchor (Professional & Measured)
voiceRemix:
  style: professional
  pacing: normal
  promptStrength: medium

# Storytelling (Dramatic & Expressive)
voiceRemix:
  style: dramatic
  pacing: normal
  promptStrength: high

Advanced Combinations

Streaming + Pronunciation

Combine real-time streaming with custom pronunciation:

providers:
  - id: elevenlabs:tts
    config:
      streaming: true
      pronunciationRules:
        - word: API
          pronunciation: A-P-I
        - word: WebSocket
          pronunciation: web-socket

Benefits:

  • ~75ms first chunk latency
  • Custom pronunciation for technical terms
  • Ideal for live demos and interactive applications

Voice Design + Pronunciation

Create a custom voice with domain-specific pronunciation:

providers:
  - id: elevenlabs:tts
    config:
      voiceDesign:
        description: A friendly tech educator with clear pronunciation
        gender: female
        age: middle_aged
      pronunciationRules:
        - word: Python
          pronunciation: pie-thon
        - word: JavaScript
          pronunciation: java-script

Cost Optimization

All advanced features use the same character-based pricing as basic TTS:

  • $0.00002 per character ($0.02 per 1000 characters)
  • Free tier: 10,000 characters/month

Cost Tracking:

tests:
  - assert:
      - type: cost
        threshold: 0.05 # Max $0.05 per test

Testing Assertions

Pronunciation Accuracy

tests:
  - description: Verify tech terms are included
    vars:
      expectedTerms:
        - API
        - SQL
        - JavaScript
    assert:
      - type: javascript
        value: |
          const terms = context.vars.expectedTerms;
          terms.every(term => output.includes(term))

Voice Quality Comparison

tests:
  - description: Compare baseline vs custom pronunciation
    vars:
      baseline: '{{providers[0].output}}'
      custom: '{{providers[1].output}}'
    assert:
      - type: javascript
        value: |
          // Both should succeed
          !context.vars.baseline.includes('error') &&
          !context.vars.custom.includes('error')

Latency with Advanced Features

tests:
  - description: Ensure advanced features don't slow generation
    assert:
      - type: latency
        threshold: 8000 # 8 seconds max

Real-World Use Cases

1. Technical Documentation

config:
  voiceDesign:
    description: Clear, professional voice for technical content
    gender: female
    age: middle_aged
  pronunciationRules:
    - word: API
      pronunciation: A-P-I
    - word: REST
      pronunciation: rest
    - word: GraphQL
      pronunciation: graph-Q-L
    - word: WebSocket
      pronunciation: web-socket
    - word: JSON
      pronunciation: jay-sawn
    - word: YAML
      pronunciation: yam-mel

2. Brand-Specific Content

config:
  voiceId: your-brand-voice-id
  voiceRemix:
    style: professional
    pacing: normal
  pronunciationRules:
    - word: YourProduct
      pronunciation: your-product
    - word: YourCompany
      pronunciation: your-company

3. Multi-Language Support

# English with British accent
providers:
  - id: elevenlabs:tts:en-gb
    config:
      voiceDesign:
        description: British English speaker
        accent: british
        accentStrength: 1.5

  # English with American accent
  - id: elevenlabs:tts:en-us
    config:
      voiceDesign:
        description: American English speaker
        accent: american
        accentStrength: 1.0

4. Dynamic Content Adaptation

# Morning news (Energetic)
providers:
  - id: elevenlabs:tts:morning
    config:
      voiceId: news-anchor-voice
      voiceRemix:
        style: energetic
        pacing: fast

  # Evening news (Calm)
  - id: elevenlabs:tts:evening
    config:
      voiceId: news-anchor-voice
      voiceRemix:
        style: calm
        pacing: normal

Troubleshooting

Voice Design Not Working

Error: Voice design failed

Solutions:

  1. Ensure description is detailed (minimum 10 characters)
  2. Specify gender and age for better results
  3. Check API quota (voice design uses generation credits)

Pronunciation Not Applied

Warning: Pronunciation dictionary not found

Solutions:

  1. Verify pronunciation rules syntax
  2. Ensure words match exactly (case-sensitive)
  3. Check that you're not using both pronunciationDictionaryId and pronunciationRules

Remix Changes Too Subtle

Issue: Voice sounds the same after remix

Solutions:

  1. Increase promptStrength from medium to high or max
  2. Make more significant parameter changes
  3. Some voices have limited remix range - try a different base voice

API Reference

Pronunciation Dictionary Options

Option Type Description
pronunciationRules PronunciationRule[] Array of pronunciation rules
pronunciationDictionaryId string Use existing dictionary by ID

PronunciationRule:

{
  word: string;           // Word to customize
  pronunciation: string;  // Phonetic pronunciation
  phoneme?: string;       // IPA/CMU phoneme (advanced)
  alphabet?: 'ipa' | 'cmu';  // Phonetic alphabet
}

Voice Design Options

{
  description: string;    // Natural language description
  gender?: 'male' | 'female';
  age?: 'young' | 'middle_aged' | 'old';
  accent?: string;        // e.g., 'british', 'american'
  accentStrength?: number;  // 0-2, default 1.0
  sampleText?: string;    // Optional sample for preview
}

Voice Remix Options

{
  style?: string;         // e.g., 'energetic', 'calm'
  pacing?: 'slow' | 'normal' | 'fast';
  gender?: 'male' | 'female';
  age?: 'young' | 'middle_aged' | 'old';
  accent?: string;
  promptStrength?: 'low' | 'medium' | 'high' | 'max';
}

Related Examples

  • Basic TTS - Voice comparison and basic features
  • STT - Speech-to-Text transcription
  • Streaming TTS - Real-time voice generation

Resources

Discussion

Questions & comments · 0

Sign In Sign in to leave a comment.