
Generate Subtitles from Audio and Transcripts

Name: Generate Subtitles from Audio and Transcripts
Availability: OnlineOnly
Author: Promptfoo

Prompt chain that generates time-aligned subtitles in SRT or VTT format from audio files and transcripts using the ElevenLabs forced alignment API.


Works with ElevenLabs


Updated 2 days ago

Version code-scan-action-0.1


Why it matters

Automatically generate time-aligned subtitles (SRT/VTT) from audio files and their corresponding transcripts. This asset leverages ElevenLabs' forced alignment capabilities to precisely synchronize spoken words with timestamps.

Outcomes

What it gets done

Process audio and transcript input.

Perform forced alignment using ElevenLabs.

Generate SRT and VTT subtitle files.

Ensure accurate time synchronization for subtitles.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-alignment | bash

Capabilities

What this chain does

Transcribe

Converts audio or video speech to written text.

Extract

Pulls structured data fields from unstructured text.

Summarize

Condenses long documents or threads into key takeaways.

Overview

Alignment

What it does

This prompt chain automates subtitle generation by sending audio files and transcripts through the ElevenLabs forced alignment API. It outputs time-synchronized subtitle files in SRT or VTT format, with word-level timing precision. The workflow handles the alignment step that matches transcript text to exact audio timestamps.

How it connects

Use this when you need to add subtitles to videos, podcasts, or other audio content and already have transcripts that need precise timing. It's ideal for content creators, accessibility teams, and media producers who want to automate the time-consuming process of manually syncing captions to audio without sacrificing accuracy.

Source README

provider-elevenlabs/alignment (ElevenLabs Forced Alignment)

Generate time-aligned subtitles (SRT/VTT) from audio and transcripts using ElevenLabs forced alignment.

Quick Start

npx promptfoo@latest init --example provider-elevenlabs/alignment
cd provider-elevenlabs/alignment
export ELEVENLABS_API_KEY=your_api_key_here
npx promptfoo@latest eval

What this tests

Subtitle generation: Create SRT and VTT subtitle files
Word-level alignment: Precise timestamp data for each word
Multiple formats: JSON (raw data), SRT (video players), VTT (web players)
Accuracy: Verify alignment matches audio timing
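Part of the accuracy check can be automated with a sanity pass over word timings. The sketch below assumes word-level timings shaped as `{ text, start, end }` in seconds and a caller-supplied `audioDuration`; the helper name `checkWordTimings` and that shape are hypothetical, not the provider's documented output. It flags malformed, out-of-range, or overlapping timings, but it cannot confirm that a timestamp matches what a listener actually hears.

```javascript
// Hypothetical helper: sanity-check word-level timings.
// Assumed input shape: [{ text, start, end }] with times in seconds.
// Returns a list of human-readable problems (empty if none found).
function checkWordTimings(words, audioDuration) {
  const problems = [];
  words.forEach((word, i) => {
    const label = `word ${i} "${word.text}"`;
    if (!(word.start <= word.end)) {
      problems.push(`${label}: start is after end`);
    }
    if (word.start < 0 || word.end > audioDuration) {
      problems.push(`${label}: outside the audio (0-${audioDuration}s)`);
    }
    if (i > 0 && word.start < words[i - 1].end) {
      problems.push(`${label}: overlaps the previous word`);
    }
  });
  return problems;
}

// Made-up timings: "small" starts before "one" ends, so one problem is logged.
console.log(
  checkWordTimings(
    [
      { text: "one", start: 0.0, end: 0.5 },
      { text: "small", start: 0.4, end: 0.9 },
    ],
    1.0,
  ),
);
```

Returning a list rather than a boolean makes it easy to surface the reason for a failure in an evaluation report.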

How it works

Forced alignment takes two inputs:

Audio file: Speech recording (MP3, WAV, etc.)
Transcript: Text of what was spoken

It returns precise timestamps showing when each word was spoken, formatted as subtitles.
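Turning word timestamps into subtitles can be pictured as grouping consecutive words into cues. This is a minimal sketch under stated assumptions: words are `{ text, start, end }` objects in seconds, and a cue closes when the next word would push it past a hypothetical `maxChars` limit or after sentence-ending punctuation. The function name, the limit, and the rule are illustrative only; this is not necessarily how the ElevenLabs provider chunks its output.

```javascript
// Hypothetical chunker: turn word timings into subtitle cues.
// Assumed word shape: { text, start, end } with times in seconds.
function groupWordsIntoCues(words, maxChars = 42) {
  const cues = [];
  let current = [];

  // Emit the words collected so far as one cue.
  const flush = () => {
    if (current.length === 0) return;
    cues.push({
      start: current[0].start,
      end: current[current.length - 1].end,
      text: current.map((w) => w.text).join(" "),
    });
    current = [];
  };

  for (const word of words) {
    const candidate = [...current, word].map((w) => w.text).join(" ");
    // Start a new cue if this word would make the line too long.
    if (current.length > 0 && candidate.length > maxChars) flush();
    current.push(word);
    // Sentence-ending punctuation also closes a cue.
    if (/[.!?]$/.test(word.text)) flush();
  }
  flush();
  return cues;
}

// Made-up timings for the quote used in this README's output examples.
const demoWords = [
  ["That's", 0.0, 0.4], ["one", 0.4, 0.7], ["small", 0.7, 1.1],
  ["step", 1.1, 1.5], ["for", 1.5, 1.8], ["man,", 1.8, 2.5],
  ["one", 2.5, 2.9], ["giant", 2.9, 3.4], ["leap", 3.4, 3.9],
  ["for", 3.9, 4.2], ["mankind.", 4.2, 5.0],
].map(([text, start, end]) => ({ text, start, end }));

console.log(groupWordsIntoCues(demoWords, 30));
```

With `maxChars` set to 30, this yields two cues covering 0.0 to 2.5 s and 2.5 to 5.0 s. Punctuation in the transcript gives the chunker natural break points, which is one reason to keep it.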

Use Cases

Video subtitles: Generate SRT files for video editing software
Web captions: Create VTT files for HTML5 video players
Karaoke apps: Word-level timing for synchronized highlighting
Accessibility: Auto-generate captions for spoken content
Translation sync: Time-align translations to original audio
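For karaoke-style highlighting, a player needs the word whose time range contains the current playback position. This is a minimal sketch assuming word timings that are sorted by `start`, do not overlap, and are shaped as `{ text, start, end }` in seconds. `activeWordIndex` is a hypothetical helper, not part of promptfoo or the ElevenLabs API. A binary search keeps each lookup O(log n), which helps when it runs on every animation frame.

```javascript
// Hypothetical helper: index of the word being spoken at time t (seconds).
// Assumes words are sorted by start, do not overlap, and each covers the
// half-open interval [start, end). Returns -1 during gaps or outside the audio.
function activeWordIndex(words, t) {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (t < words[mid].start) hi = mid - 1;
    else if (t >= words[mid].end) lo = mid + 1;
    else return mid;
  }
  return -1;
}

// Made-up timings with a short pause before "leap".
const karaokeWords = [
  { text: "one", start: 2.5, end: 2.9 },
  { text: "giant", start: 2.9, end: 3.4 },
  { text: "leap", start: 3.6, end: 3.9 },
];
console.log(activeWordIndex(karaokeWords, 3.0)); // prints 1 ("giant")
console.log(activeWordIndex(karaokeWords, 3.5)); // prints -1 (pause)
```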

Output Formats

JSON (Raw alignment data)

{
  "alignment": [
    { "char": "T", "start": 0.0, "end": 0.1 },
    { "char": "h", "start": 0.1, "end": 0.15 }
  ],
  "characters": "That's one small step..."
}
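The JSON example above is character-level. If word-level timings are needed, consecutive non-whitespace characters can be merged into words. This is a minimal sketch assuming entries shaped exactly like the `alignment` array in the example (`char`, `start`, `end` in seconds). `charsToWords` is a hypothetical helper; the real provider output may differ or may already include word entries, so check an actual response before relying on this shape.

```javascript
// Hypothetical helper: merge character timings into word timings.
// Assumed entry shape (from the JSON example): { char, start, end }.
function charsToWords(alignment) {
  const words = [];
  let current = null;
  for (const { char, start, end } of alignment) {
    if (/\s/.test(char)) {
      current = null; // whitespace ends the current word
      continue;
    }
    if (current === null) {
      current = { text: "", start, end };
      words.push(current);
    }
    current.text += char;
    current.end = end; // a word ends when its last character ends
  }
  return words;
}

// Made-up timings for "go on".
console.log(
  charsToWords([
    { char: "g", start: 0.0, end: 0.1 },
    { char: "o", start: 0.1, end: 0.2 },
    { char: " ", start: 0.2, end: 0.25 },
    { char: "o", start: 0.25, end: 0.3 },
    { char: "n", start: 0.3, end: 0.4 },
  ]),
);
```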

SRT (Standard video subtitles)

1
00:00:00,000 --> 00:00:02,500
That's one small step for man

2
00:00:02,500 --> 00:00:05,000
one giant leap for mankind
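SRT is simple enough to produce directly from cue timings. This is a minimal sketch assuming cues shaped as `{ start, end, text }` in seconds; `formatSrtTimestamp` and `toSrt` are hypothetical names. It reproduces the layout above: a 1-based index, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, the text, and a blank line between cues.

```javascript
// Hypothetical formatter: seconds -> "HH:MM:SS,mmm" (SRT uses a comma).
function formatSrtTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Hypothetical serializer: cues -> SRT text, blank line between cues.
function toSrt(cues) {
  return cues
    .map(
      (cue, i) =>
        `${i + 1}\n${formatSrtTimestamp(cue.start)} --> ` +
        `${formatSrtTimestamp(cue.end)}\n${cue.text}\n`,
    )
    .join("\n");
}

const exampleCues = [
  { start: 0, end: 2.5, text: "That's one small step for man" },
  { start: 2.5, end: 5, text: "one giant leap for mankind" },
];
console.log(toSrt(exampleCues));
```

Rounding to whole milliseconds before splitting into fields avoids outputs like `,1000` from floating-point times such as 1.9999.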

VTT (WebVTT for web players)

WEBVTT

1
00:00:00.000 --> 00:00:02.500
That's one small step for man

2
00:00:02.500 --> 00:00:05.000
one giant leap for mankind
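The two examples above differ mainly in the `WEBVTT` header and the millisecond separator, so simple SRT can be converted to VTT with a small text transform. This is a minimal sketch assuming plain SRT without styling tags; `srtToVtt` is a hypothetical helper. It rewrites only lines containing `-->`, so commas in subtitle text are left alone, and it keeps the cue numbers, which WebVTT accepts as optional cue identifiers.

```javascript
// Hypothetical helper: convert simple SRT text to WebVTT.
function srtToVtt(srt) {
  const lines = srt
    .replace(/\r\n/g, "\n")
    .split("\n")
    .map((line) =>
      // Only timing lines change: "00:00:02,500" becomes "00:00:02.500".
      line.includes("-->")
        ? line.replace(/(\d{2}:\d{2}:\d{2}),(\d{3})/g, "$1.$2")
        : line,
    );
  // WebVTT files must start with the "WEBVTT" signature line.
  return `WEBVTT\n\n${lines.join("\n")}`;
}

console.log(
  srtToVtt("1\n00:00:00,000 --> 00:00:02,500\nThat's one small step for man\n"),
);
```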

Configuration

Basic alignment (JSON output)

providers:
  - id: elevenlabs:alignment:json
    label: Alignment (JSON)

tests:
  - vars:
      audioFile: path/to/audio.mp3
      transcript: 'Your transcript text here'
      format: json

SRT subtitles

providers:
  - id: elevenlabs:alignment:srt
    label: Alignment (SRT Subtitles)

tests:
  - vars:
      audioFile: path/to/audio.mp3
      transcript: 'Your transcript text here'
      format: srt

VTT subtitles

providers:
  - id: elevenlabs:alignment:vtt
    label: Alignment (VTT Subtitles)

tests:
  - vars:
      audioFile: path/to/audio.mp3
      transcript: 'Your transcript text here'
      format: vtt

Testing Assertions

tests:
  # Verify alignment succeeds
  - assert:
      - type: javascript
        value: output.includes('words') # JSON format
      - type: not-contains
        value: error

  # Verify SRT format
  - assert:
      - type: javascript
        value: output.includes('-->') && output.includes('small step')
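The SRT assertion above only checks for `-->` and a phrase. A stricter structural check is sketched here as a plain function. `isValidSrt` and `parseSrtTime` are hypothetical names, and wiring the check into a promptfoo `javascript` assertion (for example by loading it from a file) should follow promptfoo's assertion documentation rather than this sketch. It assumes standard SRT: a numeric index, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, at least one text line, and cue start times that never go backwards.

```javascript
// Hypothetical parser: "HH:MM:SS,mmm" -> seconds, or NaN if malformed.
function parseSrtTime(ts) {
  const match = /^(\d{2}):(\d{2}):(\d{2}),(\d{3})$/.exec(ts.trim());
  if (!match) return NaN;
  const [, h, m, s, ms] = match.map(Number);
  return h * 3600 + m * 60 + s + ms / 1000;
}

// Hypothetical validator: stricter SRT check than output.includes('-->').
function isValidSrt(output) {
  const text = output.replace(/\r\n/g, "\n").trim();
  if (text === "") return false;
  let previousStart = -Infinity;
  // Cues are separated by one or more blank lines.
  for (const block of text.split(/\n\s*\n/)) {
    const [index, timing = "", ...lines] = block.split("\n");
    if (!/^\d+$/.test(index.trim()) || lines.length === 0) return false;
    const parts = timing.split("-->");
    if (parts.length !== 2) return false;
    const start = parseSrtTime(parts[0]);
    const end = parseSrtTime(parts[1]);
    // A NaN from a malformed time makes start < end false, rejecting the cue.
    if (!(start < end) || start < previousStart) return false;
    previousStart = start;
  }
  return true;
}

const sampleSrt =
  "1\n00:00:00,000 --> 00:00:02,500\nThat's one small step for man\n\n" +
  "2\n00:00:02,500 --> 00:00:05,000\none giant leap for mankind\n";
console.log(isValidSrt(sampleSrt)); // prints true
```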

Best Practices

Transcript accuracy: Ensure transcript exactly matches spoken audio
Include punctuation: Better subtitle chunking and timing
Audio quality: Clear audio produces more accurate timestamps
Format selection:
- Use SRT for video editing (Premiere, Final Cut, DaVinci)
- Use VTT for web players (HTML5 <video> tag)
- Use JSON for custom processing

Cost Information

Forced alignment pricing is based on audio duration:

~$0.05 per minute of audio

The provider automatically tracks costs in evaluation results.
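The rate above turns budgeting into simple arithmetic. This is a minimal sketch assuming the approximate $0.05 per minute quoted here and linear proration by duration; actual billing granularity and current pricing are not stated in this README, so check ElevenLabs' pricing before relying on the numbers. `estimateAlignmentCost` is a hypothetical name.

```javascript
// Hypothetical estimator. Assumptions: ~$0.05 per audio minute (the
// approximate figure quoted above) and linear proration by duration.
function estimateAlignmentCost(durationSeconds, ratePerMinute = 0.05) {
  if (durationSeconds < 0) throw new RangeError("duration must be >= 0");
  return (durationSeconds / 60) * ratePerMinute;
}

// A 45-minute podcast episode, formatted to cents.
console.log(estimateAlignmentCost(45 * 60).toFixed(2)); // prints 2.25
```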

Related Examples

ElevenLabs STT - Speech-to-text transcription
ElevenLabs Isolation - Audio noise removal
