Generate Expressive Speech from Text

Name: Generate Expressive Speech from Text
Availability: OnlineOnly
Author: LlamaIndex

A LlamaIndex tool that integrates the Typecast.ai text-to-speech API, giving AI agents emotion control, support for 27+ languages, and reproducible audio generation.

Works with: OpenAI, LlamaIndex

Updated 4 days ago

Version 0.14.22

Models: GPT-4o, Llama 3

Why it matters

Transform text into natural-sounding speech with controllable emotions and advanced audio customization. Integrate seamlessly with AI agents for dynamic content creation.

Outcomes

What it gets done

Convert text to speech with emotional nuance

List and filter available AI voice models

Customize pitch, tempo, and volume

Generate reproducible audio outputs using seeds

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-tool-tools-typecast | bash

Capabilities

What this skill does

Generate code

Writes source code or scripts from a description.

Transcribe

Converts audio or video speech to written text.

Translate

Converts text between languages while preserving meaning.

Write copy

Drafts marketing, email, or product copy on demand.

Overview

Typecast.ai Tool

What it does

A LlamaIndex tool integration for Typecast.ai that enables AI agents to generate text-to-speech audio with emotion control, multi-language support, and reproducible results using seed parameters.

How it connects

Use this tool when your AI agent needs to create audio content from text with specific emotional expression, voice characteristics, or language requirements, or when you need consistent audio generation across multiple runs.

Source README

Typecast.ai Tool

This tool allows Agents to use Typecast.ai text-to-speech to create audio files from text with emotion control. To see more and get started, visit https://typecast.ai/

Usage

A more extensive usage example is documented in the accompanying Jupyter notebook. The basic setup looks like this:

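import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.typecast import TypecastToolSpec

# Replace "your-key" with your Typecast API key
speech_tool = TypecastToolSpec(api_key="your-key")

# Expose the Typecast functions to the agent as tools
agent = FunctionAgent(
    tools=speech_tool.to_tool_list(),
    llm=OpenAI(model="gpt-4o-mini"),
)


async def main():
    # agent.run is awaitable, so it has to be called from inside a coroutine
    response = await agent.run(
        'Create speech from the text "Hello world!" with a happy emotion and output the file to "speech.wav"'
    )
    print(response)


asyncio.run(main())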

The tool spec exposes three functions:

text_to_speech: Convert text to speech with emotion, pitch, and tempo control, and with reproducible results
get_voices: List all available Typecast voices
get_voice: Get details of a specific voice by ID

This tool is designed to be used as a Tool in an Agent.
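
The usage snippet passes the API key as a string literal. For anything beyond a quick test, the key could instead be read from the environment. The following is a minimal sketch of that idea: the variable name TYPECAST_API_KEY and the helper load_api_key are assumptions made for illustration, and nothing here implies the tool reads that variable on its own.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "TYPECAST_API_KEY",  # hypothetical name chosen for this sketch
) -> str:
    """Return the API key stored under `var_name`, or raise if it is missing."""
    source = os.environ if env is None else env
    key = source.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before creating the tool spec")
    return key


# Intended use (needs the package installed and a real key in the environment):
#   speech_tool = TypecastToolSpec(api_key=load_api_key())
```

Passing a plain dictionary as env makes the helper easy to exercise without touching the real environment.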

Features

Multiple Voice Models: Support for various AI voice models (ssfm-v21, ssfm-v30)
Multi-language Support: 27+ languages including English, Korean, Spanish, Japanese, Chinese, and more
Emotion Control: Adjust emotional expression (happy, sad, angry, normal, whisper, etc.) with intensity control
Audio Customization: Control volume, pitch, tempo, and output format (WAV/MP3)
Reproducible Results: Use seed parameter for consistent audio generation
Voice Discovery: List and search available voices by model, gender, age, or use case (V2 API)
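
Since the feature list names WAV and MP3 as the output formats, a caller may want to reject other file extensions before making a request. The sketch below shows one way to do that. The helper audio_format_for is hypothetical, and how text_to_speech itself chooses the format is not covered in this README.

```python
from pathlib import Path

# The two output formats named in the feature list above.
SUPPORTED_FORMATS = {"wav", "mp3"}


def audio_format_for(output_path: str) -> str:
    """Return 'wav' or 'mp3' based on the file extension, or raise ValueError."""
    suffix = Path(output_path).suffix.lower().lstrip(".")
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio extension: {output_path!r}")
    return suffix
```

Checking the extension up front keeps a typo such as "speech.wv" from turning into a paid API call with an unusable output path.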

Advanced Usage

Using Seed for Reproducible Results

from llama_index.tools.typecast import TypecastToolSpec

speech_tool = TypecastToolSpec(api_key="your-key")

# Generate the same audio multiple times with the same seed
result = speech_tool.text_to_speech(
    text="Hello world!",
    voice_id="tc_62a8975e695ad26f7fb514d1",
    output_path="speech.wav",
    seed=42,  # Same seed = same audio
)
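
One way to check the "same seed = same audio" claim locally is to generate two files with the same seed and compare their contents. The sketch below compares SHA-256 digests using only the standard library. The helpers file_digest and same_audio are hypothetical, and it is an assumption that seeded output is byte-identical rather than merely perceptually identical, so a mismatch here does not by itself prove the seed was ignored.

```python
import hashlib
from pathlib import Path


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_audio(path_a: str, path_b: str) -> bool:
    """True when both files contain exactly the same bytes."""
    return file_digest(path_a) == file_digest(path_b)


# Intended use (needs a real key): call text_to_speech twice with seed=42,
# writing to "run1.wav" and "run2.wav", then check same_audio("run1.wav", "run2.wav").
```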

Getting Voice Details (V2 API)

# Get specific voice information
voice = speech_tool.get_voice("tc_62a8975e695ad26f7fb514d1")
print(f"Voice: {voice['voice_name']}")
print(f"Gender: {voice['gender']}, Age: {voice['age']}")
print(f"Use cases: {voice['use_cases']}")

# Models now include emotions per model version
for model in voice["models"]:
    print(f"Model {model['version']}: emotions = {model['emotions']}")
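
Because emotions are listed per model version, it can help to check that a voice supports a given emotion before requesting it. The sketch below works on a dictionary shaped like the fields printed above. The sample data is invented for illustration and is not real API output, and both helpers are hypothetical.

```python
from typing import Dict, List

# Hypothetical sample using the same keys as the get_voice example above.
sample_voice = {
    "voice_name": "Example Voice",
    "models": [
        {"version": "ssfm-v21", "emotions": ["normal", "happy"]},
        {"version": "ssfm-v30", "emotions": ["normal", "happy", "whisper"]},
    ],
}


def emotions_by_model(voice: dict) -> Dict[str, List[str]]:
    """Map each model version of a voice to the emotions it lists."""
    return {
        model["version"]: list(model.get("emotions", []))
        for model in voice.get("models", [])
    }


def supports_emotion(voice: dict, version: str, emotion: str) -> bool:
    """True when `emotion` is listed for `version`; False for unknown versions."""
    return emotion in emotions_by_model(voice).get(version, [])
```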

Filtering Voices (V2 API)

# Filter by model, gender, age, and use case
voices = speech_tool.get_voices(
    model="ssfm-v30", gender="female", age="young_adult", use_case="Audiobook"
)

for voice in voices:
    print(f"{voice['voice_name']} ({voice['voice_id']})")
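
A filtered voice list can be turned into a name-to-ID lookup so that later calls can refer to a voice by name. The sketch below assumes each entry carries the voice_name and voice_id keys used in the loop above. The sample entries and the helper voice_ids_by_name are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical entries using the keys from the loop above; not real voices.
sample_voices = [
    {"voice_name": "Example A", "voice_id": "tc_example_a"},
    {"voice_name": "Example B", "voice_id": "tc_example_b"},
    {"voice_name": "example a", "voice_id": "tc_example_duplicate"},
]


def voice_ids_by_name(voices: Iterable[dict]) -> Dict[str, str]:
    """Map lowercased voice names to voice IDs; the first occurrence of a name wins."""
    lookup: Dict[str, str] = {}
    for voice in voices:
        lookup.setdefault(voice["voice_name"].lower(), voice["voice_id"])
    return lookup
```

Lowercasing the names makes the lookup case-insensitive, at the cost of collapsing names that differ only in case.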
