Generate Expressive Speech from Text
A LlamaIndex tool that integrates the Typecast.ai text-to-speech API, with emotion control, 27+ languages, and reproducible audio generation for AI agents.
Why it matters
Transform text into natural-sounding speech with controllable emotions and advanced audio customization. Integrate seamlessly with AI agents for dynamic content creation.
Outcomes
What it gets done
Convert text to speech with emotional nuance
List and filter available AI voice models
Customize pitch, tempo, and volume
Generate reproducible audio outputs using seeds
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/li-tool-tools-typecast | bash
Capabilities
What this skill does
Writes source code or scripts from a description.
Converts written text to spoken audio.
Converts text between languages while preserving meaning.
Drafts marketing, email, or product copy on demand.
Overview
Typecast.ai Tool
What it does
A LlamaIndex tool integration for Typecast.ai that enables AI agents to generate text-to-speech audio with emotion control, multi-language support, and reproducible results using seed parameters.
How it connects
Use this tool when your AI agent needs to create audio content from text with specific emotional expression, voice characteristics, or language requirements, or when you need consistent audio generation across multiple runs.
Source README
Typecast.ai Tool
This tool allows Agents to use Typecast.ai text-to-speech to create audio files from text with emotion control. To see more and get started, visit https://typecast.ai/
Usage
A more extensive usage example is documented in a Jupyter notebook.
from llama_index.tools.typecast import TypecastToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

speech_tool = TypecastToolSpec(api_key="your-key")

agent = FunctionAgent(
    tools=speech_tool.to_tool_list(),
    llm=OpenAI(model="gpt-4o-mini"),
)

# Top-level await works in a notebook; in a script, call this from an async function.
print(
    await agent.run(
        'Create speech from the text "Hello world!" with a happy emotion and output the file to "speech.wav"'
    )
)
- text_to_speech: Convert text to speech with emotion, pitch, tempo control, and reproducible results
- get_voices: List all available Typecast voices
- get_voice: Get details of a specific voice by ID
This tool is designed to be used as a Tool in an Agent.
Features
- Multiple Voice Models: Support for various AI voice models (ssfm-v21, ssfm-v30)
- Multi-language Support: 27+ languages including English, Korean, Spanish, Japanese, Chinese, and more
- Emotion Control: Adjust emotional expression (happy, sad, angry, normal, whisper, etc.) with intensity control
- Audio Customization: Control volume, pitch, tempo, and output format (WAV/MP3)
- Reproducible Results: Use seed parameter for consistent audio generation
- Voice Discovery: List and search available voices by model, gender, age, or use case (V2 API)
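Once a WAV file such as the "speech.wav" from the usage example has been written, it can be sanity-checked locally. The sketch below uses Python's standard-library wave module to report channel count, sample rate, and duration. It assumes the output is uncompressed PCM WAV, which this README does not state, and wav_summary is a hypothetical helper name, not part of the tool.

```python
import wave


def wav_summary(path):
    """Return basic properties of a PCM WAV file using only the standard library."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


# Example call (assumes the file exists and is PCM WAV):
# print(wav_summary("speech.wav"))
```

If the file is MP3 or a compressed WAV variant, wave.open raises wave.Error, so this check only applies to the WAV output format.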
Advanced Usage
Using Seed for Reproducible Results
from llama_index.tools.typecast import TypecastToolSpec

speech_tool = TypecastToolSpec(api_key="your-key")

# Generate the same audio multiple times with the same seed
result = speech_tool.text_to_speech(
    text="Hello world!",
    voice_id="tc_62a8975e695ad26f7fb514d1",
    output_path="speech.wav",
    seed=42,  # Same seed = same audio
)
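One way to check that a seed reproduces the same audio is to generate two files with the same inputs and compare their hashes. The following is a minimal sketch using only the standard library. It assumes "same audio" means byte-identical files, which is stricter than what the README promises, and file_sha256 and same_audio are hypothetical helper names.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Hash a file in chunks so large audio files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_audio(path_a, path_b):
    """Return True when both files contain identical bytes."""
    return file_sha256(path_a) == file_sha256(path_b)


# Example (file names are placeholders): call text_to_speech twice with seed=42,
# writing to "speech_a.wav" and "speech_b.wav", then compare the results:
# print(same_audio("speech_a.wav", "speech_b.wav"))
```

If the comparison returns False even with a fixed seed, the audio may still sound the same; container metadata alone can make the bytes differ.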
Getting Voice Details (V2 API)
# Get specific voice information
voice = speech_tool.get_voice("tc_62a8975e695ad26f7fb514d1")
print(f"Voice: {voice['voice_name']}")
print(f"Gender: {voice['gender']}, Age: {voice['age']}")
print(f"Use cases: {voice['use_cases']}")

# Models now include emotions per model version
for model in voice["models"]:
    print(f"Model {model['version']}: emotions = {model['emotions']}")
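Because emotions are listed per model version, it can help to look up which emotions a voice supports for one model before requesting speech. The sketch below works on a plain dictionary shaped like the fields accessed above ("models", "version", "emotions"). The sample data is made up for illustration and is not real API output, and emotions_for_model is a hypothetical helper.

```python
def emotions_for_model(voice, version):
    """Return the emotions listed for one model version, or an empty list if absent."""
    for model in voice.get("models", []):
        if model.get("version") == version:
            return list(model.get("emotions", []))
    return []


# Hypothetical sample shaped like the fields used above; not real API output.
sample_voice = {
    "voice_name": "Example Voice",
    "models": [
        {"version": "ssfm-v21", "emotions": ["normal", "happy"]},
        {"version": "ssfm-v30", "emotions": ["normal", "happy", "whisper"]},
    ],
}

print(emotions_for_model(sample_voice, "ssfm-v30"))  # ['normal', 'happy', 'whisper']
```

Returning an empty list for an unknown version lets the caller fall back to a default emotion instead of handling an exception.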
Filtering Voices (V2 API)
# Filter by model, gender, age, and use case
voices = speech_tool.get_voices(
    model="ssfm-v30", gender="female", age="young_adult", use_case="Audiobook"
)
for voice in voices:
    print(f"{voice['voice_name']} ({voice['voice_id']})")
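A filtered voice list is often used to pick a voice_id by name. The sketch below builds a name-to-ID lookup from a list of dictionaries that carry the two keys printed above. The sample entries and IDs are placeholders, and voice_ids_by_name is a hypothetical helper rather than part of the tool.

```python
def voice_ids_by_name(voices):
    """Map each voice_name to its voice_id; later duplicates of a name are ignored."""
    lookup = {}
    for voice in voices:
        lookup.setdefault(voice["voice_name"], voice["voice_id"])
    return lookup


# Hypothetical entries using the two keys printed above; the IDs are placeholders.
sample_voices = [
    {"voice_name": "Example A", "voice_id": "tc_example_a"},
    {"voice_name": "Example B", "voice_id": "tc_example_b"},
]

lookup = voice_ids_by_name(sample_voices)
print(lookup["Example B"])  # tc_example_b
```

The resulting ID can then be passed as the voice_id argument of text_to_speech, as in the seed example.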