Integrate Azure Speech Services for Text and Audio

Name: Integrate Azure Speech Services for Text and Audio
Availability: OnlineOnly
Author: LlamaIndex

Azure Speech Tool enables AI agents to transcribe .wav audio files to text and synthesize text into spoken audio using Microsoft Azure AI speech services.

Works with Azure Speech, OpenAI

Updated 3 months ago

Version 1.0.0

Models

GPT-4o, GPT-4, Llama 3

Why it matters

Use Microsoft Azure's speech services to let agents transcribe .wav audio into text and synthesize speech from text, so spoken input and output can be handled in the same agent workflow as text.

Outcomes

What it gets done

Transcribe audio files (.wav) into text using Azure Speech-to-Text.

Synthesize audio from input text using Azure Text-to-Speech.

Integrate speech capabilities into agent workflows for automated tasks.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-tool-tools-azure-speech | bash

Capabilities

What this skill does

Transcribe

Converts audio or video speech to written text.

Write copy

Drafts marketing, email, or product copy on demand.

Summarize

Condenses long documents or threads into key takeaways.

Overview

Azure Speech Tool

What it does

Azure Speech Tool integrates Microsoft Azure AI speech services into LlamaIndex agents. It provides two functions: `speech_to_text` transcribes .wav audio files into text, and `text_to_speech` synthesizes audio from an input string and plays it on the user's computer. The tool is packaged as a ToolSpec (`AzureSpeechToolSpec`) whose `to_tool_list()` method returns the tools an agent can call.
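To illustrate the ToolSpec pattern without Azure credentials, here is a minimal pure-Python sketch. It is not LlamaIndex's implementation: `FakeSpeechToolSpec`, `SimpleTool`, and the stub return values are hypothetical stand-ins. Only the general idea, a spec class that lists the method names to expose (roughly in the spirit of LlamaIndex's `spec_functions`) and turns each into a tool via `to_tool_list()`, is meant to mirror the real library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTool:
    """Hypothetical stand-in for an agent tool: a name, a description, and a callable."""

    name: str
    description: str
    fn: Callable[..., str]


class FakeSpeechToolSpec:
    """Hypothetical offline stand-in mirroring the two Azure Speech tool functions."""

    # Names of the methods to expose as tools.
    spec_functions = ["text_to_speech", "speech_to_text"]

    def text_to_speech(self, text: str) -> str:
        """Pretend to synthesize `text`; a real spec would play audio via Azure."""
        return f"(would speak) {text}"

    def speech_to_text(self, filename: str) -> str:
        """Pretend to transcribe a .wav file; a real spec would call Azure Speech-to-Text."""
        if not filename.lower().endswith(".wav"):
            raise ValueError("speech_to_text expects a .wav file")
        return f"(transcript of {filename})"

    def to_tool_list(self) -> List[SimpleTool]:
        # Turn each listed method into a tool, using its docstring as the
        # description an LLM would see when deciding which tool to call.
        tools = []
        for name in self.spec_functions:
            method = getattr(self, name)
            tools.append(SimpleTool(name=name, description=method.__doc__ or "", fn=method))
        return tools


tools = {tool.name: tool for tool in FakeSpeechToolSpec().to_tool_list()}
print(sorted(tools))  # ['speech_to_text', 'text_to_speech']
print(tools["speech_to_text"].fn("data/speech.wav"))  # (transcript of data/speech.wav)
```

Because the stub uses the same tool names as the real spec, something like it may be handy for exercising agent wiring offline before plugging in a real Azure key.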

How it connects

Use this tool when building AI agents that need to process audio input (like transcribing meeting recordings or voice memos) or generate spoken responses. It's ideal for voice-enabled assistants, audio content summarization workflows, or any agent that needs to interact through speech rather than text alone.

Source README

Azure Speech Tool

This tool allows Agents to use Microsoft Azure speech services to transcribe audio files to text, and create audio files from text. To see more and get started, visit https://azure.microsoft.com/en-us/products/ai-services/ai-speech

Usage

A more extensive usage example is documented in a Jupyter notebook.

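```python
import asyncio

from llama_index.tools.azure_speech import AzureSpeechToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI

# Replace with your Azure Speech resource key and region.
speech_tool = AzureSpeechToolSpec(speech_key="your-key", region="eastus")

agent = FunctionAgent(
    tools=speech_tool.to_tool_list(),
    llm=OpenAI(model="gpt-4.1"),
)


async def main():
    # Synthesize speech and play it on this computer.
    print(await agent.run('Say "hello world"'))
    # Transcribe a local .wav file, then have the LLM summarize the transcript.
    print(
        await agent.run(
            "summarize the data/speech.wav audio file into a few sentences"
        )
    )


# Top-level `await` only works in notebooks; in a script, run the coroutine explicitly.
asyncio.run(main())
```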
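The example above hardcodes `speech_key="your-key"`. A small sketch for reading the key and region from environment variables instead; the variable names `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` and the helper `load_speech_settings` are hypothetical choices, not something the tool reads on its own.

```python
import os
from typing import Mapping, Optional, Tuple


def load_speech_settings(
    env: Optional[Mapping[str, str]] = None,
    default_region: str = "eastus",
) -> Tuple[str, str]:
    """Return (speech_key, region) from environment variables.

    The variable names are hypothetical; adjust them to your deployment.
    """
    env = os.environ if env is None else env
    key = env.get("AZURE_SPEECH_KEY", "").strip()
    if not key:
        # Fail early with a clear message instead of letting the Azure call fail later.
        raise RuntimeError("Set AZURE_SPEECH_KEY before creating AzureSpeechToolSpec")
    region = env.get("AZURE_SPEECH_REGION", "").strip() or default_region
    return key, region


# Usage (not executed here):
# key, region = load_speech_settings()
# speech_tool = AzureSpeechToolSpec(speech_key=key, region=region)
```

Accepting an optional mapping keeps the helper easy to test without touching the real process environment.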

text_to_speech: Takes an input string and synthesizes audio that plays on the user's computer
speech_to_text: Takes a .wav file and transcribes it into text
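Because `speech_to_text` only accepts .wav input, it can help to inspect a file before handing it to the agent. The sketch below uses Python's standard-library `wave` module; the helper name `describe_wav` is hypothetical, and the "assumed default" format (mono, 16-bit PCM at 8 or 16 kHz) is an assumption based on the Azure Speech SDK's documented default WAV input, so check it against your SDK version.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read basic header info from a PCM .wav file using the standard library."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "channels": channels,
        "bits_per_sample": sample_width * 8,
        "sample_rate_hz": rate,
        "duration_s": frames / rate if rate else 0.0,
        # Assumed default Azure input format: mono, 16-bit, 8 or 16 kHz.
        "matches_assumed_default": channels == 1
        and sample_width == 2
        and rate in (8000, 16000),
    }
```

For example, `describe_wav("data/speech.wav")` returns the channel count, bit depth, sample rate, and duration; `wave.open` raises `wave.Error` for files that are not uncompressed PCM WAV.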

This tool spec is designed to be passed to an Agent as a list of Tools.
