Skill

Stream Realtime AI Interactions via WebRTC & WebSockets

Availability: OnlineOnly
Author: Semantic Kernel

Python examples for real-time streaming AI interactions using WebSocket and WebRTC protocols, including function calling support.

Works with: webrtc, websocket

Semantic Kernel

Updated 4 days ago

Version python-1.43.1

Why it matters

Enable real-time, interactive AI experiences by streaming data over WebRTC and WebSockets. This asset facilitates dynamic communication between users and AI agents for immediate feedback and action.

Outcomes

What it gets done

Implement real-time chat applications.

Integrate function calling for dynamic AI responses.

Stream audio/video for interactive AI sessions.

Connect AI agents via WebSocket for continuous interaction.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/sk-concept-realtime | bash

Capabilities

What this skill does

Chatbot

Handles multi-turn conversations within a defined domain.

Transcribe

Converts audio or video speech to written text.

Search the web

Searches the web and retrieves relevant sources.

Overview

Semantic Kernel - Realtime

What it does

Five Python example files demonstrating real-time streaming and WebSocket-based AI interactions

How it connects

When you need working examples of real-time AI chat and agent interactions using WebSocket or WebRTC protocols

Source README

Realtime Multi-modal API Samples

These samples are more complex than most because of the nature of these APIs. They are designed to run in real time and require a microphone and speaker connected to your computer.

To run these samples, you will need to have the following setup:

Environment variables for OpenAI (websocket or WebRTC): your key and OPENAI_REALTIME_MODEL_ID.
Environment variables for Azure OpenAI (websocket only): your endpoint, optionally a key, and AZURE_OPENAI_REALTIME_DEPLOYMENT_NAME. The API version needs to be at least 2025-08-28.
To run the samples with the simple helper classes that handle incoming and outgoing sound, install the following packages in your environment:
- semantic-kernel[realtime]
- pyaudio
- sounddevice
- pydub
  e.g. pip install pyaudio sounddevice pydub "semantic-kernel[realtime]"
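
Before starting a sample, it can help to confirm that the settings listed above are present. The following is a minimal sketch, not part of the samples. OPENAI_REALTIME_MODEL_ID and AZURE_OPENAI_REALTIME_DEPLOYMENT_NAME come from the list above; OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT are assumed names for "your key" and "your endpoint", and the helper missing_settings is hypothetical.

```python
import os

# Variable names per provider. OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT are
# assumptions; the README only says "your key" and "your endpoint".
REQUIRED = {
    "openai": ["OPENAI_API_KEY", "OPENAI_REALTIME_MODEL_ID"],
    "azure": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_REALTIME_DEPLOYMENT_NAME"],
}


def missing_settings(provider, env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED[provider] if not env.get(name)]


if __name__ == "__main__":
    for provider in REQUIRED:
        missing = missing_settings(provider)
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{provider}: {status}")
```

The Azure key is left out of the required list because the setup notes describe it as optional.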

The samples all run as Python scripts that can be started either directly or through your IDE.

All demos produce similar output: the instructions are printed, and each new response item from the API starts a new Mosscap (transcript): line. With these APIs the transcript arrives before the spoken audio, so if you interrupt the audio, the transcript will not match what you heard.

The realtime APIs work by having the server send events to you while you send events back to the server; this is fully asynchronous. The samples show how you can listen to the events sent by the server. Some of these events are handled by the sample code and others are not. For instance, one could add a clause to the match case in the receive loop that logs the usage that is part of the response.done event.

For more info on the events, go to our documentation, as well as the documentation of OpenAI and Azure.

Simple chat samples

Simple chat with realtime websocket

This sample uses the websocket API with Azure OpenAI to run a simple voice-based interaction. To use this sample with OpenAI instead, replace AzureRealtimeWebsocket with OpenAIRealtimeWebsocket.

Simple chat with realtime WebRTC

This sample uses the WebRTC API with OpenAI to run a simple voice-based interaction. Because of the way the WebRTC protocol works, it needs a different player and recorder than the websocket version.

Function calling samples

The following two samples use function calling with the following functions:

get_weather: Returns the weather for a given city. The result is randomly generated and not based on any real data.
get_time: Returns the current time and date.
goodbye: Ends the conversation.

A line is logged whenever one of these functions is called.
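
As an illustration of the three functions described above, here is a minimal sketch in plain Python. It is not the samples' implementation: the weather conditions, the log messages, the logger name, and the ConversationEnded exception are all assumptions, and the registration of these functions with the kernel is omitted.

```python
import logging
import random
from datetime import datetime

logger = logging.getLogger("realtime_sample")  # hypothetical logger name


class ConversationEnded(Exception):
    """Hypothetical signal that the caller should stop the session."""


def get_weather(location: str) -> str:
    """Return a randomly generated weather report; no real data is used."""
    condition = random.choice(("sunny", "rainy", "cloudy", "snowy"))
    logger.info("get_weather called for %s", location)
    return f"The weather in {location} is {condition}."


def get_time() -> str:
    """Return the current local date and time as text."""
    logger.info("get_time called")
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def goodbye() -> None:
    """End the conversation by raising ConversationEnded."""
    logger.info("goodbye called")
    raise ConversationEnded()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(get_weather("Amsterdam"))
    print(get_time())
```

Raising an exception from goodbye is one way to let the receive loop's caller shut the session down; the samples may end the conversation differently.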

Chat with function calling Websocket

This sample uses the websocket api with Azure OpenAI to run a voice agent, capable of taking actions on your behalf.

Chat with function calling WebRTC

This sample uses the WebRTC api with OpenAI to run a voice agent, capable of taking actions on your behalf.
