Stream Realtime AI Interactions via WebRTC & WebSockets
Python examples for real-time streaming AI interactions using WebSocket and WebRTC protocols, including function calling support.
Why it matters
Enable real-time, interactive AI experiences by streaming data over WebRTC and WebSockets. This asset facilitates dynamic communication between users and AI agents for immediate feedback and action.
Outcomes
What it gets done
Implement real-time chat applications.
Integrate function calling for dynamic AI responses.
Stream audio/video for interactive AI sessions.
Connect AI agents via WebSocket for continuous interaction.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/sk-concept-realtime | bash
Capabilities
What this skill does
Handles multi-turn conversations within a defined domain.
Converts audio or video speech to written text.
Searches the web and retrieves relevant sources.
Overview
Semantic Kernel - Realtime
What it does
Five Python example files demonstrating real-time streaming and WebSocket-based AI interactions
How it connects
When you need working examples of real-time AI chat and agent interactions using WebSocket or WebRTC protocols
Source README
Realtime Multi-modal API Samples
These samples are more complex than most because of the nature of these APIs. They are designed to run in real time and require a microphone and speaker connected to your computer.
To run these samples, you will need to have the following setup:
- Environment variables for OpenAI (websocket or WebRTC), with your key and OPENAI_REALTIME_MODEL_ID set.
- Environment variables for Azure (websocket only), set with your endpoint, optionally a key, and AZURE_OPENAI_REALTIME_DEPLOYMENT_NAME set. The API version needs to be at least 2025-08-28.
- To run the sample with a simple version of a class that handles the incoming and outgoing sound, you need to install the following packages in your environment:
- semantic-kernel[realtime]
- pyaudio
- sounddevice
- pydub
e.g. pip install pyaudio sounddevice pydub semantic-kernel[realtime]
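Before starting a sample, it can help to confirm that the environment variables are present. The following is a minimal sketch using only the standard library. OPENAI_REALTIME_MODEL_ID and AZURE_OPENAI_REALTIME_DEPLOYMENT_NAME come from the list above; OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT are assumed names for the key and endpoint, and missing_vars is a hypothetical helper, not part of any library.

```python
import os

# OPENAI_REALTIME_MODEL_ID and AZURE_OPENAI_REALTIME_DEPLOYMENT_NAME are named
# in the README. OPENAI_API_KEY and AZURE_OPENAI_ENDPOINT are assumptions.
OPENAI_VARS = ("OPENAI_API_KEY", "OPENAI_REALTIME_MODEL_ID")
AZURE_VARS = ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_REALTIME_DEPLOYMENT_NAME")


def missing_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for label, names in (("OpenAI", OPENAI_VARS), ("Azure", AZURE_VARS)):
        missing = missing_vars(names)
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{label}: {status}")
```

Only one of the two services needs to be configured, depending on which sample you run.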
The samples all run as Python scripts that can be started either directly or through your IDE.
All demos have similar output: the instructions are printed, and each new response item from the API starts a new Mosscap (transcript): line. Because of the nature of these APIs, the transcript arrives before the spoken audio, so if you interrupt the audio, the transcript will not match what you heard.
The realtime APIs work by the server sending events to you and you sending events back to the server; this is fully asynchronous. The samples show how you can listen to the events sent by the server. Some of these are handled by the code in the samples, others are not. For instance, one could add a clause to the match case in the receive loop that logs the usage that is part of the response.done event.
For more info on the events, go to our documentation, as well as the documentation of OpenAI and Azure.
Simple chat samples
Simple chat with realtime websocket
This sample uses the websocket api with Azure OpenAI to run a simple interaction based on voice. If you want to use this sample with OpenAI, just change AzureRealtimeWebsocket into OpenAIRealtimeWebsocket.
Simple chat with realtime WebRTC
This sample uses the WebRTC api with OpenAI to run a simple interaction based on voice. Because of the way the WebRTC protocol works this needs a different player and recorder than the websocket version.
Function calling samples
The two samples below use function calling with these functions:
- get_weather: This function will return the weather for a given city. The result is randomly generated and not based on any real data.
- get_time: This function will return the current time and date.
- goodbye: This function will end the conversation.
A line is logged whenever one of these functions is called.
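As a rough illustration of the three functions, here is a plain-Python sketch. It is not the samples' code: the bodies, return strings, logger name, and the ConversationEnded exception are all assumptions made for this example, and the step that registers the functions with the AI service is omitted.

```python
import logging
import random
from datetime import datetime

logger = logging.getLogger("realtime_sample")  # hypothetical logger name


class ConversationEnded(Exception):
    """Hypothetical signal used in this sketch to stop the conversation."""


def get_weather(city: str) -> str:
    """Return a randomly generated weather report, not based on real data."""
    weather = random.choice(("sunny", "rainy", "cloudy", "snowy"))
    logger.info("get_weather called for %s", city)
    return f"The weather in {city} is {weather}."


def get_time() -> str:
    """Return the current local date and time."""
    now = datetime.now().isoformat(timespec="seconds")
    logger.info("get_time called")
    return f"The current date and time is {now}."


def goodbye() -> None:
    """End the conversation by raising the sketch's stop signal."""
    logger.info("goodbye called")
    raise ConversationEnded("User ended the conversation.")
```

Each function logs one line when called, mirroring the behavior described above.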
Chat with function calling Websocket
This sample uses the websocket api with Azure OpenAI to run a voice agent, capable of taking actions on your behalf.
Chat with function calling WebRTC
This sample uses the WebRTC api with OpenAI to run a voice agent, capable of taking actions on your behalf.