What video platforms does joinly MCP Server support?

joinly supports Google Meet, Zoom, Microsoft Teams, and any browser-accessible platform. It works by joining meetings through a URL.

What are the main tools available in joinly?

joinly exposes ten tools including join_meeting, leave_meeting, speak_text for text-to-speech, send_chat_message, mute/unmute controls, get_chat_history, get_participants, get_transcript, and get_video_snapshot. It also provides a live transcript resource with real-time streaming.

What are the system requirements to run joinly?

joinly requires Docker (the packaged image is ~2.3GB and includes a browser and STT/TTS models) and an LLM provider API key such as OpenAI, Anthropic, or a local Ollama setup.

Does joinly support GPU acceleration?

Yes, a CUDA-enabled image is available that can be run with --gpus all for GPU-accelerated transcription and TTS, and it defaults to the higher-quality distil-large-v3 Whisper model instead of the CPU image's base model.

MCP Connector

Automate Video Calls with AI Agents

Name: joinly MCP Server
Availability: OnlineOnly
Author: joinly.ai

An MCP server letting AI agents join and actively participate in Zoom/Meet/Teams video calls with live voice, chat, and transcripts.


Works with: Google Meet, Zoom, Microsoft Teams, OpenAI, Ollama


Updated 4 months ago

Version 0.5.3

Models

universal


Why it matters

Enable AI agents to join and actively participate in video calls across Google Meet, Zoom, and Microsoft Teams. Leverage browser automation for live transcripts, voice interaction, and chat capabilities.

Outcomes

What it gets done

Join and leave video conferences automatically

Provide live transcripts and chat history

Enable AI-driven voice and chat interactions

Integrate with various LLM, TTS, and STT providers

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/vb-joinly | bash

Capabilities

Tools your agent gets

join_meeting

Join a meeting with URL, participant name, and optional password

leave_meeting

Leave the current meeting

speak_text

Speak text using TTS (requires text parameter)

send_chat_message

Send a message to chat (requires message parameter)

mute_yourself

Mute your microphone

unmute_yourself

Unmute your microphone

get_chat_history

Get chat history of current meeting in JSON format

get_participants

Get participants of current meeting in JSON format

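get_transcript

Get transcript of current meeting in JSON format, optionally filtered by minutes

get_video_snapshot

Get an image from the current meeting, e.g., a current screenshare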

Overview

joinly MCP Server

This MCP server lets AI agents join and participate in Zoom, Google Meet, or Teams calls with live voice/chat, real-time transcripts, participant and screenshare access, using modular STT/TTS providers and any LLM. Use it to give an AI agent real-time presence in a video meeting. Requires Docker, an LLM provider key, and roughly 2.3GB for the bundled browser/model image.

What it does

joinly.ai is an MCP server and connector middleware that lets AI agents join and actively participate in video meetings on Google Meet, Zoom, Microsoft Teams, or any browser-accessible platform. It exposes ten tools: join_meeting (with a URL, participant name, and optional passcode), leave_meeting, speak_text for text-to-speech output, send_chat_message, mute_yourself and unmute_yourself, get_chat_history, get_participants, get_transcript (optionally filtered by minutes), and get_video_snapshot, which returns an image from the current meeting such as a screenshare. It also provides a subscribable transcript://live resource that streams the live transcript, with timestamps and speaker information, in real time. It supports live voice and chat interaction with built-in conversational-flow logic that handles interruptions and multi-speaker turns, and it works with any LLM provider, including local models via Ollama. Speech providers are modular: Whisper (local, default) or Deepgram for transcription, and Kokoro (local, default), ElevenLabs, or Deepgram for text-to-speech. The whole stack is open-source, self-hosted, and privacy-first.

When to use - and when NOT to

Use it to give an AI agent real-time presence in a video call. Typical uses are answering questions live, taking actions triggered by the meeting conversation (such as creating a GitHub issue or editing a Notion page), or acting as a voice or chat participant. It requires Docker and an LLM provider API key such as OpenAI or Anthropic, or a local Ollama setup. The packaged Docker image is roughly 2.3GB because it bundles a browser and STT/TTS models. It may be unnecessary if joinly's own hosted cloud offering already covers the use case without the overhead of self-hosting.

Capabilities

The server covers four areas:

Meeting control: join_meeting, leave_meeting, mute_yourself, and unmute_yourself.
Speech and chat: speak_text and send_chat_message.
Meeting awareness: get_chat_history, get_participants, get_transcript, and get_video_snapshot.
Live transcript: a live-updating transcript resource.

Additional MCP servers, such as a Tavily web-search server, can be layered onto the client through a standard mcpServers JSON configuration to extend what the agent can do mid-meeting.

How to install

docker pull ghcr.io/joinly-ai/joinly:latest
docker run --env-file .env ghcr.io/joinly-ai/joinly:latest --client <MeetingURL>

A .env file sets the LLM provider and model, for example JOINLY_LLM_PROVIDER=openai and JOINLY_LLM_MODEL=gpt-4o, together with the matching API key such as OPENAI_API_KEY. joinly can also run as a standalone server on port 8000. An external client then connects to that server, for example the joinly-client package started via uvx joinly-client --env-file .env, optionally with additional MCP tool configurations layered in. Configuration is set via CLI flags or environment variables and covers participant name, language, TTS/STT provider and model, host and port, and VNC debugging. A CUDA-enabled image, run with --gpus all, is available for GPU-accelerated transcription and TTS. It defaults to the higher-quality distil-large-v3 Whisper model instead of the CPU image's base model.

Who it's for

Developers building AI agents that need to actively join and participate in live video meetings - answering questions, taking real-time actions, or capturing transcripts and screenshares - without building browser automation and meeting-platform integration from scratch.

Source README

Make your meetings accessible to AI Agents 🤖

joinly.ai is a connector middleware designed to enable AI agents to join and actively participate in video calls. Through its MCP server, joinly.ai provides essential meeting tools and resources that can equip any AI agent with the skills to perform tasks and interact with you in real time during your meetings.

Want to dive right in? Jump to the Quickstart!
Want to know more? Visit our website!

:sparkles: Features

Live Interaction: Lets your agents execute tasks and respond in real-time by voice or chat within your meetings
Conversational flow: Built-in logic that ensures natural conversations by handling interruptions and multi-speaker interactions
Cross-platform: Join Google Meet, Zoom, and Microsoft Teams (or any platform available in the browser)
Bring-your-own-LLM: Works with all LLM providers (also locally with Ollama)
Choose-your-preferred-TTS/STT: Modular design supports multiple services - Whisper/Deepgram for STT and Kokoro/ElevenLabs/Deepgram for TTS (and more to come...)
100% open-source, self-hosted and privacy-first :rocket:

:video_camera: Demos

GitHub

In this demo video, joinly answers the question 'What is Joinly?' by accessing the latest news from the web. It then creates an issue in a GitHub demo repository.

Notion

In this demo video, we connect joinly to our Notion via MCP and let it edit the content of a page live during the meeting.

Any ideas what we should build next? Write us! :rocket:

:zap: Quickstart

Run joinly via Docker with a basic conversational agent client.

Create a new folder named joinly, or clone this repository (cloning is not required for the following steps). In this directory, create a new .env file with a valid API key for the LLM provider you want to use, e.g. OpenAI:

# .env
# for OpenAI LLM
# change key and model to your desired one
JOINLY_LLM_MODEL=gpt-4o
JOINLY_LLM_PROVIDER=openai
OPENAI_API_KEY=your-openai-api-key
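You may want to confirm that the .env file contains the three settings above before starting the container. The following is a minimal Python sketch of such a check, not part of joinly, and the function names are hypothetical. Only OPENAI_API_KEY is confirmed by this README. The ANTHROPIC_API_KEY mapping is an assumption, as is the idea that a local Ollama setup needs no key. The parser also handles only simple KEY=VALUE lines.

```python
from pathlib import Path

# LLM provider -> API-key variable. Only the OpenAI entry comes from the joinly README;
# the Anthropic and Ollama entries are assumptions for illustration.
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",  # assumption
    "ollama": None,  # assumption: a local Ollama setup needs no API key
}


def parse_env(text: str) -> dict:
    """Read simple KEY=VALUE lines, skipping blank lines and # comments.

    Values are kept as written; surrounding quotes are not stripped.
    """
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value
    return env


def missing_settings(env: dict) -> list:
    """Return the quickstart settings that are absent or empty."""
    missing = [k for k in ("JOINLY_LLM_PROVIDER", "JOINLY_LLM_MODEL") if not env.get(k)]
    key_var = REQUIRED_KEYS.get(env.get("JOINLY_LLM_PROVIDER", ""))
    if key_var and not env.get(key_var):
        missing.append(key_var)
    return missing


def check_env_file(path: str = ".env") -> list:
    """Parse the file at `path` and report which settings are missing."""
    return missing_settings(parse_env(Path(path).read_text()))
```

Calling check_env_file() from the folder that holds .env returns an empty list when the three quickstart settings are present.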

Pull the Docker image (~2.3GB since it packages browser and models):

docker pull ghcr.io/joinly-ai/joinly:latest

Launch your meeting in Zoom, Google Meet or Teams and let joinly join the meeting using the meeting link as <MeetingURL>. Then, run the following command from the folder where you created the .env file:

docker run --env-file .env ghcr.io/joinly-ai/joinly:latest --client <MeetingURL>
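If you launch the quickstart from a script, pass the command as an argument list rather than a shell string. That keeps meeting URLs containing characters such as ? or & intact. Here is a small standard-library sketch; the helper names are hypothetical.

```python
import shlex
import subprocess

IMAGE = "ghcr.io/joinly-ai/joinly:latest"


def quickstart_argv(meeting_url: str, env_file: str = ".env") -> list:
    """Argument list equivalent to:
    docker run --env-file .env ghcr.io/joinly-ai/joinly:latest --client <MeetingURL>
    """
    return ["docker", "run", "--env-file", env_file, IMAGE, "--client", meeting_url]


def run_quickstart(meeting_url: str) -> None:
    """Print the command, then run it without a shell so '?' and '&' stay literal."""
    argv = quickstart_argv(meeting_url)
    print("Running:", shlex.join(argv))
    subprocess.run(argv, check=True)  # raises CalledProcessError if docker exits non-zero
```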

:red_circle: Having trouble getting started? Let's figure it out together on our Discord!

:technologist: Run an external client

In the Quickstart, we ran the Docker container directly as a client using --client. We can also run it as a server and connect to it from outside the container, which lets us connect other MCP servers. Here, we run an external client using the joinly-client package and connect it to the joinly MCP server.

Start the joinly server in the first terminal (note that we omit --client here and forward port 8000):

docker run -p 8000:8000 ghcr.io/joinly-ai/joinly:latest
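The client in the second terminal needs the server to be up. A script can poll the forwarded port first. Below is a standard-library sketch with a hypothetical helper name. Docker's port forwarding may accept TCP connections slightly before the MCP server inside the container is listening, so a successful check is only a coarse readiness signal.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.25)  # refused or unreachable: back off briefly and retry
    return False
```

For example, wait_for_port("localhost", 8000) would be called before starting uvx joinly-client.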

While the server is running, start the example client implementation in the second terminal window to connect to it and join a meeting:

uvx joinly-client --env-file .env <MeetingUrl>

Add MCP servers to the client

Add the tools of any MCP server to the agent by providing a JSON configuration. The configuration file can contain multiple entries under "mcpServers" which will all be available as tools in the meeting (see fastmcp client docs for config syntax):

{
    "mcpServers": {
        "localServer": {
            "command": "npx",
            "args": ["-y", "package@0.1.0"]
        },
        "remoteServer": {
            "url": "http://mcp.example.com",
            "auth": "oauth"
        }
    }
}
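Before starting the client, a config file can be sanity-checked for the two entry shapes shown above: a local server with "command" (and an optional "args" list), or a remote server with a "url". The sketch below is a rough pre-flight check with hypothetical function names. It is not fastmcp's own validation, which may accept more fields and shapes.

```python
import json


def validate_mcp_config(config: dict) -> list:
    """Return a list of problems in an {"mcpServers": {...}} config; empty means it looks fine."""
    servers = config.get("mcpServers")
    if not isinstance(servers, dict) or not servers:
        return ['missing or empty "mcpServers" object']
    problems = []
    for name, entry in servers.items():
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry must be a JSON object")
        elif "command" in entry:
            # Local (stdio) server: "args", if present, must be a list of strings.
            if not isinstance(entry.get("args", []), list):
                problems.append(f'{name}: "args" must be a list')
        elif "url" in entry:
            # Remote server: expect an HTTP(S) endpoint.
            if not str(entry["url"]).startswith(("http://", "https://")):
                problems.append(f'{name}: "url" should start with http:// or https://')
        else:
            problems.append(f'{name}: needs either "command" or "url"')
    return problems


def load_and_validate(path: str = "config.json") -> list:
    """Load a JSON config file and validate it."""
    with open(path, encoding="utf-8") as f:
        return validate_mcp_config(json.load(f))
```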

For example, add a Tavily configuration for web search, then run the client with the config file, here named config.json:

uvx joinly-client --env-file .env --mcp-config config.json <MeetingUrl>
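This README does not show the Tavily entry itself. The sketch below makes three assumptions, all of which should be checked against the Tavily and fastmcp documentation before you rely on it:

- the community tavily-mcp npm package is the server to run,
- the server reads a TAVILY_API_KEY environment variable,
- fastmcp's stdio entries accept an "env" object.

The helper names are hypothetical.

```python
import json
import os
from pathlib import Path


def tavily_config() -> dict:
    """Hypothetical mcpServers config with a single Tavily web-search entry."""
    return {
        "mcpServers": {
            "tavily": {
                "command": "npx",
                "args": ["-y", "tavily-mcp@latest"],  # assumption: npm package name
                "env": {"TAVILY_API_KEY": os.environ.get("TAVILY_API_KEY", "")},  # assumption
            }
        }
    }


def write_config(path: str = "config.json") -> None:
    # Writes the API key into the file; keep it out of version control.
    Path(path).write_text(json.dumps(tavily_config(), indent=4), encoding="utf-8")


def client_argv(meeting_url: str, config_path: str = "config.json", env_file: str = ".env") -> list:
    """Argument list equivalent to:
    uvx joinly-client --env-file .env --mcp-config config.json <MeetingUrl>
    """
    return ["uvx", "joinly-client", "--env-file", env_file, "--mcp-config", config_path, meeting_url]
```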

:wrench: Configurations

Configurations can be given via env variables and/or command line args. Here is a list of common configuration options, which can be used when starting the docker container:

docker run --env-file .env -p 8000:8000 ghcr.io/joinly-ai/joinly:latest <MyOptionArgs>

Alternatively, you can pass --name, --lang, and provider settings as command line arguments using joinly-client, which will override settings of the server:

uvx joinly-client <MyOptionArgs> <MeetingUrl>

Basic Settings

By default, the Docker image starts an MCP server. To get started quickly, we also include a client implementation that can be used via --client. Note that in this case no server is started, so no other client can connect to it.

# Start directly as client; default is as server, to which an external client can connect
--client <MeetingUrl>

# Change participant name (default: joinly)
--name "AI Assistant"

# Change language of TTS/STT (default: en)
# Note, availability depends on the TTS/STT provider
--lang de

# Change host & port of the joinly MCP server
--host 0.0.0.0 --port 8000
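When joinly is started from a script, the basic settings above can be assembled into a flag list. This sketch emits only the flags listed in this section. Unset values are left out so joinly's defaults apply (name joinly, language en). The function name is hypothetical.

```python
from typing import Optional


def basic_flags(
    name: Optional[str] = None,
    lang: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    client_url: Optional[str] = None,
) -> list:
    """Translate basic settings into joinly CLI flags; unset values keep joinly's defaults."""
    flags = []
    if client_url:
        flags += ["--client", client_url]  # client mode: no MCP server is started
    if name:
        flags += ["--name", name]
    if lang:
        flags += ["--lang", lang]  # availability depends on the TTS/STT provider
    if host:
        flags += ["--host", host]
    if port is not None:
        flags += ["--port", str(port)]
    return flags
```

Server-mode usage might look like ["docker", "run", "--env-file", ".env", "-p", "8000:8000", "ghcr.io/joinly-ai/joinly:latest", *basic_flags(name="AI Assistant", lang="de")].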

Providers

Text-to-Speech

# Kokoro (local) TTS (default)
--tts kokoro
--tts-arg voice=<VoiceName>  # optionally, set different voice

# ElevenLabs TTS, include ELEVENLABS_API_KEY in .env
--tts elevenlabs
--tts-arg voice_id=<VoiceID>  # optionally, set different voice

# Deepgram TTS, include DEEPGRAM_API_KEY in .env
--tts deepgram
--tts-arg model_name=<ModelName>  # optionally, set different model (voice)

Transcription

# Whisper (local) STT (default)
--stt whisper
--stt-arg model_name=<ModelName>  # optionally, set different model (default: base), for GPU support see below

# Deepgram STT, include DEEPGRAM_API_KEY in .env
--stt deepgram
--stt-arg model_name=<ModelName>  # optionally, set different model
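The provider flags follow a common pattern: --tts or --stt selects the provider, and each --tts-arg or --stt-arg passes one key=value pair. The hosted providers also need an API key in .env. The sketch below encodes the provider/key pairs named in this section. It fails early when a key is missing from a parsed env mapping, for example the output of a simple KEY=VALUE parser. The function name is hypothetical.

```python
from collections.abc import Mapping

# (kind, provider) -> API-key variable named in the README; local providers need none.
PROVIDER_KEYS = {
    ("tts", "kokoro"): None,
    ("tts", "elevenlabs"): "ELEVENLABS_API_KEY",
    ("tts", "deepgram"): "DEEPGRAM_API_KEY",
    ("stt", "whisper"): None,
    ("stt", "deepgram"): "DEEPGRAM_API_KEY",
}


def provider_flags(kind: str, provider: str, env: Mapping, **args: str) -> list:
    """Build --tts/--stt flags plus one --tts-arg/--stt-arg key=value per keyword argument.

    Raises ValueError for an unknown provider or a missing API key.
    """
    if (kind, provider) not in PROVIDER_KEYS:
        raise ValueError(f"unknown {kind} provider: {provider}")
    key_var = PROVIDER_KEYS[(kind, provider)]
    if key_var and not env.get(key_var):
        raise ValueError(f"{provider} {kind.upper()} needs {key_var} in .env")
    flags = [f"--{kind}", provider]
    for key, value in args.items():
        flags += [f"--{kind}-arg", f"{key}={value}"]
    return flags
```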

Debugging

# Start browser with a VNC server for debugging;
# forward the port and connect to it using a VNC client
--vnc-server --vnc-server-port 5900

# Logging
-v  # or -vv, -vvv

# Help
--help
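The VNC option only helps if the VNC port is also forwarded out of the container. The sketch below builds a client-mode command that does both and adds a verbosity flag. It uses only the flags shown above. The function name is hypothetical, and the VNC client you connect with (to localhost:5900) is up to you.

```python
def debug_argv(meeting_url: str, vnc_port: int = 5900, verbosity: int = 1) -> list:
    """docker run in client mode with the VNC server enabled and its port forwarded."""
    if not 1 <= verbosity <= 3:
        raise ValueError("verbosity must be 1 (-v) to 3 (-vvv)")
    return [
        "docker", "run", "--env-file", ".env",
        "-p", f"{vnc_port}:{vnc_port}",  # forward VNC so a client on the host can connect
        "ghcr.io/joinly-ai/joinly:latest",
        "--vnc-server", "--vnc-server-port", str(vnc_port),
        "-" + "v" * verbosity,  # -v, -vv or -vvv
        "--client", meeting_url,
    ]
```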

GPU Support

We provide a Docker image with CUDA GPU support for running the transcription and TTS models on a GPU. To use it, you need to have the NVIDIA Container Toolkit installed and CUDA >= 12.6. Then pull the CUDA-enabled image:

docker pull ghcr.io/joinly-ai/joinly:latest-cuda

Run as client or server with the same commands as above, but use the joinly:{version}-cuda image and set --gpus all:

# Run as server
docker run --gpus all --env-file .env -p 8000:8000 ghcr.io/joinly-ai/joinly:latest-cuda -v
# Run as client
docker run --gpus all --env-file .env ghcr.io/joinly-ai/joinly:latest-cuda -v --client <MeetingURL>

By default, the joinly image uses the Whisper base model for transcription, since it still runs reasonably fast on CPU. The CUDA image automatically defaults to distil-large-v3 for significantly better transcription quality. You can change the model with --stt-arg model_name=<model_name> (e.g., --stt-arg model_name=large-v3). However, each Docker image packages only its own default model, so the weights for any other model are downloaded when the container starts.
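The image/model pairing described above can be captured in a small helper. It restates the README: the CPU image ships with base, and the CUDA image ships with distil-large-v3. It also adds --stt-arg only when a different model is requested, since that model's weights will be downloaded at start. The helper names are hypothetical. Only the latest and latest-cuda tags are used, because no other tag names are listed here.

```python
from typing import Optional


def image_and_default_model(use_cuda: bool) -> tuple:
    """Return (image, packaged Whisper model) as described in the README."""
    if use_cuda:
        return "ghcr.io/joinly-ai/joinly:latest-cuda", "distil-large-v3"
    return "ghcr.io/joinly-ai/joinly:latest", "base"


def run_argv(use_cuda: bool, stt_model: Optional[str] = None) -> list:
    """Server-mode docker command, optionally overriding the Whisper model."""
    image, packaged = image_and_default_model(use_cuda)
    argv = ["docker", "run"]
    if use_cuda:
        argv += ["--gpus", "all"]  # requires the NVIDIA Container Toolkit on the host
    argv += ["--env-file", ".env", "-p", "8000:8000", image, "-v"]
    if stt_model and stt_model != packaged:
        # Non-default models are not in the image; weights download at container start.
        argv += ["--stt-arg", f"model_name={stt_model}"]
    return argv
```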

:test_tube: Create your own agent

You can also write your own agent and connect it to our joinly MCP server. See the code examples for the joinly-client package or the client_example.py if you want a starting point that doesn't depend on our framework.

The joinly MCP server provides the following tools and resources:

Tools

join_meeting - Join meeting with URL, participant name, and optional passcode
leave_meeting - Leave the current meeting
speak_text - Speak text using TTS (requires text parameter)
send_chat_message - Send chat message (requires message parameter)
mute_yourself - Mute microphone
unmute_yourself - Unmute microphone
get_chat_history - Get current meeting chat history in JSON format
get_participants - Get current meeting participants in JSON format
get_transcript - Get current meeting transcript in JSON format, optionally filtered by minutes
get_video_snapshot - Get an image from the current meeting, e.g., view a current screenshare
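As a starting point that does not depend on joinly-client, the tools above could be called from any MCP client. The sketch below uses the fastmcp Python client (Client, list_tools, call_tool) and rests on several assumptions:

- The server from the external-client section is running.
- It is reachable at http://localhost:8000/mcp/; this endpoint path is assumed.
- join_meeting takes meeting_url, participant_name, and passcode. These parameter names are assumptions, so check them against the list_tools() output.

The text and message parameters come from the tool list above. The helper names are hypothetical.

```python
import asyncio
from typing import Optional

# Assumption: the server started with `-p 8000:8000` serves MCP over HTTP at /mcp/.
SERVER_URL = "http://localhost:8000/mcp/"


def join_args(meeting_url: str, name: str = "joinly", passcode: Optional[str] = None) -> dict:
    """Arguments for join_meeting; the parameter names are assumptions."""
    args = {"meeting_url": meeting_url, "participant_name": name}
    if passcode is not None:
        args["passcode"] = passcode
    return args


async def demo(meeting_url: str) -> None:
    # Imported lazily so join_args is usable without fastmcp installed.
    from fastmcp import Client

    async with Client(SERVER_URL) as client:
        tools = await client.list_tools()
        print("available tools:", sorted(tool.name for tool in tools))
        await client.call_tool("join_meeting", join_args(meeting_url, name="AI Assistant"))
        await client.call_tool("speak_text", {"text": "Hello, I will take notes today."})
        await client.call_tool("send_chat_message", {"message": "Notes will be shared after the call."})
        await client.call_tool("leave_meeting", {})


def run(meeting_url: str) -> None:
    asyncio.run(demo(meeting_url))
```

A real agent would read the transcript between these calls and let the LLM decide what to say. The fixed sequence here only shows the call shape.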

Resources

transcript://live - Live meeting transcript in JSON format, including timestamps and speaker information. Subscribable for real-time updates when new utterances are added.
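An agent usually needs only the recent part of the transcript, similar to get_transcript's minutes filter. The README does not document the JSON layout, so the parser below assumes a hypothetical shape: {"segments": [{"text", "start", "end", "speaker"}]}, with start and end in seconds since the meeting began. Adapt the field names to what the server actually returns. With fastmcp, the JSON text would presumably come from reading transcript://live through the client's read_resource method.

```python
import json
from typing import Optional


def recent_lines(transcript_json: str, minutes: Optional[float] = None) -> list:
    """Format segments as '[mm:ss] speaker: text', optionally keeping only the last N minutes.

    Assumes a hypothetical layout: {"segments": [{"text", "start", "end", "speaker"}]}.
    """
    segments = json.loads(transcript_json).get("segments", [])
    if minutes is not None and segments:
        # Measure the window back from the end of the latest segment.
        cutoff = max(s["end"] for s in segments) - minutes * 60
        segments = [s for s in segments if s["end"] >= cutoff]
    lines = []
    for s in segments:
        mins, secs = divmod(int(s["start"]), 60)
        speaker = s.get("speaker") or "Unknown"
        lines.append(f"[{mins:02d}:{secs:02d}] {speaker}: {s['text']}")
    return lines
```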

:building_construction: Developing joinly.ai

For development we recommend using the development container, which installs all necessary dependencies. To get started, install the DevContainer Extension for Visual Studio Code, open the repository and choose Reopen in Container.

The installation can take some time, since it downloads all packages as well as models for Whisper/Kokoro and the Chromium browser. At the end, it automatically invokes the download_assets.py script. If you see errors like Missing kokoro-v1.0.onnx, run this script manually using:

uv run scripts/download_assets.py

We'd love to see what you are using it for or building with it. Showcase your work on our Discord!

:pencil2: Roadmap

Meeting

Meeting chat access
Camera in video call with status updates
Enable screen share during video conferences
Participant metadata and joining/leaving
Improve browser agent capabilities

Conversation

Speaker attribute for transcription
Improve client memory: reduce token usage, allow persistence across meetings events
Improve End-of-Utterance/turn-taking detection
Human approval mechanism from inside the meeting

Integrations

Showcase how to add agents using the A2A protocol
Add more provider integrations (STT, TTS)
Integrate meeting platform SDKs
Add alternative open-source meeting provider
Add support for Speech2Speech models

:busts_in_silhouette: Contributing

Contributions are always welcome! Feel free to open issues for bugs or submit a feature request. We'll do our best to review all contributions promptly and help merge your changes.

Please check our Roadmap and don't hesitate to reach out to us!

:memo: License

This project is licensed under the MIT License ‒ see the LICENSE file for details.

:speech_balloon: Getting help

If you have questions or feedback, or if you would like to chat with the maintainers or other community members, please use the following links:

Made with ❤️ in Osnabrück
