An MCP server that enables AI agents to join and actively participate in video calls on Google Meet, Zoom, and Microsoft Teams through browser automation, providing live transcripts, voice interaction, and chat capabilities.

Installation

Docker

docker pull ghcr.io/joinly-ai/joinly:latest
docker run --env-file .env ghcr.io/joinly-ai/joinly:latest --client <MeetingURL>

External Client

docker run -p 8000:8000 ghcr.io/joinly-ai/joinly:latest
uvx joinly-client --env-file .env <MeetingURL>

GPU Support

docker pull ghcr.io/joinly-ai/joinly:latest-cuda
docker run --gpus all --env-file .env -p 8000:8000 ghcr.io/joinly-ai/joinly:latest-cuda

Configuration

Environment Variables

# .env
# for OpenAI LLM
JOINLY_LLM_MODEL=gpt-4o
JOINLY_LLM_PROVIDER=openai
OPENAI_API_KEY=your-openai-api-key

MCP Server Configuration

{
    "mcpServers": {
        "localServer": {
            "command": "npx",
            "args": ["-y", "package@0.1.0"]
        },
        "remoteServer": {
            "url": "http://mcp.example.com",
            "auth": "oauth"
        }
    }
}

Available Tools

Tool	Description
`join_meeting`	Join a meeting with URL, participant name, and optional password
`leave_meeting`	Leave the current meeting
`speak_text`	Speak text using TTS (requires text parameter)
`send_chat_message`	Send a message to chat (requires message parameter)
`mute_yourself`	Mute your microphone
`unmute_yourself`	Unmute your microphone
`get_chat_history`	Get chat history of current meeting in JSON format
`get_participants`	Get participants of current meeting in JSON format
`get_transcript`	Get transcript of current meeting in JSON format, optionally filtered by minutes
`get_video_snapshot`	Get an image from the current meeting, for example, to view the current screen share
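
MCP clients invoke tools like these with JSON-RPC 2.0 `tools/call` requests. The sketch below only builds request payloads and does not connect to a server. The argument keys (`meeting_url`, `participant_name`, `minutes`) and the example URL are assumptions for illustration; the real parameter names should be taken from the schemas the server returns for `tools/list`.

```python
import itertools
import json
from typing import Optional

# Each in-flight JSON-RPC request needs its own id; a simple counter is enough here.
_ids = itertools.count(1)


def tool_call(name: str, arguments: Optional[dict] = None) -> dict:
    """Build an MCP `tools/call` request for the given tool name."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# NOTE: the argument keys below are hypothetical. The tool list only states that
# join_meeting takes a URL, a participant name, and an optional password, and
# that get_transcript can be filtered by minutes.
join = tool_call(
    "join_meeting",
    {"meeting_url": "https://meet.example.com/abc", "participant_name": "joinly"},
)
transcript = tool_call("get_transcript", {"minutes": 5})
leave = tool_call("leave_meeting")

print(json.dumps(join, indent=2))
```

Keeping payload construction separate from transport makes it easy to inspect or log the requests before sending them over whichever MCP transport the client uses.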

Capabilities

Live Interaction: Enables agents to perform tasks and respond in real-time via voice or chat during meetings
Conversation Flow: Built-in logic that ensures natural conversations, handling interruptions and multi-speaker interaction
Cross-Platform: Join Google Meet, Zoom, and Microsoft Teams (or any platform accessible via browser)
Bring-Your-Own-LLM: Works with the LLM provider of your choice, including local models via Ollama
Choose-Your-Preferred-TTS/STT: Modular design supports multiple services: Whisper or Deepgram for STT, and Kokoro, ElevenLabs, or Deepgram for TTS
100% open-source, self-hosted, and privacy-first

Environment Variables

Required

JOINLY_LLM_MODEL - LLM model to use (e.g., gpt-4o)
JOINLY_LLM_PROVIDER - LLM provider (openai, anthropic, ollama)

Optional

OPENAI_API_KEY - OpenAI API key for GPT models (needed when JOINLY_LLM_PROVIDER is openai)
ELEVENLABS_API_KEY - ElevenLabs API key for TTS
DEEPGRAM_API_KEY - Deepgram API key for STT/TTS
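
Before starting the container, it can help to check that the `.env` file contains the required settings. The following is a minimal sketch, not part of joinly: `parse_env`, `missing_settings`, and the `PROVIDER_KEYS` mapping are hypothetical helpers, and the mapping covers only the OpenAI pairing because that is the only provider-to-key pairing shown in the example `.env`.

```python
REQUIRED = ("JOINLY_LLM_MODEL", "JOINLY_LLM_PROVIDER")

# Assumed mapping from provider to the API key it needs. Only the OpenAI
# pairing is documented in the example .env; extend this as needed.
PROVIDER_KEYS = {"openai": "OPENAI_API_KEY"}


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_settings(env: dict) -> list:
    """Return the names of settings that are absent or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    provider_key = PROVIDER_KEYS.get(env.get("JOINLY_LLM_PROVIDER", ""))
    if provider_key and not env.get(provider_key):
        missing.append(provider_key)
    return missing


sample = """
# .env
JOINLY_LLM_MODEL=gpt-4o
JOINLY_LLM_PROVIDER=openai
"""
print(missing_settings(parse_env(sample)))  # ['OPENAI_API_KEY']
```

The sample deliberately omits the API key, so the check reports it as missing.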

Usage Examples

Join a meeting and answer questions with access to latest news from the internet

Create GitHub issues during meetings based on discussions

Edit Notion page content in real-time during meetings

Retrieve meeting transcripts and participant information

Send chat messages and voice responses during video calls

Notes

The server exposes a live transcript resource (transcript://live) that clients can subscribe to for real-time updates. GPU acceleration with CUDA is supported for improved transcription quality. joinly can run as a standalone client or as an MCP server that external clients connect to.
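
Subscribing to the live transcript follows the standard MCP resource flow: the client sends `resources/subscribe`, the server emits `notifications/resources/updated` when the resource changes, and the client re-reads it with `resources/read`. The sketch below builds and routes these messages as plain dictionaries without any network I/O; the helper names are hypothetical, and the shape of the transcript payload returned by the read is not shown because the source does not specify it.

```python
LIVE_TRANSCRIPT_URI = "transcript://live"


def subscribe_request(request_id: int, uri: str = LIVE_TRANSCRIPT_URI) -> dict:
    """Build a `resources/subscribe` request for the given resource URI."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/subscribe",
        "params": {"uri": uri},
    }


def read_request(request_id: int, uri: str = LIVE_TRANSCRIPT_URI) -> dict:
    """Build a `resources/read` request to fetch the current resource contents."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",
        "params": {"uri": uri},
    }


def on_message(message: dict, next_id: int):
    """Return a read request if the transcript was updated, otherwise None."""
    is_update = message.get("method") == "notifications/resources/updated"
    uri = message.get("params", {}).get("uri")
    if is_update and uri == LIVE_TRANSCRIPT_URI:
        return read_request(next_id)
    return None


# Simulated notification, as a server would send after new speech is transcribed.
update = {
    "jsonrpc": "2.0",
    "method": "notifications/resources/updated",
    "params": {"uri": LIVE_TRANSCRIPT_URI},
}
print(subscribe_request(1))
print(on_message(update, next_id=2))
```

Update notifications carry only the URI, not the new content, which is why the handler answers each one with a fresh read.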
