joinly MCP Server
An MCP server that enables AI agents to join and actively participate in video calls on Google Meet, Zoom, and Microsoft Teams through browser automation, providing live transcripts, voice interaction, and chat capabilities.
Installation
Docker
docker pull ghcr.io/joinly-ai/joinly:latest
docker run --env-file .env ghcr.io/joinly-ai/joinly:latest --client <MeetingURL>
External Client
docker run -p 8000:8000 ghcr.io/joinly-ai/joinly:latest
uvx joinly-client --env-file .env <MeetingURL>
GPU Support
docker pull ghcr.io/joinly-ai/joinly:latest-cuda
docker run --gpus all --env-file .env -p 8000:8000 ghcr.io/joinly-ai/joinly:latest-cuda
Configuration
Environment Variables
# .env
# for OpenAI LLM
JOINLY_LLM_MODEL=gpt-4o
JOINLY_LLM_PROVIDER=openai
OPENAI_API_KEY=your-openai-api-key
MCP Server Configuration
{
  "mcpServers": {
    "localServer": {
      "command": "npx",
      "args": ["-y", "package@0.1.0"]
    },
    "remoteServer": {
      "url": "http://mcp.example.com",
      "auth": "oauth"
    }
  }
}
Available Tools
| Tool | Description |
|---|---|
| join_meeting | Join a meeting with URL, participant name, and optional password |
| leave_meeting | Leave the current meeting |
| speak_text | Speak text using TTS (requires text parameter) |
| send_chat_message | Send a message to chat (requires message parameter) |
| mute_yourself | Mute your microphone |
| unmute_yourself | Unmute your microphone |
| get_chat_history | Get chat history of current meeting in JSON format |
| get_participants | Get participants of current meeting in JSON format |
| get_transcript | Get transcript of current meeting in JSON format, optionally filtered by minutes |
| get_video_snapshot | Get an image from the current meeting, for example, to view the current screen share |
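MCP clients invoke tools like these through a JSON-RPC 2.0 `tools/call` request. The sketch below builds such requests in Python for `speak_text` and `send_chat_message`, using the `text` and `message` parameters named in the table. The helper `make_tool_call` and the request ids are hypothetical and only illustrate the message shape. In practice an MCP client library would normally build and send these messages for you.

```python
import json
from itertools import count

_ids = count(1)  # simple incrementing JSON-RPC request ids (illustrative)

def make_tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# Parameter names `text` and `message` come from the tool table above.
speak = make_tool_call("speak_text", {"text": "Hello, I just joined the call."})
chat = make_tool_call("send_chat_message", {"message": "Notes will follow after the meeting."})

print(json.dumps(speak, indent=2))
```

Tools whose argument names are not listed above (for example `join_meeting`) are left out on purpose. Check the server's tool schema before calling them.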
Capabilities
- Live Interaction: Enables agents to perform tasks and respond in real-time via voice or chat during meetings
- Conversation Flow: Built-in logic that ensures natural conversations, handling interruptions and multi-speaker interaction
- Cross-Platform: Join Google Meet, Zoom, and Microsoft Teams (or any platform accessible via browser)
- Bring-Your-Own-LLM: Works with all LLM providers (also locally with Ollama)
- Choose-Your-Preferred-TTS/STT: Modular design supports multiple services: Whisper or Deepgram for STT, and Kokoro, ElevenLabs, or Deepgram for TTS
- 100% open-source, self-hosted, and privacy-first
Environment Variables
Required
- JOINLY_LLM_MODEL: LLM model to use (e.g., gpt-4o)
- JOINLY_LLM_PROVIDER: LLM provider (openai, anthropic, ollama)
Optional
- OPENAI_API_KEY: OpenAI API key for GPT models
- ELEVENLABS_API_KEY: ElevenLabs API key for TTS
- DEEPGRAM_API_KEY: Deepgram API key for STT/TTS
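Before starting the container it can help to confirm that the `.env` file sets both required variables. The following is a minimal sketch, not part of joinly: `parse_env` and `missing_required` are hypothetical helpers, and the parser only handles plain `KEY=VALUE` lines (no quoting or `export` prefixes).

```python
REQUIRED = ("JOINLY_LLM_MODEL", "JOINLY_LLM_PROVIDER")

def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def missing_required(env: dict) -> list:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name)]

# Sample file contents with the provider line left out on purpose.
sample = """
# for OpenAI LLM
JOINLY_LLM_MODEL=gpt-4o
OPENAI_API_KEY=your-openai-api-key
"""

print(missing_required(parse_env(sample)))  # prints ['JOINLY_LLM_PROVIDER']
```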
Usage Examples
Join a meeting and answer questions with access to latest news from the internet
Create GitHub issues during meetings based on discussions
Edit Notion page content in real-time during meetings
Retrieve meeting transcripts and participant information
Send chat messages and voice responses during video calls
Notes
- Provides a live transcript resource (transcript://live) that you can subscribe to for real-time updates.
- Supports GPU acceleration with CUDA for improved transcription quality.
- Can run as a standalone client or as an MCP server with external clients connecting to it.
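Subscribing to the live transcript resource follows the general MCP resource flow: the client sends `resources/subscribe` with the resource URI, and the server later sends `notifications/resources/updated` when the resource changes. The sketch below shows only the message shapes in Python. The helper names are hypothetical, and it assumes joinly follows the standard MCP subscription flow for `transcript://live`.

```python
import json

TRANSCRIPT_URI = "transcript://live"  # resource URI from the notes above

def subscribe_request(uri: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `resources/subscribe` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/subscribe",
        "params": {"uri": uri},
    }

def is_update_for(message: dict, uri: str) -> bool:
    """True if `message` is a resource-updated notification for `uri`."""
    return (
        message.get("method") == "notifications/resources/updated"
        and message.get("params", {}).get("uri") == uri
    )

request = subscribe_request(TRANSCRIPT_URI)
print(json.dumps(request))

# Example of a notification a server might send after new speech is transcribed.
notification = {
    "jsonrpc": "2.0",
    "method": "notifications/resources/updated",
    "params": {"uri": TRANSCRIPT_URI},
}
print(is_update_for(notification, TRANSCRIPT_URI))  # prints True
```

After such a notification, a client would typically re-read the resource with `resources/read` to fetch the updated transcript.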