Kubeflow Spark History MCP Server
An MCP server that enables AI agents to analyze Apache Spark job performance, identify bottlenecks, and provide intelligent insights based on data from Spark History Server.
Installation
PyPI with uvx
```shell
uvx --from mcp-apache-spark-history-server spark-mcp
```
Installation via pip
```shell
python3 -m venv spark-mcp && source spark-mcp/bin/activate
pip install mcp-apache-spark-history-server
python3 -m spark_history_mcp.core.main
```
From Source
```shell
git clone https://github.com/kubeflow/mcp-apache-spark-history-server.git
cd mcp-apache-spark-history-server
brew install go-task
task start-spark-bg
task start-mcp-bg
```
Helm
```shell
helm install spark-history-mcp ./deploy/kubernetes/helm/spark-history-mcp/
```
Configuration
Server Configuration
```yaml
servers:
  local:
    default: true
    url: "http://your-spark-history-server:18080"
    auth:
      username: "user"
      password: "pass"
    include_plan_description: false
mcp:
  transports:
    - streamable-http
  port: "18888"
  debug: true
```
Multiple Server Configuration
```yaml
servers:
  production:
    default: true
    url: "http://prod-spark-history:18080"
    auth:
      username: "user"
      password: "pass"
  staging:
    url: "http://staging-spark-history:18080"
```
Available Tools
| Tool | Description |
|---|---|
| `list_applications` | Get a list of all applications available on Spark History Server with optional filtering by... |
| `get_application` | Get detailed information about a specific Spark application including status, resource usage, duration... |
| `list_jobs` | Get a list of all jobs for a Spark application with optional filtering by status |
| `list_slowest_jobs` | Get the N slowest jobs for a Spark application (excludes running jobs by default) |
| `list_stages` | Get a list of all stages for a Spark application with optional filtering by status and summaries |
| `list_slowest_stages` | Get the N slowest stages for a Spark application (excludes running stages by default) |
| `get_stage` | Get information about a specific stage with optional attempt ID and summary metrics |
| `get_stage_task_summary` | Get statistical distributions of task metrics for a specific stage (execution time, memory usage... |
| `list_executors` | Get information about executors with optional inclusion of inactive executors |
| `get_executor` | Get information about a specific executor including resource allocation, task statistics, and performance... |
| `get_executor_summary` | Aggregates metrics across all executors (memory usage, disk space, task count, performance metrics) |
| `get_resource_usage_timeline` | Get a timeline view of resource allocation and usage patterns including executor addition/removal... |
| `get_environment` | Get comprehensive Spark runtime environment configuration including JVM information, Spark properties, system properties... |
| `list_slowest_sql_queries` | Get the top N slowest SQL queries for an application with detailed execution metrics and optional... |
| `compare_sql_execution_plans` | Compare SQL execution plans between two Spark jobs, analyzing logical/physical plans and execution... |
Features
- Query job information through natural language
- Analyze performance metrics across applications
- Compare multiple jobs to identify regressions
- Investigate failures with detailed error analysis
- Generate insights based on historical execution data
- Support for multiple Spark History Servers
- Integration guides for AWS Glue and EMR
- Kubernetes deployment via Helm
- HTTP and STDIO transport modes
Environment Variables
Optional
- `SHS_MCP_PORT` - Port for MCP server
- `SHS_MCP_DEBUG` - Enable debug mode
- `SHS_MCP_ADDRESS` - Address for MCP server
- `SHS_MCP_TRANSPORT` - MCP transport mode
- `SHS_MCP_CONFIG` - Path to configuration file
- `SHS_SERVERS_*_URL` - URL for specific server
- `SHS_SERVERS_*_AUTH_USERNAME` - Username for specific server
- `SHS_SERVERS_*_AUTH_PASSWORD` - Password for specific server
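As a sketch, the variables above could replace a config file entirely. The exact substitution for the `*` wildcard is an assumption here (an upper-cased server name from the YAML config); verify against the project documentation before relying on it.

```shell
# Sketch: configure the MCP server purely via environment variables.
# "PRODUCTION" below is a hypothetical server name standing in for the
# '*' wildcard documented above.
export SHS_MCP_PORT=18888
export SHS_MCP_TRANSPORT=streamable-http
export SHS_MCP_DEBUG=true
export SHS_SERVERS_PRODUCTION_URL="http://prod-spark-history:18080"
export SHS_SERVERS_PRODUCTION_AUTH_USERNAME="user"
export SHS_SERVERS_PRODUCTION_AUTH_PASSWORD="pass"
# Then start the server, e.g.:
# python3 -m spark_history_mcp.core.main
```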
Usage Examples
- "Show all applications between midnight and 1 AM on June 27, 2025"
- "Why is my job slow?"
- "Compare today with yesterday"
- "What's wrong with stage 5?"
- "Show resource usage over time"
Notes
Requires a running, accessible Spark History Server and Python 3.12+. The package is published on PyPI and includes sample data for testing. Supports HTTP and STDIO transports, and is compatible with Claude Desktop, Amazon Q CLI, LangGraph, and other MCP clients.
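As one illustration of the MCP-client compatibility noted above, a Claude Desktop entry could reuse the uvx command from the Installation section. This is a minimal sketch; the `spark-history` key is an arbitrary name, and the `mcpServers` shape follows Claude Desktop's standard config format rather than anything specific to this project.

```json
{
  "mcpServers": {
    "spark-history": {
      "command": "uvx",
      "args": ["--from", "mcp-apache-spark-history-server", "spark-mcp"]
    }
  }
}
```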
