Kubeflow Spark History MCP Server
An MCP server that enables AI agents to analyze Apache Spark job performance, identify bottlenecks, and provide intelligent insights based on data from Spark History Server.
Installation
PyPI with uvx
```shell
uvx --from mcp-apache-spark-history-server spark-mcp
```
Installation via pip
```shell
python3 -m venv spark-mcp && source spark-mcp/bin/activate
pip install mcp-apache-spark-history-server
python3 -m spark_history_mcp.core.main
```
From Source
```shell
git clone https://github.com/kubeflow/mcp-apache-spark-history-server.git
cd mcp-apache-spark-history-server
brew install go-task
task start-spark-bg
task start-mcp-bg
```
Helm
```shell
helm install spark-history-mcp ./deploy/kubernetes/helm/spark-history-mcp/
```
Configuration
Server Configuration
```yaml
servers:
  local:
    default: true
    url: "http://your-spark-history-server:18080"
    auth:
      username: "user"
      password: "pass"
    include_plan_description: false
mcp:
  transports:
    - streamable-http
  port: "18888"
  debug: true
```
Multiple Server Configuration
```yaml
servers:
  production:
    default: true
    url: "http://prod-spark-history:18080"
    auth:
      username: "user"
      password: "pass"
  staging:
    url: "http://staging-spark-history:18080"
```
Available Tools
| Tool | Description |
|---|---|
| `list_applications` | Get a list of all applications available on Spark History Server with optional filtering by... |
| `get_application` | Get detailed information about a specific Spark application including status, resource usage, duration... |
| `list_jobs` | Get a list of all jobs for a Spark application with optional filtering by status |
| `list_slowest_jobs` | Get the N slowest jobs for a Spark application (excludes running jobs by default) |
| `list_stages` | Get a list of all stages for a Spark application with optional filtering by status and summaries |
| `list_slowest_stages` | Get the N slowest stages for a Spark application (excludes running stages by default) |
| `get_stage` | Get information about a specific stage with optional attempt ID and summary metrics |
| `get_stage_task_summary` | Get statistical distributions of task metrics for a specific stage (execution time, memory usage... |
| `list_executors` | Get information about executors with optional inclusion of inactive executors |
| `get_executor` | Get information about a specific executor including resource allocation, task statistics, and performance... |
| `get_executor_summary` | Aggregates metrics across all executors (memory usage, disk space, task count, performance metrics) |
| `get_resource_usage_timeline` | Get a timeline view of resource allocation and usage patterns including executor addition/removal... |
| `get_environment` | Get comprehensive Spark runtime environment configuration including JVM information, Spark properties, system properties... |
| `list_slowest_sql_queries` | Get the top N slowest SQL queries for an application with detailed execution metrics and optional... |
| `compare_sql_execution_plans` | Compare SQL execution plans between two Spark jobs, analyzing logical/physical plans and execution... |
Features
- Query job information through natural language
- Analyze performance metrics across applications
- Compare multiple jobs to identify regressions
- Investigate failures with detailed error analysis
- Generate insights based on historical execution data
- Support for multiple Spark History Servers
- Integration guides for AWS Glue and EMR
- Kubernetes deployment via Helm
- HTTP and STDIO transport modes
Environment Variables
Optional
- `SHS_MCP_PORT` - Port for MCP server
- `SHS_MCP_DEBUG` - Enable debug mode
- `SHS_MCP_ADDRESS` - Address for MCP server
- `SHS_MCP_TRANSPORT` - MCP transport mode
- `SHS_MCP_CONFIG` - Path to configuration file
- `SHS_SERVERS_*_URL` - URL for specific server
- `SHS_SERVERS_*_AUTH_USERNAME` - Username for specific server
- `SHS_SERVERS_*_AUTH_PASSWORD` - Password for specific server
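As a sketch, the variables above could replace a config file entirely. The exact substitution for the `*` wildcard is an assumption here (an upper-cased server name from the YAML config); verify against the project documentation before relying on it.

```shell
# Sketch: configure the MCP server purely via environment variables.
# "PRODUCTION" below is a hypothetical server name standing in for the
# '*' wildcard documented above.
export SHS_MCP_PORT=18888
export SHS_MCP_TRANSPORT=streamable-http
export SHS_MCP_DEBUG=true
export SHS_SERVERS_PRODUCTION_URL="http://prod-spark-history:18080"
export SHS_SERVERS_PRODUCTION_AUTH_USERNAME="user"
export SHS_SERVERS_PRODUCTION_AUTH_PASSWORD="pass"
# Then start the server, e.g.:
# python3 -m spark_history_mcp.core.main
```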
Usage Examples
- "Show all applications between midnight and 1 AM on June 27, 2025"
- "Why is my job slow?"
- "Compare today with yesterday"
- "What's wrong with stage 5?"
- "Show resource usage over time"
Notes
Requires a running, accessible Spark History Server and Python 3.12+. The package is published on PyPI and includes sample data for testing. Supports HTTP and STDIO transports, and is compatible with Claude Desktop, Amazon Q CLI, LangGraph, and other MCP clients.
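As one illustration of the MCP-client compatibility noted above, a Claude Desktop entry could reuse the uvx command from the Installation section. This is a minimal sketch; the `spark-history` key is an arbitrary name, and the `mcpServers` shape follows Claude Desktop's standard config format rather than anything specific to this project.

```json
{
  "mcpServers": {
    "spark-history": {
      "command": "uvx",
      "args": ["--from", "mcp-apache-spark-history-server", "spark-mcp"]
    }
  }
}
```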
