
Kubeflow Spark History MCP Server

An MCP server that enables AI agents to analyze Apache Spark job performance, identify bottlenecks, and provide intelligent insights based on data from Spark History Server.


Installation

PyPI with uvx

uvx --from mcp-apache-spark-history-server spark-mcp
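To use the server from an MCP client such as Claude Desktop (listed as a compatible client in the notes below), the same uvx invocation can be registered in the client's MCP configuration. This is a sketch: the "spark-history" key is an arbitrary label you choose, and the exact config file location depends on the client.

```json
{
  "mcpServers": {
    "spark-history": {
      "command": "uvx",
      "args": ["--from", "mcp-apache-spark-history-server", "spark-mcp"]
    }
  }
}
```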

Installation via pip

python3 -m venv spark-mcp && source spark-mcp/bin/activate
pip install mcp-apache-spark-history-server
python3 -m spark_history_mcp.core.main

From Source

git clone https://github.com/kubeflow/mcp-apache-spark-history-server.git
cd mcp-apache-spark-history-server
brew install go-task
task start-spark-bg
task start-mcp-bg

Helm

helm install spark-history-mcp ./deploy/kubernetes/helm/spark-history-mcp/

Configuration

Server Configuration

servers:
  local:
    default: true
    url: "http://your-spark-history-server:18080"
    auth:
      username: "user"
      password: "pass"
    include_plan_description: false
mcp:
  transports:
    - streamable-http
  port: "18888"
  debug: true
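Before starting the MCP server, it can help to confirm that the Spark History Server named in the config is actually reachable. The `/api/v1` prefix below is Spark's standard monitoring REST API; the `shs_api` helper and the `SHS_URL` variable are illustrative conveniences, not part of this project.

```shell
# Build a Spark History Server REST URL. The /api/v1 prefix is Spark's
# standard monitoring API; the default base URL is the placeholder from
# the config example above.
shs_api() {
  local base="${SHS_URL:-http://your-spark-history-server:18080}"
  echo "${base}/api/v1/${1}"
}

shs_api "applications?limit=1"
# With a running server, fetch it (add -u user:pass if auth is enabled):
#   curl -s "$(shs_api 'applications?limit=1')"
```

If the curl call returns a JSON list (possibly empty), the MCP server should be able to reach the same endpoint with the configured credentials.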

Multiple Server Configuration

servers:
  production:
    default: true
    url: "http://prod-spark-history:18080"
    auth:
      username: "user"
      password: "pass"
  staging:
    url: "http://staging-spark-history:18080"

Available Tools

  • list_applications: Get a list of all applications available on Spark History Server, with optional filtering by...
  • get_application: Get detailed information about a specific Spark application, including status, resource usage, duration...
  • list_jobs: Get a list of all jobs for a Spark application, with optional filtering by status
  • list_slowest_jobs: Get the N slowest jobs for a Spark application (excludes running jobs by default)
  • list_stages: Get a list of all stages for a Spark application, with optional filtering by status and summaries
  • list_slowest_stages: Get the N slowest stages for a Spark application (excludes running stages by default)
  • get_stage: Get information about a specific stage, with optional attempt ID and summary metrics
  • get_stage_task_summary: Get statistical distributions of task metrics for a specific stage (execution time, memory usage...
  • list_executors: Get information about executors, with optional inclusion of inactive executors
  • get_executor: Get information about a specific executor, including resource allocation, task statistics, and performance...
  • get_executor_summary: Aggregate metrics across all executors (memory usage, disk space, task count, performance metrics)
  • get_resource_usage_timeline: Get a timeline view of resource allocation and usage patterns, including executor addition/removal...
  • get_environment: Get comprehensive Spark runtime environment configuration, including JVM information, Spark properties, system properties...
  • list_slowest_sql_queries: Get the top N slowest SQL queries for an application, with detailed execution metrics and optional...
  • compare_sql_execution_plans: Compare SQL execution plans between two Spark jobs, analyzing logical/physical plans and execution...
Features

  • Query job information through natural language
  • Analyze performance metrics across applications
  • Compare multiple jobs to identify regressions
  • Investigate failures with detailed error analysis
  • Generate insights based on historical execution data
  • Support for multiple Spark History Servers
  • Integration guides for AWS Glue and EMR
  • Kubernetes deployment via Helm
  • HTTP and STDIO transport modes

Environment Variables

Optional

  • SHS_MCP_PORT - Port for MCP server
  • SHS_MCP_DEBUG - Enable debug mode
  • SHS_MCP_ADDRESS - Address for MCP server
  • SHS_MCP_TRANSPORT - MCP transport mode
  • SHS_MCP_CONFIG - Path to configuration file
  • SHS_SERVERS_*_URL - URL for specific server
  • SHS_SERVERS_*_AUTH_USERNAME - Username for specific server
  • SHS_SERVERS_*_AUTH_PASSWORD - Password for specific server
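Putting the variables above together, a minimal sketch of overriding the YAML config from the environment might look as follows. The `PRODUCTION` segment standing in for the `*` wildcard is an assumption based on the server name in the multi-server example; verify the exact casing against the project's documentation.

```shell
# Override config.yaml settings via environment variables before launching
# the MCP server. The PRODUCTION segment replacing the '*' wildcard is an
# assumed mapping from the server name in the config example above.
export SHS_MCP_PORT=18888
export SHS_MCP_TRANSPORT=streamable-http
export SHS_SERVERS_PRODUCTION_URL="http://prod-spark-history:18080"
export SHS_SERVERS_PRODUCTION_AUTH_USERNAME="user"
export SHS_SERVERS_PRODUCTION_AUTH_PASSWORD="pass"

echo "MCP on port ${SHS_MCP_PORT}, backend ${SHS_SERVERS_PRODUCTION_URL}"
# Then start the server, e.g.: python3 -m spark_history_mcp.core.main
```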

Usage Examples

  • "Show all applications between midnight and 1 AM on June 27, 2025"
  • "Why is my job slow?"
  • "Compare today with yesterday"
  • "What's wrong with stage 5?"
  • "Show resource usage over time"

Notes

Requires a running, accessible Spark History Server and Python 3.12+. The package is published on PyPI and includes sample data for testing. Supports HTTP and STDIO transports and is compatible with Claude Desktop, Amazon Q CLI, LangGraph, and other MCP clients.
