Validate Data Quality with Great Expectations

gx-mcp-server: AI-powered Great Expectations data quality validation. Load data, define rules, execute checks via MCP.

Works with Great Expectations, Snowflake, BigQuery, Prometheus, OpenTelemetry

Spark score: 90 out of 100 · Status: Verified · Updated 4 months ago · Version 1.0.0

Why it matters

Integrate Great Expectations data quality validation into your AI agent's workflow. Programmatically load datasets, define validation rules, and execute checks for robust data governance.

Outcomes

What it gets done

01. Load data from CSV, Snowflake, or BigQuery.

02. Define and manage ExpectationSuites for data quality.

03. Execute data validation checks and retrieve detailed results.

04. Monitor the server in production with Prometheus metrics and OpenTelemetry tracing.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/vb-gx-mcp-server | bash

Capabilities

Tools your agent gets

load_csv

Load CSV data from file, URL, or inline content for validation

load_snowflake_table

Load data from Snowflake tables using URI prefix notation

load_bigquery_table

Load data from BigQuery tables using URI prefix notation

define_expectation_suite

Define a new ExpectationSuite with validation rules

modify_expectation_suite

Modify an existing ExpectationSuite's validation rules

validate_data

Execute data quality validation checks synchronously

validate_data_async

Execute data quality validation checks asynchronously

get_validation_results

Retrieve detailed validation results including failed records
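Each of these tools is invoked with a standard MCP tools/call request: a JSON-RPC 2.0 message carrying the tool name and its arguments. The Python sketch below only builds that message. The argument name csv_data is a hypothetical placeholder, because this page does not list the tools' parameter schemas; a client can read the real schemas from the server's tools/list response.

```python
import json
from itertools import count

# JSON-RPC request ids must not be reused by the client within one session.
_request_ids = count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# HYPOTHETICAL argument name: read the real input schema from tools/list.
request = tool_call("load_csv", {"csv_data": "id,age\n1,25\n2,19\n3,45"})
print(json.dumps(request))
```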

Overview

gx-mcp-server MCP Server

What it does

This tool exposes Great Expectations data quality validation over the Model Context Protocol (MCP), enabling AI agents to programmatically load datasets, define validation rules, and execute data quality checks.

How it connects

Use this tool to add automated data quality validation to AI workflows, letting agents drive Great Expectations through MCP tool calls. It fits scenarios that need programmatic control over validation: loading a dataset, defining an ExpectationSuite, running it, and inspecting the results. Do not use it if you need to work with Great Expectations directly rather than over MCP, or if your AI agents do not support MCP.

Source README

Provides tools for Great Expectations data quality validation through MCP, enabling AI agents to load datasets, define validation rules, and programmatically execute data quality checks.

Installation

Docker (Recommended)

docker run -d -p 8000:8000 --name gx-mcp-server davidf9999/gx-mcp-server:latest

From Source Code

git clone https://github.com/davidf9999/gx-mcp-server && cd gx-mcp-server
just install

PyPI

uv pip install gx-mcp-server

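With Snowflake Support

Run from the cloned repository, and quote the extra so shells such as zsh do not treat the brackets as a glob pattern:

uv pip install -e ".[snowflake]"

With BigQuery Support

uv pip install -e ".[bigquery]"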

Configuration

STDIO Mode

{
  "mcpServers": {
    "gx-mcp-server": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "python", "-m", "gx_mcp_server"]
    }
  }
}
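In STDIO mode the client launches the command above and exchanges newline-delimited JSON-RPC messages over the process's stdin and stdout, starting with an initialize handshake. The sketch below shows that framing, assuming the message shapes from the MCP specification; the protocolVersion value and the clientInfo name are assumptions, not values this server is documented to require.

```python
import json


def frame(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line.

    The MCP stdio transport delimits messages with newlines, so a message must
    not contain a raw newline. json.dumps escapes newlines inside strings (for
    example inline CSV) as backslash + "n", so compact output is safe.
    """
    line = json.dumps(message, separators=(",", ":"))
    if "\n" in line:  # defensive check; default json.dumps never emits one
        raise ValueError("message would break newline framing")
    return (line + "\n").encode("utf-8")


# Client handshake: initialize, the initialized notification, then list tools.
handshake = [
    {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # ASSUMED; the server may negotiate another
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},  # hypothetical
        },
    },
    {"jsonrpc": "2.0", "method": "notifications/initialized"},
    {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
]

payload = b"".join(frame(m) for m in handshake)
```

A client would write payload to the server's stdin and read one JSON response per line from its stdout.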

HTTP Mode with Authentication

{
  "mcpServers": {
    "gx-mcp-server": {
      "type": "http",
      "url": "https://your-server.com:8000/mcp/",
      "headers": {
        "Authorization": "Basic dXNlcjpwYXNz"
      }
    }
  }
}
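The Authorization value in this example is the Base64 encoding of user:pass, as HTTP Basic authentication (RFC 7617) requires. A minimal Python sketch that derives the header from the credentials configured in MCP_SERVER_USER and MCP_SERVER_PASSWORD, or sends MCP_AUTH_TOKEN as a Bearer token instead. The helper names are hypothetical, and treating MCP_AUTH_TOKEN as a Bearer token is an assumption based on the Bearer JWT option this page mentions.

```python
import base64
import os
from typing import Mapping


def basic_auth_header(user: str, password: str) -> dict:
    """Build an HTTP Basic Authorization header (RFC 7617)."""
    encoded = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}


def auth_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Prefer a Bearer token if one is set, else Basic auth, else no header.

    ASSUMPTION: custom clients send MCP_AUTH_TOKEN as a Bearer token.
    """
    token = env.get("MCP_AUTH_TOKEN")
    if token:
        return {"Authorization": f"Bearer {token}"}
    user, password = env.get("MCP_SERVER_USER"), env.get("MCP_SERVER_PASSWORD")
    if user and password:
        return basic_auth_header(user, password)
    return {}


print(basic_auth_header("user", "pass"))  # {'Authorization': 'Basic dXNlcjpwYXNz'}
```

Because Base64 is an encoding rather than encryption, Basic credentials should only travel over HTTPS, as in the https:// URL above.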

Features

  • Load CSV data from a file, a URL, or inline content (default limit 50 MB, configurable up to 1 GB)
  • Load tables from Snowflake or BigQuery using URI prefixes
  • Define and modify ExpectationSuites
  • Validate data and retrieve detailed results, synchronously or asynchronously
  • Store datasets and results in memory (default) or in SQLite
  • Optional Basic authentication or Bearer token for HTTP clients
  • Configurable per-minute rate limiting for HTTP requests
  • Restrict allowed browser origins (CORS) via allowed-origins configuration
  • Prometheus metrics on a configurable metrics port
  • OpenTelemetry tracing via an OTLP exporter
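This page does not say how the per-minute request limit is counted. One common approach is a sliding window per client; the sketch below is a hypothetical illustration of that technique, not the server's implementation, and it takes an injectable clock so its behavior can be checked without waiting a minute.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerMinuteRateLimiter:
    """Sliding window: at most `limit` accepted requests per client in any window."""

    def __init__(self, limit: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        # client id -> timestamps of accepted requests, oldest first
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str) -> bool:
        now = self.clock()
        hits = self._hits[client]
        # Forget requests that have left the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # an HTTP server would answer 429 Too Many Requests
        hits.append(now)
        return True


now = [0.0]  # fake clock for the demo
limiter = PerMinuteRateLimiter(limit=2, clock=lambda: now[0])
print([limiter.allow("client-a") for _ in range(3)])  # [True, True, False]
now[0] = 60.0
print(limiter.allow("client-a"))  # True: the earlier requests left the window
```

A sliding window avoids the burst a fixed per-minute counter allows at minute boundaries, at the cost of storing one timestamp per accepted request.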

Environment Variables

Optional

  • MCP_SERVER_USER - Username for basic authentication
  • MCP_SERVER_PASSWORD - Password for basic authentication
  • MCP_CSV_SIZE_LIMIT_MB - CSV file size limit in MB (1-1024, default 50)
  • GX_ANALYTICS_ENABLED - Set to false to stop sending anonymous usage statistics to Great Expectations
  • MCP_SERVER_URL - Server URL for custom clients
  • MCP_AUTH_TOKEN - Authentication token for custom clients
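A hypothetical sketch of how a client or wrapper could honor MCP_CSV_SIZE_LIMIT_MB with the documented range (1-1024) and default (50). This page does not state whether the server rejects or clamps out-of-range values, or whether it counts a megabyte as 10^6 or 2^20 bytes; the sketch assumes rejection and 2^20.

```python
import os
from typing import Mapping

DEFAULT_LIMIT_MB = 50
MIN_LIMIT_MB, MAX_LIMIT_MB = 1, 1024
BYTES_PER_MB = 1024 * 1024  # ASSUMPTION: binary megabytes


def csv_size_limit_bytes(env: Mapping[str, str] = os.environ) -> int:
    """Read MCP_CSV_SIZE_LIMIT_MB, apply the default, and enforce 1-1024."""
    raw = env.get("MCP_CSV_SIZE_LIMIT_MB")
    if raw is None or raw.strip() == "":
        mb = DEFAULT_LIMIT_MB
    else:
        mb = int(raw)  # raises ValueError for non-integer input
        if not MIN_LIMIT_MB <= mb <= MAX_LIMIT_MB:
            raise ValueError(
                f"MCP_CSV_SIZE_LIMIT_MB must be {MIN_LIMIT_MB}-{MAX_LIMIT_MB}, got {mb}"
            )
    return mb * BYTES_PER_MB


def check_csv_size(data: bytes, env: Mapping[str, str] = os.environ) -> None:
    """Raise ValueError if the CSV payload exceeds the configured limit."""
    limit = csv_size_limit_bytes(env)
    if len(data) > limit:
        raise ValueError(f"CSV is {len(data)} bytes; limit is {limit} bytes")
```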

Usage Examples

Example prompt: "Load CSV data id,age\n1,25\n2,19\n3,45 and validate ages 21-65, show failed records"
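This prompt most naturally maps to Great Expectations' expect_column_values_to_be_between expectation on the age column with min_value=21 and max_value=65, whose bounds are inclusive by default. The plain-Python sketch below does not call the server or Great Expectations; it reproduces the check locally, assuming inclusive bounds, so the expected outcome is clear: row 2 (age 19) is the only failed record.

```python
import csv
import io
from typing import Dict, List

# The inline CSV from the example prompt, with real line breaks for "\n".
CSV_DATA = "id,age\n1,25\n2,19\n3,45\n"
MIN_AGE, MAX_AGE = 21, 65  # ASSUMED inclusive, as in "validate ages 21-65"


def failed_records(text: str, column: str, lo: float, hi: float) -> List[Dict[str, str]]:
    """Return rows whose `column` value falls outside the closed range [lo, hi]."""
    rows = csv.DictReader(io.StringIO(text))
    return [row for row in rows if not lo <= float(row[column]) <= hi]


failures = failed_records(CSV_DATA, "age", MIN_AGE, MAX_AGE)
print(failures)  # [{'id': '2', 'age': '19'}]
```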

Notes

Supports both STDIO and HTTP transport modes. HTTP mode adds authentication (Basic or Bearer JWT), rate limiting, CORS control, and health endpoints. The server can also be deployed as a Docker container. Connectors for the Snowflake and BigQuery data warehouses are selected with special URI prefixes. Prometheus metrics and OpenTelemetry tracing support production monitoring.

Trust

How it checks out

Official By maintainer
Downloads 0
