Validate Data Quality with Great Expectations
gx-mcp-server: AI-powered Great Expectations data quality validation. Load data, define rules, execute checks via MCP.
1.0.0
Why it matters
Integrate Great Expectations data quality validation into your AI agent's workflow. Programmatically load datasets, define validation rules, and execute checks for robust data governance.
Outcomes
What it gets done
Load data from CSV, Snowflake, or BigQuery.
Define and manage ExpectationSuites for data quality.
Execute data validation checks and retrieve detailed results.
Monitor data quality with Prometheus metrics and OpenTelemetry tracing.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/vb-gx-mcp-server | bash
Capabilities
Tools your agent gets
Load CSV data from file, URL, or inline content for validation
Load data from Snowflake tables using URI prefix notation
Load data from BigQuery tables using URI prefix notation
Define a new ExpectationSuite with validation rules
Modify an existing ExpectationSuite's validation rules
Execute data quality validation checks synchronously
Execute data quality validation checks asynchronously
Retrieve detailed validation results including failed records
Overview
gx-mcp-server MCP Server
What it does
This tool provides Great Expectations data quality validation capabilities through the MCP protocol, enabling AI agents to programmatically load datasets, define validation rules, and execute data quality checks.
How it connects
Use this tool when you need to integrate automated data quality validation into AI workflows, allowing AI agents to interact with Great Expectations. It's ideal for scenarios requiring programmatic control over data validation processes. Do not use this tool if you require direct, non-MCP-based interaction with Great Expectations or if your AI agents do not support the MCP protocol.
Source README
Provides tools for Great Expectations data quality validation through MCP, enabling AI agents to load datasets, define validation rules, and programmatically execute data quality checks.
Installation
Docker (Recommended)
docker run -d -p 8000:8000 --name gx-mcp-server davidf9999/gx-mcp-server:latest
From Source Code
git clone https://github.com/davidf9999/gx-mcp-server && cd gx-mcp-server
just install
PyPI
uv pip install gx-mcp-server
With Snowflake Support
uv pip install -e ".[snowflake]"
With BigQuery Support
uv pip install -e ".[bigquery]"
Configuration
STDIO Mode
{
"mcpServers": {
"gx-mcp-server": {
"type": "stdio",
"command": "uv",
"args": ["run", "python", "-m", "gx_mcp_server"]
}
}
}
HTTP Mode with Authentication
{
"mcpServers": {
"gx-mcp-server": {
"type": "http",
"url": "https://your-server.com:8000/mcp/",
"headers": {
"Authorization": "Basic dXNlcjpwYXNz"
}
}
}
}
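The `Authorization` value in the example above is the Base64 encoding of `user:pass`, which is how HTTP Basic authentication represents credentials. As a minimal sketch, assuming you want to generate that header value for your own credentials, something like the following would work. The helper name `basic_auth_header` is hypothetical and is not part of gx-mcp-server.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value for an HTTP Basic `Authorization` header.

    Hypothetical helper for illustration; not part of gx-mcp-server.
    """
    # Basic auth joins the credentials with a colon and Base64-encodes them.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

Base64 is an encoding, not encryption, so the header should only be sent over HTTPS, as in the example URL.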
Features
- Load CSV data from file, URL, or inline (default limit 50 MB, configurable up to 1 GB)
- Load tables from Snowflake or BigQuery using URI prefixes
- Define and modify ExpectationSuites
- Validate data and retrieve detailed results (synchronously or asynchronously)
- Choose in-memory storage (default) or SQLite for datasets and results
- Optional Basic authentication or Bearer token for HTTP clients
- Configure HTTP request rate limiting per minute
- Restrict sources via allowed-origins configuration
- Prometheus metrics on configurable metrics port
- OpenTelemetry tracing via OTLP exporter
Environment Variables
Optional
- MCP_SERVER_USER: Username for Basic authentication
- MCP_SERVER_PASSWORD: Password for Basic authentication
- MCP_CSV_SIZE_LIMIT_MB: CSV file size limit in MB (1-1024, default 50)
- GX_ANALYTICS_ENABLED: Set to false to disable anonymous Great Expectations usage data transmission
- MCP_SERVER_URL: Server URL for custom clients
- MCP_AUTH_TOKEN: Authentication token for custom clients
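To make the documented range for `MCP_CSV_SIZE_LIMIT_MB` concrete, here is a minimal sketch of how a wrapper script might read and check the value before starting the server. It only assumes the documented bounds (1-1024) and default (50). The function name `csv_size_limit_mb` is hypothetical, and this is not the server's own parsing code.

```python
import os
from typing import Mapping, Optional


def csv_size_limit_mb(env: Optional[Mapping[str, str]] = None) -> int:
    """Read MCP_CSV_SIZE_LIMIT_MB, applying the documented default and range.

    Hypothetical helper for illustration; the server's real validation may differ.
    """
    env = os.environ if env is None else env
    # Documented default is 50 MB when the variable is unset.
    value = int(env.get("MCP_CSV_SIZE_LIMIT_MB", "50"))
    # Documented range is 1 to 1024 MB inclusive.
    if not 1 <= value <= 1024:
        raise ValueError(f"MCP_CSV_SIZE_LIMIT_MB must be 1-1024, got {value}")
    return value


print(csv_size_limit_mb({}))  # 50
```

Passing a mapping instead of reading `os.environ` directly keeps the check easy to test.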
Usage Examples
Example prompt: "Load CSV data id,age\n1,25\n2,19\n3,45, validate that ages are between 21 and 65, and show the failed records."
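To show what result to expect from this example, the sketch below reproduces the check in plain Python using only the standard library. It does not call Great Expectations or the MCP server. It mirrors the logic of a range check such as the `expect_column_values_to_be_between` expectation, and the function name `failed_age_records` is hypothetical.

```python
import csv
import io

# The inline CSV from the example prompt.
INLINE_CSV = "id,age\n1,25\n2,19\n3,45"


def failed_age_records(csv_text: str, min_age: int = 21, max_age: int = 65) -> list:
    """Return the rows whose `age` falls outside [min_age, max_age]."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row for row in rows if not min_age <= int(row["age"]) <= max_age]


failed = failed_age_records(INLINE_CSV)
print(failed)  # [{'id': '2', 'age': '19'}]
```

Only the record with age 19 falls outside the 21-65 range, so that is the row an agent should report as failed.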
Notes
Supports both STDIO and HTTP transport modes. HTTP mode includes authentication options (Basic and Bearer JWT), rate limiting, CORS control, and health endpoints. Can run with Docker for easy deployment. Includes connectors for Snowflake and BigQuery data warehouses with special URI prefixes. Provides Prometheus metrics and OpenTelemetry tracing for production monitoring.