
Data Engineering

A complete stack for building data pipelines: ETL, data warehouses, orchestration, and data quality.

Who This Stack Is For

For data engineers and analysts building data processing pipelines.

What's Included

MCP Servers

PostgreSQL — OLTP database. Transactions, data source.

ClickHouse — OLAP database for analytics. Fast aggregations on large datasets.

SQLite — lightweight database for local development and testing.

Airflow — pipeline orchestration. DAGs, scheduling, monitoring.
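SQLite runs in-process, so a transformation query can be unit-tested locally before it is pointed at PostgreSQL or ClickHouse. The sketch below is minimal and rests on assumptions: the `orders` table, the `daily_revenue` query, and the fixture rows are all hypothetical. SQL dialects also differ, so a query that passes on SQLite may still need adjustments for ClickHouse.

```python
import sqlite3

# Hypothetical transformation under test: paid revenue per calendar day.
DAILY_REVENUE_SQL = """
    SELECT date(created_at) AS day, SUM(amount) AS revenue
    FROM orders
    WHERE status = 'paid'
    GROUP BY date(created_at)
    ORDER BY day
"""


def daily_revenue(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    """Run the transformation and return (day, revenue) rows."""
    return conn.execute(DAILY_REVENUE_SQL).fetchall()


def make_test_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with a few fixture rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, "
        "amount REAL, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (created_at, amount, status) VALUES (?, ?, ?)",
        [
            ("2024-01-01 09:00:00", 10.0, "paid"),
            ("2024-01-01 17:30:00", 5.5, "paid"),
            ("2024-01-01 18:00:00", 99.0, "cancelled"),  # must be excluded
            ("2024-01-02 08:15:00", 20.0, "paid"),
        ],
    )
    return conn


rows = daily_revenue(make_test_db())
print(rows)  # [('2024-01-01', 15.5), ('2024-01-02', 20.0)]
```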

Skills

Airflow DAG Builder — creating DAGs for task orchestration.

Change Data Capture — capturing changes from sources.

BigQuery Partitioning — optimizing table partitioning.
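The core idea of change data capture is to turn source changes into insert, update, and delete events that can be replayed downstream. Production CDC usually reads the database log (for example, PostgreSQL's WAL through logical decoding). The sketch below swaps that in for a simpler snapshot comparison keyed by primary key, which is easier to show but requires a full scan per run and misses intermediate states between snapshots. The function names and sample rows are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
Change = Tuple[str, int, Optional[Row]]  # (operation, primary key, new row or None)


def capture_changes(old: Dict[int, Row], new: Dict[int, Row]) -> List[Change]:
    """Compare two snapshots keyed by primary key and emit change events."""
    changes: List[Change] = []
    for pk, row in new.items():
        if pk not in old:
            changes.append(("insert", pk, row))
        elif old[pk] != row:
            changes.append(("update", pk, row))
    for pk in old:
        if pk not in new:
            changes.append(("delete", pk, None))
    return changes


def apply_changes(target: Dict[int, Row], changes: List[Change]) -> None:
    """Replay change events against a target table (here a plain dict)."""
    for op, pk, row in changes:
        if op == "delete":
            target.pop(pk, None)
        else:
            target[pk] = row


before = {1: {"name": "Ann", "plan": "free"}, 2: {"name": "Bob", "plan": "pro"}}
after = {1: {"name": "Ann", "plan": "pro"}, 3: {"name": "Cy", "plan": "free"}}

events = capture_changes(before, after)
replica = dict(before)
apply_changes(replica, events)
print(events)
```

Replaying only the events, rather than copying the full table, is what makes CDC suitable for incremental loading.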

Agents

Data Engineer — building reliable data pipelines.

Database Optimizer — optimizing queries and schemas.

Analytics Reporter — creating analytical reports.
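A common query-optimization check is to compare a query's plan before and after adding an index. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` on a hypothetical `orders` table. It is only an illustration: PostgreSQL and ClickHouse each have their own `EXPLAIN` output formats, and the exact plan text varies between SQLite versions.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail lines joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the human-readable detail text.
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL)")

query = "SELECT SUM(amount) FROM orders WHERE user_id = 42"
before = query_plan(conn, query)  # full table scan, e.g. "SCAN orders"

conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
after = query_plan(conn, query)  # index lookup, e.g. "SEARCH orders USING INDEX ..."

print("before:", before)
print("after: ", after)
```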

How to Use

  1. Define your data sources
  2. Create a DAG for ETL processes
  3. Set up CDC for incremental loading
  4. Optimize queries with Database Optimizer
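In step 2, ETL work is organized as a DAG, where each task runs only after all of its upstream tasks have finished. The sketch below shows that dependency resolution with Python's standard `graphlib` module (3.9+). The task names are hypothetical. Airflow declares the same edges with its own operators and `>>` syntax, not with this code.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL tasks mapped to the tasks they depend on (upstream tasks).
dag = {
    "extract_orders": set(),
    "extract_users": set(),
    "transform_orders": {"extract_orders", "extract_users"},
    "load_clickhouse": {"transform_orders"},
    "notify_done": {"load_clickhouse"},
}

sorter = TopologicalSorter(dag)
sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose upstreams have all finished
    waves.append(ready)  # a scheduler could run each wave's tasks in parallel
    sorter.done(*ready)

for i, wave in enumerate(waves, start=1):
    print(f"wave {i}: {', '.join(wave)}")
```

Both extract tasks land in the first wave because neither depends on the other. The transform waits for both.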

Example Prompt

Create an Airflow DAG for an ETL pipeline:
- Source: PostgreSQL (orders, products, users)
- Destination: ClickHouse (data warehouse)
- Schedule: every hour
- Logic: incremental loading by updated_at
- Alerts: Slack on errors
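The prompt's "incremental loading by updated_at" is usually implemented with a watermark: remember the largest `updated_at` already loaded and fetch only newer rows on the next run. The sketch below makes several assumptions so that it runs anywhere. Two SQLite databases stand in for PostgreSQL and ClickHouse, the schema is hypothetical, and `alert` is a hypothetical hook marking where a Slack notification would be sent.

```python
import sqlite3
from typing import Callable, Optional


def incremental_load(
    source: sqlite3.Connection,
    target: sqlite3.Connection,
    watermark: Optional[str],
    alert: Callable[[str], None] = print,
) -> Optional[str]:
    """Copy rows changed since `watermark` and return the new watermark."""
    try:
        rows = source.execute(
            "SELECT id, amount, updated_at FROM orders "
            "WHERE updated_at > ? ORDER BY updated_at",
            (watermark or "",),
        ).fetchall()
        # Upsert by primary key so re-running a window does not duplicate rows.
        target.executemany(
            "INSERT OR REPLACE INTO orders (id, amount, updated_at) VALUES (?, ?, ?)",
            rows,
        )
        target.commit()
        return rows[-1][2] if rows else watermark
    except sqlite3.Error as exc:
        alert(f"incremental_load failed: {exc}")  # hypothetical Slack hook
        raise


src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01T10:00:00"), (2, 25.0, "2024-01-01T10:30:00")],
)

wm = incremental_load(src, dst, None)  # first run copies both rows
src.execute("UPDATE orders SET amount = 30.0, updated_at = '2024-01-01T11:05:00' WHERE id = 2")
wm = incremental_load(src, dst, wm)  # next run copies only order 2
loaded = dst.execute("SELECT * FROM orders ORDER BY id").fetchall()
print(wm, loaded)
```

A strict `>` comparison can miss rows that commit late with an older `updated_at`. A common mitigation is to re-read a small overlap window and rely on idempotent upserts. ClickHouse has no `INSERT OR REPLACE`, so on the real destination the upsert would typically come from an engine such as ReplacingMergeTree, which collapses rows with the same sorting key during background merges.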

Data Pipeline Architecture

┌────────────┐     ┌────────────┐     ┌────────────┐
│ PostgreSQL │     │   MySQL    │     │    API     │
│   (OLTP)   │     │   (OLTP)   │     │  Sources   │
└─────┬──────┘     └─────┬──────┘     └─────┬──────┘
      │                  │                  │
      └──────────────────┼──────────────────┘
                         │
                  ┌──────▼──────┐
                  │   Airflow   │
                  │  (Extract)  │
                  └──────┬──────┘
                         │
                  ┌──────▼──────┐
                  │  Transform  │
                  │   (dbt)     │
                  └──────┬──────┘
                         │
                  ┌──────▼──────┐
                  │ ClickHouse  │
                  │   (OLAP)    │
                  └──────┬──────┘
                         │
                  ┌──────▼──────┐
                  │  Dashboards │
                  │  (Metabase) │
                  └─────────────┘
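The flow in the diagram can be read as three composable steps: normalize records from several sources into one schema, transform them into an analytical mart, and load the mart into the OLAP store that dashboards query. The sketch below follows those steps under stated assumptions. The source records and field names are hypothetical, and SQLite stands in for ClickHouse. For brevity the transform runs in plain Python, whereas dbt would express it as a SQL `SELECT` model executed inside the warehouse.

```python
import sqlite3
from collections import defaultdict

# Hypothetical extracts; in the real pipeline these come from PostgreSQL and MySQL.
postgres_orders = [
    {"order_id": 1, "total": 12.5, "created": "2024-03-01"},
    {"order_id": 2, "total": 7.5, "created": "2024-03-02"},
]
mysql_orders = [
    {"id": 100, "amount_cents": 500, "order_date": "2024-03-01"},
]


def normalize(pg_rows, my_rows):
    """Map each source's fields onto one schema: (source, order_id, amount, day)."""
    rows = [("postgres", r["order_id"], r["total"], r["created"]) for r in pg_rows]
    rows += [("mysql", r["id"], r["amount_cents"] / 100, r["order_date"]) for r in my_rows]
    return rows


def transform(rows):
    """Transform step: total revenue per day across all sources."""
    revenue = defaultdict(float)
    for _source, _order_id, amount, day in rows:
        revenue[day] += amount
    return sorted(revenue.items())


def load(warehouse, daily):
    """Load step: upsert the daily mart into the OLAP store."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS daily_revenue (day TEXT PRIMARY KEY, revenue REAL)"
    )
    warehouse.executemany("INSERT OR REPLACE INTO daily_revenue VALUES (?, ?)", daily)
    warehouse.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(normalize(postgres_orders, mysql_orders)))
# A dashboard tool such as Metabase would issue a query like this one.
report = warehouse.execute("SELECT day, revenue FROM daily_revenue ORDER BY day").fetchall()
print(report)
```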

Results

  • Reliable data pipelines
  • Near-real-time analytics via incremental and CDC-based loading
  • Optimized queries
  • Data quality monitoring
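Data quality monitoring typically combines checks such as not-null, uniqueness, and freshness, which are similar in spirit to dbt's built-in `not_null` and `unique` tests. The sketch below implements those three checks over in-memory rows. The check functions, column names, sample data, and the two-hour freshness threshold are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def check_not_null(rows, column):
    """Return ids of rows where `column` is missing or None."""
    return [r["id"] for r in rows if r.get(column) is None]


def check_unique(rows, column):
    """Return the values of `column` that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        value = r[column]
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


def check_freshness(rows, column, max_age, now):
    """True if the newest `column` timestamp is no older than `max_age`."""
    newest = max(r[column] for r in rows)
    return now - newest <= max_age


now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
orders = [
    {"id": 1, "user_id": 10, "updated_at": now - timedelta(minutes=30)},
    {"id": 2, "user_id": None, "updated_at": now - timedelta(hours=3)},
    {"id": 2, "user_id": 11, "updated_at": now - timedelta(hours=5)},
]

report = {
    "null_user_id": check_not_null(orders, "user_id"),
    "duplicate_id": check_unique(orders, "id"),
    "fresh_within_2h": check_freshness(orders, "updated_at", timedelta(hours=2), now),
}
print(report)
```

In a pipeline, a non-empty failure list or a `False` freshness result would be the trigger for the same alerting path used for load errors.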
