Data Engineering
A complete stack for building data pipelines: ETL, data warehousing, orchestration, and data quality checks.
Who This Stack Is For
Data engineers and analysts who build and maintain data processing pipelines.
What's Included
MCP Servers
- PostgreSQL — OLTP database; the transactional system of record and primary data source.
- ClickHouse — OLAP database for analytics; fast aggregations over large datasets.
- SQLite — lightweight database for local development and testing.
- Airflow — pipeline orchestration: DAGs, scheduling, monitoring.
Skills
- Airflow DAG Builder — creating DAGs for task orchestration.
- Change Data Capture — capturing incremental changes from source systems.
- BigQuery Partitioning — optimizing table partitioning.
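A minimal illustration of one common change data capture approach: trigger-maintained changelog tables. SQLite from this stack is used as the source; the `orders` schema and changelog layout are hypothetical, chosen only to make the sketch self-contained:

```python
import sqlite3

# In-memory SQLite database standing in for a source system (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
-- Changelog table populated by triggers: one row per captured change.
CREATE TABLE orders_changelog (
    order_id INTEGER,
    op TEXT,                               -- 'INSERT' or 'UPDATE'
    changed_at TEXT DEFAULT (datetime('now'))
);
CREATE TRIGGER orders_ins AFTER INSERT ON orders
BEGIN
    INSERT INTO orders_changelog (order_id, op) VALUES (NEW.id, 'INSERT');
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders
BEGIN
    INSERT INTO orders_changelog (order_id, op) VALUES (NEW.id, 'UPDATE');
END;
""")

conn.execute("INSERT INTO orders (id, status) VALUES (1, 'new')")
conn.execute("UPDATE orders SET status = 'paid' WHERE id = 1")

# A downstream consumer reads the changelog instead of re-scanning the table.
changes = conn.execute(
    "SELECT order_id, op FROM orders_changelog ORDER BY rowid"
).fetchall()
print(changes)  # [(1, 'INSERT'), (1, 'UPDATE')]
```

Production CDC against PostgreSQL would more likely tail the write-ahead log (logical replication) rather than use triggers, but the consumer-side idea is the same: read a stream of changes, not the full table.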
Agents
- Data Engineer — building reliable data pipelines.
- Database Optimizer — optimizing queries and schemas.
- Analytics Reporter — creating analytical reports.
How to Use
- Define your data sources
- Create a DAG for ETL processes
- Set up CDC for incremental loading
- Optimize queries with Database Optimizer
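The incremental-loading step above can be sketched as a high-water-mark query on `updated_at`. In this sketch SQLite stands in for both PostgreSQL (source) and ClickHouse (warehouse), and all table and column names are illustrative:

```python
import sqlite3

# Two in-memory databases stand in for the OLTP source and the warehouse.
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

source.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT);
INSERT INTO orders VALUES
    (1, 10.0, '2024-01-01T00:00:00'),
    (2, 25.5, '2024-01-01T01:00:00'),
    (3, 40.0, '2024-01-01T02:30:00');
""")
warehouse.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
warehouse.execute("CREATE TABLE etl_state (last_updated_at TEXT)")
# Watermark left by the previous run: everything up to 00:30 is already loaded.
warehouse.execute("INSERT INTO etl_state VALUES ('2024-01-01T00:30:00')")

def run_incremental_load():
    """Copy only rows newer than the stored watermark, then advance it."""
    (watermark,) = warehouse.execute(
        "SELECT last_updated_at FROM etl_state"
    ).fetchone()
    rows = source.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    if rows:
        warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        warehouse.execute(
            "UPDATE etl_state SET last_updated_at = ?", (rows[-1][2],)
        )
    return len(rows)

print(run_incremental_load())  # first run picks up rows 2 and 3 → 2
print(run_incremental_load())  # nothing new since the watermark → 0
```

Storing the watermark in the destination keeps extraction idempotent: rerunning a failed hour reloads only what is missing.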
Example Prompt
Create an Airflow DAG for an ETL pipeline:
- Source: PostgreSQL (orders, products, users)
- Destination: ClickHouse (data warehouse)
- Schedule: every hour
- Logic: incremental loading by updated_at
- Alerts: Slack on errors
Data Pipeline Architecture
┌────────────┐ ┌────────────┐ ┌────────────┐
│ PostgreSQL │ │ MySQL │ │ API │
│ (OLTP) │ │ (OLTP) │ │ Sources │
└─────┬──────┘ └─────┬──────┘ └─────┬──────┘
│ │ │
└──────────────────┼──────────────────┘
│
┌──────▼──────┐
│ Airflow │
│ (Extract) │
└──────┬──────┘
│
┌──────▼──────┐
│ Transform │
│ (dbt) │
└──────┬──────┘
│
┌──────▼──────┐
│ ClickHouse │
│ (OLAP) │
└──────┬──────┘
│
┌──────▼──────┐
│ Dashboards │
│ (Metabase) │
└─────────────┘
Results
- Reliable data pipelines
- Real-time analytics
- Optimized queries
- Data quality monitoring
