SchemaFlow: Agentic Database Change Impact Analysis, SQL Generation, and Eval Guardrails

This cookbook walks through an end-to-end AI-assisted database change workflow using the OpenAI Agents SDK.

It demonstrates how OpenAI’s tooling ecosystem can be applied to orchestrate complex, data-intensive workflows across modern enterprise infrastructures. While the current implementation focuses on a retail-oriented schema change and impact-analysis use case, the underlying architectural patterns are domain-agnostic and extensible. The same workflow design can be adapted across industries such as manufacturing, pharmaceuticals, healthcare, logistics, finance, and supply chain operations — wherever structured data workflows, operational reasoning, retrieval-augmented analysis, and automated validation are required.

The running example is a retail loyalty-tier change, but the same pattern applies to many database-change requests where teams need traceable impact analysis and reviewable implementation output.

The workflow starts from a natural-language database change request, converts it into structured JSON, optionally grounds impact analysis with PDF-based File Search context, generates a safe rollout plan, drafts SQL across data platform layers, validates the output with deterministic guardrails, saves a reusable artifact, and optionally evaluates the flow with Promptfoo.

The notebook is intentionally self-contained: all core workflow logic, prompts, guardrails, artifact generation, and eval runtime files are created from notebook cells.

Overview

Schema changes are deceptively simple. A request like “add a nullable column and backfill it” can affect landing tables, staging models, dimensional tables, marts, reporting logic, lineage assumptions, validation checks, rollback procedures, and release sequencing.

The examples use retail customer data because the dependencies are easy to see, but the same kinds of handoffs show up in many analytics and platform teams.

This cookbook demonstrates a practical pattern for using agents as a change-analysis and implementation assistant for database engineering work. Instead of asking one model to produce a final SQL script directly, the workflow breaks the task into explicit stages:

Parse the natural-language request into structured JSON.
Analyze impacted objects and operational risks.
Create a rollout plan with prechecks, postchecks, and rollback guidance.
Generate SQL across platform layers.
Run deterministic sanity checks.
Save a machine-readable artifact.
Optionally run Promptfoo evals against the current flow.

The result is not just a generated SQL script. It is an auditable bundle containing the interpreted change request, impact analysis, plan, SQL, validation results, optional RAG evidence summaries, and eval outputs.

Why This Matters

Database change requests often move through several handoffs: product owners describe the need, data engineers interpret it, platform teams assess risk, analytics engineers propagate the field downstream, and reviewers check whether the change is safe. Important context can be lost at each step.

SchemaFlow addresses this by turning a free-form change request into a structured, inspectable workflow.

This matters because database changes can create hidden failure modes:

A column added to ODS may not be propagated into staging, core, or marts.
A nullable field may accidentally be generated as NOT NULL.
Backfill logic may be omitted even though the request asks for historical population.
Index requirements may be missed.
Downstream reporting dependencies may be unknown unless reference documentation is consulted.
Generated SQL may look plausible but fail basic consistency checks.

This cookbook shows a pattern for reducing those risks with staged agent reasoning, typed outputs, optional retrieval context, deterministic guardrails, saved artifacts, and repeatable evals.

Key Benefits

Structured interpretation – Converts natural-language database requests into a normalized change_json contract.
Separation of responsibilities – Uses specialized agents for parse, impact analysis, rollout planning, and SQL generation.
Optional RAG grounding – Lets the impact-analysis agent use File Search over an uploaded PDF, such as an IFD, schema spec, or lineage document.
Typed stage outputs – Uses Pydantic models and Agents SDK output schemas for parse, impact, and plan stages.
Guardrail-first workflow – Adds deterministic checks between stages so obvious failures are caught before downstream steps consume bad state.
Traceability – Emits OpenAI Agents SDK traces and spans for agent runs, guardrails, artifact generation, and eval execution.
Portable artifacts – Saves the final workflow bundle as JSON under artifacts/notebook_runs/.
Eval-ready design – Generates Promptfoo provider, assertion, config, and result files from the live notebook state.
No database side effects – Produces draft SQL and validation output without executing against a live database.

What You'll Build

By the end of this notebook, you will have a working SchemaFlow pipeline that produces:

A parsed database change request:
- title
- domain
- target schema
- target table
- normalized operations
- notes
An impact-analysis report:
- impacted tables, columns, indexes, views, or relationships
- risks
- assumptions
- optional File Search evidence summaries
A rollout plan:
- implementation steps
- prechecks
- postchecks
- rollback actions
A draft SQL script with four required sections:
- LANDING (ODS)
- STAGING (STG)
- CORE (DIM/FACT/VIEW)
- MARTS (SERVING)
A validation result:
- expected table checks
- expected column checks
- required keyword checks such as ALTER TABLE, UPDATE, or CREATE INDEX
A saved JSON artifact:
- change request
- impact analysis
- plan
- SQL
- validation
- optional RAG metadata
A Promptfoo eval harness:
- Python provider
- Python assertion file
- generated Promptfoo config
- parse-only eval case
- full-flow eval case
- timestamped JSON and HTML eval reports

Introduction: Use Case and Solution

This cookbook focuses on a common enterprise data-engineering scenario: a stakeholder requests a database schema change in natural language, and the data team needs to turn that request into an implementation-ready plan.

Here, the retail domain is just a concrete way to make the workflow tangible. The same staged approach can be adapted to other source systems, data products, and review processes.

The default request in this notebook is:

Add LOYALTY_TIER VARCHAR(20) to ODS.ODS_CUSTOMER_PROFILE as nullable.
Backfill from CORE.DIM_CUSTOMER on CUSTOMER_ID where IS_CURRENT=true.
Add a non-unique index on (CUSTOMER_ID, LOYALTY_TIER).

A human data engineer would typically need to answer several questions before writing production SQL:

What table and schema are being changed?
What exact column, type, and nullability were requested?
Is historical backfill required?
Does the request imply an index?
Which downstream layers need the field propagated?
What risks should reviewers look for?
What checks should be run before and after deployment?
What rollback steps are reasonable?
Does the generated SQL include the required elements?

SchemaFlow implements this as a staged agent workflow. Each stage creates a typed intermediate output that the next stage consumes. Deterministic checks then validate the outputs before the notebook saves the final bundle and optionally runs evals.

Workflow Overview

At a high level, SchemaFlow follows this sequence:

The notebook is organized so readers can run the core workflow first and then decide whether they want to run the optional Promptfoo evaluation section.

Architecture - Design Patterns

SchemaFlow uses a staged, contract-driven agent architecture. The goal is to avoid treating the model as a single black-box SQL generator. Instead, each stage has a narrow responsibility and produces an output that can be inspected, validated, traced, and reused.

1. Agent Specialization

Each agent performs one primary task:

Agent	Responsibility	Main Output
Parse Agent	Extract structured fields from the natural-language request	`change_json`
Impact Agent	Identify affected objects, assumptions, and risks	`impact_json`
Plan Agent	Convert the change and impact into rollout steps	`plan_json`
SQL Agent	Draft SQL across data platform layers	`sql_text`

This specialization makes the workflow easier to debug. If SQL is missing a column, you can inspect whether the issue started in parsing, impact analysis, planning, or SQL generation.

2. Typed Output Contracts

The notebook defines Pydantic models for the structured stages:

ChangeRequestModel
ImpactModel
PlanModel

Those models are wrapped with AgentOutputSchema so the Agents SDK knows the expected output shape. The workflow also normalizes outputs after model calls to ensure expected keys exist before downstream stages run.

3. Retrieval-Augmented Impact Analysis

The PDF RAG section is optional. When PDF_PATH is set, the notebook:

Creates an OpenAI vector store.
Uploads the PDF.
Lets OpenAI parse, chunk, embed, and index it.
Gives the Impact Agent a FileSearchTool.
Captures a summary of returned File Search results.

This is useful when the change request needs grounding in an IFD, schema document, lineage file, data contract, or architecture reference.

4. Guardrail Gates Between Stages

The notebook adds deterministic checks after major stages:

Stages 1-2 guardrails validate parse and impact outputs.
Stages 3-4 guardrails validate plan completeness, data type propagation, and nullability handling.
Stage 5 SQL checks validate expected table, column, and SQL keyword presence.
Post-artifact checks verify the saved JSON artifact exists and round-trips.
Pre-Promptfoo checks verify the notebook state is ready for evals.

These checks do not replace human review, but they catch common silent failures early.

5. Artifact-Centered Execution

The final bundle is the main workflow artifact. It captures the state needed to review or debug the run:

bundle = {
  "summary": ...,
  "rag": ...,
  "change_json": ...,
  "impact_json": ...,
  "plan": ...,
  "sql": ...,
  "validation": ...
}

The notebook saves this bundle under artifacts/notebook_runs/.

6. Eval Runtime Generated from Notebook State

Promptfoo runs in a separate process, so it cannot directly read variables from the active notebook kernel. To solve this, Section 10 writes a small reusable Python module and Promptfoo runtime files from the current notebook state.

This ensures that prompt edits, CHANGE_TEXT edits, and model configuration changes are reflected when the eval files are regenerated.

System Design

Component Architecture

Primary Runtime Objects

Object	Created in	Purpose
`CHANGE_TEXT`	Input section	The natural-language database change request
`change_json`	Stage 1	Structured interpretation of the request
`rag_vector_store_id`	Optional PDF RAG section	Hosted vector store ID for uploaded PDF context
`rag_file_search_results`	Stage 2	Summary of File Search results returned to the Impact Agent
`impact_json`	Stage 2	Impacted objects, risks, and assumptions
`plan_json`	Stage 3	Rollout plan, checks, and rollback guidance
`sql_text`	Stage 4	Draft SQL script
`validation`	Stage 5	Deterministic SQL sanity-check result
`bundle`	Final Bundle section	Consolidated workflow output
`out_path`	Save Artifact section	Saved JSON artifact path
`promptfoo_config`	Promptfoo section	Generated eval configuration

Important Boundary

SchemaFlow generates draft implementation artifacts. It does not execute SQL against a database, apply migrations, open pull requests, or modify production systems.

Execution Workflow

Run the notebook in order.

Core Workflow

Environment Setup
- Imports dependencies.
- Verifies the OpenAI Agents SDK version.
- Reads OPENAI_API_KEY.
- Configures tracing and model selection.
Input
- Defines CHANGE_TEXT.
- This is the only required business input for the core workflow.
Optional PDF RAG Context
- Leave PDF_PATH = None to run without retrieval.
- Set PDF_PATH to a local PDF to enable File Search context for impact analysis.
Stages 1-2
- Parse the change request.
- Analyze impact.
- Optionally use File Search during impact analysis.
Stages 1-2 Guardrails
- Confirm parse output is well-formed.
- Confirm impact output includes the target.
- Confirm impacted objects contain required fields.
Stages 3-4
- Generate an execution plan.
- Generate SQL across landing, staging, core, and mart layers.
Stages 3-4 Guardrails
- Confirm plan sections are populated.
- Confirm data type propagation.
- Confirm nullability behavior matches the request.
Stage 5 SQL Sanity Checks
- Check for empty SQL.
- Check expected target table and columns.
- Check required SQL actions implied by the request.
Final Bundle and Artifact
- Assemble the full output bundle.
- Save it as JSON.
- Verify the artifact round-trips successfully.

Optional Eval Workflow

Pre-Promptfoo Checks
- Confirm the notebook state is ready for evals.
Promptfoo Runtime Generation
- Create a reusable SchemaFlow core module.
- Write a Promptfoo provider.
- Write a Promptfoo assertion file.
- Generate Promptfoo test cases and config.
Promptfoo Eval Execution
- Run parse-only and full-flow evals.
- Save timestamped JSON and HTML reports.
- Refresh schemaflow_cookbook_eval_latest.* aliases.

1) Environment Setup

This section prepares the runtime for the SchemaFlow workflow.

The setup cell does the following:

Imports standard Python utilities used throughout the notebook.
Imports the OpenAI client.
Imports the OpenAI Agents SDK primitives:
- Agent
- Runner
- RunConfig
- AgentOutputSchema
- FileSearchTool
- tracing and span helpers
Verifies that the installed openai-agents package meets the minimum required version.
Reads OPENAI_API_KEY from the environment or prompts for it.
Sets the model with OPENAI_MODEL, defaulting to gpt-5.5.
Creates a trace group ID so all related agent runs and guardrail spans can be grouped together.

The workflow intentionally enables sensitive trace payloads for this demo so prompts, outputs, eval bundles, and tool data are visible in traces. For production usage, review this setting before handling private data.

2) Input

This section defines the database change request that SchemaFlow will process. Think of it as the compact ticket, issue, or message a data team might receive before turning the request into implementation details.

The default request asks the workflow to:

Add LOYALTY_TIER VARCHAR(20) to ODS.ODS_CUSTOMER_PROFILE.
Treat the new column as nullable.
Backfill from CORE.DIM_CUSTOMER.
Join on CUSTOMER_ID.
Filter the source to current records with IS_CURRENT=true.
Add a non-unique index on (CUSTOMER_ID, LOYALTY_TIER).

This input is intentionally compact but rich enough to exercise the full workflow:

parsing target schema and table
extracting column name, type, and nullability
recognizing backfill requirements
recognizing index requirements
generating multi-layer SQL
running validation checks for expected table, column, and SQL actions

3) Optional PDF RAG Context

SchemaFlow can run with or without retrieval context, so readers can start with the request alone and add reference docs only when the change needs them.

The sample PDF path in the code cell below points to a file included in the cookbook folder under data/, not to bytes embedded inside the notebook. Leave PDF_PATH = None for static article previews or generic runs.

With the default PDF_PATH = None, the notebook uses only the natural-language change request. This is enough to demonstrate the core staged workflow.

Set PDF_PATH to a local PDF when you want the Impact Agent to ground its analysis in reference material, such as:

interface design documents
schema specifications
lineage documentation
data contracts
platform architecture notes
downstream dependency documentation

When a PDF is configured, this section:

Validates that the file exists and is a PDF.
Creates an OpenAI vector store with a one-day expiration policy.
Uploads the PDF to the vector store.
Lets OpenAI handle parsing, chunking, embedding, and retrieval.
Stores the vector store ID for the Impact Agent.
Later summarizes any File Search results returned during impact analysis.

This keeps the cookbook lightweight because it does not require local embedding models, Chroma, Neo4j, LangGraph, or project-specific Python modules.

4) Stages 1-2 - Parse Change Request + Impact Analysis

This section runs the first two agent stages back to back. Together, they answer two practical questions: what exactly was requested, and what else could be affected?

Stage 1: Parse Change Request

The Parse Agent converts CHANGE_TEXT into a structured change_json object.

Expected fields include:

title
domain
target_schema
target_table
operations
notes

This stage creates the normalized contract that every downstream stage consumes. If the parse step misses the target table, column, data type, nullability, backfill, or index intent, later stages may produce incomplete output. That is why the notebook validates this stage immediately afterward.

Stage 2: Impact Analysis

The Impact Agent consumes change_json and produces impact_json.

Expected fields include:

impacted_objects
risks
assumptions

If PDF_PATH was configured earlier, the Impact Agent also receives a FileSearchTool connected to the uploaded PDF vector store. This lets the model search reference documentation before returning impact claims.

The output is intentionally conservative. When the agent is uncertain, it should call out assumptions and risks instead of inventing undocumented certainty.

Impact Dashboard Preview

The impact-analysis stage produces structured impact_json that can be visualized as a graph of affected objects and relationships.

The preview below shows the kind of customer loyalty lineage graph built in the optional Neo4j dashboard section later in the notebook. Run that section to generate the local graph UI from the sample knowledge-graph seed and inspect impacted objects interactively.

Impact dashboard graph preview

Stages 1-2 Output Guardrails

This guardrail cell performs deterministic checks on the Parse and Impact outputs before the workflow continues.

The checks verify that:

change_json contains a target schema.
change_json contains a target table.
change_json.operations is a non-empty list.
impact_json.impacted_objects contains at least one object.
The impact output references the parsed target table.
Each impacted object has basic required fields such as type, name, and reason.

These checks are deliberately lightweight. They do not prove that the analysis is complete, but they catch obvious failure modes before the Plan Agent or SQL Agent consumes malformed or incomplete state.

5) Stages 3-4 - Execution Plan + SQL Generation

This section runs the implementation-planning and SQL-generation stages. At this point the workflow shifts from understanding the request to drafting an implementation handoff.

Stage 3: Execution Plan

The Plan Agent consumes:

change_json
impact_json

It returns plan_json with four sections:

plan_steps
prechecks
postchecks
rollback

The goal is to make the implementation strategy explicit before generating SQL. This helps separate “what should be done” from “what exact SQL should be drafted.”

Stage 4: SQL Generation

The SQL Agent consumes:

change_json
plan_json

It returns a single plaintext SQL script. The prompt requires four sections in order:

-- === LANDING (ODS) ===
-- === STAGING (STG) ===
-- === CORE (DIM/FACT/VIEW) ===
-- === MARTS (SERVING) ===

The generated SQL is intended as a reviewable draft. It should be checked by engineers before any production use.

Stages 3-4 Output Guardrails

This guardrail cell validates the plan and SQL draft before the notebook moves to the final SQL sanity checks.

The checks verify that:

all four plan sections are populated:
- plan_steps
- prechecks
- postchecks
- rollback
the data type requested in CHANGE_TEXT appears in the generated SQL
nullable requests do not accidentally create NOT NULL constraints
explicit NOT NULL requests are reflected when present

These checks complement Stage 5. Stages 3-4 guardrails focus on plan completeness and semantic consistency, while Stage 5 focuses on expected SQL terms and actions.

6) Stage 5 - Lightweight SQL Sanity Checks

This section runs deterministic checks against the generated SQL for the current notebook run.

This is not a full SQL parser and it does not execute the SQL. Instead, it checks for obvious mismatches between the original request, parsed change object, and generated script. These checks are intentionally small and explainable, so a reader can see exactly what passed or failed before the result is saved or evaluated.

The checks look for:

empty SQL output
missing target table
missing expected columns
required SQL keywords inferred from the request:
- ALTER TABLE
- UPDATE when the request implies backfill or source-based population
- CREATE INDEX when the request mentions an index

The output is stored in validation, which becomes part of the final bundle and is also used by the Promptfoo full-flow assertion.

7) Final Bundle

This section assembles the main SchemaFlow output object.

The final bundle contains:

summary
rag
change_json
impact_json
plan
sql
validation

This object is the reviewable handoff artifact for the notebook run. It collects the model-generated outputs, deterministic validation results, and optional retrieval metadata in one place, so a reviewer does not have to reconstruct the flow from separate cells.

The printed summary gives a compact view of the most important run-level information:

parsed title
parsed target
number of RAG hits
number of plan steps
validation status
validation issues

8) Save Artifact

This section writes the final bundle to disk as JSON.

Artifacts are saved under:

artifacts/notebook_runs/

Each run receives a timestamped filename, which makes it easy to compare outputs across different prompts, models, inputs, or retrieval documents.

The saved artifact is useful for:

code review
audit trails
debugging
regression comparison
eval fixture creation
downstream automation

Post-Artifact Generation Sanity Check

This cell verifies that the saved artifact is usable.

It checks that:

the artifact file exists
the artifact file is non-empty
the file can be loaded with json.loads
the top-level keys on disk match the in-memory bundle

This catches file-write issues immediately instead of letting a later review, eval, or automation step consume a missing or malformed artifact.

9) Optional Cleanup

This section handles cleanup for the optional PDF vector store.

By default, DELETE_VECTOR_STORE_AFTER_RUN = False.

That default is safe for interactive notebook usage because the vector store is created with a one-day expiration policy. Keeping it temporarily can be useful if you want to inspect traces, rerun downstream stages, or debug File Search behavior.

Set DELETE_VECTOR_STORE_AFTER_RUN = True before running this cell if you want to delete the vector store immediately after the notebook run.

If no PDF was configured, this cell simply reports that no vector store was created.

Pre-Promptfoo Checks / Guardrails

This cell is the readiness gate before running Promptfoo.

Promptfoo runs the workflow in a separate process, so it is important to confirm that the notebook state is complete and internally consistent before generating eval files.

The preflight checks verify that:

bundle exists in the notebook kernel.
bundle reflects the current change_json and plan_json.
Stage 5 validation passed.
Stages 1-2 guardrails passed.
Stages 3-4 guardrails passed.
The saved artifact sanity check passed.
CHANGE_TEXT is consistent with the parsed bundle target.
OPENAI_API_KEY is present.
The installed Agents SDK version meets the minimum requirement.

If this section reports failures, rerun or fix the earlier notebook sections before running Promptfoo.

10) Evaluate the Flow with Promptfoo

Promptfoo is now part of OpenAI. This section uses Promptfoo's Jupyter/Colab pattern to run evals from notebook cells while keeping the SchemaFlow logic readable in Python. Promptfoo itself runs via Node.js, and the evaluated flow is provided through Promptfoo's Python file:// provider and Python assertion integrations.

This optional section turns the notebook workflow into a repeatable eval.

The core notebook run validates one live example. Promptfoo adds a reusable eval harness that can run parse-only and full-flow checks using generated provider and assertion files, which is useful when you want to keep the same workflow stable as prompts, models, or inputs change.

Because Promptfoo launches a separate Python process, it cannot directly access variables that only exist inside the active notebook kernel. To solve that, the next cells publish runtime files from the current notebook state:

a reusable schemaflow_cookbook_core.py module
a Python Promptfoo provider
a Python Promptfoo assertion file
generated eval cases
a generated Promptfoo config

This section includes three validation layers:

Input preflight
- deterministic checks before writing the config
- no model calls
Parse-only eval
- checks Stage 1 behavior
- verifies target, operation presence, expected added column, and expected data type
Full-flow eval
- checks downstream impact, SQL terms, and validation status

Eval results are printed in the notebook and exported as timestamped JSON and HTML files under:

artifacts/promptfoo/results/

The latest successful run also refreshes:

schemaflow_cookbook_eval_latest.json
schemaflow_cookbook_eval_latest.html

Runtime note: the core SchemaFlow cells require Python and an OpenAI API key. The Promptfoo cells additionally require Node.js and npm in the same executable notebook runtime.

After the eval runs, Promptfoo provides a compact view of the current change request, expected fields, parse-only check, and full-flow check.

Use this view to answer questions such as:

Did the Parse Agent extract the expected target table?
Did it detect the expected added column?
Did it preserve the requested data type?
Did the full flow produce impact risks?
Did the SQL include required terms?
Did deterministic validation pass?

Promptfoo eval results view

Promptfoo Runtime Directory Setup

This cell creates notebook-local directories for Promptfoo config, logs, cache, npm cache, and results.

Keeping these directories under artifacts/promptfoo/ makes the eval runtime portable and avoids relying on global Promptfoo state under the user’s home directory.

The cell also exports environment variables so the generated provider, assertion, and Promptfoo command all use the same trace group and local runtime paths.

Node.js and npm Runtime Check

Promptfoo runs through Node.js, even though the SchemaFlow provider and assertion logic are written in Python.

This cell verifies that the notebook runtime has a supported node and npm available.

The check is intentionally explicit. The notebook does not silently install or upgrade Node because that depends on the execution environment.

For local macOS notebooks, the cell prefers a supported nvm Node runtime before common Homebrew paths. This helps ensure that the notebook and terminal use the same Node ABI and avoids stale native dependencies.

If this check fails, fix the runtime first and then rerun the Promptfoo section.

Publish SchemaFlow Core Runtime

Promptfoo runs the evaluated flow in a separate Python process. This cell writes a reusable Python module named:

artifacts/promptfoo/schemaflow_cookbook_core.py

The generated module contains the same core SchemaFlow logic used by the notebook:

Pydantic models
Agents SDK setup
output normalization helpers
Parse Agent execution
Impact Agent execution
optional PDF vector store creation
Plan Agent execution
SQL Agent execution
SQL validation
parse-only eval entrypoint
full-flow eval entrypoint

The prompt strings are injected from the current notebook variables. That means if you edit the Parse, Impact, Plan, or SQL prompts above and rerun this cell, the Promptfoo runtime receives the updated prompts.

Promptfoo Provider Runtime

This cell writes the Promptfoo provider file:

artifacts/promptfoo/schemaflow_cookbook_eval_provider.py

The provider is the bridge between Promptfoo and SchemaFlow.

For each Promptfoo test case, it reads variables such as:

change_text
eval_mode
optional pdf_path
optional rag_max_results
validation keywords

Then it chooses one of two execution paths:

parse_only runs only Stage 1 and returns a parse bundle.
full_flow runs the complete SchemaFlow pipeline and returns the full bundle.

The provider returns JSON so Promptfoo assertions can inspect structured fields instead of parsing notebook text output.

Promptfoo Assertion Runtime

This cell writes the Promptfoo assertion file:

artifacts/promptfoo/schemaflow_cookbook_eval_assert.py

The assertion file validates provider output for both eval modes.

For parse_only, it checks:

output is valid JSON
target schema and table match expectations
at least one parsed operation is present
expected added column appears in parsed operations
expected data type appears structurally in parsed operations

For full_flow, it checks:

output is valid JSON
target schema and table match expectations
at least one parsed operation is present
impact risks are present
required SQL terms are present
validation passed

The assertion also emits guardrail spans so eval failures are visible in traces.

Build Promptfoo Test Cases and Config

This cell builds Promptfoo test cases from the current notebook input.

By default, it creates two test cases from the current CHANGE_TEXT, carrying through PDF_PATH when a PDF is configured:

a parse-only test
a full-flow test

The helper functions infer expectations from the change request, including:

expected schema
expected table
expected added column
expected data type
expected SQL terms
expected validation keywords

The cell also includes optional regression fixtures. Set:

RUN_EXTRA_REGRESSION_CASES = True

to add those extra cases to the generated config.

Before writing promptfooconfig.yaml, the cell runs deterministic input preflight checks. This prevents obviously malformed eval inputs from producing confusing Promptfoo failures.

Run Promptfoo Eval

This cell runs Promptfoo non-interactively from the notebook.

The command:

runs from artifacts/promptfoo/
uses the generated promptfooconfig.yaml
uses notebook-local Promptfoo config, cache, logs, and npm cache
runs with concurrency 1 for predictable notebook behavior
keeps CLI output visible in the notebook
writes timestamped JSON and HTML reports
refreshes latest-result aliases after a successful run

The exported result files are saved under:

artifacts/promptfoo/results/

If the Node.js/npm runtime check failed earlier, fix the runtime before running this cell.

Review Latest Promptfoo Results

This cell checks whether the latest Promptfoo result aliases exist and prints their paths and sizes.

Expected files:

artifacts/promptfoo/results/schemaflow_cookbook_eval_latest.json
artifacts/promptfoo/results/schemaflow_cookbook_eval_latest.html

If the latest JSON file exists, the cell also prints available eval metadata such as the eval ID and aggregate stats.

Use this section as a quick confirmation that the eval completed and exported artifacts successfully.

11) Optional Neo4j Knowledge Graph & Dashboard

This optional section is fully self-contained and does not affect the core pipeline above. It uses a small synthetic customer-loyalty graph seed plus inline dashboard code so the cookbook stays portable. Readers can treat it as a visual appendix: the core workflow works without Neo4j, but graph views make lineage and downstream impact easier to inspect.

What this section does, in order:

Step 1 - Seed: define a synthetic customer-loyalty graph with ODS, staging, core, mart, and CRM objects, their columns, lineage, and joins as an inline Python data structure - no external files.
Step 2 - AI Enrichment: use the OpenAI client already loaded in Section 1 to fill in semantic_meaning (a short 2-5 word tag like natural-key, foreign-key, monetary-amount, timestamp) for every column.
Step 3 - Upsert to Neo4j: write the enriched data to a running Neo4j instance via idempotent MERGE Cypher. Nodes are labeled SchemaFlowCookbook so the dashboard ignores stale sample data from older local runs.
Dashboard: write a small FastAPI server + D3.js page next to the notebook, launch it on http://127.0.0.1:8005, and print a clickable link.

Prerequisites for this section:

A running Neo4j instance, e.g. via Docker: docker run -d -p 7687:7687 -p 7474:7474 -e NEO4J_AUTH=neo4j/change-me-please neo4j:5 (Neo4j Desktop or AuraDB free tier also work).
NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD (loaded via env, or entered at the prompt below).
A free local port 8005 for the dashboard (override via NEO4J_DASHBOARD_PORT).
Optional packages neo4j, fastapi, uvicorn - the next cell will install them lazily if missing.

If any prerequisite is missing, every cell below short-circuits with a clear message; nothing throws and the rest of the notebook is unaffected.

11.1) Environment Setup & Optional Dependencies

Mirrors Section 1's OpenAI env loading pattern. Lazy-installs neo4j, fastapi, and uvicorn only when they are not importable, then reads NEO4J_URI, NEO4J_USER, and NEO4J_PASSWORD from the environment (or prompts via getpass if missing). If you press Enter at any prompt without typing, the section is disabled (NEO4J_SECTION_ENABLED = False) and the remaining cells skip cleanly.

11.2) Step 1 - Seed: Define the Knowledge Graph Data

Builds the in-memory data structure for the graph: schemas, tables (with description, primary_key), columns (with type, nullable, is_primary_key, optional description, and a semantic_meaning placeholder to be filled by AI in the next step), foreign keys, views, lineage edges (DERIVED_FROM), and joins.

This inline synthetic retail graph is aligned to the cookbook change request: LOYALTY_TIER is added to ODS.ODS_CUSTOMER_PROFILE, sourced from CORE.DIM_CUSTOMER, and propagated into downstream staging, core, mart, and CRM consumers.

No Neo4j calls here - this cell only prepares Python data structures. Nothing is written until Step 3.

11.3) Step 2 - AI Enrichment

Uses the OpenAI client and MODEL already initialized in Section 1 to generate a short semantic_meaning tag (2-5 words, e.g. natural-key, foreign-key, monetary-amount, timestamp, descriptive-text) for every column whose value is currently None. One LLM call per column, plain-text response. The prompt is defined inline so this cell remains self-contained.

Cost control: capped at MAX_ENRICH_COLS = 30 columns per run (override via env SEED_AI_ENRICH_LIMIT). Set SEED_AI_ENRICH=0 to skip enrichment entirely. Per-column failures are caught and logged; the cell never raises. Skipped entirely if Section 11 was disabled in Step 0.

11.4) Step 3 - Upsert the Enriched Graph to Neo4j

Opens a Neo4j driver using the env vars loaded in Step 0, creates uniqueness constraints, then performs idempotent MERGE upserts for:

Schemas (Schema nodes with :CONTAINS edges to Tables/Views)
Tables with description
Columns with type, nullable, is_primary_key, description, semantic_meaning
Foreign Keys as Column-[:FK_TO]->Column
Views with description
Lineage as (Table|View)-[:DERIVED_FROM]->(Table|View)
Joins as (Table|View)-[:JOINS]->(Table|View)

The Cypher is intentionally written inline so this cell remains self-contained. The MERGE pattern matches the inline seed structures above, and every cookbook node receives a SchemaFlowCookbook label so re-running refreshes this sample graph without exposing older local seed data in the dashboard.

The whole operation is wrapped in try/except - if Neo4j is unreachable, the cell prints a clear diagnostic and returns without raising.

11.5) Launch the Local Dashboard

The next six cells set up and launch a small FastAPI + D3.js dashboard pointed at your Neo4j instance. Files live in _neo4j_dashboard/ next to this notebook so you can read and edit them in place - no base64, no encoding tricks.

Prep cell: mkdir _neo4j_dashboard/ and pick the dashboard port (default 8005, override via NEO4J_DASHBOARD_PORT).
%%writefile _neo4j_dashboard/graph_server.py: FastAPI backend - exposes GET /api/knowledge-graph (read), POST /api/update-column-semantic-meaning (write), POST /api/update-relationship, POST /api/delete-relationship.
%%writefile _neo4j_dashboard/index.html: DOM skeleton with the stats bar, SVG graph container, sidebar, and AI Semantic Insights panel (with Edit / Save buttons).
%%writefile _neo4j_dashboard/style.css: All visual styling.
%%writefile _neo4j_dashboard/script.js: D3 force simulation, click handlers, fetch calls. The Edit button turns Semantic Meaning values into inputs; Save POSTs each one to the backend, which MATCH ... SET c.semantic_meaning = ... against Neo4j.
Launch cell: starts uvicorn graph_server:app via subprocess.Popen on 127.0.0.1:8005, polls GET /api/knowledge-graph for up to 20 s, then prints a clickable Markdown link only when the graph API is ready, plus the server PID.

These cells generate a small local dashboard app next to the notebook, using semantic_meaning as the editable column insight field. Open any generated file in the notebook workspace to tweak colors, add endpoints, etc. - changes take effect the next time you re-run the writefile and launch cells. Re-running the launch cell restarts the prior dashboard process from the same kernel so the browser is not left on stale graph data.

To stop the dashboard later, run kill <pid> (the PID is printed by the launch cell) - or simply restart the Jupyter kernel.

Notes, Assumptions, and Extension Points

What This Cookbook Is Designed For

This cookbook is a compact, self-contained demonstration of an agentic database-change workflow. It is meant to show how to combine:

staged agent orchestration
typed intermediate outputs
optional retrieval context
deterministic guardrails
trace instrumentation
artifact generation
Promptfoo evals

The workflow is intentionally portable and does not depend on external project-specific modules, local vector databases, graph databases, or a live data warehouse connection. The retail example is deliberately concrete, but the same staged pattern can be adapted to other domains where schema changes need analysis, implementation planning, and review.

What This Cookbook Does Not Do

This notebook does not:

execute SQL against a database
inspect a live schema catalog
open pull requests
modify production infrastructure
validate SQL with a database parser
enforce organization-specific migration standards
replace human code review

The generated SQL should be treated as a draft for review.

Tracing and Sensitive Data

The notebook defaults to redacted traces for publication hygiene. Set OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA=true only for non-sensitive demo runs where retaining prompt and output payloads is intentional.

Before adapting this for sensitive enterprise data, review:

OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA
API key handling
PDF upload policy
data retention expectations
access controls around generated artifacts

Optional PDF RAG

The optional RAG path uses OpenAI Vector Stores and the Agents SDK FileSearchTool.

This is useful when impact analysis should be grounded in a reference PDF such as:

an interface design document
an implementation field definition
schema documentation
lineage documentation
downstream dependency notes

The vector store is configured to expire automatically after one day of inactivity. You can also delete it immediately in the cleanup section.

Promptfoo Eval Path

The Promptfoo section generates runtime files under:

artifacts/promptfoo/

The generated core module injects the current notebook prompt strings, so prompt edits and CHANGE_TEXT edits are reflected after rerunning the generation cells.

By default, the eval flow includes:

deterministic input preflight
one parse-only eval for the current request
one full-flow eval for the current request

Set RUN_EXTRA_REGRESSION_CASES = True to include the optional regression fixtures.

Suggested Production Extensions

For a production-grade implementation, consider adding:

schema catalog lookup
database-specific SQL validation
SQL formatting and linting
migration framework integration
pull request creation
data lineage graph integration
approval workflows
policy checks for destructive changes
richer eval suites
golden test fixtures
organization-specific SQL templates
environment-specific deployment plans

Recommended Review Checklist

Before using generated output for real implementation, review:

parsed target schema and table
parsed operations
data type and nullability
backfill logic
index strategy
downstream propagation assumptions
prechecks and postchecks
rollback feasibility
generated SQL dialect compatibility
validation issues
trace output
saved artifact contents
Promptfoo eval results

Contributors

This cookbook serves as a joint collaboration between OpenAI and Altimetrik.

Use this prompt chain