SchemaFlow: Agentic Database Change Impact Analysis, SQL Generation, and Eval Guardrails
---
Use this prompt chain
SchemaFlow: Agentic Database Change Impact Analysis, SQL Generation, and Eval Guardrails
This cookbook walks through an end-to-end AI-assisted database change workflow using the OpenAI Agents SDK.
It demonstrates how OpenAI’s tooling ecosystem can be applied to orchestrate complex, data-intensive workflows across modern enterprise infrastructures. While the current implementation focuses on a retail-oriented schema change and impact-analysis use case, the underlying architectural patterns are domain-agnostic and extensible. The same workflow design can be adapted across industries such as manufacturing, pharmaceuticals, healthcare, logistics, finance, and supply chain operations — wherever structured data workflows, operational reasoning, retrieval-augmented analysis, and automated validation are required.
The running example is a retail loyalty-tier change, but the same pattern applies to many database-change requests where teams need traceable impact analysis and reviewable implementation output.
The workflow starts from a natural-language database change request, converts it into structured JSON, optionally grounds impact analysis with PDF-based File Search context, generates a safe rollout plan, drafts SQL across data platform layers, validates the output with deterministic guardrails, saves a reusable artifact, and optionally evaluates the flow with Promptfoo.
The notebook is intentionally self-contained: all core workflow logic, prompts, guardrails, artifact generation, and eval runtime files are created from notebook cells.
Overview
Schema changes are deceptively simple. A request like “add a nullable column and backfill it” can affect landing tables, staging models, dimensional tables, marts, reporting logic, lineage assumptions, validation checks, rollback procedures, and release sequencing.
The examples use retail customer data because the dependencies are easy to see, but the same kinds of handoffs show up in many analytics and platform teams.
This cookbook demonstrates a practical pattern for using agents as a change-analysis and implementation assistant for database engineering work. Instead of asking one model to produce a final SQL script directly, the workflow breaks the task into explicit stages:
- Parse the natural-language request into structured JSON.
- Analyze impacted objects and operational risks.
- Create a rollout plan with prechecks, postchecks, and rollback guidance.
- Generate SQL across platform layers.
- Run deterministic sanity checks.
- Save a machine-readable artifact.
- Optionally run Promptfoo evals against the current flow.
The result is not just a generated SQL script. It is an auditable bundle containing the interpreted change request, impact analysis, plan, SQL, validation results, optional RAG evidence summaries, and eval outputs.
Why This Matters
Database change requests often move through several handoffs: product owners describe the need, data engineers interpret it, platform teams assess risk, analytics engineers propagate the field downstream, and reviewers check whether the change is safe. Important context can be lost at each step.
SchemaFlow addresses this by turning a free-form change request into a structured, inspectable workflow.
This matters because database changes can create hidden failure modes:
- A column added to ODS may not be propagated into staging, core, or marts.
- A nullable field may accidentally be generated as
NOT NULL. - Backfill logic may be omitted even though the request asks for historical population.
- Index requirements may be missed.
- Downstream reporting dependencies may be unknown unless reference documentation is consulted.
- Generated SQL may look plausible but fail basic consistency checks.
This cookbook shows a pattern for reducing those risks with staged agent reasoning, typed outputs, optional retrieval context, deterministic guardrails, saved artifacts, and repeatable evals.
Key Benefits
- Structured interpretation – Converts natural-language database requests into a normalized
change_jsoncontract. - Separation of responsibilities – Uses specialized agents for parse, impact analysis, rollout planning, and SQL generation.
- Optional RAG grounding – Lets the impact-analysis agent use File Search over an uploaded PDF, such as an IFD, schema spec, or lineage document.
- Typed stage outputs – Uses Pydantic models and Agents SDK output schemas for parse, impact, and plan stages.
- Guardrail-first workflow – Adds deterministic checks between stages so obvious failures are caught before downstream steps consume bad state.
- Traceability – Emits OpenAI Agents SDK traces and spans for agent runs, guardrails, artifact generation, and eval execution.
- Portable artifacts – Saves the final workflow bundle as JSON under
artifacts/notebook_runs/. - Eval-ready design – Generates Promptfoo provider, assertion, config, and result files from the live notebook state.
- No database side effects – Produces draft SQL and validation output without executing against a live database.
What You'll Build
By the end of this notebook, you will have a working SchemaFlow pipeline that produces:
A parsed database change request:
- title
- domain
- target schema
- target table
- normalized operations
- notes
An impact-analysis report:
- impacted tables, columns, indexes, views, or relationships
- risks
- assumptions
- optional File Search evidence summaries
A rollout plan:
- implementation steps
- prechecks
- postchecks
- rollback actions
A draft SQL script with four required sections:
LANDING (ODS)STAGING (STG)CORE (DIM/FACT/VIEW)MARTS (SERVING)
A validation result:
- expected table checks
- expected column checks
- required keyword checks such as
ALTER TABLE,UPDATE, orCREATE INDEX
A saved JSON artifact:
- change request
- impact analysis
- plan
- SQL
- validation
- optional RAG metadata
A Promptfoo eval harness:
- Python provider
- Python assertion file
- generated Promptfoo config
- parse-only eval case
- full-flow eval case
- timestamped JSON and HTML eval reports
Introduction: Use Case and Solution
This cookbook focuses on a common enterprise data-engineering scenario: a stakeholder requests a database schema change in natural language, and the data team needs to turn that request into an implementation-ready plan.
Here, the retail domain is just a concrete way to make the workflow tangible. The same staged approach can be adapted to other source systems, data products, and review processes.
The default request in this notebook is:
Add LOYALTY_TIER VARCHAR(20) to ODS.ODS_CUSTOMER_PROFILE as nullable.
Backfill from CORE.DIM_CUSTOMER on CUSTOMER_ID where IS_CURRENT=true.
Add a non-unique index on (CUSTOMER_ID, LOYALTY_TIER).
A human data engineer would typically need to answer several questions before writing production SQL:
- What table and schema are being changed?
- What exact column, type, and nullability were requested?
- Is historical backfill required?
- Does the request imply an index?
- Which downstream layers need the field propagated?
- What risks should reviewers look for?
- What checks should be run before and after deployment?
- What rollback steps are reasonable?
- Does the generated SQL include the required elements?
SchemaFlow implements this as a staged agent workflow. Each stage creates a typed intermediate output that the next stage consumes. Deterministic checks then validate the outputs before the notebook saves the final bundle and optionally runs evals.
Workflow Overview
At a high level, SchemaFlow follows this sequence:

The notebook is organized so readers can run the core workflow first and then decide whether they want to run the optional Promptfoo evaluation section.
Table of Contents
Conceptual Guide
- Overview
- Why This Matters
- Key Benefits
- What You'll Build
- Introduction: Use Case and Solution
- Workflow Overview
- Architecture - Design Patterns
- System Design
- Execution Workflow
Notebook Implementation
- Environment Setup
- Input
- Optional PDF RAG Context
- Stages 1-2: Parse Change Request + Impact Analysis
- Stages 3-4: Execution Plan + SQL Generation
- Stage 5: Lightweight SQL Sanity Checks
- Final Bundle
- Save Artifact
- Optional Cleanup
- Evaluate the Flow with Promptfoo
Reference
Architecture - Design Patterns
SchemaFlow uses a staged, contract-driven agent architecture. The goal is to avoid treating the model as a single black-box SQL generator. Instead, each stage has a narrow responsibility and produces an output that can be inspected, validated, traced, and reused.
1. Agent Specialization
Each agent performs one primary task:
| Agent | Responsibility | Main Output |
|---|---|---|
| Parse Agent | Extract structured fields from the natural-language request | change_json |
| Impact Agent | Identify affected objects, assumptions, and risks | impact_json |
| Plan Agent | Convert the change and impact into rollout steps | plan_json |
| SQL Agent | Draft SQL across data platform layers | sql_text |
This specialization makes the workflow easier to debug. If SQL is missing a column, you can inspect whether the issue started in parsing, impact analysis, planning, or SQL generation.
2. Typed Output Contracts
The notebook defines Pydantic models for the structured stages:
ChangeRequestModelImpactModelPlanModel
Those models are wrapped with AgentOutputSchema so the Agents SDK knows the expected output shape. The workflow also normalizes outputs after model calls to ensure expected keys exist before downstream stages run.
3. Retrieval-Augmented Impact Analysis
The PDF RAG section is optional. When PDF_PATH is set, the notebook:
- Creates an OpenAI vector store.
- Uploads the PDF.
- Lets OpenAI parse, chunk, embed, and index it.
- Gives the Impact Agent a
FileSearchTool. - Captures a summary of returned File Search results.
This is useful when the change request needs grounding in an IFD, schema document, lineage file, data contract, or architecture reference.
4. Guardrail Gates Between Stages
The notebook adds deterministic checks after major stages:
- Stages 1-2 guardrails validate parse and impact outputs.
- Stages 3-4 guardrails validate plan completeness, data type propagation, and nullability handling.
- Stage 5 SQL checks validate expected table, column, and SQL keyword presence.
- Post-artifact checks verify the saved JSON artifact exists and round-trips.
- Pre-Promptfoo checks verify the notebook state is ready for evals.
These checks do not replace human review, but they catch common silent failures early.
5. Artifact-Centered Execution
The final bundle is the main workflow artifact. It captures the state needed to review or debug the run:
bundle = {
"summary": ...,
"rag": ...,
"change_json": ...,
"impact_json": ...,
"plan": ...,
"sql": ...,
"validation": ...
}
The notebook saves this bundle under artifacts/notebook_runs/.
6. Eval Runtime Generated from Notebook State
Promptfoo runs in a separate process, so it cannot directly read variables from the active notebook kernel. To solve this, Section 10 writes a small reusable Python module and Promptfoo runtime files from the current notebook state.
This ensures that prompt edits, CHANGE_TEXT edits, and model configuration changes are reflected when the eval files are regenerated.
System Design
Component Architecture

Primary Runtime Objects
| Object | Created in | Purpose |
|---|---|---|
CHANGE_TEXT |
Input section | The natural-language database change request |
change_json |
Stage 1 | Structured interpretation of the request |
rag_vector_store_id |
Optional PDF RAG section | Hosted vector store ID for uploaded PDF context |
rag_file_search_results |
Stage 2 | Summary of File Search results returned to the Impact Agent |
impact_json |
Stage 2 | Impacted objects, risks, and assumptions |
plan_json |
Stage 3 | Rollout plan, checks, and rollback guidance |
sql_text |
Stage 4 | Draft SQL script |
validation |
Stage 5 | Deterministic SQL sanity-check result |
bundle |
Final Bundle section | Consolidated workflow output |
out_path |
Save Artifact section | Saved JSON artifact path |
promptfoo_config |
Promptfoo section | Generated eval configuration |
Important Boundary
SchemaFlow generates draft implementation artifacts. It does not execute SQL against a database, apply migrations, open pull requests, or modify production systems.
Execution Workflow
Run the notebook in order.
Core Workflow
Environment Setup
- Imports dependencies.
- Verifies the OpenAI Agents SDK version.
- Reads
OPENAI_API_KEY. - Configures tracing and model selection.
Input
- Defines
CHANGE_TEXT. - This is the only required business input for the core workflow.
- Defines
Optional PDF RAG Context
- Leave
PDF_PATH = Noneto run without retrieval. - Set
PDF_PATHto a local PDF to enable File Search context for impact analysis.
- Leave
Stages 1-2
- Parse the change request.
- Analyze impact.
- Optionally use File Search during impact analysis.
Stages 1-2 Guardrails
- Confirm parse output is well-formed.
- Confirm impact output includes the target.
- Confirm impacted objects contain required fields.
Stages 3-4
- Generate an execution plan.
- Generate SQL across landing, staging, core, and mart layers.
Stages 3-4 Guardrails
- Confirm plan sections are populated.
- Confirm data type propagation.
- Confirm nullability behavior matches the request.
Stage 5 SQL Sanity Checks
- Check for empty SQL.
- Check expected target table and columns.
- Check required SQL actions implied by the request.
Final Bundle and Artifact
- Assemble the full output bundle.
- Save it as JSON.
- Verify the artifact round-trips successfully.
Optional Eval Workflow
Pre-Promptfoo Checks
- Confirm the notebook state is ready for evals.
Promptfoo Runtime Generation
- Create a reusable SchemaFlow core module.
- Write a Promptfoo provider.
- Write a Promptfoo assertion file.
- Generate Promptfoo test cases and config.
Promptfoo Eval Execution
- Run parse-only and full-flow evals.
- Save timestamped JSON and HTML reports.
- Refresh
schemaflow_cookbook_eval_latest.*aliases.
1) Environment Setup
This section prepares the runtime for the SchemaFlow workflow.
The setup cell does the following:
- Imports standard Python utilities used throughout the notebook.
- Imports the OpenAI client.
- Imports the OpenAI Agents SDK primitives:
AgentRunnerRunConfigAgentOutputSchemaFileSearchTool- tracing and span helpers
- Verifies that the installed
openai-agentspackage meets the minimum required version. - Reads
OPENAI_API_KEYfrom the environment or prompts for it. - Sets the model with
OPENAI_MODEL, defaulting togpt-5.5. - Creates a trace group ID so all related agent runs and guardrail spans can be grouped together.
The workflow intentionally enables sensitive trace payloads for this demo so prompts, outputs, eval bundles, and tool data are visible in traces. For production usage, review this setting before handling private data.
2) Input
This section defines the database change request that SchemaFlow will process. Think of it as the compact ticket, issue, or message a data team might receive before turning the request into implementation details.
The default request asks the workflow to:
- Add
LOYALTY_TIER VARCHAR(20)toODS.ODS_CUSTOMER_PROFILE. - Treat the new column as nullable.
- Backfill from
CORE.DIM_CUSTOMER. - Join on
CUSTOMER_ID. - Filter the source to current records with
IS_CURRENT=true. - Add a non-unique index on
(CUSTOMER_ID, LOYALTY_TIER).
This input is intentionally compact but rich enough to exercise the full workflow:
- parsing target schema and table
- extracting column name, type, and nullability
- recognizing backfill requirements
- recognizing index requirements
- generating multi-layer SQL
- running validation checks for expected table, column, and SQL actions
3) Optional PDF RAG Context
SchemaFlow can run with or without retrieval context, so readers can start with the request alone and add reference docs only when the change needs them.
The sample PDF path in the code cell below points to a file included in the cookbook folder under data/, not to bytes embedded inside the notebook. Leave PDF_PATH = None for static article previews or generic runs.
With the default PDF_PATH = None, the notebook uses only the natural-language change request. This is enough to demonstrate the core staged workflow.
Set PDF_PATH to a local PDF when you want the Impact Agent to ground its analysis in reference material, such as:
- interface design documents
- schema specifications
- lineage documentation
- data contracts
- platform architecture notes
- downstream dependency documentation
When a PDF is configured, this section:
- Validates that the file exists and is a PDF.
- Creates an OpenAI vector store with a one-day expiration policy.
- Uploads the PDF to the vector store.
- Lets OpenAI handle parsing, chunking, embedding, and retrieval.
- Stores the vector store ID for the Impact Agent.
- Later summarizes any File Search results returned during impact analysis.
This keeps the cookbook lightweight because it does not require local embedding models, Chroma, Neo4j, LangGraph, or project-specific Python modules.
4) Stages 1-2 - Parse Change Request + Impact Analysis
This section runs the first two agent stages back to back. Together, they answer two practical questions: what exactly was requested, and what else could be affected?
Stage 1: Parse Change Request
The Parse Agent converts CHANGE_TEXT into a structured change_json object.
Expected fields include:
titledomaintarget_schematarget_tableoperationsnotes
This stage creates the normalized contract that every downstream stage consumes. If the parse step misses the target table, column, data type, nullability, backfill, or index intent, later stages may produce incomplete output. That is why the notebook validates this stage immediately afterward.
Stage 2: Impact Analysis
The Impact Agent consumes change_json and produces impact_json.
Expected fields include:
impacted_objectsrisksassumptions
If PDF_PATH was configured earlier, the Impact Agent also receives a FileSearchTool connected to the uploaded PDF vector store. This lets the model search reference documentation before returning impact claims.
The output is intentionally conservative. When the agent is uncertain, it should call out assumptions and risks instead of inventing undocumented certainty.
Impact Dashboard Preview
The impact-analysis stage produces structured impact_json that can be visualized as a graph of affected objects and relationships.
The preview below shows the kind of customer loyalty lineage graph built in the optional Neo4j dashboard section later in the notebook. Run that section to generate the local graph UI from the sample knowledge-graph seed and inspect impacted objects interactively.

Stages 1-2 Output Guardrails
This guardrail cell performs deterministic checks on the Parse and Impact outputs before the workflow continues.
The checks verify that:
change_jsoncontains a target schema.change_jsoncontains a target table.change_json.operationsis a non-empty list.impact_json.impacted_objectscontains at least one object.- The impact output references the parsed target table.
- Each impacted object has basic required fields such as type, name, and reason.
These checks are deliberately lightweight. They do not prove that the analysis is complete, but they catch obvious failure modes before the Plan Agent or SQL Agent consumes malformed or incomplete state.
5) Stages 3-4 - Execution Plan + SQL Generation
This section runs the implementation-planning and SQL-generation stages. At this point the workflow shifts from understanding the request to drafting an implementation handoff.
Stage 3: Execution Plan
The Plan Agent consumes:
change_jsonimpact_json
It returns plan_json with four sections:
plan_stepsprecheckspostchecksrollback
The goal is to make the implementation strategy explicit before generating SQL. This helps separate “what should be done” from “what exact SQL should be drafted.”
Stage 4: SQL Generation
The SQL Agent consumes:
change_jsonplan_json
It returns a single plaintext SQL script. The prompt requires four sections in order:
-- === LANDING (ODS) ===-- === STAGING (STG) ===-- === CORE (DIM/FACT/VIEW) ===-- === MARTS (SERVING) ===
The generated SQL is intended as a reviewable draft. It should be checked by engineers before any production use.
Stages 3-4 Output Guardrails
This guardrail cell validates the plan and SQL draft before the notebook moves to the final SQL sanity checks.
The checks verify that:
- all four plan sections are populated:
plan_stepsprecheckspostchecksrollback
- the data type requested in
CHANGE_TEXTappears in the generated SQL - nullable requests do not accidentally create
NOT NULLconstraints - explicit
NOT NULLrequests are reflected when present
These checks complement Stage 5. Stages 3-4 guardrails focus on plan completeness and semantic consistency, while Stage 5 focuses on expected SQL terms and actions.
6) Stage 5 - Lightweight SQL Sanity Checks
This section runs deterministic checks against the generated SQL for the current notebook run.
This is not a full SQL parser and it does not execute the SQL. Instead, it checks for obvious mismatches between the original request, parsed change object, and generated script. These checks are intentionally small and explainable, so a reader can see exactly what passed or failed before the result is saved or evaluated.
The checks look for:
- empty SQL output
- missing target table
- missing expected columns
- required SQL keywords inferred from the request:
ALTER TABLEUPDATEwhen the request implies backfill or source-based populationCREATE INDEXwhen the request mentions an index
The output is stored in validation, which becomes part of the final bundle and is also used by the Promptfoo full-flow assertion.
7) Final Bundle
This section assembles the main SchemaFlow output object.
The final bundle contains:
summaryragchange_jsonimpact_jsonplansqlvalidation
This object is the reviewable handoff artifact for the notebook run. It collects the model-generated outputs, deterministic validation results, and optional retrieval metadata in one place, so a reviewer does not have to reconstruct the flow from separate cells.
The printed summary gives a compact view of the most important run-level information:
- parsed title
- parsed target
- number of RAG hits
- number of plan steps
- validation status
- validation issues
8) Save Artifact
This section writes the final bundle to disk as JSON.
Artifacts are saved under:
artifacts/notebook_runs/
Each run receives a timestamped filename, which makes it easy to compare outputs across different prompts, models, inputs, or retrieval documents.
The saved artifact is useful for:
- code review
- audit trails
- debugging
- regression comparison
- eval fixture creation
- downstream automation
Post-Artifact Generation Sanity Check
This cell verifies that the saved artifact is usable.
It checks that:
- the artifact file exists
- the artifact file is non-empty
- the file can be loaded with
json.loads - the top-level keys on disk match the in-memory
bundle
This catches file-write issues immediately instead of letting a later review, eval, or automation step consume a missing or malformed artifact.
9) Optional Cleanup
This section handles cleanup for the optional PDF vector store.
By default, DELETE_VECTOR_STORE_AFTER_RUN = False.
That default is safe for interactive notebook usage because the vector store is created with a one-day expiration policy. Keeping it temporarily can be useful if you want to inspect traces, rerun downstream stages, or debug File Search behavior.
Set DELETE_VECTOR_STORE_AFTER_RUN = True before running this cell if you want to delete the vector store immediately after the notebook run.
If no PDF was configured, this cell simply reports that no vector store was created.
Pre-Promptfoo Checks / Guardrails
This cell is the readiness gate before running Promptfoo.
Promptfoo runs the workflow in a separate process, so it is important to confirm that the notebook state is complete and internally consistent before generating eval files.
The preflight checks verify that:
bundleexists in the notebook kernel.bundlereflects the currentchange_jsonandplan_json.- Stage 5 validation passed.
- Stages 1-2 guardrails passed.
- Stages 3-4 guardrails passed.
- The saved artifact sanity check passed.
CHANGE_TEXTis consistent with the parsed bundle target.OPENAI_API_KEYis present.- The installed Agents SDK version meets the minimum requirement.
If this section reports failures, rerun or fix the earlier notebook sections before running Promptfoo.
10) Evaluate the Flow with Promptfoo
Promptfoo is now part of OpenAI. This section uses Promptfoo's Jupyter/Colab pattern to run evals from notebook cells while keeping the SchemaFlow logic readable in Python. Promptfoo itself runs via Node.js, and the evaluated flow is provided through Promptfoo's Python file:// provider and Python assertion integrations.
This optional section turns the notebook workflow into a repeatable eval.
The core notebook run validates one live example. Promptfoo adds a reusable eval harness that can run parse-only and full-flow checks using generated provider and assertion files, which is useful when you want to keep the same workflow stable as prompts, models, or inputs change.
Because Promptfoo launches a separate Python process, it cannot directly access variables that only exist inside the active notebook kernel. To solve that, the next cells publish runtime files from the current notebook state:
- a reusable
schemaflow_cookbook_core.pymodule - a Python Promptfoo provider
- a Python Promptfoo assertion file
- generated eval cases
- a generated Promptfoo config
This section includes three validation layers:
Input preflight
- deterministic checks before writing the config
- no model calls
Parse-only eval
- checks Stage 1 behavior
- verifies target, operation presence, expected added column, and expected data type
Full-flow eval
- checks downstream impact, SQL terms, and validation status
Eval results are printed in the notebook and exported as timestamped JSON and HTML files under:
artifacts/promptfoo/results/
The latest successful run also refreshes:
schemaflow_cookbook_eval_latest.json
schemaflow_cookbook_eval_latest.html
Runtime note: the core SchemaFlow cells require Python and an OpenAI API key. The Promptfoo cells additionally require Node.js and npm in the same executable notebook runtime.
After the eval runs, Promptfoo provides a compact view of the current change request, expected fields, parse-only check, and full-flow check.
Use this view to answer questions such as:
- Did the Parse Agent extract the expected target table?
- Did it detect the expected added column?
- Did it preserve the requested data type?
- Did the full flow produce impact risks?
- Did the SQL include required terms?
- Did deterministic validation pass?

Promptfoo Runtime Directory Setup
This cell creates notebook-local directories for Promptfoo config, logs, cache, npm cache, and results.
Keeping these directories under artifacts/promptfoo/ makes the eval runtime portable and avoids relying on global Promptfoo state under the user’s home directory.
The cell also exports environment variables so the generated provider, assertion, and Promptfoo command all use the same trace group and local runtime paths.
Node.js and npm Runtime Check
Promptfoo runs through Node.js, even though the SchemaFlow provider and assertion logic are written in Python.
This cell verifies that the notebook runtime has a supported node and npm available.
The check is intentionally explicit. The notebook does not silently install or upgrade Node because that depends on the execution environment.
For local macOS notebooks, the cell prefers a supported nvm Node runtime before common Homebrew paths. This helps ensure that the notebook and terminal use the same Node ABI and avoids stale native dependencies.
If this check fails, fix the runtime first and then rerun the Promptfoo section.
Publish SchemaFlow Core Runtime
Promptfoo runs the evaluated flow in a separate Python process. This cell writes a reusable Python module named:
artifacts/promptfoo/schemaflow_cookbook_core.py
The generated module contains the same core SchemaFlow logic used by the notebook:
- Pydantic models
- Agents SDK setup
- output normalization helpers
- Parse Agent execution
- Impact Agent execution
- optional PDF vector store creation
- Plan Agent execution
- SQL Agent execution
- SQL validation
- parse-only eval entrypoint
- full-flow eval entrypoint
The prompt strings are injected from the current notebook variables. That means if you edit the Parse, Impact, Plan, or SQL prompts above and rerun this cell, the Promptfoo runtime receives the updated prompts.
Promptfoo Provider Runtime
This cell writes the Promptfoo provider file:
artifacts/promptfoo/schemaflow_cookbook_eval_provider.py
The provider is the bridge between Promptfoo and SchemaFlow.
For each Promptfoo test case, it reads variables such as:
change_texteval_mode- optional
pdf_path - optional
rag_max_results - validation keywords
Then it chooses one of two execution paths:
parse_onlyruns only Stage 1 and returns a parse bundle.full_flowruns the complete SchemaFlow pipeline and returns the full bundle.
The provider returns JSON so Promptfoo assertions can inspect structured fields instead of parsing notebook text output.
Promptfoo Assertion Runtime
This cell writes the Promptfoo assertion file:
artifacts/promptfoo/schemaflow_cookbook_eval_assert.py
The assertion file validates provider output for both eval modes.
For parse_only, it checks:
- output is valid JSON
- target schema and table match expectations
- at least one parsed operation is present
- expected added column appears in parsed operations
- expected data type appears structurally in parsed operations
For full_flow, it checks:
- output is valid JSON
- target schema and table match expectations
- at least one parsed operation is present
- impact risks are present
- required SQL terms are present
- validation passed
The assertion also emits guardrail spans so eval failures are visible in traces.
Build Promptfoo Test Cases and Config
This cell builds Promptfoo test cases from the current notebook input.
By default, it creates two test cases from the current CHANGE_TEXT, carrying through PDF_PATH when a PDF is configured:
- a parse-only test
- a full-flow test
The helper functions infer expectations from the change request, including:
- expected schema
- expected table
- expected added column
- expected data type
- expected SQL terms
- expected validation keywords
The cell also includes optional regression fixtures. Set:
RUN_EXTRA_REGRESSION_CASES = True
to add those extra cases to the generated config.
Before writing promptfooconfig.yaml, the cell runs deterministic input preflight checks. This prevents obviously malformed eval inputs from producing confusing Promptfoo failures.
Run Promptfoo Eval
This cell runs Promptfoo non-interactively from the notebook.
The command:
- runs from
artifacts/promptfoo/ - uses the generated
promptfooconfig.yaml - uses notebook-local Promptfoo config, cache, logs, and npm cache
- runs with concurrency
1for predictable notebook behavior - keeps CLI output visible in the notebook
- writes timestamped JSON and HTML reports
- refreshes latest-result aliases after a successful run
The exported result files are saved under:
artifacts/promptfoo/results/
If the Node.js/npm runtime check failed earlier, fix the runtime before running this cell.
Review Latest Promptfoo Results
This cell checks whether the latest Promptfoo result aliases exist and prints their paths and sizes.
Expected files:
artifacts/promptfoo/results/schemaflow_cookbook_eval_latest.json
artifacts/promptfoo/results/schemaflow_cookbook_eval_latest.html
If the latest JSON file exists, the cell also prints available eval metadata such as the eval ID and aggregate stats.
Use this section as a quick confirmation that the eval completed and exported artifacts successfully.
11) Optional Neo4j Knowledge Graph & Dashboard
This optional section is fully self-contained and does not affect the core pipeline above. It uses a small synthetic customer-loyalty graph seed plus inline dashboard code so the cookbook stays portable. Readers can treat it as a visual appendix: the core workflow works without Neo4j, but graph views make lineage and downstream impact easier to inspect.
What this section does, in order:
- Step 1 - Seed: define a synthetic customer-loyalty graph with ODS, staging, core, mart, and CRM objects, their columns, lineage, and joins as an inline Python data structure - no external files.
- Step 2 - AI Enrichment: use the OpenAI
clientalready loaded in Section 1 to fill insemantic_meaning(a short 2-5 word tag likenatural-key,foreign-key,monetary-amount,timestamp) for every column. - Step 3 - Upsert to Neo4j: write the enriched data to a running Neo4j instance via idempotent
MERGECypher. Nodes are labeledSchemaFlowCookbookso the dashboard ignores stale sample data from older local runs. - Dashboard: write a small FastAPI server + D3.js page next to the notebook, launch it on
http://127.0.0.1:8005, and print a clickable link.
Prerequisites for this section:
- A running Neo4j instance, e.g. via Docker:
docker run -d -p 7687:7687 -p 7474:7474 -e NEO4J_AUTH=neo4j/change-me-please neo4j:5(Neo4j Desktop or AuraDB free tier also work). NEO4J_URI,NEO4J_USER,NEO4J_PASSWORD(loaded via env, or entered at the prompt below).- A free local port
8005for the dashboard (override viaNEO4J_DASHBOARD_PORT). - Optional packages
neo4j,fastapi,uvicorn- the next cell will install them lazily if missing.
If any prerequisite is missing, every cell below short-circuits with a clear message; nothing throws and the rest of the notebook is unaffected.
11.1) Environment Setup & Optional Dependencies
Mirrors Section 1's OpenAI env loading pattern. Lazy-installs neo4j, fastapi, and uvicorn only when they are not importable, then reads NEO4J_URI, NEO4J_USER, and NEO4J_PASSWORD from the environment (or prompts via getpass if missing). If you press Enter at any prompt without typing, the section is disabled (NEO4J_SECTION_ENABLED = False) and the remaining cells skip cleanly.
11.2) Step 1 - Seed: Define the Knowledge Graph Data
Builds the in-memory data structure for the graph: schemas, tables (with description, primary_key), columns (with type, nullable, is_primary_key, optional description, and a semantic_meaning placeholder to be filled by AI in the next step), foreign keys, views, lineage edges (DERIVED_FROM), and joins.
This inline synthetic retail graph is aligned to the cookbook change request: LOYALTY_TIER is added to ODS.ODS_CUSTOMER_PROFILE, sourced from CORE.DIM_CUSTOMER, and propagated into downstream staging, core, mart, and CRM consumers.
No Neo4j calls here - this cell only prepares Python data structures. Nothing is written until Step 3.
11.3) Step 2 - AI Enrichment
Uses the OpenAI client and MODEL already initialized in Section 1 to generate a short semantic_meaning tag (2-5 words, e.g. natural-key, foreign-key, monetary-amount, timestamp, descriptive-text) for every column whose value is currently None. One LLM call per column, plain-text response. The prompt is defined inline so this cell remains self-contained.
Cost control: capped at MAX_ENRICH_COLS = 30 columns per run (override via env SEED_AI_ENRICH_LIMIT). Set SEED_AI_ENRICH=0 to skip enrichment entirely. Per-column failures are caught and logged; the cell never raises. Skipped entirely if Section 11 was disabled in Step 0.
11.4) Step 3 - Upsert the Enriched Graph to Neo4j
Opens a Neo4j driver using the env vars loaded in Step 0, creates uniqueness constraints, then performs idempotent MERGE upserts for:
- Schemas (
Schemanodes with:CONTAINSedges to Tables/Views) - Tables with
description - Columns with
type,nullable,is_primary_key,description,semantic_meaning - Foreign Keys as
Column-[:FK_TO]->Column - Views with
description - Lineage as
(Table|View)-[:DERIVED_FROM]->(Table|View) - Joins as
(Table|View)-[:JOINS]->(Table|View)
The Cypher is intentionally written inline so this cell remains self-contained. The MERGE pattern matches the inline seed structures above, and every cookbook node receives a SchemaFlowCookbook label so re-running refreshes this sample graph without exposing older local seed data in the dashboard.
The whole operation is wrapped in try/except - if Neo4j is unreachable, the cell prints a clear diagnostic and returns without raising.
11.5) Launch the Local Dashboard
The next six cells set up and launch a small FastAPI + D3.js dashboard pointed at your Neo4j instance. Files live in _neo4j_dashboard/ next to this notebook so you can read and edit them in place - no base64, no encoding tricks.
- Prep cell:
mkdir _neo4j_dashboard/and pick the dashboard port (default8005, override viaNEO4J_DASHBOARD_PORT). %%writefile _neo4j_dashboard/graph_server.py: FastAPI backend - exposesGET /api/knowledge-graph(read),POST /api/update-column-semantic-meaning(write),POST /api/update-relationship,POST /api/delete-relationship.%%writefile _neo4j_dashboard/index.html: DOM skeleton with the stats bar, SVG graph container, sidebar, and AI Semantic Insights panel (with Edit / Save buttons).%%writefile _neo4j_dashboard/style.css: All visual styling.%%writefile _neo4j_dashboard/script.js: D3 force simulation, click handlers, fetch calls. The Edit button turns Semantic Meaning values into inputs; Save POSTs each one to the backend, whichMATCH ... SET c.semantic_meaning = ...against Neo4j.- Launch cell: starts
uvicorn graph_server:appviasubprocess.Popenon127.0.0.1:8005, pollsGET /api/knowledge-graphfor up to 20 s, then prints a clickable Markdown link only when the graph API is ready, plus the server PID.
These cells generate a small local dashboard app next to the notebook, using semantic_meaning as the editable column insight field. Open any generated file in the notebook workspace to tweak colors, add endpoints, etc. - changes take effect the next time you re-run the writefile and launch cells. Re-running the launch cell restarts the prior dashboard process from the same kernel so the browser is not left on stale graph data.
To stop the dashboard later, run kill <pid> (the PID is printed by the launch cell) - or simply restart the Jupyter kernel.
Notes, Assumptions, and Extension Points
What This Cookbook Is Designed For
This cookbook is a compact, self-contained demonstration of an agentic database-change workflow. It is meant to show how to combine:
- staged agent orchestration
- typed intermediate outputs
- optional retrieval context
- deterministic guardrails
- trace instrumentation
- artifact generation
- Promptfoo evals
The workflow is intentionally portable and does not depend on external project-specific modules, local vector databases, graph databases, or a live data warehouse connection. The retail example is deliberately concrete, but the same staged pattern can be adapted to other domains where schema changes need analysis, implementation planning, and review.
What This Cookbook Does Not Do
This notebook does not:
- execute SQL against a database
- inspect a live schema catalog
- open pull requests
- modify production infrastructure
- validate SQL with a database parser
- enforce organization-specific migration standards
- replace human code review
The generated SQL should be treated as a draft for review.
Tracing and Sensitive Data
The notebook defaults to redacted traces for publication hygiene. Set OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA=true only for non-sensitive demo runs where retaining prompt and output payloads is intentional.
Before adapting this for sensitive enterprise data, review:
OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA- API key handling
- PDF upload policy
- data retention expectations
- access controls around generated artifacts
Optional PDF RAG
The optional RAG path uses OpenAI Vector Stores and the Agents SDK FileSearchTool.
This is useful when impact analysis should be grounded in a reference PDF such as:
- an interface design document
- an implementation field definition
- schema documentation
- lineage documentation
- downstream dependency notes
The vector store is configured to expire automatically after one day of inactivity. You can also delete it immediately in the cleanup section.
Promptfoo Eval Path
The Promptfoo section generates runtime files under:
artifacts/promptfoo/
The generated core module injects the current notebook prompt strings, so prompt edits and CHANGE_TEXT edits are reflected after rerunning the generation cells.
By default, the eval flow includes:
- deterministic input preflight
- one parse-only eval for the current request
- one full-flow eval for the current request
Set RUN_EXTRA_REGRESSION_CASES = True to include the optional regression fixtures.
Suggested Production Extensions
For a production-grade implementation, consider adding:
- schema catalog lookup
- database-specific SQL validation
- SQL formatting and linting
- migration framework integration
- pull request creation
- data lineage graph integration
- approval workflows
- policy checks for destructive changes
- richer eval suites
- golden test fixtures
- organization-specific SQL templates
- environment-specific deployment plans
Recommended Review Checklist
Before using generated output for real implementation, review:
- parsed target schema and table
- parsed operations
- data type and nullability
- backfill logic
- index strategy
- downstream propagation assumptions
- prechecks and postchecks
- rollback feasibility
- generated SQL dialect compatibility
- validation issues
- trace output
- saved artifact contents
- Promptfoo eval results
Contributors
This cookbook serves as a joint collaboration between OpenAI and Altimetrik.