Skill

Build AI-agent collaboration infrastructure for codebases

Design and deploy Agent Harness infrastructure (ECL docs, change tracking, linters, CI gates) so AI agents work reliably in production codebases.

Works with GitHub, Makefile

57
Spark score
out of 100
Updated 8 days ago
Version 1.0.0

Why it matters

Teams hire this skill to transform repositories into reliable AI-agent workspaces by creating the harness infrastructure (AGENTS.md, ECL lifecycle docs, change templates, linters, and CI gates) that turns ad-hoc agent demos into production-grade collaboration systems with mechanical validation and auto-evolution tracking.

Outcomes

What it gets done

01

Detect harness gaps and classify project state (empty, code-only, partial, or complete)

02

Generate ECL documentation, change templates, and harness-change scripts for agent workflow tracking

03

Create lint rules and CI integration that validate agent changes before merge

04

Audit existing harness infrastructure and recommend auto-evolve improvements with rollback discipline

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/ag-ecl-harness-engineer | bash

Capabilities

What this skill does

Generate code

Writes source code or scripts from a description.

Review code

Analyzes code for bugs, style issues, and improvements.

Deploy / CI

Runs build pipelines, tests, and deploys to environments.

Audit access

Reviews permissions and logs to flag unauthorized activity.

Write tests

Creates unit, integration, or end-to-end test cases.

Overview

ECL Harness Engineer

What it does

Harness infrastructure builder

How it connects

When your repository needs AI-agent collaboration infrastructure or when auditing an existing Agent Harness for missing ECL lifecycle docs, change templates, lint checks, environment contracts, or CI integration

Source README

ECL Harness Engineer

Design and create Harness Engineering infrastructure so AI agents can work reliably in a codebase.

Core Philosophy: "Intelligence without infrastructure is just a demo." The Agent Harness is the operating system; the LLM is just the CPU. The repository becomes the single source of truth: if an agent can't see it in context, it doesn't exist.

When to Use This Skill

  • Use when a repository needs AI-agent collaboration infrastructure such as AGENTS.md, docs/ECL.md, docs/STATUS.md, harness change tracking, or mechanical validation gates.
  • Use when auditing an existing Agent Harness for missing ECL lifecycle docs, change templates, lint checks, environment contracts, or CI integration.
  • Use when converting repeated agent workflow failures into repository-local documentation, tests, lint rules, or lightweight auto-evolution checks.
  • Do not use for ordinary business feature implementation unless the requested work is specifically about creating or improving the repository harness.

Limitations

  • This skill creates or audits harness infrastructure; it does not replace product requirements, implementation planning, code review, or release approval for the target project.
  • The generated ECL docs, linters, scripts, and CI examples must be adapted to the repository's actual stack, security model, and existing contributor workflow before enforcement.
  • Auto-evolve recommendations are guidance only. Apply harness changes through normal review, validation, and rollback discipline instead of accepting them as autonomous policy changes.

Unified Workflow

This skill follows a single unified workflow regardless of project state (empty, existing code, or existing harness). The core idea: detect the gap between current state and target state, then fill it.

Default to a core ECL harness. Core includes lightweight auto-evolve threshold checking:
closed changes are counted, a pending evolution note is generated when the threshold is reached,
and Codex applies harness improvements only through evidence, validation, scoring, and rollback.
Advanced agent-platform capabilities such as eval datasets, execution traces, durable state,
checkpoints, long-term memory, and metrics remain optional profiles, enabled only when the user
explicitly asks for agent evaluation, observability, resumable execution, or long-term memory.

This skill improves the target repository's agent harness. It does not implement ordinary
business features, replace the coding agent's plan mode, or create a separate requirements product.
Plan mode is useful for live discussion; ECL artifacts are the repository record that later agents,
linters, CI, and archive history can inspect.

  1. Quick Detection + Intent Confirmation - what exists, what already passes, and what the user wants.
  2. Analysis - architecture, harness state, environment, and project identity.
  3. Intake Review + Delta Synthesis - classify small vs structured work, support requirement-first
    and plan-first inputs, and compute exactly what to create or update.
  4. Creation/Update - docs, status handoff, linters, ECL/change scripts, environment config, and CI.
  5. Verification + Handoff - run checks, attribute failures, update STATUS.md, trigger auto-evolve checks, and summarize results.

Phase 1: Quick Detection + Intent Confirmation

Goal: In under 5 minutes, understand project state and user intent.

1.1 Project State Detection

Run this quick scan:

# Count files
file_count=$(find . -type f ! -path './.git/*' ! -path './node_modules/*' ! -path './vendor/*' 2>/dev/null | wc -l)
code_files=$(find . -type f \( -name "*.go" -o -name "*.ts" -o -name "*.js" -o -name "*.py" -o -name "*.rs" \) ! -path './.git/*' ! -path './node_modules/*' ! -path './vendor/*' 2>/dev/null | wc -l)

# Check harness components
has_agents_md=$(test -f AGENTS.md && echo "yes" || echo "no")
has_architecture=$(test -f docs/ARCHITECTURE.md && echo "yes" || echo "no")
has_linters=$(ls scripts/lint-* 2>/dev/null | wc -l)
has_harness_dir=$(test -d harness && echo "yes" || echo "no")
has_ecl_doc=$(test -f docs/ECL.md && echo "yes" || echo "no")
has_changes_dir=$(test -d harness/changes && echo "yes" || echo "no")
has_change_templates=$(test -d harness/templates/change && echo "yes" || echo "no")
has_change_script=$(ls scripts/harness-change.* 2>/dev/null | wc -l)
has_evolve_script=$(ls scripts/harness-evolve.* 2>/dev/null | wc -l)
has_ecl_lint=$(ls scripts/lint-ecl.* 2>/dev/null | wc -l)
has_encoding_lint=$(ls scripts/lint-encoding.* 2>/dev/null | wc -l)
has_makefile=$(test -f Makefile && echo "yes" || echo "no")
has_package_json=$(test -f package.json && echo "yes" || echo "no")

# Detect tech stack
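if test -f go.mod; then TECH="Go"
elif test -f package.json; then TECH="TypeScript/Node.js"
elif test -f requirements.txt || test -f pyproject.toml; then TECH="Python"
# Rust and Java have dedicated adapters (see 2.3), so detect them before falling back
elif test -f Cargo.toml; then TECH="Rust"
elif test -f pom.xml || test -f build.gradle || test -f build.gradle.kts; then TECH="Java"
else TECH="Unknown"
fi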

1.2 Classify Project State

Based on detection:

| State | Criteria | Action |
|-------|----------|--------|
| Empty | file_count < 5 AND code_files = 0 | Guide user through project choices first |
| Code Only | code_files > 0 AND has_agents_md = "no" | Full analysis + core harness creation |
| Partial Harness | has_agents_md = "yes" AND (has_linters = 0 OR has_harness_dir = "no") | Gap analysis + fill gaps |
| Harness Present | Core harness components exist | Audit + improvement suggestions |
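
The classification can be expressed as a small function over the variables from the 1.1 scan. This is a minimal sketch, not part of the skill itself: the function name `classify_project_state` and the `Unclassified` fallback are assumptions added for illustration, and "Harness Present" is approximated as "AGENTS.md exists and nothing from the Partial row is missing".

```shell
# Hypothetical helper: maps the Phase 1.1 scan variables to a project state.
# Args: file_count code_files has_agents_md has_linters has_harness_dir
classify_project_state() {
  local file_count=$1 code_files=$2 has_agents_md=$3 has_linters=$4 has_harness_dir=$5
  if [ "$file_count" -lt 5 ] && [ "$code_files" -eq 0 ]; then
    echo "Empty"
  elif [ "$code_files" -gt 0 ] && [ "$has_agents_md" = "no" ]; then
    echo "Code Only"
  elif [ "$has_agents_md" = "yes" ] && { [ "$has_linters" -eq 0 ] || [ "$has_harness_dir" = "no" ]; }; then
    echo "Partial Harness"
  elif [ "$has_agents_md" = "yes" ]; then
    echo "Harness Present"
  else
    # Assumption: cases the criteria do not cover (for example a docs-only repository).
    echo "Unclassified"
  fi
}
```

After the scan, a call such as `classify_project_state "$file_count" "$code_files" "$has_agents_md" "$has_linters" "$has_harness_dir"` prints one state name.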

Also classify ECL readiness:

| ECL State | Criteria | Action |
|-----------|----------|--------|
| ECL Missing | has_ecl_doc = "no" OR has_changes_dir = "no" | Create ECL docs, change templates, and scripts |
| ECL Partial | ECL doc exists but scripts/templates missing | Fill ECL automation gaps |
| ECL Ready | docs/ECL.md, harness/changes, templates, harness-change, harness-evolve, lint-ecl, lint-encoding exist | Audit index freshness and workflow quality |
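
ECL readiness follows the same pattern. The sketch below assumes the variable names from the 1.1 scan; the function name `classify_ecl_state` is hypothetical.

```shell
# Hypothetical helper: maps the Phase 1.1 scan variables to an ECL state.
# Args: has_ecl_doc has_changes_dir has_change_templates
#       has_change_script has_evolve_script has_ecl_lint has_encoding_lint
classify_ecl_state() {
  local has_ecl_doc=$1 has_changes_dir=$2 has_change_templates=$3
  local has_change_script=$4 has_evolve_script=$5 has_ecl_lint=$6 has_encoding_lint=$7
  if [ "$has_ecl_doc" = "no" ] || [ "$has_changes_dir" = "no" ]; then
    echo "ECL Missing"
  elif [ "$has_change_templates" = "no" ] || [ "$has_change_script" -eq 0 ] \
    || [ "$has_evolve_script" -eq 0 ] || [ "$has_ecl_lint" -eq 0 ] \
    || [ "$has_encoding_lint" -eq 0 ]; then
    # The doc and changes directory exist, but some automation is still missing.
    echo "ECL Partial"
  else
    echo "ECL Ready"
  fi
}
```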

1.3 Baseline Verification Snapshot

For existing projects, capture a best-effort baseline before creating or updating harness files.
The baseline is for attribution only: it distinguishes pre-existing project failures from
failures introduced by harness work. It must not be used to weaken default CI.

Run only commands that already exist in the project:

| Ecosystem | Baseline commands |
|-----------|-------------------|
| TypeScript/Node.js | package scripts such as lint, typecheck, test, build; include nested package build scripts when detected |
| Go | go test ./..., go build ./..., existing make lint or make test |
| Python | existing test/lint scripts, python -m compileall . |

Record each command as pass, fail, or missing, with the short failure reason. If a command
fails before harness creation, report it later as pre-existing project debt, not as harness
failure. Default CI remains strict and should still include normal business gates unless the user
explicitly asks for a temporary staged rollout.
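
One way to capture the snapshot is a wrapper that runs a command and prints a tab-separated record. This is a sketch under assumptions: the name `record_baseline` and the output format are illustrative, and "missing" here only detects an absent executable. A missing package script or Makefile target would still need a stack-specific check.

```shell
# Hypothetical helper: record_baseline LABEL COMMAND [ARGS...]
# Prints "LABEL<TAB>pass|fail|missing<TAB>short reason".
record_baseline() {
  local label=$1
  shift
  if ! command -v "$1" >/dev/null 2>&1; then
    printf '%s\tmissing\t%s not found\n' "$label" "$1"
    return 0
  fi
  local output
  # Assign separately from `local` so the command's exit status is preserved.
  if output=$("$@" 2>&1); then
    printf '%s\tpass\t\n' "$label"
  else
    # Keep only the last output line as the short failure reason.
    printf '%s\tfail\t%s\n' "$label" "$(printf '%s\n' "$output" | tail -n 1)"
  fi
}
```

For example, `record_baseline go-test go test ./... >> baseline.tsv` appends one row; `baseline.tsv` is a placeholder file name, not something the skill prescribes.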

1.4 Intent Confirmation

Before planning changes, classify requested scope:

| Scope | Default? | Includes |
|-------|----------|----------|
| Core harness | Yes | AGENTS.md, docs/ECL.md, docs/STATUS.md, docs, ECL changes, lightweight auto-evolve, linters, environment contract, CI |
| Advanced harness | No | Core harness plus explicitly requested eval, trace, state, checkpoints, memory, or metrics |
| Documentation only | No | AGENTS.md and docs without linters, scripts, or CI |

When a user-confirmation tool is available, confirm scope. In Codex, use request_user_input.
On other platforms, use the equivalent user-choice tool. If no such tool is available, use the
detected context and record assumptions.

{
  "question": "What's your priority for this harness setup?",
  "header": "Scope",
  "multiSelect": false,
  "options": [
    {
      "label": "Core harness (Recommended)",
      "description": "Project-first AGENTS.md, ECL changes, STATUS handoff, auto-evolve threshold checks, linters, environment contract, and strict CI"
    },
    {
      "label": "Advanced harness",
      "description": "Core harness plus explicitly requested eval, trace, memory, checkpoint, or metrics infrastructure"
    },
    {
      "label": "Documentation only",
      "description": "AGENTS.md and project docs only; skip linters, scripts, and CI for now"
    }
  ]
}

If Empty project, also ask for basics:

{
  "question": "What tech stack for this project?",
  "header": "Tech Stack",
  "multiSelect": false,
  "options": [
    {"label": "Go", "description": "CLI tools, high-performance services, system programming"},
    {"label": "TypeScript/Node.js", "description": "Web APIs, full-stack apps, rapid prototyping"},
    {"label": "Python", "description": "Data processing, ML/AI, scripting"}
  ]
}

If no user-confirmation tool is available, use detected values and document assumptions:

## Auto-Detected Context

| Field | Value | Confidence | Evidence |
|-------|-------|------------|----------|
| Tech Stack | {TECH} | High | Found {config file} |
| Project State | {state} | High | {criteria matched} |
| Scope | Core harness | Default | No user preference specified |

Proceeding with these assumptions. Tell me if any need adjustment.

1.5 ECL Work Intake Rules

When generating ECL guidance for a target project, keep the process small enough to use:

| Intake type | Criteria | Required ECL handling |
|-------------|----------|-----------------------|
| Small Change | Local, low-risk edits such as copy, comments, style-only tweaks, or single-file bug fixes with no interface, data, permission, architecture, or release impact | Active change optional; still record the verification command in the final response or existing task notes |
| Structured Change | Cross-file/module behavior, APIs, data model, permissions, architecture, validation chain, unclear requirements, or work likely to exceed 20 minutes | Use active change files and require intake/spec/plan review before implementation |

Decision tree:

  1. If an active change already exists, keep using it; do not create a second active context.
  2. If the change is copy, comments, README text, formatting, or an obviously local single-file fix
    with no runtime, API, data, permission, architecture, or validation-chain impact, treat it as
    Small Change.
  3. If the change touches APIs, data, permissions, architecture, multiple modules, release/runtime
    behavior, or unclear requirements, treat it as Structured Change.
  4. If impact is unclear, do read-only investigation first. If uncertainty remains after inspection,
    ask one high-impact question or upgrade to Structured Change; do not assume Small Change.
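
The precedence of the four rules can be sketched as a function. The judgments themselves (whether impact is known, whether the change is structural) remain the agent's call; the function name `classify_intake` and its yes/no arguments are assumptions for illustration.

```shell
# Hypothetical helper encoding the decision-tree order.
# Args: active_change_exists impact_known structural_impact   (each "yes" or "no")
classify_intake() {
  local active=$1 impact_known=$2 structural=$3
  if [ "$active" = "yes" ]; then
    # Rule 1: never open a second active context.
    echo "Use existing active change"
  elif [ "$impact_known" = "no" ]; then
    # Rule 4: unclear impact never defaults to Small Change.
    echo "Investigate read-only, then ask or upgrade to Structured Change"
  elif [ "$structural" = "yes" ]; then
    echo "Structured Change"
  else
    echo "Small Change"
  fi
}
```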

For structured changes, support both common entry points:

  • Requirement-first input: extract target users/scenarios, evidence, success criteria,
    acceptance criteria, non-goals, constraints, assumptions, and risks into spec.md.
  • Plan-first input: treat the user's plan as a draft, split WHAT/WHY into spec.md and HOW into
    plan.md, then ask only about high-impact gaps that affect implementation direction or acceptance.
    If the plan is complete and does not conflict with repository evidence, do not repeat a full
    interview. If it conflicts with code, docs, commands, or existing harness constraints, record the
    conflict and return to Intake Review.

Questions are allowed and expected, but must be bounded: ask at most three high-impact questions per
round. Low-risk unknowns become assumptions; high-impact unknowns become
[NEEDS CLARIFICATION: ...] and block implementation until resolved.

For complex structured changes, use a lightweight iteration loop rather than treating the first
spec as final:

Draft Spec -> Draft Plan -> Review Gaps -> Revise Spec/Plan -> Gate -> Tasks

Default to at most two loops. If key gaps remain, continue up to five loops; after that, record a
blocker instead of implementing from guesses. plan.md must include any planning-discovered spec
gaps, because plans often expose missing acceptance, boundary, permission, data, or validation
requirements.
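
The loop budget can be read as a small state check. The sketch below is one interpretation, not the skill's implementation: the name `next_review_step`, the `none | minor | key` gap levels, and the rule that minor gaps become recorded assumptions once the default budget is spent are all assumptions.

```shell
# Hypothetical helper for the Draft -> Review -> Revise loop.
# Args: loops_done (integer), gaps (none | minor | key)
DEFAULT_LOOPS=2
MAX_LOOPS=5

next_review_step() {
  local loops_done=$1 gaps=$2
  case "$gaps" in
    none)
      echo "gate" ;;
    minor)
      # Assumption: low-risk gaps stop at the default budget and become assumptions.
      if [ "$loops_done" -lt "$DEFAULT_LOOPS" ]; then echo "revise"; else echo "gate"; fi ;;
    key)
      # Key gaps may extend the loop up to MAX_LOOPS, then become a recorded blocker.
      if [ "$loops_done" -lt "$MAX_LOOPS" ]; then echo "revise"; else echo "blocker"; fi ;;
    *)
      echo "unknown gaps value: $gaps" >&2
      return 2 ;;
  esac
}
```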


Phase 2: Analysis

Goal: Deeply understand codebase architecture, harness state, and environment requirements.

2.1 Execution Mode

Use subagents only when the user authorized delegation and the environment supports it. Otherwise, execute the same responsibilities inline.

If using subagents, assign:

  • Code architecture analysis: follow agents/analyzer.md; output harness/.analysis/architecture.json.
  • Harness state audit: follow agents/auditor.md; output harness/.analysis/audit.json.
  • Environment analysis: follow references/environment-detection-guide.md; output harness/.analysis/environment.json.

If working inline, produce the same three analysis artifacts or equivalent in-memory summaries before Phase 3.

2.2 Project Identity Extraction

For existing projects, extract target-project meaning before writing docs:

  • One-sentence project identity: what it does and for whom.
  • Core workflow or domain model: user/system flow, key entities, API resources, jobs, or commands.
  • Primary source entrypoints and where common changes belong.

Use README.md, manifests, entrypoints, routes/controllers, schemas/models, and key source
directories. Harness files are not sufficient evidence for project identity.

2.3 Adapter Selection

After detecting the tech stack, load the matching adapter before creating linters, scripts, CI,
or environment config. Adapter guidance overrides generic templates for language-specific details.

Detected stack Required adapter
TypeScript/Node.js references/adapters/typescript.md
Go references/adapters/go.md
Python references/adapters/python.md
Rust references/adapters/rust.md
Java references/adapters/java.md
Unknown/mixed references/adapters/generic.md plus any detected language adapters

For TypeScript/Node.js projects, prefer Node/TS-native outputs: scripts/lint-deps.mjs or
equivalent, scripts/lint-quality.mjs, npm/package-manager scripts, and Node/TS GitHub Actions.
Do not adapt Go linter or Makefile-only patterns to TypeScript unless the project is actually Go
or already uses Makefile as the primary command surface.

2.4 Command Surface Selection

Before creating ECL scripts, select the target project's command surface. Do not assume
PowerShell is the only Windows option. This selection is normally automatic; do not ask the user to
choose a script format unless project evidence conflicts or the user has already expressed a hard
constraint.

Priority:

  1. Existing project entrypoints: package-manager scripts, Makefile targets, README commands,
    or CI shell conventions.
  2. Explicit user/project constraints. If the project rejects .ps1, do not generate PowerShell
    as the only harness entrypoint.
  3. Bash profile when allowed. For Windows projects that accept Bash, generate .sh scripts and
    document the prerequisite: Git Bash, WSL, MSYS2, or a CI Linux runner.
  4. PowerShell profile when the project accepts Windows-native PowerShell. Keep it compatible with
    Windows PowerShell 5.1 and PowerShell 7.
  5. Node or Python profiles when those runtimes are already first-class project dependencies.

Default when evidence is sparse: for TypeScript/Node projects choose Node/package-manager scripts;
for Windows projects that allow Bash choose Bash profile and document Git Bash/WSL/MSYS2; otherwise
choose the adapter's native lightweight scripting profile.

All profiles must implement the same ECL invariants and command set. harness-change,
harness-evolve, lint-ecl, and lint-encoding may be implemented as .ps1, .sh, .mjs,
or .py, but docs, CI, Makefile/package scripts, and verification commands must use the chosen
entrypoint consistently.
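
The sparse-evidence default can be sketched as follows. It covers only that fallback, not the full priority list (existing entrypoints and explicit user constraints still come first). The function name `default_command_surface` and the returned labels are hypothetical.

```shell
# Hypothetical helper: default profile when project evidence is sparse.
# Args: tech (as detected in Phase 1.1), is_windows (yes|no), bash_allowed (yes|no)
default_command_surface() {
  local tech=$1 is_windows=$2 bash_allowed=$3
  if [ "$tech" = "TypeScript/Node.js" ]; then
    echo "node"            # package-manager scripts backed by .mjs files
  elif [ "$is_windows" = "yes" ] && [ "$bash_allowed" = "yes" ]; then
    echo "bash"            # document Git Bash, WSL, or MSYS2 as a prerequisite
  else
    echo "adapter-native"  # the adapter's own lightweight scripting profile
  fi
}
```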

2.5 For Empty Projects

Skip Phase 2 analysis agents. Instead:

  • Use templates from references/greenfield-templates.md
  • Base decisions on user's tech stack choice
  • Design a standard 3-layer architecture

2.6 Wait for Analysis Completion

When subagents are running, wait for their final reports. While waiting, you can:

  • Review any existing documentation
  • Prepare templates for Phase 4

Phase 3: Delta Synthesis

Goal: Merge analysis results and compute exactly what needs to be created/updated.

3.1 Read Analysis Results

cat harness/.analysis/architecture.json
cat harness/.analysis/audit.json
cat harness/.analysis/environment.json

3.2 Compute Delta

Create a delta list:

## Delta: What Needs to Be Done

### Core To Create (doesn't exist)
- [ ] AGENTS.md
- [ ] docs/ECL.md
- [ ] docs/STATUS.md
- [ ] docs/ARCHITECTURE.md
- [ ] scripts/lint-deps.go
- [ ] scripts/harness-change.{ps1|sh|mjs|py}
- [ ] scripts/harness-evolve.{ps1|sh|mjs|py}
- [ ] scripts/lint-ecl.{ps1|sh|mjs|py}
- [ ] scripts/lint-encoding.{ps1|sh|mjs|py}
- [ ] harness/changes/{active,parking,archive}
- [ ] harness/templates/change/
- [ ] harness/config/environment.json
- [ ] harness/evolution/{state.json,results.tsv,proposals/} (`pending.md` is generated later only when the archive threshold is reached)

### Optional Advanced (only if explicitly requested)
- [ ] harness/eval/ - agent evaluation datasets and runner inputs
- [ ] harness/trace/ - execution traces for agent runs
- [ ] harness/state/ - executor runtime state
- [ ] harness/checkpoints/ - resumable execution checkpoints
- [ ] harness/memory/ - long-term agent memory experiments
- [ ] harness/metrics/ - execution and quality metrics

### To Update (exists but has gaps)
- [ ] docs/DEVELOPMENT.md - missing build commands
- [ ] scripts/lint-quality.py - missing 3 packages in layer map

### Already Good (no changes needed)
- [x] Makefile - has all required targets
- [x] .github/workflows/ci.yml - properly configured
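
The "To Create" part of the delta can be derived mechanically by testing for each core path. This is a minimal sketch: the function name `print_core_delta` is hypothetical, and it checks only paths with fixed names, so scripts whose extension depends on the chosen profile are left out.

```shell
# Hypothetical helper: print_core_delta [ROOT]
# Prints a checklist line per core path: "[x]" when present, "[ ]" when missing.
print_core_delta() {
  local root=${1:-.} path
  for path in AGENTS.md docs/ECL.md docs/STATUS.md docs/ARCHITECTURE.md \
              harness/changes/active harness/changes/parking harness/changes/archive \
              harness/templates/change harness/config/environment.json \
              harness/evolution/state.json; do
    if [ -e "$root/$path" ]; then
      echo "- [x] $path"
    else
      echo "- [ ] $path"
    fi
  done
}
```

Running `print_core_delta .` from the repository root yields lines that can be pasted under the matching headings of the delta list.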

3.3 Confirm with User (if confirmation tool is available)

For significant changes:

{
  "question": "I've analyzed the codebase. Ready to proceed with these changes?",
  "header": "Confirm",
  "multiSelect": false,
  "options": [
    {"label": "Yes, proceed with all", "description": "Create/update all identified items"},
    {"label": "Show me the details first", "description": "I'll explain what each change involves"},
    {"label": "Only critical items", "description": "Just P0/P1 items, skip P2/P3 for now"}
  ]
}

Phase 4: Creation/Update

Goal: Create or update all harness files from the delta.

4.1 Execution Mode

Use subagents only when authorized and available. Otherwise, perform the same work inline. Keep write scopes disjoint if using parallel workers.

Creation responsibilities:

  • Documentation: follow agents/creator-docs.md; create/update AGENTS.md, docs/ECL.md, docs/STATUS.md, docs/ARCHITECTURE.md, docs/DEVELOPMENT.md, and design docs. AGENTS.md is the target project's entry map, not a harness creation record. Keep the first screen project-first, but preserve ECL/current-change priority in context loading: AGENTS.md -> docs/ECL.md -> active change if present -> auto-evolve pending if present -> otherwise docs/STATUS.md -> task-specific project docs.
  • Linters: follow agents/creator-linters.md; create/update dependency, quality, ECL, and encoding checks.
  • Config and scripts: follow agents/creator-config.md; create/update environment contract, harness scripts, changes directories/templates, lightweight evolution state, harness-change, harness-evolve, Makefile targets, and CI. Create advanced directories only when the confirmed scope requires them.

ECL change templates must include summary.md, spec.md, plan.md, tasks.md, and
reviews/review.md. spec.md captures WHAT/WHY, plan.md captures HOW and planning-discovered
spec gaps, and tasks.md is generated only after the spec/plan gate is ready enough for
implementation. Do not require old archived changes to contain plan.md; compatibility applies to
history.

Important: do not create static verification config such as harness/config/verify.json. Verification plans are generated at runtime by the executor from environment.json and the task context.

Strict CI rule: default CI must include normal business quality gates (lint, typecheck, test,
build, and backend/package-specific equivalents when available) plus harness checks. Do not remove
or skip business gates because the baseline is red. If the baseline was already red, explain that CI
will be red until the pre-existing project issues are fixed. Generate staged or relaxed CI only when
the user explicitly asks for it.

Command surface rule: create ECL scripts for the selected profile, not a hardcoded shell. If Bash is
selected on Windows, document Git Bash, WSL, MSYS2, or CI Linux shell requirements in the generated
environment/development docs. If PowerShell is selected, detect whether pwsh is available; if not,
use powershell -NoProfile -ExecutionPolicy Bypass. PowerShell templates must be compatible with
Windows PowerShell 5.1: avoid ambiguous overloads such as TrimStart(".\"), and avoid non-ASCII
mojibake marker string literals in .ps1; represent markers by Unicode codepoint or another
PS5-safe construction.

4.2 For Empty Projects: Also Create Business Code Plan

For empty projects, add one more agent:

Agent("create-exec-plan", prompt="""
Create execution plan for business code (harness-executor will implement this):

Tech stack: {TECH}
Project type: {from user choice}
Architecture: 3-layer (Types → Core → Entry Points)

Create: docs/exec-plans/active/bootstrap-code.md

Contents:
- Full source code for initial project structure
- main.go/index.ts/main.py entry point
- Basic types and core logic
- Test files

This is for harness-executor to implement - not ecl-harness-engineer's responsibility.
""")

4.3 Wait for Creation Completion

Agents will notify when done. Collect any issues they encountered.


Phase 5: Verification + Handoff

Goal: Ensure everything works, then hand off or present results.

5.1 Run Verification

# 0. Compare against the baseline snapshot
# Re-run the same existing lint/typecheck/test/build commands captured in Phase 1.

# 1. Harness checks pass
make verify-harness || npm run lint:harness || {generated_harness_lint_command}

# 2. Architecture linters pass
make lint-arch || npm run lint:arch

# 3. Business build/test gates run
go build ./... || npm run build || python -m compileall .

# 4. AGENTS.md size check
wc -l AGENTS.md  # Should be 80-120 lines

# 4b. AGENTS.md content gate
# Confirm it explains project identity, core workflow/domain model, source entrypoints,
# task-based verification, active-change-before-STATUS loading, and contains no
# ECL Harness Engineer internal boundary language.

# 5. All expected files exist
test -f AGENTS.md && echo "✓ AGENTS.md"
test -f docs/ARCHITECTURE.md && echo "✓ ARCHITECTURE.md"
test -f docs/ECL.md && echo "✓ ECL.md"
test -f docs/STATUS.md && echo "✓ STATUS.md"
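# ls copes with globs that match zero or several files; test -f does not
ls scripts/lint-deps* >/dev/null 2>&1 && echo "✓ lint-deps"
ls scripts/harness-change.* >/dev/null 2>&1 && echo "✓ harness-change"
ls scripts/lint-ecl.* >/dev/null 2>&1 && echo "✓ lint-ecl"
ls scripts/harness-evolve.* >/dev/null 2>&1 && echo "✓ harness-evolve"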
test -d harness/ && echo "✓ harness/"
test -d harness/changes && echo "✓ harness/changes"
test -f harness/evolution/state.json && echo "✓ evolution state"

# 6. Design docs exist (not just index)
find docs/design-docs -name "*.md" ! -name "index.md" | wc -l

Classify every verification result:

| Classification | Meaning |
|----------------|---------|
| Harness pass | Harness-created checks/files/scripts work |
| Pre-existing project failure | The same command failed in the Phase 1 baseline |
| New regression | The command passed in Phase 1 and fails after harness creation |
| Not available | The command/script does not exist in this project |
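
Attribution reduces to comparing the Phase 1 baseline status with the status after harness creation. The sketch below assumes each status is recorded as `pass`, `fail`, or `missing`; the function name `classify_result` and the two labels marked as assumptions are not part of the table.

```shell
# Hypothetical helper: classify_result BASELINE AFTER   (each: pass | fail | missing)
classify_result() {
  local baseline=$1 after=$2
  case "$baseline:$after" in
    *:missing)    echo "Not available" ;;
    fail:fail)    echo "Pre-existing project failure" ;;
    pass:fail)    echo "New regression" ;;
    missing:pass) echo "Harness pass" ;;      # check did not exist before the harness
    missing:fail) echo "Harness failure" ;;   # assumption: not a row in the table
    *:pass)       echo "Pass" ;;              # assumption: existing gate that passes now
    *)            echo "Unknown"; return 2 ;;
  esac
}
```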

AGENTS.md content gate:

  • A new agent can tell what the project does within 30 seconds.
  • The core product/system workflow or domain model is visible.
  • Main source entrypoints and task-to-directory mapping are visible.
  • Verification guidance maps to task type.
  • Context loading reads docs/ECL.md first, then active change when present.
  • If no active change exists and harness/evolution/pending.md exists, read it before
    docs/STATUS.md, mention it as pending maintenance, and ask whether to handle it now unless the
    user already prioritized the current task. Reading or asking does not start auto-evolve and must
    not block ordinary user work.
  • If no active change exists and no pending evolution exists, context loading reads docs/STATUS.md before task-specific project docs.
  • For structured work, docs/ECL.md explains Small Change vs Structured Change, bounded Intake
    Review, plan-first input handling, and the spec/plan review gate.
  • Archive history is loaded selectively through docs/STATUS.md paths or harness/changes/INDEX.json, starting with historical summary.md only.
  • No skill-internal boundary leaks, such as sections or sentences that describe this skill's own scope limits as target-project rules.
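
The context-loading order in the gate above can be sketched as a function that prints what an agent should read, in order. Two assumptions are made: the name `context_load_order` is hypothetical, and "an active change exists" is approximated as "harness/changes/active/ contains any entry", which would also count a placeholder file such as .gitkeep.

```shell
# Hypothetical helper: context_load_order [ROOT]
# Prints the files or directories to read, one per line, in loading order.
context_load_order() {
  local root=${1:-.}
  echo "AGENTS.md"
  echo "docs/ECL.md"
  if [ -d "$root/harness/changes/active" ] \
    && [ -n "$(ls -A "$root/harness/changes/active" 2>/dev/null)" ]; then
    # An active change is the authority; STATUS.md is skipped.
    echo "harness/changes/active/"
  else
    # Pending evolution is read before STATUS.md, as context only.
    if [ -f "$root/harness/evolution/pending.md" ]; then
      echo "harness/evolution/pending.md"
    fi
    echo "docs/STATUS.md"
  fi
  echo "task-specific project docs"
}
```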

5.2 STATUS.md Handoff Update

When a target project uses ECL changes, maintain docs/STATUS.md as a lightweight handoff file.
It is not the authority while an active change exists, but it becomes the default recent-history
entry point after the active change is closed.

Close-change handoff protocol:

  1. Before running harness-change close, read the active change summary.md, spec.md,
    plan.md, tasks.md, and relevant reviews/; update docs/STATUS.md with completed work,
    verification results, residual risks, and the next recommended resume point.
  2. Run the close command so the active change moves to harness/changes/archive/... and
    harness/changes/INDEX.json is rebuilt.
  3. After close, update docs/STATUS.md again with the final archive path, normally pointing to
    the archived summary.md.
  4. Run the harness lint command (npm run lint:harness, make verify-harness, or the generated
    ECL lint command) to confirm STATUS, ECL structure, and INDEX state are consistent.

Hooks and CI may validate docs/STATUS.md, but must not auto-write it or move changes.

5.3 Auto-Evolve Check

Core harnesses include lightweight auto-evolve by default. The script layer only detects when
enough new archive evidence exists and writes harness/evolution/pending.md; Codex performs the
semantic improvement pass.

Trigger model: harness-change close and harness-change reindex run harness-evolve check;
harness-change new only reminds when a pending note exists. Hooks and CI may warn, but must not
modify docs, scripts, STATUS, or changes.
Generated scripts do not call subagents. They only count archive evidence and create pending
context. When no active change exists and Codex notices pending maintenance, it should ask the user
whether to handle it now unless the user already prioritized the current task. Asking does not start
pending evolution.
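
The script-layer check can be sketched as below. Everything specific here is an assumption: the source does not show the threshold value, the schema of harness/evolution/state.json, or the pending.md format. The sketch therefore takes the threshold as an argument, counts archived summary.md files as evidence, and uses a hypothetical plain-text file `last_count` as a stand-in for state.json.

```shell
# Hypothetical helper: evolve_check ROOT THRESHOLD
# Counts archive evidence and writes pending.md once enough new changes exist.
evolve_check() {
  local root=$1 threshold=$2 total last new
  total=$(find "$root/harness/changes/archive" -type f -name summary.md 2>/dev/null | wc -l)
  total=$((total))   # strip any whitespace padding from wc
  last=0
  if [ -f "$root/harness/evolution/last_count" ]; then
    last=$(cat "$root/harness/evolution/last_count")
  fi
  new=$((total - last))
  if [ "$new" -ge "$threshold" ] && [ ! -f "$root/harness/evolution/pending.md" ]; then
    mkdir -p "$root/harness/evolution"
    printf '# Pending harness evolution\n\n%s new archived changes since the last pass.\n' "$new" \
      > "$root/harness/evolution/pending.md"
    echo "pending created"
  else
    echo "no action"
  fi
}
```

The function only counts and writes the reminder; it never edits docs, scripts, or changes, which matches the rule that the semantic pass is left to the agent.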

harness/evolution/pending.md is a maintenance reminder, not a hard lock. Reading it for context
does not start pending evolution. Pending evolution starts only when Codex creates or uses an
auto-evolve-harness-* change, writes an evolution proposal/result, or edits Harness files based
on the pending evidence. Once started, finish with a proposal, one harness/evolution/results.tsv
row, and harness-evolve mark-complete; otherwise park or close blocked, not completed.

Apply only the smallest evidence-backed delta that passes review. No independent scorer =
no auto-apply: user approval to handle pending implies permission to request an independent
auditor/subagent when the environment supports it. If the environment still requires explicit
authorization, ask once. If scoring is unavailable, declined, or still unauthorized after asking,
record noop with eval_mode=dry_run, keep the proposal, run mark-complete, and stop.
Machinery repair (harness-evolve, pending templates, lint) does not complete pending evolution by
itself; after repair, still evaluate candidate archives or leave the work parked/blocked.

Detailed proposal format, scoring weights, status values, and complexity budget live in
references/ecl-harness.md.

5.4 Present Summary

## Harness Infrastructure Complete

**Project**: {project-name}
**Tech Stack**: {TECH}
**Files Created/Updated**: {count}

### Created Files
- AGENTS.md ({N} lines)
- docs/ARCHITECTURE.md
- docs/ECL.md
- docs/STATUS.md
- docs/DEVELOPMENT.md
- docs/design-docs/{component}.md
- scripts/lint-deps.{ext}
- scripts/lint-quality.{ext}
- scripts/harness-change.{ps1|sh|mjs|py}
- scripts/lint-ecl.{ps1|sh|mjs|py}
- scripts/lint-encoding.{ps1|sh|mjs|py}
- scripts/harness-evolve.{ps1|sh|mjs|py}
- harness/config/environment.json
- harness/changes/
- harness/evolution/
- harness/templates/change/
- Makefile

### Verification Results
- Harness checks: ✓
- Architecture checks: ✓
- Business gates: ✓ or pre-existing failures listed below
- AGENTS.md size: ✓ ({N} lines)

### Pre-existing Project Failures
- {List baseline-red commands and short reasons, or "None observed."}

### New Regressions Introduced By Harness
- {List commands that passed before and failed after, or "None observed."}

### Next Steps
{For empty projects: "Run harness-executor to implement business code from docs/exec-plans/active/bootstrap-code.md"}
{For existing projects: "The harness is ready. AI agents can now use AGENTS.md as their entry point."}

5.5 Automatic Handoff (for Empty Projects)

If this was an empty project with a bootstrap exec-plan, invoke harness-executor:

Skill(skill="harness-executor")

With context: "Implement the bootstrap exec-plan at docs/exec-plans/active/bootstrap-code.md"


Core Principles

1. Repository as Single Source of Truth

Agents cannot access Slack, Google Docs, or tribal knowledge. If it's not in the repository, it doesn't exist for the agent.

2. AGENTS.md is a Map, Not a Manual

Keep it 80-120 lines. Link to detailed docs, don't embed them.
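
The size rule is easy to enforce mechanically. A minimal sketch, assuming a hypothetical function name `check_agents_size` and treating the 80-120 range as inclusive:

```shell
# Hypothetical helper: check_agents_size [FILE]
# Returns 0 when FILE has 80-120 lines, 1 when out of range, 2 when missing.
check_agents_size() {
  local file=${1:-AGENTS.md} lines
  if [ ! -f "$file" ]; then
    echo "missing: $file"
    return 2
  fi
  lines=$(wc -l < "$file")
  lines=$((lines))   # strip any whitespace padding from wc
  if [ "$lines" -ge 80 ] && [ "$lines" -le 120 ]; then
    echo "ok: $file has $lines lines"
  else
    echo "fail: $file has $lines lines (expected 80-120)"
    return 1
  fi
}
```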

3. Enforce Invariants Mechanically

Linter errors must be agent-actionable:

✗ BAD: "Forbidden import in core/types/user.go"

✓ GOOD: "core/types/user.go:15 imports core/config (layer 0 → layer 2).
         Layer 0 packages must have NO internal dependencies.

         Fix options:
         1. Move config-dependent logic to a higher layer
         2. Pass the config value as a parameter
         3. Use dependency injection via an interface"
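
A linter can produce messages in the GOOD shape from a single reporting function. This is a sketch only: the function name `report_layer_violation` and its arguments are hypothetical, and the fix options simply echo the example above.

```shell
# Hypothetical helper: report_layer_violation FILE LINE IMPORTED FROM_LAYER TO_LAYER
# Prints location, the violated rule, and concrete fix options for the agent.
report_layer_violation() {
  local file=$1 line=$2 imported=$3 from_layer=$4 to_layer=$5
  cat <<EOF
$file:$line imports $imported (layer $from_layer -> layer $to_layer).
Layer $from_layer packages must not depend on layer $to_layer.

Fix options:
1. Move the dependent logic to a higher layer
2. Pass the needed value as a parameter
3. Use dependency injection via an interface
EOF
}
```

For example, `report_layer_violation core/types/user.go 15 core/config 0 2` reproduces the location line of the GOOD message.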

4. Build to Delete

Every component should be replaceable. Capabilities that required complex pipelines yesterday may be single prompts tomorrow.

5. Start Simple

Atomic, well-documented tools > complex agent choreography. Don't over-engineer.

6. Change State Is Explicit

Use a single harness/changes/active/ task for personal development. Move paused work to parking/ and closed work to archive/ with the generated scripts/harness-change.* command. Maintain docs/STATUS.md as the soft handoff summary after active work is closed. Never hand-edit harness/changes/INDEX.json; it is a generated index rebuilt by park, close, resume, and reindex. Structured changes use spec.md for WHAT/WHY, plan.md for HOW, and tasks.md for executable work.

7. Harness Evolves From Evidence

Every few closed changes, the generated scripts/harness-evolve.* check command may create
harness/evolution/pending.md. Treat it as a maintenance reminder to improve harness rules from
real archived evidence, not as a hard blocker for unrelated user work. If you start acting on the
pending evidence, first refresh harness/changes/INDEX.json and use the current eligible archive
window; the Candidate Archives in an old pending file are a trigger snapshot, not the only evidence.
Then finish with proposal + results.tsv + mark-complete, or park/block the work.
Do not turn one-off business bugs into permanent process. Keep only changes that improve the audit
score and pass validation.


Reference Files

| File | When to Read | Contents |
|------|--------------|----------|
| references/greenfield-templates.md | Empty projects (Phase 2.5) | Complete Go/TS/Python scaffolding |
| references/documentation-templates.md | Phase 4 doc creation | Doc templates with numbered sections |
| references/linter-templates.md | Phase 4 linter creation | Linter code templates per language |
| references/ecl-harness.md | ECL-aware harness creation | docs/ECL.md, docs/STATUS.md, change lifecycle, INDEX.json, PowerShell script templates |
| references/darwin-eval-prompts.md | Skill quality evaluation | Dry-run prompts for darwin-skill review |
| references/environment-detection-guide.md | Phase 2 env analysis | Environment ecosystem detection |
| references/environment-config-guide.md | Phase 4 config creation | Startup, services, env vars, user-confirmation templates |
| references/adapters/typescript.md | TypeScript/Node.js projects | npm scripts, Node linters, package-manager detection, CI defaults |
| references/adapters/{go,python,rust,java,generic}.md | Matching detected stacks | Language-specific commands and conventions |

Agent prompts for Phase 2 and Phase 4 subagents are in agents/.

For small projects (< 20 files) or when subagents aren't available, execute phases inline instead of spawning agents.
