Prompt Chain

Migrate Legacy Code with Sandboxed Agents

Multi-step prompt workflow that migrates legacy code with an OpenAI agent whose harness runs outside sandboxed execution environments, validating each repo shard with tests.

Works with docker, openai

74
Spark score
out of 100
Updated 17 days ago
Version 1.0.0

Why it matters

Modernize legacy codebases by breaking down large migration tasks into smaller, manageable shards. This asset automates code updates, validation, and patching within isolated sandbox environments for safer and more efficient code modernization.

Outcomes

What it gets done

01

Segment large code migration projects into task-sized repository shards.

02

Execute code edits and validation checks within isolated, secure sandbox environments.

03

Automate the process of applying patches and running tests for each code shard.

04

Generate typed migration reports and audit logs for each completed task.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/oai-sandboxedcodemigrationagent | bash

Steps

Steps in the chain

01
Define the migration tasks

Create a task list that points each migration shard at a local repo. Each task should have a name and a repo_path. The host harness mounts each task's repo into a fresh sandbox as /workspace/repo and asks the agent to follow that repo's MIGRATION.md. To adapt this to your own codebase, replace the task's repo_path and edit its MIGRATION.md.

02
Verify the baseline before the agent edits anything

Before the agent edits a repo, run the same tests yourself. The agent will run this baseline test command inside that task's sandbox before it changes any files.

03
Stage the sandbox workspace

Create a manifest that tells the sandbox client which host files to stage and where they should appear in the execution environment. Copy the agent instructions and one task repo into /workspace. For a real migration, stage only the target checkout, task instructions, and files required for that run. Keep credentials, customer storage, and memory in the host harness.

04
Define the sandbox agent

Create an agent with two sandbox-facing capabilities: Shell() for terminal work and ApplyPatch() for file edits. Keep everything else in the definition with the host harness: instructions, model settings, MCP servers, and the typed output contract.

05
Connect a host-side MCP server (optional)

Because the harness runs outside the sandbox, connect MCP servers from the trusted host process. The sandbox does not need MCP credentials or broad network access. Keep this deterministic for migrations by fetching the approved migration guide instead of asking the agent to search the docs during every run.

06
Run the migration campaign

Execute a host-side loop over migration tasks. For each task, build the manifest and agent, create a fresh sandbox session, and pass that session into Runner.run_streamed. After the task finishes, write the returned patch bundle under outputs/<task_name>/ and delete the sandbox before starting the next task.

07
Inspect returned artifacts

The host runner writes each task's typed result to disk. The sandboxes can disappear after the run; each task's report, patch, JSON result, and audit log remain in outputs/<task_name>/, with a campaign summary at outputs/batch_summary.json.

08
Validate the generated artifacts (optional)

Check each returned patch, typed result, and audit log before showing a patch to a user or applying it to a real repo. This eval is deterministic: it reads the campaign outputs and fails if any task did not produce the expected contract.

09
Swap sandbox providers (optional)

Choose between three sandbox backends: Docker for local runs, E2B for a hosted sandbox, and Cloudflare for a hosted worker-backed sandbox. The pattern is the same: change the sandbox client, not the agent. Pass the --backend flag to the CLI to select docker, e2b, or cloudflare.

Overview

Migrate a Legacy Codebase with Sandbox Agents

What it does

Prompt chain that orchestrates code migration agents using the OpenAI Agents SDK with sandbox isolation: the harness runs in a trusted host process while file edits and shell commands execute in Docker, E2B, or Cloudflare sandboxes.

How it connects

Use when modernizing large codebases incrementally: split work into reviewable repo shards, validate each with tests inside isolated execution environments, and return patch bundles that can be opened as separate pull requests.

Source README

Migrate a Legacy Codebase with Sandbox Agents

Code modernization never really ends. Outdated dependencies, security risks, compliance pressure, and legacy patterns keep accumulating across large codebases, and one massive migration PR is hard to review and risky to merge. A code-migration agent should work in a controlled environment, one scoped task at a time: inspect the relevant repo, edit files, run checks, and return a patch.

This cookbook uses the Agents SDK with the harness outside the sandbox: orchestration stays in the trusted host process, while shell commands and file edits run in isolated execution environments. This separation lets the host harness use secrets, tools, and external services while giving the sandbox only the files and commands needed for the task.

By the end of this cookbook, you'll be able to:

  • Keep the agent harness outside the execution environment that runs shell commands and file edits
  • Segment a modernization job into task-sized repo shards
  • Validate each shard with tests, checks, artifacts, and an audit log
  • Swap sandbox providers without rewriting the agent

The example is a two-service code migration. Each service runs in its own sandbox and returns its own patch bundle, the same shape you could use to open separate pull requests for review and CI. In each sandbox, the agent migrates an OpenAI client wrapper from Chat Completions to the Responses API. Along the way it runs tests, patches the app and tests, runs a compile check, reruns tests, and returns a typed migration report with a patch.

We'll run the sandbox with Docker locally. Provider-specific code is isolated to sandbox creation, so the same harness and agent can point at hosted sandbox providers such as E2B or Cloudflare without changing the SandboxAgent, tools, manifest, or prompt.

Architecture: sandbox as a tool

Figure: Agents SDK harness running outside a swappable sandbox.

The trusted host process owns the Agents SDK harness, tools, MCP servers, credentials, policy, and audit. The execution environment receives only the scoped workspace for the current task and the sandbox-facing capabilities needed to run commands and edit files.

Another pattern is to launch a coding agent whose harness, agent loop, tools, and filesystem all live inside the sandbox. That can work, but it pushes orchestration and tool integration into the same environment that runs generated code.

In this pattern, the sandbox is something the harness calls when the agent needs a filesystem, terminal command, test run, or patch. The broader agent stack stays on the harness side.

Flow:

  1. Your app receives a migration request and splits it into task-sized repo shards.
  2. For each task, the host-side Agents SDK harness starts an agent and creates a fresh sandbox.
  3. The harness stages that task's repo and migration brief into the sandbox.
  4. The agent uses sandbox tools to inspect files, edit code, and run tests.
  5. The host receives the task's report and patch, writes an audit log, deletes the sandbox, and moves to the next task.
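
The flow above can be sketched in plain Python without the Agents SDK. This is only an illustration of the control flow: FakeSandbox, run_agent_in_sandbox, and run_campaign are hypothetical names, and the "agent" here returns a canned report instead of calling a model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSandbox:
    """Hypothetical stand-in for a sandbox session; holds staged files in memory."""
    files: dict = field(default_factory=dict)
    deleted: bool = False

    def delete(self) -> None:
        self.deleted = True


def run_agent_in_sandbox(task_name: str, sandbox: FakeSandbox) -> dict:
    """Hypothetical agent step: pretend to edit the repo and return a report."""
    return {"task": task_name, "patch": f"--- patch for {task_name} ---", "status": "ok"}


def run_campaign(tasks: dict) -> tuple:
    """Run one fresh sandbox per task and tear it down before the next task."""
    results, audit_log, sandboxes = [], [], []
    for task_name, repo_files in tasks.items():
        sandbox = FakeSandbox(files=dict(repo_files))  # stage only this task's files
        sandboxes.append(sandbox)
        try:
            results.append(run_agent_in_sandbox(task_name, sandbox))
            audit_log.append({"task": task_name, "event": "task_finished"})
        finally:
            sandbox.delete()  # always delete, even if the agent step raised
    return results, audit_log, sandboxes
```

The try/finally mirrors step 5: the sandbox is removed whether or not the task succeeds, so no state leaks from one shard into the next.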

Requirements

  • Python 3.10+
  • Docker, running locally for the Docker sandbox example
  • An OpenAI API key, exported as OPENAI_API_KEY
  • The OpenAI Agents SDK with sandbox support
  • Optional: hosted sandbox provider credentials, such as E2B_API_KEY for E2B or a Cloudflare Worker URL and API key for Cloudflare

Keep API keys in the host environment. Do not add them to the mounted repo or sandbox manifest.
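
One way to enforce this on the host is to fail early when the key is missing and to filter secret-looking variables out of any environment you hand to a sandbox. The sketch below is an assumption, not part of the cookbook: require_host_key, sandbox_safe_env, and the suffix deny-list are hypothetical.

```python
# Hypothetical deny-list of suffixes that usually mark secrets.
SECRET_SUFFIXES = ("_API_KEY", "_TOKEN", "_SECRET", "_PASSWORD")


def require_host_key(env: dict, name: str = "OPENAI_API_KEY") -> str:
    """Return the key from the host environment or fail with a clear message."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set in the host environment")
    return value


def sandbox_safe_env(env: dict) -> dict:
    """Drop secret-looking variables before passing any environment to a sandbox."""
    return {k: v for k, v in env.items() if not k.upper().endswith(SECRET_SUFFIXES)}
```

In a real harness you would call these with dict(os.environ). A suffix filter is a coarse heuristic, so an explicit allow-list is the safer choice when you know exactly which variables the sandbox needs.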

Install dependencies

Clone the cookbook and move into this example directory:

git clone https://github.com/openai/openai-cookbook.git
cd openai-cookbook/examples/agents_sdk/sandboxed-code-migration

Open sandboxed_code_migration_agent.ipynb from that directory and install the dependencies below. Start Docker before running the full agent demo.

Import the host-side harness

Import the small host-side runner used by this example. It creates sandbox sessions, starts the agent loop, writes returned artifacts, records an audit log, and keeps provider credentials out of the mounted repo. The full file is included at src/run_migration_agent.py; the notebook pulls the important pieces into the walkthrough below.

1. Define the migration tasks

This cookbook includes two small fixture repos in repo_fixtures/. If you run the notebook as-is, the host harness mounts each fixture repo into a fresh sandbox as /workspace/repo and asks the agent to follow that repo's MIGRATION.md.

The task list points each migration shard at a local repo:

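from dataclasses import dataclass
from pathlib import Path

# EXAMPLE_ROOT is the root of this example directory; it is defined in
# src/run_migration_agent.py and is not shown in this excerpt.


@dataclass(frozen=True)
class MigrationTask:
    name: str
    repo_path: Path

    @property
    def migration_brief_path(self) -> Path:
        # Each task repo carries its own brief at the top level.
        return self.repo_path / "MIGRATION.md"


DEFAULT_MIGRATION_TASKS = (
    MigrationTask(
        name="support_reply_service",
        repo_path=EXAMPLE_ROOT / "repo_fixtures" / "support_reply_service",
    ),
    MigrationTask(
        name="case_summary_service",
        repo_path=EXAMPLE_ROOT / "repo_fixtures" / "case_summary_service",
    ),
)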

To adapt this to your own codebase, replace the task's repo_path and edit its MIGRATION.md. The generic run prompt can stay the same because it tells the agent to follow the mounted repo's brief.
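
If your repos sit side by side under one directory, the task list could be built by scanning for briefs instead of being written by hand. This is a sketch under that assumption: discover_tasks is a hypothetical helper, and MigrationTask is repeated here only so the block runs on its own.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MigrationTask:
    name: str
    repo_path: Path

    @property
    def migration_brief_path(self) -> Path:
        return self.repo_path / "MIGRATION.md"


def discover_tasks(root: Path) -> tuple:
    """Build one task per child directory of root that contains a MIGRATION.md."""
    tasks = []
    for child in sorted(root.iterdir()):
        if child.is_dir() and (child / "MIGRATION.md").is_file():
            tasks.append(MigrationTask(name=child.name, repo_path=child))
    return tuple(tasks)
```

Directories without a brief are skipped, which keeps a repo out of the campaign until someone has written down what its migration should do.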

Inspect one task before the agent touches it. Start with the migration brief, then look at the two main code targets: the OpenAI client wrapper and the application call site that uses it.

2. Verify the baseline before the agent edits anything

Before the agent edits a repo, run the same tests yourself. The agent will run this baseline test command inside that task's sandbox before it changes any files.

3. Stage the sandbox workspace

The manifest is the sandbox boundary. It tells the sandbox client which host files to stage and where they should appear in the execution environment. Here, we copy the agent instructions and one task repo into /workspace.

For a real migration, stage only the target checkout, task instructions, and files required for that run. Keep credentials, customer storage, and memory in the host harness. The helper below stays small for that reason: it stages only the shared agent instructions and the repo for the current migration task.
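
To make the boundary concrete, here is a stdlib-only sketch of a staging plan. It is not the SDK's manifest type: build_staging_plan, the BLOCKED_NAMES deny-list, and the AGENTS.md file name for the instructions are all assumptions. Only the /workspace and /workspace/repo locations come from this cookbook.

```python
from pathlib import Path, PurePosixPath

# Hypothetical deny-list of file names that should never be staged.
BLOCKED_NAMES = {".env", ".netrc", "credentials.json", "id_rsa"}


def build_staging_plan(instructions: Path, repo: Path) -> dict:
    """Map sandbox paths under /workspace to the host files staged there."""
    workspace = PurePosixPath("/workspace")
    # AGENTS.md is an assumed name for the shared agent instructions.
    plan = {workspace / "AGENTS.md": instructions}
    for path in sorted(repo.rglob("*")):
        rel = path.relative_to(repo)
        if not path.is_file() or ".git" in rel.parts:
            continue  # skip directories and version-control internals
        if path.name in BLOCKED_NAMES:
            raise ValueError(f"refusing to stage secret-looking file: {path}")
        plan[workspace / "repo" / rel.as_posix()] = path
    return plan
```

Raising on a blocked file, rather than silently skipping it, surfaces a checkout that contains credentials before anything is copied into the execution environment.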

4. Define the sandbox agent

The agent gets two sandbox-facing capabilities: Shell() for terminal work and ApplyPatch() for file edits. Everything else in the definition stays with the host harness: instructions, model settings, MCP servers, and the typed output contract.
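
The typed output contract is what lets the host reject a malformed result instead of guessing. The cookbook's actual output type is not shown in this excerpt, so the sketch below is a stand-in built on a stdlib dataclass: MigrationReport, its field names, and parse_report are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MigrationReport:
    """Hypothetical typed result a migration agent could be asked to return."""
    task_name: str
    summary: str
    patch: str
    tests_passed: bool
    commands_run: list = field(default_factory=list)


def parse_report(payload: dict) -> MigrationReport:
    """Reject payloads with missing, unknown, or wrongly typed fields."""
    report = MigrationReport(**payload)  # raises TypeError on unknown or missing keys
    if not isinstance(report.tests_passed, bool):
        raise TypeError("tests_passed must be a bool")
    if not isinstance(report.patch, str) or not report.patch.strip():
        raise ValueError("patch must be a non-empty string")
    return report
```

Keeping the contract on the host side means the check runs in trusted code, regardless of what happened inside the sandbox.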

Optional: connect a host-side MCP server

Because the harness runs outside the sandbox, it can connect MCP servers from the trusted host process. The sandbox does not need MCP credentials or broad network access.

This runner can optionally connect the public OpenAI docs MCP from the host harness. The agent can use that docs context while shell commands and patches still run in the sandbox.

Keep this deterministic for migrations. Fetch the approved migration guide instead of asking the agent to search the docs during every run.
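
One possible way to pin the approved guide is to fetch it once, store it next to the harness, and verify its digest before every run. This is a sketch under that assumption: load_pinned_guide is a hypothetical helper, and nothing here talks to an MCP server.

```python
import hashlib
from pathlib import Path


def load_pinned_guide(path: Path, expected_sha256: str) -> str:
    """Return the guide text only if it matches the approved digest."""
    data = path.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    if digest != expected_sha256:
        raise ValueError(f"guide digest {digest} does not match the approved version")
    return data.decode("utf-8")
```

With the digest recorded alongside the campaign, every task in the batch is migrated against the same guide text, and a changed guide shows up as an explicit failure rather than a silent behavior shift.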

5. Run the migration campaign

The full run is a host-side loop over migration tasks. For each task, the harness builds the manifest and agent, creates a fresh sandbox session, and passes that session into Runner.run_streamed. After the task finishes, the host writes the returned patch bundle under outputs/<task_name>/ and deletes the sandbox before starting the next task.

# Build the per-task manifest and agent on the host, then create a fresh sandbox.
manifest = build_manifest(task)
agent = build_agent(model=model, manifest=manifest, mcp_servers=mcp_servers)
client, session = await create_sandbox(backend, manifest, docker_image=docker_image)

try:
    async with session:
        # run_streamed returns a streaming result; its events are consumed below.
        result = Runner.run_streamed(
            agent,
            [{"role": "user", "content": f"Task name: {task.name}\n\n{prompt}"}],
            max_turns=30,
            run_config=RunConfig(
                sandbox=SandboxRunConfig(session=session),
                workflow_name=f"Sandboxed code migration: {task.name} ({backend})",
                tracing_disabled=not enable_hosted_tracing,
            ),
        )
        # Record every tool call and tool output in the host-side audit log.
        async for event in result.stream_events():
            if event.type == "run_item_stream_event" and event.name in {"tool_called", "tool_output"}:
                append_audit_event(audit_log_path, {"event": event.name})
finally:
    # Delete the sandbox even if the run fails, so the next task starts clean.
    await client.delete(session)

The snippet above is the core of run_migration_task; the runnable cell below calls the full helper. The run is guarded so the notebook can execute without calling the model. Change RUN_FULL_AGENT_DEMO to True when you want to run the Docker-backed migration end to end.
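
The snippet calls append_audit_event(audit_log_path, {...}) without showing its body. A minimal version could append one JSON object per line, as sketched here. The JSON Lines format, the ts field, and the read_audit_events helper are assumptions, not the cookbook's implementation.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_audit_event(audit_log_path: Path, event: dict) -> None:
    """Append one timestamped JSON object per line (JSON Lines)."""
    audit_log_path.parent.mkdir(parents=True, exist_ok=True)
    record = {"ts": datetime.now(timezone.utc).isoformat(), **event}
    with audit_log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")


def read_audit_events(audit_log_path: Path) -> list:
    """Load the log back as a list of dicts for inspection."""
    with audit_log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

An append-only log written by the host survives sandbox deletion and can be replayed later to see which tools the agent called and in what order.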

6. Inspect returned artifacts

The host runner writes each task's typed result to disk. The sandboxes can disappear after the run; each task's report, patch, JSON result, and audit log remain in outputs/<task_name>/, with a campaign summary at outputs/batch_summary.json.

Optional: validate the generated artifacts

The host can check each returned patch, typed result, and audit log before showing a patch to a user or applying it to a real repo. This eval is deterministic: it reads the campaign outputs and fails if any task did not produce the expected contract.
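
A deterministic check of this kind can be written with the standard library alone. In the sketch below, validate_outputs is a hypothetical helper and the artifact file names are placeholders; match them to whatever your runner writes under outputs/<task_name>/.

```python
import json
from pathlib import Path

# Hypothetical artifact file names; match these to what your runner writes.
REQUIRED_FILES = ("migration.patch", "result.json", "audit.jsonl")


def validate_outputs(outputs_dir: Path, task_names: list) -> list:
    """Return a list of human-readable problems; an empty list means success."""
    problems = []
    for name in task_names:
        task_dir = outputs_dir / name
        for filename in REQUIRED_FILES:
            artifact = task_dir / filename
            if not artifact.is_file() or artifact.stat().st_size == 0:
                problems.append(f"{name}: missing or empty {filename}")
        result_path = task_dir / "result.json"
        if result_path.is_file() and result_path.stat().st_size > 0:
            try:
                json.loads(result_path.read_text(encoding="utf-8"))
            except json.JSONDecodeError:
                problems.append(f"{name}: result.json is not valid JSON")
    return problems
```

Returning every problem, instead of stopping at the first, gives a reviewer the full picture for the campaign in one pass. A CI job can then fail whenever the list is non-empty.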

7. Optional: swap sandbox providers

This section shows three sandbox backends: Docker for local runs, E2B for a hosted sandbox, and Cloudflare for a hosted worker-backed sandbox. The pattern is the same: change the sandbox client, not the agent.

Docker:

import docker  # Docker SDK for Python; from_env() reads the local Docker settings

client = DockerSandboxClient(docker.from_env())
session = await client.create(
    manifest=manifest,
    options=DockerSandboxClientOptions(image="python:3.14-slim"),
)

E2B:

client = E2BSandboxClient()
session = await client.create(
    manifest=manifest,
    options=E2BSandboxClientOptions(
        sandbox_type=E2BSandboxType.E2B,
    ),
)

Run E2B from the CLI:

export E2B_API_KEY="..."
python src/run_migration_agent.py --backend e2b

Cloudflare:

import os

client = CloudflareSandboxClient()
session = await client.create(
    manifest=manifest,
    options=CloudflareSandboxClientOptions(
        # The worker URL is required; the API key is optional and may be None.
        worker_url=os.environ["CLOUDFLARE_SANDBOX_WORKER_URL"],
        api_key=os.environ.get("CLOUDFLARE_SANDBOX_API_KEY"),
    ),
)

Run Cloudflare from the CLI:

export CLOUDFLARE_SANDBOX_WORKER_URL="https://..."
export CLOUDFLARE_SANDBOX_API_KEY="..."
python src/run_migration_agent.py --backend cloudflare

Run Docker from the CLI:

python src/run_migration_agent.py --backend docker
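
The "change the sandbox client, not the agent" pattern can be sketched as a small registry keyed by the --backend value. The factories below return placeholder strings rather than real sandbox clients, and BACKENDS, parse_args, create_client, and the docker default are assumptions about how such a CLI might be wired.

```python
import argparse


# Hypothetical factories; in the cookbook each would build the matching sandbox client.
def _docker_client() -> str:
    return "docker sandbox client"


def _e2b_client() -> str:
    return "e2b sandbox client"


def _cloudflare_client() -> str:
    return "cloudflare sandbox client"


BACKENDS = {"docker": _docker_client, "e2b": _e2b_client, "cloudflare": _cloudflare_client}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run the migration campaign.")
    # Assumed default: docker, since the walkthrough runs Docker locally.
    parser.add_argument("--backend", choices=sorted(BACKENDS), default="docker")
    return parser.parse_args(argv)


def create_client(backend: str):
    """Look up the factory by name; the agent definition never changes."""
    return BACKENDS[backend]()
```

Because provider selection is a single dictionary lookup, adding a fourth backend means adding one factory and one registry entry, with no change to the agent, tools, manifest, or prompt.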

Production notes

Production code should keep orchestration, execution, data access, and returned outputs behind separate trust boundaries. Treat each migration task as its own unit of review.

Boundary    | Production pattern
Harness     | Keep orchestration, tools, credentials, policy, and audit on the host.
Sandbox     | Stage only the task workspace. Run commands and edits there. Tear it down after the task.
Data access | Route customer storage and network access through the host, not directly through the sandbox.
Output      | Validate sandbox output in the host before showing it to users or applying changes.

Tracing and ZDR

This example disables hosted tracing per run with RunConfig.tracing_disabled=True. To opt in while running this cookbook's CLI, pass --enable-hosted-tracing. The Agents SDK also supports the global OPENAI_AGENTS_DISABLE_TRACING=1 environment variable when you want tracing disabled process-wide.
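
The interaction between the flag and the environment variable can be sketched as a single function that computes the value passed as tracing_disabled. The precedence shown here (the process-wide variable wins over the flag) is an assumption for illustration, and resolve_tracing_disabled is a hypothetical helper.

```python
import argparse


def resolve_tracing_disabled(argv: list, env: dict) -> bool:
    """Tracing stays off unless the flag opts in and the env var does not veto it."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable-hosted-tracing", action="store_true")
    args, _unknown = parser.parse_known_args(argv)
    if env.get("OPENAI_AGENTS_DISABLE_TRACING") == "1":
        return True  # assumed precedence: the process-wide switch wins
    return not args.enable_hosted_tracing
```

Defaulting to disabled means a run only sends traces to the hosted service when someone asks for it explicitly.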

Next steps

To adapt this pattern, replace migration_tasks with your own repos, packages, or services. Give each task one checkout and one MIGRATION.md. Keep the validation commands explicit, and return a patch for review instead of applying host-side changes automatically. Add deterministic evals that check the migration contract, not just whether the test suite passes.

When the job spans many packages, such as a Jest-to-Vitest migration, the host harness can use repo metadata or a manager agent to plan shards. Each shard should produce the same thing this cookbook produces: a validated patch, report, and audit trail from an isolated sandbox.
