Prompt Chain

Generate and Review Complex Code

Name: Generate and Review Complex Code
Availability: OnlineOnly
Author: Promptfoo

Demonstrates Claude Opus 4.6's coding and reasoning capabilities for complex software engineering tasks.

Updated yesterday

Version code-scan-action-0.1

Models

Claude 3 Opus

Why it matters

Leverage state-of-the-art coding and reasoning to tackle complex software engineering tasks, including handling ambiguity and performing tradeoff analysis.

Outcomes

What it gets done

Generate high-quality code for complex problems.

Analyze and review code for quality and correctness.

Debug and resolve issues in software development.

Evaluate tradeoffs in software design and implementation.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-opus-4-6-coding | bash

Capabilities

What this chain does

Generate code

Writes source code or scripts from a description.

Review code

Analyzes code for bugs, style issues, and improvements.

Debug

Traces errors to their root cause and suggests fixes.

Overview

Opus 4.6 Coding

What it does

This prompt chain demonstrates Claude Opus 4.6's coding and reasoning capabilities, specifically its ability to handle complex software engineering tasks with ambiguity and tradeoff analysis.

How it connects

Use this to explore and understand how Claude Opus 4.6 handles complex software engineering tasks that involve ambiguity and tradeoff analysis.

Source README

anthropic/opus-4-6-coding (Claude Opus 4.6 Advanced Coding)

This example demonstrates Claude Opus 4.6's state-of-the-art coding and reasoning capabilities, showcasing its ability to handle complex software engineering tasks with ambiguity and tradeoff analysis.

You can run this example with:

npx promptfoo@latest init --example anthropic/opus-4-6-coding
cd anthropic/opus-4-6-coding

What This Tests

Claude Opus 4.6 is positioned as a leading model for coding, agents, and computer use. This example evaluates:

Complex code analysis: Understanding multi-file codebases and architectural decisions
Bug diagnosis: Identifying root causes in complex, multi-system scenarios
Ambiguity handling: Making informed decisions when requirements are unclear
Tradeoff reasoning: Evaluating different approaches and explaining pros/cons
Code generation: Writing high-quality, production-ready code

Features Demonstrated

State-of-the-art coding: Opus 4.6 achieves the highest score on SWE-bench Verified among frontier models
Reasoning about tradeoffs: The model excels at analyzing different approaches and making informed decisions
Handling ambiguity: Opus 4.6 makes reasonable assumptions when requirements are underspecified, rather than needing step-by-step guidance
Extended thinking: Support for thinking budgets up to 128K tokens for complex reasoning
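
As a rough illustration of the extended thinking feature listed above, the sketch below writes a small promptfoo config that enables a thinking budget. It is not the example's actual configuration: the helper name write_thinking_config, the model ID, the prompt, the test case, and the budget values are all assumptions chosen for illustration.

```shell
# Hypothetical helper (not part of the example): writes an illustrative
# promptfoo config to the given path (default: promptfooconfig.yaml).
# The model ID, prompt, test case, and token values are assumptions.
write_thinking_config() {
  cat > "${1:-promptfooconfig.yaml}" <<'EOF'
description: Extended thinking sketch (illustrative only)
prompts:
  - 'Review this function and explain any bug you find: {{code}}'
providers:
  - id: anthropic:messages:claude-opus-4-6   # assumed model ID
    config:
      max_tokens: 20000        # kept larger than the thinking budget
      thinking:
        type: enabled
        budget_tokens: 16000   # assumed value; the README cites budgets up to 128K
tests:
  - vars:
      code: 'def add(a, b): return a - b'
    assert:
      - type: icontains
        value: subtract
EOF
}
```

Calling write_thinking_config in an empty directory and then running the evaluation would exercise this config. A larger budget gives the model more room to reason, at the cost of latency and tokens.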

Running the Example

# Set your API key
export ANTHROPIC_API_KEY=your_api_key_here

# Run the evaluation
npx promptfoo@latest eval

# View results
npx promptfoo@latest view
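
The commands above can be wrapped in a small script that stops early when the API key is missing. This is a minimal sketch, not part of the example itself; the function names require_api_key and run_example are hypothetical.

```shell
# Hypothetical helper: return non-zero if ANTHROPIC_API_KEY is unset or empty.
require_api_key() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set" >&2
    return 1
  fi
}

# Hypothetical wrapper: check the key, run the evaluation, then open the viewer.
# The viewer is started even if the evaluation reports failing test cases,
# so the failures can be inspected.
run_example() {
  require_api_key || return 1
  npx promptfoo@latest eval
  npx promptfoo@latest view
}
```

With these functions defined in the current shell, run_example performs the same steps as the commands above, after the key check passes.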

Expected Results

The evaluation tests Opus 4.6's ability to:

Diagnose bugs across multiple system boundaries
Choose appropriate data structures with clear reasoning
Write production-quality code with proper error handling
Analyze architectural decisions and propose improvements
