Compare Claude Model Reasoning
See Claude's step-by-step reasoning before the final answer, and compare the thinking outputs of Claude Sonnet 4 and Claude Haiku 4.5.
Why it matters
Understand and compare the step-by-step reasoning processes of different Claude AI models. Gain insights into how models like Sonnet and Haiku arrive at their conclusions before presenting a final answer.
Outcomes
What it gets done
Analyze Claude Sonnet 4's thinking process.
Analyze Claude Haiku 4.5's thinking process.
Compare reasoning outputs between models.
Extract key steps from model thought processes.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/pfoo-claude-thinking | bash
Capabilities
What this chain does
Condenses long documents or threads into key takeaways.
Pulls structured data fields from unstructured text.
Labels or categorizes text, files, or data points.
Overview
Claude Thinking
What it does
This prompt chain demonstrates Claude's 'thinking' capability, allowing users to observe the model's step-by-step reasoning process before it generates a final answer. It compares thinking outputs from Claude Sonnet 4 (Anthropic API) and Claude Haiku 4.5 (AWS Bedrock).
How it connects
Use this prompt chain when you need to understand the reasoning behind Claude's responses, especially for complex tasks where transparency is crucial. It's ideal for educational purposes or when comparing different model behaviors. Do not use this if you only need a quick, final answer without insight into the reasoning process, or if the task is too simple to benefit from a detailed thought process.
Source README
claude-thinking (Claude Thinking)
This example demonstrates Claude's "thinking" capability, which allows you to see the model's step-by-step reasoning process before it provides a final answer. The example compares thinking outputs from Claude Sonnet 4 (Anthropic API) and Claude Haiku 4.5 (AWS Bedrock).
You can run this example with:
npx promptfoo@latest init --example claude-thinking
cd claude-thinking
What This Example Demonstrates
- Using Claude's thinking feature to reveal step-by-step reasoning
- Comparing thinking output quality between different Claude models
- Comparing Anthropic API vs AWS Bedrock providers
- Configuring the thinking token budget
- Using LLM-based evaluation rubrics to assess reasoning quality
Environment Variables
This example requires:
For Anthropic API
- ANTHROPIC_API_KEY - Your Anthropic API key from console.anthropic.com
For AWS Bedrock
- AWS_ACCESS_KEY_ID - Your AWS access key
- AWS_SECRET_ACCESS_KEY - Your AWS secret key
- Or configure credentials via the AWS CLI:
aws configure
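Before running the evaluation, it can help to confirm which of the variables listed above are actually set. The following is a minimal sketch, not part of promptfoo: the function name `missing_variables` and the grouping by provider are hypothetical, and only the variable names come from this README. A missing AWS variable is not necessarily an error, since credentials may instead come from `aws configure`.

```python
# Variable names are taken from the list above; the grouping into two
# providers is this sketch's own bookkeeping, not a promptfoo feature.
REQUIRED = {
    "anthropic": ["ANTHROPIC_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
}


def missing_variables(env):
    """Return {provider: [names]} for every variable that is unset or empty."""
    missing = {}
    for provider, names in REQUIRED.items():
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[provider] = absent
    return missing
```

Calling `missing_variables(os.environ)` (after `import os`) returns an empty dict when all three variables are present.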
Running the Example
After setting up environment variables:
# From the example directory
promptfoo eval
promptfoo view
Test Cases
This example includes several test cases of increasing complexity:
- 8 Balls Problem - A classic logic puzzle requiring careful reasoning
- Train Meeting Problem - A traditional algebra word problem
These test cases are specifically designed to showcase Claude's ability to break down complex problems and show detailed thinking steps.
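To make the expected answer for the 8 Balls Problem concrete, here is a small simulation. It assumes the standard form of the puzzle, which this README does not spell out: eight identical-looking balls, exactly one heavier, and a two-pan balance. The 3-3-2 grouping and the helper names (`heavier`, `find_heavy_ball`) are illustrative choices, not taken from the example's test file.

```python
def heavier(left, right, heavy):
    """Simulate a balance scale: -1 if left is heavier, 1 if right, 0 if equal."""
    if heavy in left:
        return -1
    if heavy in right:
        return 1
    return 0


def find_heavy_ball(heavy):
    """Locate the heavy ball among balls 0..7; return (ball, weighings used)."""
    group_a, group_b, group_c = [0, 1, 2], [3, 4, 5], [6, 7]

    # Weighing 1: compare the two groups of three.
    result = heavier(group_a, group_b, heavy)
    if result == 0:
        # The heavy ball is in the pair; weighing 2 compares its two balls.
        second = heavier([group_c[0]], [group_c[1]], heavy)
        return (group_c[0] if second == -1 else group_c[1]), 2

    suspects = group_a if result == -1 else group_b
    # Weighing 2: compare two of the three suspects; a tie means the third.
    second = heavier([suspects[0]], [suspects[1]], heavy)
    if second == -1:
        return suspects[0], 2
    if second == 1:
        return suspects[1], 2
    return suspects[2], 2


# Try every possible position of the heavy ball.
results = [find_heavy_ball(ball) for ball in range(8)]
```

Every entry in `results` identifies the correct ball after two weighings, which is the answer a rubric for this test case would look for.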
How Claude Thinking Works
The thinking feature is enabled by setting special parameters in the provider configuration:
thinking:
  type: 'enabled'
  budget_tokens: 4096 # Controls how many tokens are allocated for thinking
max_tokens: 8192 # Must be greater than budget_tokens
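The constraint noted in the config comment (max_tokens must be greater than budget_tokens) is easy to get wrong when tuning the budget. Below is a rough sketch of a check for it, assuming the provider config has already been loaded into a Python dict with the same keys as above. The function `check_thinking_config` is hypothetical and not part of promptfoo or the Anthropic SDK.

```python
def check_thinking_config(config):
    """Return a list of problems found in a provider config dict."""
    problems = []
    thinking = config.get("thinking") or {}
    if thinking.get("type") != "enabled":
        return problems  # thinking is off, so the budget rule does not apply

    budget = thinking.get("budget_tokens")
    max_tokens = config.get("max_tokens")
    if not isinstance(budget, int) or budget <= 0:
        problems.append("thinking.budget_tokens must be a positive integer")
    elif not isinstance(max_tokens, int):
        problems.append("max_tokens must be set when thinking is enabled")
    elif max_tokens <= budget:
        problems.append(
            f"max_tokens ({max_tokens}) must be greater than budget_tokens ({budget})"
        )
    return problems
```

With the values shown above (budget 4096, max 8192) the function returns an empty list; lowering max_tokens to 4096 would produce one problem message.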
When enabled, Claude's response will include a "Thinking:" section that shows its reasoning process before the final answer:
Thinking: Let me solve this step by step...
1. First, I'll divide the 8 balls into three groups...
2. In the first weighing, I'll compare groups A and B...
3. Based on the result, I can determine...
Final answer: We need exactly 2 weighings to find the heavier ball.
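If you want to score the reasoning and the answer separately, the output can be split on the markers shown in the sample above. This is a minimal sketch under a strong assumption: that the text starts with "Thinking:" and that the answer is introduced by "Final answer:". The second marker appears only in the illustrative sample, and real model output may not contain it, in which case everything after "Thinking:" is returned as thinking. The function name `split_thinking` is hypothetical.

```python
def split_thinking(output):
    """Split output text into (thinking, answer) using the markers shown above."""
    thinking_marker = "Thinking:"
    answer_marker = "Final answer:"
    text = output.strip()
    if not text.startswith(thinking_marker):
        return "", text  # no thinking section present
    body = text[len(thinking_marker):]
    thinking, found, answer = body.partition(answer_marker)
    if not found:
        return body.strip(), ""  # answer marker missing; treat all as thinking
    return thinking.strip(), answer.strip()
```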