Compare LLM Responses for Quality
Compare Claude and GPT models side-by-side. Evaluate their performance on various prompts to determine the best LLM for your needs.
Why it matters
Evaluate and compare the outputs of different large language models (LLMs) like Claude and GPT. This asset helps you determine which model provides superior responses for your specific needs.
Outcomes
What it gets done
Run prompts against multiple LLMs.
Score and rank LLM responses.
Identify the best performing LLM for a given task.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/pfoo-claude-vs-gpt | bash Capabilities
What this chain does
Handles multi-turn conversations within a defined domain.
Condenses long documents or threads into key takeaways.
Labels or categorizes text, files, or data points.
Overview
Claude Vs Gpt
What it does
This prompt chain enables a direct comparison of responses between Claude and GPT models. It allows users to input a series of prompts and receive side-by-side evaluations of how each LLM handles them. The goal is to facilitate a clear understanding of their performance differences.
How it connects
Use this when you need to make an informed decision between using Claude or GPT for a specific application. It's ideal for A/B testing and empirical evaluation of LLM outputs to determine which model better suits your needs.
Source README
yaml-language-server: $schema=https://promptfoo.dev/config-schema.json
description: 'GPT vs Claude example'
prompts:
- file://prompt.yaml
providers:
- id: anthropic:claude-sonnet-4-6
label: Claude Sonnet 4.6 - openai:gpt-4.1-mini
defaultTest:
assert:
- type: cost
threshold: 0.01
- type: latency
threshold: 3000
- type: javascript
value: 'output.length <= 100 ? 1 : output.length > 1000 ? 0 : 1 - (output.length - 100) / 900'
tests:
- vars:
riddle: 'I speak without a mouth and hear without ears. I have no body, but I come alive with wind. What am I?'
assert:Make sure the LLM output contains this word
- type: icontains
value: echo
Use model-graded assertions to enforce free-form instructions
- type: llm-rubric
value: Do not apologize
- type: icontains
- vars:
riddle: "You see a boat filled with people. It has not sunk, but when you look again you don't see a single person on the boat. Why?"
assert:- type: llm-rubric
value: explains that the people are below deck, or they are all in a relationship
- type: llm-rubric
- vars:
riddle: 'The more of this there is, the less you see. What is it?'
assert:- type: icontains
value: darkness
- type: icontains
- vars:
riddle: >-
I have keys but no locks. I have space but no room. You can enter, but
can't go outside. What am I?
assert:- type: icontains
value: keyboard
- type: icontains
- vars:
riddle: >-
I am not alive, but I grow; I don't have lungs, but I need air; I don't
have a mouth, but water kills me. What am I?
assert:- type: icontains-any
value:- fire
- flame
- type: icontains-any
- vars:
riddle: What can travel around the world while staying in a corner?
assert:- type: icontains
value: stamp
- type: icontains
- vars:
riddle: Forward I am heavy, but backward I am not. What am I?
assert:- type: icontains
value: ton
- type: icontains
- vars:
riddle: >-
The person who makes it, sells it. The person who buys it, never uses
it. The person who uses it, doesn't know they're using it. What is it?
assert:- type: icontains
value: coffin
- type: icontains
- vars:
riddle: I can be cracked, made, told, and played. What am I?
assert:- type: icontains
value: joke
- type: icontains
- vars:
riddle: What has keys but can't open locks?
assert:- type: icontains
value: piano
- type: icontains
- vars:
riddle: >-
I'm light as a feather, yet the strongest person can't hold me for much
more than a minute. What am I?
assert:- type: icontains
value: breath
- type: icontains
- vars:
riddle: >-
I can fly without wings, I can cry without eyes. Whenever I go, darkness
follows me. What am I?
assert:- type: icontains
value: cloud
- type: icontains
- vars:
riddle: >-
I am taken from a mine, and shut up in a wooden case, from which I am
never released, and yet I am used by almost every person. What am I? - vars:
riddle: >-
David's father has three sons: Snap, Crackle, and _____? What is the
name of the third son?
assert:- type: contains
value: David
- type: contains
- vars:
riddle: >-
I am light as a feather, but even the world's strongest man couldn't
hold me for much longer than a minute. What am I?
assert:- type: icontains
value: breath
- type: icontains
Discussion
Questions & comments · 0
Sign In Sign in to leave a comment.