Evaluate Search Results with a Rubric
A prompt chain example from the promptfoo repository that uses the search-rubric assertion to check LLM outputs against live web search results.
1.0.0
Why it matters
Automate the evaluation of search engine results using a predefined rubric to ensure quality and relevance. This asset helps in systematically assessing the output of search queries.
Outcomes
What it gets done
Define and apply a structured rubric for evaluating search results.
Classify search results based on relevance and quality criteria.
Summarize findings from evaluated search results.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/pfoo-eval-search-rubric | bash
Capabilities
What this chain does
Searches the web and retrieves relevant sources.
Condenses long documents or threads into key takeaways.
Labels or categorizes text, files, or data points.
Overview
Eval Search Rubric
What it does
This is a prompt chain example from the promptfoo repository. It demonstrates the search-rubric assertion type, which checks whether an LLM output contains accurate, current information by searching the web at evaluation time. The example is runnable and serves as a reference implementation for this kind of fact verification.
How it connects
Use this when you need a starting point for building search evaluation workflows or want to understand how to structure assessment rubrics as prompt chains. It's particularly useful when you're setting up quality standards for search systems and need a concrete example of how to organize multi-step evaluation logic using promptfoo.
Source README
eval-search-rubric (Search Rubric)
You can run this example with:
npx promptfoo@latest init --example eval-search-rubric
cd eval-search-rubric
This example demonstrates how to use the search-rubric assertion type to verify that LLM outputs contain accurate, current information.
Overview
The search-rubric assertion allows you to verify facts by searching the web in real-time. This is particularly useful for:
- Current events and news
- Stock prices and financial data
- Weather information
- Recent company information
- Any time-sensitive data
Running the Example
npx promptfoo eval
How It Works
- The LLM generates a response to your prompt
- The search-rubric assertion extracts the claim you want to verify
- A provider with web search capabilities searches for current information
- The assertion passes or fails based on whether the output matches current web data
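The flow above maps onto a promptfoo config roughly as follows. This is a minimal sketch, not the example's actual promptfooconfig.yaml: the prompt text, the `question` variable, the provider under test, and the rubric wording are all illustrative assumptions.

```yaml
# promptfooconfig.yaml (illustrative sketch, not the shipped example file)
description: Verify time-sensitive answers against live web results

prompts:
  # Step 1: the model under test answers this prompt.
  - 'Answer concisely: {{question}}'

providers:
  # Assumed provider under test; substitute whichever model you evaluate.
  - openai:gpt-4o-mini

tests:
  - vars:
      question: 'Who is the current CEO of Microsoft?'
    assert:
      # Steps 2-4: a search-capable grader looks up the facts described
      # in `value` and scores the output against what it finds.
      - type: search-rubric
        value: 'Names the current CEO of Microsoft'
        threshold: 0.8 # Optional: minimum accuracy score (0-1)
```

Keeping the rubric text narrow (one checkable fact per assertion) tends to make a failure easier to interpret than a single broad rubric covering several claims.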
Provider Support
Anthropic Claude
- Web search capabilities via tool configuration (launched in 2025)
- Requires explicit web_search_20250305 tool configuration
- Pricing: $10 per 1,000 searches plus standard token costs
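A grader configured with the Anthropic web search tool might look like the sketch below. The `defaultTest.options.provider` override is promptfoo's general mechanism for choosing the grader of model-graded assertions; that it applies unchanged to search-rubric, the model ID, and the `max_uses` value are assumptions to check against the promptfoo and Anthropic documentation.

```yaml
defaultTest:
  options:
    provider:
      id: anthropic:messages:claude-sonnet-4-20250514 # assumed model ID
      config:
        tools:
          - type: web_search_20250305
            name: web_search
            max_uses: 5 # assumed cap on searches per grading call
```

Because searches are billed individually, a cap such as `max_uses` also bounds the per-assertion search cost.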
OpenAI
- Requires web_search_preview tool configuration
- Works with gpt-5.1, o4-mini, and other Responses API models
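For an OpenAI grader, the equivalent sketch attaches the web_search_preview tool to a Responses API model. The placement under `defaultTest.options.provider` follows promptfoo's general grader-override pattern and is an assumption here, not something this README confirms.

```yaml
defaultTest:
  options:
    provider:
      id: openai:responses:gpt-5.1 # a Responses API model, as listed above
      config:
        tools:
          - type: web_search_preview
```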
Perplexity
- Built-in web search capabilities
- No additional configuration needed
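Since Perplexity models search by default, pointing the grader at one should need only a provider ID. The sketch below assumes the `perplexity:sonar-pro` ID and promptfoo's general `defaultTest.options.provider` override; verify both against the promptfoo provider documentation.

```yaml
defaultTest:
  options:
    provider: perplexity:sonar-pro # assumed model name; no tools block needed
```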
Configuration
assert:
  - type: search-rubric
    value: 'search query to verify'
    threshold: 0.8 # Optional: minimum accuracy score (0-1)
Notes
- Search rubric assertions add latency (2-5 seconds per assertion)
- Use caching during development: npx promptfoo eval --cache
- Be specific with your search queries for better results