Evaluate RAG System Performance
A complete RAG evaluation workflow that tests retrieval-augmented generation pipelines end-to-end using promptfoo's testing framework.
Why it matters
Assess the effectiveness of your retrieval-augmented generation (RAG) system by running end-to-end evaluations. This asset helps you test and refine your RAG pipeline so that it retrieves relevant information and generates accurate answers from it.
Outcomes
What it gets done
Index knowledge base for RAG.
Query RAG system with test cases.
Summarize evaluation results.
Write tests for RAG components.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/pfoo-eval-rag-full | bash
Capabilities
What this chain does
Chunks, embeds, and indexes documents for semantic retrieval.
Condenses long documents or threads into key takeaways.
Writes and executes SQL or NoSQL queries on databases.
Creates unit, integration, or end-to-end test cases.
Overview
Eval Rag Full
What it does
This is a multi-step prompt workflow that evaluates retrieval-augmented generation (RAG) systems using the promptfoo testing framework. It runs end-to-end tests of RAG pipelines, checking both the retrieval of relevant documents and the quality of generated answers. The workflow automates the evaluation process from query execution through result validation.
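To check retrieval separately from generation, you typically compare the retrieved document IDs against the ones a test expects. The sketch below computes recall@k for that comparison. It is a hypothetical helper, not part of the eval-rag-full example, and the chunk IDs are invented for illustration.

```python
# Hypothetical retrieval metric (not part of the eval-rag-full example).
from typing import Sequence


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Sequence[str], k: int) -> float:
    """Fraction of the relevant IDs that appear among the first k retrieved IDs."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # design choice: an empty expectation scores 0, not 1
    top_k = set(retrieved_ids[:k])
    return len(relevant & top_k) / len(relevant)


if __name__ == "__main__":
    retrieved = ["10k-p12", "10k-p40", "10q-p3", "10k-p7"]  # illustrative chunk IDs
    relevant = ["10k-p40", "10k-p7"]
    print(recall_at_k(retrieved, relevant, k=2))  # 0.5
    print(recall_at_k(retrieved, relevant, k=4))  # 1.0
```

A low recall@k combined with a wrong answer points at retrieval. High recall with a wrong answer points at generation.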
How it connects
Use this when you need to systematically test a RAG implementation before deployment or when debugging accuracy issues in retrieval or generation. It's designed for teams building RAG applications who need repeatable, automated quality checks across their entire pipeline.
Source README
eval-rag-full (Rag Full)
You can run this example with:
npx promptfoo@latest init --example eval-rag-full
cd eval-rag-full
Usage
This RAG example lets you ask questions about SEC filings from several public companies. It uses LangChain, but the flow is representative of any RAG solution.
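The sketch below shows that generic flow without any dependencies: retrieve the chunks most similar to the question, then add them to the prompt before calling a model. Word-count vectors and cosine similarity stand in for a real embedding model and vector database. The corpus is invented, the LLM call is omitted, and none of the function names come from the example.

```python
# Dependency-free sketch of retrieve -> augment (-> generate, omitted).
# Word-count vectors are a stand-in for real embeddings; all names are hypothetical.
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Augment the question with retrieved context before it is sent to an LLM."""
    joined = "\n\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    chunks = [
        "Example Corp is headquartered in Springfield.",
        "Risk factors include supply chain disruption.",
    ]
    question = "Where is Example Corp headquartered?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```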
There are 3 parts:
ingest.py: Chunks and loads PDFs into a vector database (the PDFs are pulled from a public Google Cloud bucket).
retrieve.py: A Promptfoo-compatible provider that answers RAG questions using the database.
promptfooconfig.yaml: Test inputs and requirements.
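The chunking step in ingest.py can be pictured as a sliding window over each document's text. The function below is a hypothetical character-based chunker, not the example's code (which relies on LangChain). Its `chunk_size` and `overlap` parameters are illustrative.

```python
# Hypothetical sliding-window chunker; not the example's ingest.py.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters; consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a boundary retrievable from at least one chunk. The trade-off is that more text gets embedded and stored.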
To get started:
Set the OPENAI_API_KEY environment variable.
Create a Python virtual environment:
python3 -m venv venv
Enter the environment:
source venv/bin/activate
Install Python dependencies:
pip install -r requirements.txt
Run ingest.py to create the vector database:
python ingest.py
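The provider later runs inside Python, so it can help to confirm the setup from that same interpreter. The preflight check below is a hypothetical addition, not part of the example. It reports whether OPENAI_API_KEY is set and whether the interpreter is running inside a virtual environment. The `sys.prefix` versus `sys.base_prefix` comparison applies to environments created with `python3 -m venv`.

```python
# Hypothetical preflight check, not part of the example.
# Run it with the interpreter Promptfoo will use, e.g. ./venv/bin/python preflight.py
import os
import sys
from collections.abc import Mapping


def check_setup(env: Mapping[str, str], prefix: str, base_prefix: str) -> list[str]:
    """Return a list of setup problems; an empty list means the checks passed."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # Inside a venv, sys.prefix points at the venv while sys.base_prefix
    # still points at the base Python installation.
    if prefix == base_prefix:
        problems.append("not running inside a virtual environment")
    return problems


if __name__ == "__main__":
    issues = check_setup(os.environ, sys.prefix, sys.base_prefix)
    print("\n".join(issues) if issues else "Setup looks OK.")
```

The function takes its inputs as arguments instead of reading globals, so the same logic can be checked against any environment mapping.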
Now we're ready to go.
- Edit promptfooconfig.yaml to configure the questions you'd like to ask in your tests.
- Edit retrieve.py to control how context is loaded and questions are answered.
Then run:
npx promptfoo@latest eval
Promptfoo is a Node.js CLI, but the file://retrieve.py provider runs inside Python. Keep the virtual environment active when running the eval, or set PROMPTFOO_PYTHON=./venv/bin/python so Promptfoo can import the packages from requirements.txt.
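A Promptfoo Python provider is a module that exposes `call_api(prompt, options, context)` and returns a dict with an `output` key. Provider settings from the YAML `config:` block are passed in `options["config"]`; check the Promptfoo Python provider documentation for the exact contract. The skeleton below is a minimal sketch under those assumptions, not the example's retrieve.py. The in-memory `CORPUS`, the word-overlap `search` function, and the `top_k` config key are all hypothetical. The LLM call is omitted, so the retrieved context is returned as the output.

```python
# Hypothetical skeleton of a Promptfoo Python provider; not the example's retrieve.py.
# Assumes Promptfoo calls call_api(prompt, options, context) and reads result["output"].
import re
from typing import Any

# Stand-in for the vector database that ingest.py builds (invented text).
CORPUS = [
    "Example Corp reported its results in its annual 10-K filing.",
    "Example Corp is headquartered in Springfield.",
]


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(query: str, k: int) -> list[str]:
    """Toy retriever: rank chunks by how many words they share with the query."""
    q = _tokens(query)
    ranked = sorted(CORPUS, key=lambda chunk: len(q & _tokens(chunk)), reverse=True)
    return ranked[:k]


def call_api(prompt: str, options: dict[str, Any], context: dict[str, Any]) -> dict[str, Any]:
    # Settings from the provider's YAML `config:` block arrive in options["config"];
    # `top_k` is a hypothetical key. context (e.g. context["vars"]) is unused here.
    config = options.get("config") or {}
    k = int(config.get("top_k", 2))
    chunks = search(prompt, k)
    # A real provider would send the prompt plus these chunks to an LLM;
    # this sketch returns the retrieved context so the flow stays visible.
    return {"output": "\n".join(chunks)}
```

Exposing retrieval depth as a config value is one way to run the same provider file under two configurations with different settings.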
Afterwards, you can view the results by running:
npx promptfoo@latest view
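If you export results as well as opening the viewer, a few lines of Python can summarize them per configuration. This hypothetical sketch assumes each result has already been reduced to a `(provider_label, passed)` pair. That record shape is an assumption, not Promptfoo's export format, and the sample records are illustrative only.

```python
# Hypothetical post-processing: pass rate per provider configuration.
# The (provider_label, passed) record shape is an assumption, not Promptfoo's schema.
from collections import defaultdict


def pass_rates(results: list[tuple[str, bool]]) -> dict[str, float]:
    totals: dict[str, int] = defaultdict(int)
    passes: dict[str, int] = defaultdict(int)
    for label, passed in results:
        totals[label] += 1
        passes[label] += int(passed)
    return {label: passes[label] / totals[label] for label in totals}


if __name__ == "__main__":
    sample = [("small-k", True), ("small-k", False), ("large-k", True), ("large-k", True)]
    print(pass_rates(sample))  # {'small-k': 0.5, 'large-k': 1.0}
```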
See promptfooconfig.with-asserts.yaml for a more complete example that compares the performance of two RAG configurations. The smaller retrieval configuration is intentionally expected to miss a couple of details so the comparison view demonstrates failures as well as passes.
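Beyond its built-in checks, Promptfoo can run custom Python assertions. A test lists an assertion of `type: python` whose `value` points at a file, and Promptfoo calls that file's `get_assert(output, context)` function. The function may return a dict with `pass`, `score`, and `reason` keys. The sketch below assumes that contract. The `expected_terms` test variable (a comma-separated string) is hypothetical.

```python
# Hypothetical Promptfoo Python assertion: checks that the answer mentions expected terms.
# Assumes Promptfoo calls get_assert(output, context) and that each test defines a
# hypothetical `expected_terms` var such as "revenue, 2023".
from typing import Any


def get_assert(output: str, context: dict[str, Any]) -> dict[str, Any]:
    raw = context.get("vars", {}).get("expected_terms", "")
    terms = [t.strip().lower() for t in raw.split(",") if t.strip()]
    if not terms:
        return {"pass": False, "score": 0.0, "reason": "no expected_terms var provided"}
    found = [t for t in terms if t in output.lower()]
    score = len(found) / len(terms)
    missing = sorted(set(terms) - set(found))
    return {
        "pass": score == 1.0,
        "score": score,
        "reason": "all expected terms present" if not missing else f"missing: {', '.join(missing)}",
    }
```

Returning a fractional score makes partial answers visible in the results. A plain boolean would only show pass or fail.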