Prompt Chain

Evaluate RAG System Performance

A complete RAG evaluation workflow that tests retrieval-augmented generation pipelines end-to-end using promptfoo's testing framework.


Spark score: 72 out of 100
Updated 3 months ago
Version 1.0.0

Why it matters

Assess the effectiveness of your Retrieval Augmented Generation (RAG) system by running comprehensive evaluations. This asset helps you test and refine your RAG pipeline to ensure accurate and relevant information retrieval and generation.

Outcomes

What it gets done

  1. Index knowledge base for RAG.

  2. Query RAG system with test cases.

  3. Summarize evaluation results.

  4. Write tests for RAG components.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/pfoo-eval-rag-full | bash

Capabilities

What this chain does

RAG index

Chunks, embeds, and indexes documents for semantic retrieval.

Summarize

Condenses long documents or threads into key takeaways.

Query a database

Writes and executes SQL or NoSQL queries on databases.

Write tests

Creates unit, integration, or end-to-end test cases.

Overview

Eval Rag Full

What it does

This is a multi-step prompt workflow that evaluates retrieval-augmented generation (RAG) systems using the promptfoo testing framework. It runs end-to-end tests of RAG pipelines, checking both the retrieval of relevant documents and the quality of generated answers. The workflow automates the evaluation process from query execution through result validation.

How it connects

Use this when you need to systematically test a RAG implementation before deployment or when debugging accuracy issues in retrieval or generation. It's designed for teams building RAG applications who need repeatable, automated quality checks across their entire pipeline.

Source README

eval-rag-full (Rag Full)

You can run this example with:

npx promptfoo@latest init --example eval-rag-full
cd eval-rag-full

Usage

This RAG example allows you to ask questions over a number of public company SEC filings. It uses LangChain, but the flow is representative of any RAG solution.

There are 3 parts:

  1. ingest.py: Chunks and loads PDFs into a vector database (PDFs are pulled from a public Google Cloud bucket)

  2. retrieve.py: Promptfoo-compatible provider that answers RAG questions using the database.

  3. promptfooconfig.yaml: Test inputs and requirements.
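
To make the role of retrieve.py more concrete, here is a minimal sketch of a Promptfoo Python provider. It is not the code shipped with the example: the DOCS list and the retrieve helper are hypothetical stand-ins that rank text by word overlap instead of querying a vector database, and no LLM is called. The only part taken from Promptfoo itself is the call_api entry point, which receives the prompt and returns a dictionary with an output key.

```python
# retrieve_sketch.py: a toy stand-in for a Promptfoo Python provider.
# Everything except the call_api entry point is hypothetical.
import re

# Hypothetical in-memory "index"; the real example queries a vector database.
DOCS = [
    "Example Corp reported revenue growth in its annual filing.",
    "Risk factors in the filing include supply chain disruption.",
    "The company repurchased shares during the fiscal year.",
]


def _words(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, k=2):
    """Return the k documents sharing the most words with the question."""
    q_words = _words(question)
    ranked = sorted(DOCS, key=lambda doc: len(q_words & _words(doc)), reverse=True)
    return ranked[:k]


def call_api(prompt, options, context):
    """Entry point Promptfoo calls for a file:// Python provider."""
    chunks = retrieve(prompt)
    # A real provider would pass the question and chunks to an LLM here.
    answer = "Context used:\n" + "\n".join(chunks)
    return {"output": answer}
```

Swapping the body of retrieve for a vector store lookup, and the answer line for a model call, would bring this closer to what the example describes.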

To get started:

  1. Set the OPENAI_API_KEY environment variable.

  2. Create a Python virtual environment: python3 -m venv venv

  3. Enter the environment: source venv/bin/activate

  4. Install Python dependencies: pip install -r requirements.txt

  5. Run ingest.py to create the vector database: python ingest.py
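
The chunking that step 5 performs can be illustrated with a small sketch. This is an assumption about the general technique, not the contents of ingest.py, which relies on LangChain. The chunk_text function and its chunk_size and overlap parameters are hypothetical, and the sketch splits on characters, not tokens or sentences.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap their neighbors."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.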

Now we're ready to go.

  • Edit promptfooconfig.yaml to configure the questions you'd like to ask in your tests.
  • Edit retrieve.py to control how context is loaded and questions are answered.

Then run:

npx promptfoo@latest eval

Promptfoo is a Node.js CLI, but the file://retrieve.py provider runs inside Python. Keep the virtual environment active when running the eval, or set PROMPTFOO_PYTHON=./venv/bin/python so Promptfoo can import the packages from requirements.txt.
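
If you prefer to launch the eval from a script instead of an activated shell, the following sketch sets PROMPTFOO_PYTHON before invoking the CLI. The helper names build_eval_env and run_eval are hypothetical, and the venv/bin/python path assumes the POSIX virtual environment layout created in the steps above (Windows uses a different layout).

```python
import os
import subprocess
from pathlib import Path


def build_eval_env(venv_dir="venv", base_env=None):
    """Copy the environment and point PROMPTFOO_PYTHON at the venv interpreter."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumes a POSIX venv layout: <venv_dir>/bin/python
    env["PROMPTFOO_PYTHON"] = str(Path(venv_dir) / "bin" / "python")
    return env


def run_eval(venv_dir="venv"):
    """Run the eval command from the README with the adjusted environment."""
    return subprocess.run(
        ["npx", "promptfoo@latest", "eval"],
        env=build_eval_env(venv_dir),
        check=False,
    )


if __name__ == "__main__":
    raise SystemExit(run_eval().returncode)
```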

Afterwards, you can view the results by running npx promptfoo@latest view

See promptfooconfig.with-asserts.yaml for a more complete example that compares the performance of two RAG configurations. The smaller retrieval configuration is intentionally expected to miss a couple of details so the comparison view demonstrates failures as well as passes.
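
One way to express "misses a couple of details" as a check is a custom Python assertion. The sketch below is not taken from promptfooconfig.with-asserts.yaml. It assumes Promptfoo's Python assertion convention of a get_assert(output, context) function, and it reads a hypothetical test variable named expected_details that you would define yourself.

```python
def get_assert(output, context):
    """Score an answer by how many expected details it mentions."""
    # 'expected_details' is a hypothetical test variable, not part of the example.
    expected = context.get("vars", {}).get("expected_details", [])
    if not expected:
        return {"pass": True, "score": 1.0, "reason": "No expected details configured."}

    answer = output.lower()
    missing = [detail for detail in expected if detail.lower() not in answer]
    score = (len(expected) - len(missing)) / len(expected)
    reason = "All details found." if not missing else "Missing: " + ", ".join(missing)
    return {"pass": not missing, "score": score, "reason": reason}
```

A file like this would typically be referenced from a test's assert list with type python and a file:// path as the value. Substring matching is strict, so a model-graded assertion may suit paraphrased answers better.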
