Prompt Chain

Implement Corrective RAG with Local LLMs

Implement Corrective RAG (CRAG) with local LLMs and Tavily Search.

Works with ollama, tavily, langchain, nomic

Spark score: 91 out of 100
Updated 3 months ago
Version 1.0.0

Why it matters

Enhance retrieval-augmented generation (RAG) by incorporating self-reflection and web search for improved accuracy, especially when dealing with local LLMs.

Outcomes

What it gets done

01

Implement a Corrective RAG (CRAG) pipeline using LangGraph.

02

Integrate local LLMs via Ollama for document retrieval and generation.

03

Utilize Tavily Search to supplement retrieval when initial documents are irrelevant.

04

Index local documents using Nomic embeddings for efficient retrieval.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/lg-langgraphcraglocal | bash

Steps

Steps in the chain

01
Setup: Install packages and configure API keys

Install required packages for Ollama, Tavily Search, and vectorstore with Nomic local embeddings or OpenAI embeddings. Set up API keys for the services.

02
Select LLM from Ollama

Choose from the available Ollama LLMs. Download the Ollama app and pull your model of choice, e.g.: ollama pull llama3

03
Create Index

Index 3 blog posts to create a vectorstore for retrieval.

04
Define Tools

Define the tools needed for the CRAG workflow, including web search via Tavily and document grading.

05
Create the Graph

Explicitly define the majority of the control flow using LangGraph, with an LLM defining a single branch point following grading. If any documents are irrelevant, supplement retrieval with web search.

06
Evaluation: Response Assessment

Create a dataset of question-answer pairs and save it in LangSmith. Use an LLM as a grader (gpt-4o) to compare agent responses to ground truth reference answers.

07
Evaluation: Trajectory Assessment

Assess the list of tool calls that each agent makes relative to expected trajectories. Evaluate the specific reasoning traces taken by agents and benchmark against GPT-4o and Llama-3-70b.

Overview

Corrective RAG (CRAG) using local LLMs

What it does

Implement Corrective RAG (CRAG) using local LLMs and Tavily Search. This example demonstrates a RAG strategy that incorporates self-reflection/self-grading on retrieved documents. The implementation uses LangGraph and follows a flow where retrieved documents are assessed for relevance. If documents fall below a relevance threshold or if a grader is unsure, web search is used to supplement retrieval. The implementation skips knowledge refinement but notes it can be added. It utilizes Ollama for local LLMs, Tavily Search for web search, and a vectorstore with Nomic local embeddings or OpenAI embeddings. Evaluation of agent response accuracy and tool call trajectory is performed using LangSmith.

Source README

This directory is retained purely for archival purposes and is no longer updated. Please see the newly consolidated LangChain documentation for the most current information and resources.

Corrective RAG (CRAG) using local LLMs

Corrective-RAG (CRAG) is a strategy for RAG that incorporates self-reflection / self-grading on retrieved documents.

The paper follows this general flow:

  • If at least one document exceeds the threshold for relevance, then it proceeds to generation
  • If all documents fall below the relevance threshold or if the grader is unsure, then it uses web search to supplement retrieval
  • Before generation, it performs knowledge refinement of the search or retrieved documents
  • This partitions the document into knowledge strips
  • It grades each strip, and filters out irrelevant ones
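
The flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the threshold value, the function names `choose_action` and `refine`, the sentence-based strip splitting, and the rule that an unsure grader always triggers web search are all hypothetical choices made for the example.

```python
from typing import Callable, List

# Assumed value for illustration; the text above does not state a threshold.
RELEVANCE_THRESHOLD = 0.5


def choose_action(scores: List[float], unsure: bool = False,
                  threshold: float = RELEVANCE_THRESHOLD) -> str:
    """Return "generate" if any document clears the threshold, else "web_search".

    An unsure grader also routes to web search (assumed to take precedence).
    """
    if unsure or not any(score > threshold for score in scores):
        return "web_search"
    return "generate"


def refine(document: str, grade_strip: Callable[[str], float],
           threshold: float = RELEVANCE_THRESHOLD) -> List[str]:
    """Partition a document into sentence-like strips and keep the relevant ones."""
    strips = [s.strip() for s in document.split(".") if s.strip()]
    return [s for s in strips if grade_strip(s) > threshold]
```

Here `grade_strip` stands in for whatever scorer is used (an LLM or a smaller classifier); any callable that maps a strip to a number works.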

We will implement some of these ideas from scratch using LangGraph:

  • If any documents are irrelevant, we'll supplement retrieval with web search.
  • We'll skip the knowledge refinement, but this can be added back as a node if desired.
  • We'll use Tavily Search for web search.

Setup

We'll use Ollama to access a local LLM:

  • Download the Ollama app.
  • Pull your model of choice, e.g.: ollama pull llama3

We'll use Tavily for web search.

We'll use a vectorstore with Nomic local embeddings or, optionally, OpenAI embeddings.

Let's install our required packages and set our API keys:
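
A minimal sketch of this step follows. The package list in the comment and the environment variable names (`TAVILY_API_KEY`, `LANGCHAIN_API_KEY`) are assumptions about a typical setup rather than something this page specifies, and `set_env` is a hypothetical helper.

```python
# Assumed package set, installed from a shell:
#   pip install -U langchain langgraph langchain-community langchain-nomic tavily-python scikit-learn
import getpass
import os
from typing import Callable


def set_env(name: str, prompt: Callable[[str], str] = getpass.getpass) -> None:
    """Ask for a secret only if it is not already set in the environment."""
    if not os.environ.get(name):
        os.environ[name] = prompt(f"{name}: ")


# Typical usage (prompts interactively, so it is left commented out here):
# set_env("TAVILY_API_KEY")
# set_env("LANGCHAIN_API_KEY")
```

Reading keys from the environment first keeps secrets out of the notebook or script and lets the same code run unattended when the variables are already exported.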

Set up LangSmith for LangGraph development

Sign up for LangSmith to quickly spot issues and improve the performance of your LangGraph projects. LangSmith lets you use trace data to debug, test, and monitor your LLM apps built with LangGraph.

LLM

You can select from Ollama LLMs.
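
One way to wire up the selected model is sketched below. It assumes the `langchain-ollama` package and its `ChatOllama` class; `LOCAL_LLM`, `llm_settings`, and `build_llm` are hypothetical names, and temperature 0 with an optional JSON mode is a choice made for this example.

```python
LOCAL_LLM = "llama3"  # any model already fetched with `ollama pull`


def llm_settings(model: str = LOCAL_LLM, json_mode: bool = False) -> dict:
    """Build keyword arguments for the chat model; JSON mode suits the grader."""
    settings = {"model": model, "temperature": 0}
    if json_mode:
        settings["format"] = "json"
    return settings


def build_llm(json_mode: bool = False):
    """Create the chat model; imported lazily so the settings can be inspected
    without the package installed or the Ollama server running."""
    from langchain_ollama import ChatOllama
    return ChatOllama(**llm_settings(json_mode=json_mode))
```

Keeping two variants (free text for generation, JSON for grading) makes the grader's output easier to parse.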

Create Index

Let's index 3 blog posts.
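
To show the shape of this step without external dependencies, here is a toy in-memory index. It is a stand-in only: bag-of-words cosine similarity replaces Nomic embeddings, a plain list replaces the vectorstore, and `split_text`, `embed`, and `ToyIndex` are hypothetical names. The chunk size is an assumed default.

```python
import math
import re
from collections import Counter
from typing import List


def split_text(text: str, chunk_size: int = 250, overlap: int = 0) -> List[str]:
    """Split text into chunks of `chunk_size` words, optionally overlapping."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """Chunks documents, stores one vector per chunk, and ranks by similarity."""

    def __init__(self, documents: List[str], chunk_size: int = 250):
        self.chunks = [c for d in documents for c in split_text(d, chunk_size)]
        self.vectors = [embed(c) for c in self.chunks]

    def retrieve(self, question: str, k: int = 4) -> List[str]:
        query = embed(question)
        ranked = sorted(zip(self.chunks, self.vectors),
                        key=lambda pair: cosine(query, pair[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]
```

In practice the documents would be the text of the three blog posts, loaded and split before indexing; swapping `embed` for a real embedding call leaves the rest of the structure unchanged.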

Define Tools
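
The two tools used later, a relevance grader and a web search wrapper, might look like the following sketch. The prompt wording, the `{"score": "yes"}` JSON shape, and the function names are assumptions. The LLM and the search backend are passed in as plain callables so that an Ollama model and a Tavily client can be plugged in without this example depending on either.

```python
import json
from typing import Callable, Dict, List

# Hypothetical grading prompt; not the wording used by the original example.
GRADER_PROMPT = (
    "You are grading whether a retrieved document is relevant to a question.\n"
    "Document: {document}\n"
    "Question: {question}\n"
    'Return JSON with a single key "score" whose value is "yes" or "no".'
)


def grade_document(llm: Callable[[str], str], question: str, document: str) -> bool:
    """Return True only when the grader clearly answers "yes".

    Unparseable or unexpected output counts as not relevant, which errs
    toward supplementing retrieval with web search.
    """
    raw = llm(GRADER_PROMPT.format(question=question, document=document))
    try:
        score = json.loads(raw).get("score", "")
    except (json.JSONDecodeError, AttributeError, TypeError):
        return False
    return str(score).strip().lower() == "yes"


def web_search(question: str, search: Callable[[str], List[Dict[str, str]]]) -> str:
    """Join the "content" field of each search hit into one pseudo-document."""
    return "\n".join(hit["content"] for hit in search(question))
```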

Create the Graph

Here we'll explicitly define the majority of the control flow, only using an LLM to define a single branch point following grading.
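
A sketch of that control flow with LangGraph's `StateGraph` is shown below. The node names, the state keys, and the `build_graph` helper that accepts plain callables are assumptions made for illustration; only the routing idea (grade, then branch to web search or generation) comes from the text.

```python
from typing import Callable, List, TypedDict


class GraphState(TypedDict, total=False):
    question: str
    documents: List[str]
    search: str        # "Yes" when web search should supplement retrieval
    generation: str


def filter_documents(question: str, documents: List[str],
                     is_relevant: Callable[[str, str], bool]) -> dict:
    """Keep relevant documents and flag web search if any document was dropped."""
    kept = [d for d in documents if is_relevant(question, d)]
    return {"documents": kept, "search": "Yes" if len(kept) < len(documents) else "No"}


def decide_to_generate(state: GraphState) -> str:
    """The single branch point: route on the flag set during grading."""
    return "search" if state.get("search") == "Yes" else "generate"


def build_graph(retrieve, is_relevant, web_search, generate):
    """Assemble the graph; langgraph is imported lazily so the pure functions
    above can be used and tested without it."""
    from langgraph.graph import END, START, StateGraph

    workflow = StateGraph(GraphState)
    workflow.add_node("retrieve", lambda s: {"documents": retrieve(s["question"])})
    workflow.add_node("grade_documents",
                      lambda s: filter_documents(s["question"], s["documents"], is_relevant))
    workflow.add_node("web_search",
                      lambda s: {"documents": s["documents"] + [web_search(s["question"])]})
    workflow.add_node("generate",
                      lambda s: {"generation": generate(s["question"], s["documents"])})

    workflow.add_edge(START, "retrieve")
    workflow.add_edge("retrieve", "grade_documents")
    workflow.add_conditional_edges("grade_documents", decide_to_generate,
                                   {"search": "web_search", "generate": "generate"})
    workflow.add_edge("web_search", "generate")
    workflow.add_edge("generate", END)
    return workflow.compile()
```

With the compiled graph, a call such as `graph.invoke({"question": "..."})` would run retrieval, grading, the optional web search, and generation in order.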

Trace:

https://smith.langchain.com/public/88e7579e-2571-4cf6-98d2-1f9ce3359967/r

Evaluation

Now we've defined two different agent architectures that do roughly the same thing!

We can evaluate them. See our conceptual guide for context on agent evaluation.

Response

First, we can assess how well our agent performs on a set of question-answer pairs.

We'll create a dataset and save it in LangSmith.
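
As a rough sketch, the dataset could be created with the LangSmith Python client as below. The field names `input` and `output`, and the helpers `to_examples` and `save_dataset`, are assumptions; the question-answer pairs themselves are not given on this page and must be supplied by the reader.

```python
from typing import Dict, List, Sequence, Tuple


def to_examples(pairs: Sequence[Tuple[str, str]]) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Turn (question, reference answer) pairs into parallel input/output lists."""
    inputs = [{"input": question} for question, _ in pairs]
    outputs = [{"output": answer} for _, answer in pairs]
    return inputs, outputs


def save_dataset(name: str, pairs: Sequence[Tuple[str, str]]):
    """Create a LangSmith dataset and upload the examples.

    Imported lazily; needs the langsmith package and an API key in the environment.
    """
    from langsmith import Client

    client = Client()
    dataset = client.create_dataset(dataset_name=name)
    inputs, outputs = to_examples(pairs)
    client.create_examples(inputs=inputs, outputs=outputs, dataset_id=dataset.id)
    return dataset
```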

Now, we'll use an LLM as a grader to compare both agent responses to our ground truth reference answer.

Here is the default prompt that we can use.

We'll use gpt-4o as our LLM grader.
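
A grader of this kind can be written as a LangSmith-style evaluator that takes a run and an example and returns a score. The sketch below is an assumption-heavy illustration: the prompt is a hypothetical stand-in rather than the default prompt mentioned above, the dictionary keys (`input`, `output`, `response`) and the metric name are invented for the example, and the grader model is passed in as a callable so that gpt-4o or any other model can be used.

```python
from typing import Callable

# Hypothetical grading prompt, not the default prompt referred to in the text.
GRADE_PROMPT = (
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Student answer: {answer}\n"
    "Reply with exactly one word: CORRECT or INCORRECT."
)


def make_answer_evaluator(grader: Callable[[str], str]):
    """Build an evaluator that scores 1 when the grader says CORRECT, else 0."""

    def answer_evaluator(run, example) -> dict:
        prompt = GRADE_PROMPT.format(
            question=example.inputs["input"],
            reference=example.outputs["output"],
            answer=run.outputs["response"],
        )
        verdict = grader(prompt).strip().upper()
        score = 1 if verdict.startswith("CORRECT") else 0
        return {"key": "answer_v_reference_score", "score": score}

    return answer_evaluator
```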

Trajectory

Second, we can assess the list of tool calls that each agent makes relative to expected trajectories.

This evaluates the specific reasoning traces taken by our agents!
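
Two simple ways to score a trajectory are sketched here: an exact match against the expected list, and a looser check that the expected steps appear in order with extra steps allowed. The function names and the step names in the usage comment are hypothetical.

```python
from typing import Sequence


def exact_match(actual: Sequence[str], expected: Sequence[str]) -> int:
    """1 if the agent took exactly the expected steps in the same order, else 0."""
    return int(list(actual) == list(expected))


def in_order(actual: Sequence[str], expected: Sequence[str]) -> int:
    """1 if every expected step appears in `actual` in order (extras allowed)."""
    remaining = iter(actual)
    # `step in remaining` advances the iterator, so order is enforced.
    return int(all(step in remaining for step in expected))


# Example (step names are assumed):
# exact_match(["retrieve", "grade_documents", "generate"],
#             ["retrieve", "grade_documents", "generate"])
```

The exact match is strict and easy to interpret; the in-order variant tolerates an agent that takes a detour, such as an extra web search, while still reaching the required steps.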

We can see the results benchmarked against GPT-4o and Llama-3-70b using Custom agent (as shown here) and ReAct.

The local custom agent performs well in terms of tool calling reliability: it follows the expected reasoning traces.

However, its answer accuracy lags that of the larger models when they are run with the custom agent implementation.
