Implement Corrective RAG with Local LLMs
Implement Corrective RAG (CRAG) with local LLMs and Tavily Search.
Why it matters
Enhance retrieval-augmented generation (RAG) by incorporating self-reflection and web search for improved accuracy, especially when dealing with local LLMs.
Outcomes
What it gets done
Implement a Corrective RAG (CRAG) pipeline using LangGraph.
Integrate local LLMs via Ollama for document retrieval and generation.
Utilize Tavily Search to supplement retrieval when initial documents are irrelevant.
Index local documents using Nomic embeddings for efficient retrieval.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/lg-langgraphcraglocal | bash
Steps in the chain
Install required packages for Ollama, Tavily Search, and vectorstore with Nomic local embeddings or OpenAI embeddings. Set up API keys for the services.
Choose from available Ollama LLMs. Download Ollama app and pull your model of choice, e.g.: ollama pull llama3
Index 3 blog posts to create a vectorstore for retrieval.
Define the tools needed for the CRAG workflow, including web search via Tavily and document grading.
Explicitly define the majority of the control flow using LangGraph, with an LLM defining a single branch point following grading. If any documents are irrelevant, supplement retrieval with web search.
Create a dataset of question-answer pairs and save it in LangSmith. Use an LLM as a grader (gpt-4o) to compare agent responses to ground truth reference answers.
Assess the list of tool calls that each agent makes relative to expected trajectories. Evaluate the specific reasoning traces taken by agents and benchmark against GPT-4o and Llama-3-70b.
Overview
Corrective RAG (CRAG) using local LLMs
What it does
Implement Corrective RAG (CRAG) using local LLMs and Tavily Search. This example demonstrates a RAG strategy that incorporates self-reflection/self-grading on retrieved documents. The implementation uses LangGraph and follows a flow where retrieved documents are assessed for relevance. If documents fall below a relevance threshold or if a grader is unsure, web search is used to supplement retrieval. The implementation skips knowledge refinement but notes it can be added. It utilizes Ollama for local LLMs, Tavily Search for web search, and a vectorstore with Nomic local embeddings or OpenAI embeddings. Evaluation of agent response accuracy and tool call trajectory is performed using LangSmith.
Source README
This directory is retained purely for archival purposes and is no longer updated. Please see the newly consolidated LangChain documentation for the most current information and resources.
Corrective RAG (CRAG) using local LLMs
Corrective-RAG (CRAG) is a strategy for RAG that incorporates self-reflection / self-grading on retrieved documents.
The paper follows this general flow:
- If at least one document exceeds the threshold for relevance, then it proceeds to generation
- If all documents fall below the relevance threshold or if the grader is unsure, then it uses web search to supplement retrieval
- Before generation, it performs knowledge refinement of the search or retrieved documents
- This partitions the document into knowledge strips
- It grades each strip, and filters out irrelevant ones
We will implement some of these ideas from scratch using LangGraph:
- If any documents are irrelevant, we'll supplement retrieval with web search.
- We'll skip the knowledge refinement, but this can be added back as a node if desired.
- We'll use Tavily Search for web search.
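The simplified rule in the list above (drop irrelevant documents, and fall back to web search if any were dropped) can be sketched in plain Python. This is a minimal illustration, not code from the original notebook: `filter_and_flag` and `keyword_grade` are hypothetical names, and the keyword check only stands in for the LLM grader.

```python
from typing import Callable, List, Tuple


def filter_and_flag(
    question: str,
    documents: List[str],
    grade: Callable[[str, str], bool],
) -> Tuple[List[str], bool]:
    """Keep relevant documents; flag web search if any document was dropped."""
    relevant: List[str] = []
    needs_web_search = False
    for doc in documents:
        if grade(question, doc):
            relevant.append(doc)
        else:
            # One irrelevant document is enough to trigger the fallback.
            needs_web_search = True
    return relevant, needs_web_search


def keyword_grade(question: str, doc: str) -> bool:
    # Toy stand-in for the LLM grader: relevant if any question word appears.
    return any(word in doc.lower() for word in question.lower().split())
```

Calling `filter_and_flag("agent memory", docs, keyword_grade)` returns the surviving documents plus a boolean that a later step can use to decide whether to call Tavily.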
Setup
We'll use Ollama to access a local LLM:
- Download Ollama app.
- Pull your model of choice, e.g.:
ollama pull llama3
We'll use Tavily for web search.
We'll use a vectorstore with Nomic local embeddings or, optionally, OpenAI embeddings.
Let's install our required packages and set our API keys:
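The original setup cell is not preserved in this copy, so the following is only a sketch of what it might look like. The package list in the comment and the `LANGCHAIN_API_KEY` variable name are assumptions; `set_env` and `configure_keys` are hypothetical helper names.

```python
import getpass
import os

# Assumed install command (the exact package list is not given in this copy):
#   pip install langchain langgraph langchain-community tavily-python \
#       langchain-nomic "nomic[local]" scikit-learn tiktoken

REQUIRED_KEYS = ["TAVILY_API_KEY"]     # used by Tavily web search
OPTIONAL_KEYS = ["LANGCHAIN_API_KEY"]  # LangSmith tracing (assumed variable name)


def set_env(var: str) -> None:
    """Prompt for a secret only if it is not already in the environment."""
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")


def configure_keys(keys=REQUIRED_KEYS) -> None:
    """Make sure every listed key is available via os.environ."""
    for key in keys:
        set_env(key)
```

Run `configure_keys()` once at the top of a session. Keys that are already exported in the shell are left untouched, so nothing is prompted in that case.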
Set up LangSmith for LangGraph development
Sign up for LangSmith to quickly spot issues and improve the performance of your LangGraph projects. LangSmith lets you use trace data to debug, test, and monitor your LLM apps built with LangGraph.
LLM
You can select from Ollama LLMs.
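As a rough sketch, model selection can be reduced to a model name plus a small factory. This assumes the `ChatOllama` class from `langchain_community` and a model that has already been pulled; `LOCAL_LLM` and `build_llm` are hypothetical names, not part of the original notebook.

```python
LOCAL_LLM = "llama3"  # any model already pulled with `ollama pull <name>`


def build_llm(model: str = LOCAL_LLM, json_mode: bool = False):
    """Create a chat model backed by the local Ollama server."""
    # Imported lazily so this module loads even without LangChain installed.
    from langchain_community.chat_models import ChatOllama

    kwargs = {"model": model, "temperature": 0}
    if json_mode:
        # Ask Ollama to constrain output to valid JSON (useful for grading).
        kwargs["format"] = "json"
    return ChatOllama(**kwargs)
```

A JSON-mode instance suits the document grader, while a plain instance suits answer generation.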
Create Index
Let's index 3 blog posts.
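The indexing cell is missing from this copy. To show the idea without depending on external services, here is a toy in-memory index in which bag-of-words cosine similarity stands in for Nomic embeddings and the vectorstore. Everything in it (`ToyIndex`, `embed`, `cosine`, the chunk size) is illustrative only; a real setup would load the blog posts, split them, and embed the chunks with an embedding model.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. A real setup would call an
    # embedding model (e.g. Nomic) here instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """Splits text into fixed-size word chunks and ranks them by similarity."""

    def __init__(self, chunk_size: int = 50):
        self.chunk_size = chunk_size
        self.chunks: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        words = text.split()
        for i in range(0, len(words), self.chunk_size):
            chunk = " ".join(words[i:i + self.chunk_size])
            self.chunks.append((chunk, embed(chunk)))

    def search(self, query: str, k: int = 4) -> List[str]:
        q = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

The `search` method plays the role of the retriever: it returns the top `k` chunks, which are then passed to the grader.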
Define Tools
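The tool definitions are not preserved here. As a hedged sketch of the document grader, the function below sends a grading prompt to any callable that maps a prompt string to a reply string and expects a small JSON verdict back. The prompt wording, `GRADER_PROMPT`, and `grade_document` are assumptions made for illustration, not the notebook's actual prompt.

```python
import json
from typing import Callable

GRADER_PROMPT = (
    "You are grading whether a retrieved document is relevant to a question.\n"
    "Document:\n{document}\n\nQuestion: {question}\n"
    'Reply with JSON only, e.g. {{"score": "yes"}} or {{"score": "no"}}.'
)


def grade_document(question: str, document: str, llm: Callable[[str], str]) -> bool:
    """Return True if the grader says the document is relevant.

    Unparseable or unexpected output counts as "not relevant", which pushes
    the pipeline toward web search when the grader is unsure.
    """
    raw = llm(GRADER_PROMPT.format(document=document, question=question))
    try:
        score = json.loads(raw).get("score", "")
    except (json.JSONDecodeError, AttributeError):
        return False
    return str(score).strip().lower() == "yes"
```

In the full workflow, `llm` would wrap the local Ollama model, and the web search tool would be called only when at least one document is graded as not relevant.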
Create the Graph
Here we'll explicitly define the majority of the control flow, only using an LLM to define a single branch point following grading.
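The graph code itself is missing from this copy, so the following is a sketch of how the wiring could look with LangGraph's `StateGraph`. The node names, the state keys, and the "Yes"/"No" convention for the `search` flag are assumptions of this sketch; the node callables are supplied by the caller.

```python
from typing import Callable, Dict, List, TypedDict


class GraphState(TypedDict, total=False):
    question: str
    documents: List[str]
    search: str        # "Yes" if web search should supplement retrieval
    generation: str


def decide_to_generate(state: GraphState) -> str:
    """The single branch point: route to web search or straight to generation."""
    return "search" if state.get("search") == "Yes" else "generate"


def build_graph(nodes: Dict[str, Callable[[GraphState], dict]]):
    """Wire the nodes into a LangGraph workflow.

    `nodes` must provide "retrieve", "grade_documents", "web_search" and
    "generate" callables; these names are assumptions of this sketch.
    """
    from langgraph.graph import END, START, StateGraph  # imported lazily

    workflow = StateGraph(GraphState)
    for name, fn in nodes.items():
        workflow.add_node(name, fn)

    workflow.add_edge(START, "retrieve")
    workflow.add_edge("retrieve", "grade_documents")
    workflow.add_conditional_edges(
        "grade_documents",
        decide_to_generate,
        {"search": "web_search", "generate": "generate"},
    )
    workflow.add_edge("web_search", "generate")
    workflow.add_edge("generate", END)
    return workflow.compile()
```

Every edge is fixed except the one leaving `grade_documents`, which matches the description above: the control flow is explicit, and the grading result decides a single branch.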
Trace:
https://smith.langchain.com/public/88e7579e-2571-4cf6-98d2-1f9ce3359967/r
Evaluation
Now we've defined two different agent architectures that do roughly the same thing!
We can evaluate them. See our conceptual guide for context on agent evaluation.
Response
First, we can assess how well our agent performs on a set of question-answer pairs.
We'll create a dataset and save it in LangSmith.
Now, we'll use an LLM as a grader to compare both agent responses to our ground truth reference answer.
Here is the default prompt that we can use.
We'll use gpt-4o as our LLM grader.
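The evaluation cells are not included in this copy. A minimal sketch of the response check, independent of LangSmith, is a loop that asks a grader whether each prediction matches the reference answer. `evaluate_responses`, the `"question"`/`"answer"` keys, and the grader signature are hypothetical; in practice the grader would be an LLM call such as gpt-4o.

```python
from typing import Callable, Dict, List


def evaluate_responses(
    examples: List[Dict[str, str]],
    agent: Callable[[str], str],
    grader: Callable[[str, str, str], bool],
) -> float:
    """Return the fraction of examples the grader marks as correct."""
    if not examples:
        return 0.0
    correct = 0
    for ex in examples:
        prediction = agent(ex["question"])
        # The grader sees the question, the reference answer, and the prediction.
        if grader(ex["question"], ex["answer"], prediction):
            correct += 1
    return correct / len(examples)
```

Swapping the `grader` argument is enough to move from a cheap string check during development to an LLM-based judgment.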
Trajectory
Second, we can assess the list of tool calls that each agent makes relative to expected trajectories.
This evaluates the specific reasoning traces taken by our agents!
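One simple way to score a trajectory is an exact match of the ordered tool calls against a set of accepted paths. The sketch below assumes node names like those used in a CRAG graph; `trajectory_exact_match` and `EXPECTED_TRAJECTORIES` are hypothetical names, and the original evaluator may use a different criterion.

```python
from typing import List

# Accepted paths through the graph (assumed node names).
EXPECTED_TRAJECTORIES = [
    ["retrieve", "grade_documents", "generate"],
    ["retrieve", "grade_documents", "web_search", "generate"],
]


def trajectory_exact_match(actual: List[str], expected: List[List[str]]) -> bool:
    """True if the agent's ordered tool calls equal any accepted trajectory."""
    return any(actual == path for path in expected)
```

An exact match is strict: a run that reaches the right answer but skips grading, or calls a tool twice, counts as a miss.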
We can see the results benchmarked against GPT-4o and Llama-3-70b using Custom agent (as shown here) and ReAct.
The local custom agent performs well in terms of tool calling reliability: it follows the expected reasoning traces.
However, its answer accuracy lags behind that of the larger models when they use the custom agent implementation.