Implement Self-RAG with Local LLMs
Self-RAG workflow using LangGraph that grades retrieved documents and generations for relevance, hallucination detection, and response quality, based on the
Why it matters
Enhance Retrieval Augmented Generation (RAG) systems by incorporating self-reflection and self-grading mechanisms. This asset enables local LLM integration for more controlled and verifiable information retrieval and generation.
Outcomes
What it gets done
Set up and configure local LLMs and embedding models using Ollama.
Implement a LangGraph-based Self-RAG strategy for document retrieval and generation.
Develop decision-making nodes for retrieval, relevance checking, and generation verification.
Index local documents for efficient retrieval within the RAG pipeline.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/lg-langgraphselfraglocal | bash
Steps
Steps in the chain
Step 1: Retrieve decision. Input: x (question) OR x (question), y (generation). Decides when to retrieve D chunks with R. Output: yes, no, continue
Step 2: Relevance check. Input: (x (question), d (chunk)) for d in D. Determine if d provides useful information to solve x. Output: relevant, irrelevant
Step 3: Hallucination check. Input: x (question), d (chunk), y (generation) for d in D. Verify that all verification-worthy statements in y (generation) are supported by d. Output: fully supported, partially supported, no support
Step 4: Usefulness rating. Input: x (question), y (generation) for d in D. Determine if y (generation) is a useful response to x (question). Output: 5, 4, 3, 2, 1
Step 5: Download the Ollama app from https://ollama.ai/
Step 6: Download a Mistral model from https://ollama.ai/library/mistral or a Mixtral version from https://ollama.ai/library/mixtral. Run: ollama pull mistral
Overview
Self-RAG using local LLMs
What it does
Implements the Self-RAG paper's approach to retrieval-augmented generation with self-grading steps for relevance, hallucination detection, and response quality using LangGraph and local LLMs via Ollama.
How it connects
Use when you need RAG with explicit quality control and grading steps. Avoid for simple queries or latency-sensitive applications where the multi-step evaluation overhead isn't justified.
Source README
This directory is retained purely for archival purposes and is no longer updated. Please see the newly consolidated LangChain documentation for the most current information and resources.
Self-RAG using local LLMs
Self-RAG is a strategy for RAG that incorporates self-reflection / self-grading on retrieved documents and generations.
In the paper, a few decisions are made:
- Should I retrieve from retriever R?
  - Input: x (question) OR x (question), y (generation)
  - Decides when to retrieve D chunks with R
  - Output: yes, no, continue
- Are the retrieved passages D relevant to the question x?
  - Input: (x (question), d (chunk)) for d in D
  - d provides useful information to solve x
  - Output: relevant, irrelevant
- Is the LLM generation from each chunk in D supported by the chunk (hallucinations, etc.)?
  - Input: x (question), d (chunk), y (generation) for d in D
  - All of the verification-worthy statements in y (generation) are supported by d
  - Output: {fully supported, partially supported, no support}
- Is the LLM generation from each chunk in D a useful response to x (question)?
  - Input: x (question), y (generation) for d in D
  - y (generation) is a useful response to x (question)
  - Output: {5, 4, 3, 2, 1}
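Each of the four decisions has a small, closed set of outputs, so one way to keep a local model's grading usable is to validate its raw reply against that set before routing on it. The following is a minimal sketch under that assumption; the names `DECISION_OUTPUTS` and `parse_grade` are hypothetical and come from neither the paper nor the notebook.

```python
# Allowed outputs per decision, copied from the list of decisions in this README.
# The dictionary keys and the function name are illustrative assumptions.
DECISION_OUTPUTS = {
    "retrieve": ("yes", "no", "continue"),
    "relevance": ("relevant", "irrelevant"),
    "support": ("fully supported", "partially supported", "no support"),
    "usefulness": ("5", "4", "3", "2", "1"),
}


def parse_grade(decision: str, raw_reply: str) -> str:
    """Normalize a grader's raw text reply and check it against the allowed set."""
    allowed = DECISION_OUTPUTS[decision]
    # Local models often add whitespace, capitals, or a trailing period.
    cleaned = raw_reply.strip().strip(".").lower()
    if cleaned not in allowed:
        raise ValueError(f"unexpected {decision} grade: {raw_reply!r}")
    return cleaned
```

Rejecting anything outside the allowed set makes a malformed grade fail loudly instead of silently taking the wrong branch.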
We will implement some of these ideas from scratch using LangGraph.
Setup
First, let's install the required packages and set our API keys.
Set up LangSmith for LangGraph development
Sign up for LangSmith to quickly spot issues and improve the performance of your LangGraph projects. LangSmith lets you use trace data to debug, test, and monitor your LLM apps built with LangGraph - read more about how to get started here.
LLMs
Local Embeddings
You can use GPT4AllEmbeddings() from Nomic, which can use Nomic's recently released v1 and v1.5 embeddings.
Follow the documentation here.
Local LLM
(1) Download Ollama app.
(2) Download a Mistral model. Mistral versions are listed at https://ollama.ai/library/mistral and Mixtral versions at https://ollama.ai/library/mixtral.
ollama pull mistral
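Once the model is pulled, it can be queried locally. The sketch below talks to Ollama's REST endpoint directly using only the standard library; this is a swapped-in approach for illustration, not necessarily how the notebook calls the model. It assumes the Ollama server is running on its default port (11434), and `build_request` and `generate` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Build a non-streaming generate request for a locally pulled model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "mistral") -> str:
    """Send the request and return the model's text (needs a running Ollama server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Why is the sky blue?")` should return the model's answer as a string, provided `ollama pull mistral` has completed and the server is up.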
Create Index
Let's index 3 blog posts.
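Indexing comes down to two steps: split each document into chunks, then make the chunks searchable. The sketch below is a toy stand-in for that idea using only the standard library: it scores chunks by word overlap instead of using the embeddings and vector store a real pipeline would rely on. The function names and default sizes are assumptions made for illustration.

```python
def split_text(text: str, chunk_size: int = 250, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks, optionally overlapping."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most distinct words with the question.

    Toy scoring only: a real index would compare embedding vectors.
    """
    words = set(question.lower().split())
    ranked = sorted(
        chunks,
        key=lambda chunk: len(words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]
```

The chunks returned by `retrieve` play the role of D in the decisions listed earlier.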
LLMs
Graph
Capture the flow as a graph.
Graph state
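The state is the dictionary that every node reads from and writes to. A minimal sketch, assuming the state tracks the question, the retrieved chunks, and the current generation; the field names and the `apply_update` helper are illustrative assumptions, not taken from the notebook.

```python
from typing import List, TypedDict


class GraphState(TypedDict):
    """Shared state passed between nodes (field names are assumptions)."""
    question: str         # x in the paper's notation
    generation: str       # y, the current LLM answer
    documents: List[str]  # D, the retrieved chunks


def apply_update(state: GraphState, update: dict) -> GraphState:
    """Merge a node's partial update into a new state, overwriting matching keys."""
    return {**state, **update}
```

Each node then only has to return the keys it changed, for example `{"documents": [...]}` from a retrieval node.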
Build Graph
This just follows the flow we outlined in the figure above.
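The branching in the flow can be expressed as small functions that inspect the state and return the label of the next step. The sketch below assumes the grades have already been written into the state under hypothetical keys (`support`, `usefulness`); the edge labels and the usefulness threshold of 4 are also assumptions chosen for illustration.

```python
def decide_to_generate(state: dict) -> str:
    """After relevance grading: generate if any chunk survived, else rewrite the query."""
    return "generate" if state["documents"] else "transform_query"


def grade_generation(state: dict) -> str:
    """After generation: check support first, then usefulness."""
    if state["support"] == "no support":
        return "not supported"  # likely hallucination, so regenerate
    if state["usefulness"] >= 4:  # assumed threshold on the 1-5 scale
        return "useful"  # accept the answer and stop
    return "not useful"  # rewrite the query and retrieve again
```

In LangGraph, functions like these are typically passed to `StateGraph.add_conditional_edges` together with a mapping from each returned label to the next node.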
Run
Trace:
https://smith.langchain.com/public/4163a342-5260-4852-8602-bda3f95177e7/r