
Implement Self-RAG with Local LLMs

Name: Implement Self-RAG with Local LLMs
Availability: OnlineOnly
Author: LangGraph

Self-RAG workflow using LangGraph that grades retrieved documents and generations for relevance, hallucination detection, and response quality, based on the


Works with ollama, langchain

LangGraph


Updated 3 months ago

Version 1.0.0

Models

mistral large universal


Why it matters

Enhance Retrieval Augmented Generation (RAG) systems by incorporating self-reflection and self-grading mechanisms. This asset enables local LLM integration for more controlled and verifiable information retrieval and generation.

Outcomes

What it gets done

Set up and configure local LLMs and embedding models using Ollama.

Implement a LangGraph-based Self-RAG strategy for document retrieval and generation.

Develop decision-making nodes for retrieval, relevance checking, and generation verification.

Index local documents for efficient retrieval within the RAG pipeline.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/lg-langgraphselfraglocal | bash

Steps

Steps in the chain

Retrieve decision: Should retrieve from retriever R

Input: x (question) OR x (question), y (generation). Decides when to retrieve D chunks with R. Output: yes, no, continue
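Local models do not always return a bare label, so it can help to normalize the retrieve decision before acting on it. The sketch below is a minimal illustration, not the chain's own code: the `RetrieveDecision` enum and `parse_retrieve_decision` function are hypothetical names, and falling back to "yes" on unrecognized output is an assumption.

```python
from enum import Enum


class RetrieveDecision(Enum):
    """The three outputs listed for the retrieve decision."""
    YES = "yes"
    NO = "no"
    CONTINUE = "continue"


def parse_retrieve_decision(raw: str) -> RetrieveDecision:
    """Map raw model text onto one of the three labels."""
    # Drop surrounding whitespace, case, and trailing punctuation or quotes.
    token = raw.strip().lower().strip(".!\"' ")
    for decision in RetrieveDecision:
        if token == decision.value:
            return decision
    # Assumption: when the output is unrecognized, retrieve anyway.
    return RetrieveDecision.YES
```

A caller would pass the grader model's raw text straight in, for example `parse_retrieve_decision(" Yes. ")`.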

Relevance check: Are retrieved passages relevant to question

Input: (x (question), d (chunk)) for d in D. Determine if d provides useful information to solve x. Output: relevant, irrelevant
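The relevance check can be sketched as a filter over the retrieved chunks. This is a hedged illustration: `filter_relevant` and `keyword_grader` are hypothetical names, and the keyword grader is only a toy stand-in for a call to the local LLM grader.

```python
from typing import Callable, List


def filter_relevant(
    question: str,
    chunks: List[str],
    grade: Callable[[str, str], str],
) -> List[str]:
    """Keep only the chunks that the grader labels 'relevant'."""
    kept = []
    for chunk in chunks:
        label = grade(question, chunk).strip().lower()
        if label == "relevant":
            kept.append(chunk)
    return kept


def keyword_grader(question: str, chunk: str) -> str:
    """Toy grader (assumption): relevant if a longer question word appears in the chunk."""
    question_words = {w for w in question.lower().split() if len(w) > 3}
    chunk_words = set(chunk.lower().split())
    return "relevant" if question_words & chunk_words else "irrelevant"
```

In a real run, `grade` would wrap a prompt to the local model that returns one of the two labels; the filter itself does not change.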

Hallucination check: Is LLM generation supported by chunk

Input: x (question), d (chunk), y (generation) for d in D. Verify all verification-worthy statements in y (generation) are supported by d. Output: fully supported, partially supported, no support
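One way to turn the per-chunk support labels into a single verdict is to rank them and keep the strongest one. The sketch below assumes that aggregation rule, which the source does not specify; `SUPPORT_RANK` and `support_level` are hypothetical names, and `grade` stands in for the LLM grader.

```python
from typing import Callable, List

# Ordering of the three labels from weakest to strongest support.
SUPPORT_RANK = {"no support": 0, "partially supported": 1, "fully supported": 2}


def support_level(
    question: str,
    chunks: List[str],
    generation: str,
    grade: Callable[[str, str, str], str],
) -> str:
    """Return the strongest support label that any chunk gives the generation."""
    best = "no support"
    for chunk in chunks:
        label = grade(question, chunk, generation).strip().lower()
        if label not in SUPPORT_RANK:
            continue  # Ignore malformed grader output.
        if SUPPORT_RANK[label] > SUPPORT_RANK[best]:
            best = label
    return best
```

A stricter pipeline might instead require "fully supported" from every chunk that was used; which rule fits depends on how much hallucination risk is acceptable.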

Usefulness rating: Is generation useful response to question

Input: x (question), y (generation) for d in D. Determine if y (generation) is a useful response to x (question). Output: 5, 4, 3, 2, 1
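The 1 to 5 rating usually arrives as free text, so a small parser plus a threshold is a reasonable way to act on it. This is a sketch under assumptions: `parse_usefulness` and `is_useful` are hypothetical names, and the default threshold of 4 is an arbitrary choice, not a value from the source.

```python
import re
from typing import Optional

# Matches a standalone digit from 1 to 5 (so "10" does not count as "1").
_RATING = re.compile(r"\b([1-5])\b")


def parse_usefulness(raw: str) -> Optional[int]:
    """Extract the first standalone rating in the range 1 to 5, if any."""
    match = _RATING.search(raw)
    return int(match.group(1)) if match else None


def is_useful(raw: str, threshold: int = 4) -> bool:
    """Treat the generation as useful when the rating meets the threshold."""
    rating = parse_usefulness(raw)
    return rating is not None and rating >= threshold
```

Output without a parseable rating is treated as not useful, which pushes the pipeline toward retrying rather than returning an unverified answer.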

Download Ollama app

Download the Ollama app from https://ollama.ai/

Download Mistral model

Download a Mistral model from https://ollama.ai/library/mistral or Mixtral versions from https://ollama.ai/library/mixtral. Run: ollama pull mistral

Overview

Self-RAG using local LLMs

What it does

Implements the Self-RAG paper's approach to retrieval-augmented generation with self-grading steps for relevance, hallucination detection, and response quality using LangGraph and local LLMs via Ollama.

How it connects

Use when you need RAG with explicit quality control and grading steps. Avoid for simple queries or latency-sensitive applications where the multi-step evaluation overhead isn't justified.

Source README

This directory is retained purely for archival purposes and is no longer updated. Please see the newly consolidated LangChain documentation for the most current information and resources.

Self-RAG using local LLMs

Self-RAG is a strategy for RAG that incorporates self-reflection / self-grading on retrieved documents and generations.

In the paper, a few decisions are made:

Should I retrieve from retriever, R -

Input: x (question) OR x (question), y (generation)
Decides when to retrieve D chunks with R
Output: yes, no, continue

Are the retrieved passages D relevant to the question x -

Input: (x (question), d (chunk)) for d in D
d provides useful information to solve x
Output: relevant, irrelevant

Is the LLM generation from each chunk in D supported by that chunk (hallucinations, etc.) -

Input: x (question), d (chunk), y (generation) for d in D
All of the verification-worthy statements in y (generation) are supported by d
Output: {fully supported, partially supported, no support}

The LLM generation from each chunk in D is a useful response to x (question) -

Input: x (question), y (generation) for d in D
y (generation) is a useful response to x (question).
Output: {5, 4, 3, 2, 1}

We will implement some of these ideas from scratch using LangGraph.

Setup

First, let's install the required packages and set our API keys.

Set up LangSmith for LangGraph development

Sign up for LangSmith to quickly spot issues and improve the performance of your LangGraph projects. LangSmith lets you use trace data to debug, test, and monitor your LLM apps built with LangGraph - read more about how to get started here.

LLMs

Local Embeddings

You can use GPT4AllEmbeddings() from Nomic, which can use Nomic's recently released v1 and v1.5 embeddings.

Follow the documentation here.

Local LLM

(1) Download the Ollama app from https://ollama.ai/.

(2) Download a Mistral model, choosing from the Mistral versions at https://ollama.ai/library/mistral or the Mixtral versions at https://ollama.ai/library/mixtral.

ollama pull mistral
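Once the model is pulled, a quick way to confirm it responds is to call the local Ollama server directly. The sketch below uses only the standard library and assumes Ollama's default local endpoint (`http://localhost:11434/api/generate`) and its `model`, `prompt`, `stream`, and `response` fields as I understand them; check these against the installed Ollama version. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "mistral", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With the server running, `generate("Say hello in one word.")` should return a short string; a connection error usually means the Ollama app is not started.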

Create Index

Let's index 3 blog posts.
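Indexing boils down to splitting each post into chunks, turning the chunks into vectors, and retrieving the nearest ones for a query. The toy sketch below illustrates that shape with word-count vectors and cosine similarity; it is a stand-in for the local embedding model and vector store, not the setup the README uses, and `split_text` and `ToyIndex` are hypothetical names.

```python
import math
from collections import Counter
from typing import List


def split_text(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into word windows of chunk_size that overlap by overlap words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def _vector(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """In-memory index that ranks chunks by cosine similarity to the query."""

    def __init__(self, chunks: List[str]) -> None:
        self.chunks = list(chunks)
        self.vectors = [_vector(c) for c in self.chunks]

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        q = _vector(query)
        scored = sorted(
            zip(self.chunks, self.vectors),
            key=lambda pair: _cosine(q, pair[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in scored[:k]]
```

Swapping `_vector` for a real embedding call and `ToyIndex` for a vector store keeps the same interface: chunks in, top-k chunks out.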

LLMs

Graph

Capture the flow as a graph.

Graph state
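The graph state is the dictionary that every node reads from and writes to. A minimal sketch, assuming the state carries a question, the retrieved documents, and the current generation (these field names and the `retrieve_node` function are illustrative assumptions, not confirmed by the source):

```python
from typing import List, TypedDict


class GraphState(TypedDict):
    """Shared state passed between nodes (field names are assumptions)."""
    question: str
    documents: List[str]
    generation: str


def retrieve_node(state: GraphState) -> GraphState:
    """Example node: return a new state with the documents filled in."""
    # Placeholder retrieval; a real node would query the index here.
    docs = [f"placeholder chunk for: {state['question']}"]
    return GraphState(
        question=state["question"],
        documents=docs,
        generation=state["generation"],
    )
```

Each node takes the current state and returns the updated values, which keeps the grading and generation steps independent of one another.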

Build Graph

This just follows the flow we outlined in the figure above.
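As a rough picture of the control flow, the same decisions can be written as a plain Python loop with the graders injected as callables. This is a sketch of one plausible wiring, not the README's graph: the query `rewrite` step, the `max_rounds` cap, and all function names are assumptions, and in LangGraph the branches would be expressed as nodes and conditional edges instead.

```python
from typing import Callable, List, Optional


def run_self_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    grade_doc: Callable[[str, str], bool],
    generate: Callable[[str, List[str]], str],
    is_supported: Callable[[List[str], str], bool],
    is_useful: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Retrieve, grade, generate, and verify; retry up to max_rounds times."""
    for _ in range(max_rounds):
        # Relevance check: keep only chunks the grader accepts.
        docs = [d for d in retrieve(question) if grade_doc(question, d)]
        if not docs:
            question = rewrite(question)  # Nothing relevant: try a new query.
            continue
        answer = generate(question, docs)
        # Hallucination check: regenerate if the answer is unsupported.
        if not is_supported(docs, answer):
            continue
        # Usefulness check: return the answer or rewrite the question.
        if is_useful(question, answer):
            return answer
        question = rewrite(question)
    return None  # No acceptable answer within the round limit.
```

The round limit matters with local models, since a grader that never says "yes" would otherwise loop indefinitely.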

Run

Trace:

https://smith.langchain.com/public/4163a342-5260-4852-8602-bda3f95177e7/r

