Prompt Chain

Implement Self-RAG with Local LLMs

A Self-RAG workflow built with LangGraph that grades retrieved documents and generations for relevance, hallucinations, and response quality, based on the Self-RAG paper.

Works with ollama, langchain

Spark score: 91 out of 100
Updated 3 months ago
Version 1.0.0

Why it matters

Enhance Retrieval Augmented Generation (RAG) systems by incorporating self-reflection and self-grading mechanisms. This asset enables local LLM integration for more controlled and verifiable information retrieval and generation.

Outcomes

What it gets done

01

Set up and configure local LLMs and embedding models using Ollama.

02

Implement a LangGraph-based Self-RAG strategy for document retrieval and generation.

03

Develop decision-making nodes for retrieval, relevance checking, and generation verification.

04

Index local documents for efficient retrieval within the RAG pipeline.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/lg-langgraphselfraglocal | bash

Steps

Steps in the chain

01
Retrieve decision: Should the system retrieve from retriever R?

Input: x (question) OR x (question), y (generation). Decides when to retrieve D chunks with R. Output: yes, no, continue

02
Relevance check: Are the retrieved passages relevant to the question?

Input: (x (question), d (chunk)) for d in D. Determine if d provides useful information to solve x. Output: relevant, irrelevant

03
Hallucination check: Is the LLM generation supported by the chunk?

Input: x (question), d (chunk), y (generation) for d in D. Verify all verification-worthy statements in y (generation) are supported by d. Output: fully supported, partially supported, no support

04
Usefulness rating: Is the generation a useful response to the question?

Input: x (question), y (generation) for d in D. Determine if y (generation) is a useful response to x (question). Output: 5, 4, 3, 2, 1

05
Download Ollama app

Download the Ollama app from https://ollama.ai/

06
Download Mistral model

Download a Mistral model from https://ollama.ai/library/mistral or a Mixtral model from https://ollama.ai/library/mixtral. Run: ollama pull mistral

Overview

Self-RAG using local LLMs

What it does

Implements the Self-RAG paper's approach to retrieval-augmented generation with self-grading steps for relevance, hallucination detection, and response quality using LangGraph and local LLMs via Ollama.

How it connects

Use when you need RAG with explicit quality control and grading steps. Avoid for simple queries or latency-sensitive applications where the multi-step evaluation overhead isn't justified.

Source README

This directory is retained purely for archival purposes and is no longer updated. Please see the newly consolidated LangChain documentation for the most current information and resources.

Self-RAG using local LLMs

Self-RAG is a strategy for RAG that incorporates self-reflection / self-grading on retrieved documents and generations.

In the paper, a few decisions are made:

  1. Should I retrieve from retriever R?
    • Input: x (question) OR x (question), y (generation)
    • Decides when to retrieve D chunks with R
    • Output: yes, no, continue
  2. Are the retrieved passages D relevant to the question x?
    • Input: (x (question), d (chunk)) for d in D
    • d provides useful information to solve x
    • Output: relevant, irrelevant
  3. Is the LLM generation from each chunk in D supported by that chunk (hallucinations, etc.)?
    • Input: x (question), d (chunk), y (generation) for d in D
    • All of the verification-worthy statements in y (generation) are supported by d
    • Output: fully supported, partially supported, no support
  4. Is the LLM generation from each chunk in D a useful response to x (question)?
    • Input: x (question), y (generation) for d in D
    • y (generation) is a useful response to x (question)
    • Output: 5, 4, 3, 2, 1

We will implement some of these ideas from scratch using LangGraph.
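The four decisions above each produce a label from a small, fixed set, so one way to keep grader output honest is to parse it into explicit types. The following is a minimal sketch, not code from the paper or the original notebook: the class names and the `parse_label` and `parse_usefulness` helpers are hypothetical, and only the label strings come from the list above.

```python
from enum import Enum


class Retrieve(Enum):
    YES = "yes"
    NO = "no"
    CONTINUE = "continue"


class Relevance(Enum):
    RELEVANT = "relevant"
    IRRELEVANT = "irrelevant"


class Support(Enum):
    FULL = "fully supported"
    PARTIAL = "partially supported"
    NONE = "no support"


def parse_label(raw: str, label_type):
    """Map raw grader text onto one of the allowed labels, or raise ValueError."""
    cleaned = raw.strip().strip(".").lower()
    for label in label_type:
        if label.value == cleaned:
            return label
    raise ValueError(f"unexpected grader output: {raw!r}")


def parse_usefulness(raw: str) -> int:
    """Parse the usefulness rating, an integer between 1 and 5."""
    score = int(raw.strip())
    if not 1 <= score <= 5:
        raise ValueError(f"usefulness must be between 1 and 5, got {score}")
    return score
```

Rejecting anything outside the allowed set makes a malformed grade fail loudly instead of being silently treated as a pass, which matters more with small local models that do not always follow output instructions.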

Setup

First, let's install the required packages and set our API keys.

Set up LangSmith for LangGraph development

Sign up for LangSmith to quickly spot issues and improve the performance of your LangGraph projects. LangSmith lets you use trace data to debug, test, and monitor your LLM apps built with LangGraph - read more about how to get started here.

LLMs

Local Embeddings

You can use GPT4AllEmbeddings() from Nomic, which can use Nomic's recently released v1 and v1.5 embeddings.

Follow the documentation here.

Local LLM

(1) Download Ollama app.

(2) Download a Mistral model. Various Mistral versions are available at https://ollama.ai/library/mistral and Mixtral versions at https://ollama.ai/library/mixtral.

ollama pull mistral
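Once the model is pulled, it can be queried locally. The original notebook goes through LangChain; the sketch below instead talks to Ollama's HTTP API directly using only the standard library, so it shows the moving parts without extra dependencies. It assumes the Ollama server is running on its default port (11434); the function names are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "mistral") -> urllib.request.Request:
    """Build a non-streaming generate request for a locally pulled model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "mistral") -> str:
    """Send the prompt to Ollama and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("Why is the sky blue?")` only works while the Ollama app is running and the `mistral` model has been pulled.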

Create Index

Let's index 3 blog posts.
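The indexing code is not reproduced here. As a rough illustration of what indexing involves (split documents into chunks, turn each chunk into a vector, retrieve the closest chunks for a question), here is a toy stand-in that uses only the standard library. It is an assumption-laden sketch: word counts replace real embeddings, an in-memory list replaces a vector store, and `ToyIndex` and `chunk_text` are hypothetical names.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list:
    """Split text into overlapping chunks of up to chunk_size words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]


def _vector(text: str) -> Counter:
    # Stand-in for an embedding: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """In-memory index over chunks; a placeholder for a real vector store."""

    def __init__(self, documents, chunk_size: int = 50, overlap: int = 10):
        self.chunks = [c for doc in documents for c in chunk_text(doc, chunk_size, overlap)]
        self._vectors = [_vector(c) for c in self.chunks]

    def retrieve(self, question: str, k: int = 2) -> list:
        """Return the k chunks most similar to the question."""
        q = _vector(question)
        ranked = sorted(
            zip(self.chunks, self._vectors),
            key=lambda pair: _cosine(q, pair[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]
```

In a real pipeline the word-count vectors would be replaced by the local embedding model described earlier, but the retrieve-top-k shape stays the same.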

LLMs

Graph

Capture the flow as a graph.

Graph state
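The state definition itself is not shown here. A plausible minimal shape, assuming the graph only needs to carry the question, the retrieved chunks, and the current generation, is a typed dictionary like the one below. The field names and the `merge_update` helper are assumptions for illustration; the helper mimics the common pattern in which each node returns a partial update that overwrites the matching keys.

```python
from typing import List, TypedDict


class GraphState(TypedDict):
    """State passed between nodes (field names are illustrative)."""
    question: str         # x: the user question, possibly rewritten along the way
    documents: List[str]  # D: the retrieved chunks that survived grading
    generation: str       # y: the current LLM answer, empty until generated


def merge_update(state: GraphState, update: dict) -> GraphState:
    """Return a new state with a node's partial update applied."""
    return {**state, **update}
```

For example, a retrieval node would return only `{"documents": [...]}` and leave the question and generation untouched.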

Build Graph

This just follows the flow we outlined in the figure above.
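Since the figure and the graph-building code are not reproduced here, the sketch below spells out one reading of that flow as a plain Python loop rather than a LangGraph graph: retrieve, keep relevant chunks, generate, check support, then check usefulness. The branching order, the `rewrite` step, and the `max_steps` cap are assumptions, and every callable is a placeholder for an LLM-backed node.

```python
from typing import Callable, List, Optional


def run_self_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    generate: Callable[[str, List[str]], str],
    is_supported: Callable[[List[str], str], bool],
    is_useful: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
    max_steps: int = 5,
) -> Optional[str]:
    """Run the grading loop; return an accepted answer, or None if none was found."""
    for _ in range(max_steps):
        # Relevance check: drop chunks the grader marks irrelevant.
        docs = [d for d in retrieve(question) if is_relevant(question, d)]
        if not docs:
            question = rewrite(question)  # nothing usable: rephrase and retry
            continue
        answer = generate(question, docs)
        # Hallucination check: the answer must be supported by the chunks.
        if not is_supported(docs, answer):
            continue  # try again with the same question
        # Usefulness check: the answer must actually address the question.
        if is_useful(question, answer):
            return answer
        question = rewrite(question)
    return None
```

The step cap is a design choice worth keeping in any version of this flow: without it, a grader that never accepts an answer would loop forever, which is easy to hit with a small local model.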

Run

Trace:

https://smith.langchain.com/public/4163a342-5260-4852-8602-bda3f95177e7/r
