Prompt Chain

Optimize RAG Retrieval with Dynamic Alpha Tuning

Fine-tunable Hybrid Retriever that dynamically determines the optimal alpha for a given query.

Works with OpenAI and Pinecone

Spark score: 91 out of 100
Updated 3 months ago
Version 1.0.0

Why it matters

Improve a Retrieval Augmented Generation (RAG) system by adjusting the balance between dense and sparse retrieval for each query. An LLM classifies the incoming query, and the category it assigns selects the alpha used for hybrid search, so the retrieved context better matches the kind of question being asked.

Outcomes

What it gets done

01

Dynamically determine optimal alpha values for hybrid search based on query classification.

02

Integrate seamlessly with existing LlamaIndex retrieval interfaces.

03

Fine-tune retrieval strategies for specific corpora and query types.

04

Improve RAG accuracy by balancing vector and sparse search results.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-pack-packs-koda-retriever | bash

Steps

Steps in the chain

01
Set Up LLM and Embedding Models

Configure Settings with OpenAI LLM and OpenAIEmbedding. Set up a vector store (e.g., PineconeVectorStore) and create a VectorStoreIndex from it with the configured embed model.

02
Initialize Optional Reranker

Create an LLMRerank postprocessor using the configured LLM. This is optional but can improve retrieval quality.

03
Create KodaRetriever Instance

Instantiate KodaRetriever with the vector index, LLM, optional reranker, and verbose flag set to True for debugging.

04
Execute Retrieval Query

Call retriever.retrieve(query) with your search query to get relevant results. The retriever will automatically determine optimal alpha based on query categorization.

05
Build Query Engine

Create a RetrieverQueryEngine from the KodaRetriever instance using RetrieverQueryEngine.from_args().

06
Generate Response

Call query_engine.query(query) to get a final response that combines retrieval and generation components.

Overview

Koda Retriever

What it does

A custom hybrid retriever that uses an LLM to categorize queries and dynamically determine the optimal alpha parameter for balancing dense vector search and sparse search methods in RAG systems.
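The routing idea described above can be sketched in a few lines. This is a minimal illustration, not the pack's internal code: the category names, the alpha values, and the `toy_classifier` stand-in for the LLM call are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical categories and alpha values, for illustration only.
ALPHA_BY_CATEGORY: Dict[str, float] = {
    "keyword lookup": 0.2,
    "conceptual question": 0.8,
}
DEFAULT_ALPHA = 0.5  # fallback when the classifier returns an unknown label


def choose_alpha(query: str, classify: Callable[[str], str]) -> float:
    """Return the preset alpha for the category the classifier assigns."""
    category = classify(query).strip().lower()
    return ALPHA_BY_CATEGORY.get(category, DEFAULT_ALPHA)


def toy_classifier(query: str) -> str:
    """Stand-in for the LLM call: short queries count as keyword lookups."""
    if len(query.split()) <= 3:
        return "keyword lookup"
    return "conceptual question"


if __name__ == "__main__":
    print(choose_alpha("InGen stock ticker", toy_classifier))
    print(choose_alpha("Why did the park's business model fail?", toy_classifier))
```

In a real pipeline the `classify` callable would wrap an LLM prompt; keeping it as a plain function argument makes the lookup easy to test without network access.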

How it connects

Use it when hybrid search quality varies across query types in your RAG pipeline and you want alpha selected per query from the query's characteristics, rather than fixing a single alpha for all queries.

Source README

Koda Retriever

This retriever is a custom fine-tunable Hybrid Retriever that dynamically determines the optimal alpha for a given query.
An LLM categorizes the query, and each category has a preset or user-provided alpha value, so the assigned category determines the alpha used for that query.
It is recommended that you run tests on your corpus of data and queries to determine categories and corresponding alpha values for your use case.
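One way to run such tests is a small sweep over candidate alpha values per category, scored against labeled queries. The sketch below assumes a hypothetical `search(query, alpha)` callable that returns ranked document ids, and uses hit rate at rank 1 as the metric; `toy_search` only exists to make the example runnable and does not reflect any real retriever.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (query, id of the document that should be ranked first)
LabeledQuery = Tuple[str, str]
SearchFn = Callable[[str, float], List[str]]


def hit_rate_at_1(search: SearchFn, queries: Sequence[LabeledQuery], alpha: float) -> float:
    """Fraction of queries whose expected document is the top result."""
    if not queries:
        raise ValueError("need at least one labeled query")
    hits = sum(1 for query, expected in queries if search(query, alpha)[:1] == [expected])
    return hits / len(queries)


def best_alpha(
    search: SearchFn,
    queries: Sequence[LabeledQuery],
    candidates: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> float:
    """Pick the candidate alpha with the highest hit rate (first one wins ties)."""
    return max(candidates, key=lambda alpha: hit_rate_at_1(search, queries, alpha))


def tune_by_category(
    search: SearchFn, queries_by_category: Dict[str, Sequence[LabeledQuery]]
) -> Dict[str, float]:
    """Run the sweep once per category and return a category -> alpha map."""
    return {name: best_alpha(search, qs) for name, qs in queries_by_category.items()}


def toy_search(query: str, alpha: float) -> List[str]:
    """Hypothetical retriever: only dense-leaning alphas surface doc_a."""
    return ["doc_a", "doc_b"] if alpha >= 0.75 else ["doc_b", "doc_a"]
```

With a real corpus, `search` would wrap a hybrid retriever call at a fixed alpha, and a larger metric such as recall at k may be a better fit than hit rate at 1.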

Introduction

Alpha tuning in hybrid retrieval for RAG systems is the process of adjusting the weight (alpha) given to each component of a hybrid search strategy. In RAG, the retrieval component fetches relevant context from a knowledge base, and the generation component uses that context to produce answers. Tuning alpha changes how much the final ranking relies on dense vector search versus traditional sparse (keyword-based) search. The goal is for retrieval to return context that lets the generator produce accurate, relevant responses.
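As a worked illustration of the weighting, the sketch below blends a dense score and a sparse score with a convex combination. It assumes the common convention that alpha = 1 means dense only and alpha = 0 means sparse only, and that both scores are already on a comparable scale; a given vector store may combine scores differently, and the document ids and scores here are made up.

```python
from typing import Dict, List, Tuple


def hybrid_score(dense: float, sparse: float, alpha: float) -> float:
    """Blend two relevance scores: alpha weights dense, (1 - alpha) weights sparse."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * dense + (1.0 - alpha) * sparse


def rank(candidates: Dict[str, Tuple[float, float]], alpha: float) -> List[str]:
    """Order document ids by blended score, highest first."""
    return sorted(
        candidates,
        key=lambda doc_id: hybrid_score(*candidates[doc_id], alpha),
        reverse=True,
    )


# Hypothetical (dense, sparse) scores for two documents.
CANDIDATES = {
    "doc_exact_match": (0.2, 0.9),  # strong keyword overlap, weak semantic match
    "doc_paraphrase": (0.8, 0.3),   # strong semantic match, weak keyword overlap
}

if __name__ == "__main__":
    for alpha in (0.0, 1.0):
        print(alpha, rank(CANDIDATES, alpha))
```

The same two documents swap places as alpha moves from 0 to 1, which is why a single fixed alpha can suit one query type and hurt another.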

Simply explained

Imagine you're playing a game where someone whispers a sentence to you, and you have to decide whether to draw a picture of exactly what they said, or draw a picture of what you think they mean. Alpha tuning is like finding the best rule for when to draw exactly what's said and when to think deeper about the meaning. It helps us get the best mix, so the game is more fun and everyone understands each other better!

Usage Snapshot

Koda Retriever is compatible with the retrieval interfaces and objects that can normally interact with a LlamaIndex-native retriever.

Please see the examples folder for more specific examples.

### Setup
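from llama_index.packs.koda_retriever import KodaRetriever
from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.postprocessor import LLMRerank
from llama_index.core import Settings
from llama_index.vector_stores.pinecone import PineconeVectorStore

# Register the LLM and embedding model as global defaults.
Settings.llm = OpenAI()
Settings.embed_model = OpenAIEmbedding()

# `index` is assumed to be an existing Pinecone index handle with hybrid search enabled.
vector_store = PineconeVectorStore(pinecone_index=index, text_key="summary")
vector_index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=Settings.embed_model
)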

reranker = LLMRerank(llm=Settings.llm)  # optional
retriever = KodaRetriever(
    index=vector_index, llm=Settings.llm, reranker=reranker, verbose=True
)

### Retrieval
query = "What was the intended business model for the parks in the Jurassic Park lore?"

results = retriever.retrieve(query)

### Query Engine
query_engine = RetrieverQueryEngine.from_args(retriever=retriever)

response = query_engine.query(query)

Prerequisites

  • Vector Store Index with hybrid search enabled
  • LLM (or any model to route/classify prompts)

Please note that you will also need vector AND text representations of your data for a hybrid retriever to work. It is not uncommon for some vector databases to only store the vectors themselves, in which case an error will occur downstream if you try to run any hybrid queries.
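A quick pre-flight check along these lines can catch records that lack a text representation before any hybrid query runs. The record shape below (a dict with `id`, `values`, and `metadata`) is an assumption for illustration, and `text_key="summary"` simply mirrors the key used in the usage snapshot; adapt both to whatever your vector database actually returns.

```python
from typing import Any, Dict, Iterable, List


def missing_hybrid_fields(
    records: Iterable[Dict[str, Any]],
    text_key: str = "summary",
    vector_key: str = "values",
) -> List[str]:
    """Return ids of records that lack either stored text or a vector."""
    problems: List[str] = []
    for record in records:
        text = record.get("metadata", {}).get(text_key)
        vector = record.get(vector_key)
        if not text or not vector:
            problems.append(record.get("id", "<no id>"))
    return problems


# Hypothetical sample records.
SAMPLE = [
    {"id": "a", "values": [0.1, 0.2], "metadata": {"summary": "Park revenue plan"}},
    {"id": "b", "values": [0.3, 0.4], "metadata": {}},  # vector only, no text
    {"id": "c", "values": [], "metadata": {"summary": "Text only"}},  # no vector
]

if __name__ == "__main__":
    print(missing_hybrid_fields(SAMPLE))
```

Running the check on a sample of records during ingestion is cheaper than discovering the gap from a downstream query error.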

Citations

Idea & original implementation sourced from the following docs:
