
Integrate RAGatouille for Advanced Document Retrieval

Bundle RAGatouille for efficient document indexing and retrieval within LlamaIndex, enabling advanced RAG pipelines for question answering.

Without it

You wire up ColBERT indexing, retrieval, and LLM answer synthesis by hand for every project.

With it

Leverage the RAGatouille library within a LlamaIndex pipeline to index documents using state-of-the-art retrieval models like ColBERT, and then synthesize answers from the indexed corpus using an LLM.

What you get

  • Index documents using RAGatouille and ColBERTv2.
  • Combine RAGatouille retrieval with LlamaIndex query engines.
  • Synthesize answers from retrieved document chunks.
  • Inspect and modify the RAGatouille pack for custom RAG pipelines.

Use this prompt chain

LlamaIndex RAG index · Query a database · Summarize

RAGatouille Retriever Pack

RAGatouille is a library that lets you use ColBERT and other state-of-the-art (SOTA) retrieval models in your RAG pipeline. You can use it to run inference with ColBERT, or to train and fine-tune models.

This LlamaPack shows you an easy way to bundle RAGatouille into your RAG pipeline. We use RAGatouille to index a corpus of documents (by default using colbertv2.0), and then we combine it with LlamaIndex query modules to synthesize an answer with an LLM.
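For intuition about what a ColBERT-style retriever does when it ranks passages, here is a minimal sketch of late-interaction (MaxSim) scoring: each query token is matched to its most similar document token, and those maxima are summed. The function names and the tiny 2-dimensional vectors are made up for illustration; RAGatouille handles encoding and scoring internally, and this is not its implementation.

```python
# Sketch of ColBERT-style late-interaction (MaxSim) scoring.
# The toy vectors are invented; real token embeddings come from a BERT encoder.
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the similarity to the best-matching document token."""
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
    return total


# Hypothetical token embeddings, one vector per token.
query = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0]]]
doc_a = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
doc_b = [normalize(v) for v in [[1.0, 1.0], [-1.0, 0.0]]]

scores = {"doc_a": maxsim_score(query, doc_a), "doc_b": maxsim_score(query, doc_b)}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Because every query token keeps its own vector, a document can score well by matching different query tokens in different places, which a single pooled embedding cannot capture.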

A full notebook guide can be found here.

CLI Usage

You can download LlamaPacks directly using llamaindex-cli, which is installed with the llama-index Python package:

llamaindex-cli download-llamapack RAGatouilleRetrieverPack --download-dir ./ragatouille_pack

You can then inspect the files at ./ragatouille_pack and use them as a template for your own project!
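To see what the download produced, a small standard-library helper can list the files in the pack directory. This is only a sketch: `list_pack_files` is a hypothetical helper, not part of llama-index, and the actual file names depend on the pack version you downloaded.

```python
from pathlib import Path


def list_pack_files(pack_dir):
    """Return the relative paths of all files under pack_dir, sorted."""
    root = Path(pack_dir)
    if not root.is_dir():
        return []  # nothing has been downloaded to this location yet
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


# Assumes the --download-dir used in the CLI command above.
for name in list_pack_files("./ragatouille_pack"):
    print(name)
```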

Code Usage

You can download the pack to a ./ragatouille_pack directory:

from llama_index.core.llama_pack import download_llama_pack

# download the pack and install its dependencies
RAGatouilleRetrieverPack = download_llama_pack(
    "RAGatouilleRetrieverPack", "./ragatouille_pack"
)

From here, you can use the pack, or inspect and modify the pack in ./ragatouille_pack.

Then, you can set up the pack like so:

from llama_index.llms.openai import OpenAI

# create the pack; `docs` is a List[Document] that you have already loaded
ragatouille_pack = RAGatouilleRetrieverPack(
    docs,  # List[Document]
    llm=OpenAI(model="gpt-3.5-turbo"),
    index_name="my_index",
    top_k=5,
)

The run() function is a light wrapper around query_engine.query.

response = ragatouille_pack.run("How does ColBERTv2 compare to BERT?")
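The "light wrapper" pattern can be sketched as follows. `FakeQueryEngine` and `SketchPack` are hypothetical stand-ins written for this illustration, not the pack's actual source; they only show how run() can forward its arguments to a query engine while get_modules() exposes the same object for direct use.

```python
class FakeQueryEngine:
    """Stand-in for a LlamaIndex query engine (hypothetical, for illustration)."""

    def query(self, question):
        return f"answer to: {question}"


class SketchPack:
    """Hypothetical pack that shows the delegation pattern only."""

    def __init__(self, query_engine):
        self.query_engine = query_engine

    def get_modules(self):
        # Expose internals so callers can use them individually.
        return {"query_engine": self.query_engine}

    def run(self, *args, **kwargs):
        # Forward everything unchanged to the underlying query engine.
        return self.query_engine.query(*args, **kwargs)


pack = SketchPack(FakeQueryEngine())
print(pack.run("How does ColBERTv2 compare to BERT?"))
```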

You can also use modules individually.

from llama_index.core.response.notebook_utils import display_source_node

retriever = ragatouille_pack.get_modules()["retriever"]
nodes = retriever.retrieve("How does ColBERTv2 compare with BERT?")

for node in nodes:
    display_source_node(node)

# try out the RAG module directly
RAG = ragatouille_pack.get_modules()["RAG"]
results = RAG.search(
    "How does ColBERTv2 compare with BERT?",
    index_name="my_index",  # same name passed when creating the pack
    k=4,
)
results
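If you want a compact view of the search hits, something like the following sketch may help. It assumes each hit is a dict with "content" and "score" keys; check the output of your installed RAGatouille version before relying on that. `summarize_results` and the sample data are hypothetical.

```python
def summarize_results(results, max_chars=60):
    """Turn search hits into short one-line summaries, best score first."""
    lines = []
    for hit in sorted(results, key=lambda h: h["score"], reverse=True):
        snippet = hit["content"][:max_chars]
        lines.append(f"{hit['score']:.2f}  {snippet}")
    return lines


# Made-up hits for illustration only; real scores and text will differ.
sample_results = [
    {"content": "first passage text", "score": 21.5, "rank": 2},
    {"content": "second passage text", "score": 24.25, "rank": 1},
]
for line in summarize_results(sample_results):
    print(line)
```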
