Summarize and Retrieve Data with RAPTOR

Name: Raptor Retriever LlamaPack
Availability: OnlineOnly
Author: LlamaIndex

RAPTOR retrieval LlamaPack that recursively clusters and summarizes documents in layers, offering tree-traversal and collapsed top-k retrieval modes.

Works with

LlamaIndex

Updated last month

Version 0.14.22

Models

Llama 3

Why it matters

Use the RAPTOR algorithm for hierarchical summarization and retrieval over large document sets. The pack recursively clusters and summarizes documents, so a query can be matched against the original chunks as well as the summaries built on top of them.

Outcomes

What it gets done

Implement RAPTOR's recursive summarization and clustering.

Enable two retrieval modes: tree traversal and collapsed top-k.

Configure summarization prompts and parallel processing workers.

Persist and reload the retrieval index using vector stores.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-pack-packs-raptor | bash

Overview

Raptor Retriever LlamaPack

A LlamaPack implementation of the RAPTOR algorithm. It builds a hierarchical representation of your documents through recursive clustering and summarization, and supports two retrieval strategies: tree_traversal, which runs a top-k search layer by layer, and collapsed, which runs a single flat top-k search across all nodes. Use it for large document collections where hierarchical context matters, where standard flat vector search loses structural relationships, or where you want to choose between layer-by-layer retrieval through the tree and a flat search across every summary level.
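To make the layered build concrete, here is a minimal standalone sketch of the recursive "cluster, then summarize" loop. It is not the pack's implementation: fixed-size grouping of neighbouring nodes stands in for real clustering, and the hypothetical toy_summarize function stands in for an LLM summary call. The names build_layers, toy_summarize, and group_size are illustrative assumptions.

```python
from typing import Callable, List


def build_layers(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    group_size: int = 2,
) -> List[List[str]]:
    """Return layers from the leaves (index 0) up to a single root summary."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each layer shrinks")
    layers = [list(chunks)]
    while len(layers[-1]) > 1:
        current = layers[-1]
        # Assumption: grouping neighbours in fixed-size runs replaces clustering.
        clusters = [
            current[i:i + group_size] for i in range(0, len(current), group_size)
        ]
        # One summary node per cluster forms the next layer up.
        layers.append([summarize(cluster) for cluster in clusters])
    return layers


def toy_summarize(texts: List[str]) -> str:
    # Hypothetical stand-in for an LLM call: join the texts.
    return " + ".join(texts)


layers = build_layers(["a", "b", "c", "d", "e"], toy_summarize)
for depth, layer in enumerate(layers):
    print(depth, layer)
```

Each pass shrinks the layer until a single root remains. The result is a list of layers that a retriever can search either one layer at a time or all at once.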

What it does

Big Job: When you need to retrieve relevant information from large document collections with better context and hierarchy than flat vector search.

Small Job: Recursively cluster and summarize your documents into a tree structure, then retrieve using either tree-traversal (top-k at each layer) or collapsed (top-k across all nodes) modes.
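The difference between the two modes can be sketched on a toy tree with hand-written vectors. This is an illustration under assumptions, not the pack's code: the Node class, the dot-product score, and the choice to return the picks from every layer in tree traversal are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    vec: List[float]
    children: List["Node"] = field(default_factory=list)


def score(query: List[float], node: Node) -> float:
    # Dot product as a simple similarity measure (an assumption of this sketch).
    return sum(q * v for q, v in zip(query, node.vec))


def top_k(query: List[float], nodes: List[Node], k: int) -> List[Node]:
    return sorted(nodes, key=lambda n: score(query, n), reverse=True)[:k]


def all_nodes(root: Node) -> List[Node]:
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node.children)
    return out


def collapsed(query: List[float], root: Node, k: int) -> List[Node]:
    # Flat search: every node, summary or leaf, competes in one ranking.
    return top_k(query, all_nodes(root), k)


def tree_traversal(query: List[float], root: Node, k: int) -> List[Node]:
    # Layered search: keep the top-k of each layer, then descend to their children.
    selected, layer = [], [root]
    while layer:
        best = top_k(query, layer, k)
        selected.extend(best)
        layer = [child for node in best for child in node.children]
    return selected


cats = Node("cats", [1.0, 0.0], [Node("cat-a", [0.9, 0.1]), Node("cat-b", [0.8, 0.0])])
dogs = Node("dogs", [0.0, 1.0], [Node("dog-a", [0.1, 0.9]), Node("dog-b", [0.0, 0.8])])
root = Node("all", [0.5, 0.5], [cats, dogs])

query = [1.0, 0.0]
print([n.name for n in collapsed(query, root, k=2)])
print([n.name for n in tree_traversal(query, root, k=1)])
```

Collapsed search ranks summaries and leaves together, while tree traversal only ever looks at the children of nodes it has already selected, so an early wrong turn prunes the rest of that branch.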

Install and initialize:

llamaindex-cli download-llamapack RaptorPack --download-dir ./raptor_pack

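from llama_index.packs.raptor import RaptorPack

# documents: your list of LlamaIndex Document objects
# llm, embed_model: the LLM and embedding model you have already configured
pack = RaptorPack(documents, llm=llm, embed_model=embed_model)

nodes = pack.run(
    "query",
    mode="collapsed",  # or "tree_traversal"
)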

To persist the index to a remote vector store, pass vector_store during initialization; to reconnect later, initialize the pack with the same vector_store and an empty document list. Summarization is configured through SummaryModule, which lets you choose the LLM, customize the summary prompt, and set num_workers. num_workers limits how many summaries are processed concurrently through an async semaphore, so a higher value sends more simultaneous requests and may run into your LLM provider's rate limits.
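The concurrency limit described above can be illustrated with a small standard-library sketch. It does not use the pack itself: fake_llm_summary is a hypothetical stand-in for a real LLM request, and summarize_clusters is an illustrative name. The num_workers parameter here only mirrors the idea of bounding concurrent summary calls with a semaphore.

```python
import asyncio
from typing import List, Tuple


async def fake_llm_summary(texts: List[str]) -> str:
    # Hypothetical stand-in for a real LLM request.
    await asyncio.sleep(0.01)
    return " | ".join(texts)


async def summarize_clusters(
    clusters: List[List[str]], num_workers: int
) -> Tuple[List[str], int]:
    semaphore = asyncio.Semaphore(num_workers)
    running = 0
    peak = 0  # highest number of summaries observed in flight at once

    async def worker(texts: List[str]) -> str:
        nonlocal running, peak
        async with semaphore:  # at most num_workers bodies run at the same time
            running += 1
            peak = max(peak, running)
            try:
                return await fake_llm_summary(texts)
            finally:
                running -= 1

    # gather returns results in the same order as the input clusters.
    summaries = await asyncio.gather(*(worker(c) for c in clusters))
    return list(summaries), peak


clusters = [[f"chunk {i}a", f"chunk {i}b"] for i in range(6)]
summaries, peak = asyncio.run(summarize_clusters(clusters, num_workers=2))
print(len(summaries), peak)
```

Raising num_workers shortens the total build time but increases the number of requests in flight, which is the trade-off against provider rate limits.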
