Prompt Chain

Retrieve and Synthesize Multi-Document Insights

A LlamaPack that implements structured hierarchical retrieval over multiple documents using Weaviate collections.

Works with: Weaviate, LlamaIndex

Spark score: 85 out of 100
Updated 3 months ago
Version 1.0.0

Why it matters

Leverage structured hierarchical retrieval across multiple Weaviate collections to efficiently query and synthesize information from diverse documents.

Outcomes

What it gets done

01

Index and structure data across multiple Weaviate collections.

02

Perform hierarchical retrieval over document metadata and content.

03

Query and synthesize information from a collection of documents.

04

Utilize Weaviate for efficient vector storage and retrieval.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-pack-packs-multidoc-autoretrieval | bash

Steps

Steps in the chain

01
Download the MultiDocAutoRetrieverPack

Use llamaindex-cli to download the pack: llamaindex-cli download-llamapack MultiDocAutoRetrieverPack --download-dir ./multidoc_autoretrieval_pack. Inspect the files at ./multidoc_autoretrieval_pack and use them as a template for your own project.
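
To inspect what the download produced, a short listing helper can be enough. This is a minimal sketch using only the standard library; `list_pack_files` is a hypothetical helper name, and the only assumption is the download directory used in the command above.

```python
from pathlib import Path


def list_pack_files(pack_dir="./multidoc_autoretrieval_pack"):
    """Return the relative paths of all files under the download directory."""
    root = Path(pack_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; run the download first")
    # as_posix() keeps the separators identical across platforms.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


# Only list files if the pack has already been downloaded.
if Path("./multidoc_autoretrieval_pack").is_dir():
    for name in list_pack_files():
        print(name)
```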

02
Import and download pack via Python

Import download_llama_pack from llama_index.core.llama_pack and download the pack to the ./multidoc_autoretrieval_pack directory: MultiDocAutoRetrieverPack = download_llama_pack('MultiDocAutoRetrieverPack', './multidoc_autoretrieval_pack')

03
Configure Weaviate client and authentication

Set up Weaviate cloud authentication using AuthApiKey with your API key and create a Weaviate Client instance pointing to your cluster: auth_config = weaviate.AuthApiKey(api_key='<api_key>'); client = weaviate.Client('https://<cluster>.weaviate.network', auth_client_secret=auth_config)
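
Rather than pasting the API key into source code, the key and cluster URL could be read from the environment. The sketch below uses only the standard library. The variable names `WEAVIATE_URL` and `WEAVIATE_API_KEY` and the helper `weaviate_settings_from_env` are assumptions made for illustration, not names defined by Weaviate or the pack.

```python
import os


def weaviate_settings_from_env(environ=os.environ):
    """Return the cluster URL and API key, failing early if either is missing."""
    # WEAVIATE_URL and WEAVIATE_API_KEY are hypothetical variable names.
    names = ("WEAVIATE_URL", "WEAVIATE_API_KEY")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {"url": environ["WEAVIATE_URL"], "api_key": environ["WEAVIATE_API_KEY"]}


# Demo with an explicit mapping so the example does not depend on the real environment.
settings = weaviate_settings_from_env(
    {"WEAVIATE_URL": "https://<cluster>.weaviate.network", "WEAVIATE_API_KEY": "<api_key>"}
)
print(sorted(settings))
```

The resulting values would then be passed to weaviate.AuthApiKey(api_key=...) and weaviate.Client(...) as in the step above. Note that weaviate.Client is the constructor from the v3 Python client, so it is worth checking which client version is installed.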

04
Define VectorStoreInfo with metadata

Create a VectorStoreInfo object with content_info describing your data (e.g., 'Github Issues') and metadata_info list containing MetadataInfo objects that describe each metadata field, including name, description, and type.
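
Since each metadata field needs a name, a description, and a type, it can help to keep the field descriptions as plain data and check them before building the MetadataInfo objects. The following is a sketch under that assumption. `validate_field_specs` is a hypothetical helper, and the duplicate-name check is an extra precaution rather than a documented requirement.

```python
REQUIRED_KEYS = ("name", "description", "type")


def validate_field_specs(specs):
    """Check that every field spec has a non-empty name, description, and type."""
    seen = set()
    for spec in specs:
        missing = [key for key in REQUIRED_KEYS if not spec.get(key)]
        if missing:
            raise ValueError(f"field spec {spec!r} is missing: {', '.join(missing)}")
        if spec["name"] in seen:
            raise ValueError(f"duplicate field name: {spec['name']}")
        seen.add(spec["name"])
    return specs


field_specs = validate_field_specs(
    [
        {
            "name": "state",
            "description": "Whether the issue is `open` or `closed`",
            "type": "string",
        },
    ]
)
print([spec["name"] for spec in field_specs])
```

Each validated entry could then be expanded into a MetadataInfo with MetadataInfo(**spec) and collected into the metadata_info list.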

05
Prepare metadata nodes and documents

Create metadata_nodes as a list of TextNode objects whose metadata represents each document, and docs as the list of source Document objects. Both lists must be the same length.
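
The same-length requirement is easy to check before constructing the pack. The sketch below is a hypothetical helper (`check_aligned` is not part of the pack). It also assumes that the node at position i describes the document at position i, which this page implies but does not state.

```python
def check_aligned(metadata_nodes, docs):
    """Raise ValueError unless there is exactly one metadata node per document."""
    if len(metadata_nodes) != len(docs):
        raise ValueError(
            f"metadata_nodes has {len(metadata_nodes)} items but docs has {len(docs)}"
        )
    # Assumption: items at the same position belong together.
    return list(zip(metadata_nodes, docs))


# Plain strings stand in for TextNode and Document objects here.
pairs = check_aligned(["meta for doc A", "meta for doc B"], ["doc A", "doc B"])
print(len(pairs))
```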

06
Initialize the MultiDocAutoRetrieverPack

Instantiate the pack with: pack = MultiDocAutoRetrieverPack(client, '<metadata_index_name>', '<doc_chunks_index_name>', metadata_nodes, docs, vector_store_info, auto_retriever_kwargs={...})

07
Execute queries using pack.run()

Use the pack's run() function as a wrapper around query_engine.query() to execute queries: response = pack.run('Tell me about a Music celebrity.')

08
Use retriever module individually

Access the retriever directly via pack.retriever and call retrieve() with a query string to get nodes: nodes = pack.retriever.retrieve('query_str')
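
Once nodes come back, a quick way to eyeball them is to print a score and a text preview per node. This sketch assumes each returned item exposes a `score` attribute and a `get_content()` method, as LlamaIndex's scored nodes generally do. `summarize_nodes` and `FakeNode` are hypothetical names used only so the example runs without a Weaviate cluster.

```python
def summarize_nodes(nodes, max_chars=80):
    """Return one line per retrieved node: its score and the start of its text."""
    lines = []
    for node in nodes:
        score = "n/a" if node.score is None else f"{node.score:.3f}"
        text = node.get_content().replace("\n", " ")
        lines.append(f"[{score}] {text[:max_chars]}")
    return lines


class FakeNode:
    """Hypothetical stub with the two members the helper relies on."""

    def __init__(self, score, text):
        self.score = score
        self._text = text

    def get_content(self):
        return self._text


for line in summarize_nodes([FakeNode(0.91234, "first line\nsecond line"), FakeNode(None, "no score")]):
    print(line)
```

With a real pack, the call would be summarize_nodes(pack.retriever.retrieve('query_str')).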

09
Use query engine module individually

Access the query engine directly via pack.query_engine and call query() with a query string: response = pack.query_engine.query('query_str')

Overview

Multi-Document AutoRetrieval (with Weaviate) Pack

What it does

This pack implements structured hierarchical retrieval over multiple documents using Weaviate collections, providing both a retriever and query engine interface.
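
To make "hierarchical retrieval" concrete, here is a toy two-stage version in plain Python: documents are first narrowed down by metadata, then chunks from the surviving documents are ranked. It is a conceptual sketch only. The classes, the word-overlap score, and the explicit `filters` argument are all stand-ins chosen for illustration, and none of it is taken from the pack's implementation, which uses Weaviate collections and vector search.

```python
from dataclasses import dataclass


@dataclass
class DocSummary:
    """Hypothetical stand-in for a per-document metadata entry."""
    doc_id: str
    metadata: dict


@dataclass
class Chunk:
    """Hypothetical stand-in for a chunk of document content."""
    doc_id: str
    text: str


def overlap(query, text):
    """Count distinct query words that also appear in the text (toy relevance score)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def hierarchical_retrieve(query, filters, summaries, chunks, top_k=2):
    # Stage 1: keep only documents whose metadata matches every filter.
    doc_ids = {
        s.doc_id
        for s in summaries
        if all(s.metadata.get(key) == value for key, value in filters.items())
    }
    # Stage 2: rank chunks from the surviving documents by word overlap.
    candidates = [c for c in chunks if c.doc_id in doc_ids]
    ranked = sorted(candidates, key=lambda c: overlap(query, c.text), reverse=True)
    return ranked[:top_k]


summaries = [
    DocSummary("issue-1", {"state": "open"}),
    DocSummary("issue-2", {"state": "closed"}),
]
chunks = [
    Chunk("issue-1", "retriever crashes on empty query"),
    Chunk("issue-1", "steps to reproduce the crash"),
    Chunk("issue-2", "retriever crashes on startup"),
]

results = hierarchical_retrieve("retriever crashes", {"state": "open"}, summaries, chunks, top_k=1)
print([c.text for c in results])
```

The metadata filter is passed in by hand here. The point of the sketch is only the two-level structure: a metadata pass that selects documents, followed by a content pass restricted to those documents.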

How it connects

Use this pack when you need structured hierarchical retrieval over multiple documents with metadata, stored in Weaviate collections.

Source README

Multi-Document AutoRetrieval (with Weaviate) Pack

This LlamaPack implements structured hierarchical retrieval over multiple documents, using multiple Weaviate collections.

CLI Usage

You can download llamapacks directly using llamaindex-cli, which comes installed with the llama-index python package:

llamaindex-cli download-llamapack MultiDocAutoRetrieverPack --download-dir ./multidoc_autoretrieval_pack

You can then inspect the files at ./multidoc_autoretrieval_pack and use them as a template for your own project!

Code Usage

You can download the pack to the ./multidoc_autoretrieval_pack directory:

from llama_index.core.llama_pack import download_llama_pack

# download and install dependencies
MultiDocAutoRetrieverPack = download_llama_pack(
    "MultiDocAutoRetrieverPack", "./multidoc_autoretrieval_pack"
)

From here, you can use the pack. To initialize it, you need to define a few arguments (see below).

Then, you can set up the pack like so:

# set up pack arguments
from llama_index.core import Document
from llama_index.core.schema import TextNode
from llama_index.core.vector_stores import MetadataInfo, VectorStoreInfo

import weaviate

# cloud
auth_config = weaviate.AuthApiKey(api_key="<api_key>")
client = weaviate.Client(
    "https://<cluster>.weaviate.network",
    auth_client_secret=auth_config,
)

vector_store_info = VectorStoreInfo(
    content_info="Github Issues",
    metadata_info=[
        MetadataInfo(
            name="state",
            description="Whether the issue is `open` or `closed`",
            type="string",
        ),
        ...,
    ],
)

# metadata_nodes is a list of nodes with metadata, one representing each document
# docs is the list of source documents
# metadata_nodes and docs must be the same length
metadata_nodes = [TextNode(..., metadata={...}), ...]
docs = [Document(...), ...]

pack = MultiDocAutoRetrieverPack(
    client,
    "<metadata_index_name>",
    "<doc_chunks_index_name>",
    metadata_nodes,
    docs,
    vector_store_info,
    auto_retriever_kwargs={
        # any kwargs for the auto-retriever
        ...
    },
)

The run() function is a light wrapper around query_engine.query().

response = pack.run("Tell me about a music celebrity.")
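
The "light wrapper" relationship can be pictured with a small stand-in. This is an illustrative sketch, not the pack's source: `SketchPack` and `EchoQueryEngine` are hypothetical classes, and the only thing taken from the README is that run() forwards to query_engine.query().

```python
class EchoQueryEngine:
    """Hypothetical stub standing in for the pack's query engine."""

    def query(self, query_str):
        return f"answer to: {query_str}"


class SketchPack:
    """Illustrative only; not the MultiDocAutoRetrieverPack source."""

    def __init__(self, query_engine):
        self.query_engine = query_engine

    def run(self, *args, **kwargs):
        # Forward everything unchanged to the query engine.
        return self.query_engine.query(*args, **kwargs)


sketch = SketchPack(EchoQueryEngine())
same = sketch.run("hello") == sketch.query_engine.query("hello")
print(same)
```

Under this reading, calling run() and calling query_engine.query() directly are interchangeable, so either entry point can be used depending on which reads better in the calling code.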

You can also use modules individually.

# use the retriever
retriever = pack.retriever
nodes = retriever.retrieve("query_str")

# use the query engine
query_engine = pack.query_engine
response = query_engine.query("query_str")
