Integrate Google Search Results with LlamaIndex

LlamaIndex reader that fetches organic Google Search results via Zyte API, returning top search result URLs for any query to feed downstream document loaders.

Works with zyte

Spark score: 57 out of 100
Updated 4 days ago
Version 0.14.22

Why it matters

Use Zyte's Google Search integration to bring current organic search results into your LlamaIndex applications. The reader fetches the top search result URLs for a given query, which you can then pass to downstream loaders for ingestion.

Outcomes

What it gets done

1. Fetch Google search result URLs using Zyte API
2. Integrate Zyte's search capabilities into LlamaIndex pipelines
3. Extract relevant content from fetched URLs using ZyteWebReader
4. Build RAG systems with up-to-date web data

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-reader-readers-zyte-serp | bash

Capabilities

What this skill does

Search the web

Searches the web and retrieves relevant sources.

Extract

Pulls structured data fields from unstructured text.

RAG index

Chunks, embeds, and indexes documents for semantic retrieval.

Overview

LlamaIndex Readers Integration: Zyte-Serp

What it does

ZyteSerpReader is a LlamaIndex integration that retrieves organic Google Search results through the Zyte API. It accepts a search query and returns the top result URLs as documents. The reader supports two extraction modes (httpResponseBody or browserHtml) and can be chained with ZyteWebReader to fetch full article content from discovered URLs.

How it connects

Use this reader when you need to augment your LlamaIndex pipeline with current web search results, such as building RAG systems that require fresh information from Google Search. It's particularly useful when combined with content extractors to create search-then-extract workflows for question answering or research applications.

Source README

LlamaIndex Readers Integration: Zyte-Serp

ZyteSerp adds organic Google Search results to a LlamaIndex pipeline. It takes a query and returns the URLs of the top search results.

Instructions for ZyteSerpReader

Setup and Installation

pip install llama-index-readers-zyte-serp

Secure an API key from Zyte to access the Zyte services.
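
One way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named ZYTE_API_KEY; that name and the helper get_zyte_api_key are illustrative choices for this example, not something the package requires.

```python
import os


def get_zyte_api_key(env_var: str = "ZYTE_API_KEY") -> str:
    """Return the Zyte API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Zyte API key."
        )
    return key
```

The returned value can then be passed as api_key when constructing ZyteSerpReader.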

Using ZyteSerpReader

  • Initialization: Initialize the ZyteSerpReader by providing the API key and, optionally, the extraction source via extract_from ("httpResponseBody" or "browserHtml").

    from llama_index.readers.zyte_serp import ZyteSerpReader
    
    zyte_serp = ZyteSerpReader(
        api_key="your_api_key_here",
        extract_from="httpResponseBody",  # or "browserHtml"
    )
    
  • Loading Data: To load search results, use the load_data method with the query you wish to search.

    documents = zyte_serp.load_data(query="llama index docs")
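
As the usage example in this README shows, each returned document carries a result URL in its text attribute. A minimal sketch of turning the returned list into clean, de-duplicated URLs might look like this; extract_urls is a hypothetical helper, and it relies only on that text attribute.

```python
from typing import Iterable, List


def extract_urls(documents: Iterable) -> List[str]:
    """Collect non-empty, unique URLs from documents, preserving result order."""
    seen = set()
    urls: List[str] = []
    for doc in documents:
        # Treat a missing or empty text value as "no URL".
        url = (doc.text or "").strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Calling extract_urls(documents) on the list returned by load_data gives a plain list of strings that can be handed to any URL-based loader.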

Example Usage

The example below initializes ZyteSerpReader and retrieves the top search result URLs.
The content of these URLs can then be loaded with ZyteWebReader in "article" mode, which returns only the article content of each webpage.

from llama_index.readers.zyte_serp import ZyteSerpReader
from llama_index.readers.web import ZyteWebReader

# Initialize the ZyteSerpReader with your API key
zyte_serp = ZyteSerpReader(
    api_key="your_api_key_here",  # Replace with your actual API key
)

# Get the search results (URLs from Google Search results)
search_urls = zyte_serp.load_data(query="llama index docs")

# Display the results
print(search_urls)

# Each result document holds one URL in its text attribute
urls = [result.text for result in search_urls]

# Initialize the ZyteWebReader to load the content from search results
zyte_web = ZyteWebReader(
    api_key="your_api_key_here",  # Replace with your actual API key
    mode="article",
)

documents = zyte_web.load_data(urls)
print(documents)
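
The two steps above can be wrapped into one function. This is a sketch under stated assumptions: search_then_extract and max_urls are hypothetical names, and the function relies only on the two calls shown in the example, load_data(query=...) on the search reader and load_data(urls) on the web reader.

```python
from typing import Any, List


def search_then_extract(
    serp_reader: Any, web_reader: Any, query: str, max_urls: int = 5
) -> List[Any]:
    """Search for `query`, then load content for at most `max_urls` result URLs."""
    results = serp_reader.load_data(query=query)
    # Each search result document holds a URL in its text attribute.
    urls = [r.text for r in results if r.text][:max_urls]
    if not urls:
        # Nothing to fetch, so skip the second reader entirely.
        return []
    return web_reader.load_data(urls)
```

With the readers from the example, search_then_extract(zyte_serp, zyte_web, "llama index docs") would load content for up to five result URLs. Capping the URL count is a simple way to bound how many pages are fetched per query.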
