Skill

Augment LLM Agents with Web Search and Extraction

Name: Augment LLM Agents with Web Search and Extraction
Availability: OnlineOnly
Author: LlamaIndex

LlamaIndex tool that integrates Parallel AI's Search and Extract APIs so LLM agents can perform web research and convert URLs into clean, LLM-optimized markdown.

Works with: LlamaIndex, Parallel AI

Updated 3 months ago

Version 1.0.0

Models

GPT-4o, Llama 3

Why it matters

Give your LLM agents web research capabilities. This tool integrates with Parallel AI's Search and Extract APIs to gather information from the web and return it in a form optimized for LLM consumption.

Outcomes

What it gets done

Perform targeted web searches using natural language objectives or keywords.

Extract clean, LLM-optimized content from web pages, including dynamic sites and PDFs.

Structure search results and extracted content for seamless integration into LLM workflows.

Automate web research tasks for enhanced agent intelligence.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-tool-tools-parallel-web-systems | bash

Capabilities

What this skill does

Search the web

Searches the web and retrieves relevant sources.

Extract

Pulls structured data fields from unstructured text.

RAG index

Chunks, embeds, and indexes documents for semantic retrieval.

Summarize

Condenses long documents or threads into key takeaways.

Overview

Parallel AI Tool

What it does

A LlamaIndex integration for Parallel AI's web research APIs

How it connects

When you need your LLM agent to search the web or extract clean content from URLs

Source README

Parallel AI Tool

This tool provides integration between LlamaIndex and Parallel AI's Search and Extract APIs, enabling LLM agents to perform web research and content extraction.

Search API: Returns structured, compressed excerpts from web search results optimized for LLM consumption
Extract API: Converts public URLs into clean, LLM-optimized markdown including JavaScript-heavy pages and PDFs

Installation

pip install llama-index-tools-parallel-web-systems

Setup

Get your API key from Parallel AI Platform
Set your API key as an environment variable or pass it directly
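
The environment-variable route can be sketched as below. The variable name `PARALLEL_API_KEY` and the helper `load_api_key` are assumptions made for illustration; this README does not name the variable the package reads by default, so the sketch reads the key itself and passes it in explicitly.

```python
import os


def load_api_key(env_var: str = "PARALLEL_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is unset.

    The default variable name is an assumption for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before creating the tool")
    return key


# Usage (requires the package from the Installation section):
# from llama_index.tools.parallel_web_systems import ParallelWebSystemsToolSpec
# parallel_tool = ParallelWebSystemsToolSpec(api_key=load_api_key())
```

Failing early with a clear message keeps a missing key from surfacing later as an opaque authentication error inside an agent run.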

Usage

import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.parallel_web_systems import ParallelWebSystemsToolSpec

# Initialize the tool with your API key
parallel_tool = ParallelWebSystemsToolSpec(
    api_key="your-api-key-here",
)

# Create an agent with the tool
agent = FunctionAgent(
    tools=parallel_tool.to_tool_list(),
    llm=OpenAI(model="gpt-4o"),
)


async def main():
    # Use the agent to perform web research
    response = await agent.run("What was the GDP of France in 2023?")
    print(response)


# "await" needs a running event loop; asyncio.run provides one in a plain script.
# In a notebook, call "await main()" instead.
asyncio.run(main())

Available Functions

`search`

Search the web using Parallel AI's Search API. Returns structured excerpts optimized for LLM consumption.

Parameters:

objective (str, optional): Natural-language description of what to search for
search_queries (list[str], optional): Traditional keyword search queries (max 5)
max_results (int): Maximum results to return, 1-40 (default: 10)
mode (str, optional): 'one-shot' for comprehensive results, 'agentic' for token-efficient results
excerpts (dict, optional): Excerpt settings, e.g., {'max_chars_per_result': 1500}
source_policy (dict, optional): Domain and date preferences
fetch_policy (dict, optional): Cache vs live content policy

At least one of objective or search_queries must be provided.
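
The limits listed above can be checked before a call is made. The following is a minimal sketch, assuming only the constraints stated in the parameter list; `build_search_kwargs` and `ALLOWED_MODES` are hypothetical names, not part of the package, and whether the tool performs the same checks itself is not stated in this README.

```python
from typing import Any, Optional

# Mode values taken from the parameter list above
ALLOWED_MODES = {"one-shot", "agentic"}


def build_search_kwargs(
    objective: Optional[str] = None,
    search_queries: Optional[list[str]] = None,
    max_results: int = 10,
    mode: Optional[str] = None,
) -> dict[str, Any]:
    """Check the documented limits and return keyword arguments for search()."""
    if not objective and not search_queries:
        raise ValueError("Provide objective, search_queries, or both")
    if search_queries and len(search_queries) > 5:
        raise ValueError("search_queries accepts at most 5 entries")
    if not 1 <= max_results <= 40:
        raise ValueError("max_results must be between 1 and 40")
    if mode is not None and mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")

    # Only include the arguments the caller actually supplied
    kwargs: dict[str, Any] = {"max_results": max_results}
    if objective:
        kwargs["objective"] = objective
    if search_queries:
        kwargs["search_queries"] = list(search_queries)
    if mode is not None:
        kwargs["mode"] = mode
    return kwargs
```

The returned dictionary can then be unpacked into the call, for instance `parallel_tool.search(**build_search_kwargs(objective="..."))`.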

Example:

from llama_index.tools.parallel_web_systems import ParallelWebSystemsToolSpec

parallel_tool = ParallelWebSystemsToolSpec(api_key="your-api-key")

# Search with an objective
results = parallel_tool.search(
    objective="What are the latest developments in renewable energy?",
    max_results=5,
    mode="one-shot",
)

for doc in results:
    print(f"Title: {doc.metadata.get('title')}")
    print(f"URL: {doc.metadata.get('url')}")
    print(f"Excerpts: {doc.text[:300]}...")
    print("---")

# Search with specific queries
results = parallel_tool.search(
    search_queries=["solar power 2024", "wind energy statistics"],
    max_results=8,
    mode="agentic",
)
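
When search results feed a later prompt, it can help to flatten them into one source-attributed string. This is a sketch that relies only on the `.text` and `.metadata` attributes read in the example above; `format_results` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the block runs without an API key.

```python
from types import SimpleNamespace


def format_results(docs, max_chars: int = 300) -> str:
    """Join documents into one numbered, source-attributed context string."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        title = doc.metadata.get("title") or "Untitled"
        url = doc.metadata.get("url") or "unknown URL"
        excerpt = doc.text[:max_chars]  # truncate long excerpts
        blocks.append(f"[{i}] {title} ({url})\n{excerpt}")
    return "\n\n".join(blocks)


# Stand-in objects exposing the same attributes the example above reads
sample = [
    SimpleNamespace(
        text="Solar capacity grew.",
        metadata={"title": "Solar", "url": "https://example.com/a"},
    ),
    SimpleNamespace(
        text="Wind output rose.",
        metadata={"url": "https://example.com/b"},
    ),
]
print(format_results(sample))
```

Numbering each source makes it easier for a downstream prompt to ask the model to cite which result a claim came from.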

`extract`

Extract clean, structured content from web pages using Parallel AI's Extract API.

Parameters:

urls (list[str]): List of URLs to extract content from
objective (str, optional): Natural language objective to focus extraction
search_queries (list[str], optional): Specific keyword queries to focus extraction
excerpts (bool | dict): Include excerpts (default: True). Can be dict like {'max_chars_per_result': 2000}
full_content (bool | dict): Include full page content (default: False)
fetch_policy (dict, optional): Cache vs live content policy
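
Because `urls` takes a list, it can be worth tidying that list before sending it. The sketch below uses only the standard library; `clean_urls` is a hypothetical helper, and whether the tool itself rejects malformed or duplicate entries is not stated in this README.

```python
from urllib.parse import urlparse


def clean_urls(urls: list[str]) -> list[str]:
    """Keep well-formed http(s) URLs, dropping duplicates while preserving order."""
    seen = set()
    cleaned = []
    for raw in urls:
        url = raw.strip()
        parts = urlparse(url)
        # Skip anything that is not an absolute http or https URL
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned
```

The result can be passed straight through, for instance `parallel_tool.extract(urls=clean_urls(candidate_urls))`, where `candidate_urls` is whatever list the caller has collected.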

Example:

from llama_index.tools.parallel_web_systems import ParallelWebSystemsToolSpec

parallel_tool = ParallelWebSystemsToolSpec(api_key="your-api-key")

# Extract content focused on a specific objective
results = parallel_tool.extract(
    urls=["https://en.wikipedia.org/wiki/Artificial_intelligence"],
    objective="What are the main applications and ethical concerns of AI?",
    excerpts={"max_chars_per_result": 2000},
)

for doc in results:
    print(f"Title: {doc.metadata.get('title')}")
    print(f"Content: {doc.text[:500]}...")

# Extract full content from multiple URLs
results = parallel_tool.extract(
    urls=[
        "https://example.com/
