Automate Web Research and Data Extraction

Name: Automate Web Research and Data Extraction
Availability: OnlineOnly
Author: LlamaIndex

Tavily Research Tool equips LLM agents with real-time web search and content extraction via a purpose-built research API that handles scraping, filtering, and

Works with: LlamaIndex, OpenAI, Tavily

Updated 4 months ago

Version 1.0.0

Models

llama 3

Why it matters

Leverage the Tavily Research API to automate complex web research tasks for LLM agents. Seamlessly search, scrape, and extract relevant information from online sources to fuel your AI applications.

Outcomes

What it gets done

Perform targeted web searches with customizable depth and domain controls.

Extract raw content and metadata from specified URLs.

Integrate with LLM agents via LlamaIndex for automated research workflows.

Process and structure extracted data for further analysis or use.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-tool-tools-tavily-research | bash

Overview

Tavily Research Tool

What it does

A research API tool for LLM agents that provides web search and content extraction.

How it connects

Use it when your AI agent needs to retrieve current information from the web or extract content from specific URLs.

Source README

Tavily Research Tool

Tavily is a robust research API tailored specifically for LLM Agents. It seamlessly integrates with diverse data sources to ensure a superior, relevant research experience.

To begin, obtain an API key from Tavily's developer dashboard.
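
Rather than hardcoding the key in source files, it can be read from the environment at startup. The following is a minimal sketch, assuming the key is stored in an environment variable named TAVILY_API_KEY; both that variable name and the load_tavily_key helper are illustrative choices, not something this README specifies.

```python
import os


def load_tavily_key(env=None, var_name="TAVILY_API_KEY"):
    """Return the Tavily API key from the environment, or raise if it is missing.

    `var_name` and this helper are illustrative assumptions, not part of the tool.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} to your Tavily API key before running.")
    return key
```

The returned value can then be passed as TavilyToolSpec(api_key=load_tavily_key()).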

Why Choose Tavily Research API?

Purpose-Built: Tailored for LLM Agents, so features and results are designed around agent needs. We take care of the burden of searching, scraping, filtering, and extracting information from online sources, all in a single API call.
Versatility: Beyond just fetching results, Tavily Research API offers precision. With customizable search depth, domain management, and HTML content parsing controls, you're in the driver's seat.
Performance: Built for speed and efficiency, our API aims to deliver real-time results without sacrificing accuracy. Please note that we're just getting started, so performance may vary and improve over time.
Integration-friendly: Integrating our API with your existing setup is straightforward. You can use our Python library, a simple API call, or any of our supported partners such as LangChain and LlamaIndex.
Transparent & Informative: Our detailed documentation covers everything from setup basics to nuanced features.

Usage

A more extensive usage example for this tool is documented in a Jupyter notebook.

Here's an example usage of the TavilyToolSpec.

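import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI
from llama_index.tools.tavily_research import TavilyToolSpec

# Wrap the Tavily API in a tool spec; replace "your-key" with your Tavily API key.
tavily_tool = TavilyToolSpec(
    api_key="your-key",
)

# Expose the Tavily functions (search, extract) to the agent as tools.
agent = FunctionAgent(
    tools=tavily_tool.to_tool_list(),
    llm=OpenAI(model="gpt-4o"),
)


async def main():
    response = await agent.run("What happened in the latest Burning Man festival?")
    print(response)


# In a notebook, call `await main()` directly instead.
asyncio.run(main())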

Available Functions

search: Search for relevant, up-to-date data based on a query. Returns a list of Document objects, each containing a URL and its relevant content.

extract: Extract raw content from specific URLs using the Tavily Extract API. Returns a list of Document objects containing the extracted content and metadata.
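
Because both functions return Document objects, their results can be flattened into plain records for logging or further processing. The sketch below is one possible approach, not part of the tool: to_records is a hypothetical helper, and it assumes each document exposes text and an extra_info dictionary with a "url" entry, as the extract example in this README does. Stand-in objects are used so the snippet runs without an API key.

```python
from types import SimpleNamespace


def to_records(documents, snippet_chars=200):
    """Convert Document-like objects into plain dicts with a URL and a short snippet."""
    records = []
    for doc in documents:
        # The README's own example reads the source URL from extra_info["url"].
        info = getattr(doc, "extra_info", None) or {}
        text = (doc.text or "").strip()
        records.append(
            {
                "url": info.get("url", ""),
                "snippet": text[:snippet_chars],
                "truncated": len(text) > snippet_chars,
            }
        )
    return records


# Stand-in objects so the sketch runs offline; real input would be the list
# returned by tavily_tool.search(...) or tavily_tool.extract(...).
sample = [
    SimpleNamespace(
        text="Tavily returned this page.",
        extra_info={"url": "https://example.com/a"},
    )
]
print(to_records(sample, snippet_chars=6))
```

Keeping the records as plain dictionaries makes them easy to serialize or to pass to a prompt template without depending on the Document class.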

Extract Function Example

from llama_index.tools.tavily_research import TavilyToolSpec

tavily_tool = TavilyToolSpec(api_key="your-key")

# Extract content from specific URLs
documents = tavily_tool.extract(
    urls=["https://example.com/article1", "https://example.com/article2"],
    include_images=True,
    include_favicon=True,
    extract_depth="advanced",  # "basic" or "advanced"
    format="markdown",  # "markdown" or "text"
)

for doc in documents:
    print(f"URL: {doc.extra_info['url']}")
    print(f"Content: {doc.text[:200]}...")

This tool spec is designed to load data as a Tool within an Agent.
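
Outside an agent, the two functions can also be chained by hand: search yields URLs, and extract accepts URLs. The following sketch shows one way to do that under stated assumptions. The search_then_extract helper and the FakeTool class are hypothetical; FakeTool only mimics the two documented functions so the example runs offline, and a real TavilyToolSpec instance would take its place.

```python
from types import SimpleNamespace


def search_then_extract(tool, query, max_urls=3):
    """Search for `query`, then extract full content from the top unique result URLs.

    `tool` is anything exposing search(query) and extract(urls=...), such as a
    TavilyToolSpec instance. This helper is a hypothetical convenience wrapper.
    """
    urls = []
    for doc in tool.search(query):
        if len(urls) >= max_urls:
            break
        url = (getattr(doc, "extra_info", None) or {}).get("url")
        if url and url not in urls:  # skip missing and duplicate URLs, keep order
            urls.append(url)
    return tool.extract(urls=urls) if urls else []


class FakeTool:
    """Offline stand-in that mimics the two documented functions for a dry run."""

    def search(self, query):
        links = ["https://example.com/1", "https://example.com/1", "https://example.com/2"]
        return [SimpleNamespace(text=query, extra_info={"url": u}) for u in links]

    def extract(self, urls):
        return [SimpleNamespace(text=f"content of {u}", extra_info={"url": u}) for u in urls]


docs = search_then_extract(FakeTool(), "burning man", max_urls=2)
print([d.extra_info["url"] for d in docs])
```

Deduplicating before calling extract avoids requesting the same page twice when several search results point to one URL.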
