Skill

Query Pandas DataFrames with Natural Language

A LlamaIndex reader that wraps the PandasAI Python package to run queries on pandas DataFrames and optionally load results as Document objects.

Works with pandas, openai

Spark score: 57 out of 100
Updated 4 days ago
Version 0.14.22

Why it matters

Ask questions about your pandas DataFrames in natural language. The reader passes each query to an LLM through PandasAI, which interprets the question and returns the relevant data, so you can move from a plain-language question to a data-analysis result without writing the pandas code yourself.

Outcomes

What it gets done

01 Translate natural language questions into executable pandas operations.

02 Extract specific data points or summaries from DataFrames based on user queries.

03 Integrate with LLMs (like OpenAI) to power natural language understanding.

04 Load query results directly into LlamaIndex Document objects for further processing.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-reader-readers-pandas-ai | bash

Capabilities

What this skill does

Query a database

Writes and executes SQL or NoSQL queries on databases.

Extract

Pulls structured data fields from unstructured text.

RAG index

Chunks, embeds, and indexes documents for semantic retrieval.

Summarize

Condenses long documents or threads into key takeaways.

Overview

Pandas AI Loader

What it does

A wrapper around PandasAI for LlamaIndex.

How it connects

Use it when you need to run PandasAI queries within a LlamaIndex workflow.

Source README

Pandas AI Loader

pip install llama-index-readers-pandas-ai

This loader is a light wrapper around the PandasAI Python package.

See here: https://github.com/gventuri/pandas-ai.

You can get the result of the pandasai.run command directly via run_pandas_ai,
or you can load the result as Document objects via load_data.

Usage

from pandasai.llm.openai import OpenAI
import pandas as pd

# Sample DataFrame
df = pd.DataFrame(
    {
        "country": [
            "United States",
            "United Kingdom",
            "France",
            "Germany",
            "Italy",
            "Spain",
            "Canada",
            "Australia",
            "Japan",
            "China",
        ],
        "gdp": [
            21400000,
            2940000,
            2830000,
            3870000,
            2160000,
            1350000,
            1780000,
            1320000,
            516000,
            14000000,
        ],
        "happiness_index": [7.3, 7.2, 6.5, 7.0, 6.0, 6.3, 7.3, 7.3, 5.9, 5.0],
    }
)

llm = OpenAI()

from llama_index.readers.pandas_ai import PandasAIReader

# use run_pandas_ai directly
# set is_conversational_answer=False to get parsed output
reader = PandasAIReader(llm=llm)
response = reader.run_pandas_ai(
    df, "Which are the 5 happiest countries?", is_conversational_answer=False
)
print(response)

# load data with is_conversational_answer=False
# will use our PandasCSVReader under the hood
docs = reader.load_data(
    df, "Which are the 5 happiest countries?", is_conversational_answer=False
)

# load data with is_conversational_answer=True
# will use our PandasCSVReader under the hood
docs = reader.load_data(
    df, "Which are the 5 happiest countries?", is_conversational_answer=True
)
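
The usage example constructs OpenAI() without arguments, so the API key has to come from somewhere else. The sketch below assumes the key is supplied through the OPENAI_API_KEY environment variable, which is the usual convention for OpenAI clients but is not stated in this README. The helper name require_env is hypothetical and is not part of PandasAI or LlamaIndex; it only fails early with a readable message if the variable is missing.

```python
import os


def require_env(name, environ=None):
    """Return the stripped value of an environment variable, or raise if unset/empty."""
    # Allow a custom mapping to be passed in so the helper is easy to test.
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before creating the LLM")
    return value


# Assumption: OPENAI_API_KEY is the variable the OpenAI wrapper falls back to.
# Uncomment to check before calling OpenAI():
# api_key = require_env("OPENAI_API_KEY")
```

Calling the helper before building the reader turns a missing key into an immediate, explicit error instead of a failure on the first query.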

This loader is designed to load data into LlamaIndex.
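
Because the answer comes from an LLM, it can be useful to compare it with a direct pandas computation. The sketch below is not part of the loader and makes one assumption: that "happiest" means the highest happiness_index. It uses the same sample data as the usage example and needs no API key.

```python
import pandas as pd

# Same sample data as in the usage example.
df = pd.DataFrame(
    {
        "country": [
            "United States", "United Kingdom", "France", "Germany", "Italy",
            "Spain", "Canada", "Australia", "Japan", "China",
        ],
        "gdp": [
            21400000, 2940000, 2830000, 3870000, 2160000,
            1350000, 1780000, 1320000, 516000, 14000000,
        ],
        "happiness_index": [7.3, 7.2, 6.5, 7.0, 6.0, 6.3, 7.3, 7.3, 5.9, 5.0],
    }
)

# Assumption: "happiest" means the largest happiness_index values.
top5 = df.nlargest(5, "happiness_index")[["country", "happiness_index"]]
print(top5.to_string(index=False))
```

Three countries tie at 7.3 in this data, so the order among them depends on how ties are broken, while the set of five countries does not.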
