Prompt Chain

Extract Amazon Product Data

Name: Extract Amazon Product Data
Availability: OnlineOnly
Author: LlamaIndex

LlamaPack that screenshots Amazon product pages and extracts structured JSON data using OpenAI GPT-4V with prompt engineering for automated product information

Copy chain

Works with openai github

LlamaIndex

Maintainer?

Spark score

out of 100

Updated 4 days ago

Version 0.14.22

Models

gpt 4o llama 3

Add to Favorites

Why it matters

Automate the extraction of structured product data from Amazon web pages using visual analysis and prompt engineering. This pack streamlines the process of gathering key product details for e-commerce analysis and integration.

Outcomes

What it gets done

Screenshot product pages from provided URLs.

Utilize GPT-4V and prompt engineering to interpret visual data.

Extract product information into a structured JSON format.

Provide CLI and code-based integration for easy use.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/li-pack-packs-amazon-product-extraction | bash

Steps

Steps in the chain

Load website URL and screenshot page

Load in a website URL and screenshot the page to capture the visual content for processing.

Extract screenshot using GPT-4V

Use OpenAI GPT-4V with prompt engineering to extract the screenshot into structured JSON output.

Download AmazonProductExtractionPack

Download the pack using llamaindex-cli: llamaindex-cli download-llamapack AmazonProductExtractionPack --download-dir ./amazon_product_extraction_pack

Initialize the extraction pack

Create the pack instance with amazon_product_page: amazon_product_extraction_pack = SentenceWindowRetrieverPack(amazon_product_page,)

Run extraction and get results

Execute the pack's run() function to process the data: response = amazon_product_extraction_pack.run() and display results with display(response.dict())

Overview

Amazon Product Extraction Pack

What it does

This LlamaPack loads an Amazon product page URL, captures a screenshot, and uses OpenAI GPT-4V with prompt engineering to extract product information into structured JSON format. It provides a Pydantic program for schema validation and exposes both the extraction program and the underlying multi-modal LLM as individual modules. The pack can be downloaded via CLI or Python and includes a complete workflow from page capture to structured output.

How it connects

Use this pack when you need to automate extraction of product details from Amazon pages without writing custom HTML parsers or scraping logic. It's ideal for building product catalogs, price monitoring systems, or competitive analysis tools where visual page content needs to be transformed into structured data that can be processed programmatically.

Source README

Description pending for li-pack-packs-amazon-product-extraction.

Step 1: Load website URL and screenshot page

Load in a website URL and screenshot the page to capture the visual content for processing.

Step 2: Extract screenshot using GPT-4V

Use OpenAI GPT-4V with prompt engineering to extract the screenshot into structured JSON output.

Step 3: Download AmazonProductExtractionPack

Download the pack using llamaindex-cli: llamaindex-cli download-llamapack AmazonProductExtractionPack --download-dir ./amazon_product_extraction_pack

Step 4: Initialize the extraction pack

Create the pack instance with amazon_product_page: amazon_product_extraction_pack = SentenceWindowRetrieverPack(amazon_product_page,)

Step 5: Run extraction and get results

Execute the pack's run() function to process the data: response = amazon_product_extraction_pack.run() and display results with display(response.dict())

Discussion