
Extract Clean Content from Web Pages

Defuddle is a CLI tool that extracts clean, readable content from web pages as markdown, removing ads and navigation to reduce token usage.


Version 13.1.0


Why it matters

This asset efficiently extracts clean, readable content from standard web pages, removing clutter like ads and navigation. It's ideal for tasks requiring token efficiency when processing articles, blog posts, or documentation.

Outcomes

What it gets done

01

Scrape and parse web page content.

02

Remove navigation, ads, and clutter from web pages.

03

Extract specific metadata like title, description, or domain.

04

Output content in Markdown or JSON format.

Install

Add it to your toolbox

Run in your project directory:

curl -fsSL https://spark.entire.vc/get/ag-defuddle | bash

Capabilities

What this skill does

Scrape

Fetches and parses content from web pages.

Extract

Pulls structured data fields from unstructured text.

Summarize

Condenses long documents or threads into key takeaways.

Search the web

Searches the web and retrieves relevant sources.

Overview

Defuddle

What it does

A command-line tool for extracting clean, readable content from web pages by removing navigation, ads, and clutter. It produces markdown with --md (the recommended choice) and HTML when no format flag is given, with options for JSON or specific metadata properties.

How it connects

Use when you need to read, summarize, or analyze content from normal webpages like docs, articles, and blog posts. Prefer it over noisier page-fetch approaches when token efficiency matters for standard web content.

Source README

Defuddle

Use the Defuddle CLI to extract clean, readable content from web pages. Prefer it over WebFetch for standard web pages: it removes navigation, ads, and clutter, which reduces token usage.

When to Use

  • Use when the user provides a normal webpage URL to read, summarize, or analyze.
  • Prefer it over noisy page-fetch approaches when token efficiency matters.
  • Use for docs, articles, blog posts, and similar public web content.

If Defuddle is not installed, install it globally with npm:

npm install -g defuddle
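When the CLI is called from a script rather than typed by hand, it can help to check first that it is on PATH. This is a minimal Python sketch; the helper name `defuddle_available` is hypothetical and not part of Defuddle.

```python
import shutil


def defuddle_available() -> bool:
    """Return True if an executable named `defuddle` is found on PATH."""
    return shutil.which("defuddle") is not None


if __name__ == "__main__":
    if not defuddle_available():
        # Same install command as in the instructions above.
        print("defuddle not found; install it with: npm install -g defuddle")
```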

Usage

Always use --md for markdown output:

defuddle parse <url> --md

Save to file:

defuddle parse <url> --md -o content.md
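The two commands above can be driven from Python with `subprocess`. This is a sketch under stated assumptions: the helper names (`build_md_command`, `fetch_markdown`, `save_markdown`) are hypothetical, `defuddle` is assumed to be on PATH, and without `-o` the CLI is assumed to print its result to stdout.

```python
import subprocess
from typing import List, Optional


def build_md_command(url: str, output: Optional[str] = None) -> List[str]:
    """Build argv for `defuddle parse <url> --md`, adding `-o <file>` if requested."""
    args = ["defuddle", "parse", url, "--md"]
    if output is not None:
        args += ["-o", output]
    return args


def fetch_markdown(url: str) -> str:
    """Run defuddle and return the markdown it prints.

    Assumption: without -o, defuddle writes the result to stdout.
    """
    result = subprocess.run(
        build_md_command(url), capture_output=True, text=True, check=True
    )
    return result.stdout


def save_markdown(url: str, path: str) -> None:
    """Let defuddle write the markdown to `path` itself via its -o flag."""
    subprocess.run(build_md_command(url, path), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with URLs that contain `&` or `?`.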

Extract specific metadata:

defuddle parse <url> -p title
defuddle parse <url> -p description
defuddle parse <url> -p domain
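To gather several properties at once, one option is to run `-p` once per property and collect the results. A minimal sketch, assuming `defuddle parse <url> -p <name>` prints the property value to stdout; `property_command`, `run_cli`, and `fetch_metadata` are hypothetical helpers, and the `runner` parameter exists only so the command can be swapped out (for example, in tests).

```python
import subprocess
from typing import Callable, Dict, List, Sequence


def property_command(url: str, name: str) -> List[str]:
    """Build argv for `defuddle parse <url> -p <name>`."""
    return ["defuddle", "parse", url, "-p", name]


def run_cli(args: List[str]) -> str:
    """Run a command and return its stdout with surrounding whitespace stripped."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout.strip()


def fetch_metadata(
    url: str,
    names: Sequence[str] = ("title", "description", "domain"),
    runner: Callable[[List[str]], str] = run_cli,
) -> Dict[str, str]:
    """Query each property with a separate defuddle call and map name -> value."""
    return {name: runner(property_command(url, name)) for name in names}
```

Each property costs one extra fetch of the page; if many fields are needed, the `--json` output may be the cheaper choice.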

Output formats

Flag          Format
--md          Markdown (recommended)
--json        JSON containing both HTML and markdown
(none)        HTML
-p <name>     The named metadata property (for example, title)
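The format table maps directly onto a lookup from format name to flags. This is a small Python sketch; the names `FORMAT_FLAGS` and `format_command` are hypothetical, and the table is the only source for the flags.

```python
from typing import Dict, List

# Flags taken from the output-format table. "html" maps to no flag because
# HTML is what defuddle emits when no format flag is given.
FORMAT_FLAGS: Dict[str, List[str]] = {
    "md": ["--md"],
    "json": ["--json"],
    "html": [],
}


def format_command(url: str, fmt: str = "md") -> List[str]:
    """Build `defuddle parse <url>` plus the flag for the requested format."""
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {sorted(FORMAT_FLAGS)}")
    return ["defuddle", "parse", url] + FORMAT_FLAGS[fmt]
```

Defaulting `fmt` to `"md"` mirrors the advice to always use --md.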

Limitations

  • Use this skill only when the task clearly matches the scope described above.
  • Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
  • Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.
