Extract Clean Content from Web Pages
Defuddle is a CLI tool that extracts clean, readable content from web pages in markdown, removing ads and navigation to reduce token usage.
Why it matters
Defuddle strips ads, navigation, and other clutter from standard web pages and returns only the readable content. It suits tasks where token efficiency matters, such as processing articles, blog posts, or documentation.
Outcomes
What it gets done
Scrape and parse web page content.
Remove navigation, ads, and clutter from web pages.
Extract specific metadata like title, description, or domain.
Output content in Markdown or JSON format.
Install
Add it to your toolbox
Run in your project directory:
curl -fsSL https://spark.entire.vc/get/ag-defuddle | bash
Capabilities
What this skill does
Fetches and parses content from web pages.
Pulls structured data fields from unstructured text.
Condenses long documents or threads into key takeaways.
Searches the web and retrieves relevant sources.
Overview
Defuddle
What it does
A command-line tool for extracting clean, readable content from web pages by removing navigation, ads, and clutter. Without a format flag it outputs HTML; pass --md for markdown (the recommended choice), --json for JSON, or -p for a specific metadata property.
How it connects
Use when you need to read, summarize, or analyze content from normal webpages like docs, articles, and blog posts. Prefer it over noisier page-fetch approaches when token efficiency matters for standard web content.
Source README
Defuddle
Use the Defuddle CLI to extract clean, readable content from web pages. Prefer it over WebFetch for standard web pages: it removes navigation, ads, and clutter, which reduces token usage.
When to Use
- Use when the user provides a normal webpage URL to read, summarize, or analyze.
- Prefer it over noisy page-fetch approaches when token efficiency matters.
- Use for docs, articles, blog posts, and similar public web content.
If not installed: npm install -g defuddle
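The install check above can be scripted. This is a minimal sketch, assuming `npm` is on the PATH and that a global install is acceptable in your environment; the function name `ensure_defuddle` is hypothetical and not part of Defuddle:

```shell
# Hypothetical helper: install defuddle only when it is not already available.
ensure_defuddle() {
  # command -v succeeds if "defuddle" resolves to something runnable.
  if command -v defuddle >/dev/null 2>&1; then
    return 0
  fi
  echo "defuddle not found; installing with npm" >&2
  npm install -g defuddle
}

# Usage (not run here): ensure_defuddle && defuddle parse <url> --md
```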
Usage
Always use --md for markdown output:
defuddle parse <url> --md
Save to file:
defuddle parse <url> --md -o content.md
Extract specific metadata:
defuddle parse <url> -p title
defuddle parse <url> -p description
defuddle parse <url> -p domain
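The three metadata lookups above can be combined into one helper. This is a minimal sketch, assuming `defuddle` is on the PATH and that `-p` prints the property value to standard output; the function name `print_metadata` is hypothetical:

```shell
# Hypothetical helper: print title, description, and domain for one URL.
print_metadata() {
  local url="$1"
  local prop
  for prop in title description domain; do
    # One defuddle call per property, labeled with the property name.
    printf '%s: %s\n' "$prop" "$(defuddle parse "$url" -p "$prop")"
  done
}

# Usage (not run here): print_metadata "https://example.com/article"
```

Each property triggers a separate fetch, so this trades speed for simplicity.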
Output formats
| Flag | Format |
|---|---|
| `--md` | Markdown (default choice) |
| `--json` | JSON with both HTML and markdown |
| (none) | HTML |
| `-p <name>` | Specific metadata property |
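The format flags can be wrapped in a small dispatcher so callers pass a format name instead of remembering flags. This is a sketch only: the function name `defuddle_as` and the format names `md`, `json`, and `html` are hypothetical; the flags themselves are the ones listed above.

```shell
# Hypothetical wrapper: map a format name to the matching defuddle flag.
defuddle_as() {
  local format="$1"
  local url="$2"
  case "$format" in
    md)   defuddle parse "$url" --md ;;
    json) defuddle parse "$url" --json ;;
    html) defuddle parse "$url" ;;        # no flag: HTML output
    *)    echo "unknown format: $format" >&2
          return 2 ;;
  esac
}

# Usage (not run here): defuddle_as md "https://example.com/article"
```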
Limitations
- Use this skill only when the task clearly matches the scope described above.
- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.
- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.