[PR #1740] [MERGED] Add real-world use cases to CLI pipeline plan #4503

Closed
opened 2026-03-15 01:48:14 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/1740
Author: @pirate
Created: 12/31/2025
Status: Merged
Merged: 12/31/2025
Merged by: @pirate

Base: dev ← Head: claude/review-code-quality-macdO


📝 Commits (2)

  • 0f46d8a Add real-world use cases to CLI pipeline plan
  • 1c85b4d Refine use cases: 8 examples with efficient patterns

📊 Changes

1 file changed (+129 additions, -2 deletions)


📝 TODO_archivebox_jsonl_cli.md (+129 -2)

📄 Description

Summary

Related issues

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Snapshot data layout on disk

Summary by cubic

Added eight real-world examples to the CLI pipeline plan to show JSONL piping with archivebox run. Documents auto-cascade (Crawl → Snapshots → ArchiveResults) and that run emits JSONL for chained retries, transforms, and recursive crawling.

  • New Features
    • Eight examples: basic archive, retry failures, Pinboard import, selective extraction, bulk tags, RSS archiving, recursive link following, chained retries with overrides.
    • Clarifies efficient filtering via CLI args (--status, --plugin, --url__icontains) and using jq for transforms.
    • Adds a pattern summary to illustrate common pipelines.
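The transform step in such a pipeline (the role the summary assigns to jq) can be sketched in Python as a filter over JSONL records. This is only an illustration under stated assumptions: the field names `status` and `tags`, the value `"failed"`, and the helper name `retag_failed` are hypothetical and are not taken from the PR or from ArchiveBox's actual record schema.

```python
import json
from typing import Iterable, Iterator


def retag_failed(lines: Iterable[str], tag: str) -> Iterator[str]:
    """Yield JSONL lines for failed records, each with `tag` added to its tags.

    Assumes (hypothetically) that each record is a JSON object with an
    optional "status" string and an optional "tags" list.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get("status") != "failed":
            continue  # drop everything that is not a failed record
        tags = list(record.get("tags", []))
        if tag not in tags:
            tags.append(tag)
        record["tags"] = tags
        yield json.dumps(record)
```

Placed between two commands in a pipe, the function would be fed `sys.stdin` and each yielded line printed to stdout, so the downstream command receives one JSON object per line. Where the CLI itself offers a filter argument such as `--status`, filtering there avoids parsing records that will be discarded anyway, which is the efficiency point the summary makes.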

Written for commit 1c85b4daa3. Summary will update on new commits.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
