[PR #1743] Add unit tests for JSONL CLI pipeline commands (Phase 5 & 6) #3003

Closed
opened 2026-03-01 18:01:22 +03:00 by kerem · 0 comments

Original Pull Request: https://github.com/ArchiveBox/ArchiveBox/pull/1743

State: closed
Merged: Yes


Summary

Related issues

Changes these areas

  • [ ] Bugfixes
  • [ ] Feature behavior
  • [ ] Command line interface
  • [ ] Configuration options
  • [ ] Internal architecture
  • [ ] Snapshot data layout on disk

Summary by cubic

Adds pass-through and create-or-update behavior to JSONL CLI pipeline commands and centralizes filter utilities, then adds comprehensive unit tests to validate the pipeline end-to-end. This improves piping workflows and makes run orchestration more reliable.
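The pass-through behavior described above can be illustrated with a minimal sketch. It is not the ArchiveBox implementation: the function names, the `type` field used to tell records apart, and the `stub_create` helper are all assumptions made for illustration, and the real commands may accept more input types than shown here.

```python
import json
from typing import Callable, Iterable, Iterator


def passthrough_create(
    lines: Iterable[str],
    target_type: str,
    create: Callable[[dict], dict],
) -> Iterator[dict]:
    """Hypothetical pass-through step for a JSONL pipeline.

    Records whose "type" matches target_type are handed to `create`;
    every other record is yielded unchanged so that later commands in
    the pipe still see it.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("type") == target_type:
            yield create(record)
        else:
            yield record  # non-target record: pass through untouched


def stub_create(record: dict) -> dict:
    """Stand-in for a real create call (assumption: it returns the record plus an id)."""
    return {**record, "id": "created"}
```

Feeding a `Snapshot` line and a `Tag` line through `passthrough_create(lines, "Snapshot", stub_create)` would yield the snapshot with an `id` added and the tag record exactly as it arrived, which is what lets several commands be chained in one pipe.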

  • New Features
    • Pass-through added to crawl/snapshot/archiveresult create: non-target records are output unchanged.
    • run now supports create-or-update for Crawl/Snapshot/ArchiveResult and outputs processed records for chaining; cascades Crawl → Snapshots → ArchiveResults.
    • Shared apply_filters utility added (cli_utils.py) and used across 7 CLI files to remove duplication.
    • ArchiveResult.from_json()/from_jsonl() implemented; Snapshot.to_json now emits tags_str.
    • Supervisord default updated to use archivebox run instead of manage orchestrator.
    • New pytest fixtures and CLI tests for crawl, snapshot, archiveresult, run, plus pass-through and pipeline accumulation cases.
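
The create-or-update behavior listed for run can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's code: it uses an in-memory dict in place of the database, assumes records are matched by an `id` field, and the names `create_or_update` and `process_lines` are hypothetical.

```python
import json
import uuid
from typing import Dict, Iterable, List


def create_or_update(store: Dict[str, dict], record: dict) -> dict:
    """Update the stored record if its id is already known, else create it.

    `store` is a hypothetical in-memory stand-in for the database.
    """
    record_id = record.get("id")
    if record_id and record_id in store:
        store[record_id].update(record)  # merge new fields into the existing record
    else:
        record_id = record_id or uuid.uuid4().hex  # assign an id to new records
        store[record_id] = {**record, "id": record_id}
    return store[record_id]


def process_lines(lines: Iterable[str], store: Dict[str, dict]) -> List[str]:
    """Process JSONL input and return the processed records as JSONL for chaining."""
    output = []
    for line in lines:
        if line.strip():
            processed = create_or_update(store, json.loads(line))
            output.append(json.dumps(processed))
    return output
```

Emitting each processed record again is the design choice that makes chaining possible: a downstream command receives the same record, now carrying its `id`, and can update it rather than creating a duplicate.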

Written for commit bb52b5902a. Summary will update on new commits.
