[PR #1721] [MERGED] Improve concurrency control between plugin hooks #4485

Closed
opened 2026-03-15 01:47:09 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/1721
Author: @pirate
Created: 12/28/2025
Status: Merged
Merged: 12/28/2025
Merged by: @pirate

Base: dev ← Head: claude/review-snapshot-archive-RJOkr


📝 Commits (5)

  • 1b5a816 Implement hook step-based concurrency system
  • 6b3c872 Mark hook renumbering testing as complete in TODO
  • 32bcf08 Restore missing folder utility functions
  • 767458e Revert "Restore missing folder utility functions"
  • 057b49a Update status command to use DB as source of truth

📊 Changes

30 files changed (+327 additions, -127 deletions)

📝 TODO_hook_concurrency.md (+48 -36)
📝 archivebox/cli/archivebox_status.py (+27 -41)
📝 archivebox/core/migrations/0032_alter_archiveresult_binary_and_more.py (+2 -2)
➕ archivebox/core/migrations/0034_snapshot_current_step.py (+23 -0)
📝 archivebox/core/models.py (+116 -37)
📝 archivebox/core/statemachines.py (+5 -0)
📝 archivebox/hooks.py (+73 -2)
📝 archivebox/machine/migrations/0003_alter_dependency_id_alter_installedbinary_dependency_and_more.py (+5 -5)
📝 archivebox/plugins/dom/on_Snapshot__53_dom.js (+0 -0)
📝 archivebox/plugins/forumdl/on_Snapshot__65_forumdl.bg.py (+0 -0)
📝 archivebox/plugins/gallerydl/on_Snapshot__64_gallerydl.bg.py (+0 -0)
📝 archivebox/plugins/git/on_Snapshot__62_git.py (+0 -0)
📝 archivebox/plugins/headers/on_Snapshot__55_headers.js (+0 -0)
📝 archivebox/plugins/htmltotext/on_Snapshot__57_htmltotext.py (+0 -0)
📝 archivebox/plugins/media/on_Snapshot__63_media.bg.py (+0 -0)
📝 archivebox/plugins/mercury/on_Snapshot__56_mercury.py (+0 -0)
📝 archivebox/plugins/papersdl/on_Snapshot__66_papersdl.bg.py (+0 -0)
📝 archivebox/plugins/parse_dom_outlinks/on_Snapshot__75_parse_dom_outlinks.js (+0 -0)
📝 archivebox/plugins/parse_html_urls/on_Snapshot__70_parse_html_urls.py (+0 -0)
📝 archivebox/plugins/parse_jsonl_urls/on_Snapshot__74_parse_jsonl_urls.py (+0 -0)

...and 10 more files
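
The hook files listed above share a visible naming pattern: `on_<Event>__<NN>_<name>[.bg].<ext>`. The following is a minimal sketch, not ArchiveBox's actual implementation, of how such names could be parsed and ordered by their numeric prefix. The regex, the `HookFile` type, and `parse_hook_filename` are hypothetical names inferred only from the filenames in this PR. Reading `.bg` as "background hook" is likewise an assumption, and the step grouping introduced by the PR is not described on this page, so it is not modeled here.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern, inferred only from filenames such as
# "on_Snapshot__53_dom.js" and "on_Snapshot__65_forumdl.bg.py".
HOOK_RE = re.compile(
    r"^on_(?P<event>[A-Za-z]+)__(?P<order>\d+)_(?P<name>\w+?)"
    r"(?P<bg>\.bg)?\.(?P<ext>py|js)$"
)


class HookFile(NamedTuple):
    event: str        # e.g. "Snapshot"
    order: int        # numeric prefix, e.g. 53
    name: str         # e.g. "dom"
    background: bool  # True when the name carries a ".bg" suffix (assumed meaning)


def parse_hook_filename(filename: str) -> Optional[HookFile]:
    """Split a hook filename into its parts; return None if it does not match."""
    m = HOOK_RE.match(filename)
    if m is None:
        return None
    return HookFile(m["event"], int(m["order"]), m["name"], m["bg"] is not None)


# Filenames taken from the changed-files list of this PR.
filenames = [
    "on_Snapshot__65_forumdl.bg.py",
    "on_Snapshot__53_dom.js",
    "on_Snapshot__75_parse_dom_outlinks.js",
    "on_Snapshot__62_git.py",
]

# Order hooks by their numeric prefix, skipping names that do not parse.
hooks = sorted(
    (h for h in map(parse_hook_filename, filenames) if h is not None),
    key=lambda h: h.order,
)
ordered_names = [h.name for h in hooks]
```

Sorting on the integer prefix rather than on the raw filename string keeps the order correct even if prefixes of different widths are ever mixed.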

📄 Description

Summary

Related issues

Changes these areas

  • [ ] Bugfixes
  • [ ] Feature behavior
  • [ ] Command line interface
  • [ ] Configuration options
  • [ ] Internal architecture
  • [ ] Snapshot data layout on disk

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
