[PR #1720] [MERGED] Improve filesystem based hook architecture #4486

Closed
opened 2026-03-15 01:47:09 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/1720
Author: @pirate
Created: 12/27/2025
Status: Merged
Merged: 12/27/2025
Merged by: @pirate

Base: dev ← Head: claude/implement-hook-architecture-ahCXy


📝 Commits (8)

  • 3d985fa Implement hook architecture with JSONL output support
  • c52eef1 Update Python/JS hooks to clean JSONL format + add audit report
  • 2623c6c Complete JS hooks to clean JSONL format + rename background hooks
  • 8c846b7 Rename validate hooks to install hooks
  • e3ba599 Update install hooks to respect XYZ_BINARY env vars
  • 4e50c4f Mark snapshot hook checklist items as complete
  • d65eb58 Add hook architecture unit tests + mark remaining work complete
  • b632894 Update views, API, and exports for new ArchiveResult output fields

📊 Changes

47 files changed (+2007 additions, -796 deletions)


📝 TODO_hook_architecture.md (+207 -13)
📝 archivebox/api/v1_core.py (+7 -3)
📝 archivebox/cli/archivebox_extract.py (+3 -3)
📝 archivebox/core/admin_archiveresults.py (+25 -22)
➕ archivebox/core/migrations/0029_archiveresult_hook_fields.py (+80 -0)
➕ archivebox/core/migrations/0030_migrate_output_field.py (+64 -0)
📝 archivebox/core/models.py (+302 -28)
📝 archivebox/core/statemachines.py (+16 -5)
📝 archivebox/core/templatetags/core_tags.py (+3 -3)
📝 archivebox/hooks.py (+151 -5)
📝 archivebox/misc/jsonl.py (+16 -2)
📝 archivebox/plugins/accessibility/on_Snapshot__39_accessibility.js (+12 -31)
📝 archivebox/plugins/archive_org/on_Snapshot__13_archive_org.py (+7 -21)
📝 archivebox/plugins/chrome_navigate/on_Snapshot__30_chrome_navigate.js (+8 -19)
📝 archivebox/plugins/chrome_session/on_Crawl__00_install_chrome.py (+29 -5)
📝 archivebox/plugins/chrome_session/on_Crawl__00_install_chrome_config.py (+0 -0)
📝 archivebox/plugins/chrome_session/on_Snapshot__20_chrome_session.js (+9 -27)
📝 archivebox/plugins/consolelog/on_Snapshot__21_consolelog.bg.js (+14 -37)
📝 archivebox/plugins/dom/on_Snapshot__36_dom.js (+18 -33)
📝 archivebox/plugins/extractor_utils.py (+12 -17)

...and 27 more files
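
The hook filenames listed above appear to share a pattern of the form `on_<Event>__<NN>_<name>[.bg].<ext>`. The sketch below is an illustration of that inferred pattern only, not ArchiveBox's actual hook discovery code: the regex, the `HookInfo` type, and `parse_hook_filename` are hypothetical names, and reading `NN` as a run order and `.bg` as a background marker is an assumption based on the filenames and the "rename background hooks" commit message.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

# Assumed pattern, inferred only from the filenames in the changed-files list:
#   on_<Event>__<NN>_<name>[.bg].<ext>
HOOK_RE = re.compile(
    r"^on_(?P<event>[A-Za-z]+)__(?P<order>\d+)_(?P<name>\w+?)(?P<bg>\.bg)?\.(?P<ext>\w+)$"
)


class HookInfo(NamedTuple):
    """Hypothetical container for the parts of a hook filename."""
    event: str        # e.g. "Snapshot" or "Crawl"
    order: int        # numeric prefix, assumed to control run order
    name: str         # e.g. "chrome_session"
    background: bool  # True when the filename carries a ".bg" marker
    ext: str          # e.g. "py" or "js"


def parse_hook_filename(path: str) -> Optional[HookInfo]:
    """Return the parsed parts of a hook path, or None if it does not match."""
    match = HOOK_RE.match(PurePosixPath(path).name)
    if match is None:
        return None
    return HookInfo(
        event=match["event"],
        order=int(match["order"]),
        name=match["name"],
        background=match["bg"] is not None,
        ext=match["ext"],
    )


if __name__ == "__main__":
    paths = [
        "archivebox/plugins/dom/on_Snapshot__36_dom.js",
        "archivebox/plugins/consolelog/on_Snapshot__21_consolelog.bg.js",
        "archivebox/plugins/chrome_session/on_Crawl__00_install_chrome.py",
        "archivebox/plugins/extractor_utils.py",  # not a hook, so it is skipped
    ]
    hooks = [h for h in map(parse_hook_filename, paths) if h is not None]
    for hook in sorted(hooks, key=lambda h: (h.event, h.order)):
        print(hook)
```

Sorting on `(event, order)` groups hooks by the event they respond to and then by their numeric prefix, which is one plausible reading of why the prefixes exist.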

📄 Description

Summary

Related issues

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Snapshot data layout on disk

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
