mirror of
https://github.com/ArchiveBox/ArchiveBox.git
synced 2026-04-25 17:16:00 +03:00
[PR #1751] [MERGED] Clean up on_Crawl hooks and remove dead code #1500
📋 Pull Request Information
Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/1751
Author: @pirate
Created: 12/31/2025
Status: ✅ Merged
Merged: 12/31/2025
Merged by: @pirate
Base: dev ← Head: claude/cleanup-on-crawl-hooks-TtLF6
📝 Commits (1)
4c77949 Clean up on_Crawl hooks: remove duplicates and standardize naming
📊 Changes
21 files changed (+109 additions, -1729 deletions)
➖ archivebox/plugins/captcha2/config.json (+0 -21)
➖ archivebox/plugins/captcha2/on_Crawl__01_captcha2.js (+0 -121)
➖ archivebox/plugins/captcha2/on_Crawl__11_captcha2_config.js (+0 -279)
➖ archivebox/plugins/captcha2/templates/icon.html (+0 -0)
➖ archivebox/plugins/captcha2/tests/test_captcha2.py (+0 -184)
➖ archivebox/plugins/chrome/on_Crawl__00_chrome_install.py (+0 -184)
📝 archivebox/plugins/chrome/on_Crawl__01_chrome_install.py (+0 -0)
📝 archivebox/plugins/chrome/on_Crawl__10_chrome_validate.py (+0 -0)
📝 archivebox/plugins/chrome/on_Crawl__20_chrome_launch.bg.js (+109 -31)
➖ archivebox/plugins/chrome/on_Crawl__30_chrome_launch.bg.js (+0 -323)
📝 archivebox/plugins/istilldontcareaboutcookies/on_Crawl__02_istilldontcareaboutcookies_install.js (+0 -0)
➖ archivebox/plugins/istilldontcareaboutcookies/on_Crawl__20_install_istilldontcareaboutcookies_extension.js (+0 -59)
📝 archivebox/plugins/search_backend_ripgrep/on_Crawl__00_ripgrep_install.py (+0 -0)
📝 archivebox/plugins/singlefile/on_Crawl__04_singlefile_install.js (+0 -0)
➖ archivebox/plugins/singlefile/on_Crawl__20_install_singlefile_extension.js (+0 -281)
📝 archivebox/plugins/twocaptcha/on_Crawl__05_twocaptcha_install.js (+0 -0)
📝 archivebox/plugins/twocaptcha/on_Crawl__25_twocaptcha_config.js (+0 -0)
➖ archivebox/plugins/ublock/on_Crawl__03_ublock.js (+0 -116)
📝 archivebox/plugins/ublock/on_Crawl__03_ublock_install.js (+0 -0)
➖ archivebox/plugins/wget/on_Crawl__10_wget_validate_config.py (+0 -130)
...and 1 more file
📄 Description
Deleted dead/duplicate hooks:
Renamed hooks to follow consistent pattern: on_Crawl__XX__.
Priority bands:
00-09: Binary/extension installation
10-19: Config validation
20-29: Browser launch and post-launch config
Final hooks:
00 ripgrep_install.py
01 chrome_install.py
02 istilldontcareaboutcookies_install.js
03 ublock_install.js
04 singlefile_install.js
05 twocaptcha_install.js
10 chrome_validate.py
11 wget_validate.py
20 chrome_launch.bg.js
25 twocaptcha_config.js
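The priority bands and final hook list above can be illustrated with a small sketch that parses hook filenames, orders them by their two-digit prefix, and labels each with its band. This is not ArchiveBox's actual hook loader: the regex, the `Hook` type, the `band_for` and `parse_hook` helpers, and the band labels are all hypothetical names introduced here, and the filename shape is only inferred from the files listed in this PR.

```python
import re
from typing import NamedTuple

# Assumed filename shape, inferred from the file names in this PR:
# on_Crawl__<two-digit priority>_<rest of name>. Not taken from ArchiveBox code.
HOOK_RE = re.compile(r"^on_Crawl__(?P<priority>\d{2})_(?P<name>.+)$")


class Hook(NamedTuple):
    priority: int
    name: str
    band: str


def band_for(priority: int) -> str:
    """Map a priority to the band described in the PR (labels are illustrative)."""
    if 0 <= priority <= 9:
        return "install"
    if 10 <= priority <= 19:
        return "validate"
    if 20 <= priority <= 29:
        return "launch"
    return "unassigned"


def parse_hook(filename: str) -> Hook:
    """Split a hook filename into its priority, remaining name, and band."""
    match = HOOK_RE.match(filename)
    if match is None:
        raise ValueError(f"not an on_Crawl hook filename: {filename!r}")
    priority = int(match["priority"])
    return Hook(priority, match["name"], band_for(priority))


# Filenames rebuilt from the "Final hooks" list, deliberately out of order.
FINAL_HOOKS = [
    "on_Crawl__25_twocaptcha_config.js",
    "on_Crawl__10_chrome_validate.py",
    "on_Crawl__00_ripgrep_install.py",
    "on_Crawl__20_chrome_launch.bg.js",
    "on_Crawl__03_ublock_install.js",
    "on_Crawl__01_chrome_install.py",
    "on_Crawl__05_twocaptcha_install.js",
    "on_Crawl__11_wget_validate.py",
    "on_Crawl__02_istilldontcareaboutcookies_install.js",
    "on_Crawl__04_singlefile_install.js",
]

# Sort by numeric priority so installs run before validation and launch.
ordered = sorted((parse_hook(f) for f in FINAL_HOOKS), key=lambda h: h.priority)

for hook in ordered:
    print(f"{hook.priority:02d} {hook.band:<8} {hook.name}")
```

Sorting on the parsed integer rather than the raw filename keeps the ordering explicit; with zero-padded two-digit prefixes a plain lexical sort would give the same result, but that is a property of this naming scheme rather than something the sketch relies on.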
Summary by cubic
Cleaned up Crawl-level hooks by removing legacy/duplicate code and standardizing hook names and priorities. Chrome launch is now a single, updated hook with better extension detection and cleaner outputs.
Written for commit 4c77949197. Summary will update on new commits.
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.