[PR #1741] [MERGED] Delete pid_utils.py and migrate to Process model #4504

Closed
opened 2026-03-15 01:48:14 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/ArchiveBox/ArchiveBox/pull/1741
Author: @pirate
Created: 12/31/2025
Status: Merged
Merged: 12/31/2025
Merged by: @pirate

Base: dev ← Head: claude/refactor-process-management-WcQyZ


📝 Commits (5)

  • 2d3a2fe Add terminate, kill_tree, and query methods to Process model
  • b822352 Delete pid_utils.py and migrate to Process model
  • ee201a0 Fix code review issues in process management refactor
  • 5121b0e Merge branch 'dev' into claude/refactor-process-management-WcQyZ
  • b2132d1 Fix cubic review issues: process_type detection, cmd storage, PID cleanup, and migration

📊 Changes

26 files changed (+3889 additions, -1181 deletions)

View changed files

📝 TODO_process_tracking.md (+239 -5)
archivebox/cli/archivebox_extract.py (+265 -0)
archivebox/cli/archivebox_orchestrator.py (+67 -0)
archivebox/cli/archivebox_remove.py (+98 -0)
archivebox/cli/archivebox_search.py (+131 -0)
📝 archivebox/core/models.py (+73 -321)
📝 archivebox/crawls/models.py (+13 -168)
📝 archivebox/hooks.py (+46 -256)
archivebox/machine/migrations/0002_process_parent_and_type.py (+101 -0)
📝 archivebox/machine/models.py (+895 -195)
📝 archivebox/misc/process_utils.py (+45 -6)
archivebox/plugins/captcha2/config.json (+21 -0)
archivebox/plugins/captcha2/on_Crawl__01_captcha2.js (+121 -0)
archivebox/plugins/captcha2/on_Crawl__11_captcha2_config.js (+279 -0)
archivebox/plugins/captcha2/templates/icon.html (+0 -0)
archivebox/plugins/captcha2/tests/test_captcha2.py (+184 -0)
archivebox/plugins/chrome/on_Crawl__00_chrome_install.py (+184 -0)
archivebox/plugins/chrome/on_Crawl__10_chrome_validate_config.py (+172 -0)
archivebox/plugins/chrome/on_Crawl__20_chrome_launch.bg.js (+245 -0)
archivebox/plugins/istilldontcareaboutcookies/on_Crawl__02_istilldontcareaboutcookies.js (+115 -0)

...and 6 more files

📄 Description

Summary

Related issues

Changes these areas

  • Bugfixes
  • Feature behavior
  • Command line interface
  • Configuration options
  • Internal architecture
  • Snapshot data layout on disk

Summary by cubic

Replaced PID-file-based process tracking with the Process model as the single source of truth, adding a parent/child hierarchy, is_alive checks, and a safe kill that escalates from SIGTERM to SIGKILL. This removes workers/pid_utils.py and simplifies the worker, orchestrator, crawl, and hook code while improving safety against PID reuse.
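The SIGTERM→SIGKILL escalation mentioned above can be illustrated with a minimal standard-library sketch. This is not the code from the PR: the function name `terminate_with_escalation`, the `grace_seconds` parameter, and the returned status strings are assumptions made for illustration, and the sketch operates on a `subprocess.Popen` handle rather than on the Process model.

```python
import subprocess


def terminate_with_escalation(proc: subprocess.Popen, grace_seconds: float = 2.0) -> str:
    """Ask a child process to exit politely, then force it if it does not.

    Hypothetical helper: returns "already_exited", "terminated", or "killed".
    """
    # Nothing to do if the child has already finished.
    if proc.poll() is not None:
        return "already_exited"

    # Step 1: polite request (SIGTERM on POSIX).
    proc.terminate()
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        # Step 2: the child ignored or outlived SIGTERM, so escalate
        # to an uncatchable kill (SIGKILL on POSIX) and reap it.
        proc.kill()
        proc.wait()
        return "killed"
```

The grace period gives a cooperative process time to flush state and clean up; only a process that is still running after that window is killed outright.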

  • Refactors

    • Deleted workers/pid_utils.py; all process ops now live on Process (current, launch, terminate, kill_tree, is_alive, get_running/count, cleanup_stale_running, get_next_worker_id).
    • Added parent FK and process_type to Process, plus validated psutil-backed status checks and PID reuse protection.
    • Updated orchestrator.py and worker.py to register with Process.current(), set process_type, throttle stale cleanup, and mark EXITED on shutdown. Added CLI entry for orchestrator (cli/archivebox_orchestrator.py).
    • Simplified crawls/models.py cleanup and removed process_is_alive and kill_process from hooks.py; core and machine cleanup now use safe_kill_process for legacy hook.pid files.
  • Migration

    • Run Django migrations (machine.0002 adds Process.parent and process_type with indexes).
    • No config changes; stale PID files are ignored.
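The PID reuse protection listed under Refactors can be sketched as follows. The idea shown here, comparing a recorded start time against the start time of whatever process currently holds the PID, is a common technique and is assumed rather than confirmed to be what the PR does. `ProcessRecord`, `is_same_process`, the `get_create_time` callable, and the `tolerance` value are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProcessRecord:
    """Hypothetical stand-in for a stored process row."""
    pid: int
    started_at: float  # epoch seconds recorded when the process was launched


def is_same_process(
    record: ProcessRecord,
    get_create_time: Callable[[int], Optional[float]],
    tolerance: float = 5.0,
) -> bool:
    """Return True only if the PID still belongs to the recorded process.

    get_create_time(pid) should return the OS-reported start time of the
    process currently holding that PID, or None if no such process exists.
    """
    actual = get_create_time(record.pid)
    if actual is None:
        # No process holds this PID any more.
        return False
    # A start time far from the recorded one means the OS has handed the
    # PID to an unrelated process, which must not be signalled.
    return abs(actual - record.started_at) <= tolerance
```

With psutil installed, `get_create_time` could wrap `psutil.Process(pid).create_time()` and return None on `psutil.NoSuchProcess`. Passing the lookup in as a callable keeps the check itself testable without spawning real processes.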

Written for commit b2132d1f14. Summary will update on new commits.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
