[PR #1741] Delete pid_utils.py and migrate to Process model #3002

Closed
opened 2026-03-01 18:01:22 +03:00 by kerem · 0 comments

Original Pull Request: https://github.com/ArchiveBox/ArchiveBox/pull/1741

State: closed
Merged: Yes


Summary

Related issues

Changes these areas

  • [ ] Bugfixes
  • [ ] Feature behavior
  • [ ] Command line interface
  • [ ] Configuration options
  • [ ] Internal architecture
  • [ ] Snapshot data layout on disk

Summary by cubic

Replaced PID-file-based process tracking with the Process model as the single source of truth, adding a parent/child process hierarchy, is_alive checks, and safe killing with SIGTERM→SIGKILL escalation. This removes workers/pid_utils.py and simplifies the worker, orchestrator, crawl, and hook code while improving protection against PID reuse.
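The safe-kill behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: the summary says the real checks are psutil-backed and live on the Process model, whereas this sketch assumes Linux and reads each process's start time from /proc/<pid>/stat. The functions read_start_ticks, is_alive, wait_until_gone and safe_kill are hypothetical names. The underlying idea is that a PID alone is not an identity; storing the start time next to it lets a stale record notice that the kernel has reused the PID before any signal is sent.

```python
import os
import signal
import time
from typing import Optional


def read_start_ticks(pid: int) -> Optional[int]:
    """Return the start time of `pid` in clock ticks since boot (Linux /proc only).

    Returns None if the PID does not exist, is a zombie, or /proc is unavailable.
    """
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except OSError:
        return None
    # Field 2 (comm) may contain spaces or ')', so split after the last ')'.
    fields = stat.rsplit(")", 1)[1].split()
    state, start_ticks = fields[0], int(fields[19])  # fields 3 and 22 in proc(5)
    if state == "Z":
        return None  # exited, just not reaped by its parent yet
    return start_ticks


def is_alive(pid: int, expected_start: int) -> bool:
    """True only if `pid` is running AND is the same process that was recorded.

    If the recorded process exited and the kernel reused its PID, the start
    time differs and the record is treated as dead.
    """
    return read_start_ticks(pid) == expected_start


def wait_until_gone(pid: int, expected_start: int, timeout: float, poll: float = 0.05) -> bool:
    """Poll until the recorded process is gone or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_alive(pid, expected_start):
            return True
        time.sleep(poll)
    return not is_alive(pid, expected_start)


def safe_kill(pid: int, expected_start: int, grace: float = 5.0) -> str:
    """SIGTERM first; escalate to SIGKILL only if the process outlives `grace`.

    Returns "not_running", "terminated", "killed", or "still_running".
    """
    outcome = "not_running"
    for sig, label in ((signal.SIGTERM, "terminated"), (signal.SIGKILL, "killed")):
        # Re-check identity before every signal so a reused PID is never hit.
        if not is_alive(pid, expected_start):
            return outcome
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            return outcome  # it exited between the check and the signal
        outcome = label
        if wait_until_gone(pid, expected_start, grace):
            return outcome
    return "still_running"
```

Given a stale record whose PID now belongs to another program, safe_kill returns "not_running" instead of signalling that program, which is the protection against PID reuse that bare PID files cannot offer.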

  • Refactors

    • Deleted workers/pid_utils.py; all process ops now live on Process (current, launch, terminate, kill_tree, is_alive, get_running/count, cleanup_stale_running, get_next_worker_id).
    • Added parent FK and process_type to Process, plus validated psutil-backed status checks and PID reuse protection.
    • Updated orchestrator.py and worker.py to register with Process.current(), set process_type, throttle stale cleanup, and mark EXITED on shutdown. Added CLI entry for orchestrator (cli/archivebox_orchestrator.py).
    • Simplified crawls/models.py cleanup and removed hooks.py process_is_alive/kill_process; the core and machine cleanup code now uses safe_kill_process for legacy hook.pid files.
  • Migration

    • Run Django migrations (machine.0002 adds Process.parent and process_type with indexes).
    • No config changes; stale PID files are ignored.
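Keeping process state in one table instead of scattered PID files can be illustrated with an in-memory SQLite table. This is a hypothetical stand-in for the Django Process model, not its actual schema: the table layout, the ProcessTable class, its method names, and the 'running'/'exited' status strings are assumptions chosen to mirror the summary (a self-referencing parent column, an indexed process_type, running counts, throttled stale cleanup, and marking rows exited on shutdown). The liveness check is injected, so a real deployment would pass one that also compares process start times rather than trusting the PID alone.

```python
import sqlite3
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical schema: parent_id gives the process hierarchy, and both
# parent_id and process_type are indexed, echoing the migration described above.
SCHEMA = """
CREATE TABLE process (
    id           INTEGER PRIMARY KEY,
    pid          INTEGER NOT NULL,
    start_ticks  INTEGER NOT NULL,
    parent_id    INTEGER REFERENCES process(id) ON DELETE SET NULL,
    process_type TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'running'
);
CREATE INDEX process_parent_idx ON process(parent_id);
CREATE INDEX process_type_idx ON process(process_type);
"""


class ProcessTable:
    """Hypothetical single source of truth for process state (no PID files)."""

    def __init__(self, is_alive: Callable[[int, int], bool]):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("PRAGMA foreign_keys = ON")  # enforce parent_id references
        self.db.executescript(SCHEMA)
        self.is_alive = is_alive  # (pid, start_ticks) -> bool
        self._last_cleanup = float("-inf")

    def register(self, pid: int, start_ticks: int, process_type: str,
                 parent_id: Optional[int] = None) -> int:
        cur = self.db.execute(
            "INSERT INTO process (pid, start_ticks, process_type, parent_id) VALUES (?, ?, ?, ?)",
            (pid, start_ticks, process_type, parent_id),
        )
        return cur.lastrowid

    def get_running(self, process_type: str) -> List[Tuple[int, int, int]]:
        return self.db.execute(
            "SELECT id, pid, start_ticks FROM process WHERE status = 'running' AND process_type = ?",
            (process_type,),
        ).fetchall()

    def count_running(self, process_type: str) -> int:
        (n,) = self.db.execute(
            "SELECT COUNT(*) FROM process WHERE status = 'running' AND process_type = ?",
            (process_type,),
        ).fetchone()
        return n

    def children(self, row_id: int) -> List[int]:
        """Direct children of a row, e.g. the workers an orchestrator spawned."""
        rows = self.db.execute("SELECT id FROM process WHERE parent_id = ?", (row_id,))
        return [r[0] for r in rows]

    def mark_exited(self, row_id: int) -> None:
        """Called by a process on clean shutdown."""
        self.db.execute("UPDATE process SET status = 'exited' WHERE id = ?", (row_id,))

    def cleanup_stale_running(self, min_interval: float = 0.0) -> int:
        """Mark 'running' rows whose process is gone as exited; return how many.

        Skips the scan entirely if the last one ran less than `min_interval` seconds ago.
        """
        now = time.monotonic()
        if now - self._last_cleanup < min_interval:
            return 0  # throttled
        self._last_cleanup = now
        rows = self.db.execute(
            "SELECT id, pid, start_ticks FROM process WHERE status = 'running'"
        ).fetchall()
        stale = [(row_id,) for row_id, pid, start in rows if not self.is_alive(pid, start)]
        self.db.executemany("UPDATE process SET status = 'exited' WHERE id = ?", stale)
        return len(stale)
```

Because every caller reads the same rows, a worker that died without cleaning up is reconciled on the next cleanup pass instead of leaving an orphaned PID file behind, and min_interval keeps that scan from running on every loop iteration.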

Written for commit b2132d1f14e30051658e523d0818980d629ecc97. Summary will update on new commits.

Reference
starred/ArchiveBox#3002