mirror of
https://github.com/mikeyobrien/ralph-orchestrator.git
synced 2026-04-25 23:25:57 +03:00
[GH-ISSUE #202] Bug: Worktree loop becomes zombie when worktree directory is removed but registry entry persists #76
Originally created by @CoderMageFox on GitHub (Feb 26, 2026).
Original GitHub issue: https://github.com/mikeyobrien/ralph-orchestrator/issues/202
Summary
When a parallel loop's worktree directory is removed (by an external process, a git operation, or cleanup), the loop registry (`loops.json`) retains the entry and the ralph process keeps running. The result is a "zombie loop": it appears as `running` in `ralph loops list` but cannot perform any work. It occupies a parallel slot and misleads users into thinking parallel execution is active.
Environment
Steps to Reproduce
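1. Start a loop: `ralph run -b codex -p "Task A"`
2. Start a second loop in parallel: `ralph run -b claude -p "Task B"`
3. Remove the loop's worktree directory `.worktrees/<loop-id>/`
4. Run `ralph loops list`
Expected Behavior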
- The zombie loop should be marked `orphan` or automatically cleaned up
- `ralph loops list` should show accurate status
Actual Behavior
- `ralph loops list` shows the loop as `running`
- The ralph process (whose PID is recorded in `loops.json`) is still alive but operating on a non-existent directory
- `ralph loops stop <id>` fails with "Cannot determine active loop - it may have already stopped"
- Cleanup requires `kill <pid>` plus manually editing `loops.json`
Observed State
Root Cause Analysis
Three areas in the codebase lack worktree existence validation:
1. `loop_registry.rs`: `LoopEntry::is_alive()` (line ~129)
Currently it only checks whether the PID is alive via `kill(pid, 0)`; it does not verify that the worktree directory exists.
Suggested fix: Add a worktree path existence check for entries that have a `worktree_path`:
2. `loop_runner.rs`: no runtime worktree health check
The event loop in `run_loop_impl` does not verify that the workspace directory exists between iterations. If the worktree is removed mid-execution, the loop keeps running while every file operation either fails silently or errors out.
Suggested fix: Add a workspace existence check at the start of each iteration:
3. `loops.rs`: the `list` command doesn't detect orphans
The `list` subcommand displays status based solely on PID liveness. It should cross-reference the actual worktree existence and show `orphan` status when the directory is missing.
4. `loops.rs`: the `stop` command fails on zombie loops
`ralph loops stop <id>` fails because it cannot determine the active loop context. It should still be able to kill the PID and clean up the registry entry even when the worktree is gone.
Suggested Priority
Medium-High: this silently breaks parallel execution, which is a core v2 feature. Users see "2 loops running" but only one is actually working, with no indication of the problem.
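Items 1, 3 and 4 of the root-cause analysis all come down to classifying a registry entry by both PID liveness and worktree existence. The Rust sketch below shows one way to do that under stated assumptions; it is not ralph's actual code. `LoopEntry` here is a simplified, hypothetical stand-in with only `id`, `pid` and `worktree_path`, and `LoopStatus`, `status`, `label` and `stop_loop` are illustrative names. The PID check and the kill are passed in as closures so the sketch needs only the standard library; in the real registry they would wrap the existing `kill(pid, 0)` check and an actual signal.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, simplified stand-in for the registry entry in loop_registry.rs.
#[derive(Debug, Clone)]
pub struct LoopEntry {
    pub id: String,
    pub pid: u32,
    /// None for a loop running in the main checkout, Some(path) for a worktree loop.
    pub worktree_path: Option<PathBuf>,
}

/// Illustrative statuses; "running" and "orphan" come from the issue, "stopped" is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LoopStatus {
    Running,
    Orphan,
    Stopped,
}

impl LoopStatus {
    /// Text a `ralph loops list` implementation could print.
    pub fn label(self) -> &'static str {
        match self {
            LoopStatus::Running => "running",
            LoopStatus::Orphan => "orphan",
            LoopStatus::Stopped => "stopped",
        }
    }
}

impl LoopEntry {
    /// Classify the entry using both PID liveness and worktree existence.
    /// `pid_alive` stands in for the registry's existing kill(pid, 0) check.
    pub fn status(&self, pid_alive: impl Fn(u32) -> bool) -> LoopStatus {
        // Entries without a worktree path have nothing to validate.
        let worktree_ok = self.worktree_path.as_deref().map_or(true, Path::is_dir);
        match (pid_alive(self.pid), worktree_ok) {
            (true, true) => LoopStatus::Running,
            (true, false) => LoopStatus::Orphan,
            (false, _) => LoopStatus::Stopped,
        }
    }

    /// Stricter liveness: a live PID whose worktree is gone no longer counts as alive.
    pub fn is_alive(&self, pid_alive: impl Fn(u32) -> bool) -> bool {
        self.status(pid_alive) == LoopStatus::Running
    }
}

/// Stop a loop without needing its workspace: signal the PID, then drop the entry.
pub fn stop_loop(
    entries: &mut Vec<LoopEntry>,
    id: &str,
    kill: impl FnOnce(u32),
) -> Result<LoopEntry, String> {
    let idx = entries
        .iter()
        .position(|e| e.id == id)
        .ok_or_else(|| format!("no loop with id {id}"))?;
    kill(entries[idx].pid);
    // The entry is removed regardless of whether the process was still running.
    Ok(entries.remove(idx))
}
```

With this split, `list` could print `entry.status(..).label()` and show `orphan` for a live process whose worktree is gone, while `stop` only needs the entry's id and PID, not the loop's workspace context.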
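For item 2 of the root-cause analysis, a per-iteration guard could make the loop stop with an explicit error instead of running against a deleted worktree. This is a minimal sketch under stated assumptions: `run_loop` is a simplified, hypothetical stand-in for the iteration loop in `run_loop_impl`, the `run_iteration` closure represents one iteration's work, and `LoopError` and `ensure_workspace` are illustrative names, not existing ralph APIs.

```rust
use std::fmt;
use std::path::{Path, PathBuf};

/// Hypothetical error the runner could stop with.
#[derive(Debug, PartialEq, Eq)]
pub enum LoopError {
    WorkspaceMissing(PathBuf),
}

impl fmt::Display for LoopError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoopError::WorkspaceMissing(p) => {
                write!(f, "workspace {} no longer exists; stopping loop", p.display())
            }
        }
    }
}

impl std::error::Error for LoopError {}

/// Fail fast if the workspace directory has disappeared.
pub fn ensure_workspace(workspace: &Path) -> Result<(), LoopError> {
    if workspace.is_dir() {
        Ok(())
    } else {
        Err(LoopError::WorkspaceMissing(workspace.to_path_buf()))
    }
}

/// Simplified iteration loop; returns the number of completed iterations.
pub fn run_loop(
    workspace: &Path,
    max_iterations: u32,
    mut run_iteration: impl FnMut(u32),
) -> Result<u32, LoopError> {
    for i in 0..max_iterations {
        // Check before every iteration, since the worktree can vanish mid-run.
        ensure_workspace(workspace)?;
        run_iteration(i);
    }
    Ok(max_iterations)
}
```

Checking at the top of every iteration, rather than once at startup, is what catches a worktree removed mid-run. On this error the runner could also remove its own registry entry before exiting, so the parallel slot is freed.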
Workaround
Manual cleanup: