mirror of
https://github.com/mikeyobrien/ralph-orchestrator.git
synced 2026-04-24 22:55:57 +03:00
[GH-ISSUE #187] [Bug]: default_publishes cascades to LOOP_COMPLETE when agent writes no events in worktree loops #71
Originally created by @arjhun-personal on GitHub (Feb 25, 2026).
Original GitHub issue: https://github.com/mikeyobrien/ralph-orchestrator/issues/187
Bug Description
When parallel worktree loops are spawned, they complete in ~3 seconds (3 iterations) without the Claude agent performing any actual work. The default_publishes chain cascades through all hats to LOOP_COMPLETE because the agent writes zero events in each iteration.
Steps to Reproduce
ralph run -c preset.yml --autonomous -p "..."
ralph run detects the lock, creates a worktree, and starts a fresh loop
Preset Configuration
What Happens
Each iteration, the Claude agent runs but writes no events (never calls ralph emit). After each iteration, check_default_publishes in event_loop/mod.rs:1422 fires because process_events_from_jsonl returned Ok(false):
track.build.start activates track_builder → agent writes nothing → default injects track.build.done
track.build.done activates security_reviewer → agent writes nothing → default injects security_review.passed
security_review.passed activates track_reviewer → agent writes nothing → default injects LOOP_COMPLETE → loop terminates
The loop completes having done zero work. The Coordinator then has to fall back to sequential execution.
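The cascade described above can be reproduced with a small model. This is only an illustrative sketch, not the orchestrator's actual code: the `Hat` struct, `issue_hats`, and `run_silent_loop` are hypothetical names, and the hat and event names are taken from the chain listed in this issue. The simulated agent never emits anything, so every step is driven by the default.

```rust
/// Hypothetical, simplified hat: the event that activates it and the event
/// injected by default when the agent stays silent.
struct Hat {
    name: &'static str,
    trigger: &'static str,
    default_publishes: &'static str,
}

/// The three hats from the issue, modelled with assumed field names.
fn issue_hats() -> Vec<Hat> {
    vec![
        Hat { name: "track_builder", trigger: "track.build.start", default_publishes: "track.build.done" },
        Hat { name: "security_reviewer", trigger: "track.build.done", default_publishes: "security_review.passed" },
        Hat { name: "track_reviewer", trigger: "security_review.passed", default_publishes: "LOOP_COMPLETE" },
    ]
}

/// Simulates a loop whose agent never writes an event. Returns one
/// (hat name, injected event) pair per iteration.
fn run_silent_loop(
    hats: &[Hat],
    start: &str,
    completion_promise: &str,
) -> Vec<(&'static str, &'static str)> {
    let mut trace = Vec::new();
    let mut current: &str = start;
    // Bound the iterations so a cyclic configuration cannot spin forever.
    for _ in 0..hats.len() {
        let Some(hat) = hats.iter().find(|h| h.trigger == current) else {
            break;
        };
        // The agent wrote nothing, so the fallback event is injected as-is.
        trace.push((hat.name, hat.default_publishes));
        if hat.default_publishes == completion_promise {
            break; // the loop "completes" without any work having been done
        }
        current = hat.default_publishes;
    }
    trace
}
```

Calling `run_silent_loop(&issue_hats(), "track.build.start", "LOOP_COMPLETE")` walks all three hats and ends on `LOOP_COMPLETE`, matching the three-iteration completion reported here.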
Root Cause Analysis
Event isolation is correct: worktree loops get fresh timestamped events files via run_loop_impl (loop_runner.rs:120-128). The events are NOT inherited from the parent. The ralph agent memory mem-1740470400-a1b2 claiming "event history inheritance" is an incorrect self-diagnosis.
The actual issue is the default_publishes fallback design. It treats agent silence as success. When the agent fails to emit events for ANY reason (malformed prompt, missing env, API error, confused context), default_publishes advances the state machine as if the work was done. Three silent iterations cascade the full hat chain to completion.
The code path in loop_runner.rs:1082-1089:
Why the Agent Wrote No Events
The exact cause of agent silence is unclear since worktree session logs were deleted during cleanup. Possible causes:
-p inline prompt passed by the Coordinator was insufficient for the worktree context
.env, can't build) and exited without emitting
Suggested Fixes
Option 1: Don't allow default_publishes on the completion promise hat. If a hat's default_publishes matches completion_promise, reject the configuration at startup. The final step should always require explicit agent action.
Option 2: Add minimum iteration duration guard. If an iteration completes in < N seconds (e.g., 10s), don't fire default_publishes. A real Claude API call takes much longer than 1-3 seconds.
Option 3: Require agent tool usage before accepting defaults. Before firing default_publishes, verify the agent performed at least one meaningful tool call (file read/write, bash command). If the agent did nothing, treat it as a failure, not silent success.
Option 4: Limit consecutive default_publishes cascading. If default_publishes fires N times in a row without any agent-written events, terminate with a diagnostic error rather than completing.
Environment
arjhun-personal/ralph-orchestrator fork (up to date with upstream)
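Returning to the suggested fixes, Options 1 and 4 could be sketched roughly as follows. This is a sketch under stated assumptions rather than a patch against the real code base: `HatConfig`, `validate_hats`, and `DefaultCascadeGuard` are hypothetical names, and the orchestrator's actual configuration types and error handling may look quite different.

```rust
/// Hypothetical hat configuration, reduced to the fields relevant here.
struct HatConfig {
    name: String,
    default_publishes: Option<String>,
}

/// Option 1 (sketch): reject, at startup, any hat whose default_publishes
/// equals the completion promise.
fn validate_hats(hats: &[HatConfig], completion_promise: &str) -> Result<(), String> {
    for hat in hats {
        if hat.default_publishes.as_deref() == Some(completion_promise) {
            return Err(format!(
                "hat '{}' sets default_publishes to the completion promise '{}'",
                hat.name, completion_promise
            ));
        }
    }
    Ok(())
}

/// Option 4 (sketch): track how many iterations in a row were advanced only
/// by default_publishes, with no agent-written events.
struct DefaultCascadeGuard {
    max_consecutive: u32,
    consecutive: u32,
}

impl DefaultCascadeGuard {
    fn new(max_consecutive: u32) -> Self {
        Self { max_consecutive, consecutive: 0 }
    }

    /// Call once per iteration. `agent_wrote_events` is true when the agent
    /// emitted at least one event itself; any such iteration resets the count.
    fn record(&mut self, agent_wrote_events: bool) -> Result<(), String> {
        if agent_wrote_events {
            self.consecutive = 0;
            return Ok(());
        }
        self.consecutive += 1;
        if self.consecutive >= self.max_consecutive {
            return Err(format!(
                "default_publishes fired {} times in a row without agent events",
                self.consecutive
            ));
        }
        Ok(())
    }
}
```

With a limit of 3, the guard would turn the three silent iterations from this report into a diagnostic error instead of a completed loop, while the startup check would reject the track_reviewer default up front.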