[PR #197] fix(core): default_publishes of completion_promise must set completion_requested #190

opened 2026-02-27 10:22:39 +03:00 by kerem · 0 comments

📋 Pull Request Information

Original PR: https://github.com/mikeyobrien/ralph-orchestrator/pull/197
Author: @arjhun-personal
Created: 2/25/2026
Status: 🔄 Open

Base: main ← Head: fix/default-publishes-completion-requested


📝 Commits (8)

  • 159821d feat(core): add hat scope enforcement, event chain validation, and human timeout routing
  • 63111d9 docs: add upstream PR draft for hat scope enforcement
  • 479e6cb feat(core): make hat enforcement opt-in via config flags
  • dd46589 style: fix rustfmt formatting in hat enforcement code
  • 31c55ef spec: add context window utilization tracking design
  • adce9bc fix(core): record default_publishes topics in seen_topics for chain validation
  • 4da8468 docs: add upstream PR draft for default_publishes seen_topics fix
  • ed40abe fix(core): default_publishes of completion_promise must set completion_requested

📊 Changes

14 files changed (+1361 additions, -4 deletions)

📝 crates/ralph-bench/src/main.rs (+1 -0)
📝 crates/ralph-cli/src/display.rs (+1 -0)
📝 crates/ralph-cli/src/loop_runner.rs (+27 -0)
📝 crates/ralph-core/src/config.rs (+21 -0)
📝 crates/ralph-core/src/event_loop/loop_state.rs (+21 -0)
📝 crates/ralph-core/src/event_loop/mod.rs (+144 -4)
📝 crates/ralph-core/src/event_loop/tests.rs (+590 -0)
📝 crates/ralph-core/src/hat_registry.rs (+79 -0)
📝 crates/ralph-core/src/summary_writer.rs (+3 -0)
➕ docs/specs/context-window-utilization.md (+151 -0)
➕ upstream-PRs/default-publishes-seen-topics-body.md (+91 -0)
➕ upstream-PRs/default-publishes-seen-topics.md (+47 -0)
➕ upstream-PRs/hat-scope-enforcement-body.md (+58 -0)
➕ upstream-PRs/hat-scope-enforcement.md (+127 -0)

📄 Description

Relates to #187

Summary

Fixes an infinite loop caused by check_default_publishes not setting completion_requested when the injected topic matches the completion_promise. The loop spins forever — the completion event exists on the bus but check_completion_event() never fires because the flag is only set by process_events_from_jsonl(), which default_publishes bypasses entirely.

Bug

There are two independent code paths that can emit the completion_promise event:

  1. JSONL path (process_events_from_jsonl): Agent writes LOOP_COMPLETE to events JSONL → parsed → completion_requested = true → check_completion_event() fires → loop terminates.
  2. default_publishes path (check_default_publishes): Agent writes no events → orchestrator injects default event → published to bus → but completion_requested is never set → check_completion_event() returns None → loop continues forever.

The result: the final hat's default_publishes: "LOOP_COMPLETE" fires a LOOP_COMPLETE event on the bus (which wakes downstream/wildcard hats), but the loop never terminates. It cycles endlessly between hats that keep re-activating each other.
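
To make the divergence between the two paths concrete, here is a minimal sketch of the pre-fix behavior. `LoopState`, `inject_default_publish`, and the string-based return value are hypothetical stand-ins for illustration, not the real ralph-core types; the only thing taken from the description above is that termination consults the `completion_requested` flag rather than the bus.

```rust
/// Hypothetical, simplified stand-in for the orchestrator's loop state.
#[derive(Default)]
struct LoopState {
    completion_requested: bool,
    /// Topics published so far (a toy replacement for the event bus).
    bus: Vec<String>,
}

const COMPLETION_PROMISE: &str = "LOOP_COMPLETE";

/// Path 1: a topic parsed from the agent's JSONL output.
/// This path raises the flag when it sees the completion promise.
fn process_event_from_jsonl(state: &mut LoopState, topic: &str) {
    if topic == COMPLETION_PROMISE {
        state.completion_requested = true;
    }
    state.bus.push(topic.to_string());
}

/// Path 2 (pre-fix): the orchestrator injects the hat's default topic.
/// The event reaches the bus, but the flag is left untouched.
fn inject_default_publish(state: &mut LoopState, topic: &str) {
    state.bus.push(topic.to_string());
}

/// Termination looks only at the flag, never at the bus contents.
fn check_completion_event(state: &LoopState) -> Option<&'static str> {
    state.completion_requested.then_some("CompletionPromise")
}
```

Feeding `LOOP_COMPLETE` through `process_event_from_jsonl` makes `check_completion_event` return `Some`, while feeding the same topic through `inject_default_publish` leaves it at `None` even though the event is on the bus.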

Observed behavior

In a lexis-feature preset with 8 hats:

Iteration 8:  dispatcher → publishes dispatch.start
Iteration 9:  builder → completes, writes no events
              → default_publishes injects "all.built"
              → check_completion_event: completion_requested=false → continues
Iteration 10: dispatcher → re-triggered by all.built
Iteration 11: builder → same cycle repeats
...forever
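
The ping-pong in this trace can be reproduced with a toy model. The sketch below assumes a much simpler dispatch rule than the real orchestrator (one pending topic, first matching hat wins), and the `Hat` struct, `via_jsonl` field, and `max_iterations` cap are all hypothetical; it only illustrates that, when nothing raises the flag, an external cap is the sole way out of the cycle.

```rust
const COMPLETION_PROMISE: &str = "LOOP_COMPLETE";

/// Hypothetical toy hat: wakes on `trigger` and emits `publishes`.
struct Hat {
    name: &'static str,
    trigger: &'static str,
    publishes: &'static str,
    /// true: the agent wrote the event to JSONL itself;
    /// false: the orchestrator injected it via default_publishes.
    via_jsonl: bool,
}

/// Runs the toy loop and returns hat names in activation order.
/// Stops when the flag is raised, no hat matches, or the cap is hit.
fn run(hats: &[Hat], first_topic: &'static str, max_iterations: usize) -> Vec<&'static str> {
    let mut completion_requested = false;
    let mut pending = first_topic;
    let mut trace = Vec::new();
    while !completion_requested && trace.len() < max_iterations {
        let Some(hat) = hats.iter().find(|h| h.trigger == pending) else {
            break; // no subscriber for the pending topic
        };
        trace.push(hat.name);
        // Pre-fix behavior: only the JSONL path can raise the flag.
        if hat.via_jsonl && hat.publishes == COMPLETION_PROMISE {
            completion_requested = true;
        }
        pending = hat.publishes;
    }
    trace
}
```

With a dispatcher triggered by `all.built` and a builder that default-publishes `all.built`, `run` alternates between the two hats until `max_iterations` is reached.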

Steps to Reproduce

  1. Configure a preset where the final hat has default_publishes matching completion_promise:

    event_loop:
      completion_promise: "LOOP_COMPLETE"
      required_events:
        - "all.built"
    
    hats:
      final_committer:
        triggers: ["all.built"]
        publishes: ["LOOP_COMPLETE"]
        default_publishes: "LOOP_COMPLETE"
        instructions: "Verify all work is complete and emit LOOP_COMPLETE"
    
  2. Run the loop. The final_committer hat activates when all.built arrives.

  3. The agent completes its work but does not explicitly write LOOP_COMPLETE to JSONL (this is common — agents follow hat instructions imperfectly).

  4. check_default_publishes injects LOOP_COMPLETE on the bus.

  5. check_completion_event() checks completion_requested → false → returns None.

  6. The LOOP_COMPLETE event on the bus wakes the wildcard subscriber (ralph), which re-dispatches to the next triggered hat, starting a new cycle.

  7. The loop never terminates.
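
To check whether a given preset is exposed to these steps, one option is to look for hats whose `default_publishes` equals the `completion_promise`. The sketch below is a hedged illustration: `HatConfig` is a hypothetical, trimmed-down struct rather than the real ralph-core config type, and the helper name is invented for this example.

```rust
/// Hypothetical, trimmed-down view of one hat entry from a preset.
struct HatConfig {
    name: String,
    default_publishes: Option<String>,
}

/// Returns the names of hats whose `default_publishes` equals the
/// completion promise, i.e. hats that depend on the default_publishes
/// path to end the loop.
fn hats_defaulting_to_completion<'a>(
    completion_promise: &str,
    hats: &'a [HatConfig],
) -> Vec<&'a str> {
    hats.iter()
        .filter(|h| h.default_publishes.as_deref() == Some(completion_promise))
        .map(|h| h.name.as_str())
        .collect()
}
```

For the preset in step 1, this would flag `final_committer` and no other hat.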

Root Cause

check_default_publishes (event_loop/mod.rs:1403) calls self.bus.publish() but never sets self.state.completion_requested. This flag is only set in process_events_from_jsonl (line ~2157) when parsing events from the agent's JSONL output. The default_publishes path bypasses JSONL entirely, so the flag is never set.

// BEFORE (bug): publishes on bus but completion_requested stays false
pub fn check_default_publishes(&mut self, hat_id: &HatId) {
    if let Some(config) = self.registry.get_config(hat_id)
        && let Some(default_topic) = &config.default_publishes
    {
        let default_event = Event::new(default_topic.as_str(), "")
            .with_source(hat_id.clone());
        self.state.record_topic(default_topic.as_str());
        self.bus.publish(default_event);
        // <-- completion_requested never set, even if topic == completion_promise
    }
}

Fix

Added a check: if the default topic matches completion_promise, set completion_requested = true directly:

self.state.record_topic(default_topic.as_str());

// If the default topic is the completion promise, set the flag directly.
// The normal path (process_events_from_jsonl) sets this when reading from
// JSONL, but default_publishes bypasses JSONL entirely.
if default_topic.as_str() == self.config.event_loop.completion_promise {
    info!(
        hat = %hat_id.as_str(),
        topic = %default_topic,
        "default_publishes matches completion_promise — requesting termination"
    );
    self.state.completion_requested = true;
}

self.bus.publish(default_event);

Changes

| File | Change |
|------|--------|
| crates/ralph-core/src/event_loop/mod.rs | +13 lines: completion_requested check in check_default_publishes, doc comment update |
| crates/ralph-core/src/event_loop/tests.rs | +55 lines: new regression test |

Tests

New regression test (test_default_publishes_completion_promise_triggers_termination):

  • Configures completion_promise: "LOOP_COMPLETE" with required_events: ["all.built"]
  • Creates a final_committer hat with default_publishes: "LOOP_COMPLETE"
  • Satisfies required_events by writing all.built via JSONL
  • Calls check_default_publishes (simulating agent writing no events)
  • Asserts check_completion_event() returns Some(TerminationReason::CompletionPromise)
  • Previously this returned None, causing the infinite loop
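
The steps of the regression test can be mirrored against a toy model with the fix applied. This is a sketch under stated assumptions, not the actual test in tests.rs: `ToyLoop` and its fields are hypothetical, the local `TerminationReason` enum only borrows the variant name from the description, and the rule that `required_events` must all have been seen before completion is honored is an assumption inferred from the test steps above.

```rust
use std::collections::HashSet;

/// Toy copy of the variant named in the description; not the real enum.
#[derive(Debug, PartialEq)]
enum TerminationReason {
    CompletionPromise,
}

/// Hypothetical, simplified event loop state.
struct ToyLoop {
    completion_promise: String,
    required_events: Vec<String>,
    seen_topics: HashSet<String>,
    completion_requested: bool,
}

impl ToyLoop {
    fn new(completion_promise: &str, required_events: &[&str]) -> Self {
        Self {
            completion_promise: completion_promise.to_string(),
            required_events: required_events.iter().map(|e| e.to_string()).collect(),
            seen_topics: HashSet::new(),
            completion_requested: false,
        }
    }

    /// JSONL path: record the topic and raise the flag on the promise.
    fn process_event_from_jsonl(&mut self, topic: &str) {
        self.seen_topics.insert(topic.to_string());
        if topic == self.completion_promise.as_str() {
            self.completion_requested = true;
        }
    }

    /// default_publishes path with the fix: same flag handling as JSONL.
    fn check_default_publishes(&mut self, default_topic: &str) {
        self.seen_topics.insert(default_topic.to_string());
        if default_topic == self.completion_promise.as_str() {
            self.completion_requested = true;
        }
    }

    /// Assumed rule: terminate only if the flag is set and every
    /// required event has been seen.
    fn check_completion_event(&self) -> Option<TerminationReason> {
        let required_seen = self
            .required_events
            .iter()
            .all(|e| self.seen_topics.contains(e));
        (self.completion_requested && required_seen)
            .then_some(TerminationReason::CompletionPromise)
    }
}
```

In this model, writing `all.built` through the JSONL path and then calling `check_default_publishes("LOOP_COMPLETE")` makes `check_completion_event()` return `Some(TerminationReason::CompletionPromise)`, matching the assertion in the list above.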

Test Plan

  • cargo test -p ralph-core test_default_publishes — 5 tests pass (including new regression test)
  • cargo test -p ralph-core — 703 tests pass (no regressions)
  • cargo test — full workspace passes

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.
