[GH-ISSUE #193] [Feature]: Hat scope enforcement, event chain validation, and human timeout routing #75

Open

opened 2026-02-27 10:22:04 +03:00 by kerem · 0 comments

Originally created by @arjhun-personal on GitHub (Feb 25, 2026).
Original GitHub issue: https://github.com/mikeyobrien/ralph-orchestrator/issues/193

## Problem

Agents can bypass hat workflow constraints by emitting events outside their hat's declared `publishes` list. In one observed failure mode, an agent skipped human approval, never created tasks, and implemented all phases in a single context window, because nothing enforced the hat's declared scope.

Additionally, when `RObot` (human-in-the-loop) times out waiting for a response, the timeout is silent: it logs a warning and continues without emitting an event, so the timeout is invisible in the event log and cannot be routed to hats.

## Proposed Solution

Three layers of defense-in-depth:

1. **Hat scope enforcement**: Gate events in `process_events_from_jsonl()` against the active hat's declared `publishes` patterns. Out-of-scope events are dropped and replaced with `{hat_id}.scope_violation` diagnostic events. Coordination mode (no active hat) retains unrestricted publishing.
2. **Event chain validation + `loop.cancel`**: New `required_events` config field and `seen_topics` tracking. `LOOP_COMPLETE` is rejected unless all required events have been seen during the loop's lifetime. A separate `loop.cancel` event provides clean early termination (human rejection, timeout) without chain validation.
3. **Human timeout routing**: On timeout, `wait_for_response()` publishes a `human.timeout` event instead of silently continuing, making timeouts visible and routable to subscriber hats.
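
As a rough illustration of layers 1 and 3 (not the project's actual code, which this issue does not show), the gate and the timeout event might look like the Python sketch below. It assumes glob-style `publishes` patterns and events shaped as dicts with a `topic` key. `gate_events`, the `payload` field, the `human.response` topic, and the queue-based `wait_for_response` signature are all hypothetical.

```python
import queue
from fnmatch import fnmatchcase


def gate_events(events, hat_id, publishes, enforce_hat_scope=True):
    """Replace events outside the hat's `publishes` patterns with a diagnostic.

    Assumption: patterns are glob-style (e.g. "build.*") and each event is a
    dict with a "topic" key. Neither is confirmed by the issue text.
    """
    # Coordination mode (no active hat) or feature disabled: pass through.
    if hat_id is None or not enforce_hat_scope:
        return list(events)

    gated = []
    for event in events:
        topic = event["topic"]
        if any(fnmatchcase(topic, pattern) for pattern in publishes):
            gated.append(event)
        else:
            # Drop the out-of-scope event and record a diagnostic in its place.
            gated.append({
                "topic": f"{hat_id}.scope_violation",
                "payload": f"out-of-scope topic: {topic}",
            })
    return gated


def wait_for_response(responses, timeout_seconds):
    """Return the human's reply as an event, or a `human.timeout` event.

    Assumption: replies arrive on a queue.Queue; "human.response" is a
    made-up topic name used only for this sketch.
    """
    try:
        reply = responses.get(timeout=timeout_seconds)
    except queue.Empty:
        # Surface the timeout as an event so it can be logged and routed.
        return {
            "topic": "human.timeout",
            "payload": f"no response within {timeout_seconds}s",
        }
    return {"topic": "human.response", "payload": reply}
```

For example, a hat with id `builder` that declares `publishes: ["build.*"]` and emits `LOOP_COMPLETE` would have that event replaced by `builder.scope_violation`, while the same event list passes through untouched when `hat_id` is `None`.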

All features are opt-in via config (`enforce_hat_scope`, `cancellation_promise`, `required_events`), so existing users see zero behavior change on upgrade.
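
The chain validation and its opt-in default could be sketched as follows. `LoopState`, `handle`, the returned status strings, and the topic names in the usage note are made up for illustration. Only `required_events`, `seen_topics`, `LOOP_COMPLETE`, and `loop.cancel` come from the proposal. Whether the two control events are themselves recorded in `seen_topics` is not specified here, so the sketch leaves them out.

```python
class LoopState:
    """Tracks seen topics and decides how a loop reacts to control events."""

    def __init__(self, required_events=()):
        # An empty `required_events` (the assumed default) disables validation.
        self.required_events = set(required_events)
        self.seen_topics = set()

    def handle(self, topic):
        """Return 'continue', 'complete', 'cancelled', or 'rejected'."""
        if topic == "loop.cancel":
            # Early termination skips chain validation entirely.
            return "cancelled"
        if topic == "LOOP_COMPLETE":
            # Completion is only accepted once every required event was seen.
            missing = self.required_events - self.seen_topics
            return "rejected" if missing else "complete"
        self.seen_topics.add(topic)
        return "continue"
```

With `required_events` set to, say, `["human.approved", "tasks.created"]`, an early `LOOP_COMPLETE` is rejected until both topics have been handled. With no `required_events` configured, `LOOP_COMPLETE` is accepted immediately, which matches the zero-behavior-change claim above.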
