[GH-ISSUE #194] [Feature]: disallowed_tools, stale loop detection, and file-modification audit #73

Open
opened 2026-02-27 10:22:04 +03:00 by kerem · 0 comments

Originally created by @arjhun-personal on GitHub (Feb 25, 2026).
Original GitHub issue: https://github.com/mikeyobrien/ralph-orchestrator/issues/194

Problem

Two classes of bugs observed in production:

  1. Hat role violations — Dispatcher hat implementing code despite instructions not to, because there's no mechanism to restrict tool usage per hat.
  2. Infinite cycling — After all work is done, the loop cycles between hats emitting the same events repeatedly, wasting API credits. In one observed incident, 5 wasted iterations cost ~$1.7.

Phase 1 addressed the immediate cycling bug through preset YAML changes (stronger dispatcher instructions, a build.noop escape hatch, and max_activations safety nets). Phase 2 provides systemic, engine-level protection against both classes of bugs.

Proposed Solution

Three layered defenses:

  1. disallowed_tools — Per-hat tool restriction via a prominent prompt section. Significantly reduces LLM tool misuse compared to buried "DON'T" instructions in hat prompts.

  2. Stale loop detection — Hard termination when the same topic appears 3+ times consecutively. Detects infinite cycling and stops the loop before further API credits are wasted.

  3. File-modification audit — Post-iteration detection that emits {hat}.scope_violation events when a hat modifies files outside its expected scope. Presets can route these to trigger corrective action.
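To make the three defenses concrete, here is a minimal Python sketch of how each could behave. It is illustrative only and is not taken from the ralph-orchestrator codebase. The function and class names, the tool names ("Edit", "Write"), the wording of the prompt section, and the idea of declaring a hat's scope as glob patterns are all assumptions. Only the 3-consecutive-topics threshold and the {hat}.scope_violation event name come from the proposal above.

```python
from fnmatch import fnmatch


def render_disallowed_tools(hat: str, tools: list[str]) -> str:
    """Build a prominent prompt section listing tools a hat must not use.

    Hypothetical format: the issue only says the section should be prominent.
    """
    if not tools:
        return ""
    lines = [f"## TOOL RESTRICTIONS FOR HAT '{hat}'", "You MUST NOT use these tools:"]
    lines += [f"- {tool}" for tool in tools]
    return "\n".join(lines)


class StaleLoopDetector:
    """Reports a stale loop once the same topic is seen `threshold` times in a row."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.last_topic: str | None = None
        self.streak = 0

    def observe(self, topic: str) -> bool:
        """Record one emitted topic; return True when the loop should terminate."""
        if topic == self.last_topic:
            self.streak += 1
        else:
            self.last_topic = topic
            self.streak = 1
        return self.streak >= self.threshold


def audit_modified_files(hat: str, modified: list[str], allowed_globs: list[str]) -> list[dict]:
    """Return a {hat}.scope_violation event if any modified file is out of scope.

    Assumption: scope is expressed as glob patterns; the issue does not specify this.
    """
    outside = [p for p in modified if not any(fnmatch(p, g) for g in allowed_globs)]
    if not outside:
        return []
    return [{"topic": f"{hat}.scope_violation", "files": outside}]
```

In this sketch the detector resets its streak whenever a different topic appears, so only strictly consecutive repeats trigger termination, matching the "3+ times consecutively" wording. The audit returns an event rather than raising, so that a preset could route it to a corrective hat as the proposal describes.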
