[GH-ISSUE #194] [Feature]: disallowed_tools, stale loop detection, and file-modification audit #73

Open

opened 2026-02-27 10:22:04 +03:00 by kerem · 0 comments

Originally created by @arjhun-personal on GitHub (Feb 25, 2026).
Original GitHub issue: https://github.com/mikeyobrien/ralph-orchestrator/issues/194

Problem

Two classes of bugs observed in production:

1. **Hat role violations**: The dispatcher hat implements code despite instructions not to, because there is no mechanism to restrict tool usage per hat.
2. **Infinite cycling**: After all work is done, the loop cycles between hats emitting the same events repeatedly, wasting API credits. In one observed incident, 5 wasted iterations cost about $1.70.

Phase 1 addressed the immediate cycling bug through preset YAML changes (stronger dispatcher instructions, a `build.noop` escape hatch, and `max_activations` safety nets). Phase 2 provides systemic, engine-level protection against this whole class of bugs.

Proposed Solution

Three layered defenses:

1. **`disallowed_tools`**: Per-hat tool restriction via a prominent prompt section. Significantly reduces LLM tool misuse compared to buried "DON'T" instructions in hat prompts.
2. **Stale loop detection**: Hard termination when the same topic appears 3+ times consecutively. Detects infinite cycling and stops the loop before further API credits are wasted.
3. **File-modification audit**: Post-iteration detection that emits `{hat}.scope_violation` events when a hat modifies files outside its expected scope. Presets can route these to trigger corrective action.
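
To make the three defenses concrete, here is a minimal, language-agnostic sketch in Python. It is not the orchestrator's actual implementation. The function and class names (`render_disallowed_tools`, `StaleLoopDetector`, `audit_modified_files`), the prompt heading text, and the use of glob patterns to describe a hat's "expected scope" are all assumptions made for illustration. Only the threshold of 3 consecutive topics and the `{hat}.scope_violation` event name come from the proposal.

```python
from __future__ import annotations

from collections import deque
from fnmatch import fnmatchcase


def render_disallowed_tools(hat: str, tools: list[str]) -> str:
    """Build a prominent prompt section listing tools this hat must not use.

    The heading wording is hypothetical; the proposal only asks for a
    "prominent prompt section".
    """
    if not tools:
        return ""
    lines = [f"## DISALLOWED TOOLS for hat '{hat}'", ""]
    lines += [f"- {tool}" for tool in tools]
    return "\n".join(lines)


class StaleLoopDetector:
    """Signal termination when one topic repeats `threshold` times in a row."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        # Only the last `threshold` topics matter, so older ones are dropped.
        self.recent: deque[str] = deque(maxlen=threshold)

    def observe(self, topic: str) -> bool:
        """Record an emitted topic; return True if the loop should stop."""
        self.recent.append(topic)
        window_full = len(self.recent) == self.threshold
        return window_full and len(set(self.recent)) == 1


def audit_modified_files(
    hat: str, modified: list[str], allowed_globs: list[str]
) -> dict | None:
    """Return a scope_violation event if any modified path is out of scope.

    Representing scope as glob patterns is an assumption of this sketch.
    """
    outside = [
        path
        for path in modified
        if not any(fnmatchcase(path, glob) for glob in allowed_globs)
    ]
    if not outside:
        return None
    return {"topic": f"{hat}.scope_violation", "files": outside}
```

In this sketch the engine would call `observe()` once per emitted event and terminate the loop the first time it returns `True`, and would call `audit_modified_files()` after each iteration with the list of changed paths (for example, from a version-control diff), publishing any returned event so that presets can route it.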
