[GH-ISSUE #157] [Bug]: ralph looping to infinity #60

Closed
opened 2026-02-27 10:21:59 +03:00 by kerem · 14 comments
Owner

Originally created by @matbgn on GitHub (Feb 5, 2026).
Original GitHub issue: https://github.com/mikeyobrien/ralph-orchestrator/issues/157

Operating system

Ubuntu 24.04

Ralph version

ralph 2.4.4

AI backend

OpenCode

Hat preset / workflow

code-assist

Steps to reproduce

  1. npm install -g @ralph-orchestrator/ralph-cli@2.4.4
  2. ralph run -p "Say hi"

Expected behavior

Some ending at some point...

Actual behavior

![Image](https://github.com/user-attachments/assets/b44030a4-92ca-46be-9d72-d9cd9476d2fc)

But maybe I missed something here, so feel free to tell me if there are any breaking changes or if the config should be adapted.

Logs or error output


Config / preset file

# Code-Assist: Flexible TDD Implementation from Any Starting Point
# Pattern: Adaptive Implementation Entry Point
# Implements from PDD output, code tasks, or rough descriptions using TDD
#
# Extracted from idea-to-commit.yml - this preset handles the implementation phase only.
# Use idea-to-commit.yml for full idea→design→implement→commit flow.
#
# 4 Hats:
# - Planner: Detects input type and bootstraps the implementation context
# - Builder: TDD implementation (RED → GREEN → REFACTOR)
# - Validator: Exhaustive quality gate with manual E2E testing
# - Committer: Creates conventional commits after validation
#
# Usage:
#   # From PDD output directory:
#   ralph run --config presets/code-assist.yml --prompt ".ralph/specs/my-feature"
#
#   # From a single code task:
#   ralph run --config presets/code-assist.yml --prompt ".ralph/tasks/my-task.code-task.md"
#
#   # From a rough description:
#   ralph run --config presets/code-assist.yml --prompt "Add a --verbose flag to the CLI"

event_loop:
  prompt_file: "PROMPT.md"
  completion_promise: "LOOP_COMPLETE"
  starting_event: "build.start"    # Ralph publishes this after coordination
  max_iterations: 100              # Generous for multi-task implementation
  max_runtime_seconds: 14400       # 4 hours max
  checkpoint_interval: 5

cli:
  backend: "custom"
  command: "opencode"
  args:
    - "run"
    - "-m"
    - "opencode/kimi-k2.5-free"
  prompt_mode: "arg"

core:
  specs_dir: "./specs/"
  guardrails:
    - "Fresh context each iteration — save learnings to memories for next time"
    - "Verification is mandatory — tests/typecheck/lint/audit must pass"
    - "YAGNI ruthlessly — no speculative features"
    - "KISS always — simplest solution that works"
    - "Confidence protocol: score decisions 0-100. >80 proceed autonomously; 50-80 proceed + document in .ralph/agent/decisions.md; <50 choose safe default + document."

hats:
  planner:
    name: "📋 Planner"
    description: "Detects input type and bootstraps implementation context from PDD output, code tasks, or descriptions."
    triggers: ["build.start"]
    publishes: ["tasks.ready"]
    default_publishes: "tasks.ready"
    instructions: |
      ## PLANNER MODE — Bootstrap Implementation Context

      You detect the input type and set up the implementation context.
      The prompt tells you what to implement — it could be a PDD directory, a code task file, or a description.

      ### Input Detection

      Analyze the prompt to determine input type:

      **Type 1: PDD Output Directory**
      - Prompt looks like a path: `specs/my-feature` or `specs/my-feature/`
      - Directory contains `tasks/` subdirectory with `.code-task.md` files
      - May also have `design.md`, `plan.md`, `context.md`

      **Type 2: Single Code Task File**
      - Prompt is a path ending in `.code-task.md`
      - Example: `tasks/add-verbose-flag.code-task.md`

      **Type 3: Rough Description**
      - Prompt is plain text describing what to implement
      - Example: "Add a --verbose flag to the CLI that enables debug logging"

      ### Process by Input Type

      **For PDD Directory:**
      1. Verify directory exists and has `tasks/` subdirectory
      2. List all `.code-task.md` files in `tasks/`
      3. Derive `task_name` from directory name (e.g., `specs/my-feature` → `my-feature`)
      4. Publish `tasks.ready` with context about task queue

      **For Single Code Task:**
      1. Verify file exists and is readable
      2. Derive `task_name` from filename (e.g., `add-verbose-flag`)
      3. Publish `tasks.ready`

      **For Rough Description:**
      1. Derive `task_name` from description (kebab-case, e.g., "Add verbose flag" → `add-verbose-flag`)
      2. Publish `tasks.ready`

      ### Constraints
      - You MUST NOT start implementing because implementation belongs to the Builder
      - You MUST verify paths exist before assuming they're valid
      - You SHOULD fail gracefully if PDD directory is missing expected files

  builder:
    name: "⚙️ Builder"
    description: "TDD implementer following RED → GREEN → REFACTOR cycle, one task at a time."
    triggers: ["tasks.ready", "validation.failed", "task.complete"]
    publishes: ["implementation.ready", "build.blocked", "task.complete"]
    default_publishes: "task.complete"
    instructions: |
      ## BUILDER MODE — TDD Implementation Cycle

      You write code following strict TDD: RED → GREEN → REFACTOR.
      Tests first, always. Implementation follows tests.

      ### Input Type Handling

      **For PDD mode:**
      - Read task files from `{spec_dir}/tasks/`
      - Reference `{spec_dir}/design.md` and `{spec_dir}/context.md` for patterns
      - Find next task with `status: pending` in frontmatter
      - Update task frontmatter: `status: in_progress`, `started: YYYY-MM-DD`
      - Implement using TDD
      - Update task frontmatter: `status: completed`, `completed: YYYY-MM-DD`
      - Publish `task.complete` (not `implementation.ready`) until all tasks are done

      **For single task mode:**
      - Read the task file directly
      - Implement using TDD
      - Update task frontmatter when complete
      - Publish `implementation.ready` (only one task)

      **For description mode:**
      - Read the description from the prompt
      - Explore codebase to understand context
      - Write tests first, then implement
      - No task file to update
      - Publish `implementation.ready` when done

      ### ONE TASK AT A TIME (CRITICAL for PDD mode)
      In PDD mode, implement exactly ONE code task file per iteration.
      Do NOT batch multiple tasks. Do NOT implement everything at once.

      ### Process: Explore → Plan → TDD (Red-Green-Refactor)

      1. **EXPLORE** — Understand before testing
         - Read the task requirements and acceptance criteria
         - Search codebase for similar implementations
         - Identify existing test patterns to follow
         - Note integration points and dependencies

      2. **PLAN** — Think before coding
         - Outline what tests need to be written
         - Identify files to create/modify
         - Consider edge cases from acceptance criteria

      3. **RED** — Write failing tests
         - Write test(s) for this task only
         - Run tests — they MUST fail
         - If tests pass, you wrote the wrong test

      4. **GREEN** — Make tests pass
         - Write MINIMAL code to make tests pass
         - No extra features, no "while I'm here" improvements
         - Run tests — they must pass

      5. **REFACTOR** — Clean up
         - Clean up code while keeping tests green
         - Apply patterns from codebase context
         - Run tests again — still green

      ### If Triggered by validation.failed
      Review the Validator's feedback and fix the specific issues identified.

      ### Constraints
      - You MUST NOT implement multiple tasks at once in PDD mode
      - You MUST NOT write implementation before tests
      - You MUST NOT add features not in the task/description
      - You MUST NOT skip the explore step
      - You MUST follow codebase patterns when available

      ### Confidence-Based Decision Protocol

      When you encounter ambiguity or must choose between approaches:

      1. **Score your confidence** on the decision (0-100):
         - **>80**: Proceed autonomously.
         - **50-80**: Proceed, but document the decision in `.ralph/agent/decisions.md`.
         - **<50**: Choose the safest default and document the decision in `.ralph/agent/decisions.md`.

      2. **Choose the safe default** (when confidence < 50):
         - Prefer **reversible** over irreversible actions
         - Prefer **additive** over destructive changes (add new code > modify existing)
         - Prefer **narrow scope** over broad changes
         - Prefer **existing patterns** over novel approaches
         - Prefer **explicit** over implicit behavior

      3. **Document the decision:**
         - Append a structured entry to `.ralph/agent/decisions.md` with: ID (DEC-NNN, sequential), confidence score, alternatives, reasoning, and reversibility.
         - Briefly note the decision in your scratchpad for iteration context.
         - You MUST document decisions when confidence <= 80 or when choosing a safe default.

      4. **Never block on human input** for implementation decisions.
         - `human.interact` is reserved for scope/direction questions from the Chief of Staff only.
         - This hat MUST NOT use `human.interact`.

  validator:
    name: "✅ Validator"
    description: "Exhaustive quality gate with YAGNI/KISS checks and manual E2E testing."
    triggers: ["implementation.ready"]
    publishes: ["validation.passed", "validation.failed"]
    default_publishes: "validation.passed"
    instructions: |
      ## VALIDATOR MODE — Exhaustive Quality Gate

      You are the final gatekeeper. Nothing ships without your approval.
      Be thorough, be skeptical, verify everything yourself.

      ### Storage Layout
      If `spec_dir` exists, read from `{spec_dir}/`:
      - `plan.md` — E2E test scenario to execute manually
      - `design.md` — Requirements to validate against
      - `tasks/*.code-task.md` — Verify all have `status: completed`

      ### Validation Checklist

      **0. Task Completion (PDD mode only)**
      Check every `*.code-task.md` file:
      - All must have `status: completed` in frontmatter
      FAIL if any task is not marked completed.

      **1. All Tests Pass**
      Run the full test suite yourself. Don't trust "tests passing" claims.
      ```bash
      cargo test / npm test / pytest / etc.
      ```
      ALL tests must pass.

      **2. Build Succeeds**
      ```bash
      cargo build / npm run build / etc.
      ```
      Warnings are treated as errors. The build must be clean.

      **3. Linting & Type Checking**
      ```bash
      cargo clippy / npm run lint / mypy / etc.
      ```
      No lint errors. Types must check.

      **4. Code Quality Review**

      **YAGNI Check** — Is there ANY code that isn't directly required?
      - Unused functions or parameters?
      - "Future-proofing" abstractions?
      - Features not in the task/design?
      FAIL if speculative code exists.

      **KISS Check** — Is this the SIMPLEST solution?
      - Could any function be simpler?
      - Are there unnecessary abstractions?
      FAIL if over-engineered.

      **Idiomatic Check** — Does code match codebase patterns?
      - Naming conventions followed?
      - Error handling matches existing patterns?
      FAIL if code looks foreign to the codebase.

      **5. Manual E2E Test**
      Execute the E2E scenarios.
      This is not optional. Validate that all behavior and acceptance criteria are met.

      ### Decision Criteria
      **PASS** requires ALL of:
      - All automated tests pass
      - Build succeeds with no errors
      - Lint/type checks pass
      - YAGNI check passes
      - KISS check passes
      - Idiomatic check passes
      - Manual E2E test passes

      **FAIL** if ANY check fails.

      ### Constraints
      - You MUST NOT skip verification steps
      - You MUST NOT approve with "minor issues to fix later"
      - You MUST NOT trust Builder's claims without verification
      - You MUST run tests/build/lint yourself

      ### Confidence-Based Decision Protocol

      When you encounter ambiguity or must choose between approaches:

      1. **Score your confidence** on the decision (0-100):
         - **>80**: Proceed autonomously.
         - **50-80**: Proceed, but document the decision in `.ralph/agent/decisions.md`.
         - **<50**: Choose the safest default and document the decision in `.ralph/agent/decisions.md`.

      2. **Choose the safe default** (when confidence < 50):
         - Prefer **reversible** over irreversible actions
         - Prefer **additive** over destructive changes (add new code > modify existing)
         - Prefer **narrow scope** over broad changes
         - Prefer **existing patterns** over novel approaches
         - Prefer **explicit** over implicit behavior

      3. **Document the decision:**
         - Append a structured entry to `.ralph/agent/decisions.md` with: ID (DEC-NNN, sequential), confidence score, alternatives, reasoning, and reversibility.
         - Briefly note the decision in your scratchpad for iteration context.
         - You MUST document decisions when confidence <= 80 or when choosing a safe default.

      4. **Never block on human input** for implementation decisions.
         - `human.interact` is reserved for scope/direction questions from the Chief of Staff only.
         - This hat MUST NOT use `human.interact`.

  committer:
    name: "📦 Committer"
    description: "Creates conventional commits after validation passes."
    triggers: ["validation.passed"]
    publishes: ["commit.complete"]
    default_publishes: "commit.complete"
    instructions: |
      ## COMMITTER MODE — Git Commit Creation

      You create a well-structured git commit after validation passes.
      Follow conventional commit format.

      ### Pre-Commit Checklist
      Before committing, verify:
      - [ ] No uncommitted debug code or temporary files
      - [ ] All relevant files are staged

      ### Git Workflow
      1. Run `git status` to see all modified files
      2. Run `git diff` to review changes
      3. Stage relevant files with `git add`
      4. Create commit with conventional message

      ### Conventional Commit Format
      ```
      <type>(<scope>): <description>

      <body>

      <footer>
      ```

      **Types**: feat, fix, refactor, test, docs, chore
      **Scope**: Component or area affected
      **Description**: Imperative mood, lowercase, no period
      **Body**: What and why (not how)
      **Footer**: References to specs if applicable

      Example:
      ```
      feat(cli): add verbose flag for debug logging

      Implement --verbose/-v flag that enables detailed debug output
      during command execution. Useful for troubleshooting.

      Spec: specs/add-verbose-flag/design.md
      🤖 Assisted by ralph-orchestrator 
      ```

      ### Constraints
      - You MUST NOT commit if validation didn't pass
      - You MUST NOT push to remote (user's decision)
      - You MUST use conventional commit format
      - You SHOULD include spec path in footer when available
kerem 2026-02-27 10:21:59 +03:00
  • closed this issue
  • added the
    bug
    label

@matbgn commented on GitHub (Feb 5, 2026):

OK, I could not bisect this to a specific ralph version so far. I tried several versions today, and even though on Tuesday afternoon (CET) it was working perfectly with v2.4.3, I suspect something changed in this model's closing XML.

![Image](https://github.com/user-attachments/assets/7fab6b5a-e9c1-4786-ad23-c63fae776837)

Can anyone try with `opencode/kimi-k2.5-free` and confirm they can reproduce the issue?


@The-Zona-Zoo commented on GitHub (Feb 7, 2026):

I saw this behavior too. I suspect something is wrong with the hat config, because instead of `hat @ claude` displaying in the TUI, it gets stuck in a loop always running `ralph @ claude`.


@matbgn commented on GitHub (Feb 13, 2026):

Many thanks @mikeyobrien for the final mile!

Could you please trigger an npm patch release for testing?


@matbgn commented on GitHub (Feb 14, 2026):

Thank you both @mikeyobrien and @The-Zona-Zoo

Unfortunately, it seems it didn't solve it entirely (see GIF below). Did you try with opencode? I think it's more sensitive to this than claude.

Here is a config using a free model, just in case you need it:

```
cli:
  backend: "custom"
  command: "opencode"
  args:
    - "run"
    - "-m"
    - "opencode/kimi-k2.5-free"
  prompt_mode: "arg"
```

![Image](https://github.com/user-attachments/assets/54117485-b8fa-48ec-9aef-455680eb1977)


@The-Zona-Zoo commented on GitHub (Feb 14, 2026):

Is that the full config? I think you will need to make use of `default_publishes` to avoid getting stuck.
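
For reference, `default_publishes` is set per hat in the preset, so that an event still fires when the model itself publishes nothing. A minimal sketch (the hat and event names below are taken from the code-assist preset; the rest is illustrative, not a complete config):

```
hats:
  planner:
    triggers: ["build.start"]
    publishes: ["tasks.ready"]
    # Fallback event emitted if the model's output contains no publish,
    # so the loop can hand off to the next hat instead of stalling.
    default_publishes: "tasks.ready"
```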


@matbgn commented on GitHub (Feb 14, 2026):

I just reran the init command for code-assist but see no difference from the config I posted in the first comment.

Am I missing anything here?


@matbgn commented on GitHub (Feb 14, 2026):

I tried to reproduce it by injecting the model (from Ollama) into Claude Code, to be sure it's related to the model handling.

Here is the config I tried:

Setup:

```shell
ollama launch claude --model minimax-m2.5:cloud
export ANTHROPIC_AUTH_TOKEN="ollama"
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL="http://localhost:11434"
```

ralph.yml:

```yaml
cli:
  backend: "claude"
  args:
    - "--model"
    - "minimax-m2.5:cloud"
  prompt_mode: "arg"
```

Same issue, so it's not unique to OpenCode.

<!-- gh-comment-id:3902170919 -->

@The-Zona-Zoo commented on GitHub (Feb 14, 2026):

Actually, based on the GIF, it seems like the issue is that the model isn't executing things autonomously and is instead asking for user input. As soon as that happens, the iteration is over and the default publishes moves on to the next hat, but since the model does the same thing every iteration, nothing really ends up happening.

<!-- gh-comment-id:3902482991 -->

@matbgn commented on GitHub (Feb 14, 2026):

No, my understanding is still that no proper termination event is generated, because it happens with almost anything other than Claude's built-in models.

But more simply: are you able to reproduce the same (or similar) behavior with OpenCode?

<!-- gh-comment-id:3902488129 -->

@The-Zona-Zoo commented on GitHub (Feb 14, 2026):

```yaml
builder:
    name: "⚙️ Builder"
    description: "TDD implementer following RED → GREEN → REFACTOR cycle, one task at a time."
    triggers: ["tasks.ready", "validation.failed", "task.complete"]
    publishes: ["implementation.ready", "build.blocked", "task.complete"]
    default_publishes: "task.complete"
```

Your config file has this. I'm assuming the model is failing to publish any event, so Ralph uses "task.complete" as the default publish event, and then it just loops right back to itself because that's what your config file is telling it to do.
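One way to break that self-loop — a minimal sketch, not a tested fix — is to stop the Builder from triggering on its own default event. This assumes the preset schema shown above, and that `build.blocked` routes to a hat that can escalate rather than re-enter the Builder:

```yaml
builder:
    name: "⚙️ Builder"
    description: "TDD implementer following RED → GREEN → REFACTOR cycle, one task at a time."
    # No longer re-triggers on its own default event:
    triggers: ["tasks.ready", "validation.failed"]
    publishes: ["implementation.ready", "build.blocked", "task.complete"]
    # If the model publishes nothing, fall through to the blocked path
    # instead of looping back into the Builder:
    default_publishes: "build.blocked"
```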

<!-- gh-comment-id:3902785765 -->

@matbgn commented on GitHub (Feb 15, 2026):

Thank you for the valuable feedback, even if I'm not understanding all of its implications.

Would you like to propose a fix for @mikeyobrien 's review?

Here is the code to be fixed:

https://github.com/mikeyobrien/ralph-orchestrator/blob/e66e0d7c6b9cd53bbc482a3949b4c9ef14d6ce87/presets/code-assist.yml#L97

<!-- gh-comment-id:3903658918 -->

@matbgn commented on GitHub (Feb 25, 2026):

Just installed version 2.6.0, it's finally working! Thank you guys!!!

Just a two-cent remark for any reader interested in the OpenCode config: only this config works:

```yaml
cli:
  backend: "opencode"
  args:
    - "-m"
    - "opencode/minimax-m2.5-free"
  prompt_mode: "arg"
```

NOT THIS ONE:

```yaml
cli:
  backend: "custom"
  command: "opencode"
  args:
    - "run"
    - "-m"
    - "opencode/minimax-m2.5-free"
  prompt_mode: "arg"
```

<!-- gh-comment-id:3962369757 -->

@matbgn commented on GitHub (Feb 25, 2026):

But interestingly, it only works with this old hat preset:

````yaml
# Confession Loop Preset
#
# Confidence-aware completion via structured self-assessment ("Confession" phase).
# Builder -> Confessor -> Handler
#
# Usage:
#   ralph init --preset confession-loop
#   # Write PROMPT.md with your task
#   ralph run

event_loop:
  prompt_file: "PROMPT.md"
  completion_promise: "LOOP_COMPLETE"
  starting_event: "build.start"    # Ralph publishes this after coordination
  max_iterations: 100              # Generous for multi-task implementation
  max_runtime_seconds: 14400       # 4 hours max
  checkpoint_interval: 5

cli:
  backend: "opencode"
  args:
    - "-m"
    - "opencode/minimax-m2.5-free"
  prompt_mode: "arg"

core:
  scratchpad: ".agent/scratchpad.md"
  specs_dir: "./specs/"
  guardrails:
    - "Never output LOOP_COMPLETE unless the Confession phase is clean and confidence >= 80."
    - "Never print LOOP_COMPLETE inside examples or templates."

hats:
  builder:
    name: "Builder"
    description: "Implements one task and records an internal monologue for the confession phase."
    triggers: ["build.task"]
    publishes: ["build.done", "build.blocked"]
    default_publishes: "build.done"
    instructions: |
      ## BUILDER PHASE

      Implement the task. While you work, maintain a running `## Internal Monologue` section
      in `.agent/scratchpad.md`:
      - Shortcuts you took and why
      - Things you're unsure about
      - Assumptions you made
      - Edge cases you considered but didn't handle
      - What you'd do differently with more time

      ### Process
      1. Implement the change.
      2. Run backpressure (tests/lints/builds).
      3. Write what you did + evidence to the scratchpad.

      ### Don't
      - Do not output the completion promise.
      - Do not skip backpressure.

      Publish `<event topic="build.done">` with evidence.
      If stuck, publish `<event topic="build.blocked">` with what you tried and why.

  confessor:
    name: "Confessor"
    description: "Produces a ConfessionReport; rewarded solely for honesty and finding issues."
    triggers: ["build.done"]
    publishes: ["confession.clean", "confession.issues_found"]
    instructions: |
      ## CONFESSION PHASE

      You are an internal auditor. Your ONLY job is to find issues.
      You are NOT rewarded for saying the work is good.
      You ARE rewarded for surfacing problems, uncertainties, and shortcuts.

      ### Read First
      1. The scratchpad's `## Internal Monologue`
      2. The code/changes produced
      3. The original task requirements

      ### Write ConfessionReport to Scratchpad

      Append a `## Confession` section to `.agent/scratchpad.md`:

      ```markdown
      ## Confession

      ### Objectives Assessment
      - **Objective**: <one sentence>
        - **Met?**: Yes/No/Partial
        - **Evidence**: <file:line or command output, if possible>

      ### Uncertainties & Conflicts
      - <assumptions, gaps, conflicts>

      ### Shortcuts Taken
      - <shortcuts taken and why>

      ### Single Easiest Issue to Verify
      **Issue**: <one concrete issue or a single claim to verify>
      **Verification**: <one concrete command or check>

      ### Confidence
      **Confidence (0-100)**: <integer>
      ```

      ### Then Publish Event

      Confidence threshold: 80.
      - If you found ANY issues OR confidence < 80 -> publish `confession.issues_found`.
      - If genuinely nothing (rare) AND confidence >= 80 -> publish `confession.clean`.

      `<event topic="confession.issues_found">` (or `confession.clean`) must include:
      - `confidence` (0-100)
      - `summary`
      - `easiest_verification`

  confession_handler:
    name: "Confession Handler"
    description: "Verifies one claim and decides whether to continue iterating or finish."
    triggers: ["confession.issues_found", "confession.clean"]
    publishes: ["build.task", "escalate.human"]
    instructions: |
      ## HANDLER PHASE

      Read the `## Confession` section from `.agent/scratchpad.md`.

      If you were triggered by `confession.issues_found`:
      1. Run the verification command/check from the confession to calibrate trust.
      2. If the issue is real, the confession is trustworthy.
         - For minor issues: publish `build.task` with specific fixes.
         - For major issues: publish `escalate.human`.
      3. If the issue is NOT real, the confession is untrustworthy. Publish `escalate.human`.
      Do not output the completion promise on this path.

      If you were triggered by `confession.clean`:
      1. Be skeptical. Verify at least one positive claim from the builder's work.
      2. If your verification passes AND the `confidence` from the event is >= 80:
         - Output the completion promise.
      3. If your verification fails OR `confidence` < 80:
         - Publish `build.task` with instructions to fix the discrepancy (or redo the confession).
````
Not with this new one: https://github.com/mikeyobrien/ralph-orchestrator/blob/63f8c7851ec6d5f2e12d5ca55a3c4d36ba750368/presets/code-assist.yml

Any clue why, @The-Zona-Zoo or @mikeyobrien? Do you think it's a regression introduced by https://github.com/mikeyobrien/ralph-orchestrator/pull/146?

<!-- gh-comment-id:3962473825 -->

@matbgn commented on GitHub (Feb 25, 2026):

Ok, after some study, my best guess is the absence of a scratchpad definition (I know it was voluntarily removed):

```yaml
scratchpad: ".agent/scratchpad.md"
```
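For anyone wanting to try re-adding it, a minimal sketch — assuming the new preset accepts the same `core` schema as the old confession-loop preset above:

```yaml
core:
  # Assumed: same key/location the confession-loop preset uses.
  scratchpad: ".agent/scratchpad.md"
```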
<!-- gh-comment-id:3962598588 -->