[GH-ISSUE #17] [Proposal] User-Collaborative Validation Gates #2

Closed
opened 2026-02-27 10:21:42 +03:00 by kerem · 1 comment

Originally created by @krzemienski on GitHub (Jan 2, 2026).
Original GitHub issue: https://github.com/mikeyobrien/ralph-orchestrator/issues/17

Feature Proposal: User-Collaborative Validation Gates

The Problem

Ralph currently lacks end-user functional validation beyond build/test. For many projects, the real test is:

  • Web apps: Does it actually render in a browser?
  • iOS apps: Does it run in the Simulator?
  • CLI tools: Does the command produce expected output?
  • APIs: Do the endpoints respond correctly?

Proposed Solution

An opt-in, user-collaborative validation system that:

  1. Proposes, doesn't prescribe - AI analyzes the project and PROPOSES a validation strategy
  2. Requires user confirmation - Nothing runs until the user approves
  3. Uses the user's existing tools - leverages MCP servers the user has already configured
  4. Is Claude-only (for now) - Uses Claude's capabilities for intelligent proposal
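As a rough illustration of what an opt-in proposal object could carry, here is a minimal sketch. All names here (`ValidationProposal` and its fields) are hypothetical - Ralph defines no such class today; this only mirrors the four principles above:

```python
from dataclasses import dataclass, field

@dataclass
class ValidationProposal:
    """Hypothetical shape of an AI-drafted validation proposal.

    Illustrative only: field names are invented for this sketch and
    are not part of any existing Ralph API.
    """
    project_type: str                       # e.g. "nextjs-web-app"
    build_command: str                      # e.g. "npm run build"
    run_command: str                        # e.g. "npm run dev"
    suggested_checks: list = field(default_factory=list)
    approved: bool = False                  # stays False until the user confirms

# The AI fills this in during analysis; nothing runs while approved is False.
proposal = ValidationProposal(
    project_type="nextjs-web-app",
    build_command="npm run build",
    run_command="npm run dev",
    suggested_checks=["page renders at localhost:3000"],
)
```

The key design point is the `approved` flag defaulting to `False`: the proposal is inert data until the user flips it.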

Philosophy: Propose, Don't Prescribe

┌─────────────────────────────────────────────────────────────┐
│                    WRONG APPROACH                           │
├─────────────────────────────────────────────────────────────┤
│  1. Detect project type                                     │
│  2. Auto-generate validation_config.json                    │
│  3. Hardcode: "if web → use Puppeteer"                     │
│  4. Run validation without asking                           │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                    RIGHT APPROACH                           │
├─────────────────────────────────────────────────────────────┤
│  1. Detect project type                                     │
│  2. Draft a validation PROPOSAL                             │
│  3. Present to user: "Here's what I recommend..."           │
│  4. Ask: "Does this make sense? Approve/Modify/Skip?"       │
│  5. Only proceed after explicit user confirmation           │
└─────────────────────────────────────────────────────────────┘
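The five steps above can be sketched as a simple confirmation gate. This is not Ralph code - just one way the Approve/Modify/Skip interaction could be wired, with the prompt function injectable so it can be driven programmatically:

```python
def confirmation_gate(proposal_text, ask=input):
    """Present a proposal and proceed only on explicit user approval.

    Illustrative sketch for this issue, not Ralph's implementation.
    `ask` defaults to input() but can be any callable, which keeps
    the gate testable without a real terminal.
    """
    print(proposal_text)
    while True:
        answer = ask("Does this make sense? [Approve/Modify/Skip]: ").strip().lower()
        if answer in ("approve", "a"):
            return "approve"
        if answer in ("modify", "m"):
            return "modify"   # caller re-drafts the proposal with user input
        if answer in ("skip", "s"):
            return "skip"     # validation is opt-in; skipping is always allowed
        print("Please answer Approve, Modify, or Skip.")
```

Note that there is no default-to-approve path: an unrecognized answer re-prompts rather than proceeding, which is the whole point of "only proceed after explicit user confirmation."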

Connection to Long-Running Agent Patterns

This follows the testing philosophy from Anthropic's Effective Harnesses for Long-Running Agents (https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents):

"One final major failure mode that we observed was Claude's tendency to mark a feature as complete without proper testing. Absent explicit prompting, Claude tended to make code changes, and even do testing with unit tests or curl commands against a development server, but would fail to recognize that the feature didn't work end-to-end."

"Providing Claude with these kinds of testing tools dramatically improved performance, as the agent was able to identify and fix bugs that weren't obvious from the code alone."

The validation gates feature brings this same pattern - end-to-end functional testing - to Ralph users, but with user collaboration to ensure the right tools and approaches are used for each project.

Example Flow

$ ralph run -P PROMPT.md --enable-validation

🔍 Analyzing project for validation strategy...

📋 Validation Proposal:

Based on analyzing your project, here's what I found:

**Project Analysis:**
- Type: Next.js web application  
- Build: `npm run build`
- Run: `npm run dev` (serves at localhost:3000)

**My Validation Proposal:**

Since this is a web app, I recommend validating it the way a user would -
by actually loading it in a browser and interacting with it.

**Questions for you:**
- Which pages or features are most critical to validate?
- Any specific user flows I should test?
- Do you have a preferred browser automation tool?

Does this make sense? [Approve/Modify/Skip]: _

Implementation Approach

This feature would be built using Ralph's self-improvement system:

  1. Write a prompt describing the feature (already drafted)
  2. Run `python scripts/self_improve.py -P prompts/VALIDATION_FEATURE_PROMPT.md`
  3. Ralph builds the feature into itself

Community Input Requested

Before running this, I'd love feedback on:

  • Scope: Is opt-in + user-confirmation the right balance?
  • Platform coverage: Web, iOS, CLI, API - any others?
  • Integration: Should this tie into CI/CD pipelines?
  • UX: How should the proposal/confirmation interaction work?

If you have thoughts, please comment! I can incorporate feedback directly into the prompt.


Tag this issue or add comments if you want to shape how this feature works!

kerem closed this issue 2026-02-27 10:21:43 +03:00

@krzemienski commented on GitHub (Jan 3, 2026):

Update: Connection to PR #16

This validation feature proposal connects directly to PR #16 (Self-improvement runner).

The workflow is:

  1. PR #16 adds scripts/self_improve.py - the pure Python runner using Ralph's SDK
  2. This issue proposes a feature that would be built USING that self-improvement runner

Once PR #16 is merged, validation gates can be implemented via:

`python scripts/self_improve.py -P prompts/VALIDATION_FEATURE_PROMPT.md`

Note on PR #16 file changes: While PR #16 shows 7 files changed, the core addition is scripts/self_improve.py. The other files are supporting pieces (gitignore entries, example configs, prompts).
