mirror of
https://github.com/mikeyobrien/ralph-orchestrator.git
synced 2026-04-25 15:15:57 +03:00
[GH-ISSUE #74] Proposal: Confidence-Aware Loop Completion via Structured Self-Assessment (“Confession” Phase) #29
Originally created by @matbgn on GitHub (Jan 19, 2026).
Original GitHub issue: https://github.com/mikeyobrien/ralph-orchestrator/issues/74
Source inspiration:
https://alignment.openai.com/confessions/
Summary
This proposal introduces a confidence-aware loop completion mechanism by adding a structured self-assessment phase (“Confession”) to each orchestration cycle. Instead of assuming LOOP_COMPLETE when an answer is produced, the loop is considered complete only if the model’s own self-assessed confidence and honesty meet defined thresholds.
The key idea is to decouple usefulness from honesty:
Only the ConfessionReport is used to decide whether to accept, retry, or escalate the answer.
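A minimal sketch of that decision, assuming a report that carries a single self-assessed confidence and an honesty flag. The names Confession, decide, and run_loop, the 0.8 threshold, and the attempt limit are all illustrative assumptions, not part of ralph-orchestrator:

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Confession:
    """Hypothetical, reduced stand-in for a ConfessionReport."""
    confidence: float  # self-assessed, 0.0 to 1.0 (assumed scale)
    honest: bool       # False if the report admits a hidden shortcut or unmet objective


def decide(report: Confession, threshold: float, attempts_left: int) -> str:
    """Return 'accept', 'retry', or 'escalate' using only the report, never the answer."""
    if report.honest and report.confidence >= threshold:
        return "accept"
    return "retry" if attempts_left > 0 else "escalate"


def run_loop(
    answer_fn: Callable[[], str],
    confess_fn: Callable[[str], Confession],
    threshold: float = 0.8,   # placeholder value
    max_attempts: int = 3,    # placeholder value
) -> Tuple[str, Optional[str]]:
    """Two-stage cycle: produce an answer, then a confession, then decide."""
    for attempt in range(1, max_attempts + 1):
        answer = answer_fn()
        report = confess_fn(answer)
        verdict = decide(report, threshold, max_attempts - attempt)
        if verdict != "retry":
            return verdict, answer
    return "escalate", None  # only reached when max_attempts < 1
```

The answer is passed through untouched; the verdict depends on the report alone, which is the decoupling the summary describes.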
Key insight
Why This Fits ralph-orchestrator
Practical recipe (raw attempt, not usable as is)
Two-stage interaction
ConfessionReport structure (strict template)
Rewarding / selection strategy
Prompt templates (inference-only fallback)
"Produce a ConfessionReport about your previous answer. Include: (1) list of explicit & implicit objectives, (2) for each, whether you met it and evidence, (3) items you omitted or shortcuts you took, (4) uncertainties and confidence scores."
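One way the four requested items could be captured as a data structure, assuming the model is asked to answer in JSON. The field names, the per-objective placement of confidence scores, and the parse_report helper are assumptions made for illustration; the strict template from the original proposal is not reproduced here:

```python
import json
from dataclasses import dataclass, field


@dataclass
class ObjectiveCheck:
    objective: str     # item (1): an explicit or implicit objective
    met: bool          # item (2): whether the objective was met
    evidence: str      # item (2): supporting evidence
    confidence: float  # item (4): assumed to be scored per objective, 0.0 to 1.0


@dataclass
class ConfessionReport:
    objectives: list[ObjectiveCheck]
    omissions: list[str] = field(default_factory=list)      # item (3): omissions and shortcuts
    uncertainties: list[str] = field(default_factory=list)  # item (4): open uncertainties


def parse_report(raw: str) -> ConfessionReport:
    """Parse a JSON-encoded report; raises KeyError or TypeError on a malformed one."""
    data = json.loads(raw)
    objectives = [ObjectiveCheck(**item) for item in data["objectives"]]
    return ConfessionReport(
        objectives=objectives,
        omissions=data.get("omissions", []),
        uncertainties=data.get("uncertainties", []),
    )
```

Rejecting a malformed report outright, rather than guessing at missing fields, keeps the later accept/retry decision from running on partial self-assessments.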
Initial proposal Confession-style inference prompts
@mikeyobrien commented on GitHub (Jan 19, 2026):
This Can Be Implemented Today via Hat Instructions
Ralph's architecture can support this pattern without orchestrator changes — though with some important adaptations.
Understanding the Core Methodology
The OpenAI approach has specific properties worth preserving:
Adapting for Ralph's Fresh-Context Architecture
Ralph's per-iteration fresh context (Tenet #1) conflicts slightly with "shared weights/introspective access." Here's how to bridge that gap:
The builder explicitly externalizes its internal state:
The self-assessor is rewarded ONLY for finding issues:
A handler decides what to do with confessions:
Why This Preserves the OpenAI Methodology
Key Insight: "Telling the Truth Is Easier Than Lying"
The OpenAI paper notes that confessions work because "it is easier to verify a single thing the model claims to have done wrong, than to find and verify all the potential issues."
The hat implementation preserves this by:
Existing Prior Art
The /presets/scientific-method.yml demonstrates multi-stage reflection. The confession pattern extends this with explicit honesty-optimization in the instructions.
Suggested Path Forward
confession-loop preset demonstrating this pattern
This keeps Ralph thin while enabling the full confession methodology through hat instructions.
@matbgn commented on GitHub (Jan 19, 2026):
Regarding the OpenAI research paper, I would keep the numerical confidence assessment (0–100%). Prior research suggests a threshold of 80%, above which ralph may proceed; if the confidence falls below this threshold, the loop should repeat.
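As a rough sketch, the 80% gate could look like the following. The function name may_proceed is hypothetical, and treating a score of exactly 80 as passing is an assumption, since the comment only covers scores above and below the threshold:

```python
def may_proceed(confidence_pct: float, threshold_pct: float = 80.0) -> bool:
    """Return True if the loop may complete, False if it should repeat.

    confidence_pct is the self-assessed confidence on a 0-100 scale.
    A score equal to the threshold counts as passing (assumption).
    """
    if not 0.0 <= confidence_pct <= 100.0:
        raise ValueError(f"confidence must be within 0-100, got {confidence_pct}")
    return confidence_pct >= threshold_pct
```

Raising on out-of-range values keeps a malformed self-assessment from being read as either a pass or a fail.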
The key metrics for the decision are mainly:
@matbgn commented on GitHub (Jan 22, 2026):
BRILLIANT! Simply Brilliant!
https://github.com/mikeyobrien/ralph-orchestrator/releases/tag/v2.2.0
I'm gonna test it thoroughly, and from there we can work on an issue-by-issue basis.
I read the git compare and you just blew my mind. 🤯
Deeply thankful for your hard work 🙏
@matbgn commented on GitHub (Jan 22, 2026):
@simonw, regarding your recent post at https://simonwillison.net/2026/Jan/15/boaz-barak-gabriel-wu-jeremy-chen-and-manas-joglekar/, do you perceive any potential enhancements we could implement during the prompt stage?
Specifically, in relation to this preset and your broader experience:
github.com/mikeyobrien/ralph-orchestrator@41e2ca702a/crates/ralph-cli/presets/confession-loop.yml
Your insights on this matter would be greatly appreciated, and allow me to extend my gratitude for your valuable blog posts throughout the year.