[PR #3] Add dynamic model assignment with security hardening

Open
opened 2026-03-07 13:59:08 +03:00 by kerem · 0 comments

Original Pull Request: https://github.com/danpeg/bug-hunt/pull/3

State: open
Merged: No


What changed

This PR adds dynamic model assignment to bug-hunt, letting users run Hunter, Skeptic, and Referee on different AI providers (Claude, Codex CLI, Gemini CLI). It also fixes several security and robustness issues discovered by running bug-hunt on itself with mixed providers.

Dynamic model assignment (SKILL.md, README.md)

Users can now assign providers per role via CLI flags:

/bug-hunt --hunter=codex --skeptic=claude --referee=gemini src/
/bug-hunt --preset=mixed src/

Presets provide named configurations:

  • claude (default) — all three roles run as Claude Code subagents, identical to current behavior
  • codex — all roles shell out to Codex CLI
  • gemini — all roles shell out to Gemini CLI
  • mixed — Hunter=Codex, Skeptic=Claude, Referee=Gemini

Individual --hunter=, --skeptic=, --referee= flags override any preset. With no flags, behavior is unchanged from the original (all Claude).
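
The precedence described above (preset first, then per-role flags) can be sketched in Python. This is only an illustration: SKILL.md expresses the logic as agent instructions rather than code, and the names PRESETS and resolve_roles are hypothetical.

```python
# Hypothetical sketch of preset/flag precedence; names are illustrative only.
PRESETS = {
    "claude": {"hunter": "claude", "skeptic": "claude", "referee": "claude"},
    "codex": {"hunter": "codex", "skeptic": "codex", "referee": "codex"},
    "gemini": {"hunter": "gemini", "skeptic": "gemini", "referee": "gemini"},
    "mixed": {"hunter": "codex", "skeptic": "claude", "referee": "gemini"},
}


def resolve_roles(preset="claude", **overrides):
    """Start from the named preset, then let per-role flags override it."""
    roles = dict(PRESETS[preset])
    for role, provider in overrides.items():
        if role not in roles:
            raise KeyError(f"unknown role: {role}")
        if provider is not None:  # None means the flag was not given
            roles[role] = provider
    return roles
```

For example, resolve_roles(preset="mixed", referee="claude") keeps Hunter on Codex and Skeptic on Claude but moves Referee to Claude, while resolve_roles() with no arguments returns the all-Claude default.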

Provider dispatch:

  • Claude roles use the Agent tool (general-purpose subagent) as before
  • Codex/Gemini roles write the prompt to a unique temp file (mktemp) and pipe it via stdin to the CLI (codex exec - / gemini -p -)
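
As a rough illustration of the stdin-based dispatch for external providers, the sketch below spools a prompt to a temporary file and connects that file to a child process's stdin. It is an assumption-laden stand-in: run_with_stdin_prompt and ECHO_CLI are hypothetical names, a child Python process plays the role of the external CLI, and an anonymous temp file is used here instead of the named mktemp file the PR describes.

```python
import subprocess
import sys
import tempfile


def run_with_stdin_prompt(argv, prompt):
    """Spool the prompt to a temp file and connect that file to the child's stdin."""
    with tempfile.TemporaryFile("w+", encoding="utf-8") as prompt_file:
        prompt_file.write(prompt)
        prompt_file.seek(0)  # rewind so the child reads from the start
        result = subprocess.run(
            argv, stdin=prompt_file, capture_output=True, text=True, check=False
        )
    return result.returncode, result.stdout


# Stand-in for an external CLI such as `codex exec -`: echoes stdin to stdout.
ECHO_CLI = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]

if __name__ == "__main__":
    code, output = run_with_stdin_prompt(ECHO_CLI, "Scan src/ for bugs.\n")
    print(code, output, end="")
```

In real use, argv would be the command form quoted in this PR, for example ["codex", "exec", "-"]. Because argv is a list and no shell is involved, the prompt text is never parsed as shell syntax.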

Security and robustness fixes (found by self-scan)

After implementing model assignment, we ran /bug-hunt --hunter=codex --skeptic=claude --referee=codex on this repo. The adversarial review confirmed 7 real issues, all now fixed:

Critical — shell injection (BUG-1, BUG-2):
The original external CLI instructions interpolated scan targets and report content directly into shell command strings. A path like src; rm -rf / or report text containing shell metacharacters could execute arbitrary commands. Fixed by always passing prompt content via stdin/temp file, never inlining into shell args.
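
The shape of the fix can be shown with a small Python sketch: the command line is a fixed argument list, and user-controlled values only ever reach the prompt text sent on stdin. The function name build_invocation and the "{target}" placeholder are hypothetical, chosen for illustration.

```python
def build_invocation(target, prompt_template):
    """Return (argv, stdin_text); the target appears only in stdin_text."""
    # Fixed command form quoted in this PR; it is never built from user input.
    argv = ["codex", "exec", "-"]
    # Plain substitution: the target is treated as data, not as shell syntax.
    stdin_text = prompt_template.replace("{target}", target)
    return argv, stdin_text
```

With a hostile target such as "src; rm -rf /", argv is still exactly ["codex", "exec", "-"], and the semicolon ends up as ordinary characters inside the prompt.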

Medium — tempfile collisions (BUG-3):
The hard-coded path /tmp/bug-hunt-hunter-prompt.md would corrupt concurrent runs. Fixed with mktemp /tmp/bug-hunt-{role}-XXXXXX.md for unique files per invocation, with cleanup after use.
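
A Python analogue of the unique-file-plus-cleanup pattern is sketched below, assuming a hypothetical helper named prompt_file. Python's tempfile.mkstemp takes the prefix and suffix as separate arguments, so the random part of the name does not depend on how a particular mktemp implementation treats a template that does not end in X characters.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def prompt_file(role, text):
    """Create a unique prompt file for one invocation and remove it afterwards."""
    # mkstemp creates the file atomically with a random name component.
    fd, path = tempfile.mkstemp(prefix=f"bug-hunt-{role}-", suffix=".md")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        yield path
    finally:
        os.remove(path)  # cleanup runs even if the caller raises
```

Two overlapping uses for the same role get different paths, so concurrent runs do not overwrite each other's prompts.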

Medium — no provider validation (BUG-4):
Invalid provider values (e.g., --hunter=gpt4) were silently accepted with undefined dispatch behavior. Step 0 now validates all provider values and stops with a clear error on invalid input.
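
A minimal sketch of this kind of check, assuming the three provider names listed in this PR; VALID_PROVIDERS and validate_providers are hypothetical names, not identifiers from SKILL.md.

```python
VALID_PROVIDERS = ("claude", "codex", "gemini")


def validate_providers(roles):
    """Raise ValueError on the first role whose provider is not recognised."""
    for role, provider in roles.items():
        if provider not in VALID_PROVIDERS:
            allowed = ", ".join(VALID_PROVIDERS)
            raise ValueError(
                f"invalid provider {provider!r} for --{role}; expected one of: {allowed}"
            )
```

Calling validate_providers({"hunter": "gpt4"}) raises immediately, before any agent is dispatched.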

Medium — no Hunter success gate (BUG-18):
If the Hunter agent failed (CLI not installed, crash, empty output), the flow continued to Skeptic/Referee with no input. Step 2b now explicitly verifies Hunter success before proceeding.
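
The gate can be pictured as a check on the Hunter's exit status and output before anything is handed to the Skeptic. The sketch below is illustrative only; HunterFailed and require_hunter_output are hypothetical names, and a missing CLI is assumed to surface as a nonzero status or an earlier launch error.

```python
class HunterFailed(RuntimeError):
    """Raised when the Hunter stage produced nothing usable."""


def require_hunter_output(returncode, output):
    """Return the Hunter report, or raise if the stage crashed or came back empty."""
    if returncode != 0:
        raise HunterFailed(f"Hunter exited with status {returncode}")
    if not output.strip():
        raise HunterFailed("Hunter produced no output")
    return output
```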

Low — no target validation (BUG-9):
Specifying a nonexistent scan target would dispatch agents that fail confusingly downstream. Step 0 now checks target existence and fails fast with a clear message.
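
A fail-fast target check might look like the following sketch; require_target is a hypothetical name, and the error wording is an assumption rather than the message used in SKILL.md.

```python
from pathlib import Path


def require_target(target):
    """Return the target as a Path, or raise before any agent is dispatched."""
    path = Path(target)
    if not path.exists():
        raise FileNotFoundError(f"scan target does not exist: {target}")
    return path
```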

Low — malformed markdown link (BUG-11):
The @systematicls attribution link in README.md used nested markdown syntax [text]([url](url)). Fixed to a single valid link.

Codex CLI invocation fix:
The original instructions used codex exec "prompt" which doesn't work for the exec subcommand. Corrected to cat file | codex exec - (stdin mode).

Files changed

  • SKILL.md — argument parsing, provider validation, target validation, external CLI dispatch via stdin, Hunter success gate
  • README.md — dynamic model assignment docs, provider table, fixed markdown link

No changes to the prompt files (hunter.md, skeptic.md, referee.md).

Backward compatibility

Running /bug-hunt or /bug-hunt src/ with no provider flags behaves identically to the current version. All three roles default to Claude Code subagents. The new functionality is additive.

How I tested

  1. Ran /bug-hunt --hunter=codex --skeptic=claude --referee=codex on this repo itself. Hunter (Codex gpt-5.3) reported 20 issues, Skeptic (Claude) accepted only 4 after challenging the rest, and Referee (Codex), ruling on the disputed findings, confirmed 7. All confirmed bugs are fixed in this PR.
  2. Verified Codex CLI invocation works with stdin mode (codex exec -)
  3. Confirmed default behavior (no flags) matches original flow

Checklist

  • [x] Tested locally with /bug-hunt
  • [x] Updated docs if needed