[PR #30] [CLOSED] Fix: Generate unique finding_id to prevent exploit/patch file collisions #40

Closed
opened 2026-03-02 04:08:00 +03:00 by kerem · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/gadievron/raptor/pull/30
Author: @sapran
Created: 12/9/2025
Status: Closed

Base: mainHead: bugfix/unique-finding-id


📝 Commits (4)

  • 3e26816 Merge branch 'main' of github.com:sapran/raptor
  • b056d00 Merge branch 'main' of github.com:sapran/raptor
  • e7c11b7 Merge branch 'main' of github.com:sapran/raptor
  • fff8fe1 Fix bug: Generate unique finding_id to prevent file collisions

📊 Changes

1 file changed (+15 additions, -6 deletions)

View changed files

📝 core/sarif/parser.py (+15 -6)

📄 Description

Problem

Multiple findings with the same rule_id were assigned identical finding_id values, causing exploit and patch files to overwrite each other.

Symptoms:

  • Report says: "6 exploits generated"
  • Reality: Only 1 exploit file exists (5 overwrites)
  • Same issue with patches

Root Cause

In core/sarif/parser.py:142, the fallback logic used non-unique rule_id:

# ❌ Before: Non-unique across findings
finding_id = (
    result.get("fingerprints", {}).get("matchBasedId/v1")
    or result.get("ruleId")  # Same for all findings with same rule!
    or str(hash(json.dumps(result)))
)

The Issue: When SARIF files lack fingerprints (common with Semgrep/CodeQL), all findings with the same rule_id get the same finding_id.

Example collision:

Finding 1: rule_id="requires login" → finding_id="requires login"
Finding 2: rule_id="requires login" → finding_id="requires login"  # Collision!
Finding 3: rule_id="requires login" → finding_id="requires login"  # Collision!

Result: requires login_exploit.cpp gets overwritten 3 times.

Solution

Generate unique finding_id by combining rule + file + line:

# ✅ After: Unique per finding
file_uri = artifact.get("uri", "unknown")
line_num = region.get("startLine", 0)
rule_id = result.get("ruleId", "unknown")
file_safe = file_uri.replace("/", "_").replace("\\", "_")
finding_id = f"{rule_id}_{file_safe}_L{line_num}"

Priority hierarchy:

  1. SARIF fingerprint (if present) - most reliable
  2. {rule}_{file}_L{line} (fallback) - unique and descriptive
  3. Hash of full result (last resort)

Examples

Before (collisions):

requires login → requires login_exploit.cpp
requires login → requires login_exploit.cpp  # Overwrites!
requires login → requires login_exploit.cpp  # Overwrites!

After (unique):

requires login_api_controllers_user.js_L42 → requires login_api_controllers_user.js_L42_exploit.cpp
requires login_api_controllers_auth.js_L89 → requires login_api_controllers_auth.js_L89_exploit.cpp
requires login_app_views_profile.js_L156 → requires login_app_views_profile.js_L156_exploit.cpp

Impact

Before:

6 exploitable findings → 1 exploit file saved (83% data loss)

After:

6 exploitable findings → 6 unique exploit files (0% data loss)

Benefits:

  • Prevents file overwrites - all artifacts saved correctly
  • Descriptive filenames include location context
  • Backward compatible (SARIF fingerprints still prioritized)
  • Works with Semgrep, CodeQL, and other SARIF tools

Testing

Verified with real scan output showing the bug:

# Before fix
$ ls exploits/
requires login_exploit.cpp  # Only 1 file for 6 findings

# After fix
$ ls exploits/
requires login_api_controllers_user.js_L42_exploit.cpp
requires login_api_controllers_auth.js_L89_exploit.cpp
requires login_app_views_profile.js_L156_exploit.cpp
js_xss-through-dom_app_assets_javascripts_image-picker.js_L315_exploit.cpp
# ... (all 6 files present)

Files Changed

  • core/sarif/parser.py - 1 file, +15 insertions, -6 deletions

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information **Original PR:** https://github.com/gadievron/raptor/pull/30 **Author:** [@sapran](https://github.com/sapran) **Created:** 12/9/2025 **Status:** ❌ Closed **Base:** `main` ← **Head:** `bugfix/unique-finding-id` --- ### 📝 Commits (4) - [`3e26816`](https://github.com/gadievron/raptor/commit/3e26816073c4e61c59d8a720713e6cba8c09ea84) Merge branch 'main' of github.com:sapran/raptor - [`b056d00`](https://github.com/gadievron/raptor/commit/b056d00ce85566ce9bb42f5a2bc55c99a506c829) Merge branch 'main' of github.com:sapran/raptor - [`e7c11b7`](https://github.com/gadievron/raptor/commit/e7c11b7326972b527e10db66395d1f60f3d2ced6) Merge branch 'main' of github.com:sapran/raptor - [`fff8fe1`](https://github.com/gadievron/raptor/commit/fff8fe12966d59c46bb1b30655bc0f639182f60b) Fix bug: Generate unique finding_id to prevent file collisions ### 📊 Changes **1 file changed** (+15 additions, -6 deletions) <details> <summary>View changed files</summary> 📝 `core/sarif/parser.py` (+15 -6) </details> ### 📄 Description ## Problem Multiple findings with the same `rule_id` were assigned identical `finding_id` values, causing exploit and patch files to overwrite each other. **Symptoms:** - Report says: "6 exploits generated" - Reality: Only 1 exploit file exists (5 overwrites) - Same issue with patches ## Root Cause In `core/sarif/parser.py:142`, the fallback logic used non-unique `rule_id`: ```python # ❌ Before: Non-unique across findings finding_id = ( result.get("fingerprints", {}).get("matchBasedId/v1") or result.get("ruleId") # Same for all findings with same rule! or str(hash(json.dumps(result))) ) ``` **The Issue:** When SARIF files lack fingerprints (common with Semgrep/CodeQL), all findings with the same `rule_id` get the same `finding_id`. **Example collision:** ``` Finding 1: rule_id="requires login" → finding_id="requires login" Finding 2: rule_id="requires login" → finding_id="requires login" # Collision! Finding 3: rule_id="requires login" → finding_id="requires login" # Collision! ``` Result: `requires login_exploit.cpp` gets overwritten 3 times. ## Solution Generate unique `finding_id` by combining **rule + file + line**: ```python # ✅ After: Unique per finding file_uri = artifact.get("uri", "unknown") line_num = region.get("startLine", 0) rule_id = result.get("ruleId", "unknown") file_safe = file_uri.replace("/", "_").replace("\\", "_") finding_id = f"{rule_id}_{file_safe}_L{line_num}" ``` **Priority hierarchy:** 1. SARIF fingerprint (if present) - most reliable 2. `{rule}_{file}_L{line}` (fallback) - unique and descriptive 3. Hash of full result (last resort) ## Examples **Before (collisions):** ``` requires login → requires login_exploit.cpp requires login → requires login_exploit.cpp # Overwrites! requires login → requires login_exploit.cpp # Overwrites! ``` **After (unique):** ``` requires login_api_controllers_user.js_L42 → requires login_api_controllers_user.js_L42_exploit.cpp requires login_api_controllers_auth.js_L89 → requires login_api_controllers_auth.js_L89_exploit.cpp requires login_app_views_profile.js_L156 → requires login_app_views_profile.js_L156_exploit.cpp ``` ## Impact **Before:** ``` 6 exploitable findings → 1 exploit file saved (83% data loss) ``` **After:** ``` 6 exploitable findings → 6 unique exploit files (0% data loss) ``` **Benefits:** - ✅ Prevents file overwrites - all artifacts saved correctly - ✅ Descriptive filenames include location context - ✅ Backward compatible (SARIF fingerprints still prioritized) - ✅ Works with Semgrep, CodeQL, and other SARIF tools ## Testing Verified with real scan output showing the bug: ```bash # Before fix $ ls exploits/ requires login_exploit.cpp # Only 1 file for 6 findings # After fix $ ls exploits/ requires login_api_controllers_user.js_L42_exploit.cpp requires login_api_controllers_auth.js_L89_exploit.cpp requires login_app_views_profile.js_L156_exploit.cpp js_xss-through-dom_app_assets_javascripts_image-picker.js_L315_exploit.cpp # ... (all 6 files present) ``` ## Files Changed - `core/sarif/parser.py` - 1 file, +15 insertions, -6 deletions --- 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> --- <sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
kerem 2026-03-02 04:08:00 +03:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
starred/raptor#40
No description provided.