mirror of
https://github.com/gadievron/raptor.git
synced 2026-04-25 05:56:00 +03:00
[PR #9] [MERGED] Fix: Ollama JSON comments + semgrep crypto 404 + optimize .strip() #21
📋 Pull Request Information
Original PR: https://github.com/gadievron/raptor/pull/9
Author: @gadievron
Created: 11/29/2025
Status: ✅ Merged
Merged: 11/30/2025
Merged by: @danielcuthbert
Base: main ← Head: fix/ollama-json-comments
📝 Commits (4)
0a7265b docs: Fix /test-workflows command description (48 tests → 9 test categories)
181abf0 Add RAPTOR test suite to repository
52379d6 Merge remote changes with tests commit
9223211 Fix: Remove JSON comments from Ollama responses + semgrep crypto
📊 Changes
12 files changed (+2771 additions, -2 deletions)
📝 core/config.py (+1 -1)
📝 packages/llm_analysis/llm/providers.py (+6 -1)
➕ tests/fixtures/empty.sarif (+14 -0)
➕ tests/fixtures/large.sarif (+2015 -0)
➕ tests/fixtures/malformed.sarif (+29 -0)
➕ tests/fixtures/medium.sarif (+215 -0)
➕ tests/fixtures/minimal.sarif (+36 -0)
➕ tests/fixtures/test_repo/vuln.c (+12 -0)
➕ tests/fixtures/test_repo/vuln.js (+9 -0)
➕ tests/fixtures/test_repo/vuln.py (+14 -0)
➕ tests/generate_test_data.py (+220 -0)
➕ tests/smoke_test.py (+200 -0)
📄 Description
Problem
Issue 1: JSON Comments Breaking Structured Generation
Ollama code models (deepseek-coder, codellama, qwen) add explanatory comments to JSON responses:
// JavaScript-style comments
# Python-style comments
/* */ C-style comments
Standard json.loads() cannot parse JSON with comments, causing structured generation to fail.
Issue 2: Semgrep Crypto Ruleset 404
The p/crypto semgrep ruleset is deprecated and returns HTTP 404, causing scan failures.
Issue 3: Repeated .strip() Calls
Two consecutive .strip() calls on the same variable (lines 351 and 357), which is inefficient.
Impact
Before fix:
After fix:
Efficiency gain: 4x faster analysis, 4x fewer API calls, 4x less token usage
Solution
1. JSON Comment Removal (providers.py:352-356)
Location: Line 352 (after thinking tag removal, before markdown removal)
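For illustration, a minimal sketch of one way to strip all three comment styles before calling json.loads(). This is not the code merged in providers.py. The function name strip_json_comments and the sample payload are hypothetical. The sketch scans character by character so that //, # and /* sequences inside string values (URLs, for example) are left untouched.

```python
import json


def strip_json_comments(text: str) -> str:
    """Remove //, # and /* */ comments that sit outside JSON string literals.

    Hypothetical helper for illustration; not the implementation from this PR.
    """
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so an escaped quote does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif ch == "#" or text.startswith("//", i):
            # Line comment: skip up to (not including) the newline.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            # Block comment: skip through the closing */, or to the end if unterminated.
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Made-up model response containing all three comment styles.
raw = (
    '{\n'
    '  "severity": "high", // explanatory note\n'
    '  "url": "http://example.com/#frag", # another note\n'
    '  /* block comment */ "exploitable": true\n'
    '}'
)
parsed = json.loads(strip_json_comments(raw))
```

A regex-only approach is shorter but can corrupt string values that contain // or #, which is why this sketch tracks whether it is inside a string.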
2. Optimize .strip() Calls (providers.py:351)
Before:
After:
3. Fix Semgrep Config (config.py:72)
Before:
After:
Testing
✅ JSON comment fix: Tested against 6 production examples from raptor_1764425803.jsonl - all pass
✅ Comment styles: All three types (//, #, /* */) verified in production logs
✅ Semgrep fix: category/crypto generates 13K of findings (vs 577B error with p/crypto)
✅ Optimization: Single .strip() call performs same function
Evidence
Production Log Examples (4/5 Failure Pattern)
Semgrep Scan Results
Before (p/crypto):
After (category/crypto):
Files Changed
packages/llm_analysis/llm/providers.py
core/config.py
Risk Assessment
Related: PR #8 (Python scoping bug fix)
Verification: Gemini 2.0 Flash Exp independently confirmed all issues and reviewed fixes
🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.