[GH-ISSUE #38] Clarify whether benchmark reports used real customer applications or intentionally vulnerable test apps #10

Closed
opened 2026-02-27 07:19:58 +03:00 by kerem · 1 comment
Owner

Originally created by @saiqulhaq on GitHub (Jan 4, 2026).
Original GitHub issue: https://github.com/KeygraphHQ/shannon/issues/38

The [README](https://github.com/KeygraphHQ/shannon) showcases Shannon's capabilities through sample reports in the `sample-reports/` directory, specifically:

- `shannon-report-capital-api.md`
- `shannon-report-crapi.md`
- `shannon-report-juice-shop.md`

While these reports demonstrate impressive vulnerability detection (20+ critical issues in Juice Shop, complete auth bypass, database exfiltration, etc.), it's not immediately clear to potential users whether these reports represent:

1. **Intentionally vulnerable benchmark applications** (e.g., OWASP Juice Shop, Checkmarx c{api}tal, OWASP crAPI), which are designed specifically for testing security tools
2. **Real customer applications** that Shannon was contracted to test

**Why this matters:**

For organizations evaluating Shannon for their security testing needs, understanding whether these impressive results come from purpose-built vulnerable apps versus real-world applications significantly affects credibility and expected performance.

Testing against deliberately insecure applications is valuable for demonstrating capabilities, but results from real customer engagements (with appropriate anonymization) would provide stronger validation of Shannon's effectiveness in production scenarios.

**Suggested clarification:**

Could the README explicitly state that these sample reports are from testing against intentionally vulnerable benchmark applications rather than customer engagements? Something like:

```markdown
### Benchmark Results

Shannon's capabilities are demonstrated through testing against industry-standard
intentionally vulnerable applications designed by security organizations to
benchmark penetration testing tools:
```

This would help potential users properly calibrate their expectations and understand that these results represent Shannon's performance against purpose-built vulnerable targets rather than real-world customer applications.

kerem closed this issue 2026-02-27 07:19:58 +03:00

@keygraphVarun commented on GitHub (Jan 5, 2026):

Thanks for the feedback.

To clarify: the sample reports section is already prefaced with *"See Shannon's capabilities in action with real penetration test results from **industry-standard vulnerable applications**"*, which is what Juice Shop, crAPI, and c{api}tal are. Each entry also links to the GitHub repo and describes what the app is (e.g., *"A notoriously insecure web application maintained by OWASP"*).

That said, you make a fair point that the "Benchmark Results" header could be clearer. Our actual quantitative benchmark is the [XBOW benchmark](https://github.com/KeygraphHQ/shannon/blob/main/xben-benchmark-results/README.md) linked at the very top of the README. We'll rename that section to "Sample Reports" and add a link to the XBOW results for folks looking for standardized evaluation metrics.

Thanks for raising it.
