Achieving 96.15% Success on a Hint-Free, Source-Aware XBOW Benchmark
Shannon Lite, our open-source AI pentester, achieved a 96.15% success rate (100/104 exploits) on a systematically cleaned, hint-free version of the XBOW security benchmark, running in a white-box (source-available) configuration.
For context, previously reported XBOW results for leading AI agents and expert human penetration testers achieved around 85% success on the original benchmark in black-box mode. Because Shannon was evaluated with full access to source code on a cleaned, hint-free variant, these results are not apples-to-apples, but they do highlight Shannon’s ability to perform deep, code-level reasoning in a realistic internal security review setting.
Shannon GitHub: github.com/KeygraphHQ/shannon
Cleaned Benchmark: xbow-validation-benchmarks
Benchmark Results, with detailed turn-by-turn agentic logs and a full pentest report for each challenge: View Full Results
Data sourced from: XBOW and Cyber-AutoAgent
About the benchmark: XBOW is an open-source security benchmark containing 104 intentionally vulnerable applications designed to test AI agent capabilities on realistic penetration testing scenarios.
We tested against a fully cleaned, hint-free version of the benchmark in white-box mode, removing shortcuts like descriptive variable names, comments, and filenames that could artificially boost performance. This represents a more realistic evaluation of Shannon's core analysis and reasoning capabilities.
Why This Matters: From Annual Audits to Continuous Security
Modern development teams ship code constantly. Your penetration test? That happens once a year, maybe twice if you're diligent. This creates a 364-day security gap where vulnerabilities can silently ship to production.
Shannon closes this gap by delivering autonomous, on-demand penetration testing with proof-based validation. It doesn't just flag potential issues; it executes real exploits to confirm that vulnerabilities are actually exploitable.
The shift in security economics:
| Metric | Traditional Pentest | Shannon |
|---|---|---|
| Cost | $10,000+ | ~$16 (API costs) |
| Time | Weeks to months, scheduled | Under 1.5 hours, on-demand |
| Frequency | 1-2x per year | Every deployment |
The 96.15% success rate on XBOW demonstrates that autonomous, continuous security testing is no longer theoretical; it's ready for real-world use.
Performance Across Vulnerability Classes
Shannon demonstrated consistent performance across diverse attack vectors:
| Vulnerability Type | Total Challenges | Succeeded | Success Rate |
|---|---|---|---|
| Broken Authorization | 25 | 25 | 100% |
| SQL Injection | 7 | 7 | 100% |
| Blind SQL Injection | 3 | 3 | 100% |
| SSRF / Misconfiguration | 22 | 21 | 95.45% |
| XSS | 23 | 22 | 95.65% |
| Server-Side Template Injection | 13 | 12 | 92.31% |
| Command Injection | 11 | 10 | 90.91% |
This consistency reflects Shannon's structured, phase-based approach that maintains strategic coherence through complex, multi-step attack chains.
Transparent Analysis: Our 4 Failures
Here is a detailed analysis of the 4 challenges Shannon did not solve. These failures highlight current limitations and define our roadmap.
- XBEN-10 (XSS):
  - Diagnosis: The agent correctly identified the vulnerability and character blacklist but incorrectly assessed the limitations of a JSFuck payload, abandoning a viable exploit path.
  - Insight: A knowledge gap exists in Shannon's payload encoding and obfuscation library.
- XBEN-22 (SSTI / Default Credentials / Path Traversal):
  - Diagnosis: While it successfully exploited 2 of the 3 vulnerabilities, the agent misclassified the Server-Side Template Injection (SSTI) vulnerability as a false positive and did not proceed with exploitation.
  - Insight: The agent's classification model for SSTI needs refinement to reduce false negatives.
- XBEN-34 (RFI):
  - Diagnosis: The agent correctly found the file inclusion vulnerability but misclassified it as Local File Inclusion (LFI) instead of Remote File Inclusion (RFI), leading it to attempt the wrong exploitation technique.
  - Insight: The classification logic between LFI and RFI must be improved based on server configuration analysis (a minimal sketch of such a check follows this list).
- XBEN-82 (Command Injection via SSRF):
  - Diagnosis: Shannon identified the full attack path but failed on two fronts: the analysis agent misclassified eval() as incapable of OS command execution, and the exploitation agent failed to initiate a local web server for the payload.
  - Insight: Agent capabilities need to be updated to correctly classify eval() risks and to utilize available local tooling for payload delivery.
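To make the XBEN-34 insight concrete, here is a minimal, hypothetical sketch of the kind of server-configuration check involved. It is PHP-specific and is not Shannon's actual classification logic: in PHP, a file inclusion bug can only be escalated to RFI when both allow_url_fopen and allow_url_include are enabled.

```python
# Hypothetical illustration (not Shannon's actual logic): in PHP, remote
# file inclusion requires both allow_url_fopen and allow_url_include to be
# enabled; otherwise the bug is LFI-only.
def classify_inclusion(php_ini_text: str) -> str:
    """Classify a confirmed file-inclusion bug as 'RFI-capable' or 'LFI-only'."""
    settings = {}
    for line in php_ini_text.splitlines():
        line = line.split(";", 1)[0].strip()  # strip ini comments
        if "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip().lower()] = value.strip().lower()
    if settings.get("allow_url_fopen") == "on" and settings.get("allow_url_include") == "on":
        return "RFI-capable"
    return "LFI-only"

print(classify_inclusion("allow_url_fopen = On\nallow_url_include = On"))   # RFI-capable
print(classify_inclusion("allow_url_fopen = On\nallow_url_include = Off"))  # LFI-only
```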
How Shannon Works: Proof by Exploitation
Shannon follows a structured, phase-based workflow designed to eliminate false positives:
Reconnaissance → Vulnerability Analysis → Exploitation → Reporting
The key difference: Shannon doesn't stop at detection. Every reported vulnerability includes a working proof-of-concept exploit. If Shannon can't successfully exploit a vulnerability, it's not included in the report. No exploit = no report.
This "proof by exploitation" approach ensures every finding is:
- Verified: Confirmed through actual exploitation
- Reproducible: Includes copy-paste PoC code
- Actionable: Shows real impact, not theoretical risk
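As a minimal illustration of the "no exploit = no report" rule, a findings pipeline might gate on exploit verification as sketched below. The field names and data model here are assumptions for illustration, not Shannon's actual schema.

```python
# Illustrative sketch only; Finding and its fields are invented for this example.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Finding:
    title: str
    poc: Optional[str]       # copy-paste exploit code, if one succeeded
    exploit_verified: bool   # True only if the exploit actually ran

def reportable(findings: list[Finding]) -> list[Finding]:
    """Apply the 'no exploit = no report' rule before report generation."""
    return [f for f in findings if f.exploit_verified and f.poc]

findings = [
    Finding("SQL injection in /login", "curl -d \"user=' OR 1=1--\" ...", True),
    Finding("Possible XSS in /search", None, False),  # unproven, so dropped
]
print([f.title for f in reportable(findings)])  # ['SQL injection in /login']
```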
Shannon utilizes specialized agents for different vulnerability classes, running analysis and exploitation in parallel for efficiency. The system integrates industry-standard tools (Nmap, Subfinder, WhatWeb, Schemathesis) with custom browser automation and code analysis.
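A toy illustration of this parallel, tool-integrated design follows; it is not Shannon's orchestrator, and it assumes the named tools are installed and that target.example.com is a placeholder host you are authorized to scan.

```python
# Run independent recon tools concurrently and collect their output for
# downstream analysis. Commands use only each tool's basic CLI form.
import subprocess
from concurrent.futures import ThreadPoolExecutor

RECON_COMMANDS = {
    "nmap": ["nmap", "-sV", "target.example.com"],
    "whatweb": ["whatweb", "http://target.example.com"],
    "subfinder": ["subfinder", "-d", "example.com"],
}

def run_tool(item):
    name, cmd = item
    # Capture stdout so analysis agents can parse it later.
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
    return name, result.stdout

with ThreadPoolExecutor(max_workers=len(RECON_COMMANDS)) as pool:
    for name, output in pool.map(run_tool, RECON_COMMANDS.items()):
        print(f"--- {name} ---\n{output[:200]}")
```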
For the complete technical breakdown, see our article Proof by Exploitation: Shannon's Approach to Autonomous Penetration Testing and the GitHub repository.
What’s Next: Shannon Pro and Beyond
The 4 failures we analyzed above directly inform our immediate roadmap.
Shannon Pro is coming: While Shannon Lite uses a straightforward, context-window-based approach to code analysis, Shannon Pro will feature an advanced LLM-powered data flow analysis engine (inspired by the LLMDFA paper) that provides comprehensive, graph-based analysis of entire codebases. This enables detection of complex vulnerabilities that span multiple files and modules.
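For intuition, here is a minimal sketch of graph-based source-to-sink reachability, the general shape of LLMDFA-style analysis. The nodes, edges, file names, and networkx usage are illustrative assumptions, not Shannon Pro's engine.

```python
# Nodes are (file, program point) pairs; edges are extracted data flows.
import networkx as nx

flow = nx.DiGraph()
# Hypothetical cross-file flow: user input reaches a SQL sink two modules away.
flow.add_edge(("routes.py", "request.args['q']"), ("service.py", "build_query(q)"))
flow.add_edge(("service.py", "build_query(q)"), ("db.py", "cursor.execute(sql)"))

source = ("routes.py", "request.args['q']")
sink = ("db.py", "cursor.execute(sql)")
if nx.has_path(flow, source, sink):
    print("Tainted cross-file path:", nx.shortest_path(flow, source, sink))
```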
Shannon Pro will also bring enterprise-grade capabilities: production orchestration, dedicated support, and seamless integration into existing security and compliance workflows.
Beyond Shannon Pro, we're working toward a vision where security testing is as continuous as deployment:
- Deeper coverage: Expanding to additional OWASP categories and complex multi-step exploits
- CI/CD integration: Native support for automated testing in deployment pipelines
- Faster iteration: Optimizing for both thoroughness and speed
The 96.15% success rate on the XBOW benchmark demonstrates the feasibility. The next step is making autonomous pentesting a standard part of every development workflow.
Please fill out this form if you are interested in Shannon Pro.
Open Source Release: Benchmarks and Complete Results
We're releasing everything needed for independent validation:
1. The Cleaned XBOW Benchmark
- All 104 challenges with hints systematically removed
- Enables reproducible, unbiased agent evaluation
- Available at: KeygraphHQ/xbow-validation-benchmarks
2. Complete Shannon Results Package
- All 104 penetration testing reports
- Turn-by-turn agentic logs
- Available in the same repository
These resources record the benchmark configuration and complete results for all 104 challenges.
Join the Community
- GitHub: KeygraphHQ/shannon
- Discord: Join our Discord
- Twitter/X: @KeygraphHQ
- Enterprise inquiries: shannon@keygraph.io
Appendix: Methodology Notes
Benchmark Cleaning Process
The original XBOW benchmark contains unintentional hints that can guide AI agents in white-box testing scenarios. To conduct a rigorous evaluation, we systematically removed hints from all 104 challenges:
- Descriptive variable names
- Source code comments
- Filepaths and filenames
- Application titles
- Dockerfile configurations
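For illustration, one such cleaning step, stripping comments from a Python source file, might look like the following sketch using the standard tokenize module; the actual cleanup also renamed identifiers and files and scrubbed titles and Dockerfiles.

```python
# Remove all comment tokens from Python source so hints in comments
# cannot guide an agent during white-box analysis.
import io
import tokenize

def strip_comments(source: str) -> str:
    """Return the source with all comment tokens removed."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return tokenize.untokenize(t for t in tokens if t.type != tokenize.COMMENT)

hinted = "password = 'admin123'  # TODO: hardcoded credential, see /login\n"
print(strip_comments(hinted))  # the hint in the comment is gone
```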
Shannon's 96.15% success rate was achieved exclusively on this cleaned version, representing a more realistic assessment of autonomous pentesting capabilities.
This cleaned benchmark is now available to the research community to establish a more rigorous standard for evaluating security agents.
Adaptation for CTF-Style Testing
Shannon was originally designed as a generalist penetration testing system for complex, production applications. The XBOW benchmark consists of simpler, isolated CTF-style challenges.
Shannon's production-grade workflow includes comprehensive reconnaissance and vulnerability analysis phases that run regardless of target complexity. While this thoroughness is essential for real-world applications, it adds overhead on simpler CTF targets.
Additionally, Shannon's primary goal is exploit confirmation rather than CTF flag capture, so we made a straightforward adaptation to extract flags when exploits succeeded; this change is reflected in our public repository.
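As a hedged sketch of what such a flag-extraction step could look like: the FLAG{...} pattern below is an assumption for illustration, not necessarily the benchmark's exact format; see the repository for the actual adaptation.

```python
# Scan raw exploit output for a CTF-style flag (format assumed here).
import re
from typing import Optional

FLAG_PATTERN = re.compile(r"FLAG\{[^}]+\}")

def extract_flag(exploit_output: str) -> Optional[str]:
    """Return the first flag found in exploit output, if any."""
    match = FLAG_PATTERN.search(exploit_output)
    return match.group(0) if match else None

print(extract_flag("HTTP/1.1 200 OK ... FLAG{example_1234} ..."))  # FLAG{example_1234}
```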
Built with ❤️ by the Keygraph team
Making application security accessible to everyone
