docs: clarify Shannon is a white-box pentesting tool

- Add prominent callout that Shannon Lite is designed for white-box
  (source-available) application security testing
- Update XBOW benchmark description to "hint-free, source-aware"
- Clarify benchmark comparison context (white-box vs black-box results)
- Update benchmark performance comparison image

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Author: ajmallesh
Date: 2025-11-24 12:37:55 -08:00
parent 369e3a34cf
commit 8f2825b32f
3 changed files with 13 additions and 7 deletions

View File

@@ -1,5 +1,6 @@
> [!NOTE]
> **[Shannon Lite achieves a 96.15% success rate on the hint-free XBOW benchmark, surpassing top human pentesters. &rarr;](https://github.com/KeygraphHQ/shannon/tree/main/xben-benchmark-results/README.md)**
> **[Shannon Lite achieves a 96.15% success rate on a hint-free, source-aware XBOW benchmark. &rarr;](https://github.com/KeygraphHQ/shannon/tree/main/xben-benchmark-results/README.md)**
<div align="center">
@@ -54,7 +55,6 @@ Shannon closes this gap by acting as your on-demand whitebox pentester. It doesn
- **Powered by Integrated Security Tools**: Enhances its discovery phase by leveraging leading reconnaissance and testing tools—including **Nmap, Subfinder, WhatWeb, and Schemathesis**—for deep analysis of the target environment.
- **Parallel Processing for Faster Results**: Get your report faster. The system parallelizes the most time-intensive phases, running analysis and exploitation for all vulnerability types concurrently.
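For illustration only, here is a minimal Python sketch of the kind of discovery-phase commands the integrated tools listed above expose. The target host and schema URL are placeholders, and this is not Shannon's actual orchestration code:

```python
import subprocess

TARGET = "target.example.com"  # hypothetical host, not a real target

# Discovery-phase commands similar to what the integrated tools provide.
recon_commands = [
    ["nmap", "-sV", TARGET],                                    # service/version detection
    ["subfinder", "-d", TARGET, "-silent"],                     # subdomain enumeration
    ["whatweb", f"https://{TARGET}"],                           # web technology fingerprinting
    ["schemathesis", "run", f"https://{TARGET}/openapi.json"],  # API testing from an OpenAPI schema
]

for cmd in recon_commands:
    # Each tool prints findings to stdout; a real pipeline would parse and correlate them.
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(f"$ {' '.join(cmd)}\n{result.stdout[:300]}")
```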
## 📦 Product Line
Shannon is available in two editions:
@@ -65,7 +65,11 @@ Shannon is available in two editions:
| **Shannon Pro** | Commercial | Enterprises requiring advanced features, CI/CD integration, and dedicated support |
> **This repository contains Shannon Lite,** which utilizes our core autonomous AI pentesting framework. **Shannon Pro** enhances this foundation with an advanced, LLM-powered data flow analysis engine (inspired by the [LLMDFA paper](https://arxiv.org/abs/2402.10754)) for enterprise-grade code analysis and deeper vulnerability detection.
>
> [!IMPORTANT]
> **White-box only.** Shannon Lite is designed for **white-box (source-available)** application security testing.
> It expects access to your application's source code and repository layout.
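As a hedged sketch of what the white-box requirement implies in practice (the repository URL and checkout path below are hypothetical), a run starts from a local copy of the application's source tree:

```python
import subprocess
from pathlib import Path

# Hypothetical repository URL and checkout path for a white-box engagement.
REPO_URL = "https://github.com/example-org/target-app.git"
CHECKOUT = Path("target-app")

# A white-box run assumes the application's source tree is available locally.
if not CHECKOUT.exists():
    subprocess.run(["git", "clone", REPO_URL, str(CHECKOUT)], check=True)

# Sanity-check that source files are actually present before starting analysis.
source_files = [p for p in CHECKOUT.rglob("*") if p.suffix in {".py", ".js", ".ts", ".java"}]
print(f"Found {len(source_files)} source files under {CHECKOUT}/")
```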
[See feature comparison](./SHANNON-PRO.md)
## 📑 Table of Contents

Binary file not shown.

Before: 160 KiB | After: 170 KiB

View File

@@ -1,6 +1,8 @@
# Achieving 96.15% Success on the hint-free XBOW Benchmark
# Achieving 96.15% Success on a Hint-Free, Source-Aware XBOW Benchmark
Shannon Lite, our open-source AI pentester, achieved a **96.15% success rate (100/104 exploits)** on a systematically cleaned, hint-free version of the XBOW security benchmark. This performance surpasses the 85% score achieved by both leading AI agents and expert human penetration testers on the original benchmark.
Shannon Lite, our open-source AI pentester, achieved a **96.15% success rate (100/104 exploits)** on a systematically cleaned, hint-free version of the XBOW security benchmark, running in a *white-box (source-available)* configuration.
For context, leading AI agents and expert human penetration testers previously reported around 85% success on the original benchmark in *black-box mode*. Because Shannon was evaluated with full access to source code on a cleaned, hint-free variant, these results are not *apples-to-apples*, but they do highlight Shannon's ability to perform deep, code-level reasoning in a realistic internal security review setting.
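A quick arithmetic check of the headline figure (not part of the original write-up):

```python
# 100 of 104 challenges solved.
solved, total = 100, 104
print(f"{solved / total:.2%}")  # -> 96.15%
```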
**Shannon Github:** [github.com/KeygraphHQ/shannon](https://github.com/KeygraphHQ/shannon)
@@ -14,7 +16,7 @@ Shannon Lite, our open-source AI pentester, achieved a **96.15% success rate (1
**About the benchmark:** XBOW is an open-source security benchmark containing 104 intentionally vulnerable applications designed to test AI agent capabilities on realistic penetration testing scenarios.
We tested against a fully cleaned, hint-free version of the benchmark, removing shortcuts like descriptive variable names, comments, and filenames that could artificially boost performance. This represents a more realistic evaluation of Shannon's core analysis and reasoning capabilities.
We tested against a fully cleaned, hint-free version of the benchmark in white-box mode, removing shortcuts like descriptive variable names, comments, and filenames that could artificially boost performance. This represents a more realistic evaluation of Shannon's core analysis and reasoning capabilities.
---
@@ -142,7 +144,7 @@ We're releasing everything needed for independent validation:
- Turn-by-turn agentic logs
- **Available in the same repository**
We believe reproducible research is the only way to make genuine progress. Use these resources to validate our findings, benchmark your own tools, or build upon this work.
These resources record the benchmark configuration and complete results for all 104 challenges.
---