diff --git a/README.md b/README.md
index e88e6ef..e66c6dd 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,6 @@
> [!NOTE]
-> **[Shannon Lite achieves a 96.15% success rate on the hint-free XBOW benchmark, surpassing top human pentesters. →](https://github.com/KeygraphHQ/shannon/tree/main/xben-benchmark-results/README.md)**
+> **[Shannon Lite achieves a 96.15% success rate on a hint-free, source-aware XBOW benchmark. →](https://github.com/KeygraphHQ/shannon/tree/main/xben-benchmark-results/README.md)**
+
@@ -54,7 +55,6 @@ Shannon closes this gap by acting as your on-demand whitebox pentester. It doesn
- **Powered by Integrated Security Tools**: Enhances its discovery phase by leveraging leading reconnaissance and testing tools—including **Nmap, Subfinder, WhatWeb, and Schemathesis**—for deep analysis of the target environment.
- **Parallel Processing for Faster Results**: Get your report faster. The system parallelizes the most time-intensive phases, running analysis and exploitation for all vulnerability types concurrently.
-
## 📦 Product Line
Shannon is available in two editions:
@@ -65,7 +65,11 @@ Shannon is available in two editions:
| **Shannon Pro** | Commercial | Enterprises requiring advanced features, CI/CD integration, and dedicated support |
> **This repository contains Shannon Lite,** which utilizes our core autonomous AI pentesting framework. **Shannon Pro** enhances this foundation with an advanced, LLM-powered data flow analysis engine (inspired by the [LLMDFA paper](https://arxiv.org/abs/2402.10754)) for enterprise-grade code analysis and deeper vulnerability detection.
->
+
+> [!IMPORTANT]
+> **White-box only.** Shannon Lite is designed for **white-box (source-available)** application security testing.
+> It expects access to your application's source code and repository layout.
+
[See feature comparison](./SHANNON-PRO.md)
## 📑 Table of Contents
diff --git a/assets/xbow-performance-comparison.png b/assets/xbow-performance-comparison.png
index b9fff52..f89e5dd 100644
Binary files a/assets/xbow-performance-comparison.png and b/assets/xbow-performance-comparison.png differ
diff --git a/xben-benchmark-results/README.md b/xben-benchmark-results/README.md
index 18521dd..8bb0707 100644
--- a/xben-benchmark-results/README.md
+++ b/xben-benchmark-results/README.md
@@ -1,6 +1,8 @@
-# Achieving 96.15% Success on the hint-free XBOW Benchmark
+# Achieving 96.15% Success on a Hint-Free, Source-Aware XBOW Benchmark
-Shannon Lite, our open-source AI pentester, achieved a **96.15% success rate (100/104 exploits)** on a systematically cleaned, hint-free version of the XBOW security benchmark. This performance surpasses the 85% score achieved by both leading AI agents and expert human penetration testers on the original benchmark.
+Shannon Lite, our open-source AI pentester, achieved a **96.15% success rate (100/104 exploits)** on a systematically cleaned, hint-free version of the XBOW security benchmark, running in a *white-box (source-available)* configuration.
+
+For context, leading AI agents and expert human penetration testers were previously reported to achieve around 85% success on the original benchmark in *black-box mode*. Because Shannon was evaluated with full access to source code on a cleaned, hint-free variant, these results are not *apples-to-apples*, but they do highlight Shannon’s ability to perform deep, code-level reasoning in a realistic internal security review setting.
**Shannon Github:** [github.com/KeygraphHQ/shannon](https://github.com/KeygraphHQ/shannon)
@@ -14,7 +16,7 @@ Shannon Lite, our open-source AI pentester, achieved a **96.15% success rate (1
**About the benchmark:** XBOW is an open-source security benchmark containing 104 intentionally vulnerable applications designed to test AI agent capabilities on realistic penetration testing scenarios.
-We tested against a fully cleaned, hint-free version of the benchmark—removing shortcuts like descriptive variable names, comments, and filenames that could artificially boost performance. This represents a more realistic evaluation of Shannon's core analysis and reasoning capabilities.
+We tested against a fully cleaned, hint-free version of the benchmark in white-box mode, removing shortcuts like descriptive variable names, comments, and filenames that could artificially boost performance. This represents a more realistic evaluation of Shannon's core analysis and reasoning capabilities.
---
@@ -142,7 +144,7 @@ We're releasing everything needed for independent validation:
- Turn-by-turn agentic logs
- **Available in the same repository**
-We believe reproducible research is the only way to make genuine progress. Use these resources to validate our findings, benchmark your own tools, or build upon this work.
+These resources record the benchmark configuration and complete results for all 104 challenges.
---