- Remove .env file with incorrect CLAUDE_CODE_MAX_TOKENS variable
- Remove the .env copy step from the Dockerfile, which was causing the build to fail
- Update README to distinguish local (export) vs Docker (-e) env var usage
- Add CLAUDE_CODE_MAX_OUTPUT_TOKENS to all Docker run examples
The correct variable is CLAUDE_CODE_MAX_OUTPUT_TOKENS (not CLAUDE_CODE_MAX_TOKENS),
and it should be passed at runtime via the -e flag for Docker or via export for local runs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Consolidates SQL Injection and Command Injection references to the unified "Injection" terminology for consistency with agent naming and OWASP categorization.
Changes:
- Updated feature descriptions and vulnerability lists
- Modified architecture diagrams
- Simplified targeted vulnerability scope
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Updates .gitignore to ignore only the top-level audit-logs/ directory, so that audit logs under xben-benchmark-results are tracked. This makes benchmark runs fully reproducible, with complete session data, prompts, and agent execution logs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit adds a performance comparison chart showing Shannon's 96% success rate on the X-Bow benchmark alongside other autonomous pentesting systems.
Chart features:
- Y-axis starts at 0% so differences between systems are not visually exaggerated
- Shannon bar highlighted in brand orange
- Descriptive title with sample size (104 challenges)
- SVG format for scalability
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit adds X-Bow (XBEN) benchmark results for Shannon across 104 CTF security challenges. Each test case includes penetration testing reports and exploitation evidence so the results can be independently reproduced.
Contents:
- 104 XBEN test case directories (XBEN-001-24 through XBEN-104-24)
- Deliverables including analysis reports and exploitation evidence
- Individual test case results with vulnerability assessments
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add a TIP callout in the Overview section documenting the ctf-mode branch
for users who want to run Shannon against Capture the Flag (CTF) challenges
with prompts optimized for flag extraction.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Change agent prefix from [SQLi/Cmd] to [Injection] to reflect expanded scope
- Add README documentation for CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable
This update aligns the display naming with the expanded injection analysis scope
that now covers SQLi, Command Injection, LFI/RFI, SSTI, Path Traversal, and
Insecure Deserialization vulnerabilities.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes a responsibility gap where agents found vulnerabilities (e.g. LFI, SSTI, Path Traversal) but rejected them as "out of scope".
Changes:
- vuln-injection.txt: Added LFI/RFI, SSTI, Path Traversal, Deserialization to scope
  - Updated role definition and objective
  - Added new vulnerability_type and slot_type enums
  - Added sink definitions and defense rules for new injection classes
  - Added witness payload examples
- pre-recon-code.txt: Expanded sink hunter agent to find file/template/deserialize sinks
- recon.txt: Updated Section 9 with clear injection source definitions for all types
- exploit-injection.txt: Updated evidence template to handle all injection types
Token-optimized: Condensed verbose sections while preserving critical guidance
Addresses XBEN benchmark failures where LFI/SSTI/Path Traversal were detected but excluded from exploitation queues
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Introduces .env file configuration to manage CLAUDE_CODE_MAX_TOKENS, allowing control of the token limit for AI analysis sessions. Users can tune token limits for their penetration testing needs without modifying code.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This reverts the timestamp-based naming scheme that was causing audit log
fragmentation: each agent execution created a new folder because the
timestamp kept changing.
Naming returns to the simple, stable form: {hostname}_{sessionId}
This ensures one folder per session, preventing the bug where multiple
folders were created for the same session.
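A minimal sketch of why the reverted scheme is stable, assuming the folder name is built only from fields fixed when the session is created. The helper `auditDirName` and the sample session ID are hypothetical; only the `{hostname}_{sessionId}` pattern comes from this commit.

```javascript
// Hypothetical sketch: the audit folder name depends only on values that are
// fixed at session creation, so every call for the same session (e.g. once
// per agent execution) yields the same folder.
function auditDirName(session) {
  const { hostname, sessionId } = session;
  if (!hostname || !sessionId) {
    throw new Error("hostname and sessionId are required");
  }
  return `${hostname}_${sessionId}`; // no wall-clock time involved
}

// Made-up session for illustration.
const session = { hostname: "localhost", sessionId: "efc60ee0-1b2c-4d5e-8f90-123456789abc" };
const first = auditDirName(session);
const second = auditDirName(session);
console.log(first === second); // true
```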
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed bug where audit system would create duplicate folders for the same
session because it was using current time instead of the session's original
createdAt timestamp.
Bug behavior:
- Session created at T1 → folder: {T1}_app_host_id/
- Audit re-initialized at T2 → NEW folder: {T2}_app_host_id/
- Result: 2 folders per session with same ID but different timestamps
Root cause:
- metrics-tracker.js:65 was calling formatTimestamp() (current time)
- Should use sessionMetadata.createdAt (original creation time)
Impact: Each running benchmark was creating 2 audit log folders instead of 1
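A minimal sketch of the fix, assuming `sessionMetadata.createdAt` is an ISO 8601 string and that folder names use the compact timestamp format seen elsewhere in this log (e.g. 20251025T193847Z). This `formatTimestamp` body and the `auditFolderName` helper are hypothetical; only the names `formatTimestamp` and `sessionMetadata.createdAt` come from this commit.

```javascript
// Hypothetical formatter: ISO 8601 basic format, e.g. 20251025T193847Z.
// Defaults to "now", which is exactly what caused the duplicate folders.
function formatTimestamp(date = new Date()) {
  return date
    .toISOString()             // "2025-10-25T19:38:47.123Z"
    .replace(/\.\d{3}Z$/, "Z") // drop milliseconds -> "2025-10-25T19:38:47Z"
    .replace(/[-:]/g, "");     // strip separators  -> "20251025T193847Z"
}

function auditFolderName(sessionMetadata, appHostId) {
  // Buggy version: formatTimestamp() with no argument used the current time,
  // so re-initializing the audit system later produced a second folder.
  // Fixed version: the timestamp comes from the session's original createdAt.
  const createdAt = new Date(sessionMetadata.createdAt);
  return `${formatTimestamp(createdAt)}_${appHostId}/`;
}

// Two initializations of the same session map to the same folder.
const sessionMetadata = { createdAt: "2025-10-25T19:38:47.123Z" };
const a = auditFolderName(sessionMetadata, "app_host_id");
const b = auditFolderName(sessionMetadata, "app_host_id");
console.log(a); // 20251025T193847Z_app_host_id/
```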
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Enhances audit log directory naming from `{hostname}_{uuid}` to
`{timestamp}_{appName}_{hostname}_{shortId}` for better discoverability
and benchmarking analysis.
Changes:
- Add extractAppName() helper to extract app name from config files
- Add fallback: use the port number as the app name for localhost targets without a config
- Update generateSessionIdentifier() to include timestamp prefix
- Shorten session ID to first 8 characters for readability
Examples:
- With config: 20251025T193847Z_myapp_localhost_efc60ee0/
- Without config: 20251025T193913Z_8080_localhost_d47e3bfd/
- Remote: 20251024T004401Z_noconfig_example-com_d47e3bfd/
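A minimal sketch that reproduces the three example names above, under stated assumptions: `appName` stands in for whatever extractAppName() reads from a config file (null when absent), non-alphanumeric hostname characters become hyphens (inferred from `example-com`), and a localhost target without an explicit port falls back to "noconfig". The argument shape and sample UUIDs are hypothetical; the real generateSessionIdentifier() may differ.

```javascript
// Hypothetical sketch of the {timestamp}_{appName}_{hostname}_{shortId} scheme.
function generateSessionIdentifier({ targetUrl, appName, sessionId, createdAt }) {
  const url = new URL(targetUrl);
  const isLocalhost = url.hostname === "localhost" || url.hostname === "127.0.0.1";

  // Fallback when no config names the app: the port for localhost targets,
  // otherwise the literal "noconfig" (assumed also for localhost with no port).
  const name = appName || (isLocalhost && url.port ? url.port : "noconfig");

  // ISO 8601 basic format: 2025-10-25T19:38:47.000Z -> 20251025T193847Z.
  const timestamp = new Date(createdAt)
    .toISOString()
    .replace(/\.\d{3}Z$/, "Z")
    .replace(/[-:]/g, "");

  // Filesystem-friendly hostname (example.com -> example-com) and 8-char short ID.
  const host = url.hostname.replace(/[^a-zA-Z0-9]+/g, "-");
  const shortId = sessionId.slice(0, 8);

  return `${timestamp}_${name}_${host}_${shortId}`;
}

console.log(generateSessionIdentifier({
  targetUrl: "http://localhost:8080",
  appName: null,
  sessionId: "d47e3bfd-0a1b-4c2d-8e3f-0123456789ab", // made-up UUID
  createdAt: "2025-10-25T19:39:13Z",
})); // 20251025T193913Z_8080_localhost_d47e3bfd
```

The function returns the bare name; the trailing slash in the examples just marks it as a directory.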
Benefits:
- Chronologically sortable audit logs
- Instant app identification in directory listings
- Efficient filtering for benchmarking queries
- Non-breaking: existing logs keep their names
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Resolves Playwright browser installation failures in Docker by using Wolfi's
system Chromium instead of downloading Playwright's bundled browsers at runtime.
## Problem
When running in Docker, agents attempted to install browsers via the `browser_install`
tool, which failed due to:
- Permission issues (non-root user couldn't install system dependencies)
- npx @playwright/mcp spawns with its own Playwright dependency separate from
global installations
- Playwright's bundled browsers require runtime download (~280MB) and glibc deps
- Environment variables alone (PLAYWRIGHT_BROWSERS_PATH) weren't sufficient
## Solution
**Dockerfile changes:**
- Use Wolfi's native `chromium` package (built for Wolfi and already installed in the image)
- Remove Playwright browser installation step (saves ~280MB and build time)
- Add explicit `SHANNON_DOCKER=true` environment variable for reliable detection
- Set PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH to point to system Chromium
**Code changes (claude-executor.js):**
- Detect Docker via `process.env.SHANNON_DOCKER` (more reliable than /.dockerenv)
- Conditionally add `--executable-path /usr/bin/chromium-browser` CLI arg for Docker
- Local: Use Playwright's bundled browsers (downloaded to ~/Library/Caches/ on macOS)
- Docker: Use system Chromium with no runtime downloads
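A minimal sketch of the conditional argument logic, assuming the MCP server is launched as `npx @playwright/mcp` and that the Dockerfile sets SHANNON_DOCKER to the string "true". The helper names `isDocker` and `playwrightMcpArgs` are hypothetical; only `--executable-path`, `SHANNON_DOCKER`, and the Chromium path come from this commit.

```javascript
// Path from this commit; assumed to be where the image's system Chromium lives.
const SYSTEM_CHROMIUM = "/usr/bin/chromium-browser";

// Explicit flag set in the image, instead of probing for /.dockerenv.
function isDocker(env = process.env) {
  return env.SHANNON_DOCKER === "true";
}

// Build the npx arguments for the Playwright MCP server.
function playwrightMcpArgs(env = process.env) {
  const args = ["@playwright/mcp"];
  if (isDocker(env)) {
    // Point the server at system Chromium so it never downloads browsers.
    args.push("--executable-path", SYSTEM_CHROMIUM);
  }
  return args; // locally, Playwright falls back to its bundled browsers
}

console.log(JSON.stringify(playwrightMcpArgs({ SHANNON_DOCKER: "true" })));
// ["@playwright/mcp","--executable-path","/usr/bin/chromium-browser"]
console.log(JSON.stringify(playwrightMcpArgs({})));
// ["@playwright/mcp"]
```

Passing `env` explicitly keeps the helper testable without mutating `process.env`.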
## Research Findings
- @playwright/mcp has separate playwright-core dependency (v1.56.0-alpha)
- MCP server spawned via npx doesn't inherit browser binaries from global install
- --executable-path CLI argument is required (env vars insufficient)
- /.dockerenv file is unreliable (missing in BuildKit, K8s, can be spoofed)
## Testing
✅ Docker: All 5 parallel agents successfully navigate, screenshot, create deliverables
✅ Local: All 5 parallel agents successfully navigate, screenshot, create deliverables
✅ No browser_install calls, no permission errors
✅ Image size reduced by ~280MB
Fixes #docker-playwright-browser-issues
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>