- Remove .env file with incorrect CLAUDE_CODE_MAX_TOKENS variable
- Remove .env copy from Dockerfile that was causing build to fail
- Update README to distinguish local (export) vs Docker (-e) env var usage
- Add CLAUDE_CODE_MAX_OUTPUT_TOKENS to all Docker run examples
The correct variable is CLAUDE_CODE_MAX_OUTPUT_TOKENS (not CLAUDE_CODE_MAX_TOKENS)
and should be passed at runtime via -e flag for Docker or export for local runs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Consolidates SQL Injection and Command Injection references to the unified "Injection" terminology for consistency with agent naming and OWASP categorization.
Changes:
- Updated feature descriptions and vulnerability lists
- Modified architecture diagrams
- Simplified targeted vulnerability scope
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Updates .gitignore to only ignore top-level audit-logs/ directory, allowing xben-benchmark-results audit logs to be tracked. This enables full reproducibility of benchmark runs with complete session data, prompts, and agent execution logs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit adds a professional performance comparison chart showing Shannon's 96% success rate against other autonomous pentesting systems on the X-Bow benchmark.
Chart features:
- Y-axis properly starts at 0% (honest data visualization)
- Shannon bar highlighted in brand orange
- Descriptive title with sample size (104 challenges)
- SVG format for scalability
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit adds comprehensive X-Bow (XBEN) benchmark results demonstrating Shannon's performance across 104 CTF security challenges. Each test case includes detailed penetration testing reports and exploitation evidence for reproducible research.
Contents:
- 104 XBEN test case directories (XBEN-001-24 through XBEN-104-24)
- Deliverables including analysis reports and exploitation evidence
- Individual test case results with vulnerability assessments
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add a TIP callout in the Overview section documenting the ctf-mode branch
for users who want to run Shannon against Capture-The-Flag challenges with
optimized flag extraction prompts.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Change agent prefix from [SQLi/Cmd] to [Injection] to reflect expanded scope
- Add README documentation for CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable
This update aligns the display naming with the expanded injection analysis scope
that now covers SQLi, Command Injection, LFI/RFI, SSTI, Path Traversal, and
Insecure Deserialization vulnerabilities.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes responsibility gap where agents found vulnerabilities but rejected them as "out of scope"
Changes:
- vuln-injection.txt: Added LFI/RFI, SSTI, Path Traversal, Deserialization to scope
- Updated role definition and objective
- Added new vulnerability_type and slot_type enums
- Added sink definitions and defense rules for new injection classes
- Added witness payload examples
- pre-recon-code.txt: Expanded sink hunter agent to find file/template/deserialize sinks
- recon.txt: Updated Section 9 with clear injection source definitions for all types
- exploit-injection.txt: Updated evidence template to handle all injection types
Token-optimized: Condensed verbose sections while preserving critical guidance
Addresses XBEN benchmark failures where LFI/SSTI/Path Traversal were detected but excluded from exploitation queues
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Introduces .env file configuration to manage CLAUDE_CODE_MAX_TOKENS, allowing flexible control of the context window size for AI analysis sessions. This enables users to tune token limits based on their specific penetration testing needs without modifying code.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This reverts the timestamp-based naming scheme that was causing audit log
fragmentation. Each agent execution was creating a new folder because the
timestamp kept changing.
Reverting back to simple, stable naming: {hostname}_{sessionId}
This ensures ONE folder per session, preventing the bug where multiple
folders were created for the same session.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed bug where audit system would create duplicate folders for the same
session because it was using current time instead of the session's original
createdAt timestamp.
Bug behavior:
- Session created at T1 → folder: {T1}_app_host_id/
- Audit re-initialized at T2 → NEW folder: {T2}_app_host_id/
- Result: 2 folders per session with same ID but different timestamps
Root cause:
- metrics-tracker.js:65 was calling formatTimestamp() (current time)
- Should use sessionMetadata.createdAt (original creation time)
Impact: Each running benchmark was creating 2 audit log folders instead of 1
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Enhances audit log directory naming from `{hostname}_{uuid}` to
`{timestamp}_{appName}_{hostname}_{shortId}` for better discoverability
and benchmarking analysis.
Changes:
- Add extractAppName() helper to extract app name from config files
- Add smart fallback: use port number for localhost without config
- Update generateSessionIdentifier() to include timestamp prefix
- Shorten session ID to first 8 characters for readability
Examples:
- With config: 20251025T193847Z_myapp_localhost_efc60ee0/
- Without config: 20251025T193913Z_8080_localhost_d47e3bfd/
- Remote: 20251024T004401Z_noconfig_example-com_d47e3bfd/
Benefits:
- Chronologically sortable audit logs
- Instant app identification in directory listings
- Efficient filtering for benchmarking queries
- Non-breaking: existing logs keep their names
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>