Consolidates SQL Injection and Command Injection references to the unified "Injection" terminology for consistency with agent naming and OWASP categorization.
Changes:
- Updated feature descriptions and vulnerability lists
- Modified architecture diagrams
- Simplified targeted vulnerability scope
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Updates .gitignore to only ignore top-level audit-logs/ directory, allowing xben-benchmark-results audit logs to be tracked. This enables full reproducibility of benchmark runs with complete session data, prompts, and agent execution logs.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit adds a professional performance comparison chart showing Shannon's 96% success rate against other autonomous pentesting systems on the X-Bow benchmark.
Chart features:
- Y-axis properly starts at 0% (honest data visualization)
- Shannon bar highlighted in brand orange
- Descriptive title with sample size (104 challenges)
- SVG format for scalability
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit adds comprehensive X-Bow (XBEN) benchmark results demonstrating Shannon's performance across 104 CTF security challenges. Each test case includes detailed penetration testing reports and exploitation evidence for reproducible research.
Contents:
- 104 XBEN test case directories (XBEN-001-24 through XBEN-104-24)
- Deliverables including analysis reports and exploitation evidence
- Individual test case results with vulnerability assessments
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add a TIP callout in the Overview section documenting the ctf-mode branch
for users who want to run Shannon against Capture-The-Flag challenges with
optimized flag extraction prompts.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add a TIP callout in the Overview section documenting the ctf-mode branch
for users who want to run Shannon against Capture-The-Flag challenges with
optimized flag extraction prompts.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Change agent prefix from [SQLi/Cmd] to [Injection] to reflect expanded scope
- Add README documentation for CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable
This update aligns the display naming with the expanded injection analysis scope
that now covers SQLi, Command Injection, LFI/RFI, SSTI, Path Traversal, and
Insecure Deserialization vulnerabilities.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes responsibility gap where agents found vulnerabilities but rejected them as "out of scope"
Changes:
- vuln-injection.txt: Added LFI/RFI, SSTI, Path Traversal, Deserialization to scope
- Updated role definition and objective
- Added new vulnerability_type and slot_type enums
- Added sink definitions and defense rules for new injection classes
- Added witness payload examples
- pre-recon-code.txt: Expanded sink hunter agent to find file/template/deserialize sinks
- recon.txt: Updated Section 9 with clear injection source definitions for all types
- exploit-injection.txt: Updated evidence template to handle all injection types
Token-optimized: Condensed verbose sections while preserving critical guidance
Addresses XBEN benchmark failures where LFI/SSTI/Path Traversal were detected but excluded from exploitation queues
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Introduces .env file configuration to manage CLAUDE_CODE_MAX_TOKENS, allowing flexible control of the context window size for AI analysis sessions. This enables users to tune token limits based on their specific penetration testing needs without modifying code.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This reverts the timestamp-based naming scheme that was causing audit log
fragmentation. Each agent execution was creating a new folder because the
timestamp kept changing.
Reverting back to simple, stable naming: {hostname}_{sessionId}
This ensures ONE folder per session, preventing the bug where multiple
folders were created for the same session.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed bug where audit system would create duplicate folders for the same
session because it was using current time instead of the session's original
createdAt timestamp.
Bug behavior:
- Session created at T1 → folder: {T1}_app_host_id/
- Audit re-initialized at T2 → NEW folder: {T2}_app_host_id/
- Result: 2 folders per session with same ID but different timestamps
Root cause:
- metrics-tracker.js:65 was calling formatTimestamp() (current time)
- Should use sessionMetadata.createdAt (original creation time)
Impact: Each running benchmark was creating 2 audit log folders instead of 1
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Enhances audit log directory naming from `{hostname}_{uuid}` to
`{timestamp}_{appName}_{hostname}_{shortId}` for better discoverability
and benchmarking analysis.
Changes:
- Add extractAppName() helper to extract app name from config files
- Add smart fallback: use port number for localhost without config
- Update generateSessionIdentifier() to include timestamp prefix
- Shorten session ID to first 8 characters for readability
Examples:
- With config: 20251025T193847Z_myapp_localhost_efc60ee0/
- Without config: 20251025T193913Z_8080_localhost_d47e3bfd/
- Remote: 20251024T004401Z_noconfig_example-com_d47e3bfd/
Benefits:
- Chronologically sortable audit logs
- Instant app identification in directory listings
- Efficient filtering for benchmarking queries
- Non-breaking: existing logs keep their names
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Resolves Playwright browser installation failures in Docker by using Wolfi's
system Chromium instead of downloading Playwright's bundled browsers at runtime.
## Problem
When running in Docker, agents attempted to install browsers via `browser_install`
tool, which failed due to:
- Permission issues (non-root user couldn't install system dependencies)
- npx @playwright/mcp spawns with its own Playwright dependency separate from
global installations
- Playwright's bundled browsers require runtime download (~280MB) and glibc deps
- Environment variables alone (PLAYWRIGHT_BROWSERS_PATH) weren't sufficient
## Solution
**Dockerfile changes:**
- Use Wolfi's native `chromium` package (guaranteed compatible, already installed)
- Remove Playwright browser installation step (saves ~280MB and build time)
- Add explicit `SHANNON_DOCKER=true` environment variable for reliable detection
- Set PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH to point to system Chromium
**Code changes (claude-executor.js):**
- Detect Docker via `process.env.SHANNON_DOCKER` (more reliable than /.dockerenv)
- Conditionally add `--executable-path /usr/bin/chromium-browser` CLI arg for Docker
- Local: Use Playwright's bundled browsers (downloaded to ~/Library/Caches/)
- Docker: Use system Chromium with no runtime downloads
## Research Findings
- @playwright/mcp has separate playwright-core dependency (v1.56.0-alpha)
- MCP server spawned via npx doesn't inherit browser binaries from global install
- --executable-path CLI argument is required (env vars insufficient)
- /.dockerenv file is unreliable (missing in BuildKit, K8s, can be spoofed)
## Testing
✅ Docker: All 5 parallel agents successfully navigate, screenshot, create deliverables
✅ Local: All 5 parallel agents successfully navigate, screenshot, create deliverables
✅ No browser_install calls, no permission errors
✅ Image size reduced by ~280MB
Fixes #docker-playwright-browser-issues
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Reduce prompts/pipeline-testing/report-executive.txt from 137 to 30 lines by:
- Removing hardcoded detailed vulnerability content
- Testing actual workflow (read → modify → save) instead of creating from scratch
- Removing meta-commentary, keeping only direct instructions
- Making it consistent with other pipeline testing prompts (30 lines like exploit agents)
The prompt now properly mimics the real reporting agent behavior where the orchestration code stitches files first, then the agent modifies the result.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Implemented @include() directive system to eliminate ~800 lines of duplicated content across 10 specialist prompt files. All prompt-related content now consolidated under prompts/ directory for better maintainability.
Changes:
- Added processIncludes() to prompt-manager.js for generic @include() support
- Created prompts/shared/ with 5 reusable template files
- Refactored all 10 specialist prompts to use @include() for common sections
- Moved login_instructions.txt to prompts/shared/ (deleted login_resources/)
- Updated CLAUDE.md to reflect new structure
Impact: -137 net lines, zero breaking changes, infinitely scalable for future shared content.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Anthropic rebranded the SDK in 2025 from "Claude Code SDK" to "Claude Agent SDK". Updated all references across package.json, Dockerfile, and documentation to use the current @anthropic-ai/claude-agent-sdk package.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
ROOT CAUSE:
- Exploitation phase checked session.validationResults to determine eligibility
- validationResults field was removed during audit system refactor
- Field never existed in session schema, so all exploits were skipped
THE FIX:
- Exploitation phase now validates queue files directly when checking eligibility
- Reads exploitation_queue.json and checks if vulnerabilities array is non-empty
- No need to store validation results - just re-validate on demand
CHANGES:
1. runParallelExploit() now calls safeValidateQueueAndDeliverable() directly
2. Removed validationResults parameter from markAgentCompleted()
3. Simplified calculateVulnerabilityAnalysisSummary() - no longer needs validation data
4. Simplified calculateExploitationSummary() - no longer needs validation data
IMPACT:
- Exploitation agents will now run when vulnerabilities are found
- Queue files are the single source of truth for eligibility
- Simpler architecture - no duplicate state storage
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Reasoning:
- Pollutes target repo with run-metadata.json
- Redundant with audit system (session.json has all metadata)
- Less useful than comprehensive audit logs
- Target repos should stay clean - only deliverables belong there
All debugging info now lives in audit-logs/{hostname}_{sessionId}/session.json
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Added comprehensive header comment explaining use case
- Documents data source (session.json from audit-logs)
- CSV output format and use cases clearly described
- Includes usage examples and note about raw data access
- Removes need for separate docs/ folder in repo
Docs were design artifacts, not needed in open source repo.
All relevant documentation now lives in code comments.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Reasoning:
- Shannon is a local CLI tool with direct filesystem access
- Manual file editing (JSON, rm -rf) is simpler than reconciliation script
- Automatic reconciliation runs before every command (built-in)
- If auto-reconciliation has bugs, fix the code, don't create workarounds
- Over-engineered for a local development tool
For recovery: Just delete .shannon-store.json or edit JSON files directly
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove unnecessary screenshot storage to reduce file I/O and disk usage:
- Removed screenshot directory creation
- Removed --output-dir flag from Playwright MCP setup
- Agents can still take screenshots, but they won't persist to disk
Screenshots were not being used by any part of Shannon for analysis
or reporting, making their storage unnecessary overhead.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Simplified deliverable management by removing automatic copying to ~/Documents/pentest-deliverables/. All deliverables now remain only in <target-repo>/deliverables/, eliminating file duplication and improving UX.
Changes:
- Removed savePermanentDeliverables() function from src/setup/deliverables.js
- Removed function call and related console output from shannon.mjs
- Removed unused 'os' import from deliverables.js
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>