shannon

mirror of https://github.com/KeygraphHQ/shannon.git synced 2026-07-05 04:38:03 +02:00

Author	SHA1	Message	Date
ajmallesh	515ade8302	fix: configure git to trust all directories in Docker Co-Authored-By: Khaushik-keygraph <khaushik.contractor@keygraph.io>	2025-12-15 10:34:25 -08:00
ajmallesh	26b42ecd67	docs: add Docker instructions for testing local applications Co-Authored-By: Khaushik-keygraph <khaushik.contractor@keygraph.io>	2025-12-15 10:34:24 -08:00
Khaushik-keygraph	37409a24fb	chore: added disable loader functionality	2025-12-10 00:59:56 +05:30
Arjun Malleswaran	42687d30fb	Merge pull request #19 from KeygraphHQ/additional-flags chore: added flag additions for minimizing logs	2025-12-09 10:33:36 -08:00
Khaushik-keygraph	ad0d1a04e9	chore: added flag additions for minimizing logs	2025-12-09 23:59:12 +05:30
Arjun Malleswaran	0d3812cdd2	Merge pull request #18 from KeygraphHQ/16-windows-defender-flags-benchmark-deliverables-as-backdoorphpperhetshell-during-local-use docs: add Windows Defender false positive guidance	2025-12-08 10:20:51 -08:00
ajmallesh	cecb64729f	docs: add Windows Defender false positive guidance Closes #16	2025-12-02 19:07:37 -08:00
ajmallesh	c7de6636d9	docs: update Discord invite links	2025-12-01 09:24:19 -08:00
ajmallesh	7c2edeb4c0	chore: change license to AGPL-3.0	2025-11-26 18:45:36 -08:00
ajmallesh	9d20d94dda	docs: clarify Shannon is a white-box pentesting tool - Add prominent callout that Shannon Lite is designed for white-box (source-available) application security testing - Update XBOW benchmark description to "hint-free, source-aware" - Clarify benchmark comparison context (white-box vs black-box results) - Update benchmark performance comparison image 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-24 12:37:55 -08:00
Khaushik-keygraph	a804c94834	chore: added licensing to dockerfile	2025-11-22 20:46:15 +05:30
keygraphVarun	20cdf0b026	fix link	2025-11-22 20:43:09 +05:30
keygraphVarun	7e0b2b28fe	cleanup	2025-11-22 20:43:09 +05:30
keygraphVarun	a52c1ab7c3	consistency on score	2025-11-22 20:43:09 +05:30
ajmallesh	719bf03293	fix: resolve Docker build failure and clarify env var configuration - Remove .env file with incorrect CLAUDE_CODE_MAX_TOKENS variable - Remove .env copy from Dockerfile that was causing build to fail - Update README to distinguish local (export) vs Docker (-e) env var usage - Add CLAUDE_CODE_MAX_OUTPUT_TOKENS to all Docker run examples The correct variable is CLAUDE_CODE_MAX_OUTPUT_TOKENS (not CLAUDE_CODE_MAX_TOKENS) and should be passed at runtime via -e flag for Docker or export for local runs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-19 10:28:44 -08:00
Khaushik-keygraph	23618f1fd1	fix: removed comments	2025-11-13 20:33:58 +05:30
keygraphVarun	68ec5ccc5a	style changes	2025-11-13 20:28:15 +05:30
keygraphVarun	f4f320dcb5	Link to benchmark	2025-11-13 20:27:26 +05:30
ajmallesh	614caa1787	chore: add licensing comments to prompts	2025-11-13 17:53:41 +05:30
ajmallesh	acc4a1b032	Update license references from BSL to MPL in documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-13 17:48:05 +05:30
Arjun Malleswaran	323720f3b0	Merge pull request #14 from KeygraphHQ/license-change License change	2025-11-13 16:57:18 +05:30
Arjun Malleswaran	98e79d0125	Update LICENSE	2025-11-13 16:56:19 +05:30
ajmallesh	e4eb59870a	chore: add MPL license comments	2025-11-13 16:55:13 +05:30
Arjun Malleswaran	6e7a7ec1cd	Update README.md	2025-11-04 08:47:18 -08:00
Arjun Malleswaran	b5c286fc80	Update README.md	2025-11-04 08:46:15 -08:00
ajmallesh	fe351604f9	Update README.md	2025-11-03 20:23:16 -08:00
ajmallesh	bfaffe89e6	Merge branch 'main' of github.com:KeygraphHQ/shannon	2025-11-03 20:22:27 -08:00
ajmallesh	5f24311a4e	Update README.md	2025-11-03 20:22:18 -08:00
Arjun Malleswaran	236c4d2a2f	Merge pull request #9 from KeygraphHQ/adding-xben-results Update README.md	2025-11-03 20:19:55 -08:00
ajmallesh	ce0d7b96c2	Update README.md	2025-11-03 20:16:08 -08:00
Arjun Malleswaran	b45e3e2844	Merge pull request #7 from KeygraphHQ/adding-xben-results Adding xben results	2025-11-03 20:04:45 -08:00
ajmallesh	a909572596	Update README.md	2025-11-03 20:04:21 -08:00
ajmallesh	bb4aa03dd1	docs: add benchmarks README	2025-11-03 20:03:06 -08:00
ajmallesh	abfc4eba82	Rename SQLi/Command Injection to Injection throughout README Consolidates SQL Injection and Command Injection references to the unified "Injection" terminology for consistency with agent naming and OWASP categorization. Changes: - Updated feature descriptions and vulnerability lists - Modified architecture diagrams - Simplified targeted vulnerability scope 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-03 16:56:40 -08:00
ajmallesh	d5b064e0c0	Add audit logs and update gitignore for xben results Updates .gitignore to only ignore top-level audit-logs/ directory, allowing xben-benchmark-results audit logs to be tracked. This enables full reproducibility of benchmark runs with complete session data, prompts, and agent execution logs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-03 16:29:56 -08:00
ajmallesh	e1f369b233	Add X-Bow benchmark performance visualization This commit adds a professional performance comparison chart showing Shannon's 96% success rate against other autonomous pentesting systems on the X-Bow benchmark. Chart features: - Y-axis properly starts at 0% (honest data visualization) - Shannon bar highlighted in brand orange - Descriptive title with sample size (104 challenges) - SVG format for scalability 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-03 12:34:55 -08:00
ajmallesh	ca5515c23c	Add X-Bow benchmark results (104 test cases) This commit adds comprehensive X-Bow (XBEN) benchmark results demonstrating Shannon's performance across 104 CTF security challenges. Each test case includes detailed penetration testing reports and exploitation evidence for reproducible research. Contents: - 104 XBEN test case directories (XBEN-001-24 through XBEN-104-24) - Deliverables including analysis reports and exploitation evidence - Individual test case results with vulnerability assessments 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-03 12:34:41 -08:00
ajmallesh	92db01bd2d	docs: add ctf-mode branch documentation to README Add a TIP callout in the Overview section documenting the ctf-mode branch for users who want to run Shannon against Capture-The-Flag challenges with optimized flag extraction prompts. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-03 10:35:45 -08:00
ajmallesh	34850477a2	refactor: update injection display name and add max tokens docs - Change agent prefix from [SQLi/Cmd] to [Injection] to reflect expanded scope - Add README documentation for CLAUDE_CODE_MAX_OUTPUT_TOKENS environment variable This update aligns the display naming with the expanded injection analysis scope that now covers SQLi, Command Injection, LFI/RFI, SSTI, Path Traversal, and Insecure Deserialization vulnerabilities. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-03 10:21:17 -08:00
ajmallesh	d82d1fa753	feat: expand injection analysis scope to cover LFI/RFI/SSTI/Path Traversal/Deserialization Fixes responsibility gap where agents found vulnerabilities but rejected them as "out of scope" Changes: - vuln-injection.txt: Added LFI/RFI, SSTI, Path Traversal, Deserialization to scope - Updated role definition and objective - Added new vulnerability_type and slot_type enums - Added sink definitions and defense rules for new injection classes - Added witness payload examples - pre-recon-code.txt: Expanded sink hunter agent to find file/template/deserialize sinks - recon.txt: Updated Section 9 with clear injection source definitions for all types - exploit-injection.txt: Updated evidence template to handle all injection types Token-optimized: Condensed verbose sections while preserving critical guidance Addresses XBEN benchmark failures where LFI/SSTI/Path Traversal were detected but excluded from exploitation queues 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-03 10:20:15 -08:00
ajmallesh	0b9580a99a	feat: add environment variable support for Claude Code token limits Introduces .env file configuration to manage CLAUDE_CODE_MAX_TOKENS, allowing flexible control of the context window size for AI analysis sessions. This enables users to tune token limits based on their specific penetration testing needs without modifying code. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-30 10:53:42 -07:00
ajmallesh	cc36fe933d	fix: err handling for claude code session limit	2025-10-30 10:28:35 -07:00
ajmallesh	5b92ff52c4	chore: print audit logs folder location	2025-10-28 10:31:00 -07:00
ajmallesh	d8efd78ac0	Merge pull request #3 from KeygraphHQ/feature/improve-audit-log-naming Feature/improve audit log naming	2025-10-27 14:56:57 -07:00
ajmallesh	a099500d9b	Revert "feat: improve audit log naming with timestamp and app context" This reverts the timestamp-based naming scheme that was causing audit log fragmentation. Each agent execution was creating a new folder because the timestamp kept changing. Reverting back to simple, stable naming: {hostname}_{sessionId} This ensures ONE folder per session, preventing the bug where multiple folders were created for the same session. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-27 13:30:25 -07:00
ajmallesh	f0b8c3aa6e	fix: use session's original createdAt instead of current time Fixed bug where audit system would create duplicate folders for the same session because it was using current time instead of the session's original createdAt timestamp. Bug behavior: - Session created at T1 → folder: {T1}_app_host_id/ - Audit re-initialized at T2 → NEW folder: {T2}_app_host_id/ - Result: 2 folders per session with same ID but different timestamps Root cause: - metrics-tracker.js:65 was calling formatTimestamp() (current time) - Should use sessionMetadata.createdAt (original creation time) Impact: Each running benchmark was creating 2 audit log folders instead of 1 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-27 10:55:53 -07:00
ajmallesh	258830b030	feat: improve audit log naming with timestamp and app context Enhances audit log directory naming from `{hostname}_{uuid}` to `{timestamp}_{appName}_{hostname}_{shortId}` for better discoverability and benchmarking analysis. Changes: - Add extractAppName() helper to extract app name from config files - Add smart fallback: use port number for localhost without config - Update generateSessionIdentifier() to include timestamp prefix - Shorten session ID to first 8 characters for readability Examples: - With config: 20251025T193847Z_myapp_localhost_efc60ee0/ - Without config: 20251025T193913Z_8080_localhost_d47e3bfd/ - Remote: 20251024T004401Z_noconfig_example-com_d47e3bfd/ Benefits: - Chronologically sortable audit logs - Instant app identification in directory listings - Efficient filtering for benchmarking queries - Non-breaking: existing logs keep their names 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-27 10:14:19 -07:00
ajmallesh	d85b6af5f5	Merge pull request #2 from KeygraphHQ/fixing-bugs Fixing bugs	2025-10-23 18:18:21 -07:00
ajmallesh	f40f52f118	fix: enable Playwright MCP browser automation in Docker containers Resolves Playwright browser installation failures in Docker by using Wolfi's system Chromium instead of downloading Playwright's bundled browsers at runtime. ## Problem When running in Docker, agents attempted to install browsers via `browser_install` tool, which failed due to: - Permission issues (non-root user couldn't install system dependencies) - npx @playwright/mcp spawns with its own Playwright dependency separate from global installations - Playwright's bundled browsers require runtime download (~280MB) and glibc deps - Environment variables alone (PLAYWRIGHT_BROWSERS_PATH) weren't sufficient ## Solution Dockerfile changes: - Use Wolfi's native `chromium` package (guaranteed compatible, already installed) - Remove Playwright browser installation step (saves ~280MB and build time) - Add explicit `SHANNON_DOCKER=true` environment variable for reliable detection - Set PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH to point to system Chromium Code changes (claude-executor.js): - Detect Docker via `process.env.SHANNON_DOCKER` (more reliable than /.dockerenv) - Conditionally add `--executable-path /usr/bin/chromium-browser` CLI arg for Docker - Local: Use Playwright's bundled browsers (downloaded to ~/Library/Caches/) - Docker: Use system Chromium with no runtime downloads ## Research Findings - @playwright/mcp has separate playwright-core dependency (v1.56.0-alpha) - MCP server spawned via npx doesn't inherit browser binaries from global install - --executable-path CLI argument is required (env vars insufficient) - /.dockerenv file is unreliable (missing in BuildKit, K8s, can be spoofed) ## Testing ✅ Docker: All 5 parallel agents successfully navigate, screenshot, create deliverables ✅ Local: All 5 parallel agents successfully navigate, screenshot, create deliverables ✅ No browser_install calls, no permission errors ✅ Image size reduced by ~280MB Fixes #docker-playwright-browser-issues 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-23 17:56:19 -07:00
ajmallesh	f2870e3340	refactor: simplify pipeline testing report prompt by 78% Reduce prompts/pipeline-testing/report-executive.txt from 137 to 30 lines by: - Removing hardcoded detailed vulnerability content - Testing actual workflow (read → modify → save) instead of creating from scratch - Removing meta-commentary, keeping only direct instructions - Making it consistent with other pipeline testing prompts (30 lines like exploit agents) The prompt now properly mimics the real reporting agent behavior where the orchestration code stitches files first, then the agent modifies the result. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-10-23 17:13:25 -07:00


82 Commits