Mirror of https://github.com/KeygraphHQ/shannon.git, synced 2026-06-30 18:45:34 +02:00

Compare commits (10 commits)

| SHA1 |
|---|
| 5a2f78c5d9 |
| 7abcc1d3e1 |
| 4be4853fd3 |
| cb6cbf101d |
| 63ca5604a1 |
| 8fb62a59d6 |
| c259a34ed9 |
| 10b26355be |
| 82b5278541 |
| 8b956c9972 |
+21
-1
@@ -1,6 +1,9 @@
# Shannon Environment Configuration
# Copy this file to .env and fill in your credentials

# Recommended output token configuration for larger tool outputs
CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000

# Adaptive thinking is enabled automatically on Opus 4.6/4.7/4.8. Set to false to disable.
# CLAUDE_ADAPTIVE_THINKING=false

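The toggle in this hunk amounts to a two-step decision: does the model support adaptive thinking, and has the operator switched it off? A minimal TypeScript sketch follows. The diff names `supportsAdaptiveThinking` (in `apps/worker/src/ai/models.ts`) but does not show its body, so the model-ID pattern and the `adaptiveThinkingEnabled` helper below are assumptions made for illustration.

```typescript
// Hypothetical sketch; not the code in apps/worker/src/ai/models.ts.
type Env = Record<string, string | undefined>;

// Assumption: Opus 4.6/4.7/4.8 model IDs contain "opus-4-6", "opus-4-7", or "opus-4-8".
function supportsAdaptiveThinking(modelId: string): boolean {
  return /opus-4-[678](?!\d)/.test(modelId);
}

// Adaptive thinking is on by default for supported models;
// CLAUDE_ADAPTIVE_THINKING=false switches it off.
function adaptiveThinkingEnabled(modelId: string, env: Env): boolean {
  if (!supportsAdaptiveThinking(modelId)) return false;
  const raw = env["CLAUDE_ADAPTIVE_THINKING"];
  return raw === undefined || raw.trim().toLowerCase() !== "false";
}
```

Passing the environment as a plain record rather than reading a global keeps the sketch easy to exercise with different settings.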
@@ -26,7 +29,7 @@ ANTHROPIC_API_KEY=your-api-key-here
# Model Tier Overrides (Anthropic API / OAuth / Custom Base URL / Bedrock)
# =============================================================================
# Override which model is used for each tier. Defaults are used if not set.
# Optional for direct Anthropic and custom base URL modes. Required for Bedrock.
# Optional for direct Anthropic and custom base URL modes. Required for Bedrock/Vertex.
# ANTHROPIC_SMALL_MODEL=... # Small tier (default: claude-haiku-4-5-20251001)
# ANTHROPIC_MEDIUM_MODEL=... # Medium tier (default: claude-sonnet-4-6)
# ANTHROPIC_LARGE_MODEL=... # Large tier (default: claude-opus-4-8)
@@ -44,3 +47,20 @@ ANTHROPIC_API_KEY=your-api-key-here
# CLAUDE_CODE_USE_BEDROCK=1
# AWS_REGION=us-east-1
# AWS_BEARER_TOKEN_BEDROCK=your-bearer-token

# =============================================================================
# OPTION 4: Google Vertex AI
# =============================================================================
# https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-partner-models
# Requires a GCP service account with roles/aiplatform.user.
# Download the SA key JSON from GCP Console (IAM > Service Accounts > Keys).
# Requires the model tier overrides above to be set with Vertex AI model IDs.
# Example Vertex AI model IDs:
# ANTHROPIC_SMALL_MODEL=claude-haiku-4-5@20251001
# ANTHROPIC_MEDIUM_MODEL=claude-sonnet-4-6
# ANTHROPIC_LARGE_MODEL=claude-opus-4-8

# CLAUDE_CODE_USE_VERTEX=1
# CLOUD_ML_REGION=us-east5
# ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
# GOOGLE_APPLICATION_CREDENTIALS=./credentials/google-sa-key.json
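The tier rules in the env template (defaults for direct Anthropic and custom base URL modes, mandatory overrides for Bedrock and Vertex) can be sketched in TypeScript as below. The default model IDs and variable names come from the template; the `resolveModelTiers` function and its error message are hypothetical and only illustrate the rule, not Shannon's actual resolver.

```typescript
// Hypothetical sketch of the tier rules described in the env template.
type Env = Record<string, string | undefined>;
type ModelTiers = { small: string; medium: string; large: string };

// Defaults as listed in the template comments.
const DEFAULT_TIERS: ModelTiers = {
  small: "claude-haiku-4-5-20251001",
  medium: "claude-sonnet-4-6",
  large: "claude-opus-4-8",
};

function resolveModelTiers(env: Env): ModelTiers {
  const small = env["ANTHROPIC_SMALL_MODEL"];
  const medium = env["ANTHROPIC_MEDIUM_MODEL"];
  const large = env["ANTHROPIC_LARGE_MODEL"];

  // Bedrock and Vertex use provider-specific model IDs, so overrides are required.
  const overridesRequired =
    env["CLAUDE_CODE_USE_BEDROCK"] === "1" || env["CLAUDE_CODE_USE_VERTEX"] === "1";
  if (overridesRequired) {
    const missing: string[] = [];
    if (!small) missing.push("ANTHROPIC_SMALL_MODEL");
    if (!medium) missing.push("ANTHROPIC_MEDIUM_MODEL");
    if (!large) missing.push("ANTHROPIC_LARGE_MODEL");
    if (missing.length > 0) {
      throw new Error("Missing required model overrides: " + missing.join(", "));
    }
  }

  // Unset or empty overrides fall back to the defaults.
  return {
    small: small || DEFAULT_TIERS.small,
    medium: medium || DEFAULT_TIERS.medium,
    large: large || DEFAULT_TIERS.large,
  };
}
```

Failing early on a missing override surfaces the misconfiguration before any request reaches the provider, where an Anthropic-style default ID would presumably be rejected.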
@@ -4,7 +4,7 @@ AI-powered penetration testing agent for defensive security analysis. Automates

## Commands

**Prerequisites:** Docker, AI provider credentials (`.env` for local, `shn setup` or env vars for npx)
**Prerequisites:** Docker, AI provider credentials (`.env` for local, `npx @keygraph/shannon setup` or env vars for npx)

### Dual CLI

@@ -15,8 +15,8 @@ Shannon supports two CLI modes, auto-detected based on the current working direc
| **Install** | Zero-install via npm | Clone the repo |
| **Image** | Pulled from Docker Hub (`keygraph/shannon:latest`) | Built locally (`shannon-worker`) |
| **State** | `~/.shannon/` | Project directory |
| **Credentials** | `~/.shannon/config.toml` (via `shn setup`) or env vars | `./.env` |
| **Config** | `~/.shannon/config.toml` (via `shn setup`) | N/A |
| **Credentials** | `~/.shannon/config.toml` (via `npx @keygraph/shannon setup`) or env vars | `./.env` |
| **Config** | `~/.shannon/config.toml` (via `npx @keygraph/shannon setup`) | N/A |
| **Prompts** | Bundled in Docker image | Mounted from `./apps/worker/prompts/` (live-editable) |

Mode auto-detection: local mode activates when env var `SHANNON_LOCAL=1` is set by the `./shannon` entry point (`apps/cli/src/mode.ts`). Otherwise npx mode.
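The mode rule and the per-mode credential sources can be sketched in a few lines of TypeScript. The real logic lives in `apps/cli/src/mode.ts`, which this diff does not show; the `detectMode` and `credentialSources` names are hypothetical, and the lookup order follows the credential-resolution note in the Configuration bullet (env vars first in both modes).

```typescript
// Hypothetical sketch; not the code in apps/cli/src/mode.ts.
type Env = Record<string, string | undefined>;
type CliMode = "local" | "npx";

// Local mode is active only when SHANNON_LOCAL=1; anything else means npx mode.
function detectMode(env: Env): CliMode {
  return env["SHANNON_LOCAL"] === "1" ? "local" : "npx";
}

// Credential sources in lookup order; env vars come first in both modes.
function credentialSources(mode: CliMode): string[] {
  return mode === "local"
    ? ["env vars", "./.env"]
    : ["env vars", "~/.shannon/config.toml"];
}
```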
@@ -122,7 +122,7 @@ Infra (Temporal) runs via `docker-compose.yml`. Workers are ephemeral `docker ru
- `apps/worker/src/paths.ts` — Centralized path constants (`PROMPTS_DIR`, `CONFIGS_DIR`, `WORKSPACES_DIR`)
- `apps/worker/src/session-manager.ts` — Agent definitions (`AGENTS` record). Agent types in `apps/worker/src/types/agents.ts`
- `apps/worker/src/config-parser.ts` — YAML config parsing with JSON Schema validation
- `apps/worker/src/ai/pi-executor.ts` — pi harness integration (retry disabled; Temporal owns retry)
- `apps/worker/src/ai/claude-executor.ts` — Claude Agent SDK integration with retry logic
- `apps/worker/src/services/` — Business logic layer (Temporal-agnostic). Activities delegate here. Key: `agent-execution.ts`, `error-handling.ts`, `container.ts`
- `apps/worker/src/types/` — Consolidated types: `Result<T,E>`, `ErrorCode`, `AgentName`, `ActivityLogger`, etc.
- `apps/worker/src/utils/` — Shared utilities (file I/O, formatting, concurrency)
@@ -145,9 +145,9 @@ Durable workflow orchestration with crash recovery, queryable progress, intellig
5. **Reporting** (`report`) — Executive-level security report

### Supporting Systems
- **Configuration** — YAML configs in `apps/worker/configs/` with JSON Schema validation (`config-schema.json`). Supports auth settings (MFA/TOTP), URL/code rule scoping (`rules.avoid`/`rules.focus`), run-scope steering (`vuln_classes`, `exploit`), free-form `rules_of_engagement`, and post-hoc `report` filters (`min_severity`, `min_confidence`, `guidance`). `code_path` avoid rules are enforced via the `@gotgenes/pi-permission-system` extension: `apps/worker/src/temporal/activities.ts:syncCodePathDenyRules` writes a global `path` deny config once per workflow (`apps/worker/src/ai/settings-writer.ts:writeCodePathPermissionConfig`), and the executor loads the extension when that config is present (`apps/worker/src/ai/pi-executor.ts`), so denies fire across every tool and child `task` session. `vuln_classes`/`exploit` scope is locked into `session.json` on first run; resumes with a different scope fail fast (`persistOrValidateRunScope`). Credential resolution — local mode: env vars → `./.env`; npx mode: env vars → `~/.shannon/config.toml` (via `shn setup`)
- **Configuration** — YAML configs in `apps/worker/configs/` with JSON Schema validation (`config-schema.json`). Supports auth settings (MFA/TOTP), URL/code rule scoping (`rules.avoid`/`rules.focus`), run-scope steering (`vuln_classes`, `exploit`), free-form `rules_of_engagement`, and post-hoc `report` filters (`min_severity`, `min_confidence`, `guidance`). `code_path` avoid rules are written into `~/.claude/settings.json` `permissions.deny` (`Read`/`Edit`) once per workflow by `apps/worker/src/temporal/activities.ts:syncCodePathDenyRules` so the SDK enforces them at the tool layer even in `bypassPermissions` mode. `vuln_classes`/`exploit` scope is locked into `session.json` on first run; resumes with a different scope fail fast (`persistOrValidateRunScope`). Credential resolution — local mode: env vars → `./.env`; npx mode: env vars → `~/.shannon/config.toml` (via `npx @keygraph/shannon setup`)
- **Prompts** — Per-phase templates in `apps/worker/prompts/` with variable substitution (`{{TARGET_URL}}`, `{{CONFIG_CONTEXT}}`). Shared partials in `apps/worker/prompts/shared/` via `apps/worker/src/services/prompt-manager.ts`, including `_code-path-rules.txt` (focus/avoid `[FILE]`/`[GLOB]` routing) and `_rules-of-engagement.txt` (free-text engagement rules). When `exploit: false`, `apps/worker/src/services/findings-renderer.ts` deterministically converts each `*_exploitation_queue.json` into a `*_findings.md` for report assembly — no LLM in the loop
- **Agent Harness (pi)** — Uses the **pi harness** (`@earendil-works/pi-coding-agent`, requires Node ≥ 22.19) via `apps/worker/src/ai/pi-executor.ts` (`runPiPrompt` → `createAgentSession`, retry disabled so Temporal owns retry). Models resolve through pi-ai in `apps/worker/src/ai/models.ts` (Anthropic / Bedrock / custom base URL via `ModelRegistry`+`AuthStorage`). pi ships no JSON-schema output or `Task`/`TodoWrite` built-ins, so structured queues are captured via a `submit_exploitation_queue` custom tool (`apps/worker/src/ai/queue-schemas.ts`), and `task` (read-only child sessions) + `todo_write` are provided as custom tools (`apps/worker/src/ai/tools.ts`); the per-phase MCP collectors are pi custom tools (TypeBox `defineTool` in `apps/worker/src/mcp-server/`). Adaptive thinking (pi's `medium` level) is enabled only on Opus 4.6/4.7/4.8 (`supportsAdaptiveThinking`); every other model runs with thinking `off`. Disable per-scan via `CLAUDE_ADAPTIVE_THINKING=false` (→ `off`) / `core.adaptive_thinking = false` (npx TOML). Browser automation via `playwright-cli` with session isolation (`-s=<session>`). TOTP generation via `generate-totp` CLI tool. Login flow template at `apps/worker/prompts/shared/login-instructions.txt` supports form, SSO, API, and basic auth. On authenticated whitebox scans, the `validate-authentication` preflight performs the single real login and saves the browser session to `auth-state.json` in the per-session audit directory (path from `authStateFile()` in `apps/worker/src/audit/utils.ts`, derived from `generateAuditPath()`). The validation activity (`apps/worker/src/services/validate-authentication.ts`) removes any stale file from a prior run before the agent runs and verifies the file parses and contains cookies or storage before the preflight is marked complete; `logWorkflowComplete` deletes it when the workflow ends so authenticated cookies don't sit on disk between scans. 
Agent prompts opt in to session reuse by `@include(shared/_shared-session.txt)` before their `<login_instructions>` block — the partial restores the session and falls through to the full login flow if verification fails. `vuln-auth`/`exploit-auth` omit the include and own their own login
- **SDK Integration** — Uses `@anthropic-ai/claude-agent-sdk` with `maxTurns: 10_000` and `bypassPermissions` mode. Adaptive thinking is enabled by default on Opus 4.6/4.7/4.8 (`supportsAdaptiveThinking` in `apps/worker/src/ai/models.ts`); disable per-scan via `CLAUDE_ADAPTIVE_THINKING=false` (env) or `core.adaptive_thinking = false` (npx TOML). Browser automation via `playwright-cli` with session isolation (`-s=<session>`). TOTP generation via `generate-totp` CLI tool. Login flow template at `apps/worker/prompts/shared/login-instructions.txt` supports form, SSO, API, and basic auth. On authenticated whitebox scans, the `validate-authentication` preflight performs the single real login and saves the browser session to `auth-state.json` in the per-session audit directory (path from `authStateFile()` in `apps/worker/src/audit/utils.ts`, derived from `generateAuditPath()`). The validation activity (`apps/worker/src/services/validate-authentication.ts`) removes any stale file from a prior run before the agent runs and verifies the file parses and contains cookies or storage before the preflight is marked complete; `logWorkflowComplete` deletes it when the workflow ends so authenticated cookies don't sit on disk between scans. Agent prompts opt in to session reuse by `@include(shared/_shared-session.txt)` before their `<login_instructions>` block — the partial restores the session and falls through to the full login flow if verification fails. `vuln-auth`/`exploit-auth` omit the include and own their own login
- **Audit System** — Crash-safe append-only logging in `workspaces/{hostname}_{sessionId}/`. Tracks session metrics, per-agent logs, prompts, and deliverables. WorkflowLogger (`apps/worker/src/audit/workflow-logger.ts`) provides unified human-readable per-workflow logs, backed by LogStream (`apps/worker/src/audit/log-stream.ts`) shared stream primitive
- **Deliverables** — Saved to `deliverables/` in the target repo via the `save-deliverable` CLI script (`apps/worker/src/scripts/save-deliverable.ts`)
- **Workspaces & Resume** — Named workspaces via `-w <name>` or auto-named from URL+timestamp. Resume detects completed agents via `session.json`. `loadResumeState()` in `apps/worker/src/temporal/activities.ts` validates deliverable existence, restores git checkpoints, and cleans up incomplete deliverables. Workspace listing via `apps/worker/src/temporal/workspaces.ts`
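The Prompts bullet mentions variable substitution with placeholders such as `{{TARGET_URL}}` and `{{CONFIG_CONTEXT}}`. A minimal TypeScript sketch of that idea is shown below. The real logic lives in `apps/worker/src/services/prompt-manager.ts`, which this diff does not show, so the `renderPrompt` name, the placeholder pattern, and the handling of unknown placeholders are all assumptions.

```typescript
// Hypothetical sketch; not the code in apps/worker/src/services/prompt-manager.ts.
function renderPrompt(template: string, vars: Record<string, string>): string {
  // Assumption: placeholders are upper-case names wrapped in double braces.
  return template.replace(/\{\{([A-Z0-9_]+)\}\}/g, (match: string, name: string) => {
    const value = vars[name];
    // Assumption: unknown placeholders are left untouched rather than raising an error.
    return value !== undefined ? value : match;
  });
}

// Usage: only TARGET_URL is supplied, so CONFIG_CONTEXT stays as a placeholder.
const rendered = renderPrompt("Scan {{TARGET_URL}} using {{CONFIG_CONTEXT}}", {
  TARGET_URL: "https://your-app.com",
});
```

Leaving unknown placeholders visible makes a missing variable easy to spot in the saved prompt; a stricter variant could throw instead.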
@@ -168,7 +168,7 @@ Durable workflow orchestration with crash recovery, queryable progress, intellig
### Key Design Patterns
- **Configuration-Driven** — YAML configs with JSON Schema validation
- **Progressive Analysis** — Each phase builds on previous results
- **Harness-First** — the pi harness (`@earendil-works/pi-coding-agent`) handles autonomous analysis
- **SDK-First** — Claude Agent SDK handles autonomous analysis
- **Modular Error Handling** — `ErrorCode` enum, `Result<T,E>` for explicit error propagation, automatic retry (3 attempts per agent)
- **Services Boundary** — Activities are thin Temporal wrappers; `apps/worker/src/services/` owns business logic, accepts `ActivityLogger`, returns `Result<T,E>`. No Temporal imports in services
- **DI Container** — Per-workflow in `apps/worker/src/services/container.ts`. `AuditSession` excluded (parallel safety)
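The error-handling pattern in this list (`Result<T,E>` for explicit propagation plus three attempts per agent) can be illustrated with a small TypeScript sketch. The shape of `Result` and the `runWithRetry` helper are hypothetical; the real types live in `apps/worker/src/types/`, and real agent runs are presumably asynchronous, whereas this sketch is synchronous for brevity.

```typescript
// Hypothetical shapes; the real types live in apps/worker/src/types/.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// "3 attempts per agent", as stated in the pattern list.
const MAX_ATTEMPTS = 3;

// Runs the agent until it succeeds or the attempt budget is spent.
// Failures are returned as values, never thrown.
function runWithRetry<T, E>(runAgent: (attempt: number) => Result<T, E>): Result<T, E> {
  let last = runAgent(1);
  for (let attempt = 2; attempt <= MAX_ATTEMPTS && !last.ok; attempt++) {
    last = runAgent(attempt);
  }
  return last;
}
```

Because the caller receives a `Result` rather than an exception, it has to check `ok` before reading `value`, which is the explicit propagation the bullet describes.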
@@ -228,7 +228,7 @@ Comments must be **timeless** — no references to this conversation, refactorin

**Entry Points:** `apps/worker/src/temporal/workflows.ts`, `apps/worker/src/temporal/activities.ts`, `apps/worker/src/temporal/worker.ts`

**Core Logic:** `apps/worker/src/session-manager.ts`, `apps/worker/src/ai/pi-executor.ts`, `apps/worker/src/ai/settings-writer.ts` (writes `code_path` deny rules to the `@gotgenes/pi-permission-system` global config), `apps/worker/src/config-parser.ts`, `apps/worker/src/services/` (incl. `preflight.ts`, `findings-renderer.ts`, `reporting.ts`), `apps/worker/src/audit/`
**Core Logic:** `apps/worker/src/session-manager.ts`, `apps/worker/src/ai/claude-executor.ts`, `apps/worker/src/ai/settings-writer.ts` (writes `code_path` deny rules to `~/.claude/settings.json`), `apps/worker/src/config-parser.ts`, `apps/worker/src/services/` (incl. `preflight.ts`, `findings-renderer.ts`, `reporting.ts`), `apps/worker/src/audit/`

**Config:** `docker-compose.yml`, `apps/cli/infra/compose.yml`, `apps/worker/configs/`, `apps/worker/prompts/`, `tsconfig.base.json` (shared compiler options), `turbo.json`, `biome.json`

+1
-1
@@ -91,7 +91,7 @@ COPY --from=builder /app/node_modules /app/node_modules
COPY --from=builder /app/apps/worker /app/apps/worker
COPY --from=builder /app/apps/cli/package.json /app/apps/cli/package.json

RUN npm install -g --ignore-scripts @playwright/cli@0.1.1
RUN npm install -g --ignore-scripts @anthropic-ai/claude-code@2.1.84 @playwright/cli@0.1.1
RUN mkdir -p /tmp/.claude/skills && \
    playwright-cli install --skills && \
    cp -r .claude/skills/playwright-cli /tmp/.claude/skills/ && \
@@ -1,9 +1,9 @@
>[!NOTE]
> **[Better Steerability, Authentication Improvements, and the Migration to the Pi Harness](https://github.com/KeygraphHQ/shannon/discussions/348)**
> [!NOTE]
> **[Shannon Now Runs on the Pi Harness (Beta) - run it today with `npx @keygraph/shannon@beta`](https://github.com/KeygraphHQ/shannon/discussions/358)**

<div align="center">

<img src="./assets/github-banner.png" alt="Shannon - AI Pentester for Web Applications and APIs" width="100%">
<img src="./assets/github-banner.png" alt="Shannon - AI Pentester by Keygraph" width="100%">

# Shannon - AI Pentester by Keygraph

@@ -12,6 +12,8 @@
Shannon is an autonomous, white-box AI pentester for web applications and APIs. <br />
It analyzes your source code, identifies attack paths, and executes real exploits to prove vulnerabilities before they reach production.

**This repository is Shannon Open Source: the full agent, run locally from your command line.**

---

<a href="https://discord.gg/9ZqQPuhJB7"><img src="./assets/discord.png" height="40" alt="Join Discord"></a>
@@ -26,45 +28,38 @@ It analyzes your source code, identifies attack paths, and executes real exploit
## Table of Contents

- [What is Shannon?](#what-is-shannon)
- [Product Line](#product-line)
- [Shannon Lite in Action](#shannon-lite-in-action)
- [Shannon in Action](#shannon-in-action)
- [Quick Start](#quick-start)
- [Key Capabilities](#key-capabilities)
- [Shannon Lite and Shannon Pro](#shannon-lite-and-shannon-pro)
- [Editions](#editions)
- [Architecture](#architecture)
- [Documentation](#documentation)
- [Safety, Scope, and Limitations](#safety-scope-and-limitations)
- [License and Enterprise Licensing](#license-and-enterprise-licensing)
- [About Keygraph](#about-keygraph)
- [Community and Support](#community-and-support)

## What is Shannon?

Shannon is an AI pentester developed by [Keygraph](https://keygraph.io). It performs white-box security testing of web applications and their underlying APIs by combining source-code analysis with live exploitation.
Shannon is an autonomous AI pentester developed by [Keygraph](https://keygraph.io). It performs white-box security testing of web applications and their underlying APIs by combining source-code analysis with live exploitation.

Shannon analyzes your web application's source code to identify potential attack vectors, then uses browser automation and command-line tools to execute real exploits against the running application and its APIs. Only vulnerabilities with a working proof-of-concept are included in the final report.

Shannon is the agent. This repository is Shannon Open Source, the standalone pentester you run yourself. The same Shannon also powers the [Keygraph platform](https://keygraph.io), Keygraph's commercial pentesting product. See [Editions](#editions) for how the two compare.

### Why Shannon Exists

Thanks to tools like Claude Code and Cursor, your team ships code non-stop. But your penetration test? That happens once a year. This creates a massive security gap. For the other 364 days, you could be unknowingly shipping vulnerabilities to production.

Shannon closes that gap by providing on-demand, automated penetration testing that can run against every build or release.

## Product Line

Shannon is developed by [Keygraph](https://keygraph.io) and available in two editions:

| Edition | License | Best For |
| --- | --- | --- |
| **Shannon Lite** | AGPL-3.0 | Local, strictly white-box testing of applications you own or are authorized to test. |
| **Shannon Pro** | Commercial | Organizations needing a continuous pentesting and AppSec platform with black-box and white-box pentesting, parsed-code SAST, CI/CD gating, verified remediation, SLA tracking, and enterprise deployment. |

## Shannon Lite in Action
## Shannon in Action

<p align="center">
<img src="assets/shannon-action.gif" alt="Shannon Lite running an autonomous pentest" width="100%">
<img src="assets/shannon-action.gif" alt="Shannon running an autonomous pentest" width="100%">
</p>

Sample Shannon Lite penetration test reports from intentionally vulnerable applications:
Sample penetration test reports from intentionally vulnerable applications, produced by Shannon Open Source:

| Target | Summary | Report |
| --- | --- | --- |
@@ -76,14 +71,14 @@ Sample Shannon Lite penetration test reports from intentionally vulnerable appli

### Prerequisites

- **Docker** - required for the worker container.
- **Node.js 18+** - required for the recommended `npx` workflow.
- **AI provider credentials** - Anthropic is recommended; AWS Bedrock and compatible proxy setups are documented separately.
- **Docker**: required for the worker container.
- **Node.js 18+**: required for the recommended `npx` workflow.
- **AI provider credentials**: Anthropic is recommended. AWS Bedrock, Google Vertex AI, and compatible proxy setups are documented separately.

### Run Shannon Lite
### Run Shannon

> [!WARNING]
> Shannon Lite actively executes exploits. Run it only against applications and environments you own or have explicit written authorization to test. Do not run Shannon Lite against production systems.
> Shannon actively executes exploits. Run it only against applications and environments you own or have explicit written authorization to test. Do not run Shannon against production systems.

```bash
# Configure credentials with the interactive wizard.
@@ -93,52 +88,49 @@ npx @keygraph/shannon setup
npx @keygraph/shannon start -u https://your-app.com -r /path/to/your-repo
```

Shannon Lite pulls the worker image from Docker Hub, starts the required local infrastructure, mounts the target repository read-only inside an ephemeral worker container, and writes results to a local workspace.
Shannon pulls the worker image from Docker Hub, starts the required local infrastructure, mounts the target repository read-only inside an ephemeral worker container, and writes results to a local workspace.

For source builds, authenticated scans, provider-specific setup, and platform notes, see [Documentation](#documentation).

## Key Capabilities

- **Proof-by-exploitation reports**: Shannon Lite reports validated findings with reproducible proof-of-concept steps instead of speculative warnings.
- **White-box attack planning**: Shannon Lite uses source-code analysis to guide dynamic testing and focus on realistic attack paths.
- **Autonomous execution**: Shannon Lite launches reconnaissance, vulnerability analysis, exploitation, and report generation from a single command.
- **Authenticated testing**: Shannon Lite configuration files can describe login flows, test credentials, TOTP, email-based login flows, focus areas, and rules of engagement.
- **OWASP-focused coverage**: Shannon Lite targets exploitable Injection, XSS, SSRF, Broken Authentication, and Broken Authorization issues.
- **Resumable workspaces**: Shannon Lite can resume interrupted runs without re-running completed agents.
- **Proof-by-exploitation reports**: Shannon reports validated findings with reproducible proof-of-concept steps instead of speculative warnings.
- **White-box attack planning**: Shannon uses source-code analysis to guide dynamic testing and focus on realistic attack paths.
- **Autonomous execution**: Shannon launches reconnaissance, vulnerability analysis, exploitation, and report generation from a single command.
- **Authenticated testing**: configuration files can describe login flows, test credentials, TOTP, email-based login flows, focus areas, and rules of engagement.
- **OWASP-focused coverage**: Shannon targets exploitable Injection, XSS, SSRF, Broken Authentication, and Broken Authorization issues.
- **Resumable workspaces**: Shannon can resume interrupted runs without re-running completed agents.

## Shannon Lite and Shannon Pro
## Editions

This repository contains **Shannon Lite**, the AGPL-3.0 open-source CLI for strictly white-box, proof-by-exploitation testing of web applications and APIs you own or are authorized to test. Shannon Lite requires access to the target application's source code and repository layout.
Shannon ships in two ways: **Shannon Open Source**, the pentester you run yourself, and the **Keygraph platform**, the commercial pentesting product that runs Shannon continuously and closes the full AppSec lifecycle around it.

**Shannon Pro** is Keygraph's commercial continuous pentesting and AppSec platform for teams running security across many repositories, services, and environments. While Shannon Lite is a local white-box pentesting CLI, Shannon Pro is a full platform: it combines parsed-code SAST, source-to-sink analysis, black-box and white-box agentic pentesting, verified remediation, CI/CD gating, SLA tracking, and reporting for security and compliance teams.
**Shannon Open Source** (this repository) is the standalone pentester: a CLI agent for white-box, proof-by-exploitation testing of web applications and APIs you own or are authorized to test. It reads your source, plans attacks, executes real exploits, and reports only what it can prove. It runs on demand and is complete in that lane. You point it at a target, it pentests, it reports.

Shannon Pro supports both **white-box and black-box agentic pentesting**: use source-aware testing when code is available, or run autonomous black-box testing against deployed applications and APIs when source access is unavailable or unnecessary.
The **Keygraph platform** is the enterprise-ready, continuous pentesting product powered by Shannon. In the Keygraph platform, an enhanced build of Shannon runs continuously in a hardened, orchestrated environment fed by Keygraph's full code-analysis stack. Around that engine, the platform closes the entire vulnerability lifecycle, from analysis to a verified fix:

Shannon Pro covers the full vulnerability lifecycle: finding exploitable issues, deduplicating and prioritizing them, syncing work into developer workflows, generating verified remediations, re-testing fixes, tracking SLAs, and producing dashboards for security reporting and compliance.
- **Analyze**: Code Property Graph SAST, SCA with reachability, secrets, IaC, and container scanning. First-class detection in their own right, and context that sharpens Shannon's attacks.
- **Prove**: autonomous black-box and source-aware white-box pentests turn candidate findings into proven, exploited vulnerabilities rather than speculative alerts.
- **Manage**: one canonical record per vulnerability per repository, deduplicated across every source, with ownership, status, SLA tracking, dashboards, and bidirectional Jira sync.
- **Remediate and verify**: patches written automatically and re-tested against the patched code before delivery, landing in your existing review workflow rather than auto-applied.
- **Deploy**: self-hosted and air-gapped environments, strict bring-your-own-key model access, and customer-controlled LLM gateway patterns, so source, results, and model traffic stay inside your perimeter.

For enterprise deployments, Shannon Pro supports self-hosted and air-gapped environments, strict bring-your-own-key model access, and customer-controlled LLM gateway patterns. Deployments can be designed so source code, scan results, prompts, completions, and model traffic remain inside your security perimeter.
Shannon is the proof engine at the center of the Keygraph platform. Shannon Open Source gives you that engine to run yourself. The Keygraph platform surrounds Shannon with continuous analysis, finding management, remediation, verification, and enterprise deployment.

Shannon Lite is a strong fit for local and project-level white-box testing. Shannon Pro is intended for organizations that need continuous AppSec coverage, black-box and white-box pentesting, centralized triage, verified remediation workflows, compliance-ready reporting, enterprise integrations, and commercial support.

| Need | Shannon Lite | Shannon Pro |
| AppSec lifecycle stage | Shannon Open Source | Keygraph platform |
| --- | --- | --- |
| License | AGPL-3.0 | Commercial |
| White-box pentesting | Yes; source code required | Yes; source-aware testing with platform workflows |
| Black-box pentesting | No | Yes; autonomous testing without source-code access |
| Code analysis / SAST | Prompting and source pass-through to guide pentesting | Actual code parsing, Code Property Graph analysis, source-to-sink path analysis, and agentic SAST |
| AppSec coverage | OWASP-focused agentic pentesting | Agentic pentesting, SAST, SCA, secrets, IaC, containers, and business logic testing |
| CI/CD and gating | Manual/local CLI runs | Headless commercial CLI for CI/CD gating across enterprise CI/CD platforms |
| Finding lifecycle | Local Markdown reports | Canonical findings, deduplication, ownership, status, SLA tracking, workflow sync, and reporting dashboards |
| Remediation | Manual | User-initiated remediation with verification before delivery |
| Fix verification | None; manual reruns only | Targeted verification without rerunning the entire scan, completing the remediation lifecycle |
| Enterprise deployment | Local CLI and Docker worker | Self-hosted, air-gapped, BYOK, and customer-controlled LLM gateway options |
| Support | Community | Commercial support |
| Analyze | Basic LLM pass-through of source to plan attacks | Actual code-base parsing, plus Code Property Graph, SAST, SCA with reachability, secrets, IaC, and containers |
| Pentest and prove | White-box only, proof by exploitation | Enhanced white-box, plus black-box and grey-box modes, run continuously |
| Manage findings | Local Markdown report | Canonical findings system: deduplication across sources, ownership, SLA, dashboards, Jira sync, and professional pentest-grade PDF reports |
| Remediate and verify | Fix manually from the report, then re-run the full scan to verify | Automated remediation: opens a PR with the fix, verified by point re-test without re-running the full scan |
| Deploy and operate | Local CLI and Docker worker | Self-hosted, air-gapped, BYOK, continuous, enterprise integrations |
| License and support | AGPL-3.0, community | Commercial, supported |

Learn more on the [Keygraph website](https://keygraph.io), read the [Shannon Pro technical overview](docs/shannon-pro.md), start a free trial or book a [Shannon Pro demo](https://cal.com/team/keygraph/shannon-pro), or contact [shannon@keygraph.io](mailto:shannon@keygraph.io).
Learn more on the [Keygraph website](https://keygraph.io), read the [Keygraph platform technical overview](docs/keygraph-platform.md), start a free trial or book a [demo](https://cal.com/team/keygraph/shannon-pro), or contact [shannon@keygraph.io](mailto:shannon@keygraph.io).

## Architecture

Shannon Lite uses a multi-agent workflow that combines source-code analysis with live exploitation:
Shannon uses a multi-agent workflow that combines source-code analysis with live exploitation:

```text
┌──────────────────────┐
@@ -194,37 +186,41 @@ Use these guides for operational detail:
| --- | --- |
| [Source build and CLI commands](docs/development.md) | Cloning, building, common commands, output paths, and local development. |
| [Configuration](docs/configuration.md) | Authenticated testing, login flows, rules of engagement, report filters, and rate-limit settings. |
| [AI providers](docs/ai-providers.md) | Anthropic, AWS Bedrock, and custom Anthropic-compatible endpoints. |
| [AI providers](docs/ai-providers.md) | Anthropic, AWS Bedrock, Google Vertex AI, and custom Anthropic-compatible endpoints. |
| [Platforms and networking](docs/platforms.md) | Windows/WSL2, Linux, macOS, Docker networking, local apps, and custom hostnames. |
| [Workspaces and resuming](docs/workspaces.md) | Naming workspaces, resuming interrupted scans, and workspace storage. |
| [Safety and limitations](docs/safety.md) | Authorized-use requirements, non-production guidance, mutative effects, cost, and model caveats. |
| [Coverage and roadmap](docs/coverage-roadmap.md) | Current vulnerability coverage and planned work. |
| [Shannon Pro](docs/shannon-pro.md) | Commercial platform, black-box and white-box pentesting, full lifecycle workflows, and enterprise deployment. |
| [Keygraph platform](docs/keygraph-platform.md) | The continuous, agentic pentesting platform: code analysis, black-box and white-box testing, finding management, remediation, verification, and enterprise deployment. |

## Safety, Scope, and Limitations

Shannon Lite is not a passive scanner. Its exploitation agents can create users, submit forms, mutate application state, trigger outbound requests, and otherwise affect the target system. Use sandboxed, staging, or local development environments with disposable data.
Shannon is not a passive scanner. Its exploitation agents can create users, submit forms, mutate application state, trigger outbound requests, and otherwise affect the target system. Use sandboxed, staging, or local development environments with disposable data.

You are responsible for using Shannon Lite legally and ethically. Do not point Shannon Lite at systems, repositories, or applications you do not own or do not have explicit authorization to test.
You are responsible for using Shannon legally and ethically. Do not point Shannon at systems, repositories, or applications you do not own or do not have explicit authorization to test.

Important limitations:

- Shannon Lite focuses on actively exploitable issues such as Injection, XSS, SSRF, Broken Authentication, and Broken Authorization. Broader static-analysis findings, including vulnerable dependencies and insecure configurations, are a core focus of Shannon Pro.
- Shannon Open Source focuses on actively exploitable issues such as Injection, XSS, SSRF, Broken Authentication, and Broken Authorization. Broader static-analysis coverage, including vulnerable dependencies and insecure configurations, is delivered through the Keygraph platform.
- Findings still require human review. LLM-generated reports can contain weakly supported or incorrect details.
- Shannon Lite is officially supported with Claude models. Smaller, alternative, or proxied non-Claude models may be incomplete or unstable.
- Shannon is officially supported with Claude models. Smaller, alternative, or proxied non-Claude models may be incomplete or unstable.
- A full run can take roughly 1 to 1.5 hours and may incur LLM API costs depending on model pricing and application complexity.
- Do not scan untrusted or adversarial codebases; AI-powered tools that read source code can be exposed to prompt injection.
- Do not scan untrusted or adversarial codebases. AI-powered tools that read source code can be exposed to prompt injection.

Read the full [Safety and limitations](docs/safety.md) guide before running Shannon Lite in a new environment.
Read the full [Safety and limitations](docs/safety.md) guide before running Shannon in a new environment.

## License and Enterprise Licensing

Shannon Lite is licensed under the [GNU Affero General Public License v3.0](LICENSE).
Shannon Open Source is licensed under the [GNU Affero General Public License v3.0](LICENSE).

Commercial and enterprise licensing is available for organizations that need different license terms, commercial support, private redistribution, managed-service use, or broader deployment options.
Commercial and enterprise licensing is available for organizations that need different license terms, commercial support, private redistribution, managed-service use, or broader deployment options, including the Keygraph platform.

For commercial licensing, contact [shannon@keygraph.io](mailto:shannon@keygraph.io).

## About Keygraph

**Keygraph** is the company behind Shannon. It also builds the **Keygraph platform**, the commercial agentic pentesting product that closes the full AppSec lifecycle and runs an enhanced build of Shannon as its pentesting engine.

## Community and Support

**Community office hours** are available for hands-on help with bugs, deployments, and configuration questions.
@@ -233,7 +229,7 @@ For commercial licensing, contact [shannon@keygraph.io](mailto:shannon@keygraph.
- Asia: Thursday, 2:00 PM IST
- [Book a slot](https://cal.com/george-flores-keygraph/shannon-community-office-hours)

[Join Discord](https://discord.gg/cmctpMBXwE) to ask questions, share feedback, and connect with other Shannon Lite users.
[Join Discord](https://discord.gg/cmctpMBXwE) to ask questions, share feedback, and connect with other Shannon users.

At this time, Keygraph is not accepting external code contributions. Issues are welcome for bug reports and feature requests:

@@ -1,10 +1,11 @@
/**
 * `shn setup` — interactive TUI wizard for one-time credential configuration.
 * `npx @keygraph/shannon setup` — interactive TUI wizard for one-time credential configuration.
 *
 * Walks the user through selecting a provider and entering credentials,
 * then persists everything to ~/.shannon/config.toml with 0o600 permissions.
 */

import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';
import * as p from '@clack/prompts';
@@ -12,7 +13,7 @@ import { type ShannonConfig, saveConfig } from '../config/writer.js';

const SHANNON_HOME = path.join(os.homedir(), '.shannon');

type Provider = 'anthropic' | 'custom_base_url' | 'bedrock';
type Provider = 'anthropic' | 'custom_base_url' | 'bedrock' | 'vertex';

export async function setup(): Promise<void> {
  p.intro('Shannon Setup');
@@ -24,6 +25,7 @@ export async function setup(): Promise<void> {
      { value: 'anthropic' as const, label: 'Claude Direct', hint: 'recommended' },
      { value: 'custom_base_url' as const, label: 'Custom Base URL', hint: 'proxies, gateways' },
      { value: 'bedrock' as const, label: 'Claude via AWS Bedrock' },
      { value: 'vertex' as const, label: 'Claude via Google Vertex AI' },
    ],
  });
  if (p.isCancel(provider)) return cancelAndExit();
@@ -38,7 +40,7 @@ export async function setup(): Promise<void> {

  const configPath = path.join(SHANNON_HOME, 'config.toml');
  p.log.success(`Configuration saved to ${configPath}`);
  p.outro('Run `npx @keygraph/shannon@beta start` to begin a scan.');
  p.outro('Run `npx @keygraph/shannon start` to begin a scan.');
}

async function setupProvider(provider: Provider): Promise<ShannonConfig> {
@@ -49,6 +51,8 @@ async function setupProvider(provider: Provider): Promise<ShannonConfig> {
      return setupCustomBaseUrl();
    case 'bedrock':
      return setupBedrock();
    case 'vertex':
      return setupVertex();
  }
}

@@ -209,6 +213,75 @@ async function setupBedrock(): Promise<ShannonConfig> {
  };
}

async function setupVertex(): Promise<ShannonConfig> {
  // 1. Collect region and project ID
  const region = await p.text({
    message: 'Google Cloud region',
    placeholder: 'us-east5',
    validate: required('Region is required'),
  });
  if (p.isCancel(region)) return cancelAndExit();

  const projectId = await p.text({
    message: 'GCP Project ID',
    validate: required('Project ID is required'),
  });
  if (p.isCancel(projectId)) return cancelAndExit();

  // 2. File picker for service account key
  p.log.info('Select the path to your GCP Service Account JSON key file.');
  const keySourcePath = await p.path({
    message: 'Service Account JSON key file',
    validate: (value) => {
      if (!value) return 'Path is required';
      if (!fs.existsSync(value)) return 'File not found';
      if (!value.endsWith('.json')) return 'Must be a .json file';
      return undefined;
    },
  });
  if (p.isCancel(keySourcePath)) return cancelAndExit();

  // 3. Copy key to ~/.shannon/ and lock permissions
  const destPath = path.join(SHANNON_HOME, 'google-sa-key.json');
  fs.mkdirSync(SHANNON_HOME, { recursive: true });
  fs.copyFileSync(keySourcePath, destPath);
  fs.chmodSync(destPath, 0o600);
  p.log.success(`Key copied to ${destPath} (permissions: 0600)`);

  // 4. Model tiers
  const models = await p.group({
    small: () =>
      p.text({
        message: 'Small model ID',
        placeholder: 'claude-haiku-4-5@20251001',
        validate: required('Small model ID is required'),
      }),
    medium: () =>
      p.text({
        message: 'Medium model ID',
        placeholder: 'claude-sonnet-4-6',
        validate: required('Medium model ID is required'),
      }),
    large: () =>
      p.text({
        message: 'Large model ID',
        placeholder: 'claude-opus-4-8',
        validate: required('Large model ID is required'),
      }),
  });
  if (p.isCancel(models)) return cancelAndExit();

  return {
    vertex: {
      use: true,
      region,
      project_id: projectId,
      key_path: destPath,
    },
    models: { small: models.small, medium: models.medium, large: models.large },
  };
}

// === Helpers ===

async function maybePromptAdaptiveThinking(config: ShannonConfig): Promise<void> {

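Step 3 of `setupVertex` (copy the service account key into the Shannon home directory, then restrict it to the owner) can be sketched in isolation. This is a minimal illustration, not the project's code: `installServiceAccountKey` and its `homeDir` parameter are hypothetical names, and the `0o600` mode only carries its usual meaning on POSIX file systems.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: copies a key file into homeDir and makes it owner-only.
function installServiceAccountKey(keySourcePath: string, homeDir: string): string {
  const destPath = path.join(homeDir, "google-sa-key.json");
  // Create the home directory if it does not exist yet.
  fs.mkdirSync(homeDir, { recursive: true });
  fs.copyFileSync(keySourcePath, destPath);
  // Owner read/write only. On Windows, chmod can only toggle the read-only bit.
  fs.chmodSync(destPath, 0o600);
  return destPath;
}
```

Copying the key rather than referencing the user's original path gives the rest of the tool one stable location to read from, at the cost of a second copy of the secret on disk.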
@@ -10,7 +10,7 @@ import fs from 'node:fs';
import path from 'node:path';
import { ensureImage, ensureInfra, randomSuffix, spawnWorker } from '../docker.js';
import { buildEnvFlags, loadEnv, validateCredentials } from '../env.js';
import { getWorkspacesDir, initHome } from '../home.js';
import { getCredentialsPath, getWorkspacesDir, initHome } from '../home.js';
import { isLocal } from '../mode.js';
import { resolveConfig, resolveRepo } from '../paths.js';
import { displaySplash } from '../splash.js';
@@ -78,6 +78,13 @@ export async function start(args: StartArgs): Promise<void> {
  }
  fs.mkdirSync(path.join(repo.hostPath, '.playwright'), { recursive: true });

  const credentialsPath = getCredentialsPath();
  const hasCredentials = fs.existsSync(credentialsPath);

  if (hasCredentials) {
    process.env.GOOGLE_APPLICATION_CREDENTIALS = '/app/credentials/google-sa-key.json';
  }

  // 10. Resolve output directory
  const outputDir = args.output ? path.resolve(args.output) : undefined;
  if (outputDir) {
@@ -100,6 +107,7 @@ export async function start(args: StartArgs): Promise<void> {
    containerName,
    envFlags: buildEnvFlags(),
    ...(config && { config }),
    ...(hasCredentials && { credentials: credentialsPath }),
    ...(promptsDir && { promptsDir }),
    ...(outputDir && { outputDir }),
    workspace,
@@ -215,7 +223,7 @@ function printInfo(
  repoPath: string,
  workspacesDir: string,
): void {
  const logsCmd = isLocal() ? `./shannon logs ${workspace}` : `npx @keygraph/shannon@beta logs ${workspace}`;
  const logsCmd = isLocal() ? `./shannon logs ${workspace}` : `npx @keygraph/shannon logs ${workspace}`;
  const reportsPath = path.join(workspacesDir, workspace);

  console.log(` Target: ${args.url}`);

@@ -1,5 +1,5 @@
/**
 * `shn uninstall` command — remove ~/.shannon/ after confirmation (npx only).
 * `npx @keygraph/shannon uninstall` command — remove ~/.shannon/ after confirmation (npx only).
 */

import fs from 'node:fs';
@@ -33,5 +33,5 @@ export async function uninstall(): Promise<void> {

  fs.rmSync(SHANNON_HOME, { recursive: true, force: true });
  p.log.success('All Shannon data has been removed.');
  p.outro('Shannon has been uninstalled. Run `npx @keygraph/shannon@beta setup` to start fresh.');
  p.outro('Shannon has been uninstalled. Run `npx @keygraph/shannon setup` to start fresh.');
}

@@ -24,6 +24,7 @@ interface ConfigMapping {
/** Maps every supported env var to its TOML path (section.key) and expected type. */
const CONFIG_MAP: readonly ConfigMapping[] = [
  // Core
  { env: 'CLAUDE_CODE_MAX_OUTPUT_TOKENS', toml: 'core.max_tokens', type: 'number' },
  { env: 'CLAUDE_ADAPTIVE_THINKING', toml: 'core.adaptive_thinking', type: 'boolean', boolFormat: 'literal' },

  // Anthropic
@@ -35,6 +36,12 @@ const CONFIG_MAP: readonly ConfigMapping[] = [
  { env: 'AWS_REGION', toml: 'bedrock.region', type: 'string' },
  { env: 'AWS_BEARER_TOKEN_BEDROCK', toml: 'bedrock.token', type: 'string' },

  // Vertex
  { env: 'CLAUDE_CODE_USE_VERTEX', toml: 'vertex.use', type: 'boolean' },
  { env: 'CLOUD_ML_REGION', toml: 'vertex.region', type: 'string' },
  { env: 'ANTHROPIC_VERTEX_PROJECT_ID', toml: 'vertex.project_id', type: 'string' },
  { env: 'GOOGLE_APPLICATION_CREDENTIALS', toml: 'vertex.key_path', type: 'string' },

  // Custom Base URL
  { env: 'ANTHROPIC_BASE_URL', toml: 'custom_base_url.base_url', type: 'string' },
  { env: 'ANTHROPIC_AUTH_TOKEN', toml: 'custom_base_url.auth_token', type: 'string' },
@@ -92,7 +99,7 @@ function loadTOML(): TOMLConfig | null {
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    console.error(`\nFailed to parse ${configPath}: ${message}`);
    console.error(`\nRun 'npx @keygraph/shannon@beta setup' to reconfigure.\n`);
    console.error(`\nRun 'npx @keygraph/shannon setup' to reconfigure.\n`);
    process.exit(1);
  }
}
@@ -147,10 +154,20 @@ function validateProviderFields(config: TOMLConfig, provider: string, errors: st
      validateModelTiers(config, 'bedrock', errors);
      break;
    }

    case 'vertex': {
      const required = ['use', 'region', 'project_id', 'key_path'];
      const missing = required.filter((k) => !keys.includes(k));
      if (missing.length > 0) {
        errors.push(`[vertex] missing required keys: ${missing.join(', ')}`);
      }
      validateModelTiers(config, 'vertex', errors);
      break;
    }
  }
}

/** Bedrock requires a [models] section with all three tiers. */
/** Bedrock and Vertex require a [models] section with all three tiers. */
function validateModelTiers(config: TOMLConfig, provider: string, errors: string[]): void {
  const models = config.models as Record<string, unknown> | undefined;
  if (!models || typeof models !== 'object') {
@@ -210,7 +227,7 @@ function validateConfig(config: TOMLConfig): string[] {
  }

  // 4. Only one provider section allowed (ignore empty sections)
  const PROVIDER_SECTIONS = ['anthropic', 'custom_base_url', 'bedrock'] as const;
  const PROVIDER_SECTIONS = ['anthropic', 'custom_base_url', 'bedrock', 'vertex'] as const;
  const present = PROVIDER_SECTIONS.filter((s) => {
    const section = config[s];
    return section && typeof section === 'object' && Object.keys(section).length > 0;
@@ -253,7 +270,7 @@ export function resolveConfig(): void {
    for (const err of errors) {
      console.error(`  - ${err}`);
    }
    console.error(`\nRun 'shn setup' to reconfigure.\n`);
    console.error(`\nRun 'npx @keygraph/shannon setup' to reconfigure.\n`);
    process.exit(1);
  }

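The `case 'vertex'` branch above reports every required key that is absent from the `[vertex]` TOML section. A minimal standalone sketch of that check follows. It assumes `keys` in the diff is the list of key names present in the section, which the hunk does not show, and `missingVertexKeys` is a hypothetical name used only for illustration.

```typescript
// Keys the diff's `case 'vertex'` branch requires in the [vertex] section.
const VERTEX_REQUIRED_KEYS = ["use", "region", "project_id", "key_path"] as const;

// Hypothetical standalone version of the missing-key check.
function missingVertexKeys(section: Record<string, unknown>): string[] {
  // Assumption: the diff's `keys` variable holds the section's own key names.
  const keys = Object.keys(section);
  return VERTEX_REQUIRED_KEYS.filter((k) => !keys.includes(k));
}

// Example: a section that only sets `use` and `region`.
const errors: string[] = [];
const missing = missingVertexKeys({ use: true, region: "us-east5" });
if (missing.length > 0) {
  errors.push(`[vertex] missing required keys: ${missing.join(", ")}`);
}
```

Collecting all missing keys into one message, instead of failing on the first, lets the user fix the whole section in a single pass.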
@@ -8,10 +8,11 @@ import { getConfigFile } from '../home.js';
// === Types ===

export interface ShannonConfig {
  core?: { adaptive_thinking?: boolean };
  core?: { max_tokens?: number; adaptive_thinking?: boolean };
  anthropic?: { api_key?: string; oauth_token?: string };
  custom_base_url?: { base_url?: string; auth_token?: string };
  bedrock?: { use?: boolean; region?: string; token?: string };
  vertex?: { use?: boolean; region?: string; project_id?: string; key_path?: string };
  models?: { small?: string; medium?: string; large?: string };
}

@@ -236,6 +236,7 @@ export interface WorkerOptions {
  containerName: string;
  envFlags: string[];
  config?: { hostPath: string; containerPath: string };
  credentials?: string;
  promptsDir?: string;
  outputDir?: string;
  workspace: string;
@@ -290,6 +291,11 @@ export function spawnWorker(opts: WorkerOptions): ChildProcess {
    args.push('-v', `${opts.outputDir}:/app/output`);
  }

  // Mount credentials file to fixed container path
  if (opts.credentials) {
    args.push('-v', `${opts.credentials}:/app/credentials/google-sa-key.json:ro`);
  }

  // Environment
  args.push(...opts.envFlags);

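The credentials mount added to `spawnWorker` pairs with the `start` command, which sets `GOOGLE_APPLICATION_CREDENTIALS` to the same fixed container path. The argument construction can be sketched as a small pure function. `credentialMountArgs` and `CONTAINER_KEY_PATH` are hypothetical names, not part of the diff.

```typescript
// Fixed in-container location the diff uses for the service account key.
const CONTAINER_KEY_PATH = "/app/credentials/google-sa-key.json";

// Hypothetical helper: returns the extra `docker run` arguments, or none.
function credentialMountArgs(hostKeyPath?: string): string[] {
  if (!hostKeyPath) return [];
  // The `:ro` suffix mounts the key read-only inside the container.
  return ["-v", `${hostKeyPath}:${CONTAINER_KEY_PATH}:ro`];
}
```

Because the container path is fixed, the host-side location of the key can vary between local and npx modes without the worker needing to know about it.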
+31
-2
@@ -18,9 +18,14 @@ const FORWARD_VARS = [
  'CLAUDE_CODE_USE_BEDROCK',
  'AWS_REGION',
  'AWS_BEARER_TOKEN_BEDROCK',
  'CLAUDE_CODE_USE_VERTEX',
  'CLOUD_ML_REGION',
  'ANTHROPIC_VERTEX_PROJECT_ID',
  'GOOGLE_APPLICATION_CREDENTIALS',
  'ANTHROPIC_SMALL_MODEL',
  'ANTHROPIC_MEDIUM_MODEL',
  'ANTHROPIC_LARGE_MODEL',
  'CLAUDE_CODE_MAX_OUTPUT_TOKENS',
  'CLAUDE_ADAPTIVE_THINKING',
] as const;

@@ -57,7 +62,7 @@ export function buildEnvFlags(): string[] {
interface CredentialValidation {
  valid: boolean;
  error?: string;
  mode: 'api-key' | 'oauth' | 'custom-base-url' | 'bedrock';
  mode: 'api-key' | 'oauth' | 'custom-base-url' | 'bedrock' | 'vertex';
}

/** Check if a custom Anthropic-compatible base URL is configured. */
@@ -72,6 +77,7 @@ function detectProviders(): string[] {
  if (process.env.CLAUDE_CODE_OAUTH_TOKEN) providers.push('Anthropic OAuth');
  if (isCustomBaseUrlConfigured()) providers.push('Custom Base URL');
  if (process.env.CLAUDE_CODE_USE_BEDROCK === '1') providers.push('AWS Bedrock');
  if (process.env.CLAUDE_CODE_USE_VERTEX === '1') providers.push('Google Vertex');
  return providers;
}

@@ -114,11 +120,34 @@ export function validateCredentials(): CredentialValidation {
    }
    return { valid: true, mode: 'bedrock' };
  }
  if (process.env.CLAUDE_CODE_USE_VERTEX === '1') {
    const missing: string[] = [];
    if (!process.env.CLOUD_ML_REGION) missing.push('CLOUD_ML_REGION');
    if (!process.env.ANTHROPIC_VERTEX_PROJECT_ID) missing.push('ANTHROPIC_VERTEX_PROJECT_ID');
    if (!process.env.ANTHROPIC_SMALL_MODEL) missing.push('ANTHROPIC_SMALL_MODEL');
    if (!process.env.ANTHROPIC_MEDIUM_MODEL) missing.push('ANTHROPIC_MEDIUM_MODEL');
    if (!process.env.ANTHROPIC_LARGE_MODEL) missing.push('ANTHROPIC_LARGE_MODEL');
    if (missing.length > 0) {
      return {
        valid: false,
        mode: 'vertex',
        error: `Vertex AI mode requires: ${missing.join(', ')}`,
      };
    }
    if (!process.env.GOOGLE_APPLICATION_CREDENTIALS) {
      return {
        valid: false,
        mode: 'vertex',
        error: 'Vertex AI mode requires GOOGLE_APPLICATION_CREDENTIALS',
      };
    }
    return { valid: true, mode: 'vertex' };
  }

  const hint =
    getMode() === 'local'
      ? `No credentials found. Set ANTHROPIC_API_KEY in .env or export it.`
      : `Authentication not configured. Export variables or run 'npx @keygraph/shannon@beta setup'.`;
      : `Authentication not configured. Export variables or run 'npx @keygraph/shannon setup'.`;
  return {
    valid: false,
    mode: 'api-key',

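The Vertex branch of `validateCredentials` checks the region, project ID, and model tiers together, then checks the credentials path separately. A minimal sketch of the same two-stage check as a pure function over an environment map is shown below. `validateVertexEnv`, `Env`, and `VertexValidation` are hypothetical names, and the real function reads `process.env` directly and also returns a `mode` field.

```typescript
type Env = Record<string, string | undefined>;

interface VertexValidation {
  valid: boolean;
  error?: string;
}

// Hypothetical pure version of the Vertex branch: the environment is an argument.
function validateVertexEnv(env: Env): VertexValidation {
  const requiredVars = [
    "CLOUD_ML_REGION",
    "ANTHROPIC_VERTEX_PROJECT_ID",
    "ANTHROPIC_SMALL_MODEL",
    "ANTHROPIC_MEDIUM_MODEL",
    "ANTHROPIC_LARGE_MODEL",
  ];
  // Stage 1: report every unset (or empty) required variable at once.
  const missing = requiredVars.filter((name) => !env[name]);
  if (missing.length > 0) {
    return { valid: false, error: `Vertex AI mode requires: ${missing.join(", ")}` };
  }
  // Stage 2: the credentials path is checked on its own, as in the diff.
  if (!env["GOOGLE_APPLICATION_CREDENTIALS"]) {
    return { valid: false, error: "Vertex AI mode requires GOOGLE_APPLICATION_CREDENTIALS" };
  }
  return { valid: true };
}
```

Passing the environment in as a parameter makes the branch easy to exercise in tests without mutating `process.env`.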
+20
-2
@@ -1,7 +1,7 @@
/**
 * Shannon state directory management.
 *
 * Local mode (cloned repo): uses ./workspaces/
 * Local mode (cloned repo): uses ./workspaces/, ./credentials/
 * NPX mode: uses ~/.shannon/workspaces/, ~/.shannon/
 */

@@ -20,14 +20,32 @@ export function getWorkspacesDir(): string {
  return getMode() === 'local' ? path.resolve('workspaces') : path.join(SHANNON_HOME, 'workspaces');
}

/**
 * Resolve the Vertex credentials file path.
 *
 * Checks GOOGLE_APPLICATION_CREDENTIALS env var first (may be set by TOML resolver),
 * then falls back to mode-appropriate default location.
 */
export function getCredentialsPath(): string {
  const envPath = process.env.GOOGLE_APPLICATION_CREDENTIALS;
  if (envPath && fs.existsSync(envPath)) return path.resolve(envPath);

  if (getMode() === 'local') {
    return path.resolve('credentials', 'google-sa-key.json');
  }

  return path.join(SHANNON_HOME, 'google-sa-key.json');
}

/**
 * Initialize state directories.
 * Local mode: creates ./workspaces/
 * Local mode: creates ./workspaces/ and ./credentials/
 * NPX mode: creates ~/.shannon/workspaces/
 */
export function initHome(): void {
  if (getMode() === 'local') {
    fs.mkdirSync(path.resolve('workspaces'), { recursive: true });
    fs.mkdirSync(path.resolve('credentials'), { recursive: true });
  } else {
    fs.mkdirSync(path.join(SHANNON_HOME, 'workspaces'), { recursive: true });
  }

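The resolution order in `getCredentialsPath` (an existing file named by the environment variable first, then a mode-specific default) can be sketched with its inputs made explicit. `resolveCredentialsPath`, the `Mode` type, and the `home` parameter are hypothetical. The real function reads `process.env` and calls `getMode()` itself.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

type Mode = "local" | "npx";

// Hypothetical variant of getCredentialsPath that takes its inputs as arguments.
function resolveCredentialsPath(
  mode: Mode,
  envPath: string | undefined,
  home: string = path.join(os.homedir(), ".shannon"),
): string {
  // 1. An explicit path wins, but only if the file actually exists.
  if (envPath && fs.existsSync(envPath)) return path.resolve(envPath);
  // 2. Otherwise fall back to the default location for the current mode.
  return mode === "local"
    ? path.resolve("credentials", "google-sa-key.json")
    : path.join(home, "google-sa-key.json");
}
```

Note that a stale environment value pointing at a missing file is ignored rather than reported, so the caller still has to check that the returned path exists, as `start` does with `fs.existsSync`.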
@@ -56,7 +56,7 @@ function getVersion(): string {

function showHelp(): void {
  const mode = getMode();
  const prefix = mode === 'local' ? './shannon' : 'npx @keygraph/shannon@beta';
  const prefix = mode === 'local' ? './shannon' : 'npx @keygraph/shannon';

  console.log(`
Shannon - AI Penetration Testing Framework
@@ -173,14 +173,14 @@ function parseStartArgs(argv: string[]): ParsedStartArgs {
        break;
      default:
        console.error(`Unknown option: ${arg}`);
        console.error(`Run "${getMode() === 'local' ? './shannon' : 'npx @keygraph/shannon@beta'} help" for usage`);
        console.error(`Run "${getMode() === 'local' ? './shannon' : 'npx @keygraph/shannon'} help" for usage`);
        process.exit(1);
    }
  }

  if (!url || !repo) {
    console.error('ERROR: --url and --repo are required');
    console.error(`Usage: ${getMode() === 'local' ? './shannon' : 'npx @keygraph/shannon@beta'} start -u <url> -r <path>`);
    console.error(`Usage: ${getMode() === 'local' ? './shannon' : 'npx @keygraph/shannon'} start -u <url> -r <path>`);
    process.exit(1);
  }

@@ -215,7 +215,7 @@ switch (command) {
    const workspaceId = args[1];
    if (!workspaceId) {
      console.error('ERROR: Workspace ID is required');
      console.error(`Usage: ${getMode() === 'local' ? './shannon' : 'npx @keygraph/shannon@beta'} logs <workspace>`);
      console.error(`Usage: ${getMode() === 'local' ? './shannon' : 'npx @keygraph/shannon'} logs <workspace>`);
      process.exit(1);
    }
    logs(workspaceId);

@@ -19,10 +19,7 @@
    "clean": "rm -rf dist"
  },
  "dependencies": {
    "@earendil-works/pi-agent-core": "^0.79.1",
    "@earendil-works/pi-ai": "^0.79.1",
    "@earendil-works/pi-coding-agent": "^0.79.1",
    "@gotgenes/pi-permission-system": "^10.9.0",
    "@anthropic-ai/claude-agent-sdk": "catalog:",
    "@temporalio/activity": "^1.11.0",
    "@temporalio/client": "^1.11.0",
    "@temporalio/worker": "^1.11.0",
@@ -31,7 +28,6 @@
    "ajv-formats": "^2.1.1",
    "dotenv": "^16.4.5",
    "js-yaml": "^4.1.0",
    "typebox": "1.1.38",
    "zod": "^4.3.6",
    "zx": "^8.0.0"
  },

@@ -116,7 +116,7 @@ Before beginning exploitation, read these strategic intelligence files in order:
2. `.shannon/deliverables/recon_deliverable.md` - Complete API inventory, user roles, and data flow maps.
3. `.shannon/deliverables/auth_analysis_deliverable.md` - Strategic context from the Auth analysis specialist, including notes on session mechanisms, password policies, and flawed logic paths.

- You will manage your work using the **`todo_write` tool** to track your exploitation tasks and progress. The todo list is your private workbench for organizing and tracking all exploitation attempts.
- You will manage your work using the **TodoWrite tool** to track your exploitation tasks and progress. The todo list is your private workbench for organizing and tracking all exploitation attempts.
</starting_context>

<system_architecture>
@@ -145,18 +145,18 @@ You are the **Identity Compromise Specialist** - proving tangible impact of brok

<cli_tools>
- **Browser Automation (playwright-cli skill):** Essential for interacting with multi-step authentication flows, injecting stolen session cookies, and verifying account takeover in a real browser context. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
- **`bash` tool:** Use for creating directories, copying files, and other shell commands as needed.
- **`task` agent:** Mandatory coder-executor for brute force batches, credential stuffing, token replay automation, and any scripted workflow.
- **`todo_write` tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
- **`read` tool:** To read false positives from your tracking file at `workspace/auth_false_positives.md`.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **Task Agent:** Mandatory coder-executor for brute force batches, credential stuffing, token replay automation, and any scripted workflow.
- **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
- **Read tool:** To read false positives from your tracking file at `workspace/auth_false_positives.md`.

**CRITICAL TASK AGENT WORKFLOW:**
- Delegate every custom script or multi-step automation to the `task` agent; do not hand-write scripts in Bash.
- Keep requests ≤15 lines and use the template below so the `task` agent has clear inputs and success criteria.
- Delegate every custom script or multi-step automation to the Task Agent; do not hand-write scripts in Bash.
- Keep requests ≤15 lines and use the template below so the Task Agent has clear inputs and success criteria.
</cli_tools>

## `task` agent Scripting Requirements
Use the `task` agent to execute all bulk or repetitive auth attacks. Manual commands are for quick verification only.
## Task Agent Scripting Requirements
Use the Task Agent to execute all bulk or repetitive auth attacks. Manual commands are for quick verification only.

**MANDATORY TEMPLATE:**
"
@@ -172,7 +172,7 @@ Inputs:
Success Criteria: Print each attempt with status code and key response text so I can evaluate impact.
"

**DO NOT:** exceed 5–6 inputs per run, ask for advanced parsing, or bypass the `task` agent for scripting.
**DO NOT:** exceed 5–6 inputs per run, ask for advanced parsing, or bypass the Task Agent for scripting.

<methodology_and_domain_expertise>

@@ -183,11 +183,11 @@ Your primary goal is to achieve a definitive, evidence-backed conclusion for **e
- **Complete the Workflow:** For each vulnerability, you must follow the full Exploitation Workflow from Confirmation to Impact Demonstration.

## **The Task-Based Attack Plan**
Your entire exploitation process will be managed through the `todo_write` tool. You MUST adhere to this process.
Your entire exploitation process will be managed through the TodoWrite tool. You MUST adhere to this process.

**1. Initialization:**
- Read `.shannon/deliverables/auth_exploitation_queue.json`.
- Use the `todo_write` tool to create tasks for each vulnerability in the queue.
- Use the TodoWrite tool to create tasks for each vulnerability in the queue.
- Each task should include the vulnerability ID and the hypothesis to test.
Example tasks:
- "AUTH-VULN-01: Exploit Session Fixation on /login (Hypothesis: Session ID not rotated)"
@@ -195,10 +195,10 @@ Your entire exploitation process will be managed through the `todo_write` tool.

**2. The Execution Loop:**
- **A. Get Next Task:** Review your todo list and identify the next pending task to work on.
- **B. Plan & Decompose:** Mark the task as in_progress using `todo_write`. Read the `suggested_exploit_technique` from the vulnerability details. This is your primary directive. Plan the specific commands and tools needed to execute this attack pattern. If the attack is complex (e.g., a multi-step logic flaw), use `todo_write` to add smaller tasks.
- **B. Plan & Decompose:** Mark the task as in_progress using TodoWrite. Read the `suggested_exploit_technique` from the vulnerability details. This is your primary directive. Plan the specific commands and tools needed to execute this attack pattern. If the attack is complex (e.g., a multi-step logic flaw), use TodoWrite to add smaller tasks.
- **C. Execute:** Run the command for the current task.
- **D. Update & Document:** Based on the result, update the `todo_write` status to completed if successful. Document the outcome in your memory for the final report.
- **E. Expand the Plan (Branching):** If successful and more steps are needed, use `todo_write` to add the next logical tasks required by the OWASP workflow.
- **D. Update & Document:** Based on the result, update the TodoWrite status to completed if successful. Document the outcome in your memory for the final report.
- **E. Expand the Plan (Branching):** If successful and more steps are needed, use TodoWrite to add the next logical tasks required by the OWASP workflow.

## **OWASP Authentication Exploitation Workflow**
For each root vulnerability, follow this systematic, two-stage attack flow.
@@ -289,7 +289,7 @@ Prioritize these based on the intelligence from your queue and vulnerability ana
|
||||
**2. Credential-Based Attacks**
|
||||
- **Credential Stuffing with cURL:**
|
||||
`curl -X POST -H "Content-Type: application/json" -d '{"username":"admin", "password":"CrackedPassword123"}' https://target.com/api/login`
|
||||
- **Brute Force (Example with `task` agent):**
|
||||
- **Brute Force (Example with Task Agent):**
|
||||
"Task: Write a python script to POST to /api/login with username 'admin' and passwords from this list: ['admin', 'password', '123456']. Print the status code for each."
|
||||
|
||||
**3. Logic Flaws**
|
||||
@@ -304,8 +304,8 @@ Prioritize these based on the intelligence from your queue and vulnerability ana
|
||||
</attack_patterns>
|
||||
</methodology_and_domain_expertise>
|
||||
|
||||
<deliverable_tools>
|
||||
You emit your exploitation evidence through a single tool — `add_exploit`. The host renderer assembles `.shannon/deliverables/auth_exploitation_evidence.md` from your tool calls after the run. You do NOT write the Markdown file directly.
|
||||
<mcp_tools>
|
||||
You emit your exploitation evidence through a single MCP tool — `add_exploit` from the `exploit-collector` server. The host renderer assembles `.shannon/deliverables/auth_exploitation_evidence.md` from your tool calls after the run. You do NOT write the Markdown file directly.
|
||||
|
||||
**When to emit.** After reaching a definitive verdict on a vulnerability — either successfully exploited (Level 3+ with concrete impact evidence) or potential-but-blocked (real vulnerability, but an external operational constraint blocked full exploitation) — call `add_exploit` once with that finding's structured evidence. Call once per queue vulnerability; do not batch. Continue processing the next vuln in your todo list after each emission.
|
||||
|
||||
@@ -316,7 +316,7 @@ You emit your exploitation evidence through a single tool — `add_exploit`. The
|
||||
**Idempotency.** Duplicate `vulnerability_id` calls are rejected with `DuplicateError`. Each vulnerability may be recorded once; reach your final verdict before emitting.
|
||||
|
||||
**Required-call intent.** Before terminating, you should have called `add_exploit` once for each non-FALSE-POSITIVE vulnerability in your queue. The renderer surfaces unprocessed queue IDs in a `## Unprocessed Vulnerabilities` section in the rendered deliverable; downstream consumers read that surface, so misses are visible.
|
||||
</deliverable_tools>
|
||||
</mcp_tools>
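The idempotency and required-call rules above can be pictured with a small in-memory model. This is only a sketch: the collector's real implementation is not shown in this diff, so the `ExploitCollector` class, its `unprocessed` helper, the `AUTH-VULN-*` IDs, and the evidence strings are hypothetical. Only `add_exploit`, `vulnerability_id`, `DuplicateError`, and the two `status` values come from the prompt text.

```python
class DuplicateError(Exception):
    """Raised when a vulnerability_id is recorded a second time."""


class ExploitCollector:
    """Hypothetical in-memory stand-in for the evidence collector."""

    ALLOWED_STATUSES = {"exploited", "blocked"}

    def __init__(self, queue_ids):
        # IDs loaded from the exploitation queue (assumed to be plain strings).
        self.queue_ids = list(queue_ids)
        self.records = {}

    def add_exploit(self, vulnerability_id, status, evidence):
        if status not in self.ALLOWED_STATUSES:
            raise ValueError(f"unsupported status: {status!r}")
        if vulnerability_id in self.records:
            # One verdict per vulnerability: a second call is rejected.
            raise DuplicateError(vulnerability_id)
        self.records[vulnerability_id] = {"status": status, "evidence": evidence}

    def unprocessed(self):
        # Queue IDs that never received an add_exploit call, in queue order.
        return [v for v in self.queue_ids if v not in self.records]


collector = ExploitCollector(["AUTH-VULN-01", "AUTH-VULN-02", "AUTH-VULN-03"])
collector.add_exploit("AUTH-VULN-01", "exploited", "placeholder evidence")
try:
    collector.add_exploit("AUTH-VULN-01", "blocked", "second verdict")
except DuplicateError as err:
    print("rejected duplicate:", err)
print("unprocessed:", collector.unprocessed())
```

Because a second call for the same ID fails in this model, the verdict has to be final before it is emitted, which is the behavior the prompt asks for.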
|
||||
|
||||
<conclusion_trigger>
|
||||
### Evidence Completeness Verification
|
||||
@@ -328,7 +328,7 @@ You emit your exploitation evidence through a single tool — `add_exploit`. The
|
||||
- All technical components specified without ambiguity
|
||||
|
||||
COMPLETION REQUIREMENTS (ALL must be satisfied):
|
||||
1. Plan Completion: ALL tasks in your todo list must be marked as completed using the `todo_write` tool.
|
||||
1. Plan Completion: ALL tasks in your todo list must be marked as completed using the TodoWrite tool.
|
||||
2. Evidence Emission: Every non-FALSE-POSITIVE vulnerability in `.shannon/deliverables/auth_exploitation_queue.json` has been recorded with one `add_exploit` call carrying the appropriate `status` ("exploited" or "blocked") and required fields. FALSE POSITIVE findings are recorded only in `workspace/auth_false_positives.md`, not via `add_exploit`.
|
||||
|
||||
CRITICAL WARNING: Announcing completion before every item in .shannon/deliverables/auth_exploitation_queue.json has been pursued to a final, evidence-backed conclusion (either successfully exploited, recorded as potential-but-blocked, or verified false positive) will be considered a mission failure.
|
||||
|
||||
@@ -105,7 +105,7 @@ Before beginning exploitation, read these strategic intelligence files in order:
|
||||
2. `.shannon/deliverables/recon_deliverable.md` - Complete API inventory, user roles, and permission models.
|
||||
3. `.shannon/deliverables/authz_analysis_deliverable.md` - Strategic context from the Authz analysis specialist, including notes on access control patterns, role hierarchies, and flawed logic paths.
|
||||
|
||||
- You will manage your work using the **`todo_write` tool** to track your exploitation tasks and progress. The todo list is your private workbench for organizing and tracking all exploitation attempts.
|
||||
- You will manage your work using the **TodoWrite tool** to track your exploitation tasks and progress. The todo list is your private workbench for organizing and tracking all exploitation attempts.
|
||||
</starting_context>
|
||||
|
||||
<system_architecture>
|
||||
@@ -134,18 +134,18 @@ You are the **Privilege Escalation Specialist** - proving tangible impact of bro
|
||||
|
||||
<cli_tools>
|
||||
- **Browser Automation (playwright-cli skill):** Essential for interacting with complex authorization flows, testing role-based access controls in browser contexts, and verifying privilege escalation through UI elements. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
|
||||
- **`bash` tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **`task` agent:** Mandatory coder-executor for IDOR sweeps, role escalation loops, and workflow bypass automation.
|
||||
- **`todo_write` tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
|
||||
- **`read` tool:** To read false positives from your tracking file at `workspace/authz_false_positives.md`.
|
||||
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Task Agent:** Mandatory coder-executor for IDOR sweeps, role escalation loops, and workflow bypass automation.
|
||||
- **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
|
||||
- **Read tool:** To read false positives from your tracking file at `workspace/authz_false_positives.md`.
|
||||
|
||||
**CRITICAL TASK AGENT WORKFLOW:**
|
||||
- Delegate every multi-user iteration, role toggle test, or workflow automation script to the `task` agent—never handcraft these scripts yourself.
|
||||
- Keep requests ≤15 lines and adhere to the template below so the `task` agent can act deterministically.
|
||||
- Delegate every multi-user iteration, role toggle test, or workflow automation script to the Task Agent—never handcraft these scripts yourself.
|
||||
- Keep requests ≤15 lines and adhere to the template below so the Task Agent can act deterministically.
|
||||
</cli_tools>
|
||||
|
||||
## `task` agent Scripting Requirements
|
||||
All repeated authorization tests must run through the `task` agent.
|
||||
## Task Agent Scripting Requirements
|
||||
All repeated authorization tests must run through the Task Agent.
|
||||
|
||||
**MANDATORY TEMPLATE:**
|
||||
"
|
||||
@@ -161,7 +161,7 @@ Inputs:
|
||||
Success Criteria: Execute one request per identity, logging status code and key response text so I can confirm access levels.
|
||||
"
|
||||
|
||||
**DO NOT:** exceed 5 identities per run, ask for complex diffing, or bypass the `task` agent for scripting.
|
||||
**DO NOT:** exceed 5 identities per run, ask for complex diffing, or bypass the Task Agent for scripting.
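The two numeric limits stated for Task Agent requests (at most 15 lines per request, at most 5 identities per run) could be checked before delegating. The helper below is a hypothetical illustration, not part of Shannon; the function name and the example request text are assumptions.

```python
MAX_REQUEST_LINES = 15  # "Keep requests <=15 lines"
MAX_IDENTITIES = 5      # "DO NOT exceed 5 identities per run"


def check_task_request(request_text, identities):
    """Return a list of limit violations; an empty list means the request fits."""
    problems = []
    line_count = len(request_text.strip().splitlines())
    if line_count > MAX_REQUEST_LINES:
        problems.append(f"request has {line_count} lines; limit is {MAX_REQUEST_LINES}")
    if len(identities) > MAX_IDENTITIES:
        problems.append(f"{len(identities)} identities; limit is {MAX_IDENTITIES}")
    return problems


# Hypothetical request body and identity labels, for illustration only.
request = "Role: You are a security testing script writer.\nInputs: see list"
print(check_task_request(request, ["user_a", "user_b"]))
```

A larger sweep would be split into several runs of five identities or fewer rather than sent as one oversized request.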
|
||||
|
||||
<methodology_and_domain_expertise>
|
||||
|
||||
@@ -172,11 +172,11 @@ Your primary goal is to achieve a definitive, evidence-backed conclusion for **e
|
||||
- **Complete the Workflow:** For each vulnerability, you must follow the full Exploitation Workflow from Confirmation to Impact Demonstration.
|
||||
|
||||
## **The Task-Based Attack Plan**
|
||||
Your entire exploitation process will be managed through the `todo_write` tool. You MUST adhere to this process.
|
||||
Your entire exploitation process will be managed through the TodoWrite tool. You MUST adhere to this process.
|
||||
|
||||
**1. Initialization:**
|
||||
- Read `.shannon/deliverables/authz_exploitation_queue.json`.
|
||||
- Use the `todo_write` tool to create tasks for each vulnerability in the queue.
|
||||
- Use the TodoWrite tool to create tasks for each vulnerability in the queue.
|
||||
- Each task should include the vulnerability ID, type, and the hypothesis to test.
|
||||
Example tasks:
|
||||
- "AUTHZ-VULN-01 (Horizontal): Exploit ownership bypass on /api/user/{id} (Hypothesis: Access to other users' data)"
|
||||
@@ -185,10 +185,10 @@ Your entire exploitation process will be managed through the `todo_write` tool.
|
||||
|
||||
**2. The Execution Loop:**
|
||||
- **A. Get Next Task:** Review your todo list and identify the next pending task to work on.
|
||||
- **B. Plan & Decompose:** Mark the task as in_progress using `todo_write`. Read the vulnerability type (`Horizontal`, `Vertical`, or `Context_Workflow`) and the `minimal_witness` from the vulnerability details. This is your primary directive. Plan the specific commands and tools needed to execute this attack pattern. If the attack is complex (e.g., a multi-step privilege escalation), use `todo_write` to add smaller tasks.
|
||||
- **B. Plan & Decompose:** Mark the task as in_progress using TodoWrite. Read the vulnerability type (`Horizontal`, `Vertical`, or `Context_Workflow`) and the `minimal_witness` from the vulnerability details. This is your primary directive. Plan the specific commands and tools needed to execute this attack pattern. If the attack is complex (e.g., a multi-step privilege escalation), use TodoWrite to add smaller tasks.
|
||||
- **C. Execute:** Run the command for the current task.
|
||||
- **D. Update & Document:** Based on the result, update the `todo_write` status to completed if successful. Document the outcome in your memory for the final report.
|
||||
- **E. Expand the Plan (Branching):** If successful and more steps are needed, use `todo_write` to add the next logical tasks required by the OWASP workflow.
|
||||
- **D. Update & Document:** Based on the result, update the TodoWrite status to completed if successful. Document the outcome in your memory for the final report.
|
||||
- **E. Expand the Plan (Branching):** If successful and more steps are needed, use TodoWrite to add the next logical tasks required by the OWASP workflow.
|
||||
|
||||
## **OWASP Authorization Exploitation Workflow**
|
||||
For each root vulnerability, follow this systematic, two-stage attack flow.
|
||||
@@ -312,8 +312,8 @@ Remember: The most effective attacks often come from understanding the specific
|
||||
</attack_patterns>
|
||||
</methodology_and_domain_expertise>
|
||||
|
||||
<deliverable_tools>
|
||||
You emit your exploitation evidence through a single tool — `add_exploit`. The host renderer assembles `.shannon/deliverables/authz_exploitation_evidence.md` from your tool calls after the run. You do NOT write the Markdown file directly.
|
||||
<mcp_tools>
|
||||
You emit your exploitation evidence through a single MCP tool — `add_exploit` from the `exploit-collector` server. The host renderer assembles `.shannon/deliverables/authz_exploitation_evidence.md` from your tool calls after the run. You do NOT write the Markdown file directly.
|
||||
|
||||
**When to emit.** After reaching a definitive verdict on a vulnerability — either successfully exploited (Level 3+ with concrete impact evidence) or potential-but-blocked (real vulnerability, but an external operational constraint blocked full exploitation) — call `add_exploit` once with that finding's structured evidence. Call once per queue vulnerability; do not batch. Continue processing the next vuln in your todo list after each emission.
|
||||
|
||||
@@ -324,7 +324,7 @@ You emit your exploitation evidence through a single tool — `add_exploit`. The
|
||||
**Idempotency.** Duplicate `vulnerability_id` calls are rejected with `DuplicateError`. Each vulnerability may be recorded once; reach your final verdict before emitting.
|
||||
|
||||
**Required-call intent.** Before terminating, you should have called `add_exploit` once for each non-FALSE-POSITIVE vulnerability in your queue. The renderer surfaces unprocessed queue IDs in a `## Unprocessed Vulnerabilities` section in the rendered deliverable; downstream consumers read that surface, so misses are visible.
|
||||
</deliverable_tools>
|
||||
</mcp_tools>
|
||||
|
||||
<conclusion_trigger>
|
||||
### Evidence Completeness Verification
|
||||
@@ -336,7 +336,7 @@ You emit your exploitation evidence through a single tool — `add_exploit`. The
|
||||
- All technical components specified without ambiguity
|
||||
|
||||
COMPLETION REQUIREMENTS (ALL must be satisfied):
|
||||
1. Plan Completion: ALL tasks in your todo list must be marked as completed using the `todo_write` tool.
|
||||
1. Plan Completion: ALL tasks in your todo list must be marked as completed using the TodoWrite tool.
|
||||
2. Evidence Emission: Every non-FALSE-POSITIVE vulnerability in `.shannon/deliverables/authz_exploitation_queue.json` has been recorded with one `add_exploit` call carrying the appropriate `status` ("exploited" or "blocked") and required fields. FALSE POSITIVE findings are recorded only in `workspace/authz_false_positives.md`, not via `add_exploit`.
|
||||
|
||||
CRITICAL WARNING: Announcing completion before every item in .shannon/deliverables/authz_exploitation_queue.json has been pursued to a final, evidence-backed conclusion (either successfully exploited, recorded as potential-but-blocked, or verified false positive) will be considered a mission failure.
|
||||
|
||||
@@ -87,7 +87,7 @@ Before beginning exploitation, read these strategic intelligence files in order:
|
||||
- Leverage vulnerability details and context from pre-analysis rather than blind testing.
|
||||
- Reference provided vulnerable code snippets to avoid blind guessing of query column counts, command syntax, or database/OS types.
|
||||
|
||||
- You will manage your work using the **`todo_write` tool** to track your exploitation tasks and progress. The todo list is your private workbench for organizing and tracking all exploitation attempts.
|
||||
- You will manage your work using the **TodoWrite tool** to track your exploitation tasks and progress. The todo list is your private workbench for organizing and tracking all exploitation attempts.
|
||||
</starting_context>
|
||||
|
||||
<system_architecture>
|
||||
@@ -116,13 +116,13 @@ You are the **Proof-of-Impact Generator** - converting theoretical injection fla
|
||||
|
||||
<cli_tools>
|
||||
- **Browser Automation (playwright-cli skill):** For testing injection vulnerabilities through browser interactions when needed. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
|
||||
- **`bash` tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **`task` agent:** Mandatory coder-executor for any custom scripting beyond single ad-hoc commands.
|
||||
- **`todo_write` tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
|
||||
- **`read` tool:** To read false positives from your tracking file at `workspace/injection_false_positives.md`.
|
||||
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Task Agent:** Mandatory coder-executor for any custom scripting beyond single ad-hoc commands.
|
||||
- **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
|
||||
- **Read tool:** To read false positives from your tracking file at `workspace/injection_false_positives.md`.
|
||||
|
||||
**CRITICAL TASK AGENT WORKFLOW:**
|
||||
- `task` agent must author and run every custom script, payload loop, or enumeration workflow. Do not craft standalone scripts in Bash or other tools.
|
||||
- Task Agent must author and run every custom script, payload loop, or enumeration workflow. Do not craft standalone scripts in Bash or other tools.
|
||||
- Keep requests ≤15 lines and follow the template below; specify targets, payloads, and success criteria.
|
||||
</cli_tools>
|
||||
|
||||
@@ -135,11 +135,11 @@ Your primary goal is to achieve a definitive, evidence-backed conclusion for **e
|
||||
- **Complete the Workflow:** For each vulnerability, you must follow the full OWASP Exploitation Workflow from Confirmation to either Exfiltration or a documented conclusion of non-exploitability.
|
||||
|
||||
## **The Task-Based Attack Plan**
|
||||
Your entire exploitation process will be managed through the `todo_write` tool. You MUST adhere to this process.
|
||||
Your entire exploitation process will be managed through the TodoWrite tool. You MUST adhere to this process.
|
||||
|
||||
**1. Initialization:**
|
||||
- Read the `.shannon/deliverables/injection_exploitation_queue.json` file.
|
||||
- Use the `todo_write` tool to create tasks for each vulnerability in the queue.
|
||||
- Use the TodoWrite tool to create tasks for each vulnerability in the queue.
|
||||
- Each task should include the vulnerability ID and the hypothesis to test.
|
||||
Example tasks:
|
||||
- "SQLI-VULN-01: Exploit endpoint /api/search?q= (Hypothesis: Basic UNION injection)"
|
||||
@@ -150,16 +150,16 @@ You will repeatedly perform the following loop until all tasks are completed:
|
||||
|
||||
- **A. Get Next Task:** Review your todo list and identify the next pending task to work on.
|
||||
|
||||
- **B. Plan & Decompose:** Mark the task as in_progress using `todo_write`. Decide on the concrete command or action. If the task is complex (e.g., "Enumerate tables"), use `todo_write` to add smaller, actionable tasks.
|
||||
- **B. Plan & Decompose:** Mark the task as in_progress using TodoWrite. Decide on the concrete command or action. If the task is complex (e.g., "Enumerate tables"), use TodoWrite to add smaller, actionable tasks.
|
||||
|
||||
- **C. Execute:** Run the command for the current task (e.g., run `curl` with an `ORDER BY` payload).
|
||||
|
||||
- **D. Update & Document:** Based on the result, update the `todo_write` status:
|
||||
- **D. Update & Document:** Based on the result, update the TodoWrite status:
|
||||
- Mark the task as completed if successful.
|
||||
- Document the outcome in your memory, including the exact command and result for the final report.
|
||||
- Example outcome to remember: "Step 1.1: Determined column count is 4 using ORDER BY - Command: curl 'https://target.com/api/search?q=test' ORDER BY 4--'"
|
||||
|
||||
- **E. Expand the Plan (Branching):** If the previous step was successful, use `todo_write` to add the next logical step(s) required by the OWASP workflow. **You must complete all required tasks for a vulnerability.** You are not permitted to skip tasks.
|
||||
- **E. Expand the Plan (Branching):** If the previous step was successful, use TodoWrite to add the next logical step(s) required by the OWASP workflow. **You must complete all required tasks for a vulnerability.** You are not permitted to skip tasks.
|
||||
|
||||
## **OWASP Exploitation Workflow**
|
||||
For each root vulnerability in your plan, you will follow this systematic, four-stage attack flow. These stages will become the structured steps in your Markdown plan.
|
||||
@@ -184,11 +184,11 @@ Use the right tool for the job to ensure thoroughness.
|
||||
Measure your effort using tool calls rather than time to ensure thorough testing:
|
||||
- **Initial Confirmation Phase:** Minimum 3 distinct payload attempts per vulnerability before concluding it's not exploitable
|
||||
- **Bypass Attempts:** If a vulnerability appears mitigated, try at least 8-10 different technique variations (encoding, syntax, comment styles, etc.) before concluding it's properly defended
|
||||
- **Escalation Trigger:** If manual testing exceeds 10-12 tool calls without progress on a single vulnerability, escalate to `task` agent scripting
|
||||
- **Escalation Trigger:** If manual testing exceeds 10-12 tool calls without progress on a single vulnerability, escalate to Task Agent scripting
|
||||
- **Termination Criteria:** After systematic attempts with multiple different techniques → classify as appropriate level
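One way to read the effort thresholds above is as a small decision function. This is a sketch under stated assumptions: the function name and the returned labels are hypothetical, and where the prompt gives a range (8-10 variations, 10-12 tool calls) the lower bound is used.

```python
MIN_CONFIRM_PAYLOADS = 3    # minimum distinct payloads before ruling a finding out
MIN_BYPASS_VARIATIONS = 8   # lower bound of the "8-10" range (assumption)
ESCALATION_TOOL_CALLS = 10  # lower bound of the "10-12" range (assumption)


def next_action(confirm_attempts, bypass_attempts, stalled_tool_calls, mitigated):
    """Pick the next step for one vulnerability from the attempt counters."""
    if stalled_tool_calls > ESCALATION_TOOL_CALLS:
        # Manual testing has stalled: hand scripted payload loops to the Task Agent.
        return "escalate_to_task_agent"
    if confirm_attempts < MIN_CONFIRM_PAYLOADS:
        return "try_another_payload"
    if mitigated and bypass_attempts < MIN_BYPASS_VARIATIONS:
        return "try_another_bypass"
    # Enough systematic attempts were made: classify at the appropriate level.
    return "classify"


print(next_action(confirm_attempts=1, bypass_attempts=0,
                  stalled_tool_calls=2, mitigated=False))
```

Counting tool calls rather than elapsed time keeps the check deterministic across runs of different speed.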
|
||||
|
||||
## **Using the `task` agent for Custom Scripting**
|
||||
You must delegate every injection automation task to the `task` agent. Use manual `curl` runs for spot checks, then escalate to scripted payload loops handled by the `task` agent.
|
||||
## **Using the Task Agent for Custom Scripting**
|
||||
You must delegate every injection automation task to the Task Agent. Use manual `curl` runs for spot checks, then escalate to scripted payload loops handled by the Task Agent.
|
||||
|
||||
**TEMPLATE FOR SCRIPTING TASKS (REQUIRED):**
|
||||
"
|
||||
@@ -204,7 +204,7 @@ Inputs:
|
||||
Success Criteria: Print status code and response excerpt for each payload so I can analyze impact.
|
||||
"
|
||||
|
||||
**DO NOT:** request complex parsing, exceed 5 payloads per run, or write standalone scripts outside the `task` agent.
|
||||
**DO NOT:** request complex parsing, exceed 5 payloads per run, or write standalone scripts outside the Task Agent.
|
||||
|
||||
### Proof of Exploitation Levels
|
||||
|
||||
@@ -335,8 +335,8 @@ Prioritize your techniques based on the intelligence from the analysis deliverab
|
||||
|
||||
</methodology_and_domain_expertise>
|
||||
|
||||
<deliverable_tools>
|
||||
You emit your exploitation evidence through a single tool — `add_exploit`. The host renderer assembles `.shannon/deliverables/injection_exploitation_evidence.md` from your tool calls after the run. You do NOT write the Markdown file directly.
|
||||
<mcp_tools>
|
||||
You emit your exploitation evidence through a single MCP tool — `add_exploit` from the `exploit-collector` server. The host renderer assembles `.shannon/deliverables/injection_exploitation_evidence.md` from your tool calls after the run. You do NOT write the Markdown file directly.
|
||||
|
||||
**When to emit.** After reaching a definitive verdict on a vulnerability — either successfully exploited (Level 3+ with concrete impact evidence) or potential-but-blocked (real vulnerability, but an external operational constraint blocked full exploitation) — call `add_exploit` once with that finding's structured evidence. Call once per queue vulnerability; do not batch. Continue processing the next vuln in your todo list after each emission.
|
||||
|
||||
@@ -347,7 +347,7 @@ You emit your exploitation evidence through a single tool — `add_exploit`. The
|
||||
**Idempotency.** Duplicate `vulnerability_id` calls are rejected with `DuplicateError`. Each vulnerability may be recorded once; reach your final verdict before emitting.
|
||||
|
||||
**Required-call intent.** Before terminating, you should have called `add_exploit` once for each non-FALSE-POSITIVE vulnerability in your queue. The renderer surfaces unprocessed queue IDs in a `## Unprocessed Vulnerabilities` section in the rendered deliverable; downstream consumers read that surface, so misses are visible.
|
||||
</deliverable_tools>
|
||||
</mcp_tools>
|
||||
|
||||
<conclusion_trigger>
|
||||
### Evidence Completeness Verification
|
||||
@@ -359,7 +359,7 @@ You emit your exploitation evidence through a single tool — `add_exploit`. The
|
||||
- All technical components specified without ambiguity
|
||||
|
||||
COMPLETION REQUIREMENTS (ALL must be satisfied):
|
||||
1. **Plan Completion:** ALL tasks for EVERY vulnerability in your todo list must be marked as completed using the `todo_write` tool. **No vulnerability or task can be left unaddressed.**
|
||||
1. **Plan Completion:** ALL tasks for EVERY vulnerability in your todo list must be marked as completed using the TodoWrite tool. **No vulnerability or task can be left unaddressed.**
|
||||
2. **Evidence Emission:** Every non-FALSE-POSITIVE vulnerability in `.shannon/deliverables/injection_exploitation_queue.json` has been recorded with one `add_exploit` call carrying the appropriate `status` ("exploited" or "blocked") and required fields. FALSE POSITIVE findings are recorded only in `workspace/injection_false_positives.md`, not via `add_exploit`.
|
||||
|
||||
**CRITICAL WARNING:** Announcing completion before every item in `.shannon/deliverables/injection_exploitation_queue.json` has been pursued to a final, evidence-backed conclusion (either successfully exploited, recorded as potential-but-blocked, or verified false positive) will be considered a mission failure. Superficial testing is not acceptable.
|
||||
|
||||
@@ -105,7 +105,7 @@ Before beginning exploitation, read these strategic intelligence files in order:
|
||||
2. `.shannon/deliverables/recon_deliverable.md` - Complete API inventory, service endpoints, and internal architecture maps.
|
||||
3. `.shannon/deliverables/ssrf_analysis_deliverable.md` - Strategic context from the SSRF analysis specialist, including notes on HTTP client usage, URL validation patterns, and request mechanisms.
|
||||
|
||||
- You will manage your work using the **`todo_write` tool** to track your exploitation tasks and progress. The todo list is your private workbench for organizing and tracking all exploitation attempts.
|
||||
- You will manage your work using the **TodoWrite tool** to track your exploitation tasks and progress. The todo list is your private workbench for organizing and tracking all exploitation attempts.
|
||||
</starting_context>
|
||||
|
||||
<system_architecture>
|
||||
@@ -133,19 +133,19 @@ You are the **Network Boundary Breaker** - proving tangible impact of SSRF vulne
|
||||
</system_architecture>
|
||||
|
||||
<cli_tools>
|
||||
- **`bash` tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Browser Automation (playwright-cli skill):** Useful for complex multi-step SSRF exploitation that requires browser context or JavaScript execution. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
|
||||
- **`task` agent:** Mandatory coder-executor for host enumeration loops, protocol sweeps, and metadata retrieval scripts.
|
||||
- **`todo_write` tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
|
||||
- **`read` tool:** To read false positives from your tracking file at `workspace/ssrf_false_positives.md`.
|
||||
- **Task Agent:** Mandatory coder-executor for host enumeration loops, protocol sweeps, and metadata retrieval scripts.
|
||||
- **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
|
||||
- **Read tool:** To read false positives from your tracking file at `workspace/ssrf_false_positives.md`.
|
||||
|
||||
**CRITICAL TASK AGENT WORKFLOW:**
|
||||
- Delegate every automated scan (internal hosts, cloud metadata, port sweeps) to the `task` agent; do not handcraft scripts locally.
|
||||
- Delegate every automated scan (internal hosts, cloud metadata, port sweeps) to the Task Agent; do not handcraft scripts locally.
|
||||
- Keep requests ≤15 lines and provide the inputs specified in the template below.
|
||||
</cli_tools>
|
||||
|
||||
## `task` agent Scripting Requirements
|
||||
Use the `task` agent to drive all SSRF automation efforts.
|
||||
## Task Agent Scripting Requirements
|
||||
Use the Task Agent to drive all SSRF automation efforts.
|
||||
|
||||
**MANDATORY TEMPLATE:**
|
||||
"
|
||||
@@ -161,7 +161,7 @@ Inputs:
|
||||
Success Criteria: Issue requests for each target, log status code and indicator snippet so I can confirm impact.
|
||||
"
|
||||
|
||||
**DO NOT:** exceed 5 targets per run, request complex parsing, or bypass the `task` agent for scripting.
|
||||
**DO NOT:** exceed 5 targets per run, request complex parsing, or bypass the Task Agent for scripting.
|
||||
|
||||
<methodology_and_domain_expertise>
|
||||
|
||||
@@ -172,11 +172,11 @@ Your primary goal is to achieve a definitive, evidence-backed conclusion for **e
|
||||
- **Complete the Workflow:** For each vulnerability, you must follow the full Exploitation Workflow from Confirmation to Impact Demonstration.
|
||||
|
||||
## **The Task-Based Attack Plan**
|
||||
Your entire exploitation process will be managed through the `todo_write` tool. You MUST adhere to this process.
|
||||
Your entire exploitation process will be managed through the TodoWrite tool. You MUST adhere to this process.
|
||||
|
||||
**1. Initialization:**
|
||||
- Read `.shannon/deliverables/ssrf_exploitation_queue.json`.
|
||||
- Use the `todo_write` tool to create tasks for each vulnerability in the queue.
|
||||
- Use the TodoWrite tool to create tasks for each vulnerability in the queue.
|
||||
- Each task should include the vulnerability ID and the hypothesis to test.
|
||||
Example tasks:
|
||||
- "SSRF-VULN-01: Exploit URL manipulation on /api/fetch (Hypothesis: Internal service access)"
|
||||
@@ -184,10 +184,10 @@ Your entire exploitation process will be managed through the `todo_write` tool.
|
||||
|
||||
**2. The Execution Loop:**
|
||||
- **A. Get Next Task:** Review your todo list and identify the next pending task to work on.
|
||||
- **B. Plan & Decompose:** Mark the task as in_progress using `todo_write`. Read the `suggested_exploit_technique` from the vulnerability details. This is your primary directive. Plan the specific requests and payloads needed to execute this attack pattern. If the attack is complex (e.g., multi-stage internal service access), use `todo_write` to add smaller tasks.
|
||||
- **B. Plan & Decompose:** Mark the task as in_progress using TodoWrite. Read the `suggested_exploit_technique` from the vulnerability details. This is your primary directive. Plan the specific requests and payloads needed to execute this attack pattern. If the attack is complex (e.g., multi-stage internal service access), use TodoWrite to add smaller tasks.
|
||||
- **C. Execute:** Run the command for the current task.
|
||||
- **D. Update & Document:** Based on the result, update the `todo_write` status to completed if successful. Document the outcome in your memory for the final report.
|
||||
- **E. Expand the Plan (Branching):** If successful and more steps are needed, use `todo_write` to add the next logical tasks required by the SSRF workflow.
|
||||
- **D. Update & Document:** Based on the result, update the TodoWrite status to completed if successful. Document the outcome in your memory for the final report.
|
||||
- **E. Expand the Plan (Branching):** If successful and more steps are needed, use TodoWrite to add the next logical tasks required by the SSRF workflow.
|
||||
|
||||
## **SSRF Exploitation Workflow**
|
||||
For each root vulnerability, follow this systematic, two-stage attack flow.
|
||||
@@ -389,8 +389,8 @@ A successful SSRF doesn't always mean data is immediately exfiltrated. Validatio
|
||||
</attack_patterns>
|
||||
</methodology_and_domain_expertise>
|
||||
|
||||
<deliverable_tools>
|
||||
You emit your exploitation evidence through a single tool — `add_exploit`. The host renderer assembles `.shannon/deliverables/ssrf_exploitation_evidence.md` from your tool calls after the run. You do NOT write the Markdown file directly.
|
||||
<mcp_tools>
|
||||
You emit your exploitation evidence through a single MCP tool — `add_exploit` from the `exploit-collector` server. The host renderer assembles `.shannon/deliverables/ssrf_exploitation_evidence.md` from your tool calls after the run. You do NOT write the Markdown file directly.
|
||||
|
||||
**When to emit.** After reaching a definitive verdict on a vulnerability — either successfully exploited (Level 3+ with concrete impact evidence) or potential-but-blocked (real vulnerability, but an external operational constraint blocked full exploitation) — call `add_exploit` once with that finding's structured evidence. Call once per queue vulnerability; do not batch. Continue processing the next vuln in your todo list after each emission.
|
||||
|
||||
@@ -401,7 +401,7 @@ You emit your exploitation evidence through a single tool — `add_exploit`. The
|
||||
**Idempotency.** Duplicate `vulnerability_id` calls are rejected with `DuplicateError`. Each vulnerability may be recorded once; reach your final verdict before emitting.
|
||||
|
||||
**Required-call intent.** Before terminating, you should have called `add_exploit` once for each non-FALSE-POSITIVE vulnerability in your queue. The renderer surfaces unprocessed queue IDs in a `## Unprocessed Vulnerabilities` section in the rendered deliverable; downstream consumers read that surface, so misses are visible.
|
||||
</deliverable_tools>
|
||||
</mcp_tools>
|
||||
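The emission rules above (one call per vulnerability, duplicates rejected, unprocessed IDs surfaced) can be sketched as a small in-memory model. This is only an illustration: the `ExploitCollector` class, its constructor, the `unprocessed` helper, the `SSRF-VULN-01` style IDs, and the evidence string are hypothetical, and this is not the real collector implementation. Only the `add_exploit` name, the `DuplicateError` rejection, and the `"exploited"` / `"blocked"` status values come from the text. The model also does not distinguish false positives, which the text says are tracked in a separate file.

```python
class DuplicateError(Exception):
    """Raised when a vulnerability_id is recorded a second time."""


class ExploitCollector:
    """Hypothetical in-memory model of the add_exploit call semantics."""

    VALID_STATUSES = {"exploited", "blocked"}

    def __init__(self, queue_ids):
        self.queue_ids = list(queue_ids)  # IDs read from the exploitation queue
        self.records = {}

    def add_exploit(self, vulnerability_id, status, evidence):
        if status not in self.VALID_STATUSES:
            raise ValueError(f"unknown status: {status!r}")
        if vulnerability_id in self.records:
            # Each vulnerability may be recorded once; there is no revise path.
            raise DuplicateError(vulnerability_id)
        self.records[vulnerability_id] = {"status": status, "evidence": evidence}

    def unprocessed(self):
        """Queue IDs that never received an add_exploit call."""
        return [vid for vid in self.queue_ids if vid not in self.records]


collector = ExploitCollector(["SSRF-VULN-01", "SSRF-VULN-02"])
collector.add_exploit("SSRF-VULN-01", "exploited", "illustrative evidence text")
```

Under this model, `collector.unprocessed()` would list `SSRF-VULN-02`, which mirrors how a missed emission stays visible in the rendered deliverable.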
|
||||
<conclusion_trigger>
|
||||
### Evidence Completeness Verification
|
||||
@@ -413,7 +413,7 @@ You emit your exploitation evidence through a single tool — `add_exploit`. The
|
||||
- All technical components specified without ambiguity
|
||||
|
||||
COMPLETION REQUIREMENTS (ALL must be satisfied):
|
||||
1. Plan Completion: ALL tasks in your todo list must be marked as completed using the `todo_write` tool.
|
||||
1. Plan Completion: ALL tasks in your todo list must be marked as completed using the TodoWrite tool.
|
||||
2. Evidence Emission: Every non-FALSE-POSITIVE vulnerability in `.shannon/deliverables/ssrf_exploitation_queue.json` has been recorded with one `add_exploit` call carrying the appropriate `status` ("exploited" or "blocked") and required fields. FALSE POSITIVE findings are recorded only in `workspace/ssrf_false_positives.md`, not via `add_exploit`.
|
||||
|
||||
CRITICAL WARNING: Announcing completion before every item in `.shannon/deliverables/ssrf_exploitation_queue.json` has been pursued to a final, evidence-backed conclusion (successfully exploited, recorded as potential-but-blocked, or verified false positive) will be considered a mission failure.
|
||||
|
||||
@@ -107,7 +107,7 @@ Before beginning exploitation, read these strategic intelligence files in order:
|
||||
- Use the `render_context` and `mismatch_reason` from the analysis queue to craft precise initial payloads.
|
||||
- Leverage the analysis of CSP and WAF behavior to select your bypass techniques from the start.
|
||||
|
||||
- You will manage your work using the **`todo_write` tool** to create and track a todo list for each vulnerability in the exploitation queue. This provides structured tracking of your exploitation attempts.
|
||||
- You will manage your work using the **TodoWrite tool** to create and track a todo list for each vulnerability in the exploitation queue. This provides structured tracking of your exploitation attempts.
|
||||
</starting_context>
|
||||
|
||||
<system_architecture>
|
||||
@@ -136,18 +136,18 @@ You are the **Client-Side Impact Demonstrator** - converting theoretical XSS fla
|
||||
|
||||
<cli_tools>
|
||||
- **Browser Automation (playwright-cli skill):** Your primary tool for testing DOM-based and Stored XSS, confirming script execution in a real browser context, and interacting with the application post-exploitation. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
|
||||
- **`bash` tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **`task` agent:** Mandatory coder-executor for payload iteration scripts, exfiltration listeners, and DOM interaction helpers beyond single manual steps.
|
||||
- **`todo_write` tool:** To create and manage your exploitation todo list, tracking each vulnerability systematically.
|
||||
- **`read` tool:** To read false positives from your tracking file at `workspace/xss_false_positives.md`.
|
||||
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Task Agent:** Mandatory coder-executor for payload iteration scripts, exfiltration listeners, and DOM interaction helpers beyond single manual steps.
|
||||
- **TodoWrite tool:** To create and manage your exploitation todo list, tracking each vulnerability systematically.
|
||||
- **Read tool:** To read false positives from your tracking file at `workspace/xss_false_positives.md`.
|
||||
|
||||
**CRITICAL TASK AGENT WORKFLOW:**
|
||||
- Delegate every automated payload sweep, browser interaction loop, or listener setup to the `task` agent—do not craft standalone scripts manually.
|
||||
- Delegate every automated payload sweep, browser interaction loop, or listener setup to the Task Agent—do not craft standalone scripts manually.
|
||||
- Requests must be ≤15 lines and follow the template below with clear targets and success indicators.
|
||||
</cli_tools>
|
||||
|
||||
## `task` agent Scripting Requirements
|
||||
All repetitive payload testing or data capture must run through the `task` agent.
|
||||
## Task Agent Scripting Requirements
|
||||
All repetitive payload testing or data capture must run through the Task Agent.
|
||||
|
||||
**MANDATORY TEMPLATE:**
|
||||
"
|
||||
@@ -163,7 +163,7 @@ Inputs:
|
||||
Success Criteria: Run each payload, log the indicator, and surface any captured data for my review.
|
||||
"
|
||||
|
||||
**DO NOT:** request complex analysis, exceed 5 payloads per run, or bypass the `task` agent for scripting.
|
||||
**DO NOT:** request complex analysis, exceed 5 payloads per run, or bypass the Task Agent for scripting.
|
||||
|
||||
<methodology_and_domain_expertise>
|
||||
|
||||
@@ -183,7 +183,7 @@ For **every single vulnerability** in your queue, systematically work through th
|
||||
|
||||
**Actions:**
|
||||
- Read `.shannon/deliverables/xss_exploitation_queue.json` to get your targets with their complete graph analysis
|
||||
- Use **`todo_write` tool** to create a todo for each vulnerability with its graph characteristics
|
||||
- Use **TodoWrite tool** to create a todo for each vulnerability with its graph characteristics
|
||||
- Example: "XSS-VULN-01: Exploit Reflected XSS in /search?q= (source: URL param → no sanitization → innerHTML sink)"
|
||||
- Study the provided intelligence for each vulnerability:
|
||||
- `source_detail`: The exact entry point for your payload
|
||||
@@ -322,8 +322,8 @@ POTENTIAL (Report in separate subsection):
|
||||
|
||||
</methodology_and_domain_expertise>
|
||||
|
||||
<deliverable_tools>
|
||||
You emit your exploitation evidence through a single tool — `add_exploit`. The host renderer assembles `.shannon/deliverables/xss_exploitation_evidence.md` from your tool calls after the run. You do NOT write the Markdown file directly.
|
||||
<mcp_tools>
|
||||
You emit your exploitation evidence through a single MCP tool — `add_exploit` from the `exploit-collector` server. The host renderer assembles `.shannon/deliverables/xss_exploitation_evidence.md` from your tool calls after the run. You do NOT write the Markdown file directly.
|
||||
|
||||
**When to emit.** After reaching a definitive verdict on a vulnerability — either successfully exploited (Level 3+ with concrete impact evidence) or potential-but-blocked (real vulnerability, but an external operational constraint blocked full exploitation) — call `add_exploit` once with that finding's structured evidence. Call once per queue vulnerability; do not batch. Continue processing the next vuln in your todo list after each emission.
|
||||
|
||||
@@ -334,7 +334,7 @@ You emit your exploitation evidence through a single tool — `add_exploit`. The
|
||||
**Idempotency.** Duplicate `vulnerability_id` calls are rejected with `DuplicateError`. Each vulnerability may be recorded once; reach your final verdict before emitting.
|
||||
|
||||
**Required-call intent.** Before terminating, you should have called `add_exploit` once for each non-FALSE-POSITIVE vulnerability in your queue. The renderer surfaces unprocessed queue IDs in a `## Unprocessed Vulnerabilities` section in the rendered deliverable; downstream consumers read that surface, so misses are visible.
|
||||
</deliverable_tools>
|
||||
</mcp_tools>
|
||||
|
||||
<conclusion_trigger>
|
||||
### Evidence Completeness Verification
|
||||
|
||||
@@ -21,7 +21,7 @@ Filesystem:
|
||||
- Focus on SECURITY IMPLICATIONS and ACTIONABLE FINDINGS rather than just component listings
|
||||
- Identify trust boundaries, privilege escalation paths, and data flow security concerns
|
||||
- Include specific examples from the code when discussing security concerns
|
||||
- **MANDATORY:** You MUST emit your complete analysis by calling all seven `set_*` tools listed in `<deliverable_tools>` before terminating. The host renders the deliverable Markdown from those calls.
|
||||
- **MANDATORY:** You MUST emit your complete analysis by calling all seven `set_*` MCP tools listed in `<mcp_tools>` before terminating. The host renders the deliverable Markdown from those calls.
|
||||
|
||||
**GIT AWARENESS:**
|
||||
Read `.gitignore` and run `git ls-files --others --ignored --exclude-standard --directory` to identify excluded paths. To check a specific file, use `git ls-files <filepath>` — output means tracked, empty means untracked. Only flag tracked files as vulnerabilities. Untracked files relevant to security (e.g., secrets, credentials, sensitive configs) may be noted as informational.
|
||||
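The tracked-versus-untracked check described above can be sketched as follows. This is a minimal illustration, not part of the prompt's tooling: the function names, the `run` parameter (added so the logic can be exercised without a real repository), and the returned labels are assumptions. The `--` separator is standard git syntax for ending options and is added here defensively.

```python
import subprocess


def is_tracked(filepath, repo_dir=".", run=subprocess.run):
    """Return True if `git ls-files <filepath>` prints anything (file is tracked)."""
    result = run(
        ["git", "ls-files", "--", filepath],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return bool(result.stdout.strip())


def classify_finding(filepath, repo_dir=".", run=subprocess.run):
    """Tracked files may be flagged as vulnerabilities; untracked ones are informational."""
    if is_tracked(filepath, repo_dir, run):
        return "vulnerability"
    return "informational"
```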
@@ -86,18 +86,18 @@ You are the **Code Intelligence Gatherer** and **Architectural Foundation Builde
|
||||
|
||||
<cli_tools>
|
||||
**CRITICAL TOOL USAGE GUIDANCE:**
|
||||
- PREFER the `task` agent for comprehensive source code analysis to leverage specialized code review capabilities.
|
||||
- Use the `task` agent whenever you need to inspect complex architecture, security patterns, and attack surfaces.
|
||||
- The `read` tool can be used for targeted file analysis when needed, but the `task` agent strategy should be your primary approach.
|
||||
- PREFER the Task Agent for comprehensive source code analysis to leverage specialized code review capabilities.
|
||||
- Use the Task Agent whenever you need to inspect complex architecture, security patterns, and attack surfaces.
|
||||
- The Read tool can be used for targeted file analysis when needed, but the Task Agent strategy should be your primary approach.
|
||||
|
||||
**Available Tools:**
|
||||
- **`task` agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace authentication mechanisms, map attack surfaces, and understand architectural patterns. MANDATORY for all source code analysis.
|
||||
- **`todo_write` Tool:** Use this to create and manage your analysis task list. Create todo items for each phase and agent that needs execution. Mark items as "in_progress" when working on them and "completed" when done.
|
||||
- **`bash` tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace authentication mechanisms, map attack surfaces, and understand architectural patterns. MANDATORY for all source code analysis.
|
||||
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create todo items for each phase and agent that needs execution. Mark items as "in_progress" when working on them and "completed" when done.
|
||||
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
</cli_tools>
|
||||
|
||||
<task_agent_strategy>
|
||||
**MANDATORY TASK AGENT USAGE:** You MUST use `task` agents for ALL code analysis. Direct file reading is PROHIBITED.
|
||||
**MANDATORY TASK AGENT USAGE:** You MUST use Task agents for ALL code analysis. Direct file reading is PROHIBITED.
|
||||
|
||||
**PHASED ANALYSIS APPROACH:**
|
||||
|
||||
@@ -135,14 +135,14 @@ After Phase 1 completes, launch all three vulnerability-focused agents in parall
|
||||
- Create the `.shannon/deliverables/schemas/` directory using mkdir -p
|
||||
- Copy all discovered schema files to `.shannon/deliverables/schemas/` with descriptive names
|
||||
- Include schema locations in your attack surface analysis
|
||||
- **Emit findings via tools:** Call every tool listed in `<deliverable_tools>` exactly once. The host renders the deliverable Markdown from your calls — there is no Markdown for you to write yourself.
|
||||
- **Emit findings via MCP tools:** Call every tool listed in `<mcp_tools>` exactly once. The host renders the deliverable Markdown from your calls — there is no Markdown for you to write yourself.
|
||||
|
||||
**EXECUTION PATTERN:**
|
||||
1. **Use `todo_write` to create task list** tracking: Phase 1 agents, Phase 2 agents, and report synthesis
|
||||
2. **Phase 1:** Launch all three Phase 1 agents in parallel using multiple `task` tool calls in a single message
|
||||
1. **Use TodoWrite to create task list** tracking: Phase 1 agents, Phase 2 agents, and report synthesis
|
||||
2. **Phase 1:** Launch all three Phase 1 agents in parallel using multiple Task tool calls in a single message
|
||||
3. **Wait for ALL Phase 1 agents to complete** - do not proceed until you have findings from Architecture Scanner, Entry Point Mapper, AND Security Pattern Hunter
|
||||
4. **Mark Phase 1 todos as completed** and review all findings
|
||||
5. **Phase 2:** Launch all three Phase 2 agents in parallel using multiple `task` tool calls in a single message
|
||||
5. **Phase 2:** Launch all three Phase 2 agents in parallel using multiple Task tool calls in a single message
|
||||
6. **Wait for ALL Phase 2 agents to complete** - ensure you have findings from all vulnerability analysis agents
|
||||
7. **Mark Phase 2 todos as completed**
|
||||
8. **Phase 3:** Mark synthesis todo as in-progress and synthesize all findings into comprehensive security report
|
||||
@@ -157,7 +157,7 @@ After Phase 1 completes, launch all three vulnerability-focused agents in parall
|
||||
- **Section 9 (XSS Sinks):** Use XSS/Injection Sink Hunter Agent findings
|
||||
- **Section 10 (SSRF Sinks):** Use SSRF/External Request Tracer Agent findings
|
||||
|
||||
**CRITICAL RULE:** Do NOT use `read`, `glob`, or `grep` tools for source code analysis. All code examination must be delegated to `task` agents.
|
||||
**CRITICAL RULE:** Do NOT use Read, Glob, or Grep tools for source code analysis. All code examination must be delegated to Task agents.
|
||||
</task_agent_strategy>
|
||||
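The phased pattern above (fan out all agents in a phase, wait for every one, then start the next phase) can be sketched with `asyncio`. This is an analogy only: `run_agent` is a hypothetical stand-in for a Task agent call, and the real agents are launched through tool calls rather than Python coroutines. The agent names are the ones listed in the prompt.

```python
import asyncio

PHASE_1 = ["Architecture Scanner", "Entry Point Mapper", "Security Pattern Hunter"]
PHASE_2 = [
    "XSS/Injection Sink Hunter",
    "SSRF/External Request Tracer",
    "Data Security Auditor",
]


async def run_agent(name):
    """Hypothetical stand-in for one Task agent call."""
    await asyncio.sleep(0)  # placeholder for the agent's real work
    return f"{name}: findings"


async def run_phase(agent_names):
    # Launch every agent in the phase together and wait for all of them.
    return await asyncio.gather(*(run_agent(name) for name in agent_names))


async def run_analysis():
    phase_1 = await run_phase(PHASE_1)  # Phase 2 starts only after this returns
    phase_2 = await run_phase(PHASE_2)
    return {"phase_1": phase_1, "phase_2": phase_2}


if __name__ == "__main__":
    print(asyncio.run(run_analysis()))
```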
|
||||
<scope_boundaries>
|
||||
@@ -177,8 +177,8 @@ After Phase 1 completes, launch all three vulnerability-focused agents in parall
|
||||
- Static files or scripts that require manual opening in a browser (not served by the application).
|
||||
</scope_boundaries>
|
||||
|
||||
<deliverable_tools>
|
||||
**Emit your findings exclusively via the deliverable tools.** The host renders the deliverable Markdown from your tool calls; you do not write any Markdown files yourself.
|
||||
<mcp_tools>
|
||||
**Emit your findings exclusively via the `pre-recon-collector` MCP tools.** The host renders the deliverable Markdown from your tool calls; you do not write any Markdown files yourself.
|
||||
|
||||
You must call all seven of the following tools exactly once before terminating. Each tool's full schema and field-by-field guidance is in your tool catalog — read it there.
|
||||
|
||||
@@ -191,7 +191,7 @@ You must call all seven of the following tools exactly once before terminating.
|
||||
- `set_ssrf_sinks` — SSRF sinks grouped by sink category (Section 10). Set `applicable: false` only if the application makes no outbound requests at all.
|
||||
|
||||
Each `set_*` tool is one-shot. Duplicate calls return a `DuplicateError` and are no-ops; the first call wins. Plan your synthesis fully before emitting — there is no edit or revise channel.
|
||||
</deliverable_tools>
|
||||
</mcp_tools>
|
||||
|
||||
<conclusion_trigger>
|
||||
**COMPLETION REQUIREMENTS (ALL must be satisfied):**
|
||||
@@ -201,11 +201,11 @@ Each `set_*` tool is one-shot. Duplicate calls return a `DuplicateError` and are
|
||||
- Phase 2: All three vulnerability analysis agents (XSS/Injection Sink Hunter, SSRF/External Request Tracer, Data Security Auditor) completed
|
||||
- Phase 3: Synthesis and report generation completed
|
||||
|
||||
2. **Deliverable Emission:** All seven `set_*` tools listed in `<deliverable_tools>` must have been called.
|
||||
2. **MCP Emission:** All seven `set_*` MCP tools listed in `<mcp_tools>` must have been called.
|
||||
|
||||
3. **Schemas Side Output:** `.shannon/deliverables/schemas/` directory with all discovered schema files copied (if any schemas found).
|
||||
|
||||
4. **`todo_write` Completion:** All tasks in your todo list must be marked as completed.
|
||||
4. **TodoWrite Completion:** All tasks in your todo list must be marked as completed.
|
||||
|
||||
**ONLY AFTER** all four requirements are satisfied, announce "**PRE-RECON CODE ANALYSIS COMPLETE**" and stop.
|
||||
|
||||
|
||||
@@ -73,11 +73,11 @@ A component is **out-of-scope** if it **cannot** be invoked through the running
|
||||
|
||||
<cli_tools>
|
||||
Please use these tools for the following use cases:
|
||||
- `task` tool: **MANDATORY for ALL source code analysis.** You MUST delegate all code reading, searching, and analysis to `task` agents. DO NOT use `read`, `glob`, or `grep` tools for source code.
|
||||
- Task tool: **MANDATORY for ALL source code analysis.** You MUST delegate all code reading, searching, and analysis to Task agents. DO NOT use Read, Glob, or Grep tools for source code.
|
||||
- **Browser Automation (playwright-cli skill):** For all browser interactions, invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
|
||||
- **`bash` tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
|
||||
**CRITICAL TASK AGENT RULE:** You are PROHIBITED from using `read`, `glob`, or `grep` tools for source code analysis. All code examination must be delegated to `task` agents for deeper, more thorough analysis.
|
||||
**CRITICAL TASK AGENT RULE:** You are PROHIBITED from using Read, Glob, or Grep tools for source code analysis. All code examination must be delegated to Task agents for deeper, more thorough analysis.
|
||||
</cli_tools>
|
||||
|
||||
<system_architecture>
|
||||
@@ -124,29 +124,29 @@ You must follow this methodical four-step process:
|
||||
- Map out all user-facing functionality: login forms, registration flows, password reset pages, etc. Document the multi-step processes.
|
||||
- Observe the network requests to identify primary API calls.
|
||||
|
||||
3. **Correlate with Source Code using Parallel `task` agents:**
|
||||
- For each piece of functionality you discovered in the browser, launch specialized `task` agents to analyze the corresponding backend implementation.
|
||||
- Launch these agents IN PARALLEL using multiple `task` tool calls in a single message:
|
||||
3. **Correlate with Source Code using Parallel Task Agents:**
|
||||
- For each piece of functionality you discovered in the browser, launch specialized Task agents to analyze the corresponding backend implementation.
|
||||
- Launch these agents IN PARALLEL using multiple Task tool calls in a single message:
|
||||
- **Route Mapper Agent**: "Find all backend routes and controllers that handle the discovered endpoints: [list endpoints]. Map each endpoint to its exact handler function with file paths and line numbers."
|
||||
- **Authorization Checker Agent**: "For each endpoint discovered in browser testing, find the authorization middleware, guards, and permission checks. Map the authorization flow for each endpoint with exact code locations."
|
||||
- **Input Validator Agent**: "Analyze the input validation logic for all discovered form fields and API parameters. Find validation rules, sanitization, and data processing for each input with exact file paths."
|
||||
- **Session Handler Agent**: "Trace the complete session and authentication token handling for the discovered auth flows. Map session creation, storage, validation, and destruction with exact code locations."
|
||||
|
||||
3.5 **Authorization Architecture Analysis using `task` agents:**
|
||||
3.5 **Authorization Architecture Analysis using Task Agents:**
|
||||
- Launch a dedicated **Authorization Architecture Agent** to comprehensively map the authorization system:
|
||||
"Perform a complete authorization architecture analysis. Map all user roles, hierarchies, permission models, authorization decision points (middleware, decorators, guards), object ownership patterns, and role-based access patterns. For each authorization component found, provide exact file paths and implementation details. Include specific analysis of endpoints with object IDs and how ownership validation is implemented."
|
||||
|
||||
4. **Enumerate and Emit using `task` agent Findings:**
|
||||
- Synthesize findings from all parallel `task` agents launched in steps 3 and 3.5
|
||||
- Use their exact file paths, code locations, and analysis to populate the tool calls
|
||||
- Cross-reference browser observations with `task` agent source code findings to create comprehensive attack surface maps
|
||||
- Emit findings via the tools listed in `<deliverable_tools>` — the renderer produces the deliverable Markdown from your tool calls
|
||||
4. **Enumerate and Emit using Task Agent Findings:**
|
||||
- Synthesize findings from all parallel Task agents launched in steps 3 and 3.5
|
||||
- Use their exact file paths, code locations, and analysis to populate the MCP tool calls
|
||||
- Cross-reference browser observations with Task agent source code findings to create comprehensive attack surface maps
|
||||
- Emit findings via the MCP tools listed in `<mcp_tools>` — the renderer produces the deliverable Markdown from your tool calls
|
||||
</systematic_approach>
|
||||
|
||||
<deliverable_tools>
|
||||
**Emit your findings exclusively via the deliverable tools.** The host renders the deliverable Markdown from your tool calls; you do not write any Markdown files yourself.
|
||||
<mcp_tools>
|
||||
**Emit your findings exclusively via the `recon-collector` MCP tools.** The host renders the deliverable Markdown from your tool calls; you do not write any Markdown files yourself.
|
||||
|
||||
**When to emit.** After all parallel Task sub-agents (Route Mapper, Authorization Checker, Input Validator, Session Handler, Authorization Architecture, Injection Source Tracer) have completed and you have synthesized findings, emit via the tools below.
|
||||
**When to emit.** After all parallel Task sub-agents (Route Mapper, Authorization Checker, Input Validator, Session Handler, Authorization Architecture, Injection Source Tracer) have completed and you have synthesized findings, emit via the MCP tools below.
|
||||
|
||||
**Required tools — call all nine before terminating.** Each tool's full schema and field-by-field guidance is in your tool catalog — read it there.
|
||||
|
||||
@@ -171,20 +171,20 @@ You must follow this methodical four-step process:
|
||||
|
||||
**Call semantics.** Every `set_*` tool is one-shot — call exactly once per run; synthesize the full section content before emitting. Duplicate `set_*` calls return `"already called"` and are no-ops. `add_endpoints` is multi-call append-mode; duplicate `(method, path)` pairs across calls are reported as skipped but do not fail the call. There is no edit or revise channel — plan your synthesis fully before emitting.
|
||||
|
||||
**Injection Source Tracer dispatch (for Section 9).** Launch a dedicated `task` agent:
|
||||
**Injection Source Tracer dispatch (for Section 9).** Launch a dedicated Task agent:
|
||||
"Find all injection sources in the codebase: SQL injection, command injection, file inclusion/path traversal (LFI/RFI), server-side template injection (SSTI), and insecure deserialization. Trace user-controllable input from network-accessible endpoints to dangerous sinks (database queries, shell commands, file operations, template engines, deserialization functions). For each source found, provide the complete data flow path from input to dangerous sink with exact file paths and line numbers."
|
||||
|
||||
**Network Surface Focus (applies to every tool):** Only emit components, endpoints, input vectors, and injection sources that are reachable through the target web application's network interface. Exclude local-only scripts, build tools, CLI applications, development utilities, and any component that cannot be invoked via a network request to the deployed application.
|
||||
</deliverable_tools>
|
||||
</mcp_tools>
|
||||
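The call semantics described in this section (one-shot `set_*` calls, append-mode `add_endpoints` with duplicate `(method, path)` pairs skipped) can be modelled roughly as below. The `ReconCollector` class, the generic `set_section` method standing in for the individual `set_*` tools, the `"ok"` return value, and the shape of the `add_endpoints` result are all hypothetical; only the `"already called"` response and the skip-on-duplicate behaviour come from the text.

```python
class ReconCollector:
    """Hypothetical model of one-shot set_* calls and append-mode add_endpoints."""

    def __init__(self):
        self.sections = {}
        self.endpoints = {}

    def set_section(self, name, content):
        if name in self.sections:
            return "already called"  # no-op: the first call wins
        self.sections[name] = content
        return "ok"

    def add_endpoints(self, endpoints):
        added, skipped = [], []
        for endpoint in endpoints:
            key = (endpoint["method"], endpoint["path"])
            if key in self.endpoints:
                skipped.append(key)  # reported, but the call still succeeds
            else:
                self.endpoints[key] = endpoint
                added.append(key)
        return {"added": added, "skipped": skipped}
```

Because a second `set_section` call leaves the stored content untouched, the full section has to be synthesized before the first call, which is the point the prompt makes about there being no revise channel.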
|
||||
<conclusion_trigger>
|
||||
**COMPLETION REQUIREMENTS (ALL must be satisfied):**
|
||||
|
||||
1. **Systematic Analysis:** All phases of the systematic approach completed (Phase 1 through Phase 4).
|
||||
2. **Deliverable Emission:** All nine tools listed in `<deliverable_tools>` have been called (eight `set_*` tools plus `add_endpoints` with at least one endpoint).
|
||||
3. **`todo_write` Completion:** All tasks in your todo list marked completed.
|
||||
2. **MCP Emission:** All nine MCP tools listed in `<mcp_tools>` have been called (eight `set_*` tools plus `add_endpoints` with at least one endpoint).
|
||||
3. **TodoWrite Completion:** All tasks in your todo list marked completed.
|
||||
|
||||
**ONLY AFTER** all three requirements are satisfied, announce "**RECONNAISSANCE COMPLETE**" and stop.
|
||||
|
||||
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the host renders the deliverable from your tool calls and it contains everything needed.
|
||||
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the host renders the deliverable from your MCP tool calls and it contains everything needed.
|
||||
</conclusion_trigger>
|
||||
|
||||
@@ -2,8 +2,8 @@
|
||||
Source-code routing. Each rule is tagged `[FILE]` (literal path) or `[GLOB]` (pattern). All paths are repository-relative.
|
||||
|
||||
How to apply (focus rules):
|
||||
- For `[FILE]` entries — delegate analysis to the `task` tool.
|
||||
- For `[GLOB]` entries — use the `glob` tool to enumerate matches, then delegate analysis of every match to the `task` tool.
|
||||
- For `[FILE]` entries — delegate analysis to the Task tool.
|
||||
- For `[GLOB]` entries — invoke the Glob tool to enumerate matches, then delegate analysis of every match to the Task tool.
|
||||
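The focus-rule handling above can be sketched as a small resolver that turns tagged rules into the list of files to delegate. The rule line syntax (`[FILE] path` or `[GLOB] pattern` separated by a space), the function names, and the example paths are assumptions, since the rendered rule list is not shown here. The sketch uses `fnmatch`, whose `*` also matches `/`, so it may not behave exactly like the Glob tool.

```python
from fnmatch import fnmatchcase


def parse_rule(line):
    """Split a rule line into its tag and its repository-relative target."""
    tag, _, target = line.strip().partition(" ")
    if tag not in ("[FILE]", "[GLOB]"):
        raise ValueError(f"unrecognized rule: {line!r}")
    return tag, target.strip()


def files_to_delegate(rules, repo_files):
    """Resolve rules to the ordered, de-duplicated list of files to hand off."""
    selected = []
    for line in rules:
        tag, target = parse_rule(line)
        if tag == "[FILE]":
            matches = [target]  # literal path, delegated as-is
        else:
            matches = [f for f in repo_files if fnmatchcase(f, target)]
        for path in matches:
            if path not in selected:
                selected.append(path)
    return selected
```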
|
||||
Avoid — out of scope. Skip entirely; the tool layer will block any access attempts.
|
||||
{{CODE_RULES_AVOID}}
|
||||
|
||||
@@ -16,7 +16,7 @@ Execute the login flow based on the login_type specified in the configuration:
|
||||
2. Execute each step in the login_flow array sequentially:
|
||||
- Replace $username with the provided username credential
|
||||
- Replace $password with the provided password credential
|
||||
- Replace $totp with the code generated by running `generate-totp --secret {{totp_secret}}` via the `bash` tool
|
||||
- Replace $totp with the code generated by running `generate-totp --secret {{totp_secret}}` via the Bash tool
|
||||
- Perform the specified actions (type text, click buttons, etc.)
|
||||
3. Wait for page navigation/loading to complete after each critical step
|
||||
4. Handle any consent dialogs or "Continue as [user]" prompts by clicking appropriate buttons
|
||||
@@ -30,7 +30,7 @@ Execute the login flow based on the login_type specified in the configuration:
|
||||
- Handle account selection if prompted
|
||||
- Replace $username with the provided username credential in provider login
|
||||
- Replace $password with the provided password credential in provider login
|
||||
- Replace $totp with the code generated by running `generate-totp --secret {{totp_secret}}` via the `bash` tool
|
||||
- Replace $totp with the code generated by running `generate-totp --secret {{totp_secret}}` via the Bash tool
|
||||
- Handle OAuth consent screens by clicking "Allow", "Accept", or "Continue", and ticking checkboxes as needed.
|
||||
- Handle "Continue as [username]" dialogs by clicking "Continue"
|
||||
3. Wait for OAuth callback and final redirect to complete
|
||||
|
||||
@@ -12,7 +12,7 @@ This runs as a preflight check for our AI pentester. The user supplies credentia
|
||||
|
||||
<cli_tools>
|
||||
- **Browser Automation (playwright-cli skill):** Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
|
||||
- **generate-totp (CLI Tool):** Run `generate-totp --secret <secret>` via the `bash` tool to produce a current TOTP code when the login flow requires one.
|
||||
- **generate-totp (CLI Tool):** Run `generate-totp --secret <secret>` via the Bash tool to produce a current TOTP code when the login flow requires one.
|
||||
</cli_tools>
|
||||
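For readers unfamiliar with how a current TOTP code is derived from a secret, here is a minimal sketch of the RFC 6238 algorithm. It is not the `generate-totp` implementation: it assumes a base32-encoded secret, HMAC-SHA1, a 30-second step, and 6 digits, which are common defaults but are not confirmed by this document.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret_b32, at_time=None, step=30, digits=6):
    """RFC 6238 TOTP using HMAC-SHA1; secret must be correctly padded base32."""
    key = base64.b32decode(secret_b32.upper())
    now = time.time() if at_time is None else at_time
    counter = int(now // step)  # number of time steps since the Unix epoch
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

A code is only valid for its time step, which is why the login instructions generate it at the moment the flow asks for it and submit it once.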
|
||||
<login_instructions>
|
||||
@@ -27,11 +27,7 @@ After verification confirms login_success, save the authenticated browser sessio
|
||||
Run this only when login_success is true. Skip it on failure.
|
||||
</publish_session>
|
||||
|
||||
<report_result>
|
||||
When the login attempt concludes, call the `submit_auth_result` tool to report the outcome.
|
||||
</report_result>
|
||||
|
||||
<critical>
|
||||
- Submit each field (username, password, captcha, TOTP) exactly once.
|
||||
- Any rejection = auth error: call `submit_auth_result` with `login_success: false` and stop. Do not retry.
|
||||
- Any rejection = auth error: return `login_success: false` and stop. Do not retry.
|
||||
</critical>
|
||||
|
||||
@@ -75,15 +75,15 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
|
||||
<cli_tools>
|
||||
|
||||
**CRITICAL TOOL USAGE RESTRICTIONS:**
|
||||
- NEVER use the `read` tool for application source code analysis—delegate every code review to the `task` agent.
|
||||
- ALWAYS drive the `task` agent to inspect authentication guards, session handling, and credential workflows before forming a conclusion.
|
||||
- Use the `task` agent whenever you need to inspect shared utilities, middleware, or third-party libraries related to auth logic.
|
||||
- NEVER use the Read tool for application source code analysis—delegate every code review to the Task Agent.
|
||||
- ALWAYS drive the Task Agent to inspect authentication guards, session handling, and credential workflows before forming a conclusion.
|
||||
- Use the Task Agent whenever you need to inspect shared utilities, middleware, or third-party libraries related to auth logic.
|
||||
|
||||
**Available Tools:**
|
||||
- **`task` agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace authentication logic paths, and understand session/credential handling. MANDATORY for all source code analysis.
|
||||
- **`bash` tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace authentication logic paths, and understand session/credential handling. MANDATORY for all source code analysis.
|
||||
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Browser Automation (playwright-cli skill):** For browser interactions with the live web application (multi-step flows like password reset or registration), invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
|
||||
- **`todo_write` Tool:** Use this to create and manage your analysis task list. Create a todo item for each endpoint/flow that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
|
||||
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each endpoint/flow that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
|
||||
</cli_tools>
|
||||
|
||||
<data_format_specifications>
|
||||
@@ -112,11 +112,11 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
|
||||
<methodology>
|
||||
# White-Box Vulnerability Analysis Procedure: Broken Authentication (AuthN-only)
|
||||
|
||||
NOTE ON "FOR ALL": whenever an item begins with "for all …", use the `todo_write` tool to create a separate task for each relevant endpoint/flow that needs to be analyzed.
|
||||
NOTE ON "FOR ALL": whenever an item begins with "for all …", use the TodoWrite tool to create a separate task for each relevant endpoint/flow that needs to be analyzed.
|
||||
From `.shannon/deliverables/pre_recon_deliverable.md`, use Sections 3 and 6 to help guide your tasks.
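The "for all" rule above amounts to fanning one checklist item out into one task per endpoint. A minimal sketch of that expansion, assuming a hypothetical item shape with `content` and `status` fields (the `"pending"` initial status and the `expand_for_all` helper are illustrative assumptions, not the tool's documented schema):

```python
def expand_for_all(check: str, endpoints: list[str]) -> list[dict[str, str]]:
    """Create one todo item per endpoint for a single 'for all' checklist item."""
    # "pending" is an assumed initial status; the prompt itself only names
    # "in_progress" and "completed".
    return [
        {"content": f"{check}: {endpoint}", "status": "pending"}
        for endpoint in endpoints
    ]


todos = expand_for_all("Verify HTTPS enforcement", ["POST /login", "POST /reset"])
```

Each resulting item would then be moved to "in_progress" and "completed" as the analysis proceeds.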
|
||||
|
||||
## 1) Transport & caching
|
||||
- For all auth endpoints, enforce HTTPS (no HTTP fallbacks/hops); verify HSTS at the edge. (for all: use `todo_write` tool to add each endpoint as a task)
|
||||
- For all auth endpoints, enforce HTTPS (no HTTP fallbacks/hops); verify HSTS at the edge. (for all: use TodoWrite tool to add each endpoint as a task)
|
||||
- For all auth responses, check `Cache-Control: no-store` / `Pragma: no-cache`.
|
||||
**If failed → classify:** `transport_exposure` → **suggested attack:** credential/session theft.
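As an illustration of the caching and HSTS checks above, the sketch below inspects one auth response's headers. It assumes the headers are already available as a plain dictionary, and it reports `Cache-Control` and `Pragma` separately, which is one reading of the checklist item; the function name and the finding strings are hypothetical.

```python
def check_auth_response_headers(headers: dict[str, str]) -> list[str]:
    """Return transport/caching findings for a single auth response."""
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    findings = []
    if "no-store" not in normalized.get("cache-control", ""):
        findings.append("missing Cache-Control: no-store")
    if "no-cache" not in normalized.get("pragma", ""):
        findings.append("missing Pragma: no-cache")
    # HSTS is only meaningful when a max-age directive is present.
    if "max-age=" not in normalized.get("strict-transport-security", ""):
        findings.append("missing or incomplete Strict-Transport-Security")
    return findings
```

Any non-empty result would map to the `transport_exposure` classification named above.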
|
||||
|
||||
@@ -194,15 +194,15 @@ For each check you perform from the list above (Transport, Rate Limiting, Sessio
|
||||
|
||||
</methodology_and_domain_expertise>
|
||||
|
||||
<deliverable_tools>
|
||||
After completing your `todo_write` tasks and synthesizing findings, emit your specialist deliverable via 3 one-shot tools. Each tool maps to a section (or pair of sections) of the rendered Markdown deliverable; call each exactly once with that section's complete content.
|
||||
<mcp_tools>
|
||||
After completing your TodoWrite tasks and synthesizing findings, emit your specialist deliverable via 3 one-shot MCP tools provided by the `vuln-collector` server. Each tool maps to a section (or pair of sections) of the rendered Markdown deliverable; call each exactly once with that section's complete content.
|
||||
|
||||
**Tool catalog:**
|
||||
- `set_findings_summary` — Section 1 (Executive Summary key outcome) and Section 2 (Dominant Vulnerability Patterns)
|
||||
- `set_strategic_intelligence` — Section 3 (Strategic Intelligence for Exploitation, with auth-specific sub-fields: authentication method, session token details, password policy)
|
||||
- `set_safe_vectors` — Section 4 (Secure by Design: Validated Components)
|
||||
|
||||
The harness injects each tool's complete description and per-field guidance into your tool catalog — refer to the tool catalog for what each parameter expects.
|
||||
The MCP SDK injects each tool's complete description and per-field guidance into your tool catalog — refer to the tool catalog for what each parameter expects.
|
||||
|
||||
**Call semantics:** All 3 tools are one-shot — each may be called exactly once with the section's complete content. Duplicate calls return `"already called"` and are no-ops. There is no incremental/append mode; synthesize each section's full content in working memory before emitting.
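The one-shot behavior described above can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the harness's implementation: the `OneShotTools` class and the `"ok"` success value are hypothetical, while the `"already called"` response and the no-op duplicate behavior come from the text.

```python
class OneShotTools:
    """Toy model of tools that accept exactly one call each."""

    def __init__(self, names: list[str]) -> None:
        self._allowed = set(names)
        self._sections: dict[str, object] = {}

    def call(self, name: str, content: object) -> str:
        if name not in self._allowed:
            raise KeyError(f"unknown tool: {name}")
        # A duplicate call is a no-op: the first content is kept unchanged.
        if name in self._sections:
            return "already called"
        self._sections[name] = content
        return "ok"  # assumed success value, for illustration only

    def section(self, name: str) -> object:
        return self._sections[name]
```

Because there is no append mode, a caller has to assemble the full section content before the first call; anything passed in a second call is discarded.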
|
||||
|
||||
@@ -210,19 +210,19 @@ The harness injects each tool's complete description and per-field guidance into
|
||||
- `set_findings_summary` and `set_strategic_intelligence` are required — call both before terminating. They produce the load-bearing content the downstream `exploit-auth` agent reads.
|
||||
- `set_safe_vectors` is recommended. An empty array is acceptable on runs with no validated-secure components, but explicit emission is preferred over skipping.
|
||||
|
||||
**Relationship to the exploitation queue:** The exploitation queue (`auth_exploitation_queue.json`) is produced by calling the `submit_exploitation_queue` tool when your analysis is complete. The 3 tools produce the analysis deliverable Markdown; the structured-output queue is separate and follows the `exploitation_queue_format` schema documented above.
|
||||
</deliverable_tools>
|
||||
**Relationship to the exploitation queue:** The exploitation queue (`auth_exploitation_queue.json`) is captured automatically from your final structured output at session end. The 3 MCP tools produce the analysis deliverable Markdown; the structured-output queue is separate and follows the `exploitation_queue_format` schema documented above.
|
||||
</mcp_tools>
|
||||
|
||||
|
||||
<conclusion_trigger>
|
||||
**COMPLETION REQUIREMENTS (ALL must be satisfied):**
|
||||
|
||||
1. **Systematic Analysis:** ALL relevant API endpoints and user-facing features identified in the reconnaissance deliverable must be analyzed for AuthN/AuthZ flaws.
|
||||
2. **Deliverable Emission:** Call the 3 tools (`set_findings_summary`, `set_strategic_intelligence`, `set_safe_vectors`) exactly once each with their respective section content. `set_findings_summary` and `set_strategic_intelligence` are required; `set_safe_vectors` is recommended (an empty array is acceptable but explicit emission is preferred).
|
||||
2. **Deliverable Emission:** Call the 3 MCP tools (`set_findings_summary`, `set_strategic_intelligence`, `set_safe_vectors`) exactly once each with their respective section content. `set_findings_summary` and `set_strategic_intelligence` are required; `set_safe_vectors` is recommended (an empty array is acceptable but explicit emission is preferred).
|
||||
|
||||
**Note:** The exploitation queue is produced by calling the `submit_exploitation_queue` tool when your analysis is complete — separate from the tools above. The analysis deliverable Markdown is rendered by the harness after your session ends from the tool calls.
|
||||
**Note:** The exploitation queue is captured automatically from your final structured output at session end — separate from the MCP tools above. The analysis deliverable Markdown is rendered by the harness after your session ends from the MCP tool calls.
|
||||
|
||||
**ONLY AFTER** both systematic analysis AND the required tool calls have been completed, announce "**AUTH ANALYSIS COMPLETE**" and stop.
|
||||
**ONLY AFTER** both systematic analysis AND the required MCP tool calls have been completed, announce "**AUTH ANALYSIS COMPLETE**" and stop.
|
||||
|
||||
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
|
||||
</conclusion_trigger>
|
||||
|
||||
@@ -80,15 +80,15 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
|
||||
<cli_tools>
|
||||
|
||||
**CRITICAL TOOL USAGE RESTRICTIONS:**
|
||||
- NEVER use the `read` tool for application source code analysis—delegate every code review to the `task` agent.
|
||||
- ALWAYS direct the `task` agent to follow authorization guard placement, role checks, and ownership validation before you reach a verdict.
|
||||
- Use the `task` agent whenever you need to inspect shared middleware, decorators, or policy modules involved in access control.
|
||||
- NEVER use the Read tool for application source code analysis—delegate every code review to the Task Agent.
|
||||
- ALWAYS direct the Task Agent to follow authorization guard placement, role checks, and ownership validation before you reach a verdict.
|
||||
- Use the Task Agent whenever you need to inspect shared middleware, decorators, or policy modules involved in access control.
|
||||
|
||||
**Available Tools:**
|
||||
- **`task` agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace authorization logic paths, and understand permission models. MANDATORY for all source code analysis.
|
||||
- **`bash` tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace authorization logic paths, and understand permission models. MANDATORY for all source code analysis.
|
||||
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Browser Automation (playwright-cli skill):** For browser interactions with the live web application (multi-step flows and role-based access controls), invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
|
||||
- **`todo_write` Tool:** Use this to create and manage your analysis task list. Create a todo item for each endpoint that needs authorization analysis. Mark items as "in_progress" when working on them and "completed" when done.
|
||||
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each endpoint that needs authorization analysis. Mark items as "in_progress" when working on them and "completed" when done.
|
||||
</cli_tools>
|
||||
|
||||
<data_format_specifications>
|
||||
@@ -126,7 +126,7 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
|
||||
### 1) Horizontal Authorization Analysis
|
||||
|
||||
- **Create To Dos:**
|
||||
For each item listed under **`.shannon/deliverables/recon_deliverable.md` → section 8 "Horizontal"**, use the `todo_write` tool to create a task entry.
|
||||
For each item listed under **`.shannon/deliverables/recon_deliverable.md` → section 8 "Horizontal"**, use the TodoWrite tool to create a task entry.
|
||||
|
||||
- **Process:**
|
||||
- Start at the identified endpoint.
|
||||
@@ -158,7 +158,7 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
|
||||
### 2) Vertical Authorization Analysis
|
||||
|
||||
- **Create To Dos:**
|
||||
For each item listed under **`.shannon/deliverables/recon_deliverable.md` → section 8 "Vertical"**, use the `todo_write` tool to create a task entry.
|
||||
For each item listed under **`.shannon/deliverables/recon_deliverable.md` → section 8 "Vertical"**, use the TodoWrite tool to create a task entry.
|
||||
|
||||
- **Process:**
|
||||
- Start at the identified endpoint.
|
||||
@@ -184,7 +184,7 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
|
||||
### 3) Context / Workflow Authorization Analysis
|
||||
|
||||
- **Create To Dos:**
|
||||
For each item listed under **`.shannon/deliverables/recon_deliverable.md` → section 8 "Context"**, use the `todo_write` tool to create a task entry.
|
||||
For each item listed under **`.shannon/deliverables/recon_deliverable.md` → section 8 "Context"**, use the TodoWrite tool to create a task entry.
|
||||
|
||||
- **Process:**
|
||||
- Start at the endpoint that represents a step in a workflow.
|
||||
@@ -272,8 +272,8 @@ For each analysis you perform from the lists above, you must make a final **verd
|
||||
|
||||
</methodology_and_domain_expertise>
|
||||
|
||||
<deliverable_tools>
|
||||
After completing your `todo_write` tasks and synthesizing findings, emit your specialist deliverable via 4 one-shot tools. Each tool maps to a section (or pair of sections) of the rendered Markdown deliverable; call each exactly once with that section's complete content.
|
||||
<mcp_tools>
|
||||
After completing your TodoWrite tasks and synthesizing findings, emit your specialist deliverable via 4 one-shot MCP tools provided by the `vuln-collector` server. Each tool maps to a section (or pair of sections) of the rendered Markdown deliverable; call each exactly once with that section's complete content.
|
||||
|
||||
**Tool catalog:**
|
||||
- `set_findings_summary` — Section 1 (Executive Summary key outcome) and Section 2 (Dominant Vulnerability Patterns)
|
||||
@@ -281,7 +281,7 @@ After completing your `todo_write` tasks and synthesizing findings, emit your sp
|
||||
- `set_safe_vectors` — Section 4 (vectors confirmed secure)
|
||||
- `set_blind_spots` — Section 5 (analysis constraints and blind spots)
|
||||
|
||||
The harness injects each tool's complete description and per-field guidance into your tool catalog — refer to the tool catalog for what each parameter expects. For authz specifically, when populating `set_safe_vectors`, the renderer maps `subject` to the "Endpoint" column header and `location` to the "Guard Location" column header.
|
||||
The MCP SDK injects each tool's complete description and per-field guidance into your tool catalog — refer to the tool catalog for what each parameter expects. For authz specifically, when populating `set_safe_vectors`, the renderer maps `subject` to the "Endpoint" column header and `location` to the "Guard Location" column header.
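To make the field-to-column mapping concrete, the sketch below renders only the two fields named above into a Markdown table. The `render_safe_vectors` function and the sample row are hypothetical; any other fields the real tool accepts are deliberately left out because they are not described here.

```python
def render_safe_vectors(rows: list[dict[str, str]]) -> str:
    """Render safe-vector rows as a Markdown table using the authz column names."""
    # Field name -> column header, as described for the authz renderer.
    columns = [("subject", "Endpoint"), ("location", "Guard Location")]
    header = "| " + " | ".join(title for _, title in columns) + " |"
    divider = "| " + " | ".join("---" for _ in columns) + " |"
    body = [
        "| " + " | ".join(row[field] for field, _ in columns) + " |"
        for row in rows
    ]
    return "\n".join([header, divider, *body])


table = render_safe_vectors(
    [{"subject": "GET /api/orders/:id", "location": "middleware/ownership.ts"}]
)
```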
|
||||
|
||||
**Call semantics:** All 4 tools are one-shot — each may be called exactly once with the section's complete content. Duplicate calls return `"already called"` and are no-ops. There is no incremental/append mode; synthesize each section's full content in working memory before emitting.
|
||||
|
||||
@@ -289,21 +289,21 @@ The harness injects each tool's complete description and per-field guidance into
|
||||
- `set_findings_summary` and `set_strategic_intelligence` are required — call both before terminating. They produce the load-bearing content the downstream `exploit-authz` agent reads.
|
||||
- `set_safe_vectors` and `set_blind_spots` are recommended. Empty arrays are acceptable on runs with no validated-secure endpoints or no constraint gaps, but explicit emission is preferred over skipping.
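The required/recommended split above can be expressed as a simple pre-termination check. This is an illustrative sketch only; the `emission_status` helper and its return shape are assumptions, while the tool names and their required or recommended status come from the list above.

```python
REQUIRED = {"set_findings_summary", "set_strategic_intelligence"}
RECOMMENDED = {"set_safe_vectors", "set_blind_spots"}


def emission_status(called: set[str]) -> dict[str, object]:
    """Report whether the run may terminate and which tools were not called."""
    missing_required = sorted(REQUIRED - called)
    skipped_recommended = sorted(RECOMMENDED - called)
    return {
        # Only the required tools block termination.
        "may_terminate": not missing_required,
        "missing_required": missing_required,
        "skipped_recommended": skipped_recommended,
    }
```

A run that called only the two required tools would pass the check, but the skipped recommended tools would still be listed so they can be emitted explicitly, even with empty arrays.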
|
||||
|
||||
**Relationship to the exploitation queue:** The exploitation queue (`authz_exploitation_queue.json`) is produced by calling the `submit_exploitation_queue` tool when your analysis is complete. The 4 tools produce the analysis deliverable Markdown; the structured-output queue is separate and follows the `exploitation_queue_format` schema documented above.
|
||||
</deliverable_tools>
|
||||
**Relationship to the exploitation queue:** The exploitation queue (`authz_exploitation_queue.json`) is captured automatically from your final structured output at session end. The 4 MCP tools produce the analysis deliverable Markdown; the structured-output queue is separate and follows the `exploitation_queue_format` schema documented above.
|
||||
</mcp_tools>
|
||||
|
||||
|
||||
<conclusion_trigger>
|
||||
**COMPLETION REQUIREMENTS (ALL must be satisfied):**
|
||||
|
||||
1. **Todo Completion:** ALL tasks in your `todo_write` list must be marked as "completed".
|
||||
2. **Deliverable Emission:** Call the 4 tools (`set_findings_summary`, `set_strategic_intelligence`, `set_safe_vectors`, `set_blind_spots`) exactly once each with their respective section content. `set_findings_summary` and `set_strategic_intelligence` are required; `set_safe_vectors` and `set_blind_spots` are recommended (empty arrays acceptable but explicit emission is preferred).
|
||||
1. **Todo Completion:** ALL tasks in your TodoWrite list must be marked as "completed".
|
||||
2. **Deliverable Emission:** Call the 4 MCP tools (`set_findings_summary`, `set_strategic_intelligence`, `set_safe_vectors`, `set_blind_spots`) exactly once each with their respective section content. `set_findings_summary` and `set_strategic_intelligence` are required; `set_safe_vectors` and `set_blind_spots` are recommended (empty arrays acceptable but explicit emission is preferred).
|
||||
|
||||
**Note:** The exploitation queue is produced by calling the `submit_exploitation_queue` tool when your analysis is complete — separate from the tools above. The analysis deliverable Markdown is rendered by the harness after your session ends from the tool calls.
|
||||
**Note:** The exploitation queue is captured automatically from your final structured output at session end — separate from the MCP tools above. The analysis deliverable Markdown is rendered by the harness after your session ends from the MCP tool calls.
|
||||
|
||||
**ONLY AFTER** both todo completion AND the required tool calls have been completed, announce "**AUTHORIZATION ANALYSIS COMPLETE**" and stop.
|
||||
**ONLY AFTER** both todo completion AND the required MCP tool calls have been completed, announce "**AUTHORIZATION ANALYSIS COMPLETE**" and stop.
|
||||
|
||||
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
|
||||
|
||||
**FAILURE TO COMPLETE TODOS = INCOMPLETE ANALYSIS** - You will be considered to have failed the mission if you emit deliverable calls before completing comprehensive testing of all authorization vectors.
|
||||
**FAILURE TO COMPLETE TODOS = INCOMPLETE ANALYSIS** - You will be considered to have failed the mission if you emit MCP deliverable calls before completing comprehensive testing of all authorization vectors.
|
||||
</conclusion_trigger>
|
||||
|
||||
@@ -35,7 +35,7 @@ Success criterion: Complete source-to-sink traces detailing path, sanitizers, si
|
||||
- **Severity Context:** A structural flaw in a backend command, whether an SQL query or a shell command, is one of the most severe classes of vulnerability in a SaaS application. It undermines the foundational trust of the system by creating the *potential* for data exposure (SQLi) or direct server compromise (Command Injection).
|
||||
- **Your Role is Precise:** Your sole responsibility is to identify and precisely document these structural command flaws. You prove the *potential* for compromise; the Exploitation phase confirms the *realized* compromise. Do not cross this boundary.
|
||||
- **Code is Ground Truth:** Your analysis must be rooted in the application's code. An observed behavior is only a symptom; the insecure data flow within the code is the root cause you must identify.
|
||||
- **Thoroughness is Non-Negotiable:** An incomplete analysis is a failed analysis. Finding one flaw is merely the first data point. Your mission is only complete when **every potential data entry point** from the reconnaissance deliverable has been systematically analyzed and documented using the `todo_write` tool. **Do not terminate early.**
|
||||
- **Thoroughness is Non-Negotiable:** An incomplete analysis is a failed analysis. Finding one flaw is merely the first data point. Your mission is only complete when **every potential data entry point** from the reconnaissance deliverable has been systematically analyzed and documented using the TodoWrite tool. **Do not terminate early.**
|
||||
</critical>
|
||||
|
||||
<starting_context>
|
||||
@@ -80,15 +80,15 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
|
||||
<cli_tools>
|
||||
|
||||
**CRITICAL TOOL USAGE RESTRICTIONS:**
|
||||
- NEVER use the `read` tool for application source code analysis—delegate every code review to the `task` agent.
|
||||
- ALWAYS direct the `task` agent to trace tainted data flow, sanitization/encoding steps, and sink construction before you reach a verdict.
|
||||
- Use the `task` agent instead of Bash or Playwright when you need to inspect handlers, middleware, or shared utilities to follow an injection path.
|
||||
- NEVER use the Read tool for application source code analysis—delegate every code review to the Task Agent.
|
||||
- ALWAYS direct the Task Agent to trace tainted data flow, sanitization/encoding steps, and sink construction before you reach a verdict.
|
||||
- Use the Task Agent instead of Bash or Playwright when you need to inspect handlers, middleware, or shared utilities to follow an injection path.
|
||||
|
||||
**Available Tools:**
|
||||
- **`task` agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, map query/command construction paths, and verify sanitization coverage. MANDATORY for all source code analysis.
|
||||
- **`bash` tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, map query/command construction paths, and verify sanitization coverage. MANDATORY for all source code analysis.
|
||||
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Browser Automation (playwright-cli skill):** For browser interactions with the live web application (multi-step flows like password reset or registration), invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
|
||||
- **`todo_write` Tool:** Use this to create and manage your analysis task list. Create a todo item for each injection source that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
|
||||
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each injection source that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
|
||||
</cli_tools>
|
||||
|
||||
<data_format_specifications>
|
||||
@@ -125,7 +125,7 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
|
||||
|
||||
- **Goal:** Prove whether untrusted input can influence the **structure** of a backend command (SQL or Shell) or reach sensitive **slots** without the correct defense. No live exploitation in this phase.
|
||||
- **1) Create a To Do for each Injection Source found in the Pre-Recon Deliverable**
|
||||
- inside of .shannon/deliverables/pre_recon_deliverable.md under the section "7. Injection Sources (Command Injection and SQL Injection)" use the `todo_write` tool to create a task for each discovered Injection Source.
|
||||
- inside of .shannon/deliverables/pre_recon_deliverable.md under the section "7. Injection Sources (Command Injection and SQL Injection)" use the TodoWrite tool to create a task for each discovered Injection Source.
|
||||
- Note: All sources are marked as tainted until they hit a sanitization that matches the sink context. Normalizers (lowercasing, trimming, JSON parse, schema decode) leave the data **tainted**.
|
||||
- **2) Trace Data Flow Paths from Source to Sink**
|
||||
- For each source, your goal is to identify every unique "Data Flow Path" to a database sink. A path is a distinct route the data takes through the code.
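The taint rule in step 1 (data stays tainted until a sanitizer matching the sink context is applied, and normalizers never clear taint) can be sketched as a small function over the steps seen on one data flow path. The sanitizer names and the `SANITIZER_CONTEXT` mapping are hypothetical placeholders, not a catalog taken from the prompt.

```python
# Hypothetical mapping of a sanitizer step to the sink context it defends.
SANITIZER_CONTEXT = {
    "parameterized_query": "sql",
    "shell_quote": "shell",
}


def is_tainted_at_sink(steps: list[str], sink_context: str) -> bool:
    """Return True if the data is still tainted when it reaches the sink."""
    for step in steps:
        # Only a sanitizer whose context matches the sink clears the taint.
        # Normalizers (trim, lower, JSON parse, ...) and mismatched
        # sanitizers are ignored, so the data stays tainted.
        if SANITIZER_CONTEXT.get(step) == sink_context:
            return False
    return True
```

For example, a path that only trims and lowercases input before a SQL sink stays tainted, and so does one that applies shell quoting before a SQL sink, because the defense does not match the sink context.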
|
||||
@@ -283,8 +283,8 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
|
||||
|
||||
</methodology_and_domain_expertise>
|
||||
|
||||
<deliverable_tools>
|
||||
After completing your `todo_write` tasks and synthesizing findings, emit your specialist deliverable via 4 one-shot tools. Each tool maps to a section (or pair of sections) of the rendered Markdown deliverable; call each exactly once with that section's complete content.
|
||||
<mcp_tools>
|
||||
After completing your TodoWrite tasks and synthesizing findings, emit your specialist deliverable via 4 one-shot MCP tools provided by the `vuln-collector` server. Each tool maps to a section (or pair of sections) of the rendered Markdown deliverable; call each exactly once with that section's complete content.
|
||||
|
||||
**Tool catalog:**
|
||||
- `set_findings_summary` — Section 1 (Executive Summary key outcome) and Section 2 (Dominant Vulnerability Patterns)
|
||||
@@ -292,7 +292,7 @@ After completing your `todo_write` tasks and synthesizing findings, emit your sp
|
||||
- `set_safe_vectors` — Section 4 (vectors confirmed secure)
|
||||
- `set_blind_spots` — Section 5 (analysis constraints and blind spots)
|
||||
|
||||
The harness injects each tool's complete description and per-field guidance into your tool catalog — refer to the tool catalog for what each parameter expects.
|
||||
The MCP SDK injects each tool's complete description and per-field guidance into your tool catalog — refer to the tool catalog for what each parameter expects.
|
||||
|
||||
**Call semantics:** All 4 tools are one-shot — each may be called exactly once with the section's complete content. Duplicate calls return `"already called"` and are no-ops. There is no incremental/append mode; synthesize each section's full content in working memory before emitting.
|
||||
|
||||
@@ -300,21 +300,21 @@ The harness injects each tool's complete description and per-field guidance into
|
||||
- `set_findings_summary` and `set_strategic_intelligence` are required — call both before terminating. They produce the load-bearing content the downstream `exploit-injection` agent reads.
|
||||
- `set_safe_vectors` and `set_blind_spots` are recommended. Empty arrays are acceptable on runs with no validated-secure vectors or no constraint gaps, but explicit emission is preferred over skipping.
|
||||
|
||||
**Relationship to the exploitation queue:** The exploitation queue (`injection_exploitation_queue.json`) is produced by calling the `submit_exploitation_queue` tool when your analysis is complete. The 4 tools produce the analysis deliverable Markdown; the structured-output queue is separate and follows the `exploitation_queue_format` schema documented above.
|
||||
</deliverable_tools>
|
||||
**Relationship to the exploitation queue:** The exploitation queue (`injection_exploitation_queue.json`) is captured automatically from your final structured output at session end. The 4 MCP tools produce the analysis deliverable Markdown; the structured-output queue is separate and follows the `exploitation_queue_format` schema documented above.
|
||||
</mcp_tools>
|
||||
|
||||
|
||||
<conclusion_trigger>
|
||||
**COMPLETION REQUIREMENTS (ALL must be satisfied):**
|
||||
|
||||
1. **Todo Completion:** ALL tasks in your `todo_write` list must be marked as "completed".
|
||||
2. **Deliverable Emission:** Call the 4 tools (`set_findings_summary`, `set_strategic_intelligence`, `set_safe_vectors`, `set_blind_spots`) exactly once each with their respective section content. `set_findings_summary` and `set_strategic_intelligence` are required; `set_safe_vectors` and `set_blind_spots` are recommended (empty arrays acceptable but explicit emission is preferred).
|
||||
1. **Todo Completion:** ALL tasks in your TodoWrite list must be marked as "completed".
|
||||
2. **Deliverable Emission:** Call the 4 MCP tools (`set_findings_summary`, `set_strategic_intelligence`, `set_safe_vectors`, `set_blind_spots`) exactly once each with their respective section content. `set_findings_summary` and `set_strategic_intelligence` are required; `set_safe_vectors` and `set_blind_spots` are recommended (empty arrays acceptable but explicit emission is preferred).
|
||||
|
||||
**Note:** The exploitation queue is produced by calling the `submit_exploitation_queue` tool when your analysis is complete — separate from the tools above. The analysis deliverable Markdown is rendered by the harness after your session ends from the tool calls.
|
||||
**Note:** The exploitation queue is captured automatically from your final structured output at session end — separate from the MCP tools above. The analysis deliverable Markdown is rendered by the harness after your session ends from the MCP tool calls.
|
||||
|
||||
**ONLY AFTER** both todo completion AND the required tool calls have been completed, announce "**INJECTION ANALYSIS COMPLETE**" and stop.
|
||||
**ONLY AFTER** both todo completion AND the required MCP tool calls have been completed, announce "**INJECTION ANALYSIS COMPLETE**" and stop.
|
||||
|
||||
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
|
||||
|
||||
**FAILURE TO COMPLETE TODOS = INCOMPLETE ANALYSIS** - You will be considered to have failed the mission if you emit deliverable calls before completing comprehensive testing of all input vectors.
|
||||
**FAILURE TO COMPLETE TODOS = INCOMPLETE ANALYSIS** - You will be considered to have failed the mission if you emit MCP deliverable calls before completing comprehensive testing of all input vectors.
|
||||
</conclusion_trigger>
|
||||
|
||||
@@ -76,15 +76,15 @@ An **exploitable vulnerability** is a data flow where user-controlled input infl
|
||||
<cli_tools>
|
||||
|
||||
**CRITICAL TOOL USAGE RESTRICTIONS:**
|
||||
- NEVER use the `read` tool for application source code analysis—delegate every code review to the `task` agent.
|
||||
- ALWAYS drive the `task` agent to map user-controlled input to outbound HTTP clients, validation layers, and network controls before declaring a result.
|
||||
- Use the `task` agent to inspect shared utilities, proxy helpers, and request builders instead of reading files directly.
|
||||
- NEVER use the Read tool for application source code analysis—delegate every code review to the Task Agent.
|
||||
- ALWAYS drive the Task Agent to map user-controlled input to outbound HTTP clients, validation layers, and network controls before declaring a result.
|
||||
- Use the Task Agent to inspect shared utilities, proxy helpers, and request builders instead of reading files directly.
|
||||
|
||||
**Available Tools:**
|
||||
- **`task` agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace data flows, and understand HTTP client usage. MANDATORY for all source code analysis.
|
||||
- **`bash` tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace data flows, and understand HTTP client usage. MANDATORY for all source code analysis.
|
||||
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **Browser Automation (playwright-cli skill):** For browser interactions with the live web application (multi-step flows involving URL redirection or proxy functionality), invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
|
||||
- **`todo_write` Tool:** Use this to create and manage your analysis task list. Create a todo item for each SSRF sink that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
|
||||
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each SSRF sink that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
|
||||
</cli_tools>
|
||||
|
||||
<data_format_specifications>
|
||||
@@ -114,7 +114,7 @@ An **exploitable vulnerability** is a data flow where user-controlled input infl
|
||||
<methodology>
|
||||
# White-Box Vulnerability Analysis Procedure: Server-Side Request Forgery (SSRF)
|
||||
|
||||
NOTE ON "FOR ALL": whenever an item begins with "for all …", use the `todo_write` tool to create a separate task for each relevant endpoint/flow that needs to be analyzed.
|
||||
NOTE ON "FOR ALL": whenever an item begins with "for all …", use the TodoWrite tool to create a separate task for each relevant endpoint/flow that needs to be analyzed.
|
||||
From `.shannon/deliverables/pre_recon_deliverable.md`, use Section 10 (SSRF Sinks) to guide your tasks.
|
||||
|
||||
## 1) Identify HTTP Client Usage Patterns
|
||||
@@ -169,7 +169,7 @@ From `.shannon/deliverables/pre_recon_deliverable.md`, use Section 10 (SSRF Sink
|
||||
|
||||
Inside `.shannon/deliverables/pre_recon_deliverable.md` under section `##10. SSRF Sinks##`.
|
||||
|
||||
Use the `todo_write` tool to create a task for each discovered sink (any server-side request composed even partially from user input).
|
||||
Use the TodoWrite tool to create a task for each discovered sink (any server-side request composed even partially from user input).
|
||||
|
||||
---
|
||||
|
||||
@@ -243,15 +243,15 @@ For each check you perform from the list above, you must make a final **verdict*
|
||||
|
||||
</methodology_and_domain_expertise>
|
||||
|
||||
<deliverable_tools>
|
||||
After completing your `todo_write` tasks and synthesizing findings, emit your specialist deliverable via 3 one-shot tools. Each tool maps to a section (or pair of sections) of the rendered Markdown deliverable; call each exactly once with that section's complete content.
|
||||
<mcp_tools>
|
||||
After completing your TodoWrite tasks and synthesizing findings, emit your specialist deliverable via 3 one-shot MCP tools provided by the `vuln-collector` server. Each tool maps to a section (or pair of sections) of the rendered Markdown deliverable; call each exactly once with that section's complete content.
|
||||
|
||||
**Tool catalog:**
|
||||
- `set_findings_summary` — Section 1 (Executive Summary key outcome) and Section 2 (Dominant Vulnerability Patterns)
|
||||
- `set_strategic_intelligence` — Section 3 (Strategic Intelligence for Exploitation, with SSRF-specific sub-fields: HTTP client library, request architecture, internal services)
|
||||
- `set_safe_vectors` — Section 4 (Secure by Design: Validated Components)
|
||||
|
||||
The harness injects each tool's complete description and per-field guidance into your tool catalog — refer to the tool catalog for what each parameter expects.
|
||||
The MCP SDK injects each tool's complete description and per-field guidance into your tool catalog — refer to the tool catalog for what each parameter expects.
|
||||
|
||||
**Call semantics:** All 3 tools are one-shot — each may be called exactly once with the section's complete content. Duplicate calls return `"already called"` and are no-ops. There is no incremental/append mode; synthesize each section's full content in working memory before emitting.
|
||||
|
||||
@@ -259,19 +259,19 @@ The harness injects each tool's complete description and per-field guidance into
|
||||
- `set_findings_summary` and `set_strategic_intelligence` are required — call both before terminating. They produce the load-bearing content the downstream `exploit-ssrf` agent reads.
|
||||
- `set_safe_vectors` is recommended. An empty array is acceptable on runs with no validated-secure components, but explicit emission is preferred over skipping.
|
||||
|
||||
**Relationship to the exploitation queue:** The exploitation queue (`ssrf_exploitation_queue.json`) is produced by calling the `submit_exploitation_queue` tool when your analysis is complete. The 3 tools produce the analysis deliverable Markdown; the structured-output queue is separate and follows the `exploitation_queue_format` schema documented above.
|
||||
</deliverable_tools>
|
||||
**Relationship to the exploitation queue:** The exploitation queue (`ssrf_exploitation_queue.json`) is captured automatically from your final structured output at session end. The 3 MCP tools produce the analysis deliverable Markdown; the structured-output queue is separate and follows the `exploitation_queue_format` schema documented above.
|
||||
</mcp_tools>
|
||||
|
||||
|
||||
<conclusion_trigger>
|
||||
**COMPLETION REQUIREMENTS (ALL must be satisfied):**
|
||||
|
||||
1. **Systematic Analysis:** ALL relevant API endpoints and request-making features identified in the reconnaissance deliverable must be analyzed for SSRF vulnerabilities.
|
||||
2. **Deliverable Emission:** Call the 3 tools (`set_findings_summary`, `set_strategic_intelligence`, `set_safe_vectors`) exactly once each with their respective section content. `set_findings_summary` and `set_strategic_intelligence` are required; `set_safe_vectors` is recommended (an empty array is acceptable but explicit emission is preferred).
|
||||
2. **Deliverable Emission:** Call the 3 MCP tools (`set_findings_summary`, `set_strategic_intelligence`, `set_safe_vectors`) exactly once each with their respective section content. `set_findings_summary` and `set_strategic_intelligence` are required; `set_safe_vectors` is recommended (an empty array is acceptable but explicit emission is preferred).
|
||||
|
||||
**Note:** The exploitation queue is produced by calling the `submit_exploitation_queue` tool when your analysis is complete — separate from the tools above. The analysis deliverable Markdown is rendered by the harness after your session ends from the tool calls.
|
||||
**Note:** The exploitation queue is captured automatically from your final structured output at session end — separate from the MCP tools above. The analysis deliverable Markdown is rendered by the harness after your session ends from the MCP tool calls.
|
||||
|
||||
**ONLY AFTER** both systematic analysis AND the required tool calls have been completed, announce "**SSRF ANALYSIS COMPLETE**" and stop.
|
||||
**ONLY AFTER** both systematic analysis AND the required MCP tool calls have been completed, announce "**SSRF ANALYSIS COMPLETE**" and stop.
|
||||
|
||||
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
|
||||
</conclusion_trigger>
|
||||
|
||||
@@ -77,17 +77,17 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
|
||||
<cli_tools>
|
||||
|
||||
**CRITICAL TOOL USAGE RESTRICTIONS:**
|
||||
- NEVER use the `read` tool for application source code analysis - ALWAYS delegate to `task` agents for examining .js, .ts, .py, .php files and application logic. You MAY use Read
|
||||
- NEVER use the Read tool for application source code analysis - ALWAYS delegate to Task agents for examining .js, .ts, .py, .php files and application logic. You MAY use Read
|
||||
tool directly for these files: `.shannon/deliverables/pre_recon_deliverable.md`, `.shannon/deliverables/recon_deliverable.md`
|
||||
- Direct the `task` agent to trace render contexts, sanitization coverage, and template/component boundaries before deciding on exploitability.
|
||||
- **ALWAYS delegate code analysis to `task` agents**
|
||||
- Direct the Task Agent to trace render contexts, sanitization coverage, and template/component boundaries before deciding on exploitability.
|
||||
- **ALWAYS delegate code analysis to Task agents**
|
||||
|
||||
**Available Tools:**
|
||||
- **`task` agent (Code Analysis):** MANDATORY for all source code analysis and data flow tracing. Use this instead of `read` tool for examining application code, models, controllers, and templates.
|
||||
- **Task Agent (Code Analysis):** MANDATORY for all source code analysis and data flow tracing. Use this instead of Read tool for examining application code, models, controllers, and templates.
|
||||
- **Terminal (curl):** MANDATORY for testing HTTP-based XSS vectors and observing raw HTML responses. Use for reflected XSS testing and JSONP injection testing.
|
||||
- **Browser Automation (playwright-cli skill):** MANDATORY for testing DOM-based XSS and form submission vectors. Invoke the `playwright-cli` skill to learn available commands. Use for stored XSS testing and client-side payload execution verification. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
|
||||
- **`todo_write` Tool:** Use this to create and manage your analysis task list. Create a todo item for each sink you need to analyze.
|
||||
- **`bash` tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each sink you need to analyze.
|
||||
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
|
||||
</cli_tools>
|
||||
|
||||
<data_format_specifications>
|
||||
@@ -124,11 +124,11 @@ Structure: The vulnerability JSON object MUST follow this exact format:
|
||||
- **Goal:** Identify vulnerable data flow paths by starting at the XSS sinks received from the recon phase and tracing backward to their sanitizations and sources. This approach is optimized for finding all types of XSS, especially complex Stored XSS patterns.
|
||||
- **Core Principle:** Data is assumed to be tainted until a context-appropriate output encoder (sanitization) is encountered on its path to the sink.
|
||||
|
||||
### **1) Create a todo item for each XSS sink using the `todo_write` tool**
|
||||
Read .shannon/deliverables/pre_recon_deliverable.md section ##9. XSS Sinks and Render Contexts## and use the **`todo_write` tool** to create a todo item for each discovered sink-context pair that needs analysis.
|
||||
### **1) Create a todo item for each XSS sink using the TodoWrite tool**
|
||||
Read .shannon/deliverables/pre_recon_deliverable.md section ##9. XSS Sinks and Render Contexts## and use the **TodoWrite tool** to create a todo item for each discovered sink-context pair that needs analysis.
|
||||
|
||||
### **2) Trace Each Sink Backward (Backward Taint Analysis)**
|
||||
For each pending item in your todo list (managed via `todo_write` tool), trace the origin of the data variable backward from the sink through the application logic. Your goal is to find either a valid sanitizer or an untrusted source. Mark each todo item as completed after you've fully analyzed that sink.
|
||||
For each pending item in your todo list (managed via TodoWrite tool), trace the origin of the data variable backward from the sink through the application logic. Your goal is to find either a valid sanitizer or an untrusted source. Mark each todo item as completed after you've fully analyzed that sink.
|
||||
|
||||
- **Early Termination for Secure Paths (Efficiency Rule):**
|
||||
- As you trace backward, if you encounter a sanitization/encoding function, immediately perform two checks:
|
||||
@@ -205,8 +205,8 @@ This rulebook is used for the **Early Termination** check in Step 2.
|
||||
|
||||
</methodology_and_domain_expertise>
|
||||
|
||||
<deliverable_tools>
|
||||
After completing your `todo_write` tasks and synthesizing findings, emit your specialist deliverable via 4 one-shot tools. Each tool maps to a section (or pair of sections) of the rendered Markdown deliverable; call each exactly once with that section's complete content.
|
||||
<mcp_tools>
|
||||
After completing your TodoWrite tasks and synthesizing findings, emit your specialist deliverable via 4 one-shot MCP tools provided by the `vuln-collector` server. Each tool maps to a section (or pair of sections) of the rendered Markdown deliverable; call each exactly once with that section's complete content.
|
||||
|
||||
**Tool catalog:**
|
||||
- `set_findings_summary` — Section 1 (Executive Summary key outcome) and Section 2 (Dominant Vulnerability Patterns)
|
||||
@@ -214,7 +214,7 @@ After completing your `todo_write` tasks and synthesizing findings, emit your sp
|
||||
- `set_safe_vectors` — Section 4 (vectors confirmed secure)
|
||||
- `set_blind_spots` — Section 5 (analysis constraints and blind spots)
|
||||
|
||||
The harness injects each tool's complete description and per-field guidance into your tool catalog — refer to the tool catalog for what each parameter expects. For XSS specifically, when populating `set_safe_vectors`, include the optional `render_context` field on each entry (HTML_BODY, HTML_ATTRIBUTE, JAVASCRIPT_STRING, URL_PARAM, or CSS_VALUE).
|
||||
The MCP SDK injects each tool's complete description and per-field guidance into your tool catalog — refer to the tool catalog for what each parameter expects. For XSS specifically, when populating `set_safe_vectors`, include the optional `render_context` field on each entry (HTML_BODY, HTML_ATTRIBUTE, JAVASCRIPT_STRING, URL_PARAM, or CSS_VALUE).
|
||||
|
||||
**Call semantics:** All 4 tools are one-shot — each may be called exactly once with the section's complete content. Duplicate calls return `"already called"` and are no-ops. There is no incremental/append mode; synthesize each section's full content in working memory before emitting.
|
||||
|
||||
@@ -222,19 +222,19 @@ The harness injects each tool's complete description and per-field guidance into
|
||||
- `set_findings_summary` and `set_strategic_intelligence` are required — call both before terminating. They produce the load-bearing content the downstream `exploit-xss` agent reads.
|
||||
- `set_safe_vectors` and `set_blind_spots` are recommended. Empty arrays are acceptable on runs with no validated-secure vectors or no constraint gaps, but explicit emission is preferred over skipping.
|
||||
|
||||
**Relationship to the exploitation queue:** The exploitation queue (`xss_exploitation_queue.json`) is produced by calling the `submit_exploitation_queue` tool when your analysis is complete. The 4 tools produce the analysis deliverable Markdown; the structured-output queue is separate and follows the `exploitation_queue_format` schema documented above.
|
||||
</deliverable_tools>
|
||||
**Relationship to the exploitation queue:** The exploitation queue (`xss_exploitation_queue.json`) is captured automatically from your final structured output at session end. The 4 MCP tools produce the analysis deliverable Markdown; the structured-output queue is separate and follows the `exploitation_queue_format` schema documented above.
|
||||
</mcp_tools>
|
||||
|
||||
|
||||
<conclusion_trigger>
|
||||
COMPLETION REQUIREMENTS (ALL must be satisfied):
|
||||
|
||||
1. Systematic Analysis: ALL input vectors identified from the reconnaissance deliverable must be analyzed.
|
||||
2. Deliverable Emission: Call the 4 tools (`set_findings_summary`, `set_strategic_intelligence`, `set_safe_vectors`, `set_blind_spots`) exactly once each with their respective section content. `set_findings_summary` and `set_strategic_intelligence` are required; `set_safe_vectors` and `set_blind_spots` are recommended (empty arrays acceptable but explicit emission is preferred).
|
||||
2. Deliverable Emission: Call the 4 MCP tools (`set_findings_summary`, `set_strategic_intelligence`, `set_safe_vectors`, `set_blind_spots`) exactly once each with their respective section content. `set_findings_summary` and `set_strategic_intelligence` are required; `set_safe_vectors` and `set_blind_spots` are recommended (empty arrays acceptable but explicit emission is preferred).
|
||||
|
||||
**Note:** The exploitation queue is produced by calling the `submit_exploitation_queue` tool when your analysis is complete — separate from the tools above. The analysis deliverable Markdown is rendered by the harness after your session ends from the tool calls.
|
||||
**Note:** The exploitation queue is captured automatically from your final structured output at session end — separate from the MCP tools above. The analysis deliverable Markdown is rendered by the harness after your session ends from the MCP tool calls.
|
||||
|
||||
ONLY AFTER both systematic analysis AND the required tool calls have been completed, announce "XSS ANALYSIS COMPLETE" and stop.
|
||||
ONLY AFTER both systematic analysis AND the required MCP tool calls have been completed, announce "XSS ANALYSIS COMPLETE" and stop.
|
||||
|
||||
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
|
||||
</conclusion_trigger>
|
||||
|
||||
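The one-shot call semantics described in the prompt above (each deliverable tool accepts exactly one call, and duplicates return `"already called"` as no-ops) can be sketched as a small registry. This is an illustrative sketch only, not the `vuln-collector` implementation: the `OneShotCollector` class, its method names, and the `'ok'` return value are hypothetical; only the tool names and the `"already called"` string come from the prompt text.

```typescript
// Hypothetical sketch of one-shot tool semantics; not the real vuln-collector server.
type ToolName =
  | 'set_findings_summary'
  | 'set_strategic_intelligence'
  | 'set_safe_vectors'
  | 'set_blind_spots';

class OneShotCollector {
  private readonly sections = new Map<ToolName, unknown>();

  // Stores the section on the first call; any later call for the same tool is a no-op.
  // The 'ok' return value is an assumption made for this sketch.
  call(tool: ToolName, content: unknown): string {
    if (this.sections.has(tool)) {
      return 'already called';
    }
    this.sections.set(tool, content);
    return 'ok';
  }

  // Lists the required tools that have not been called yet.
  missingRequired(): ToolName[] {
    const required: ToolName[] = ['set_findings_summary', 'set_strategic_intelligence'];
    return required.filter((name) => !this.sections.has(name));
  }

  // Returns whatever content was stored by the first call, if any.
  get(tool: ToolName): unknown {
    return this.sections.get(tool);
  }
}

const collector = new OneShotCollector();
const first = collector.call('set_findings_summary', { summary: 'first draft' });
const second = collector.call('set_findings_summary', { summary: 'second draft' });
const stillMissing = collector.missingRequired();
```

Because there is no append mode, the second call leaves the first draft in place, which is why the prompt asks the agent to synthesize each section fully before emitting it.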
@@ -0,0 +1,404 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
// Production Claude agent execution with retry, git checkpoints, and audit logging
|
||||
|
||||
import { type JsonSchemaOutputFormat, query } from '@anthropic-ai/claude-agent-sdk';
|
||||
import { fs, path } from 'zx';
|
||||
import type { AuditSession } from '../audit/index.js';
|
||||
import { deliverablesDir } from '../paths.js';
|
||||
import { isRetryableError, PentestError } from '../services/error-handling.js';
|
||||
import { AGENT_VALIDATORS } from '../session-manager.js';
|
||||
import type { ActivityLogger } from '../types/activity-logger.js';
|
||||
import { isSpendingCapBehavior } from '../utils/billing-detection.js';
|
||||
import { formatTimestamp } from '../utils/formatting.js';
|
||||
import { Timer } from '../utils/metrics.js';
|
||||
import { createAuditLogger } from './audit-logger.js';
|
||||
import { dispatchMessage } from './message-handlers.js';
|
||||
import { type ModelTier, resolveModel, supportsAdaptiveThinking } from './models.js';
|
||||
import { detectExecutionContext, formatCompletionMessage, formatErrorOutput } from './output-formatters.js';
|
||||
import { createProgressManager } from './progress-manager.js';
|
||||
|
||||
declare global {
|
||||
var SHANNON_DISABLE_LOADER: boolean | undefined;
|
||||
}
|
||||
|
||||
export interface ClaudePromptResult {
|
||||
result?: string | null | undefined;
|
||||
success: boolean;
|
||||
duration: number;
|
||||
turns?: number | undefined;
|
||||
cost: number;
|
||||
model?: string | undefined;
|
||||
partialCost?: number | undefined;
|
||||
apiErrorDetected?: boolean | undefined;
|
||||
error?: string | undefined;
|
||||
errorType?: string | undefined;
|
||||
prompt?: string | undefined;
|
||||
retryable?: boolean | undefined;
|
||||
structuredOutput?: unknown;
|
||||
}
|
||||
|
||||
function outputLines(lines: string[]): void {
|
||||
for (const line of lines) {
|
||||
console.log(line);
|
||||
}
|
||||
}
|
||||
|
||||
async function writeErrorLog(
|
||||
err: Error & { code?: string; status?: number },
|
||||
sourceDir: string,
|
||||
fullPrompt: string,
|
||||
duration: number,
|
||||
): Promise<void> {
|
||||
try {
|
||||
const errorLog = {
|
||||
timestamp: formatTimestamp(),
|
||||
agent: 'claude-executor',
|
||||
error: {
|
||||
name: err.constructor.name,
|
||||
message: err.message,
|
||||
code: err.code,
|
||||
status: err.status,
|
||||
stack: err.stack,
|
||||
},
|
||||
context: {
|
||||
sourceDir,
|
||||
prompt: `${fullPrompt.slice(0, 200)}...`,
|
||||
retryable: isRetryableError(err),
|
||||
},
|
||||
duration,
|
||||
};
|
||||
const logPath = path.join(deliverablesDir(sourceDir), 'error.log');
|
||||
await fs.appendFile(logPath, `${JSON.stringify(errorLog)}\n`);
|
||||
} catch {
|
||||
// Best-effort error log writing - don't propagate failures
|
||||
}
|
||||
}
|
||||
|
||||
export async function validateAgentOutput(
|
||||
result: ClaudePromptResult,
|
||||
agentName: string | null,
|
||||
sourceDir: string,
|
||||
logger: ActivityLogger,
|
||||
): Promise<boolean> {
|
||||
logger.info(`Validating ${agentName} agent output`);
|
||||
|
||||
try {
|
||||
// Check if agent completed successfully (text result OR structured output)
|
||||
if (!result.success || (!result.result && result.structuredOutput === undefined)) {
|
||||
logger.error('Validation failed: Agent execution was unsuccessful');
|
||||
return false;
|
||||
}
|
||||
|
||||
// Get validator function for this agent
|
||||
const validator = agentName ? AGENT_VALIDATORS[agentName as keyof typeof AGENT_VALIDATORS] : undefined;
|
||||
|
||||
if (!validator) {
|
||||
logger.warn(`No validator found for agent "${agentName}" - assuming success`);
|
||||
logger.info('Validation passed: Unknown agent with successful result');
|
||||
return true;
|
||||
}
|
||||
|
||||
logger.info(`Using validator for agent: ${agentName}`, { sourceDir });
|
||||
|
||||
// Apply validation function
|
||||
const validationResult = await validator(sourceDir, logger);
|
||||
|
||||
if (validationResult) {
|
||||
logger.info('Validation passed: Required files/structure present');
|
||||
} else {
|
||||
logger.error('Validation failed: Missing required deliverable files');
|
||||
}
|
||||
|
||||
return validationResult;
|
||||
} catch (error) {
|
||||
const errMsg = error instanceof Error ? error.message : String(error);
|
||||
logger.error(`Validation failed with error: ${errMsg}`);
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
// Low-level SDK execution. Handles message streaming, progress, and audit logging.
|
||||
// Exported for Temporal activities to call single-attempt execution.
|
||||
export async function runClaudePrompt(
|
||||
prompt: string,
|
||||
sourceDir: string,
|
||||
context: string = '',
|
||||
description: string = 'Claude analysis',
|
||||
_agentName: string | null = null,
|
||||
auditSession: AuditSession | null = null,
|
||||
logger: ActivityLogger,
|
||||
modelTier: ModelTier = 'medium',
|
||||
outputFormat?: JsonSchemaOutputFormat,
|
||||
apiKey?: string,
|
||||
deliverablesSubdir?: string,
|
||||
providerConfig?: import('../types/config.js').ProviderConfig,
|
||||
mcpServers?: Record<string, import('@anthropic-ai/claude-agent-sdk').McpServerConfig>,
|
||||
): Promise<ClaudePromptResult> {
|
||||
// 1. Initialize timing and prompt
|
||||
const timer = new Timer(`agent-${description.toLowerCase().replace(/\s+/g, '-')}`);
|
||||
const fullPrompt = context ? `${context}\n\n${prompt}` : prompt;
|
||||
|
||||
// 2. Set up progress and audit infrastructure
|
||||
const execContext = detectExecutionContext(description);
|
||||
const progress = createProgressManager(
|
||||
{ description, useCleanOutput: execContext.useCleanOutput },
|
||||
global.SHANNON_DISABLE_LOADER ?? false,
|
||||
);
|
||||
const auditLogger = createAuditLogger(auditSession);
|
||||
|
||||
logger.info(`Running Claude Code: ${description}...`);
|
||||
|
||||
// 3. Build env vars to pass to SDK subprocesses
|
||||
const sdkEnv: Record<string, string> = {
|
||||
CLAUDE_CODE_MAX_OUTPUT_TOKENS: process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS || '64000',
|
||||
PLAYWRIGHT_MCP_OUTPUT_DIR: deliverablesSubdir
|
||||
? path.join(sourceDir, path.dirname(deliverablesSubdir), '.playwright-cli')
|
||||
: path.join(sourceDir, '.shannon', '.playwright-cli'),
|
||||
// apiKey from ContainerConfig takes precedence over process.env
|
||||
...(apiKey && { ANTHROPIC_API_KEY: apiKey }),
|
||||
// Deliverables subdir for save-deliverable CLI tool
|
||||
...(deliverablesSubdir && { SHANNON_DELIVERABLES_SUBDIR: deliverablesSubdir }),
|
||||
};
|
||||
|
||||
// 3a. Apply structured provider config directly to sdkEnv (no process.env mutation)
|
||||
if (providerConfig) {
|
||||
switch (providerConfig.providerType) {
|
||||
case 'bedrock':
|
||||
sdkEnv.CLAUDE_CODE_USE_BEDROCK = '1';
|
||||
if (providerConfig.awsRegion) sdkEnv.AWS_REGION = providerConfig.awsRegion;
|
||||
if (providerConfig.awsAccessKeyId) sdkEnv.AWS_ACCESS_KEY_ID = providerConfig.awsAccessKeyId;
|
||||
if (providerConfig.awsSecretAccessKey) sdkEnv.AWS_SECRET_ACCESS_KEY = providerConfig.awsSecretAccessKey;
|
||||
break;
|
||||
case 'vertex':
|
||||
sdkEnv.CLAUDE_CODE_USE_VERTEX = '1';
|
||||
if (providerConfig.gcpRegion) sdkEnv.CLOUD_ML_REGION = providerConfig.gcpRegion;
|
||||
if (providerConfig.gcpProjectId) sdkEnv.ANTHROPIC_VERTEX_PROJECT_ID = providerConfig.gcpProjectId;
|
||||
if (providerConfig.gcpCredentialsPath)
|
||||
sdkEnv.GOOGLE_APPLICATION_CREDENTIALS = providerConfig.gcpCredentialsPath;
|
||||
break;
|
||||
case 'litellm_router':
|
||||
if (providerConfig.baseUrl) sdkEnv.ANTHROPIC_BASE_URL = providerConfig.baseUrl;
|
||||
if (providerConfig.authToken) sdkEnv.ANTHROPIC_AUTH_TOKEN = providerConfig.authToken;
|
||||
break;
|
||||
default:
|
||||
// 'anthropic_api' or unset — apiKey already handled above
|
||||
if (providerConfig.apiKey && !apiKey) sdkEnv.ANTHROPIC_API_KEY = providerConfig.apiKey;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// 3b. Passthrough env vars not already set by providerConfig or apiKey
|
||||
const passthroughVars = [
|
||||
...(!sdkEnv.ANTHROPIC_API_KEY ? ['ANTHROPIC_API_KEY'] : []),
|
||||
'CLAUDE_CODE_OAUTH_TOKEN',
|
||||
...(!sdkEnv.ANTHROPIC_BASE_URL ? ['ANTHROPIC_BASE_URL'] : []),
|
||||
...(!sdkEnv.ANTHROPIC_AUTH_TOKEN ? ['ANTHROPIC_AUTH_TOKEN'] : []),
|
||||
...(!sdkEnv.CLAUDE_CODE_USE_BEDROCK ? ['CLAUDE_CODE_USE_BEDROCK'] : []),
|
||||
...(!sdkEnv.AWS_REGION ? ['AWS_REGION'] : []),
|
||||
'AWS_BEARER_TOKEN_BEDROCK',
|
||||
...(!sdkEnv.CLAUDE_CODE_USE_VERTEX ? ['CLAUDE_CODE_USE_VERTEX'] : []),
|
||||
...(!sdkEnv.CLOUD_ML_REGION ? ['CLOUD_ML_REGION'] : []),
|
||||
...(!sdkEnv.ANTHROPIC_VERTEX_PROJECT_ID ? ['ANTHROPIC_VERTEX_PROJECT_ID'] : []),
|
||||
...(!sdkEnv.GOOGLE_APPLICATION_CREDENTIALS ? ['GOOGLE_APPLICATION_CREDENTIALS'] : []),
|
||||
'HOME',
|
||||
'PATH',
|
||||
'PLAYWRIGHT_MCP_EXECUTABLE_PATH',
|
||||
];
|
||||
for (const name of passthroughVars) {
|
||||
const val = process.env[name];
|
||||
if (val) {
|
||||
sdkEnv[name] = val;
|
||||
}
|
||||
}
|
||||
|
||||
// 4. Configure SDK options
|
||||
// Model override from providerConfig takes precedence over env-based resolveModel
|
||||
const model = providerConfig?.modelOverrides?.[modelTier] ?? resolveModel(modelTier);
|
||||
const adaptiveThinking = supportsAdaptiveThinking(model) && process.env.CLAUDE_ADAPTIVE_THINKING !== 'false';
|
||||
const options = {
|
||||
model,
|
||||
maxTurns: 10_000,
|
||||
cwd: sourceDir,
|
||||
permissionMode: 'bypassPermissions' as const,
|
||||
allowDangerouslySkipPermissions: true,
|
||||
settingSources: ['user'] as ('user' | 'project' | 'local')[],
|
||||
env: sdkEnv,
|
||||
...(adaptiveThinking && { thinking: { type: 'adaptive' as const } }),
|
||||
...(outputFormat && { outputFormat }),
|
||||
...(mcpServers && Object.keys(mcpServers).length > 0 && { mcpServers }),
|
||||
};
|
||||
|
||||
if (!execContext.useCleanOutput) {
|
||||
logger.info(`SDK Options: maxTurns=${options.maxTurns}, cwd=${sourceDir}, permissions=BYPASS`);
|
||||
}
|
||||
|
||||
let turnCount = 0;
|
||||
let result: string | null = null;
|
||||
let apiErrorDetected = false;
|
||||
let totalCost = 0;
|
||||
|
||||
progress.start();
|
||||
|
||||
try {
|
||||
// 6. Process the message stream
|
||||
const messageLoopResult = await processMessageStream(
|
||||
fullPrompt,
|
||||
options,
|
||||
{ execContext, description, progress, auditLogger, logger },
|
||||
timer,
|
||||
);
|
||||
|
||||
turnCount = messageLoopResult.turnCount;
|
||||
result = messageLoopResult.result;
|
||||
apiErrorDetected = messageLoopResult.apiErrorDetected;
|
||||
totalCost = messageLoopResult.cost;
|
||||
const model = messageLoopResult.model;
|
||||
|
||||
// === SPENDING CAP SAFEGUARD ===
|
||||
// 7. Defense-in-depth: Detect spending cap that slipped through detectApiError().
|
||||
// Uses consolidated billing detection from utils/billing-detection.ts
|
||||
if (isSpendingCapBehavior(turnCount, totalCost, result || '')) {
|
||||
throw new PentestError(
|
||||
`Spending cap likely reached (turns=${turnCount}, cost=$0): ${result?.slice(0, 100)}`,
|
||||
'billing',
|
||||
true, // Retryable - Temporal will use 5-30 min backoff
|
||||
);
|
||||
}
|
||||
|
||||
// 8. Finalize successful result
|
||||
const duration = timer.stop();
|
||||
|
||||
if (apiErrorDetected) {
|
||||
logger.warn(`API Error detected in ${description} - will validate deliverables before failing`);
|
||||
}
|
||||
|
||||
progress.finish(formatCompletionMessage(execContext, description, turnCount, duration));
|
||||
|
||||
return {
|
||||
result,
|
||||
success: true,
|
||||
duration,
|
||||
turns: turnCount,
|
||||
cost: totalCost,
|
||||
model,
|
||||
partialCost: totalCost,
|
||||
apiErrorDetected,
|
||||
...(messageLoopResult.structuredOutput !== undefined && {
|
||||
structuredOutput: messageLoopResult.structuredOutput,
|
||||
}),
|
||||
};
|
||||
} catch (error) {
|
||||
// 9. Handle errors — log, write error file, return failure
|
||||
const duration = timer.stop();
|
||||
|
||||
const err = error as Error & { code?: string; status?: number };
|
||||
|
||||
await auditLogger.logError(err, duration, turnCount);
|
||||
progress.stop();
|
||||
outputLines(formatErrorOutput(err, execContext, description, duration, sourceDir, isRetryableError(err)));
|
||||
await writeErrorLog(err, sourceDir, fullPrompt, duration);
|
||||
|
||||
return {
|
||||
error: err.message,
|
||||
errorType: err.constructor.name,
|
||||
prompt: `${fullPrompt.slice(0, 100)}...`,
|
||||
success: false,
|
||||
duration,
|
||||
cost: totalCost,
|
||||
retryable: isRetryableError(err),
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
interface MessageLoopResult {
|
||||
turnCount: number;
|
||||
result: string | null;
|
||||
apiErrorDetected: boolean;
|
||||
cost: number;
|
||||
model?: string | undefined;
|
||||
structuredOutput?: unknown;
|
||||
}
|
||||
|
||||
interface MessageLoopDeps {
|
||||
execContext: ReturnType<typeof detectExecutionContext>;
|
||||
description: string;
|
||||
progress: ReturnType<typeof createProgressManager>;
|
||||
auditLogger: ReturnType<typeof createAuditLogger>;
|
||||
logger: ActivityLogger;
|
||||
}
|
||||
|
||||
async function processMessageStream(
|
||||
fullPrompt: string,
|
||||
options: NonNullable<Parameters<typeof query>[0]['options']>,
|
||||
deps: MessageLoopDeps,
|
||||
timer: Timer,
|
||||
): Promise<MessageLoopResult> {
|
||||
const { execContext, description, progress, auditLogger, logger } = deps;
|
||||
const HEARTBEAT_INTERVAL = 30000;
|
||||
|
||||
let turnCount = 0;
|
||||
let result: string | null = null;
|
||||
let apiErrorDetected = false;
|
||||
let cost = 0;
|
||||
let model: string | undefined;
|
||||
let structuredOutput: unknown | undefined;
|
||||
let lastHeartbeat = Date.now();
|
||||
|
||||
for await (const message of query({ prompt: fullPrompt, options })) {
|
||||
// Heartbeat logging when loader is disabled
|
||||
const now = Date.now();
|
||||
if (global.SHANNON_DISABLE_LOADER && now - lastHeartbeat > HEARTBEAT_INTERVAL) {
|
||||
logger.info(`[${Math.floor((now - timer.startTime) / 1000)}s] ${description} running... (Turn ${turnCount})`);
|
||||
lastHeartbeat = now;
|
||||
}
|
||||
|
||||
// Increment turn count for assistant messages
|
||||
if (message.type === 'assistant') {
|
||||
turnCount++;
|
||||
}
|
||||
|
||||
const dispatchResult = await dispatchMessage(message as { type: string; subtype?: string }, turnCount, {
|
||||
execContext,
|
||||
description,
|
||||
progress,
|
||||
auditLogger,
|
||||
logger,
|
||||
});
|
||||
|
||||
if (dispatchResult.type === 'throw') {
|
||||
throw dispatchResult.error;
|
||||
}
|
||||
|
||||
if (dispatchResult.type === 'complete') {
|
||||
result = dispatchResult.result;
|
||||
cost = dispatchResult.cost;
|
||||
if (dispatchResult.structuredOutput !== undefined) {
|
||||
structuredOutput = dispatchResult.structuredOutput;
|
||||
}
|
||||
break;
|
||||
}
|
||||
|
||||
if (dispatchResult.type === 'continue') {
|
||||
if (dispatchResult.apiErrorDetected) {
|
||||
apiErrorDetected = true;
|
||||
}
|
||||
if (dispatchResult.model) {
|
||||
model = dispatchResult.model;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return {
|
||||
turnCount,
|
||||
result,
|
||||
apiErrorDetected,
|
||||
cost,
|
||||
model,
|
||||
...(structuredOutput !== undefined && { structuredOutput }),
|
||||
};
|
||||
}
|
||||
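The environment handling in `runClaudePrompt` (steps 3, 3a, and 3b) follows one precedence rule: values set explicitly from `apiKey` or `providerConfig` win, and `process.env` only fills names that are still unset. A minimal standalone sketch of that rule follows, assuming a hypothetical `mergeEnv` helper that does not exist in the file above; it simplifies the original by treating every passthrough name the same way.

```typescript
// Hypothetical helper illustrating the precedence rule; not part of the executor file.
function mergeEnv(
  explicit: Record<string, string>,
  ambient: Record<string, string | undefined>,
  passthrough: string[],
): Record<string, string> {
  const merged: Record<string, string> = { ...explicit };
  for (const name of passthrough) {
    const value = ambient[name];
    // Copy only non-empty ambient values for names not already set explicitly.
    if (value && !merged[name]) {
      merged[name] = value;
    }
  }
  return merged;
}

const env = mergeEnv(
  { ANTHROPIC_API_KEY: 'from-config' },
  { ANTHROPIC_API_KEY: 'from-shell', AWS_REGION: 'us-east-1', HOME: '' },
  ['ANTHROPIC_API_KEY', 'AWS_REGION', 'HOME'],
);
```

Here the configured key is kept, `AWS_REGION` is filled from the ambient environment, and the empty `HOME` is skipped, mirroring the `if (val)` guard in the passthrough loop. Building a fresh object this way also avoids mutating `process.env`, which is the stated intent of step 3a.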
@@ -1,47 +0,0 @@
|
||||
/**
|
||||
* pi extension: enforce a bounded timeout on every `bash` tool call.
|
||||
*
|
||||
* pi's built-in bash tool accepts an optional `timeout` (in seconds) but applies
|
||||
* NO default and NO upper bound — an unbounded command (e.g. a `playwright-cli`
|
||||
* browser action that never returns) hangs the agent indefinitely. This extension
|
||||
* registers a `tool_call` pre-execution handler that blocks any `bash` invocation
|
||||
* that omits `timeout` or sets it above the maximum, returning a message that tells
|
||||
* the model how to re-run the command correctly.
|
||||
*/
|
||||
|
||||
import type { ExtensionAPI, ToolCallEvent, ToolCallEventResult } from '@earendil-works/pi-coding-agent';
|
||||
import { isToolCallEventType } from '@earendil-works/pi-coding-agent';
|
||||
|
||||
/** Recommended timeout (seconds) suggested to the model when it omits one. */
|
||||
const DEFAULT_TIMEOUT_SECONDS = 120;
|
||||
|
||||
/** Hard upper bound (seconds) a single bash command may run. */
|
||||
const MAX_TIMEOUT_SECONDS = 600;
|
||||
|
||||
function evaluateBashTimeout(timeout: number | undefined): ToolCallEventResult | undefined {
|
||||
const hasValidTimeout = typeof timeout === 'number' && Number.isFinite(timeout) && timeout > 0;
|
||||
if (!hasValidTimeout) {
|
||||
return {
|
||||
block: true,
|
||||
reason: `Set bash 'timeout' (seconds). Default ${DEFAULT_TIMEOUT_SECONDS}s, max ${MAX_TIMEOUT_SECONDS}s.`,
|
||||
};
|
||||
}
|
||||
|
||||
if (timeout > MAX_TIMEOUT_SECONDS) {
|
||||
return {
|
||||
block: true,
|
||||
reason: `bash 'timeout' ${timeout}s exceeds max ${MAX_TIMEOUT_SECONDS}s. Default ${DEFAULT_TIMEOUT_SECONDS}s, max ${MAX_TIMEOUT_SECONDS}s.`,
|
||||
};
|
||||
}
|
||||
|
||||
return undefined;
|
||||
}
|
||||
|
||||
export default function bashTimeoutExtension(pi: ExtensionAPI): void {
|
||||
pi.on('tool_call', (event: ToolCallEvent): ToolCallEventResult | undefined => {
|
||||
if (!isToolCallEventType('bash', event)) {
|
||||
return undefined;
|
||||
}
|
||||
return evaluateBashTimeout(event.input.timeout);
|
||||
});
|
||||
}
|
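The gate above reduces to a pure function over the `timeout` value. A minimal standalone sketch follows, assuming a local `BlockResult` type as a hypothetical stand-in for pi's `ToolCallEventResult`, so it runs without the pi packages; `checkTimeout` is an illustrative name, not part of the source.

```typescript
// Hypothetical stand-in for pi's ToolCallEventResult (assumption, not the real type).
type BlockResult = { block: true; reason: string };

const DEFAULT_TIMEOUT_SECONDS = 120;
const MAX_TIMEOUT_SECONDS = 600;

// Returns a block result when the timeout is missing, invalid, or too large;
// returns undefined when the call may proceed.
function checkTimeout(timeout: unknown): BlockResult | undefined {
  if (typeof timeout !== 'number' || !Number.isFinite(timeout) || timeout <= 0) {
    return {
      block: true,
      reason: `Set bash 'timeout' (seconds). Default ${DEFAULT_TIMEOUT_SECONDS}s, max ${MAX_TIMEOUT_SECONDS}s.`,
    };
  }
  if (timeout > MAX_TIMEOUT_SECONDS) {
    return {
      block: true,
      reason: `bash 'timeout' ${timeout}s exceeds max ${MAX_TIMEOUT_SECONDS}s.`,
    };
  }
  return undefined;
}
```

Keeping the check free of event plumbing makes the three outcomes (missing, too large, accepted) easy to exercise in isolation.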
||||
@@ -0,0 +1,408 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
|
||||
import { PentestError } from '../services/error-handling.js';
|
||||
import type { ActivityLogger } from '../types/activity-logger.js';
|
||||
import { ErrorCode } from '../types/errors.js';
|
||||
import { matchesBillingTextPattern } from '../utils/billing-detection.js';
|
||||
import { formatTimestamp } from '../utils/formatting.js';
|
||||
import type { AuditLogger } from './audit-logger.js';
|
||||
import {
|
||||
filterJsonToolCalls,
|
||||
formatAssistantOutput,
|
||||
formatResultOutput,
|
||||
formatToolResultOutput,
|
||||
formatToolUseOutput,
|
||||
} from './output-formatters.js';
|
||||
import type { ProgressManager } from './progress-manager.js';
|
||||
import type {
|
||||
ApiErrorDetection,
|
||||
AssistantMessage,
|
||||
AssistantResult,
|
||||
ContentBlock,
|
||||
ExecutionContext,
|
||||
ModelRefusalFallbackMessage,
|
||||
ResultData,
|
||||
ResultMessage,
|
||||
SystemInitMessage,
|
||||
ToolResultData,
|
||||
ToolResultMessage,
|
||||
ToolUseData,
|
||||
ToolUseMessage,
|
||||
} from './types.js';
|
||||
|
||||
// Handles both array and string content formats from SDK
|
||||
function extractMessageContent(message: AssistantMessage): string {
|
||||
const messageContent = message.message;
|
||||
|
||||
if (Array.isArray(messageContent.content)) {
|
||||
return messageContent.content
|
||||
.filter((c: ContentBlock) => c.type !== 'thinking' && c.type !== 'redacted_thinking')
|
||||
.map((c: ContentBlock) => c.text || JSON.stringify(c))
|
||||
.join('\n');
|
||||
}
|
||||
|
||||
return String(messageContent.content);
|
||||
}
|
||||
|
||||
// Extracts only text content (no tool_use JSON) to avoid false positives in error detection
|
||||
function extractTextOnlyContent(message: AssistantMessage): string {
|
||||
const messageContent = message.message;
|
||||
|
||||
if (Array.isArray(messageContent.content)) {
|
||||
return messageContent.content
|
||||
.filter((c: ContentBlock) => c.type === 'text' || c.text)
|
||||
.map((c: ContentBlock) => c.text || '')
|
||||
.join('\n');
|
||||
}
|
||||
|
||||
return String(messageContent.content);
|
||||
}
|
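The text-only extraction above can be sketched in isolation. The `Block` type below is a simplified, hypothetical stand-in for the project's `ContentBlock`, and `textOnly` is an illustrative name; the filter and join mirror the function above.

```typescript
// Simplified stand-in for ContentBlock (assumption: only `type` and `text` matter here).
interface Block {
  type: string;
  text?: string;
}

// Keeps blocks that carry text and drops the rest (for example tool_use blocks),
// so downstream error detection never scans serialized tool input.
function textOnly(content: string | Block[]): string {
  if (Array.isArray(content)) {
    return content
      .filter((b) => b.type === 'text' || Boolean(b.text))
      .map((b) => b.text || '')
      .join('\n');
  }
  return String(content);
}
```

A `tool_use` block whose input mentions a phrase such as "usage limit" contributes nothing to the returned string, which is the false positive this path is meant to avoid.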
||||
|
||||
function detectApiError(content: string): ApiErrorDetection {
|
||||
if (!content || typeof content !== 'string') {
|
||||
return { detected: false };
|
||||
}
|
||||
|
||||
const lowerContent = content.toLowerCase();
|
||||
|
||||
// === BILLING/SPENDING CAP ERRORS (Retryable with long backoff) ===
|
||||
// When Claude Code hits its spending cap, it returns a short message like
|
||||
// "Spending cap reached resets 8am" instead of throwing an error.
|
||||
// These should retry with 5-30 min backoff so workflows can recover when cap resets.
|
||||
if (matchesBillingTextPattern(content)) {
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError(
|
||||
`Billing limit reached: ${content.slice(0, 100)}`,
|
||||
'billing',
|
||||
true, // RETRYABLE - Temporal will use 5-30 min backoff
|
||||
{},
|
||||
ErrorCode.SPENDING_CAP_REACHED,
|
||||
),
|
||||
};
|
||||
}
|
||||
|
||||
// === SESSION LIMIT (Non-retryable) ===
|
||||
// Different from spending cap - usually means something is fundamentally wrong
|
||||
if (lowerContent.includes('session limit reached')) {
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError('Session limit reached', 'billing', false),
|
||||
};
|
||||
}
|
||||
|
||||
// Non-fatal API errors - detected but continue
|
||||
if (lowerContent.includes('api error') || lowerContent.includes('terminated')) {
|
||||
return { detected: true };
|
||||
}
|
||||
|
||||
return { detected: false };
|
||||
}
|
||||
|
||||
// Maps SDK structured error types to our error handling.
|
||||
function handleStructuredError(errorType: SDKAssistantMessageError, content: string): ApiErrorDetection {
|
||||
switch (errorType) {
|
||||
case 'billing_error':
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError(
|
||||
`Billing error (structured): ${content.slice(0, 100)}`,
|
||||
'billing',
|
||||
true, // Retryable with backoff
|
||||
{},
|
||||
ErrorCode.INSUFFICIENT_CREDITS,
|
||||
),
|
||||
};
|
||||
case 'rate_limit':
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError(
|
||||
`Rate limit hit (structured): ${content.slice(0, 100)}`,
|
||||
'network',
|
||||
true, // Retryable with backoff
|
||||
{},
|
||||
ErrorCode.API_RATE_LIMITED,
|
||||
),
|
||||
};
|
||||
case 'authentication_failed':
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError(
|
||||
`Authentication failed: ${content.slice(0, 100)}`,
|
||||
'config',
|
||||
false, // Not retryable - needs API key fix
|
||||
),
|
||||
};
|
||||
case 'server_error':
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError(
|
||||
`Server error (structured): ${content.slice(0, 100)}`,
|
||||
'network',
|
||||
true, // Retryable
|
||||
),
|
||||
};
|
||||
case 'invalid_request':
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError(
|
||||
`Invalid request: ${content.slice(0, 100)}`,
|
||||
'config',
|
||||
false, // Not retryable - needs code fix
|
||||
),
|
||||
};
|
||||
case 'max_output_tokens':
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError(
|
||||
`Max output tokens reached: ${content.slice(0, 100)}`,
|
||||
'billing',
|
||||
true, // Retryable - may succeed with different content
|
||||
),
|
||||
};
|
||||
case 'overloaded':
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError(
|
||||
`Anthropic API overloaded (structured): ${content.slice(0, 100)}`,
|
||||
'network',
|
||||
true, // Retryable with backoff
|
||||
),
|
||||
};
|
||||
case 'model_not_found':
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError(
|
||||
`Model not found: ${content.slice(0, 100)}`,
|
||||
'config',
|
||||
false, // Not retryable - model ID is misconfigured
|
||||
),
|
||||
};
|
||||
case 'oauth_org_not_allowed':
|
||||
return {
|
||||
detected: true,
|
||||
shouldThrow: new PentestError(
|
||||
`Organization not allowed for this credential: ${content.slice(0, 100)}`,
|
||||
'config',
|
||||
false, // Not retryable - needs credential/org fix
|
||||
),
|
||||
};
|
||||
default:
|
||||
return { detected: true };
|
||||
}
|
||||
}
|
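The switch above pairs each structured error type with a category and a retry decision. The same mapping can be summarized as a lookup table; this is a sketch only, and the `Classification` shape and `classify` helper are illustrative assumptions rather than part of the source.

```typescript
type ErrorCategory = 'billing' | 'network' | 'config';

interface Classification {
  category: ErrorCategory;
  retryable: boolean;
}

// Category and retryability per error type, copied from the switch above.
const STRUCTURED_ERRORS: ReadonlyMap<string, Classification> = new Map<string, Classification>([
  ['billing_error', { category: 'billing', retryable: true }],
  ['rate_limit', { category: 'network', retryable: true }],
  ['authentication_failed', { category: 'config', retryable: false }],
  ['server_error', { category: 'network', retryable: true }],
  ['invalid_request', { category: 'config', retryable: false }],
  ['max_output_tokens', { category: 'billing', retryable: true }],
  ['overloaded', { category: 'network', retryable: true }],
  ['model_not_found', { category: 'config', retryable: false }],
  ['oauth_org_not_allowed', { category: 'config', retryable: false }],
]);

// Unknown types return undefined, matching the switch's default branch,
// which reports detection without throwing.
function classify(errorType: string): Classification | undefined {
  return STRUCTURED_ERRORS.get(errorType);
}
```

Read this way, every `config` error is non-retryable because it needs a human fix (key, model ID, or organization), while `network` and `billing` errors are retried with backoff.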
||||
|
||||
function handleAssistantMessage(message: AssistantMessage, turnCount: number): AssistantResult {
|
||||
const content = extractMessageContent(message);
|
||||
const cleanedContent = filterJsonToolCalls(content);
|
||||
|
||||
// Prefer structured error field from SDK, fall back to text-sniffing
|
||||
// Use text-only content for error detection to avoid false positives
|
||||
// from tool_use JSON (e.g. security reports containing "usage limit")
|
||||
let errorDetection: ApiErrorDetection;
|
||||
if (message.error) {
|
||||
errorDetection = handleStructuredError(message.error, content);
|
||||
} else {
|
||||
const textOnlyContent = extractTextOnlyContent(message);
|
||||
errorDetection = detectApiError(textOnlyContent);
|
||||
}
|
||||
|
||||
const result: AssistantResult = {
|
||||
content,
|
||||
cleanedContent,
|
||||
apiErrorDetected: errorDetection.detected,
|
||||
logData: {
|
||||
turn: turnCount,
|
||||
content,
|
||||
timestamp: formatTimestamp(),
|
||||
},
|
||||
};
|
||||
|
||||
// Only add shouldThrow if it exists (exactOptionalPropertyTypes compliance)
|
||||
if (errorDetection.shouldThrow) {
|
||||
result.shouldThrow = errorDetection.shouldThrow;
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
// Final message of a query with cost/duration info
|
||||
function handleResultMessage(message: ResultMessage): ResultData {
|
||||
const result: ResultData = {
|
||||
result: message.result || null,
|
||||
cost: message.total_cost_usd || 0,
|
||||
duration_ms: message.duration_ms || 0,
|
||||
permissionDenials: message.permission_denials?.length || 0,
|
||||
};
|
||||
|
||||
// Only add subtype if it exists (exactOptionalPropertyTypes compliance)
|
||||
if (message.subtype) {
|
||||
result.subtype = message.subtype;
|
||||
}
|
||||
|
||||
// Capture stop_reason for diagnostics (helps debug early stops, budget exceeded, etc.)
|
||||
if (message.stop_reason !== undefined) {
|
||||
result.stop_reason = message.stop_reason;
|
||||
if (message.stop_reason && message.stop_reason !== 'end_turn') {
|
||||
console.log(` Stop reason: ${message.stop_reason}`);
|
||||
}
|
||||
}
|
||||
|
||||
if (message.structured_output !== undefined) {
|
||||
result.structuredOutput = message.structured_output;
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
function handleToolUseMessage(message: ToolUseMessage): ToolUseData {
|
||||
return {
|
||||
toolName: message.name,
|
||||
parameters: message.input || {},
|
||||
timestamp: formatTimestamp(),
|
||||
};
|
||||
}
|
||||
|
||||
// Truncates long results for display (500 char limit), preserves full content for logging
|
||||
function handleToolResultMessage(message: ToolResultMessage): ToolResultData {
|
||||
const content = message.content;
|
||||
const contentStr = typeof content === 'string' ? content : JSON.stringify(content, null, 2);
|
||||
|
||||
const displayContent =
|
||||
contentStr.length > 500
|
||||
? `${contentStr.slice(0, 500)}...\n[Result truncated - ${contentStr.length} total chars]`
|
||||
: contentStr;
|
||||
|
||||
return {
|
||||
content,
|
||||
displayContent,
|
||||
timestamp: formatTimestamp(),
|
||||
};
|
||||
}
|
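The display truncation above can be tried on its own. This is a minimal sketch: `truncateForDisplay` and the `limit` parameter are illustrative names, and the fallback for values that `JSON.stringify` cannot serialize is an addition not present in the source.

```typescript
const DISPLAY_LIMIT = 500;

// Serializes non-string content, then cuts it to `limit` characters for the console.
// The caller keeps the original value for logging; only the display copy is shortened.
function truncateForDisplay(content: unknown, limit: number = DISPLAY_LIMIT): string {
  const text =
    typeof content === 'string'
      ? content
      : (JSON.stringify(content, null, 2) ?? String(content)); // fallback is an assumption
  return text.length > limit
    ? `${text.slice(0, limit)}...\n[Result truncated - ${text.length} total chars]`
    : text;
}
```

The trailer reports the full length, so a reader of the console output can tell how much was cut.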
||||
|
||||
function outputLines(lines: string[]): void {
|
||||
for (const line of lines) {
|
||||
console.log(line);
|
||||
}
|
||||
}
|
||||
|
||||
export type MessageDispatchAction =
|
||||
| { type: 'continue'; apiErrorDetected?: boolean | undefined; model?: string | undefined }
|
||||
| { type: 'complete'; result: string | null; cost: number; structuredOutput?: unknown }
|
||||
| { type: 'throw'; error: Error };
|
||||
|
||||
export interface MessageDispatchDeps {
|
||||
execContext: ExecutionContext;
|
||||
description: string;
|
||||
progress: ProgressManager;
|
||||
auditLogger: AuditLogger;
|
||||
logger: ActivityLogger;
|
||||
}
|
||||
|
||||
// Dispatches SDK messages to appropriate handlers and formatters
|
||||
export async function dispatchMessage(
|
||||
message: { type: string; subtype?: string },
|
||||
turnCount: number,
|
||||
deps: MessageDispatchDeps,
|
||||
): Promise<MessageDispatchAction> {
|
||||
const { execContext, description, progress, auditLogger, logger } = deps;
|
||||
|
||||
switch (message.type) {
|
||||
case 'assistant': {
|
||||
const assistantResult = handleAssistantMessage(message as AssistantMessage, turnCount);
|
||||
|
||||
if (assistantResult.shouldThrow) {
|
||||
return { type: 'throw', error: assistantResult.shouldThrow };
|
||||
}
|
||||
|
||||
if (assistantResult.cleanedContent.trim()) {
|
||||
progress.stop();
|
||||
outputLines(formatAssistantOutput(assistantResult.cleanedContent, execContext, turnCount, description));
|
||||
progress.start();
|
||||
}
|
||||
|
||||
await auditLogger.logLlmResponse(turnCount, assistantResult.content);
|
||||
|
||||
if (assistantResult.apiErrorDetected) {
|
||||
logger.warn('API Error detected in assistant response');
|
||||
return { type: 'continue', apiErrorDetected: true };
|
||||
}
|
||||
|
||||
return { type: 'continue' };
|
||||
}
|
||||
|
||||
case 'system': {
|
||||
if (message.subtype === 'init') {
|
||||
const initMsg = message as SystemInitMessage;
|
||||
if (!execContext.useCleanOutput) {
|
||||
logger.info(`Model: ${initMsg.model}, Permission: ${initMsg.permissionMode}`);
|
||||
}
|
||||
return { type: 'continue', model: initMsg.model };
|
||||
}
|
||||
if (message.subtype === 'model_refusal_fallback') {
|
||||
const fallback = message as ModelRefusalFallbackMessage;
|
||||
const category = fallback.api_refusal_category ?? 'policy';
|
||||
await auditLogger.logNote(
|
||||
'model-fallback',
|
||||
`Model refused (${category}); fell back ${fallback.original_model} → ${fallback.fallback_model}`,
|
||||
);
|
||||
return { type: 'continue' };
|
||||
}
|
||||
return { type: 'continue' };
|
||||
}
|
||||
|
||||
case 'user':
|
||||
case 'tool_progress':
|
||||
case 'tool_use_summary':
|
||||
case 'auth_status':
|
||||
return { type: 'continue' };
|
||||
|
||||
case 'tool_use': {
|
||||
const toolData = handleToolUseMessage(message as unknown as ToolUseMessage);
|
||||
outputLines(formatToolUseOutput(toolData.toolName, toolData.parameters));
|
||||
await auditLogger.logToolStart(toolData.toolName, toolData.parameters);
|
||||
return { type: 'continue' };
|
||||
}
|
||||
|
||||
case 'tool_result': {
|
||||
const toolResultData = handleToolResultMessage(message as unknown as ToolResultMessage);
|
||||
outputLines(formatToolResultOutput(toolResultData.displayContent));
|
||||
await auditLogger.logToolEnd(toolResultData.content);
|
||||
return { type: 'continue' };
|
||||
}
|
||||
|
||||
case 'result': {
|
||||
const resultData = handleResultMessage(message as ResultMessage);
|
||||
outputLines(formatResultOutput(resultData, !execContext.useCleanOutput));
|
||||
|
||||
if (resultData.subtype === 'error_max_structured_output_retries') {
|
||||
return {
|
||||
type: 'throw',
|
||||
error: new PentestError(
|
||||
'Structured output validation failed after max retries',
|
||||
'validation',
|
||||
true,
|
||||
{},
|
||||
ErrorCode.OUTPUT_VALIDATION_FAILED,
|
||||
),
|
||||
};
|
||||
}
|
||||
|
||||
return {
|
||||
type: 'complete' as const,
|
||||
result: resultData.result,
|
||||
cost: resultData.cost,
|
||||
...(resultData.structuredOutput !== undefined && { structuredOutput: resultData.structuredOutput }),
|
||||
};
|
||||
}
|
||||
|
||||
default:
|
||||
logger.info(`Unhandled message type: ${message.type}`);
|
||||
return { type: 'continue' };
|
||||
}
|
||||
}
|
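`dispatchMessage` returns a discriminated union, so a caller can fold a stream of actions into a final summary by switching on `type`. The sketch below works under simplifying assumptions: the `Action` type is a trimmed copy of `MessageDispatchAction`, and `reduceActions` and `Summary` are hypothetical names. It takes a plain array rather than the SDK's message stream.

```typescript
type Action =
  | { type: 'continue'; apiErrorDetected?: boolean; model?: string }
  | { type: 'complete'; result: string | null; cost: number }
  | { type: 'throw'; error: Error };

interface Summary {
  result: string | null;
  cost: number;
  apiErrorDetected: boolean;
  model: string | undefined;
}

// Folds actions in order: 'throw' aborts, 'complete' ends the loop,
// 'continue' only updates the running flags.
function reduceActions(actions: Action[]): Summary {
  let result: string | null = null;
  let cost = 0;
  let apiErrorDetected = false;
  let model: string | undefined;

  for (const action of actions) {
    if (action.type === 'throw') {
      throw action.error;
    }
    if (action.type === 'complete') {
      result = action.result;
      cost = action.cost;
      break;
    }
    // Narrowed to the 'continue' variant here.
    if (action.apiErrorDetected) apiErrorDetected = true;
    if (action.model) model = action.model;
  }

  return { result, cost, apiErrorDetected, model };
}
```

Because each variant carries only the fields it needs, the compiler rejects reads such as `action.cost` on a `continue` action.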
||||
@@ -5,30 +5,17 @@
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* Model tier definitions and resolution for the pi harness.
|
||||
* Model tier definitions and resolution.
|
||||
*
|
||||
* Three tiers mapped to capability levels:
|
||||
* - "small" (Haiku — summarization, structured extraction)
|
||||
* - "medium" (Sonnet — tool use, general analysis)
|
||||
* - "large" (Opus — deep reasoning, complex analysis)
|
||||
*
|
||||
* Users override per tier via ANTHROPIC_SMALL_MODEL / ANTHROPIC_MEDIUM_MODEL /
|
||||
* ANTHROPIC_LARGE_MODEL, which works across all providers (Anthropic, Bedrock,
|
||||
* custom base URL).
|
||||
*
|
||||
* The active provider is chosen from an injected `providerConfig` (the Pro consumer)
|
||||
* or, in OSS, from the env-var contract the CLI forwards (`CLAUDE_CODE_USE_BEDROCK`,
|
||||
* `ANTHROPIC_BASE_URL`+`ANTHROPIC_AUTH_TOKEN`, else direct Anthropic). Resolution
|
||||
* returns a pi `Model` via `ModelRegistry.find`, the `thinkingLevel`, and an
|
||||
* `AuthStorage` primed with the right credential. Bedrock authenticates from the
|
||||
* AWS_ env vars via pi-ai.
|
||||
* Users override via ANTHROPIC_SMALL_MODEL / ANTHROPIC_MEDIUM_MODEL / ANTHROPIC_LARGE_MODEL,
|
||||
* which works across all providers (direct, Bedrock, Vertex).
|
||||
*/
|
||||
|
||||
import type { ThinkingLevel } from '@earendil-works/pi-agent-core';
|
||||
import type { Api, Model } from '@earendil-works/pi-ai';
|
||||
import { AuthStorage, type ModelRegistry } from '@earendil-works/pi-coding-agent';
|
||||
import type { ProviderConfig } from '../types/config.js';
|
||||
|
||||
export type ModelTier = 'small' | 'medium' | 'large';
|
||||
|
||||
const DEFAULT_MODELS: Readonly<Record<ModelTier, string>> = {
|
||||
@@ -37,62 +24,8 @@ const DEFAULT_MODELS: Readonly<Record<ModelTier, string>> = {
|
||||
large: 'claude-opus-4-8',
|
||||
};
|
||||
|
||||
export interface EffectiveProvider {
|
||||
/** pi-ai provider id: 'anthropic' or 'amazon-bedrock'. */
|
||||
providerId: string;
|
||||
/** Custom-base-URL override applied to the resolved anthropic model. */
|
||||
baseUrl?: string;
|
||||
/** Runtime credential to prime on AuthStorage for the 'anthropic' provider. */
|
||||
anthropicToken?: string;
|
||||
}
|
||||
|
||||
/**
|
||||
* Determine the active provider + auth.
|
||||
*
|
||||
* An explicit `providerConfig` (injected by the Pro consumer) wins; otherwise we
|
||||
* fall back to the OSS env-var contract the CLI forwards: `CLAUDE_CODE_USE_BEDROCK`
|
||||
* → Bedrock; `ANTHROPIC_BASE_URL`+`ANTHROPIC_AUTH_TOKEN` → custom base URL; else
|
||||
* direct Anthropic (`ANTHROPIC_API_KEY`, or `CLAUDE_CODE_OAUTH_TOKEN`). Bedrock
|
||||
* authenticates from the AWS_ env vars via pi-ai, so it needs no anthropic token.
|
||||
*/
|
||||
export function resolveEffectiveProvider(apiKey?: string, providerConfig?: ProviderConfig): EffectiveProvider {
|
||||
const anthropicKey = apiKey ?? providerConfig?.apiKey ?? process.env.ANTHROPIC_API_KEY;
|
||||
const type = providerConfig?.providerType;
|
||||
|
||||
// Bedrock — explicit providerConfig or the env flag.
|
||||
if (type === 'bedrock' || (!type && process.env.CLAUDE_CODE_USE_BEDROCK === '1')) {
|
||||
return { providerId: 'amazon-bedrock' };
|
||||
}
|
||||
|
||||
// Custom base URL — explicit providerConfig.
|
||||
if (type === 'custom_base_url') {
|
||||
const eff: EffectiveProvider = { providerId: 'anthropic' };
|
||||
if (providerConfig?.baseUrl) eff.baseUrl = providerConfig.baseUrl;
|
||||
const token = providerConfig?.authToken ?? anthropicKey;
|
||||
if (token) eff.anthropicToken = token;
|
||||
return eff;
|
||||
}
|
||||
|
||||
// Custom base URL — OSS env contract (no providerConfig).
|
||||
if (!type && process.env.ANTHROPIC_BASE_URL && process.env.ANTHROPIC_AUTH_TOKEN) {
|
||||
return {
|
||||
providerId: 'anthropic',
|
||||
baseUrl: process.env.ANTHROPIC_BASE_URL,
|
||||
anthropicToken: process.env.ANTHROPIC_AUTH_TOKEN,
|
||||
};
|
||||
}
|
||||
|
||||
// Direct Anthropic (API key, or — env only — OAuth token).
|
||||
const eff: EffectiveProvider = { providerId: 'anthropic' };
|
||||
const token = anthropicKey ?? (type ? undefined : process.env.CLAUDE_CODE_OAUTH_TOKEN);
|
||||
if (token) eff.anthropicToken = token;
|
||||
return eff;
|
||||
}
|
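For the env-only (OSS) path, the precedence described above is: Bedrock flag, then custom base URL, then direct Anthropic. A minimal sketch follows, assuming the environment is passed in as a plain object rather than read from `process.env`, and ignoring `providerConfig` entirely; `providerFromEnv` and `Provider` are hypothetical names.

```typescript
type Env = Record<string, string | undefined>;

interface Provider {
  providerId: 'anthropic' | 'amazon-bedrock';
  baseUrl?: string;
  token?: string;
}

// Env-only precedence: Bedrock flag, then base URL + auth token, then direct Anthropic.
function providerFromEnv(env: Env): Provider {
  if (env['CLAUDE_CODE_USE_BEDROCK'] === '1') {
    // Bedrock authenticates from AWS_ variables, so no Anthropic token is attached.
    return { providerId: 'amazon-bedrock' };
  }

  const baseUrl = env['ANTHROPIC_BASE_URL'];
  const authToken = env['ANTHROPIC_AUTH_TOKEN'];
  if (baseUrl && authToken) {
    return { providerId: 'anthropic', baseUrl, token: authToken };
  }

  const token = env['ANTHROPIC_API_KEY'] ?? env['CLAUDE_CODE_OAUTH_TOKEN'];
  return token ? { providerId: 'anthropic', token } : { providerId: 'anthropic' };
}
```

Passing the environment as an argument keeps the function deterministic and testable; a base URL without an auth token falls through to the direct path, as in the function above.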
||||
|
||||
/** Resolve a model tier to a concrete model ID (providerConfig override → env override → default). */
|
||||
export function resolveModelId(tier: ModelTier = 'medium', providerConfig?: ProviderConfig): string {
|
||||
const override = providerConfig?.modelOverrides?.[tier];
|
||||
if (override) return override;
|
||||
/** Resolve a model tier to a concrete model ID. */
|
||||
export function resolveModel(tier: ModelTier = 'medium'): string {
|
||||
switch (tier) {
|
||||
case 'small':
|
||||
return process.env.ANTHROPIC_SMALL_MODEL || DEFAULT_MODELS.small;
|
||||
@@ -108,69 +41,6 @@ export function supportsAdaptiveThinking(model: string): boolean {
|
||||
return /opus-4-[678]/.test(model);
|
||||
}
|
||||
|
||||
/**
|
||||
* Resolve the thinking level for a run.
|
||||
*
|
||||
* Adaptive thinking is enabled only on capable models (Opus 4.6/4.7/4.8), mapped to
|
||||
* pi's 'medium' level; every other model runs with thinking 'off'. The
|
||||
* CLAUDE_ADAPTIVE_THINKING=false kill switch forces 'off' regardless of model.
|
||||
*/
|
||||
export function resolveThinkingLevel(modelId: string): ThinkingLevel {
|
||||
if (process.env.CLAUDE_ADAPTIVE_THINKING === 'false') return 'off';
|
||||
return supportsAdaptiveThinking(modelId) ? 'medium' : 'off';
|
||||
}
|
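The thinking-level rule above is small enough to check by hand. The sketch below assumes the environment is injected as an argument and uses a local `Level` type in place of pi's `ThinkingLevel`; `thinkingLevelFor` is a hypothetical name, and the regex is the one shown in `supportsAdaptiveThinking`.

```typescript
// Local stand-in for pi's ThinkingLevel, restricted to the two values used here.
type Level = 'off' | 'medium';

function supportsAdaptive(model: string): boolean {
  return /opus-4-[678]/.test(model);
}

// The kill switch is checked first, so it overrides model capability.
function thinkingLevelFor(modelId: string, env: Record<string, string | undefined>): Level {
  if (env['CLAUDE_ADAPTIVE_THINKING'] === 'false') return 'off';
  return supportsAdaptive(modelId) ? 'medium' : 'off';
}
```

Only the exact string `false` disables thinking; any other value, or an unset variable, leaves the model-based default in place.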
||||
|
||||
export interface ModelSelection {
|
||||
model: Model<Api>;
|
||||
thinkingLevel: ThinkingLevel;
|
||||
authStorage: AuthStorage;
|
||||
modelId: string;
|
||||
providerId: string;
|
||||
}
|
||||
|
||||
/**
|
||||
* Resolve the active provider (see resolveEffectiveProvider), prime an AuthStorage
|
||||
* with its credential, and resolve the tier's model from a fresh ModelRegistry.
|
||||
* Anthropic / custom-base-URL use a runtime anthropic key; Bedrock authenticates
|
||||
* from the AWS_ env vars (the bearer token is also primed explicitly as a belt-and-suspenders measure).
|
||||
*/
|
||||
export function resolveModelSelection(
|
||||
registryFactory: (authStorage: AuthStorage) => ModelRegistry,
|
||||
modelTier: ModelTier,
|
||||
apiKey?: string,
|
||||
providerConfig?: ProviderConfig,
|
||||
): ModelSelection {
|
||||
const eff = resolveEffectiveProvider(apiKey, providerConfig);
|
||||
const modelId = resolveModelId(modelTier, providerConfig);
|
||||
|
||||
const authStorage = AuthStorage.inMemory();
|
||||
if (eff.providerId === 'anthropic' && eff.anthropicToken) {
|
||||
authStorage.setRuntimeApiKey('anthropic', eff.anthropicToken);
|
||||
}
|
||||
// Bedrock auth flows from the AWS_ env vars; prime the bearer token explicitly so
|
||||
// it resolves via AuthStorage in addition to pi-ai's own env fallback.
|
||||
if (eff.providerId === 'amazon-bedrock' && process.env.AWS_BEARER_TOKEN_BEDROCK) {
|
||||
authStorage.setRuntimeApiKey('amazon-bedrock', process.env.AWS_BEARER_TOKEN_BEDROCK);
|
||||
}
|
||||
|
||||
const registry = registryFactory(authStorage);
|
||||
const found = registry.find(eff.providerId, modelId);
|
||||
if (!found) {
|
||||
throw new Error(`Model not found in pi registry: provider="${eff.providerId}" model="${modelId}"`);
|
||||
}
|
||||
|
||||
// Custom base URL: override the resolved model's endpoint.
|
||||
const model: Model<Api> = eff.baseUrl ? { ...found, baseUrl: eff.baseUrl } : found;
|
||||
|
||||
return {
|
||||
model,
|
||||
thinkingLevel: resolveThinkingLevel(modelId),
|
||||
authStorage,
|
||||
modelId,
|
||||
providerId: eff.providerId,
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* Whether a model is in the Fable family. Fable's safety classifiers flag
|
||||
* cybersecurity tasks and route them to Opus 4.8, so a security scan on Fable
|
||||
|
||||
@@ -4,31 +4,36 @@
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* Human-readable console formatting for the agent executor.
|
||||
*
|
||||
* Driven by the pi harness event stream: `turn_end` (assistant text) and
|
||||
* `tool_execution_start` (structured tool calls). Unlike the previous harness —
|
||||
* where tool calls were tool_use JSON embedded in assistant text and had to be
|
||||
* parsed out — pi delivers tool name + args as discrete events, so formatting is
|
||||
* a direct mapping.
|
||||
*/
|
||||
|
||||
import { AGENTS } from '../session-manager.js';
|
||||
import { extractAgentType, formatDuration } from '../utils/formatting.js';
|
||||
import type { ExecutionContext } from './types.js';
|
||||
import type { ExecutionContext, ResultData } from './types.js';
|
||||
|
||||
interface ToolCallInput {
|
||||
url?: string;
|
||||
command?: string;
|
||||
element?: string;
|
||||
key?: string;
|
||||
fields?: unknown[];
|
||||
text?: string;
|
||||
action?: string;
|
||||
description?: string;
|
||||
path?: string;
|
||||
todos?: Array<{ status: string; content: string }>;
|
||||
command?: string;
|
||||
todos?: Array<{
|
||||
status: string;
|
||||
content: string;
|
||||
}>;
|
||||
[key: string]: unknown;
|
||||
}
|
||||
|
||||
/** Agent prefix used to attribute output when parallel agents interleave on one stream. */
|
||||
interface ToolCall {
|
||||
name: string;
|
||||
input?: ToolCallInput;
|
||||
}
|
||||
|
||||
/**
|
||||
* Get agent prefix for parallel execution
|
||||
*/
|
||||
export function getAgentPrefix(description: string): string {
|
||||
// Map agent names to their prefixes
|
||||
const agentPrefixes: Record<string, string> = {
|
||||
'injection-vuln': '[Injection]',
|
||||
'xss-vuln': '[XSS]',
|
||||
@@ -42,6 +47,7 @@ export function getAgentPrefix(description: string): string {
|
||||
'ssrf-exploit': '[SSRF]',
|
||||
};
|
||||
|
||||
// First try to match by agent name directly
|
||||
for (const [agentName, prefix] of Object.entries(agentPrefixes)) {
|
||||
const agent = AGENTS[agentName as keyof typeof AGENTS];
|
||||
if (agent && description.includes(agent.displayName)) {
|
||||
@@ -49,6 +55,7 @@ export function getAgentPrefix(description: string): string {
|
||||
}
|
||||
}
|
||||
|
||||
// Fallback to partial matches for backwards compatibility
|
||||
if (description.includes('injection')) return '[Injection]';
|
||||
if (description.includes('xss')) return '[XSS]';
|
||||
if (description.includes('authz')) return '[Authz]'; // Check authz before auth
|
||||
@@ -58,7 +65,9 @@ export function getAgentPrefix(description: string): string {
|
||||
return '[Agent]';
|
||||
}
|
||||
|
||||
/** Extract domain from URL for display. */
|
||||
/**
|
||||
* Extract domain from URL for display
|
||||
*/
|
||||
function extractDomain(url: string): string {
|
||||
try {
|
||||
const urlObj = new URL(url);
|
||||
@@ -68,8 +77,11 @@ function extractDomain(url: string): string {
|
||||
}
|
||||
}
|
||||
|
||||
/** Format a playwright-cli command (run via the bash tool) into a clean progress indicator. */
|
||||
/**
|
||||
* Format playwright-cli commands into clean progress indicators
|
||||
*/
|
||||
function formatBrowserAction(command: string): string | null {
|
||||
// Extract subcommand after optional session flag (e.g., "playwright-cli -s=session1 navigate https://example.com")
|
||||
const match = command.match(/playwright-cli\s+(?:-s=\S+\s+)?(\S+)(?:\s+(.*))?/);
|
||||
if (!match) return null;
|
||||
|
||||
@@ -139,19 +151,26 @@ function formatBrowserAction(command: string): string | null {
|
||||
}
|
||||
}
|
||||
|
||||
/** Summarize a todo_write update into a clean progress indicator. */
|
||||
/**
|
||||
* Summarize TodoWrite updates into clean progress indicators
|
||||
*/
|
||||
function summarizeTodoUpdate(input: ToolCallInput | undefined): string | null {
|
||||
if (!input?.todos || !Array.isArray(input.todos)) {
|
||||
return null;
|
||||
}
|
||||
|
||||
const todos = input.todos;
|
||||
const recent = todos.filter((t) => t.status === 'completed').at(-1);
|
||||
const completed = todos.filter((t) => t.status === 'completed');
|
||||
const inProgress = todos.filter((t) => t.status === 'in_progress');
|
||||
|
||||
// Show recently completed tasks
|
||||
const recent = completed.at(-1);
|
||||
if (recent) {
|
||||
return `✅ ${recent.content}`;
|
||||
}
|
||||
|
||||
const current = todos.filter((t) => t.status === 'in_progress').at(0);
|
||||
// Show current in-progress task
|
||||
const current = inProgress.at(0);
|
||||
if (current) {
|
||||
return `🔄 ${current.content}`;
|
||||
}
|
||||
@@ -159,6 +178,69 @@ function summarizeTodoUpdate(input: ToolCallInput | undefined): string | null {
|
||||
return null;
|
||||
}
|
||||
|
||||
/**
|
||||
* Filter out JSON tool calls from content, with special handling for Task calls
|
||||
*/
|
||||
export function filterJsonToolCalls(content: string | null | undefined): string {
|
||||
if (!content || typeof content !== 'string') {
|
||||
return content || '';
|
||||
}
|
||||
|
||||
const lines = content.split('\n');
|
||||
const processedLines: string[] = [];
|
||||
|
||||
for (const line of lines) {
|
||||
const trimmed = line.trim();
|
||||
|
||||
// Skip empty lines
|
||||
if (trimmed === '') {
|
||||
continue;
|
||||
}
|
||||
|
||||
// Check if this is a JSON tool call
|
||||
if (trimmed.startsWith('{"type":"tool_use"')) {
|
||||
try {
|
||||
const toolCall = JSON.parse(trimmed) as ToolCall;
|
||||
|
||||
// Special handling for Task tool calls
|
||||
if (toolCall.name === 'Task') {
|
||||
const description = toolCall.input?.description || 'analysis agent';
|
||||
processedLines.push(`🚀 Launching ${description}`);
|
||||
continue;
|
||||
}
|
||||
|
||||
// Special handling for TodoWrite tool calls
|
||||
if (toolCall.name === 'TodoWrite') {
|
||||
const summary = summarizeTodoUpdate(toolCall.input);
|
||||
if (summary) {
|
||||
processedLines.push(summary);
|
||||
}
|
||||
continue;
|
||||
}
|
||||
|
||||
// Special handling for browser tool calls (playwright-cli via Bash)
|
||||
if (toolCall.name === 'Bash') {
|
||||
const command = toolCall.input?.command || '';
|
||||
if (command.includes('playwright-cli')) {
|
||||
const browserAction = formatBrowserAction(command);
|
||||
if (browserAction) {
|
||||
processedLines.push(browserAction);
|
||||
}
|
||||
}
|
||||
}
|
||||
} catch {
|
||||
// If JSON parsing fails, treat as regular text
|
||||
processedLines.push(line);
|
||||
}
|
||||
} else {
|
||||
// Keep non-JSON lines (assistant text)
|
||||
processedLines.push(line);
|
||||
}
|
||||
}
|
||||
|
||||
return processedLines.join('\n');
|
||||
}
|
||||
|
||||
export function detectExecutionContext(description: string): ExecutionContext {
|
||||
const isParallelExecution = description.includes('vuln agent') || description.includes('exploit agent');
|
||||
|
||||
@@ -170,69 +252,62 @@ export function detectExecutionContext(description: string): ExecutionContext {
|
||||
description.includes('exploit agent');
|
||||
|
||||
const agentType = extractAgentType(description);
|
||||
|
||||
const agentKey = description.toLowerCase().replace(/\s+/g, '-');
|
||||
|
||||
return { isParallelExecution, useCleanOutput, agentType, agentKey };
|
||||
}
|
||||
|
||||
/** Format assistant turn text (from a pi `turn_end` event). */
|
||||
export function formatAssistantOutput(
|
||||
text: string,
|
||||
cleanedContent: string,
|
||||
context: ExecutionContext,
|
||||
turnCount: number,
|
||||
description: string,
|
||||
): string[] {
|
||||
if (!text.trim()) {
|
||||
if (!cleanedContent.trim()) {
|
||||
return [];
|
||||
}
|
||||
|
||||
const lines: string[] = [];
|
||||
|
||||
if (context.isParallelExecution) {
|
||||
// Compact, attributed output for interleaved parallel agents.
|
||||
return [`${getAgentPrefix(description)} ${text}`];
|
||||
// Compact output for parallel agents with prefixes
|
||||
const prefix = getAgentPrefix(description);
|
||||
lines.push(`${prefix} ${cleanedContent}`);
|
||||
} else {
|
||||
// Full turn output for sequential agents
|
||||
lines.push(`\n Turn ${turnCount} (${description}):`);
|
||||
lines.push(` ${cleanedContent}`);
|
||||
}
|
||||
// Full turn output for sequential agents.
|
||||
return [`\n Turn ${turnCount} (${description}):`, ` ${text}`];
|
||||
|
||||
return lines;
|
||||
}
|
||||
|
||||
/**
|
||||
* Format a pi `tool_execution_start` event into a clean one-line progress indicator.
|
||||
*
|
||||
* Maps the common tool surfaces — `task` (sub-agent delegation), `todo_write`
|
||||
* (plan updates), `bash` (incl. playwright-cli browser actions), read-only file
|
||||
* tools, and the structured collector/submit tools — to friendly lines. Returns
|
||||
* `[]` when there's nothing worth surfacing (e.g. a todo update with no active item).
|
||||
*/
|
||||
export function formatToolCall(
|
||||
toolName: string,
|
||||
args: Record<string, unknown> | undefined,
|
||||
context: ExecutionContext,
|
||||
description: string,
|
||||
): string[] {
|
||||
const input = (args ?? {}) as ToolCallInput;
|
||||
let line: string | null;
|
||||
export function formatResultOutput(data: ResultData, showFullResult: boolean): string[] {
|
||||
const lines: string[] = [];
|
||||
|
||||
if (toolName === 'task') {
|
||||
line = `🚀 Launching ${input.description ?? 'sub-agent'}`;
|
||||
} else if (toolName === 'todo_write') {
|
||||
line = summarizeTodoUpdate(input);
|
||||
} else if (toolName === 'bash') {
|
||||
const command = typeof input.command === 'string' ? input.command : '';
|
||||
line = command.includes('playwright-cli') ? formatBrowserAction(command) : `💻 ${command.slice(0, 60)}`;
|
||||
} else if (toolName === 'read' || toolName === 'grep' || toolName === 'find' || toolName === 'ls') {
|
||||
const path = typeof input.path === 'string' ? ` ${input.path.slice(0, 60)}` : '';
|
||||
line = `📖 ${toolName}${path}`;
|
||||
} else if (toolName.startsWith('set_') || toolName.startsWith('add_') || toolName.startsWith('submit_')) {
|
||||
line = `📊 ${toolName.replace(/_/g, ' ')}`;
|
||||
} else {
|
||||
line = `🔧 ${toolName}`;
|
||||
lines.push(`\n COMPLETED:`);
|
||||
lines.push(` Duration: ${(data.duration_ms / 1000).toFixed(1)}s, Cost: $${data.cost.toFixed(4)}`);
|
||||
|
||||
if (data.subtype === 'error_max_turns') {
|
||||
lines.push(` Stopped: Hit maximum turns limit`);
|
||||
} else if (data.subtype === 'error_during_execution') {
|
||||
lines.push(` Stopped: Execution error`);
|
||||
}
|
||||
|
||||
if (!line) return [];
|
||||
|
||||
if (context.isParallelExecution) {
|
||||
return [`${getAgentPrefix(description)} ${line}`];
|
||||
if (data.permissionDenials > 0) {
|
||||
lines.push(` ${data.permissionDenials} permission denials`);
|
||||
}
|
||||
return [` ${line}`];
|
||||
|
||||
if (showFullResult && data.result && typeof data.result === 'string') {
|
||||
if (data.result.length > 1000) {
|
||||
lines.push(` ${data.result.slice(0, 1000)}... [${data.result.length} total chars]`);
|
||||
} else {
|
||||
lines.push(` ${data.result}`);
|
||||
}
|
||||
}
|
||||
|
||||
return lines;
|
||||
}
|
||||
|
||||
export function formatErrorOutput(
|
||||
@@ -246,11 +321,12 @@ export function formatErrorOutput(
|
||||
const lines: string[] = [];
|
||||
|
||||
if (context.isParallelExecution) {
|
||||
lines.push(`${getAgentPrefix(description)} Failed (${formatDuration(duration)})`);
|
||||
const prefix = getAgentPrefix(description);
|
||||
lines.push(`${prefix} Failed (${formatDuration(duration)})`);
|
||||
} else if (context.useCleanOutput) {
|
||||
lines.push(`${context.agentType} failed (${formatDuration(duration)})`);
|
||||
} else {
|
||||
lines.push(` pi agent failed: ${description} (${formatDuration(duration)})`);
|
||||
lines.push(` Claude Code failed: ${description} (${formatDuration(duration)})`);
|
||||
}
|
||||
|
||||
lines.push(` Error Type: ${error.constructor.name}`);
|
||||
@@ -276,12 +352,35 @@ export function formatCompletionMessage(
|
||||
duration: number,
|
||||
): string {
|
||||
if (context.isParallelExecution) {
|
||||
return `${getAgentPrefix(description)} Complete (${turnCount} turns, ${formatDuration(duration)})`;
|
||||
const prefix = getAgentPrefix(description);
|
||||
return `${prefix} Complete (${turnCount} turns, ${formatDuration(duration)})`;
|
||||
}
|
||||
|
||||
if (context.useCleanOutput) {
|
||||
return `${context.agentType.charAt(0).toUpperCase() + context.agentType.slice(1)} complete! (${turnCount} turns, ${formatDuration(duration)})`;
|
||||
}
|
||||
|
||||
return ` pi agent completed: ${description} (${turnCount} turns) in ${formatDuration(duration)}`;
|
||||
return ` Claude Code completed: ${description} (${turnCount} turns) in ${formatDuration(duration)}`;
|
||||
}
|
||||
|
||||
export function formatToolUseOutput(toolName: string, input: Record<string, unknown> | undefined): string[] {
  const lines: string[] = [];

  lines.push(`\n Using Tool: ${toolName}`);
  if (input && Object.keys(input).length > 0) {
    lines.push(` Input: ${JSON.stringify(input, null, 2)}`);
  }

  return lines;
}

export function formatToolResultOutput(displayContent: string): string[] {
  const lines: string[] = [];

  lines.push(` Tool Result:`);
  if (displayContent) {
    lines.push(` ${displayContent}`);
  }

  return lines;
}
|
||||
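The formatters in this file switch between a compact, attributed line for parallel agents and a multi-line turn block for sequential agents, and they truncate long results. The following is a minimal, self-contained sketch of that behavior, not the file's actual code: `agentPrefix` is a hypothetical stand-in for `getAgentPrefix` (whose implementation is not shown in this diff), and the string indentation is illustrative.

```typescript
// Hypothetical stand-in for getAgentPrefix(); the real helper is not shown in this diff.
function agentPrefix(description: string): string {
  return `[${description}]`;
}

interface Ctx {
  isParallelExecution: boolean;
}

// Parallel agents get one attributed line; sequential agents get a turn header plus body.
function formatTurn(text: string, ctx: Ctx, turn: number, description: string): string[] {
  if (!text.trim()) return [];
  if (ctx.isParallelExecution) {
    return [`${agentPrefix(description)} ${text}`];
  }
  return [`\n    Turn ${turn} (${description}):`, `    ${text}`];
}

// Long results are cut at a fixed length and annotated with the full size.
function truncateResult(result: string, limit: number = 1000): string {
  return result.length > limit ? `${result.slice(0, limit)}... [${result.length} total chars]` : result;
}
```

Returning `string[]` rather than printing directly keeps the formatters pure, so the caller decides when to pause a progress spinner and write the lines.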
@@ -1,389 +0,0 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Production agent execution on the pi harness, with git checkpoints and audit logging.

import { createRequire } from 'node:module';
import type { AgentMessage } from '@earendil-works/pi-agent-core';
import {
  type AgentSessionEvent,
  createAgentSession,
  DefaultResourceLoader,
  getAgentDir,
  ModelRegistry,
  type ResourceLoader,
  SessionManager,
  SettingsManager,
  type ToolDefinition,
} from '@earendil-works/pi-coding-agent';
import { fs, path } from 'zx';
import type { AuditSession } from '../audit/index.js';
import { BASH_TIMEOUT_EXTENSION_DIR, deliverablesDir, PLAYWRIGHT_SKILL_DIR } from '../paths.js';
import { isRetryableError, PentestError } from '../services/error-handling.js';
import { AGENT_VALIDATORS } from '../session-manager.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import { ErrorCode } from '../types/errors.js';
import { isSpendingCapBehavior, matchesBillingTextPattern } from '../utils/billing-detection.js';
import { formatTimestamp } from '../utils/formatting.js';
import { Timer } from '../utils/metrics.js';
import { createAuditLogger } from './audit-logger.js';
import { type ModelTier, resolveModelSelection } from './models.js';
import {
  detectExecutionContext,
  formatAssistantOutput,
  formatCompletionMessage,
  formatErrorOutput,
  formatToolCall,
} from './output-formatters.js';
import { createProgressManager } from './progress-manager.js';
import { permissionConfigPath } from './settings-writer.js';
import { createGlobTool, createTaskTool, createTodoWriteTool } from './tools.js';

declare global {
  var SHANNON_DISABLE_LOADER: boolean | undefined;
}

/** Built-in pi tools enabled for every agent (custom tool names are appended). */
const BUILTIN_TOOLS = ['read', 'bash', 'edit', 'write', 'grep', 'find', 'ls'];

const requireFromHere = createRequire(import.meta.url);
let cachedExtensionDir: string | null | undefined;

/** Resolve the installed @gotgenes/pi-permission-system package dir, or null. */
function permissionExtensionDir(): string | null {
  if (cachedExtensionDir !== undefined) return cachedExtensionDir;
  try {
    const entry = requireFromHere.resolve('@gotgenes/pi-permission-system');
    cachedExtensionDir = path.dirname(path.dirname(entry));
  } catch {
    cachedExtensionDir = null;
  }
  return cachedExtensionDir;
}

async function buildResourceLoader(cwd: string, logger: ActivityLogger): Promise<ResourceLoader> {
  // Always enforce bounded bash timeouts so an unbounded command cannot hang the agent.
  const additionalExtensionPaths: string[] = [BASH_TIMEOUT_EXTENSION_DIR];
  if (fs.existsSync(permissionConfigPath())) {
    const extDir = permissionExtensionDir();
    if (extDir) {
      additionalExtensionPaths.push(extDir);
    } else {
      logger.warn(
        'code_path deny config present but @gotgenes/pi-permission-system not resolvable — skipping enforcement',
      );
    }
  }

  const loader = new DefaultResourceLoader({
    cwd,
    agentDir: getAgentDir(),
    additionalSkillPaths: [PLAYWRIGHT_SKILL_DIR],
    ...(additionalExtensionPaths.length > 0 && { additionalExtensionPaths }),
  });
  await loader.reload();
  return loader;
}

export interface PiPromptResult {
  result?: string | null | undefined;
  success: boolean;
  duration: number;
  turns?: number | undefined;
  cost: number;
  model?: string | undefined;
  partialCost?: number | undefined;
  apiErrorDetected?: boolean | undefined;
  error?: string | undefined;
  errorType?: string | undefined;
  prompt?: string | undefined;
  retryable?: boolean | undefined;
  structuredOutput?: unknown;
}

function outputLines(lines: string[]): void {
  for (const line of lines) {
    console.log(line);
  }
}

async function writeErrorLog(
  err: Error & { code?: string; status?: number },
  sourceDir: string,
  fullPrompt: string,
  duration: number,
): Promise<void> {
  try {
    const errorLog = {
      timestamp: formatTimestamp(),
      agent: 'pi-executor',
      error: { name: err.constructor.name, message: err.message, code: err.code, status: err.status, stack: err.stack },
      context: { sourceDir, prompt: `${fullPrompt.slice(0, 200)}...`, retryable: isRetryableError(err) },
      duration,
    };
    const logPath = path.join(deliverablesDir(sourceDir), 'error.log');
    await fs.appendFile(logPath, `${JSON.stringify(errorLog)}\n`);
  } catch {
    // Best-effort error log writing - don't propagate failures
  }
}

export async function validateAgentOutput(
  result: PiPromptResult,
  agentName: string | null,
  sourceDir: string,
  logger: ActivityLogger,
): Promise<boolean> {
  logger.info(`Validating ${agentName} agent output`);
  try {
    if (!result.success || (!result.result && result.structuredOutput === undefined)) {
      logger.error('Validation failed: Agent execution was unsuccessful');
      return false;
    }
    const validator = agentName ? AGENT_VALIDATORS[agentName as keyof typeof AGENT_VALIDATORS] : undefined;
    if (!validator) {
      logger.warn(`No validator found for agent "${agentName}" - assuming success`);
      return true;
    }
    logger.info(`Using validator for agent: ${agentName}`, { sourceDir });
    const validationResult = await validator(sourceDir, logger);
    if (validationResult) {
      logger.info('Validation passed: Required files/structure present');
    } else {
      logger.error('Validation failed: Missing required deliverable files');
    }
    return validationResult;
  } catch (error) {
    const errMsg = error instanceof Error ? error.message : String(error);
    logger.error(`Validation failed with error: ${errMsg}`);
    return false;
  }
}

/** Concatenate the text blocks of an assistant message (skips thinking + tool calls). */
function extractAssistantText(message: AgentMessage): string {
  if (message.role !== 'assistant') return '';
  const blocks = message.content as Array<{ type: string; text?: string }>;
  return blocks
    .filter((c) => c.type === 'text')
    .map((c) => c.text ?? '')
    .join('\n');
}

/**
 * Classify error-bearing text into a PentestError, mirroring the prior provider error
 * handling. Spending-cap / billing text is retryable (Temporal backs off and
 * recovers when the cap resets); session limit is permanent.
 */
function classifyErrorText(content: string): PentestError | null {
  if (!content) return null;
  if (matchesBillingTextPattern(content)) {
    return new PentestError(
      `Billing limit reached: ${content.slice(0, 100)}`,
      'billing',
      true,
      {},
      ErrorCode.SPENDING_CAP_REACHED,
    );
  }
  if (content.toLowerCase().includes('session limit reached')) {
    return new PentestError('Session limit reached', 'billing', false);
  }
  return null;
}

// Low-level pi execution. Drives one agent session to completion with progress and
// audit logging. Exported for Temporal activities to call single-attempt execution.
export async function runPiPrompt(
  prompt: string,
  sourceDir: string,
  context: string = '',
  description: string = 'Agent analysis',
  _agentName: string | null = null,
  auditSession: AuditSession | null = null,
  logger: ActivityLogger,
  modelTier: ModelTier = 'medium',
  callerTools?: ToolDefinition[],
  apiKey?: string,
  deliverablesSubdir?: string,
  providerConfig?: import('../types/config.js').ProviderConfig,
): Promise<PiPromptResult> {
  // 1. Initialize timing and prompt
  const timer = new Timer(`agent-${description.toLowerCase().replace(/\s+/g, '-')}`);
  const fullPrompt = context ? `${context}\n\n${prompt}` : prompt;

  // 2. Set up progress and audit infrastructure
  const execContext = detectExecutionContext(description);
  const progress = createProgressManager(
    { description, useCleanOutput: execContext.useCleanOutput },
    global.SHANNON_DISABLE_LOADER ?? false,
  );
  const auditLogger = createAuditLogger(auditSession);

  logger.info(`Running pi agent: ${description}...`);

  // 3. Expose bash-invoked CLI tooling (playwright-cli, save-deliverable) to the
  // environment pi's bash tool inherits. These are constant per container, so
  // setting them on process.env is parallel-safe across this workflow's agents.
  process.env.PLAYWRIGHT_MCP_OUTPUT_DIR = deliverablesSubdir
    ? path.join(sourceDir, path.dirname(deliverablesSubdir), '.playwright-cli')
    : path.join(sourceDir, '.shannon', '.playwright-cli');
  if (deliverablesSubdir) process.env.SHANNON_DELIVERABLES_SUBDIR = deliverablesSubdir;
  if (apiKey) process.env.ANTHROPIC_API_KEY = apiKey;

  // 4. Resolve model + auth, then assemble the tool set (universal task/todo tools
  // plus any caller-supplied collector/submit tools).
  const selection = resolveModelSelection((auth) => ModelRegistry.create(auth), modelTier, apiKey, providerConfig);
  const resourceLoader = await buildResourceLoader(sourceDir, logger);
  // Accumulates cost from in-process `task` child sessions so the parent's reported
  // cost includes sub-agent spend (their getSessionStats is separate from ours).
  const childUsage = { cost: 0 };
  const customTools: ToolDefinition[] = [
    createTaskTool({
      model: selection.model,
      thinkingLevel: selection.thinkingLevel,
      authStorage: selection.authStorage,
      cwd: sourceDir,
      childUsage,
      resourceLoader,
    }),
    createTodoWriteTool(auditLogger),
    createGlobTool(sourceDir),
    ...(callerTools ?? []),
  ];
  // pi's `tools` allowlist gates custom tools too — list every custom name.
  const tools = [...BUILTIN_TOOLS, ...customTools.map((t) => t.name)];

  let turnCount = 0;
  let pendingError: PentestError | null = null;
  let apiErrorDetected = false;

  progress.start();

  try {
    const { session } = await createAgentSession({
      cwd: sourceDir,
      model: selection.model,
      thinkingLevel: selection.thinkingLevel,
      tools,
      customTools,
      authStorage: selection.authStorage,
      sessionManager: SessionManager.inMemory(),
      // Temporal owns retry; pi compaction stays on (no analog previously, guards
      // against context overflow on long agent runs).
      settingsManager: SettingsManager.inMemory({ retry: { enabled: false }, compaction: { enabled: true } }),
      resourceLoader,
    });

    // 5. Map pi events to audit logging + progress + error capture.
    session.subscribe((event: AgentSessionEvent) => {
      switch (event.type) {
        case 'turn_end': {
          turnCount += 1;
          const msg = event.message;
          const text = extractAssistantText(msg);
          if (text.trim()) {
            void auditLogger.logLlmResponse(turnCount, text);
            progress.stop();
            outputLines(formatAssistantOutput(text, execContext, turnCount, description));
            progress.start();
            const billing = classifyErrorText(text);
            if (billing) pendingError = billing;
          }
          if (msg.role === 'assistant' && msg.stopReason === 'error') {
            apiErrorDetected = true;
            pendingError =
              pendingError ??
              classifyErrorText(msg.errorMessage ?? '') ??
              new PentestError(`Agent error: ${(msg.errorMessage ?? 'unknown').slice(0, 200)}`, 'unknown', true);
          }
          break;
        }
        case 'tool_execution_start': {
          void auditLogger.logToolStart(event.toolName, event.args);
          const toolLines = formatToolCall(
            event.toolName,
            event.args as Record<string, unknown>,
            execContext,
            description,
          );
          if (toolLines.length > 0) {
            progress.stop();
            outputLines(toolLines);
            progress.start();
          }
          break;
        }
        case 'tool_execution_end':
          void auditLogger.logToolEnd(event.result);
          break;
        case 'compaction_end':
          if (!event.aborted && !event.willRetry && event.errorMessage) {
            pendingError =
              pendingError ??
              classifyErrorText(event.errorMessage) ??
              new PentestError(`Context compaction failed: ${event.errorMessage.slice(0, 200)}`, 'unknown', true);
          }
          break;
        default:
          break;
      }
    });

    // 6. Run the agent to completion (resolves at agent_end).
    await session.prompt(fullPrompt);
    session.dispose();

    // 7. Surface any error captured during the run.
    if (pendingError) throw pendingError;

    // 8. Read usage/cost and final text.
    const stats = session.getSessionStats();
    const totalCost = stats.cost + childUsage.cost;
    const result = session.getLastAssistantText() ?? null;

    // 9. Defense-in-depth: detect a spending cap that produced an empty/cheap run.
    if (isSpendingCapBehavior(turnCount, totalCost, result || '')) {
      throw new PentestError(
        `Spending cap likely reached (turns=${turnCount}, cost=$0): ${result?.slice(0, 100)}`,
        'billing',
        true,
      );
    }

    const duration = timer.stop();
    progress.finish(formatCompletionMessage(execContext, description, turnCount, duration));

    return {
      result,
      success: true,
      duration,
      turns: turnCount,
      cost: totalCost,
      model: selection.model.id,
      partialCost: totalCost,
      apiErrorDetected,
    };
  } catch (error) {
    // 10. Handle errors — log, write error file, return failure
    const duration = timer.stop();
    const err = error as Error & { code?: string; status?: number };
    await auditLogger.logError(err, duration, turnCount);
    progress.stop();
    outputLines(formatErrorOutput(err, execContext, description, duration, sourceDir, isRetryableError(err)));
    await writeErrorLog(err, sourceDir, fullPrompt, duration);

    return {
      error: err.message,
      errorType: err.constructor.name,
      prompt: `${fullPrompt.slice(0, 100)}...`,
      success: false,
      duration,
      cost: 0,
      retryable: isRetryableError(err),
    };
  }
}
|
||||
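The removed executor resolves a run's error with a fixed precedence: an error already captured earlier in the run wins, then a classification of the provider's error message, then a generic retryable fallback. The sketch below illustrates that precedence in isolation, under stated assumptions: `BILLING_PATTERNS` is a hypothetical placeholder for `matchesBillingTextPattern` (not shown in this diff), and `ClassifiedError` is a plain object standing in for `PentestError`.

```typescript
// Hypothetical patterns; the real matchesBillingTextPattern() is not shown in this diff.
const BILLING_PATTERNS: RegExp[] = [/spending cap/i, /billing limit/i];

// Plain-object stand-in for PentestError (assumption for illustration only).
interface ClassifiedError {
  message: string;
  category: 'billing' | 'unknown';
  retryable: boolean;
}

// Billing text is retryable; "session limit reached" is permanent; anything else is unclassified.
function classify(content: string): ClassifiedError | null {
  if (!content) return null;
  if (BILLING_PATTERNS.some((re) => re.test(content))) {
    return { message: `Billing limit reached: ${content.slice(0, 100)}`, category: 'billing', retryable: true };
  }
  if (content.toLowerCase().includes('session limit reached')) {
    return { message: 'Session limit reached', category: 'billing', retryable: false };
  }
  return null;
}

// Precedence: earlier captured error, then classified provider message, then generic fallback.
function resolveError(pending: ClassifiedError | null, errorMessage: string | undefined): ClassifiedError {
  const fallback: ClassifiedError = {
    message: `Agent error: ${(errorMessage ?? 'unknown').slice(0, 200)}`,
    category: 'unknown',
    retryable: true,
  };
  return pending ?? classify(errorMessage ?? '') ?? fallback;
}
```

Chaining with `??` means a more specific error recorded earlier in the run is never overwritten by a later, vaguer one.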
+182
-126
@@ -5,114 +5,196 @@
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* TypeBox schemas + submit-tool factory for vulnerability exploitation queues.
|
||||
* Zod schema definitions for vulnerability exploitation queue structured outputs.
|
||||
*
|
||||
* pi has no JSON-schema output format, so each vuln agent's structured queue is
|
||||
* captured via a `submit_exploitation_queue` custom tool whose parameters mirror
|
||||
* the per-class schema below. The captured payload is written to
|
||||
* `<class>_exploitation_queue.json` by the caller (agent-execution).
|
||||
* Each vuln agent returns a structured JSON response matching its schema.
|
||||
* The SDK validates the output against the JSON Schema generated from these Zod definitions.
|
||||
*/
|
||||
|
||||
import { defineTool, type ToolDefinition } from '@earendil-works/pi-coding-agent';
|
||||
import { type Static, type TObject, Type } from 'typebox';
|
||||
import type { JsonSchemaOutputFormat } from '@anthropic-ai/claude-agent-sdk';
|
||||
import { z } from 'zod';
|
||||
import type { AgentName } from '../types/agents.js';
|
||||
|
||||
// === Common Fields ===
|
||||
|
||||
const ANALYSIS_NOTES_DESCRIPTION = 'Plain context for defenders (caveats, scope, what is at risk). Not attack steps.';
|
||||
|
||||
const optStr = (description?: string) => Type.Optional(Type.String(description ? { description } : {}));
|
||||
|
||||
/** Base fields shared by every queue entry. `notes` gains guidance in analysis mode. */
|
||||
function baseFields(exploit: boolean) {
|
||||
return {
|
||||
ID: Type.String(),
|
||||
vulnerability_type: Type.String(),
|
||||
externally_exploitable: Type.Boolean(),
|
||||
confidence: Type.String(),
|
||||
notes: exploit ? optStr() : optStr(ANALYSIS_NOTES_DESCRIPTION),
|
||||
};
|
||||
function notesField(exploit: boolean) {
|
||||
const f = z.string().optional();
|
||||
return exploit ? f : f.describe(ANALYSIS_NOTES_DESCRIPTION);
|
||||
}
|
||||
|
||||
const injectionFields = {
|
||||
source: optStr(),
|
||||
combined_sources: optStr(),
|
||||
path: optStr(),
|
||||
sink_call: optStr(),
|
||||
slot_type: optStr(),
|
||||
sanitization_observed: optStr(),
|
||||
concat_occurrences: optStr(),
|
||||
verdict: optStr(),
|
||||
mismatch_reason: optStr(),
|
||||
witness_payload: optStr(),
|
||||
};
|
||||
|
||||
const xssFields = {
|
||||
source: optStr(),
|
||||
source_detail: optStr(),
|
||||
path: optStr(),
|
||||
sink_function: optStr(),
|
||||
render_context: optStr(),
|
||||
encoding_observed: optStr(),
|
||||
verdict: optStr(),
|
||||
mismatch_reason: optStr(),
|
||||
witness_payload: optStr(),
|
||||
};
|
||||
|
||||
const authFields = {
|
||||
source_endpoint: optStr(),
|
||||
vulnerable_code_location: optStr(),
|
||||
missing_defense: optStr(),
|
||||
exploitation_hypothesis: optStr(),
|
||||
suggested_exploit_technique: optStr(),
|
||||
};
|
||||
|
||||
const ssrfFields = {
|
||||
source_endpoint: optStr(),
|
||||
vulnerable_parameter: optStr(),
|
||||
vulnerable_code_location: optStr(),
|
||||
missing_defense: optStr(),
|
||||
exploitation_hypothesis: optStr(),
|
||||
suggested_exploit_technique: optStr(),
|
||||
};
|
||||
|
||||
const authzFields = {
|
||||
endpoint: optStr(),
|
||||
vulnerable_code_location: optStr(),
|
||||
role_context: optStr(),
|
||||
guard_evidence: optStr(),
|
||||
side_effect: optStr(),
|
||||
reason: optStr(),
|
||||
minimal_witness: optStr(),
|
||||
};
|
||||
|
||||
const PER_TYPE_FIELDS: Partial<Record<AgentName, Record<string, ReturnType<typeof optStr>>>> = {
|
||||
'injection-vuln': injectionFields,
|
||||
'xss-vuln': xssFields,
|
||||
'auth-vuln': authFields,
|
||||
'ssrf-vuln': ssrfFields,
|
||||
'authz-vuln': authzFields,
|
||||
};
|
||||
|
||||
/** Build the `{ vulnerabilities: [...] }` queue schema for an agent + mode. */
|
||||
function queueSchema(agentName: AgentName, exploit: boolean): TObject | null {
|
||||
const extra = PER_TYPE_FIELDS[agentName];
|
||||
if (!extra) return null;
|
||||
return Type.Object({
|
||||
vulnerabilities: Type.Array(Type.Object({ ...baseFields(exploit), ...extra })),
|
||||
function makeBase(exploit: boolean) {
|
||||
return z.object({
|
||||
ID: z.string(),
|
||||
vulnerability_type: z.string(),
|
||||
externally_exploitable: z.boolean(),
|
||||
confidence: z.string(),
|
||||
notes: notesField(exploit),
|
||||
});
|
||||
}
|
||||
|
||||
// === Inferred entry types (consumed by renderers) ===
|
||||
export type InjectionFinding = Static<ReturnType<typeof injectionEntry>>;
|
||||
export type XssFinding = Static<ReturnType<typeof xssEntry>>;
|
||||
export type AuthFinding = Static<ReturnType<typeof authEntry>>;
|
||||
export type SsrfFinding = Static<ReturnType<typeof ssrfEntry>>;
|
||||
export type AuthzFinding = Static<ReturnType<typeof authzEntry>>;
|
||||
// === Per-Vuln-Type Schemas (used for type inference; notes description is mode-agnostic for types) ===
|
||||
|
||||
const injectionEntry = () => Type.Object({ ...baseFields(true), ...injectionFields });
|
||||
const xssEntry = () => Type.Object({ ...baseFields(true), ...xssFields });
|
||||
const authEntry = () => Type.Object({ ...baseFields(true), ...authFields });
|
||||
const ssrfEntry = () => Type.Object({ ...baseFields(true), ...ssrfFields });
|
||||
const authzEntry = () => Type.Object({ ...baseFields(true), ...authzFields });
|
||||
const baseVulnerability = makeBase(true);
|
||||
|
||||
const InjectionVulnerability = baseVulnerability.extend({
|
||||
source: z.string().optional(),
|
||||
combined_sources: z.string().optional(),
|
||||
path: z.string().optional(),
|
||||
sink_call: z.string().optional(),
|
||||
slot_type: z.string().optional(),
|
||||
sanitization_observed: z.string().optional(),
|
||||
concat_occurrences: z.string().optional(),
|
||||
verdict: z.string().optional(),
|
||||
mismatch_reason: z.string().optional(),
|
||||
witness_payload: z.string().optional(),
|
||||
});
|
||||
|
||||
const XssVulnerability = baseVulnerability.extend({
|
||||
source: z.string().optional(),
|
||||
source_detail: z.string().optional(),
|
||||
path: z.string().optional(),
|
||||
sink_function: z.string().optional(),
|
||||
render_context: z.string().optional(),
|
||||
encoding_observed: z.string().optional(),
|
||||
verdict: z.string().optional(),
|
||||
mismatch_reason: z.string().optional(),
|
||||
witness_payload: z.string().optional(),
|
||||
});
|
||||
|
||||
const AuthVulnerability = baseVulnerability.extend({
|
||||
source_endpoint: z.string().optional(),
|
||||
vulnerable_code_location: z.string().optional(),
|
||||
missing_defense: z.string().optional(),
|
||||
exploitation_hypothesis: z.string().optional(),
|
||||
suggested_exploit_technique: z.string().optional(),
|
||||
});
|
||||
|
||||
const SsrfVulnerability = baseVulnerability.extend({
|
||||
source_endpoint: z.string().optional(),
|
||||
vulnerable_parameter: z.string().optional(),
|
||||
vulnerable_code_location: z.string().optional(),
|
||||
missing_defense: z.string().optional(),
|
||||
exploitation_hypothesis: z.string().optional(),
|
||||
suggested_exploit_technique: z.string().optional(),
|
||||
});
|
||||
|
||||
const AuthzVulnerability = baseVulnerability.extend({
|
||||
endpoint: z.string().optional(),
|
||||
vulnerable_code_location: z.string().optional(),
|
||||
role_context: z.string().optional(),
|
||||
guard_evidence: z.string().optional(),
|
||||
side_effect: z.string().optional(),
|
||||
reason: z.string().optional(),
|
||||
minimal_witness: z.string().optional(),
|
||||
});
|
||||
|
||||
// === Inferred Entry Types (consumed by renderer) ===
|
||||
|
||||
export type InjectionFinding = z.infer<typeof InjectionVulnerability>;
|
||||
export type XssFinding = z.infer<typeof XssVulnerability>;
|
||||
export type AuthFinding = z.infer<typeof AuthVulnerability>;
|
||||
export type SsrfFinding = z.infer<typeof SsrfVulnerability>;
|
||||
export type AuthzFinding = z.infer<typeof AuthzVulnerability>;
|
||||
|
||||
// === Convert to JSON Schema for SDK ===
|
||||
|
||||
// NOTE: The SDK's AJV validator expects draft-07. Zod defaults to draft-2020-12 which
|
||||
// causes the SDK to silently skip structured output.
|
||||
function toOutputFormat(zodSchema: z.ZodType): JsonSchemaOutputFormat {
|
||||
return { type: 'json_schema', schema: z.toJSONSchema(zodSchema, { target: 'draft-07' }) as Record<string, unknown> };
|
||||
}
|
||||
|
||||
// === Per-Mode Output Format Builders ===
|
||||
// Two maps cached at module load; the only per-mode difference is the
|
||||
// description on the `notes` field, which steers the LLM's writing.
|
||||
|
||||
function buildOutputFormats(exploit: boolean): Partial<Record<AgentName, JsonSchemaOutputFormat>> {
|
||||
const base = makeBase(exploit);
|
||||
return {
|
||||
'injection-vuln': toOutputFormat(
|
||||
z.object({
|
||||
vulnerabilities: z.array(
|
||||
base.extend({
|
||||
source: z.string().optional(),
|
||||
combined_sources: z.string().optional(),
|
||||
path: z.string().optional(),
|
||||
sink_call: z.string().optional(),
|
||||
slot_type: z.string().optional(),
|
||||
sanitization_observed: z.string().optional(),
|
||||
concat_occurrences: z.string().optional(),
|
||||
verdict: z.string().optional(),
|
||||
mismatch_reason: z.string().optional(),
|
||||
witness_payload: z.string().optional(),
|
||||
}),
|
||||
),
|
||||
}),
|
||||
),
|
||||
'xss-vuln': toOutputFormat(
|
||||
z.object({
|
||||
vulnerabilities: z.array(
|
||||
base.extend({
|
||||
source: z.string().optional(),
|
||||
source_detail: z.string().optional(),
|
||||
path: z.string().optional(),
|
||||
sink_function: z.string().optional(),
|
||||
render_context: z.string().optional(),
|
||||
encoding_observed: z.string().optional(),
|
||||
verdict: z.string().optional(),
|
||||
mismatch_reason: z.string().optional(),
|
||||
witness_payload: z.string().optional(),
|
||||
}),
|
||||
),
|
||||
}),
|
||||
),
|
||||
'auth-vuln': toOutputFormat(
|
||||
z.object({
|
||||
vulnerabilities: z.array(
|
||||
base.extend({
|
||||
source_endpoint: z.string().optional(),
|
||||
vulnerable_code_location: z.string().optional(),
|
||||
missing_defense: z.string().optional(),
|
||||
exploitation_hypothesis: z.string().optional(),
|
||||
suggested_exploit_technique: z.string().optional(),
|
||||
}),
|
||||
),
|
||||
}),
|
||||
),
|
||||
'ssrf-vuln': toOutputFormat(
|
||||
z.object({
|
||||
vulnerabilities: z.array(
|
||||
base.extend({
|
||||
source_endpoint: z.string().optional(),
|
||||
vulnerable_parameter: z.string().optional(),
|
||||
vulnerable_code_location: z.string().optional(),
|
||||
missing_defense: z.string().optional(),
|
||||
exploitation_hypothesis: z.string().optional(),
|
||||
suggested_exploit_technique: z.string().optional(),
|
||||
}),
|
||||
),
|
||||
}),
|
||||
),
|
||||
'authz-vuln': toOutputFormat(
|
||||
z.object({
|
||||
vulnerabilities: z.array(
|
||||
base.extend({
|
||||
endpoint: z.string().optional(),
|
||||
vulnerable_code_location: z.string().optional(),
|
||||
role_context: z.string().optional(),
|
||||
guard_evidence: z.string().optional(),
|
||||
side_effect: z.string().optional(),
|
||||
reason: z.string().optional(),
|
||||
minimal_witness: z.string().optional(),
|
||||
}),
|
||||
),
|
||||
}),
|
||||
),
|
||||
};
|
||||
}
|
||||
|
||||
const OUTPUT_FORMATS_EXPLOIT = buildOutputFormats(true);
|
||||
const OUTPUT_FORMATS_ANALYSIS = buildOutputFormats(false);
|
||||
|
||||
const VULN_AGENT_QUEUE_FILENAMES: Partial<Record<AgentName, string>> = {
|
||||
'injection-vuln': 'injection_exploitation_queue.json',
|
||||
@@ -122,38 +204,12 @@ const VULN_AGENT_QUEUE_FILENAMES: Partial<Record<AgentName, string>> = {
|
||||
'authz-vuln': 'authz_exploitation_queue.json',
|
||||
};
|
||||
|
||||
/** Returns the structured output format for a vuln agent, or undefined for non-vuln agents. */
|
||||
export function getOutputFormat(agentName: AgentName, exploit = true): JsonSchemaOutputFormat | undefined {
|
||||
return (exploit ? OUTPUT_FORMATS_EXPLOIT : OUTPUT_FORMATS_ANALYSIS)[agentName];
|
||||
}
|
||||
|
||||
/** Returns the queue filename for a vuln agent, or undefined for non-vuln agents. */
|
||||
export function getQueueFilename(agentName: AgentName): string | undefined {
|
||||
return VULN_AGENT_QUEUE_FILENAMES[agentName];
|
||||
}
|
||||
|
||||
export interface QueueSubmitTool {
|
||||
tool: ToolDefinition;
|
||||
getCaptured: () => unknown;
|
||||
}
|
||||
|
||||
/**
|
||||
* Build the `submit_exploitation_queue` tool for a vuln agent, or null for
|
||||
* non-vuln agents. The agent calls it once with the full findings list; the
|
||||
* captured payload is the structured queue.
|
||||
*/
|
||||
export function createQueueSubmitTool(agentName: AgentName, exploit: boolean): QueueSubmitTool | null {
|
||||
const schema = queueSchema(agentName, exploit);
|
||||
if (!schema) return null;
|
||||
let captured: unknown;
|
||||
const tool = defineTool({
|
||||
name: 'submit_exploitation_queue',
|
||||
label: 'Submit Exploitation Queue',
|
||||
description:
|
||||
'Submit the final structured list of analyzed vulnerabilities for this class. Call exactly once when ' +
|
||||
'analysis is complete, with every finding included.',
|
||||
promptSnippet: 'submit_exploitation_queue: record the final structured findings list (call once)',
|
||||
parameters: schema,
|
||||
execute: async (_toolCallId, params) => {
|
||||
captured = params;
|
||||
const count = (params as { vulnerabilities?: unknown[] }).vulnerabilities?.length ?? 0;
|
||||
return { content: [{ type: 'text' as const, text: `Recorded ${count} findings.` }], details: {} };
|
||||
},
|
||||
});
|
||||
return { tool, getCaptured: () => captured };
|
||||
}
|
||||
|
||||
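The `createQueueSubmitTool` function above relies on a closure to capture the payload of a single tool call so the host can read it back after the agent finishes. The sketch below isolates that capture pattern without the harness. The `ToolResult` and `CaptureTool` interfaces and the `createCaptureTool` name are hypothetical stand-ins for illustration; they are not the real `ToolDefinition` or `defineTool` API.

```typescript
// Hypothetical stand-ins for the harness types; not the real ToolDefinition.
interface ToolResult {
  content: { type: 'text'; text: string }[];
  details: Record<string, unknown>;
}

interface CaptureTool {
  execute: (toolCallId: string, params: unknown) => Promise<ToolResult>;
  getCaptured: () => unknown;
}

function createCaptureTool(): CaptureTool {
  // Stays undefined until the tool is called at least once.
  let captured: unknown;
  return {
    execute: async (_toolCallId, params) => {
      // Store the raw payload; a later call overwrites an earlier one.
      captured = params;
      const count = (params as { vulnerabilities?: unknown[] }).vulnerabilities?.length ?? 0;
      return { content: [{ type: 'text', text: `Recorded ${count} findings.` }], details: {} };
    },
    getCaptured: () => captured,
  };
}

async function demo(): Promise<void> {
  const tool = createCaptureTool();
  // The ID below is an illustrative placeholder.
  const result = await tool.execute('call-1', { vulnerabilities: [{ ID: 'INJ-VULN-01' }] });
  console.log(result.content[0]?.text);
  console.log(tool.getCaptured());
}

void demo();
```

Because `captured` is simply overwritten, a second call replaces the first payload. That is presumably why the tool description asks the agent to call it exactly once with every finding included.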
@@ -5,71 +5,37 @@
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* Writes the @gotgenes/pi-permission-system global config from `code_path` avoid
|
||||
* patterns. The executor loads the extension (see pi-executor) and pi enforces
|
||||
* these path denies at the tool layer for every agent. Written to the global config
|
||||
* dir under `agentDir` — the project-scoped path is gated behind project trust,
|
||||
* which our headless runs do not grant; the global path is not.
|
||||
* Writes ~/.claude/settings.json with permissions.deny rules derived from
|
||||
* `code_path` avoid patterns. The SDK reads this via `settingSources: ['user']`;
|
||||
* deny rules fire even in `bypassPermissions` mode.
|
||||
*/
|
||||
|
||||
import { getAgentDir } from '@earendil-works/pi-coding-agent';
|
||||
import os from 'node:os';
|
||||
import { fs, path } from 'zx';
|
||||
import type { DistributedConfig } from '../types/config.js';
|
||||
|
||||
/** Absolute path to the pi-permission-system global config.json. */
|
||||
export function permissionConfigPath(): string {
|
||||
return path.join(getAgentDir(), 'extensions', 'pi-permission-system', 'config.json');
|
||||
const FILE_TOOLS = ['Read', 'Edit'] as const;
|
||||
|
||||
function denyEntriesFor(pattern: string): string[] {
|
||||
const arg = `./${pattern.replace(/^[./]+/, '')}`;
|
||||
return FILE_TOOLS.map((tool) => `${tool}(${arg})`);
|
||||
}
|
||||
|
||||
/**
|
||||
* Write (or remove) the pi-permission-system config derived from `code_path`
|
||||
* avoid patterns.
|
||||
*
|
||||
* Each avoid maps to a cross-cutting `path` deny — the strongest surface, blocking
|
||||
* the path across every tool and bash command, and not overridable by a per-tool
|
||||
* allow. `"*": "allow"` keeps everything else permitted so the extension does not
|
||||
* fall back to its default `ask` (which would block all access headlessly). When
|
||||
* there are no avoids the config is removed, so the executor skips loading the
|
||||
* extension entirely.
|
||||
*/
|
||||
export async function writeCodePathPermissionConfig(config: DistributedConfig | null): Promise<void> {
|
||||
export async function writeUserSettingsForCodePathAvoids(config: DistributedConfig | null): Promise<void> {
|
||||
const avoidPatterns = (config?.avoid ?? []).filter((r) => r.type === 'code_path').map((r) => r.value);
|
||||
const configPath = permissionConfigPath();
|
||||
const settingsPath = path.join(os.homedir(), '.claude', 'settings.json');
|
||||
|
||||
if (avoidPatterns.length === 0) {
|
||||
await fs.remove(configPath);
|
||||
await fs.remove(settingsPath);
|
||||
return;
|
||||
}
|
||||
|
||||
// pi's matcher (wildcard-matcher.ts) has NO `**` globstar — it splits on each `*`
|
||||
// and joins with `.*`, and a single `*` already matches any chars incl. `/`. Tool
|
||||
// paths are compared as absolute (path-utils resolves them against cwd), so we
|
||||
// collapse `**`→`*` and add a `*/`-prefixed variant that matches the path under
|
||||
// any repo prefix. (A bare pattern never matches an absolute path.)
|
||||
const pathDeny: Record<string, 'allow' | 'deny'> = { '*': 'allow' };
|
||||
for (const pattern of avoidPatterns) {
|
||||
const clean = pattern.replace(/^[./]+/, '').replace(/\*\*/g, '*');
|
||||
// Deny the contents (under any repo prefix and as written)...
|
||||
pathDeny[`*/${clean}`] = 'deny';
|
||||
pathDeny[clean] = 'deny';
|
||||
// ...and the folder path itself, so the directory entry is denied too — the
|
||||
// contents patterns (…/*) require a trailing segment and wouldn't match it.
|
||||
if (clean.endsWith('/*')) {
|
||||
const folder = clean.slice(0, -2);
|
||||
if (folder) {
|
||||
pathDeny[`*/${folder}`] = 'deny';
|
||||
pathDeny[folder] = 'deny';
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
const permissionConfig = {
|
||||
permission: {
|
||||
'*': 'allow',
|
||||
path: pathDeny,
|
||||
const settings = {
|
||||
permissions: {
|
||||
deny: avoidPatterns.flatMap(denyEntriesFor),
|
||||
},
|
||||
};
|
||||
|
||||
await fs.ensureDir(path.dirname(configPath));
|
||||
await fs.writeJson(configPath, permissionConfig, { spaces: 2 });
|
||||
await fs.ensureDir(path.dirname(settingsPath));
|
||||
await fs.writeJson(settingsPath, settings, { spaces: 2 });
|
||||
}
|
||||
|
||||
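To make the `settings.json` variant above concrete, the sketch below reproduces `denyEntriesFor` as it appears in the diff and prints the object it would produce. The two avoid patterns are hypothetical examples; nothing here writes to disk.

```typescript
// Mirrors the helper shown in the diff; the avoid patterns are made up.
const FILE_TOOLS = ['Read', 'Edit'] as const;

function denyEntriesFor(pattern: string): string[] {
  // Strip any leading run of '.' and '/' characters, then re-anchor at './'.
  const arg = `./${pattern.replace(/^[./]+/, '')}`;
  // One deny rule per file tool, e.g. "Read(./path)" and "Edit(./path)".
  return FILE_TOOLS.map((tool) => `${tool}(${arg})`);
}

// Hypothetical avoid patterns, for illustration only.
const avoidPatterns = ['src/secrets/**', './config/prod.yaml'];

const settings = {
  permissions: {
    deny: avoidPatterns.flatMap(denyEntriesFor),
  },
};

console.log(JSON.stringify(settings, null, 2));
```

One property of the regex is worth knowing when writing avoid rules: `/^[./]+/` removes every leading dot and slash, so a dotfile pattern such as `.env` is rewritten to `./env` rather than `./.env`. Whether that matters depends on how avoid patterns are written in practice.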
@@ -1,205 +0,0 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* Universal custom tools registered for every agent: `task`, `todo_write`, and `glob`.
|
||||
*
|
||||
* These replace harness built-ins that pi does not ship. `task` delegates a focused
|
||||
* sub-task to an in-process child session (the Task sub-agent replacement);
|
||||
* `todo_write` is a full-state-replace planning scratchpad mirrored to the workflow
|
||||
* log; `glob` is fast-glob file matching (pi has no `Glob` built-in).
|
||||
*/
|
||||
|
||||
import type { ThinkingLevel } from '@earendil-works/pi-agent-core';
|
||||
import type { Api, Model } from '@earendil-works/pi-ai';
|
||||
import {
|
||||
type AuthStorage,
|
||||
createAgentSession,
|
||||
defineTool,
|
||||
type ResourceLoader,
|
||||
SessionManager,
|
||||
SettingsManager,
|
||||
type ToolDefinition,
|
||||
} from '@earendil-works/pi-coding-agent';
|
||||
import { Type } from 'typebox';
|
||||
import { fs, glob, path } from 'zx';
|
||||
import type { AuditLogger } from './audit-logger.js';
|
||||
|
||||
/** Tool surface for child sessions: read/search plus `write`+`bash` to author and run scripts. */
|
||||
const CHILD_TOOLS = ['read', 'grep', 'find', 'ls', 'write', 'bash'];
|
||||
|
||||
export interface TaskToolContext {
|
||||
model: Model<Api>;
|
||||
thinkingLevel: ThinkingLevel;
|
||||
authStorage: AuthStorage;
|
||||
cwd: string;
|
||||
/** When set, child sessions inherit the code_path deny policy. */
|
||||
resourceLoader?: ResourceLoader;
|
||||
/**
|
||||
* Mutable accumulator: each child (sub-agent) session's cost is added here so the
|
||||
* parent executor can include sub-agent spend in its reported cost. Child sessions
|
||||
* keep their own `getSessionStats`, separate from the parent's.
|
||||
*/
|
||||
childUsage?: { cost: number };
|
||||
}
|
||||
|
||||
/**
|
||||
* The `task` tool — launch a new agent to handle a multi-step task autonomously.
|
||||
*
|
||||
* Spawns an in-process child session, drives it to completion, and returns its
|
||||
* final text. Marked `parallel` for one-turn fan-out. Children get no `task` of
|
||||
* their own — delegation is one level.
|
||||
*/
|
||||
export function createTaskTool(ctx: TaskToolContext): ToolDefinition {
|
||||
return defineTool({
|
||||
name: 'task',
|
||||
label: 'Task',
|
||||
description:
|
||||
'Launch a new agent to handle complex, multi-step tasks autonomously. The agent runs on its own and ' +
|
||||
'its final report is returned to you as the tool result (it is not shown to the user). Each invocation ' +
|
||||
'is stateless — you cannot send follow-up messages, so give a complete, detailed instruction in a single ' +
|
||||
'prompt and specify exactly what information the agent should return. Launch multiple agents concurrently ' +
|
||||
'by issuing multiple task calls in a single message.',
|
||||
promptSnippet: 'task: launch a new agent to handle a multi-step task',
|
||||
executionMode: 'parallel',
|
||||
parameters: Type.Object({
|
||||
description: Type.Optional(Type.String({ description: 'Short (3-5 word) label for the delegated sub-task.' })),
|
||||
prompt: Type.String({ description: 'The full instruction for the sub-agent.' }),
|
||||
}),
|
||||
execute: async (_toolCallId, params) => {
|
||||
const { session: child } = await createAgentSession({
|
||||
cwd: ctx.cwd,
|
||||
model: ctx.model,
|
||||
thinkingLevel: ctx.thinkingLevel,
|
||||
tools: CHILD_TOOLS,
|
||||
authStorage: ctx.authStorage,
|
||||
sessionManager: SessionManager.inMemory(),
|
||||
settingsManager: SettingsManager.inMemory({
|
||||
retry: { enabled: false },
|
||||
compaction: { enabled: true },
|
||||
}),
|
||||
...(ctx.resourceLoader && { resourceLoader: ctx.resourceLoader }),
|
||||
});
|
||||
try {
|
||||
await child.prompt(params.prompt);
|
||||
const text = child.getLastAssistantText() ?? '(sub-agent produced no output)';
|
||||
return { content: [{ type: 'text' as const, text }], details: {} };
|
||||
} finally {
|
||||
// Roll the child's cost up to the parent before disposing (best-effort, and
|
||||
// captured in `finally` so a failed child's partial spend still counts).
|
||||
if (ctx.childUsage) {
|
||||
try {
|
||||
ctx.childUsage.cost += child.getSessionStats().cost;
|
||||
} catch {
|
||||
// ignore — cost capture is best-effort
|
||||
}
|
||||
}
|
||||
child.dispose();
|
||||
}
|
||||
},
|
||||
});
|
||||
}
|
||||
|
||||
export interface TodoItem {
|
||||
content: string;
|
||||
status: 'pending' | 'in_progress' | 'completed';
|
||||
activeForm: string;
|
||||
}
|
||||
|
||||
/** Render a todo list as a compact checklist for the workflow log. */
|
||||
function renderTodos(todos: readonly TodoItem[]): string {
|
||||
const mark = (s: TodoItem['status']): string => (s === 'completed' ? 'x' : s === 'in_progress' ? '~' : ' ');
|
||||
return todos.map((t) => `[${mark(t.status)}] ${t.content}`).join(' ');
|
||||
}
|
||||
|
||||
/**
|
||||
* The `todo_write` tool — a full-state-replace planning scratchpad.
|
||||
*
|
||||
* Mirrors the TodoWrite tool: each call carries the entire list and replaces
|
||||
* stored state (no append/merge). No deliverable impact; every call is echoed to
|
||||
* the workflow log so `shannon logs` shows the agent's live plan. State is per
|
||||
* tool instance (one per agent execution).
|
||||
*/
|
||||
export function createTodoWriteTool(auditLogger: AuditLogger): ToolDefinition {
|
||||
let current: TodoItem[] = [];
|
||||
return defineTool({
|
||||
name: 'todo_write',
|
||||
label: 'Todo Write',
|
||||
description:
|
||||
'Use this tool to create and manage a structured task list for your current session. This helps you ' +
|
||||
'track progress and organize complex, multi-step work, and gives visibility into what you are doing. ' +
|
||||
'Pass the COMPLETE todo list on every call — it replaces the stored list entirely (no append or merge). ' +
|
||||
'Each todo has a status of pending, in_progress, or completed; keep exactly one task in_progress at a ' +
|
||||
'time and mark a task completed as soon as it is finished.',
|
||||
promptSnippet: 'todo_write: create and manage a structured task list',
|
||||
parameters: Type.Object({
|
||||
todos: Type.Array(
|
||||
Type.Object({
|
||||
content: Type.String({ description: 'Imperative task description, e.g. "Map SSRF sinks".' }),
|
||||
status: Type.Union([Type.Literal('pending'), Type.Literal('in_progress'), Type.Literal('completed')]),
|
||||
activeForm: Type.String({ description: 'Present-continuous form, e.g. "Mapping SSRF sinks".' }),
|
||||
}),
|
||||
),
|
||||
}),
|
||||
execute: async (_toolCallId, params) => {
|
||||
current = params.todos as TodoItem[];
|
||||
const completed = current.filter((t) => t.status === 'completed').length;
|
||||
await auditLogger.logNote('todo', renderTodos(current));
|
||||
return {
|
||||
content: [{ type: 'text' as const, text: `Todos updated (${current.length} items, ${completed} completed).` }],
|
||||
details: {},
|
||||
};
|
||||
},
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* The `glob` tool — fast file pattern matching (pi ships no `Glob` built-in).
|
||||
*
|
||||
* Backed by the same fast-glob engine that classifies code_path rules as `[GLOB]`
|
||||
* (see utils/glob.ts `isGlobPattern`), so it enumerates exactly the patterns the
|
||||
* routing tags as globs — including `**` and `{a,b}`, which pi's `find` would not
|
||||
* match the same way. Returns absolute paths, most-recently-modified first.
|
||||
*/
|
||||
export function createGlobTool(cwd: string): ToolDefinition {
|
||||
return defineTool({
|
||||
name: 'glob',
|
||||
label: 'Glob',
|
||||
description:
|
||||
'Fast file pattern matching. Supports glob patterns like "**/*.ts" or "src/**/*.{js,ts}". Returns ' +
|
||||
'matching file paths sorted by modification time (most recent first), one per line, or "No files found".',
|
||||
promptSnippet: 'glob: find files by name pattern',
|
||||
parameters: Type.Object({
|
||||
pattern: Type.String({ description: 'The glob pattern to match files against.' }),
|
||||
path: Type.Optional(Type.String({ description: 'Directory to search in. Omit to search the repository root.' })),
|
||||
}),
|
||||
execute: async (_toolCallId, params) => {
|
||||
const searchRoot = params.path ? path.resolve(cwd, params.path) : cwd;
|
||||
const matches = await glob.globby(params.pattern, {
|
||||
cwd: searchRoot,
|
||||
absolute: true,
|
||||
dot: true,
|
||||
onlyFiles: true,
|
||||
followSymbolicLinks: false,
|
||||
});
|
||||
if (matches.length === 0) {
|
||||
return { content: [{ type: 'text' as const, text: 'No files found' }], details: {} };
|
||||
}
|
||||
// Sort by mtime (most recent first) to match the canonical Glob contract.
|
||||
const withMtime = await Promise.all(
|
||||
matches.map(async (file) => {
|
||||
try {
|
||||
return { file, mtime: (await fs.stat(file)).mtimeMs };
|
||||
} catch {
|
||||
return { file, mtime: 0 };
|
||||
}
|
||||
}),
|
||||
);
|
||||
withMtime.sort((a, b) => b.mtime - a.mtime);
|
||||
return { content: [{ type: 'text' as const, text: withMtime.map((m) => m.file).join('\n') }], details: {} };
|
||||
},
|
||||
});
|
||||
}
|
||||
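The `todo_write` tool above is described as full-state-replace: each call carries the whole list and overwrites what was stored. A minimal sketch of that behaviour follows. `TodoItem` and `renderTodos` are copied from the diff. `createTodoStore` is a hypothetical stand-in for the tool's closure state and omits the audit logger.

```typescript
interface TodoItem {
  content: string;
  status: 'pending' | 'in_progress' | 'completed';
  activeForm: string;
}

// Same rendering as in the diff: one "[mark] content" entry per todo.
function renderTodos(todos: readonly TodoItem[]): string {
  const mark = (s: TodoItem['status']): string => (s === 'completed' ? 'x' : s === 'in_progress' ? '~' : ' ');
  return todos.map((t) => `[${mark(t.status)}] ${t.content}`).join(' ');
}

// Hypothetical in-memory store standing in for the tool instance.
function createTodoStore() {
  let current: TodoItem[] = [];
  return {
    write(todos: TodoItem[]): string {
      current = todos; // replace the stored list; never append or merge
      const completed = current.filter((t) => t.status === 'completed').length;
      return `Todos updated (${current.length} items, ${completed} completed).`;
    },
    render: (): string => renderTodos(current),
  };
}

const store = createTodoStore();
store.write([{ content: 'Map SSRF sinks', status: 'in_progress', activeForm: 'Mapping SSRF sinks' }]);
// The second call drops the first list entirely.
console.log(store.write([{ content: 'Write report', status: 'completed', activeForm: 'Writing report' }]));
console.log(store.render());
```

Replacing rather than merging keeps the tool stateless from the agent's point of view: the most recent call is always the complete plan, so there is no ordering or identity problem between entries.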
@@ -4,7 +4,9 @@
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
// Shared display/formatting types for the agent executor output layer.
|
||||
// Type definitions for Claude executor message processing pipeline
|
||||
|
||||
import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
|
||||
|
||||
export interface ExecutionContext {
|
||||
isParallelExecution: boolean;
|
||||
@@ -12,3 +14,99 @@ export interface ExecutionContext {
|
||||
agentType: string;
|
||||
agentKey: string;
|
||||
}
|
||||
|
||||
export interface AssistantResult {
|
||||
content: string;
|
||||
cleanedContent: string;
|
||||
apiErrorDetected: boolean;
|
||||
shouldThrow?: Error;
|
||||
logData: {
|
||||
turn: number;
|
||||
content: string;
|
||||
timestamp: string;
|
||||
};
|
||||
}
|
||||
|
||||
export interface ResultData {
|
||||
result: string | null;
|
||||
cost: number;
|
||||
duration_ms: number;
|
||||
subtype?: string;
|
||||
stop_reason?: string | null;
|
||||
permissionDenials: number;
|
||||
structuredOutput?: unknown;
|
||||
}
|
||||
|
||||
export interface ToolUseData {
|
||||
toolName: string;
|
||||
parameters: Record<string, unknown>;
|
||||
timestamp: string;
|
||||
}
|
||||
|
||||
export interface ToolResultData {
|
||||
content: unknown;
|
||||
displayContent: string;
|
||||
timestamp: string;
|
||||
}
|
||||
|
||||
export interface ContentBlock {
|
||||
type?: string;
|
||||
text?: string;
|
||||
thinking?: string;
|
||||
data?: string;
|
||||
}
|
||||
|
||||
export interface AssistantMessage {
|
||||
type: 'assistant';
|
||||
error?: SDKAssistantMessageError;
|
||||
message: {
|
||||
content: ContentBlock[] | string;
|
||||
};
|
||||
}
|
||||
|
||||
export interface ResultMessage {
|
||||
type: 'result';
|
||||
result?: string;
|
||||
total_cost_usd?: number;
|
||||
duration_ms?: number;
|
||||
subtype?: string;
|
||||
stop_reason?: string | null;
|
||||
permission_denials?: unknown[];
|
||||
structured_output?: unknown;
|
||||
}
|
||||
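`ResultMessage` uses the wire format's snake_case optional fields, while `ResultData` is the normalised shape with required `cost`, `duration_ms` and `permissionDenials`. The conversion is not shown in this part of the diff, so the sketch below is one plausible mapping under stated assumptions: the `toResultData` name and every default (`null`, `0`) are hypothetical choices, not the project's confirmed behaviour.

```typescript
interface ResultMessage {
  type: 'result';
  result?: string;
  total_cost_usd?: number;
  duration_ms?: number;
  subtype?: string;
  stop_reason?: string | null;
  permission_denials?: unknown[];
  structured_output?: unknown;
}

interface ResultData {
  result: string | null;
  cost: number;
  duration_ms: number;
  subtype?: string;
  stop_reason?: string | null;
  permissionDenials: number;
  structuredOutput?: unknown;
}

// Hypothetical mapper; the defaults are assumptions for illustration.
function toResultData(msg: ResultMessage): ResultData {
  return {
    result: msg.result ?? null,
    cost: msg.total_cost_usd ?? 0,
    duration_ms: msg.duration_ms ?? 0,
    // Only the number of denials is kept, not their contents.
    permissionDenials: msg.permission_denials?.length ?? 0,
    // Copy optional fields only when present so absent keys stay absent.
    ...(msg.subtype !== undefined ? { subtype: msg.subtype } : {}),
    ...(msg.stop_reason !== undefined ? { stop_reason: msg.stop_reason } : {}),
    ...(msg.structured_output !== undefined ? { structuredOutput: msg.structured_output } : {}),
  };
}

console.log(toResultData({ type: 'result', total_cost_usd: 0.12, permission_denials: [{}] }));
```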
|
||||
export interface ToolUseMessage {
|
||||
type: 'tool_use';
|
||||
name: string;
|
||||
input?: Record<string, unknown>;
|
||||
}
|
||||
|
||||
export interface ToolResultMessage {
|
||||
type: 'tool_result';
|
||||
content?: unknown;
|
||||
}
|
||||
|
||||
export interface ApiErrorDetection {
|
||||
detected: boolean;
|
||||
shouldThrow?: Error;
|
||||
}
|
||||
|
||||
export interface SystemInitMessage {
|
||||
type: 'system';
|
||||
subtype: 'init';
|
||||
model?: string;
|
||||
permissionMode?: string;
|
||||
}
|
||||
|
||||
/** Emitted when a model refuses a request and the SDK falls back to another model (e.g. Fable 5 routing cybersecurity tasks to Opus 4.8). */
|
||||
export interface ModelRefusalFallbackMessage {
|
||||
type: 'system';
|
||||
subtype: 'model_refusal_fallback';
|
||||
original_model: string;
|
||||
fallback_model: string;
|
||||
api_refusal_category?: string | null;
|
||||
}
|
||||
|
||||
export interface UserMessage {
|
||||
type: 'user';
|
||||
}
|
||||
|
||||
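Both `SystemInitMessage` and `ModelRefusalFallbackMessage` carry `type: 'system'`, so a consumer has to discriminate on `subtype`. The sketch below shows how TypeScript narrows that union in a `switch`. The two interfaces are copied from the diff. The `SystemMessage` alias, the `describeSystemMessage` function and the model strings are hypothetical.

```typescript
interface SystemInitMessage {
  type: 'system';
  subtype: 'init';
  model?: string;
  permissionMode?: string;
}

interface ModelRefusalFallbackMessage {
  type: 'system';
  subtype: 'model_refusal_fallback';
  original_model: string;
  fallback_model: string;
  api_refusal_category?: string | null;
}

// Hypothetical union alias for illustration.
type SystemMessage = SystemInitMessage | ModelRefusalFallbackMessage;

function describeSystemMessage(msg: SystemMessage): string {
  // `subtype` is the discriminant; each case sees only its own fields.
  switch (msg.subtype) {
    case 'init':
      return `init: model=${msg.model ?? 'unknown'}`;
    case 'model_refusal_fallback':
      return `fallback: ${msg.original_model} -> ${msg.fallback_model}`;
  }
}

// Placeholder model names, not real identifiers.
console.log(
  describeSystemMessage({
    type: 'system',
    subtype: 'model_refusal_fallback',
    original_model: 'model-a',
    fallback_model: 'model-b',
  }),
);
```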
@@ -12,7 +12,7 @@
|
||||
*/
|
||||
|
||||
import fs from 'node:fs/promises';
|
||||
import { isFableModel, resolveModelId } from '../ai/models.js';
|
||||
import { isFableModel, resolveModel } from '../ai/models.js';
|
||||
import { formatDuration, formatTimestamp } from '../utils/formatting.js';
|
||||
import { LogStream } from './log-stream.js';
|
||||
import { generateWorkflowLogPath, type SessionMetadata } from './utils.js';
|
||||
@@ -90,7 +90,7 @@ export class WorkflowLogger {
|
||||
// Surface Fable usage: its safety classifiers route cybersecurity tasks to
|
||||
// Opus 4.8, so those phases run on Opus 4.8 regardless of the tier setting.
|
||||
const fableTiers = (['small', 'medium', 'large'] as const)
|
||||
.map((tier) => ({ tier, model: resolveModelId(tier) }))
|
||||
.map((tier) => ({ tier, model: resolveModel(tier) }))
|
||||
.filter(({ model }) => isFableModel(model));
|
||||
if (fableTiers.length > 0) {
|
||||
const tierList = fableTiers.map(({ tier, model }) => `${tier} (${model})`).join(', ');
|
||||
|
||||
@@ -5,10 +5,10 @@
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* Exploit Collector tool factory (parameterized by vulnerability class and
|
||||
* per-run valid-ID set).
|
||||
* Exploit Collector MCP Server (factory parameterized by vulnerability class
|
||||
* and per-run valid-ID set).
|
||||
*
|
||||
* Exposes a single TypeBox-validated tool `add_exploit`, called once per
|
||||
* Exposes a single Zod-validated MCP tool `add_exploit`, called once per
|
||||
* processed vulnerability by the 5 exploit-* agents (injection, xss, auth,
|
||||
* ssrf, authz). After the agent terminates, the host harvests
|
||||
* collector.getAll() and runs exploit-renderer to produce
|
||||
@@ -16,28 +16,29 @@
|
||||
* output.
|
||||
*
|
||||
* Schema shape:
|
||||
* - The visible parameter schema is a single Type.Object with common fields
|
||||
* required, status as a string union, and per-status fields marked optional
|
||||
* at the tool layer (TypeBox cannot express a top-level discriminated union
|
||||
* as the flat tool parameters). Each field's `description` text explains
|
||||
* when it applies.
|
||||
* - The SDK tool() helper consumes a ZodRawShape (flat object), not a
|
||||
* top-level discriminated union. The visible shape is therefore a single
|
||||
* z.object with common fields required, status as a string enum, and
|
||||
* per-status fields marked optional at the SDK layer. Each field's
|
||||
* `.describe()` text explains when it applies.
|
||||
* - True per-status field enforcement runs inside the tool handler via a
|
||||
* Type.Union([exploited, blocked]) re-validation using the TypeBox `Value`
|
||||
* API. Missing-field errors come back to the agent as structured issues
|
||||
* with retryable=true so it can fix and retry the call.
|
||||
* z.discriminatedUnion('status', ...). Missing-field errors come back to
|
||||
* the agent as structured Zod issues with retryable=true so it can fix
|
||||
* and retry the call.
|
||||
*
|
||||
* Strict queue-ID validation: vulnerability_id is checked against the per-run
|
||||
* queue's known IDs in the handler. Hallucinated or typo'd IDs are rejected
|
||||
* with a structured error that includes the valid-ID list, letting the agent
|
||||
* recover locally.
|
||||
* Strict queue-ID validation: vulnerability_id is refined against the per-run
|
||||
* queue's known IDs at schema-build time. Hallucinated or typo'd IDs are
|
||||
* rejected with a structured Zod error that includes the valid-ID list,
|
||||
* letting the agent recover locally.
|
||||
*
|
||||
* Each field's description carries the bullet labels and reproducibility
|
||||
* guidance, so the harness injects it into the agent's tool catalog.
|
||||
* Each Zod schema's field-level descriptions carry the bullet labels and
|
||||
* reproducibility guidance, so the SDK injects it into the agent's tool
|
||||
* catalog.
|
||||
*/
|
||||
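The doc comment above describes a two-layer scheme: the visible parameter shape is flat with per-status fields optional, and the handler re-validates so that the fields required for the chosen `status` are actually present and the ID belongs to this run's queue. The sketch below illustrates that idea without any schema library, as a dependency-free stand-in for `z.discriminatedUnion('status', ...)` or the TypeBox `Value` check. Apart from `vulnerability_id` and `status`, the key names are hypothetical, and so are the `ValidationError` shape and the `validateAddExploit` name. The diff shows the field schemas but not the final object keys.

```typescript
type Status = 'exploited' | 'blocked';

// Hypothetical key names, inferred from the "REQUIRED when status=..." descriptions.
const REQUIRED_BY_STATUS: Record<Status, readonly string[]> = {
  exploited: ['severity', 'impact', 'exploitation_steps', 'proof_of_impact'],
  blocked: ['confidence', 'current_blocker', 'potential_impact', 'evidence_of_vulnerability'],
};

// Hypothetical error shape; only "retryable=true" comes from the source comment.
interface ValidationError {
  retryable: true;
  issues: { path: string; message: string }[];
}

function validateAddExploit(
  input: Record<string, unknown>,
  validIds: ReadonlySet<string>,
): ValidationError | null {
  const issues: ValidationError['issues'] = [];

  // Strict queue-ID check: reject IDs that are not in this run's queue.
  const id = input['vulnerability_id'];
  if (typeof id !== 'string' || !validIds.has(id)) {
    const known = Array.from(validIds).join(', ');
    issues.push({ path: 'vulnerability_id', message: `Not in this run's queue. Valid IDs: ${known}` });
  }

  // Per-status enforcement: the flat shape marks these optional, so check here.
  const status = input['status'];
  if (status !== 'exploited' && status !== 'blocked') {
    issues.push({ path: 'status', message: 'Must be "exploited" or "blocked".' });
  } else {
    for (const key of REQUIRED_BY_STATUS[status]) {
      const value = input[key];
      if (value === undefined || value === null || value === '') {
        issues.push({ path: key, message: `Required when status="${status}".` });
      }
    }
  }

  return issues.length > 0 ? { retryable: true, issues } : null;
}

const ids = new Set(['INJ-VULN-01']);
console.log(validateAddExploit({ vulnerability_id: 'INJ-VULN-02', status: 'blocked' }, ids));
```

Returning all issues at once, rather than failing on the first, lets the agent correct every missing field in a single retry.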
|
||||
import { defineTool, type ToolDefinition } from '@earendil-works/pi-coding-agent';
|
||||
import { type Static, Type } from 'typebox';
|
||||
import { Value } from 'typebox/value';
|
||||
import type { McpSdkServerConfigWithInstance } from '@anthropic-ai/claude-agent-sdk';
|
||||
import { createSdkMcpServer, tool } from '@anthropic-ai/claude-agent-sdk';
|
||||
import { z } from 'zod';
|
||||
|
||||
// ============================================================================
|
||||
// CLASS DISCRIMINATOR
|
||||
@@ -102,181 +103,214 @@ export type AddExploitInput = ExploitedExploit | BlockedExploit;
|
||||
// ============================================================================
|
||||
|
||||
function buildSchemas(validIds: ReadonlySet<string>) {
|
||||
const vulnerabilityIdField = Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
const vulnerabilityIdField = z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'Vulnerability identifier (e.g. "INJ-VULN-03"). Must match an ID from this run\'s ' +
|
||||
'{class}_exploitation_queue.json exactly — the collector rejects IDs not in the queue. ' +
|
||||
`Valid IDs for this run: ${formatValidIdsPreview(validIds)}.`,
|
||||
});
|
||||
'{class}_exploitation_queue.json exactly — the collector rejects IDs not in the queue. ' +
|
||||
`Valid IDs for this run: ${formatValidIdsPreview(validIds)}.`,
|
||||
)
|
||||
.refine((id: string) => validIds.has(id), {
|
||||
message:
|
||||
`Vulnerability ID not in this run's queue. Valid IDs: ` +
|
||||
`${formatValidIdsPreview(validIds)}. ` +
|
||||
'Check the queue.json for the canonical ID — likely a typo or hallucinated ID.',
|
||||
});
|
||||
|
||||
const titleField = Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
const titleField = z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'Descriptive vulnerability title (e.g. "SQL Injection — User Search", "IDOR — Unauthorized ' +
|
||||
'Access to User Orders"). Concise; encodes the vulnerability category and where it lives.',
|
||||
});
|
||||
'Access to User Orders"). Concise; encodes the vulnerability category and where it lives.',
|
||||
);
|
||||
|
||||
const vulnerableLocationField = Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
const vulnerableLocationField = z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'Endpoint or mechanism where the vulnerability exists (e.g. "GET /api/products?id=", ' +
|
||||
'"POST /login", or a code location like "controllers/userController.js:42").',
|
||||
});
|
||||
'"POST /login", or a code location like "controllers/userController.js:42").',
|
||||
);
|
||||
|
||||
const overviewField = Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
const overviewField = z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'Brief summary of the exploit itself — what the vulnerability is and how it was demonstrated ' +
|
||||
'(or how it would be demonstrated, for blocked findings). 1-3 sentences.',
|
||||
});
|
||||
'(or how it would be demonstrated, for blocked findings). 1-3 sentences.',
|
||||
);
|
||||
|
||||
const prerequisitesField = Type.Optional(
|
||||
Type.Union([Type.String(), Type.Null()], {
|
||||
description:
|
||||
'Required setup, tools, or conditions to reproduce the exploit (e.g. authentication, ' +
|
||||
const prerequisitesField = z
|
||||
.string()
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'Required setup, tools, or conditions to reproduce the exploit (e.g. authentication, ' +
|
||||
'specific role, prior application state). Omit or pass null when no prerequisites apply.',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
const notesField = Type.Optional(
|
||||
Type.Union([Type.String(), Type.Null()], {
|
||||
description:
|
||||
'Optional supplementary context — caveats, related findings, environmental observations. ' +
|
||||
const notesField = z
|
||||
.string()
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'Optional supplementary context — caveats, related findings, environmental observations. ' +
|
||||
'Free-form Markdown. Omit or pass null when N/A.',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
const statusField = Type.Union([Type.Literal('exploited'), Type.Literal('blocked')], {
|
||||
description:
|
||||
const statusField = z
|
||||
.enum(['exploited', 'blocked'])
|
||||
.describe(
|
||||
'Verdict bucket. Set to "exploited" only after reaching Proof of Exploitation Level 3+ with ' +
|
||||
'concrete impact evidence (extracted data, executed JavaScript, account takeover, internal ' +
|
||||
'service access). Set to "blocked" only for real vulnerabilities where external factors ' +
|
||||
'(NOT security defenses) prevented full exploitation. Findings where a security defense ' +
|
||||
'successfully prevented exploitation after exhaustive bypass attempts are FALSE POSITIVE — ' +
|
||||
'route those to your workspace tracking file, not this tool.',
|
||||
});
|
||||
'concrete impact evidence (extracted data, executed JavaScript, account takeover, internal ' +
|
||||
'service access). Set to "blocked" only for real vulnerabilities where external factors ' +
|
||||
'(NOT security defenses) prevented full exploitation. Findings where a security defense ' +
|
||||
'successfully prevented exploitation after exhaustive bypass attempts are FALSE POSITIVE — ' +
|
||||
'route those to your workspace tracking file, not this tool.',
|
||||
);
|
||||
|
||||
// Per-status fields. All optional at the flat parameter layer because a single
|
||||
// Type.Object cannot express a top-level discriminated union; the handler
|
||||
// Per-status fields. All optional at the SDK shape layer because a single
|
||||
// ZodRawShape cannot express a top-level discriminated union; the handler
|
||||
// re-validates against the discriminated union below for true enforcement.
|
||||
const severityField = Type.Optional(
|
||||
Type.Union([...SEVERITY_VALUES.map((v) => Type.Literal(v)), Type.Null()], {
|
||||
description:
|
||||
'REQUIRED when status="exploited". Severity of the demonstrated impact. Critical = Level 4 ' +
|
||||
const severityField = z
|
||||
.enum(SEVERITY_VALUES)
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'REQUIRED when status="exploited". Severity of the demonstrated impact. Critical = Level 4 ' +
|
||||
'(admin credentials extracted, sensitive data dumped, system commands executed, full account ' +
|
||||
'takeover). High = Level 3 (data extraction proven, authentication bypass confirmed, ' +
|
||||
'internal service access). Medium/Low based on impact narrowness or read-only access. Must ' +
|
||||
'reflect demonstrated impact, not theoretical potential.',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
const impactField = Type.Optional(
|
||||
Type.Union([Type.String({ minLength: 1 }), Type.Null()], {
|
||||
description:
|
||||
'REQUIRED when status="exploited". Business/security impact achieved by the exploit ' +
|
||||
const impactField = z
|
||||
.string()
|
||||
.min(1)
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'REQUIRED when status="exploited". Business/security impact achieved by the exploit ' +
|
||||
'(e.g. "Extracted full user table including bcrypt password hashes for 1,247 users", ' +
|
||||
'"Achieved RCE as the application user; arbitrary shell commands executed"). Must describe ' +
|
||||
'what was actually demonstrated, not what could theoretically happen.',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
const exploitationStepsField = Type.Optional(
|
||||
Type.Union([Type.Array(Type.String({ minLength: 1 }), { minItems: 1 }), Type.Null()], {
|
||||
description:
|
||||
'REQUIRED when status="exploited". Ordered, reproducible exploitation steps — one Markdown ' +
|
||||
const exploitationStepsField = z
|
||||
.array(z.string().min(1))
|
||||
.min(1)
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'REQUIRED when status="exploited". Ordered, reproducible exploitation steps — one Markdown ' +
|
||||
'blob per numbered step. Each step must include full URLs (protocol + domain + port + path ' +
|
||||
'+ params), complete payloads, and copy-paste-ready commands. Use clear placeholders for ' +
|
||||
'variable values like [SESSION_TOKEN], [DATABASE_NAME], [TABLE_NAME], [TARGET_USER_ID]. ' +
|
||||
'Write each step as natural Markdown — interleave prose with fenced code blocks (```bash, ' +
|
||||
'```http, etc.) as you would in a write-up. Steps must be detailed enough that someone ' +
|
||||
'unfamiliar with the application can follow without additional research.',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
const proofOfImpactField = Type.Optional(
|
||||
Type.Union([Type.String({ minLength: 1 }), Type.Null()], {
|
||||
description:
|
||||
'REQUIRED when status="exploited". Concrete evidence of successful exploitation — extracted ' +
|
||||
const proofOfImpactField = z
|
||||
.string()
|
||||
.min(1)
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'REQUIRED when status="exploited". Concrete evidence of successful exploitation — extracted ' +
|
||||
'data, achieved actions, captured request/response pairs, log excerpts. Markdown blob; ' +
|
||||
'interleave prose with fenced code blocks. Must show what the exploit demonstrably achieved, ' +
|
||||
'not theoretical impact.',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
const confidenceField = Type.Optional(
|
||||
Type.Union([...CONFIDENCE_VALUES.map((v) => Type.Literal(v)), Type.Null()], {
|
||||
description:
|
||||
'REQUIRED when status="blocked". Confidence that this finding is a real vulnerability that ' +
|
||||
const confidenceField = z
|
||||
.enum(CONFIDENCE_VALUES)
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'REQUIRED when status="blocked". Confidence that this finding is a real vulnerability that ' +
|
||||
'would be exploited if the external blocker were removed. High = code analysis strongly ' +
|
||||
'confirms vulnerability and partial exploitation (Level 1-2) succeeded. Medium = code ' +
|
||||
'analysis confirms but live evidence is partial. Low = signal-only; revisit if blocker is ' +
|
||||
'removed in a future run.',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
const currentBlockerField = Type.Optional(
|
||||
Type.Union([Type.String({ minLength: 1 }), Type.Null()], {
|
||||
description:
|
||||
'REQUIRED when status="blocked". What prevents full exploitation (e.g. "Server crashes after ' +
|
||||
const currentBlockerField = z
|
||||
.string()
|
||||
.min(1)
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'REQUIRED when status="blocked". What prevents full exploitation (e.g. "Server crashes after ' +
|
||||
'5 requests, blocking enumeration", "OAuth callback requires verified third-party email ' +
|
||||
'account we could not provision"). Must be an external operational constraint, not a ' +
|
||||
'security defense.',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
const potentialImpactField = Type.Optional(
|
||||
Type.Union([Type.String({ minLength: 1 }), Type.Null()], {
|
||||
description:
|
||||
'REQUIRED when status="blocked". What could be achieved if the blocker were removed (e.g. ' +
|
||||
const potentialImpactField = z
|
||||
.string()
|
||||
.min(1)
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'REQUIRED when status="blocked". What could be achieved if the blocker were removed (e.g. ' +
|
||||
'"Full database read access", "Account takeover of arbitrary user via reset-token leak"). ' +
|
||||
'Distinct from impact — this is the hypothetical outcome, not a demonstrated one.',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
const evidenceOfVulnerabilityField = Type.Optional(
|
||||
Type.Union([Type.String({ minLength: 1 }), Type.Null()], {
|
||||
description:
|
||||
'REQUIRED when status="blocked". Code snippets, response excerpts, or observed behavior ' +
|
||||
const evidenceOfVulnerabilityField = z
|
||||
.string()
|
||||
.min(1)
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'REQUIRED when status="blocked". Code snippets, response excerpts, or observed behavior ' +
|
||||
'proving the vulnerability is real. Markdown blob; interleave prose with fenced code blocks. ' +
|
||||
'This is what convinces the reader the finding is not a false positive despite incomplete ' +
|
||||
'exploitation.',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
const whatWeTriedField = Type.Optional(
|
||||
Type.Union([Type.String({ minLength: 1 }), Type.Null()], {
|
||||
description:
|
||||
'REQUIRED when status="blocked". Log of attempted exploitation techniques and why each was ' +
|
||||
const whatWeTriedField = z
|
||||
.string()
|
||||
.min(1)
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'REQUIRED when status="blocked". Log of attempted exploitation techniques and why each was ' +
|
||||
'blocked. Each attempt should document the payload, the observed result, and the inferred ' +
|
||||
'blocker. Markdown blob; multiple attempts as a list or distinct paragraphs. Demonstrates ' +
|
||||
'exhaustive bypass effort per the Bypass Exhaustion Protocol.',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
const howThisWouldBeExploitedField = Type.Optional(
|
||||
Type.Union([Type.Array(Type.String({ minLength: 1 }), { minItems: 1 }), Type.Null()], {
|
||||
description:
|
||||
'REQUIRED when status="blocked". Ordered hypothetical exploitation steps assuming the blocker ' +
|
||||
const howThisWouldBeExploitedField = z
|
||||
.array(z.string().min(1))
|
||||
.min(1)
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'REQUIRED when status="blocked". Ordered hypothetical exploitation steps assuming the blocker ' +
|
||||
'is removed — one Markdown blob per numbered step. Same reproducibility requirements as ' +
|
||||
'exploitation_steps: full URLs, complete payloads, copy-paste-ready commands. Frame the ' +
|
||||
'first step as "If [blocker] were removed: …".',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
const expectedImpactField = Type.Optional(
|
||||
Type.Union([Type.String({ minLength: 1 }), Type.Null()], {
|
||||
description:
|
||||
'REQUIRED when status="blocked". Specific data or access that would be compromised if ' +
|
||||
const expectedImpactField = z
|
||||
.string()
|
||||
.min(1)
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'REQUIRED when status="blocked". Specific data or access that would be compromised if ' +
|
||||
'exploitation succeeded (e.g. "Read access to all user profile data including PII; write ' +
|
||||
'access to user-owned resources"). Markdown blob.',
|
||||
}),
|
||||
);
|
||||
);
|
||||
|
||||
// The flat parameter schema passed to defineTool(). The harness uses this to
|
||||
// build the agent's tool catalog. Per-status enforcement happens in the
|
||||
// handler via the discriminated union below.
|
||||
const flatShape = Type.Object({
|
||||
// The flat shape passed to tool(). The SDK uses this to build the agent's
|
||||
// tool catalog. Per-status enforcement happens in the handler via the
|
||||
// discriminated union below.
|
||||
const flatShape = {
|
||||
status: statusField,
|
||||
vulnerability_id: vulnerabilityIdField,
|
||||
title: titleField,
|
||||
@@ -295,64 +329,59 @@ function buildSchemas(validIds: ReadonlySet<string>) {
|
||||
what_we_tried: whatWeTriedField,
|
||||
how_this_would_be_exploited: howThisWouldBeExploitedField,
|
||||
expected_impact: expectedImpactField,
|
||||
});
|
||||
};
|
||||
|
||||
// Strict per-status validation. Re-runs in the handler so missing fields
|
||||
// for the chosen status return a retryable error to the agent.
|
||||
const ExploitedSchema = Type.Object({
|
||||
status: Type.Literal('exploited'),
|
||||
// for the chosen status return a retryable Zod error to the agent.
|
||||
const ExploitedSchema = z.object({
|
||||
status: z.literal('exploited'),
|
||||
vulnerability_id: vulnerabilityIdField,
|
||||
title: titleField,
|
||||
vulnerable_location: vulnerableLocationField,
|
||||
overview: overviewField,
|
||||
prerequisites: prerequisitesField,
|
||||
severity: Type.Union(SEVERITY_VALUES.map((v) => Type.Literal(v))),
|
||||
impact: Type.String({ minLength: 1 }),
|
||||
exploitation_steps: Type.Array(Type.String({ minLength: 1 }), { minItems: 1 }),
|
||||
proof_of_impact: Type.String({ minLength: 1 }),
|
||||
severity: z.enum(SEVERITY_VALUES),
|
||||
impact: z.string().min(1),
|
||||
exploitation_steps: z.array(z.string().min(1)).min(1),
|
||||
proof_of_impact: z.string().min(1),
|
||||
notes: notesField,
|
||||
});
|
||||
|
||||
const BlockedSchema = Type.Object({
|
||||
status: Type.Literal('blocked'),
|
||||
const BlockedSchema = z.object({
|
||||
status: z.literal('blocked'),
|
||||
vulnerability_id: vulnerabilityIdField,
|
||||
title: titleField,
|
||||
vulnerable_location: vulnerableLocationField,
|
||||
prerequisites: prerequisitesField,
|
||||
confidence: Type.Union(CONFIDENCE_VALUES.map((v) => Type.Literal(v))),
|
||||
current_blocker: Type.String({ minLength: 1 }),
|
||||
potential_impact: Type.String({ minLength: 1 }),
|
||||
evidence_of_vulnerability: Type.String({ minLength: 1 }),
|
||||
what_we_tried: Type.String({ minLength: 1 }),
|
||||
how_this_would_be_exploited: Type.Array(Type.String({ minLength: 1 }), { minItems: 1 }),
|
||||
expected_impact: Type.String({ minLength: 1 }),
|
||||
confidence: z.enum(CONFIDENCE_VALUES),
|
||||
current_blocker: z.string().min(1),
|
||||
potential_impact: z.string().min(1),
|
||||
evidence_of_vulnerability: z.string().min(1),
|
||||
what_we_tried: z.string().min(1),
|
||||
how_this_would_be_exploited: z.array(z.string().min(1)).min(1),
|
||||
expected_impact: z.string().min(1),
|
||||
notes: notesField,
|
||||
});
|
||||
|
||||
const StrictSchema = Type.Union([ExploitedSchema, BlockedSchema]);
|
||||
const StrictSchema = z.discriminatedUnion('status', [ExploitedSchema, BlockedSchema]);
|
||||
|
||||
return { flatShape, StrictSchema };
|
||||
}
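The two-layer pattern in `buildSchemas` (a permissive flat shape for the tool catalog, then strict per-status checks in the handler) can be sketched without any schema library. This is a hand-rolled illustration only, not the Zod or TypeBox code in this diff: the reduced `FlatInput` field set, `REQUIRED_BY_STATUS`, and `missingFields` are hypothetical names introduced here.

```typescript
// Hypothetical, reduced field set. The schema in the diff has many more fields.
interface FlatInput {
  status: 'exploited' | 'blocked';
  vulnerability_id: string;
  impact?: string | null;
  proof_of_impact?: string | null;
  current_blocker?: string | null;
  expected_impact?: string | null;
}

// Every sub-field is optional in the flat shape; the status value decides
// which ones must actually be present and non-empty.
const REQUIRED_BY_STATUS: Record<FlatInput['status'], Array<keyof FlatInput>> = {
  exploited: ['impact', 'proof_of_impact'],
  blocked: ['current_blocker', 'expected_impact'],
};

// Returns the names of required fields that are absent, null, or empty
// for the chosen status. An empty array means the input passes.
function missingFields(input: FlatInput): string[] {
  return REQUIRED_BY_STATUS[input.status].filter((key) => {
    const value = input[key];
    return typeof value !== 'string' || value.length === 0;
  });
}

const blockedAttempt: FlatInput = {
  status: 'blocked',
  vulnerability_id: 'EXAMPLE-1',
  impact: 'Filled in the wrong bucket',
};
console.log(missingFields(blockedAttempt));
```

A handler built this way can return the missing-field list as a retryable error, which is the role the strict schema plays in the code above.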
|
||||
|
||||
type FlatInput = Static<ReturnType<typeof buildSchemas>['flatShape']>;
|
||||
type StrictInput = Static<ReturnType<typeof buildSchemas>['StrictSchema']>;
|
||||
|
||||
// ============================================================================
|
||||
// RESPONSE HELPERS
|
||||
// ============================================================================
|
||||
|
||||
interface ToolResult {
|
||||
[x: string]: unknown;
|
||||
content: Array<{ type: 'text'; text: string }>;
|
||||
details: Record<string, unknown>;
|
||||
isError?: boolean;
|
||||
isError: boolean;
|
||||
}
|
||||
|
||||
function createToolResult(response: { status: string; [key: string]: unknown }): ToolResult {
|
||||
const isError = response.status === 'error';
|
||||
return {
|
||||
content: [{ type: 'text' as const, text: JSON.stringify(response, null, 2) }],
|
||||
details: {},
|
||||
...(isError && { isError: true }),
|
||||
content: [{ type: 'text', text: JSON.stringify(response, null, 2) }],
|
||||
isError: response.status === 'error',
|
||||
};
|
||||
}
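The two `createToolResult` bodies shown in this hunk differ in how `isError` appears on a successful result: the conditional spread omits the key entirely, while the explicit assignment always emits a boolean. A minimal sketch of that difference follows, assuming a trimmed-down `ToolResult`; the function names `withConditionalFlag` and `withExplicitFlag` are hypothetical and exist only for comparison.

```typescript
// Trimmed-down result shape for illustration; the interface in the diff has more members.
interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError?: boolean;
}

type ToolResponse = { status: string; [key: string]: unknown };

// Variant A: the key is only present when the response is an error.
function withConditionalFlag(response: ToolResponse): ToolResult {
  const isError = response.status === 'error';
  return {
    content: [{ type: 'text' as const, text: JSON.stringify(response) }],
    ...(isError && { isError: true }),
  };
}

// Variant B: the key is always present and carries true or false.
function withExplicitFlag(response: ToolResponse): ToolResult {
  return {
    content: [{ type: 'text', text: JSON.stringify(response) }],
    isError: response.status === 'error',
  };
}

const ok: ToolResponse = { status: 'success' };
console.log('isError' in withConditionalFlag(ok)); // key absent on success
console.log(withExplicitFlag(ok).isError); // key present, value false
```

Consumers that test `result.isError === false` behave differently under the two variants, so the choice matters when the result type declares `isError` as required.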
|
||||
|
||||
@@ -364,21 +393,21 @@ function errorResult(message: string, errorType = 'ValidationError', retryable =
|
||||
return createToolResult({ status: 'error', message, errorType, retryable });
|
||||
}
|
||||
|
||||
function formatValueErrors(schema: ReturnType<typeof buildSchemas>['StrictSchema'], value: unknown): string {
|
||||
return [...Value.Errors(schema, value)]
|
||||
function formatZodIssues(error: z.ZodError): string {
|
||||
return error.issues
|
||||
.map((issue) => {
|
||||
const path = issue.instancePath.length > 0 ? issue.instancePath.replace(/^\//, '').replace(/\//g, '.') : '(root)';
|
||||
const path = issue.path.length > 0 ? issue.path.join('.') : '(root)';
|
||||
return `- ${path}: ${issue.message}`;
|
||||
})
|
||||
.join('\n');
|
||||
}
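The two formatter bodies in this hunk turn an issue location into the same dotted path from different inputs: a slash-separated `instancePath` string in one version and a `path` array in the other. The sketch below isolates just that conversion so the equivalence is easy to check. The helper names `pointerToDotted` and `arrayToDotted` are hypothetical, and the inputs are plain values rather than real library error objects.

```typescript
// Slash-separated location, e.g. "/exploitation_steps/0", as read from
// issue.instancePath in the removed lines.
function pointerToDotted(instancePath: string): string {
  return instancePath.length > 0
    ? instancePath.replace(/^\//, '').replace(/\//g, '.')
    : '(root)';
}

// Array location, e.g. ["exploitation_steps", 0], as read from issue.path
// in the added lines.
function arrayToDotted(path: Array<string | number>): string {
  return path.length > 0 ? path.join('.') : '(root)';
}

// Both helpers feed the same "- path: message" bullet format.
function bullet(path: string, message: string): string {
  return `- ${path}: ${message}`;
}

console.log(bullet(pointerToDotted('/exploitation_steps/0'), 'must not be empty'));
console.log(bullet(arrayToDotted(['exploitation_steps', 0]), 'must not be empty'));
console.log(bullet(arrayToDotted([]), 'expected an object'));
```

Keeping the rendered bullet format identical across the migration means the retry message the agent sees does not change shape.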
|
||||
|
||||
// ============================================================================
|
||||
// TOOL FACTORY
|
||||
// SERVER FACTORY
|
||||
// ============================================================================
|
||||
|
||||
export interface ExploitCollectorServer {
|
||||
tools: ToolDefinition[];
|
||||
server: McpSdkServerConfigWithInstance;
|
||||
getAll(): AddExploitInput[];
|
||||
}
|
||||
|
||||
@@ -392,11 +421,9 @@ export function createExploitCollector(options: CreateExploitCollectorOptions):
|
||||
const exploits: AddExploitInput[] = [];
|
||||
const { flatShape, StrictSchema } = buildSchemas(validIds);
|
||||
|
||||
const addExploitTool = defineTool({
|
||||
name: 'add_exploit',
|
||||
label: 'Add Exploit',
|
||||
description:
|
||||
`Record a single processed ${vulnClass} vulnerability as structured exploitation evidence. ` +
|
||||
const addExploitTool = tool(
|
||||
'add_exploit',
|
||||
`Record a single processed ${vulnClass} vulnerability as structured exploitation evidence. ` +
|
||||
'Call this once per vulnerability in your queue.json after reaching a definitive verdict ' +
|
||||
'(either successfully exploited or potential-but-blocked). The status field discriminates the ' +
|
||||
"two report buckets; required sub-fields differ per status (see each field's description for " +
|
||||
@@ -405,34 +432,20 @@ export function createExploitCollector(options: CreateExploitCollectorOptions):
|
||||
'IDs. FALSE POSITIVE findings do NOT use this tool — they go to your workspace tracking file. ' +
|
||||
'After all queue vulnerabilities have been emitted, the host renderer assembles the ' +
|
||||
'deliverable Markdown from your recorded calls.',
|
||||
parameters: flatShape,
|
||||
execute: async (_toolCallId, args): Promise<ToolResult> => {
|
||||
const input = args as FlatInput;
|
||||
|
||||
// Strict queue-ID validation: reject hallucinated or typo'd IDs with the valid-ID list.
|
||||
if (!validIds.has(input.vulnerability_id)) {
|
||||
return errorResult(
|
||||
`Vulnerability ID not in this run's queue. Valid IDs: ` +
|
||||
`${formatValidIdsPreview(validIds)}. ` +
|
||||
'Check the queue.json for the canonical ID — likely a typo or hallucinated ID.',
|
||||
'ValidationError',
|
||||
true,
|
||||
);
|
||||
}
|
||||
|
||||
flatShape,
|
||||
async (input): Promise<ToolResult> => {
|
||||
// Re-validate against the strict discriminated union for per-status enforcement.
|
||||
if (!Value.Check(StrictSchema, input)) {
|
||||
const parsed = StrictSchema.safeParse(input);
|
||||
if (!parsed.success) {
|
||||
return errorResult(
|
||||
`Schema validation failed for status="${(input as { status?: string }).status}". ` +
|
||||
'Required-field issues:\n' +
|
||||
formatValueErrors(StrictSchema, input),
|
||||
formatZodIssues(parsed.error),
|
||||
'ValidationError',
|
||||
true,
|
||||
);
|
||||
}
|
||||
// Strip excess properties from the flat input so only the chosen status's
|
||||
// fields survive (mirrors the prior discriminated-union parse).
|
||||
const typed = Value.Clean(StrictSchema, structuredClone(input)) as StrictInput as AddExploitInput;
|
||||
const typed = parsed.data as AddExploitInput;
|
||||
const existing = exploits.find((e) => e.vulnerability_id === typed.vulnerability_id);
|
||||
if (existing) {
|
||||
return errorResult(
|
||||
@@ -445,10 +458,16 @@ export function createExploitCollector(options: CreateExploitCollectorOptions):
|
||||
exploits.push(typed);
|
||||
return successResult({ added: [typed.vulnerability_id], recorded_status: typed.status });
|
||||
},
|
||||
);
|
||||
|
||||
const server: McpSdkServerConfigWithInstance = createSdkMcpServer({
|
||||
name: 'exploit-collector',
|
||||
version: '1.0.0',
|
||||
tools: [addExploitTool],
|
||||
});
|
||||
|
||||
return {
|
||||
tools: [addExploitTool] as ToolDefinition[],
|
||||
server,
|
||||
getAll: (): AddExploitInput[] => [...exploits],
|
||||
};
|
||||
}
|
||||
|
||||
@@ -5,9 +5,9 @@
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* Pre-Recon Collector tools
|
||||
* Pre-Recon Collector MCP Server
|
||||
*
|
||||
* Exposes seven TypeBox-validated tools, one per section of the
|
||||
* Exposes seven Zod-validated MCP tools, one per section of the
|
||||
* pre_recon_deliverable.md report. Every tool is one-shot (write-once;
|
||||
* duplicate calls return DuplicateError). A skipped tool renders a placeholder
|
||||
* rather than failing the activity. After the agent finishes, the host calls
|
||||
@@ -15,353 +15,386 @@
|
||||
* per-run call pattern, and runs the deterministic renderer to produce the
|
||||
* deliverable Markdown.
|
||||
*
|
||||
* Each TypeBox schema's field-level descriptions carry the section guidance, so
|
||||
* the harness injects it into the agent's tool catalog.
|
||||
* Each Zod schema's field-level descriptions carry the section guidance, so
|
||||
* the SDK injects it into the agent's tool catalog.
|
||||
*/
|
||||
|
||||
import { defineTool, type ToolDefinition } from '@earendil-works/pi-coding-agent';
|
||||
import { type Static, Type } from 'typebox';
|
||||
import type { McpSdkServerConfigWithInstance } from '@anthropic-ai/claude-agent-sdk';
|
||||
import { createSdkMcpServer, tool } from '@anthropic-ai/claude-agent-sdk';
|
||||
import { z } from 'zod';
|
||||
|
||||
// ============================================================================
|
||||
// SHARED SCHEMA
|
||||
// ============================================================================
|
||||
|
||||
export const SinkRefSchema = Type.Object({
|
||||
location: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
export const SinkRefSchema = z.object({
|
||||
location: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'File path with line number (e.g., "templates/render.js:34") or richer prose ' +
|
||||
'(e.g., "innerHTML at templates/render.js:34", "lines 45-67"). Must contain enough ' +
|
||||
'detail for a downstream agent to find the exact location.',
|
||||
}),
|
||||
sink_function: Type.String({
|
||||
minLength: 1,
|
||||
description: 'The sink function or property name (e.g., "innerHTML", "axios.get", "eval", "document.write").',
|
||||
}),
|
||||
notes: Type.Optional(
|
||||
Type.Union([Type.String(), Type.Null()], {
|
||||
description:
|
||||
'Optional context — render-context detail, attribute name, scope hints, or anything ' +
|
||||
'(e.g., "innerHTML at templates/render.js:34", "lines 45-67"). Must contain enough ' +
|
||||
'detail for a downstream agent to find the exact location.',
|
||||
),
|
||||
sink_function: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe('The sink function or property name (e.g., "innerHTML", "axios.get", "eval", "document.write").'),
|
||||
notes: z
|
||||
.string()
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'Optional context — render-context detail, attribute name, scope hints, or anything ' +
|
||||
'a downstream agent needs to act on this sink. Omit when the location and sink_function ' +
|
||||
'are sufficient on their own.',
|
||||
}),
|
||||
),
|
||||
),
|
||||
});
|
||||
|
||||
export type SinkRef = Static<typeof SinkRefSchema>;
|
||||
export type SinkRef = z.infer<typeof SinkRefSchema>;
|
||||
|
||||
// ============================================================================
|
||||
// PER-TOOL INPUT SCHEMAS
|
||||
// ============================================================================
|
||||
|
||||
export const ExecutiveSummaryInputSchema = Type.Object({
|
||||
text: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
export const ExecutiveSummaryInputSchema = z.object({
|
||||
text: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
"Provide a 2-3 paragraph overview of the application's security posture, highlighting " +
|
||||
'the most critical attack surfaces and architectural security decisions. Becomes ' +
|
||||
'Section 1 of the rendered deliverable.',
|
||||
}),
|
||||
'the most critical attack surfaces and architectural security decisions. Becomes ' +
|
||||
'Section 1 of the rendered deliverable.',
|
||||
),
|
||||
});
|
||||
|
||||
const ArchitectureSchema = Type.Object({
|
||||
framework_and_language: Type.String({
|
||||
minLength: 1,
|
||||
description: 'Framework and language details with their security implications.',
|
||||
}),
|
||||
architectural_pattern: Type.String({
|
||||
minLength: 1,
|
||||
description: 'Architectural pattern (monolith, microservices, hybrid) with trust boundary analysis.',
|
||||
}),
|
||||
critical_security_components: Type.String({
|
||||
minLength: 1,
|
||||
description: 'Critical security components with focus on auth, authz, and data protection.',
|
||||
}),
|
||||
const ArchitectureSchema = z.object({
|
||||
framework_and_language: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe('Framework and language details with their security implications.'),
|
||||
architectural_pattern: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe('Architectural pattern (monolith, microservices, hybrid) with trust boundary analysis.'),
|
||||
critical_security_components: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe('Critical security components with focus on auth, authz, and data protection.'),
|
||||
});
|
||||
|
||||
const DataSecuritySchema = Type.Object({
|
||||
database_security: Type.String({
|
||||
minLength: 1,
|
||||
description: 'Analyze encryption, access controls, and query safety in database interactions.',
|
||||
}),
|
||||
data_flow_security: Type.String({
|
||||
minLength: 1,
|
||||
description: 'Identify sensitive data paths and the protection mechanisms applied along them.',
|
||||
}),
|
||||
multi_tenant_isolation: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
const DataSecuritySchema = z.object({
|
||||
database_security: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe('Analyze encryption, access controls, and query safety in database interactions.'),
|
||||
data_flow_security: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe('Identify sensitive data paths and the protection mechanisms applied along them.'),
|
||||
multi_tenant_isolation: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'Assess tenant separation effectiveness. If the application is single-tenant, state that ' +
|
||||
'explicitly rather than leaving the field thin.',
|
||||
}),
|
||||
'explicitly rather than leaving the field thin.',
|
||||
),
|
||||
});
|
||||
|
||||
const AttackSurfaceSchema = Type.Object({
|
||||
external_entry_points: Type.String({
|
||||
minLength: 1,
|
||||
description: 'Detailed analysis of each public interface that is network-accessible.',
|
||||
}),
|
||||
internal_service_communication: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
const AttackSurfaceSchema = z.object({
|
||||
external_entry_points: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe('Detailed analysis of each public interface that is network-accessible.'),
|
||||
internal_service_communication: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'Trust relationships and security assumptions between network-reachable services. ' +
|
||||
'If the application is a single service with no internal RPC fabric, state that.',
|
||||
}),
|
||||
input_validation_patterns: Type.String({
|
||||
minLength: 1,
|
||||
description: 'How user input is handled and validated in network-accessible endpoints.',
|
||||
}),
|
||||
background_processing: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
'If the application is a single service with no internal RPC fabric, state that.',
|
||||
),
|
||||
input_validation_patterns: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe('How user input is handled and validated in network-accessible endpoints.'),
|
||||
background_processing: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'Async job security and privilege models for jobs triggered by network requests. ' +
|
||||
'If no async/background processing exists, state that.',
|
||||
}),
|
||||
'If no async/background processing exists, state that.',
|
||||
),
|
||||
});
|
||||
|
||||
const InfrastructureSchema = Type.Object({
|
||||
secrets_management: Type.String({ minLength: 1, description: 'How secrets are stored, rotated, and accessed.' }),
|
||||
configuration_security: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
const InfrastructureSchema = z.object({
|
||||
secrets_management: z.string().min(1).describe('How secrets are stored, rotated, and accessed.'),
|
||||
configuration_security: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'Environment separation and secret handling. Specifically search for infrastructure ' +
|
||||
'configuration (e.g., Nginx, Kubernetes Ingress, CDN settings) that defines security ' +
|
||||
'headers like Strict-Transport-Security (HSTS) and Cache-Control, and report what was found.',
|
||||
}),
|
||||
external_dependencies: Type.String({
|
||||
minLength: 1,
|
||||
description: 'Third-party services and their security implications.',
|
||||
}),
|
||||
monitoring_and_logging: Type.String({
|
||||
minLength: 1,
|
||||
description: 'Security event visibility — what is logged, where it goes, and who can see it.',
|
||||
}),
|
||||
'configuration (e.g., Nginx, Kubernetes Ingress, CDN settings) that defines security ' +
|
||||
'headers like Strict-Transport-Security (HSTS) and Cache-Control, and report what was found.',
|
||||
),
|
||||
external_dependencies: z.string().min(1).describe('Third-party services and their security implications.'),
|
||||
monitoring_and_logging: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe('Security event visibility — what is logged, where it goes, and who can see it.'),
|
||||
});
|
||||
|
||||
export const ApplicationIntelligenceInputSchema = Type.Object({
|
||||
architecture: Type.Object(ArchitectureSchema.properties, {
|
||||
description:
|
||||
'Architecture & Technology Stack — driven by the Architecture Scanner sub-agent. ' +
|
||||
export const ApplicationIntelligenceInputSchema = z.object({
|
||||
architecture: ArchitectureSchema.describe(
|
||||
'Architecture & Technology Stack — driven by the Architecture Scanner sub-agent. ' +
|
||||
'Becomes Section 2 of the rendered deliverable.',
|
||||
}),
|
||||
data_security: Type.Object(DataSecuritySchema.properties, {
|
||||
description:
|
||||
'Data Security & Storage — driven by the Data Security Auditor sub-agent. ' +
|
||||
),
|
||||
data_security: DataSecuritySchema.describe(
|
||||
'Data Security & Storage — driven by the Data Security Auditor sub-agent. ' +
|
||||
'Becomes Section 4 of the rendered deliverable.',
|
||||
}),
|
||||
attack_surface: Type.Object(AttackSurfaceSchema.properties, {
|
||||
description:
|
||||
'Attack Surface Analysis — driven by Entry Point Mapper + Architecture Scanner sub-agents. ' +
|
||||
),
|
||||
attack_surface: AttackSurfaceSchema.describe(
|
||||
'Attack Surface Analysis — driven by Entry Point Mapper + Architecture Scanner sub-agents. ' +
|
||||
'Only include entry points confirmed to be in-scope (network-reachable). ' +
|
||||
'Becomes Section 5 of the rendered deliverable.',
|
||||
}),
|
||||
infrastructure: Type.Object(InfrastructureSchema.properties, {
|
||||
description: 'Infrastructure & Operational Security. Becomes Section 6 of the rendered deliverable.',
|
||||
}),
|
||||
),
|
||||
infrastructure: InfrastructureSchema.describe(
|
||||
'Infrastructure & Operational Security. Becomes Section 6 of the rendered deliverable.',
|
||||
),
|
||||
});
|
||||
|
||||
export const AuthDeepDiveInputSchema = Type.Object({
|
||||
authentication_mechanisms: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
export const AuthDeepDiveInputSchema = z.object({
|
||||
authentication_mechanisms: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'Authentication mechanisms and their security properties. MUST include an exhaustive list of ' +
|
||||
'all API endpoints used for authentication (e.g., login, logout, token refresh, password reset).',
|
||||
}),
|
||||
session_management: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
'all API endpoints used for authentication (e.g., login, logout, token refresh, password reset).',
|
||||
),
|
||||
session_management: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'Session management and token security. Pinpoint the exact file and line(s) of code where ' +
|
||||
'session cookie flags (HttpOnly, Secure, SameSite) are configured.',
|
||||
}),
|
||||
authz_model: Type.String({ minLength: 1, description: 'Authorization model and potential bypass scenarios.' }),
|
||||
multi_tenancy: Type.String({
|
||||
minLength: 1,
|
||||
description: 'Multi-tenancy security implementation. If the application is single-tenant, state that explicitly.',
|
||||
}),
|
||||
sso_oauth_oidc: Type.Union([Type.String(), Type.Null()], {
|
||||
description:
|
||||
'session cookie flags (HttpOnly, Secure, SameSite) are configured.',
|
||||
),
|
||||
authz_model: z.string().min(1).describe('Authorization model and potential bypass scenarios.'),
|
||||
multi_tenancy: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe('Multi-tenancy security implementation. If the application is single-tenant, state that explicitly.'),
|
||||
sso_oauth_oidc: z
|
||||
.string()
|
||||
.nullable()
|
||||
.describe(
|
||||
'SSO/OAuth/OIDC flows: identify the callback endpoints and locate the specific code that ' +
|
||||
'validates the state and nonce parameters. Set null only if the application has no SSO/OAuth/OIDC ' +
|
||||
'integration at all.',
|
||||
}),
|
||||
'validates the state and nonce parameters. Set null only if the application has no SSO/OAuth/OIDC ' +
|
||||
'integration at all.',
|
||||
),
|
||||
});
|
||||
|
||||
export const CodebaseIndexingInputSchema = Type.Object({
|
||||
text: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
export const CodebaseIndexingInputSchema = z.object({
|
||||
text: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
"A detailed, multi-sentence paragraph describing the codebase's directory structure, " +
|
||||
'organization, and significant tools or conventions used (e.g., build orchestration, code ' +
|
||||
'generation, testing frameworks). Focus on how this structure impacts discoverability of ' +
|
||||
'security-relevant components.',
|
||||
}),
|
||||
'organization, and significant tools or conventions used (e.g., build orchestration, code ' +
|
||||
'generation, testing frameworks). Focus on how this structure impacts discoverability of ' +
|
||||
'security-relevant components.',
|
||||
),
|
||||
});
|
||||
|
||||
export const CriticalFilePathsInputSchema = Type.Object({
|
||||
configuration: Type.Array(Type.String({ minLength: 1 }), {
|
||||
description: 'Configuration files (e.g., config/server.yaml, Dockerfile, docker-compose.yml).',
|
||||
}),
|
||||
authentication_and_authorization: Type.Array(Type.String({ minLength: 1 }), {
|
||||
description:
|
||||
export const CriticalFilePathsInputSchema = z.object({
|
||||
configuration: z
|
||||
.array(z.string().min(1))
|
||||
.describe('Configuration files (e.g., config/server.yaml, Dockerfile, docker-compose.yml).'),
|
||||
authentication_and_authorization: z
|
||||
.array(z.string().min(1))
|
||||
.describe(
|
||||
'Auth/authz files (e.g., auth/jwt_middleware.go, internal/user/permissions.go, ' +
|
||||
'config/initializers/session_store.rb, src/services/oauth_callback.js).',
|
||||
}),
|
||||
api_and_routing: Type.Array(Type.String({ minLength: 1 }), {
|
||||
description:
|
||||
'config/initializers/session_store.rb, src/services/oauth_callback.js).',
|
||||
),
|
||||
api_and_routing: z
|
||||
.array(z.string().min(1))
|
||||
.describe(
|
||||
'API and routing files (e.g., cmd/api/main.go, internal/handlers/user_routes.go, ' +
|
||||
'ts/graphql/schema.graphql).',
|
||||
}),
|
||||
data_models_and_db: Type.Array(Type.String({ minLength: 1 }), {
|
||||
description:
|
||||
'ts/graphql/schema.graphql).',
|
||||
),
|
||||
data_models_and_db: z
|
||||
.array(z.string().min(1))
|
||||
.describe(
|
||||
'Data model and DB interaction files (e.g., db/migrations/001_initial.sql, ' +
|
||||
'internal/models/user.go, internal/repository/sql_queries.go).',
|
||||
}),
|
||||
dependency_manifests: Type.Array(Type.String({ minLength: 1 }), {
|
||||
description: 'Dependency manifests (e.g., go.mod, package.json, requirements.txt).',
|
||||
}),
|
||||
sensitive_data_and_secrets: Type.Array(Type.String({ minLength: 1 }), {
|
||||
description:
|
||||
'internal/models/user.go, internal/repository/sql_queries.go).',
|
||||
),
|
||||
dependency_manifests: z
|
||||
.array(z.string().min(1))
|
||||
.describe('Dependency manifests (e.g., go.mod, package.json, requirements.txt).'),
|
||||
sensitive_data_and_secrets: z
|
||||
.array(z.string().min(1))
|
||||
.describe(
|
||||
'Sensitive data and secrets handling (e.g., internal/utils/encryption.go, ' + 'internal/secrets/manager.go).',
|
||||
}),
|
||||
middleware_and_input_validation: Type.Array(Type.String({ minLength: 1 }), {
|
||||
description:
|
||||
),
|
||||
middleware_and_input_validation: z
|
||||
.array(z.string().min(1))
|
||||
.describe(
|
||||
'Middleware and input validation (e.g., internal/middleware/validator.go, ' +
|
||||
'internal/handlers/input_parsers.go).',
|
||||
}),
|
||||
logging_and_monitoring: Type.Array(Type.String({ minLength: 1 }), {
|
||||
description: 'Logging and monitoring (e.g., internal/logging/logger.go, config/monitoring.yaml).',
|
||||
}),
|
||||
infrastructure_and_deployment: Type.Array(Type.String({ minLength: 1 }), {
|
||||
description:
|
||||
'internal/handlers/input_parsers.go).',
|
||||
),
|
||||
logging_and_monitoring: z
|
||||
.array(z.string().min(1))
|
||||
.describe('Logging and monitoring (e.g., internal/logging/logger.go, config/monitoring.yaml).'),
|
||||
infrastructure_and_deployment: z
|
||||
.array(z.string().min(1))
|
||||
.describe(
|
||||
'Infrastructure and deployment (e.g., infra/pulumi/main.go, kubernetes/deploy.yaml, ' +
|
||||
'nginx.conf, gateway-ingress.yaml).',
|
||||
}),
|
||||
'nginx.conf, gateway-ingress.yaml).',
|
||||
),
|
||||
});
|
||||
|
||||
export const XssSinksInputSchema = Type.Object({
|
||||
applicable: Type.Boolean({
|
||||
description:
|
||||
export const XssSinksInputSchema = z.object({
|
||||
applicable: z
|
||||
.boolean()
|
||||
.describe(
|
||||
'False only if the application has no web frontend at all. Otherwise true, even if no ' +
|
||||
'sinks were found in a given category — empty arrays mean "scanned this category, no sinks found".',
|
||||
}),
|
||||
html_body: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'sinks were found in a given category — empty arrays mean "scanned this category, no sinks found".',
|
||||
),
|
||||
html_body: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'HTML Body Context sinks: element.innerHTML, element.outerHTML, document.write(), ' +
|
||||
'document.writeln(), element.insertAdjacentHTML(), Range.createContextualFragment(), ' +
|
||||
'and jQuery sinks like add(), after(), append(), before(), html(), prepend(), replaceWith(), wrap().',
|
||||
}),
|
||||
html_attribute: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'document.writeln(), element.insertAdjacentHTML(), Range.createContextualFragment(), ' +
|
||||
'and jQuery sinks like add(), after(), append(), before(), html(), prepend(), replaceWith(), wrap().',
|
||||
),
|
||||
html_attribute: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'HTML Attribute Context sinks: event handlers (onclick, onerror, onmouseover, onload, onfocus), ' +
|
||||
'URL-based attributes (href, src, formaction, action, background, data), the style attribute, ' +
|
||||
'iframe srcdoc, and general attributes (value, id, class, name, alt) when quotes are escaped.',
|
||||
}),
|
||||
javascript: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'URL-based attributes (href, src, formaction, action, background, data), the style attribute, ' +
|
||||
'iframe srcdoc, and general attributes (value, id, class, name, alt) when quotes are escaped.',
|
||||
),
|
||||
javascript: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'JavaScript Context sinks: eval(), Function() constructor, setTimeout() / setInterval() ' +
|
||||
'with string arguments, and direct writes of user data into a <script> tag.',
|
||||
}),
|
||||
css: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'with string arguments, and direct writes of user data into a <script> tag.',
|
||||
),
|
||||
css: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'CSS Context sinks: element.style properties (e.g., element.style.backgroundImage) and ' +
|
||||
'direct writes of user data into a <style> tag.',
|
||||
}),
|
||||
url: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'direct writes of user data into a <style> tag.',
|
||||
),
|
||||
url: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'URL Context sinks: location / window.location, location.href, location.replace(), ' +
|
||||
'location.assign(), window.open(), history.pushState(), history.replaceState(), ' +
|
||||
'URL.createObjectURL(), and jQuery selector $(userInput) in older versions.',
|
||||
}),
|
||||
'location.assign(), window.open(), history.pushState(), history.replaceState(), ' +
|
||||
'URL.createObjectURL(), and jQuery selector $(userInput) in older versions.',
|
||||
),
|
||||
});
|
||||
|
||||
export const SsrfSinksInputSchema = Type.Object({
|
||||
applicable: Type.Boolean({
|
||||
description:
|
||||
export const SsrfSinksInputSchema = z.object({
|
||||
applicable: z
|
||||
.boolean()
|
||||
.describe(
|
||||
'False only if the application makes no outbound requests at all. Otherwise true, even if ' +
|
||||
'no sinks were found in a given category — empty arrays mean "scanned this category, no sinks found".',
|
||||
}),
|
||||
http_clients: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'no sinks were found in a given category — empty arrays mean "scanned this category, no sinks found".',
|
||||
),
|
||||
http_clients: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'HTTP(S) clients: curl, requests (Python), axios (Node.js), fetch (JavaScript/Node.js), ' +
|
||||
'net/http (Go), HttpClient (Java/.NET), urllib (Python), RestTemplate, WebClient, OkHttp, Apache HttpClient.',
|
||||
}),
|
||||
raw_sockets: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'net/http (Go), HttpClient (Java/.NET), urllib (Python), RestTemplate, WebClient, OkHttp, Apache HttpClient.',
|
||||
),
|
||||
raw_sockets: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'Raw sockets and connect APIs: Socket.connect, net.Dial (Go), socket.connect (Python), ' +
|
||||
'TcpClient, UdpClient, NetworkStream, java.net.Socket, java.net.URL.openConnection().',
|
||||
}),
|
||||
url_openers: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'TcpClient, UdpClient, NetworkStream, java.net.Socket, java.net.URL.openConnection().',
|
||||
),
|
||||
url_openers: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'URL openers and file includes: file_get_contents (PHP), fopen, include_once, require_once, ' +
|
||||
'new URL().openStream() (Java), urllib.urlopen (Python), fs.readFile with URLs, ' +
|
||||
'import() with dynamic URLs, loadHTML / loadXML with external sources.',
|
||||
}),
|
||||
redirect_handlers: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'new URL().openStream() (Java), urllib.urlopen (Python), fs.readFile with URLs, ' +
|
||||
'import() with dynamic URLs, loadHTML / loadXML with external sources.',
|
||||
),
|
||||
redirect_handlers: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'Redirect and "next URL" handlers: auto-follow redirects in HTTP clients, framework Location ' +
|
||||
'handlers (response.redirect), URL validation in redirect chains, "Continue to" / "Return URL" parameters.',
|
||||
}),
|
||||
headless_browsers: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'handlers (response.redirect), URL validation in redirect chains, "Continue to" / "Return URL" parameters.',
|
||||
),
|
||||
headless_browsers: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'Headless browsers and render engines: Puppeteer (page.goto, page.setContent), ' +
|
||||
'Playwright (page.navigate, page.route), Selenium WebDriver navigation, html-to-pdf converters ' +
|
||||
'(wkhtmltopdf, Puppeteer PDF), and SSR with external content.',
|
||||
}),
|
||||
media_processors: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'Playwright (page.navigate, page.route), Selenium WebDriver navigation, html-to-pdf converters ' +
|
||||
'(wkhtmltopdf, Puppeteer PDF), and SSR with external content.',
|
||||
),
|
||||
media_processors: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'Media processors: ImageMagick (convert, identify with URLs), GraphicsMagick, FFmpeg with ' +
|
||||
'network sources, wkhtmltopdf, Ghostscript with URL inputs, image optimization services with URL parameters.',
|
||||
}),
|
||||
link_preview: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'network sources, wkhtmltopdf, Ghostscript with URL inputs, image optimization services with URL parameters.',
|
||||
),
|
||||
link_preview: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'Link preview and unfurlers: chat application link expanders, CMS link preview generators, ' +
|
||||
'oEmbed endpoint fetchers, social media card generators, URL metadata extractors.',
|
||||
}),
|
||||
webhook_testers: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'oEmbed endpoint fetchers, social media card generators, URL metadata extractors.',
|
||||
),
|
||||
webhook_testers: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'Webhook testers and callback verifiers: "ping my webhook" functionality, outbound callback ' +
|
||||
'verification, health check notifications, event delivery confirmations, API endpoint validation tools.',
|
||||
}),
|
||||
sso_oidc_discovery: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'verification, health check notifications, event delivery confirmations, API endpoint validation tools.',
|
||||
),
|
||||
sso_oidc_discovery: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'SSO/OIDC discovery and JWKS fetchers: OpenID Connect discovery endpoints, JWKS fetchers, ' +
|
||||
'OAuth authorization server metadata, SAML metadata fetchers, federation metadata retrievers.',
|
||||
}),
|
||||
importers: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'OAuth authorization server metadata, SAML metadata fetchers, federation metadata retrievers.',
|
||||
),
|
||||
importers: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'Importers and data loaders: "import from URL" functionality, CSV/JSON/XML remote loaders, ' +
|
||||
'RSS/Atom feed readers, API data synchronization, configuration file fetchers.',
|
||||
}),
|
||||
package_installers: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'RSS/Atom feed readers, API data synchronization, configuration file fetchers.',
|
||||
),
|
||||
package_installers: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'Package/plugin/theme installers: "install from URL" features, package managers with remote ' +
|
||||
'sources, plugin/theme downloaders, update mechanisms with remote checks, dependency resolution ' +
|
||||
'with external repos.',
|
||||
}),
|
||||
monitoring_and_health: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'sources, plugin/theme downloaders, update mechanisms with remote checks, dependency resolution ' +
|
||||
'with external repos.',
|
||||
),
|
||||
monitoring_and_health: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'Monitoring and health check frameworks: URL pingers and uptime checkers, health check ' +
|
||||
'endpoints, monitoring probe systems, alerting webhook senders, performance testing tools.',
|
||||
}),
|
||||
cloud_metadata: Type.Array(SinkRefSchema, {
|
||||
description:
|
||||
'endpoints, monitoring probe systems, alerting webhook senders, performance testing tools.',
|
||||
),
|
||||
cloud_metadata: z
|
||||
.array(SinkRefSchema)
|
||||
.describe(
|
||||
'Cloud metadata helpers: AWS/GCP/Azure instance metadata callers, cloud service discovery ' +
|
||||
'mechanisms, container orchestration API clients, infrastructure metadata fetchers, service mesh ' +
|
||||
'configuration retrievers.',
|
||||
}),
|
||||
'mechanisms, container orchestration API clients, infrastructure metadata fetchers, service mesh ' +
|
||||
'configuration retrievers.',
|
||||
),
|
||||
});
|
||||
|
||||
// ============================================================================
|
||||
// EXPORTED TYPES
|
||||
// ============================================================================
|
||||
|
||||
export type ExecutiveSummaryInput = Static<typeof ExecutiveSummaryInputSchema>;
|
||||
export type ApplicationIntelligenceInput = Static<typeof ApplicationIntelligenceInputSchema>;
|
||||
export type AuthDeepDiveInput = Static<typeof AuthDeepDiveInputSchema>;
|
||||
export type CodebaseIndexingInput = Static<typeof CodebaseIndexingInputSchema>;
|
||||
export type CriticalFilePathsInput = Static<typeof CriticalFilePathsInputSchema>;
|
||||
export type XssSinksInput = Static<typeof XssSinksInputSchema>;
|
||||
export type SsrfSinksInput = Static<typeof SsrfSinksInputSchema>;
|
||||
export type ExecutiveSummaryInput = z.infer<typeof ExecutiveSummaryInputSchema>;
|
||||
export type ApplicationIntelligenceInput = z.infer<typeof ApplicationIntelligenceInputSchema>;
|
||||
export type AuthDeepDiveInput = z.infer<typeof AuthDeepDiveInputSchema>;
|
||||
export type CodebaseIndexingInput = z.infer<typeof CodebaseIndexingInputSchema>;
|
||||
export type CriticalFilePathsInput = z.infer<typeof CriticalFilePathsInputSchema>;
|
||||
export type XssSinksInput = z.infer<typeof XssSinksInputSchema>;
|
||||
export type SsrfSinksInput = z.infer<typeof SsrfSinksInputSchema>;
|
||||
|
||||
export interface PreReconData {
|
||||
readonly executive_summary?: ExecutiveSummaryInput;
|
||||
@@ -394,27 +427,32 @@ export type PreReconCallStatus = Readonly<Record<PreReconToolName, PreReconToolS
|
||||
// ============================================================================
|
||||
|
||||
interface ToolResult {
|
||||
[x: string]: unknown;
|
||||
content: Array<{ type: 'text'; text: string }>;
|
||||
details: Record<string, unknown>;
|
||||
isError?: boolean;
|
||||
isError: boolean;
|
||||
}
|
||||
|
||||
function createToolResult(response: { status: string; [key: string]: unknown }): ToolResult {
|
||||
return {
|
||||
content: [{ type: 'text', text: JSON.stringify(response, null, 2) }],
|
||||
isError: response.status === 'error',
|
||||
};
|
||||
}
|
||||
|
||||
function successResult(data: Record<string, unknown>): ToolResult {
|
||||
const response = { status: 'success', ...data };
|
||||
return { content: [{ type: 'text' as const, text: JSON.stringify(response, null, 2) }], details: {} };
|
||||
return createToolResult({ status: 'success', ...data });
|
||||
}
|
||||
|
||||
function errorResult(message: string, errorType = 'ValidationError', retryable = true): ToolResult {
|
||||
const response = { status: 'error', message, errorType, retryable };
|
||||
return { content: [{ type: 'text' as const, text: JSON.stringify(response, null, 2) }], details: {}, isError: true };
|
||||
return createToolResult({ status: 'error', message, errorType, retryable });
|
||||
}
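The result helpers above, combined with the "duplicate calls are rejected" rule the tool descriptions state, can be sketched without the SDK. This is a minimal illustration only: `createOneShotSlot`, the `DuplicateCallError` type, and the rejection message are hypothetical names chosen for the example, since the body of `alreadyCalled` is not shown in this diff. `ToolResult` here is trimmed to the two fields the sketch needs.

```typescript
// Trimmed-down result envelope, mirroring the shape returned by createToolResult.
interface ToolResult {
  content: Array<{ type: 'text'; text: string }>;
  isError: boolean;
}

// isError is derived from the status field, so success and error paths share one builder.
function createToolResult(response: { status: string; [key: string]: unknown }): ToolResult {
  return {
    content: [{ type: 'text', text: JSON.stringify(response, null, 2) }],
    isError: response.status === 'error',
  };
}

// Hypothetical stand-in for one collector field: it can be filled at most once.
function createOneShotSlot<T>(toolName: string) {
  let value: T | undefined;
  return {
    set(input: T): ToolResult {
      if (value !== undefined) {
        // Assumed rejection payload; the real alreadyCalled() may differ.
        return createToolResult({
          status: 'error',
          message: `${toolName} was already called`,
          errorType: 'DuplicateCallError',
          retryable: false,
        });
      }
      value = input;
      return createToolResult({ status: 'success', set: toolName });
    },
    get: (): T | undefined => value,
  };
}

const summary = createOneShotSlot<{ text: string }>('set_executive_summary');
const first = summary.set({ text: 'No critical issues found.' });
const second = summary.set({ text: 'Overwritten?' });
```

In this sketch the second call returns an error result and leaves the stored value untouched, which is the behavior the `if (state.x) return alreadyCalled(...)` guards in the factory appear to implement.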
|
||||
|
||||
// ============================================================================
|
||||
// TOOLS FACTORY
|
||||
// SERVER FACTORY
|
||||
// ============================================================================
|
||||
|
||||
export interface PreReconCollectorServer {
|
||||
tools: ToolDefinition[];
|
||||
server: McpSdkServerConfigWithInstance;
|
||||
getAll(): PreReconData;
|
||||
getCallStatus(): PreReconCallStatus;
|
||||
}
|
||||
@@ -438,122 +476,112 @@ export function createPreReconCollectorServer(): PreReconCollectorServer {
|
||||
);
|
||||
}
|
||||
|
||||
const setExecutiveSummary = defineTool({
|
||||
name: 'set_executive_summary',
|
||||
label: 'Set Executive Summary',
|
||||
description:
|
||||
"Record the application's overall security posture as a short executive summary. " +
|
||||
const setExecutiveSummary = tool(
|
||||
'set_executive_summary',
|
||||
"Record the application's overall security posture as a short executive summary. " +
|
||||
'Call exactly once before terminating. Becomes Section 1 of the rendered deliverable. ' +
|
||||
'Duplicate calls are rejected.',
|
||||
parameters: ExecutiveSummaryInputSchema,
|
||||
execute: async (_toolCallId, input): Promise<ToolResult> => {
|
||||
ExecutiveSummaryInputSchema.shape,
|
||||
async (input): Promise<ToolResult> => {
|
||||
if (state.executive_summary) return alreadyCalled('set_executive_summary');
|
||||
state.executive_summary = input;
|
||||
return successResult({ set: 'set_executive_summary' });
|
||||
},
|
||||
});
|
||||
);
|
||||
|
||||
const setApplicationIntelligence = defineTool({
|
||||
name: 'set_application_intelligence',
|
||||
label: 'Set Application Intelligence',
|
||||
description:
|
||||
'Record the composite application intelligence — architecture, data security, attack surface, ' +
|
||||
const setApplicationIntelligence = tool(
|
||||
'set_application_intelligence',
|
||||
'Record the composite application intelligence — architecture, data security, attack surface, ' +
|
||||
'and infrastructure — in a single call. Call exactly once before terminating. ' +
|
||||
'Becomes Sections 2, 4, 5, and 6 of the rendered deliverable. Duplicate calls are rejected.',
|
||||
parameters: ApplicationIntelligenceInputSchema,
|
||||
execute: async (_toolCallId, input): Promise<ToolResult> => {
|
||||
ApplicationIntelligenceInputSchema.shape,
|
||||
async (input): Promise<ToolResult> => {
|
||||
if (state.application_intelligence) return alreadyCalled('set_application_intelligence');
|
||||
state.application_intelligence = input;
|
||||
return successResult({ set: 'set_application_intelligence' });
|
||||
},
|
||||
});
|
||||
);
|
||||
|
||||
const setAuthDeepDive = defineTool({
|
||||
name: 'set_auth_deep_dive',
|
||||
label: 'Set Auth Deep Dive',
|
||||
description:
|
||||
'Record the authentication & authorization deep dive. Call exactly once before terminating. ' +
|
||||
const setAuthDeepDive = tool(
|
||||
'set_auth_deep_dive',
|
||||
'Record the authentication & authorization deep dive. Call exactly once before terminating. ' +
|
||||
'Becomes Section 3 of the rendered deliverable. Duplicate calls are rejected.',
|
||||
parameters: AuthDeepDiveInputSchema,
|
||||
execute: async (_toolCallId, input): Promise<ToolResult> => {
|
||||
AuthDeepDiveInputSchema.shape,
|
||||
async (input): Promise<ToolResult> => {
|
||||
if (state.auth_deep_dive) return alreadyCalled('set_auth_deep_dive');
|
||||
state.auth_deep_dive = input;
|
||||
return successResult({ set: 'set_auth_deep_dive' });
|
||||
},
|
||||
});
|
||||
);
|
||||
|
||||
const setCodebaseIndexing = defineTool({
|
||||
name: 'set_codebase_indexing',
|
||||
label: 'Set Codebase Indexing',
|
||||
description:
|
||||
'Record the overall codebase indexing narrative. Call exactly once before terminating. ' +
|
||||
const setCodebaseIndexing = tool(
|
||||
'set_codebase_indexing',
|
||||
'Record the overall codebase indexing narrative. Call exactly once before terminating. ' +
|
||||
'Becomes Section 7 of the rendered deliverable. Duplicate calls are rejected.',
|
||||
parameters: CodebaseIndexingInputSchema,
|
||||
execute: async (_toolCallId, input): Promise<ToolResult> => {
|
||||
CodebaseIndexingInputSchema.shape,
|
||||
async (input): Promise<ToolResult> => {
|
||||
if (state.codebase_indexing) return alreadyCalled('set_codebase_indexing');
|
||||
state.codebase_indexing = input;
|
||||
return successResult({ set: 'set_codebase_indexing' });
|
||||
},
|
||||
});
|
||||
);
|
||||
|
||||
const setCriticalFilePaths = defineTool({
|
||||
name: 'set_critical_file_paths',
|
||||
label: 'Set Critical File Paths',
|
||||
description:
|
||||
'Record the catalog of critical file paths grouped by security relevance. Call exactly once ' +
|
||||
const setCriticalFilePaths = tool(
|
||||
'set_critical_file_paths',
|
||||
'Record the catalog of critical file paths grouped by security relevance. Call exactly once ' +
|
||||
'before terminating. Becomes Section 8 of the rendered deliverable. The next agent uses this ' +
|
||||
'as a starting point for manual review. Duplicate calls are rejected.',
|
||||
parameters: CriticalFilePathsInputSchema,
|
||||
execute: async (_toolCallId, input): Promise<ToolResult> => {
|
||||
CriticalFilePathsInputSchema.shape,
|
||||
async (input): Promise<ToolResult> => {
|
||||
if (state.critical_file_paths) return alreadyCalled('set_critical_file_paths');
|
||||
state.critical_file_paths = input;
|
||||
return successResult({ set: 'set_critical_file_paths' });
|
||||
},
|
||||
});
|
||||
);
|
||||
|
||||
const setXssSinks = defineTool({
|
||||
name: 'set_xss_sinks',
|
||||
label: 'Set Xss Sinks',
|
||||
description:
|
||||
'Record discovered XSS sinks grouped by render context. Call exactly once before terminating. ' +
|
||||
const setXssSinks = tool(
|
||||
'set_xss_sinks',
|
||||
'Record discovered XSS sinks grouped by render context. Call exactly once before terminating. ' +
|
||||
'If the application has no web frontend at all, set applicable=false; otherwise populate each ' +
|
||||
'render-context array (empty arrays mean "scanned, no sinks of this kind"). This list drives ' +
|
||||
"the vuln-xss agent's testing todos downstream. Becomes Section 9 of the rendered deliverable. " +
|
||||
'Duplicate calls are rejected.',
|
||||
parameters: XssSinksInputSchema,
|
||||
execute: async (_toolCallId, input): Promise<ToolResult> => {
|
||||
XssSinksInputSchema.shape,
|
||||
async (input): Promise<ToolResult> => {
|
||||
if (state.xss_sinks) return alreadyCalled('set_xss_sinks');
|
||||
state.xss_sinks = input;
|
||||
return successResult({ set: 'set_xss_sinks' });
|
||||
},
|
||||
});
|
||||
);
|
||||
|
||||
const setSsrfSinks = defineTool({
|
||||
name: 'set_ssrf_sinks',
|
||||
label: 'Set Ssrf Sinks',
|
||||
description:
|
||||
'Record discovered SSRF sinks grouped by sink category. Call exactly once before terminating. ' +
|
||||
const setSsrfSinks = tool(
|
||||
'set_ssrf_sinks',
|
||||
'Record discovered SSRF sinks grouped by sink category. Call exactly once before terminating. ' +
|
||||
'If the application makes no outbound requests at all, set applicable=false; otherwise populate ' +
|
||||
'each category array (empty arrays mean "scanned, no sinks of this kind"). This list drives ' +
|
||||
"the vuln-ssrf agent's testing todos downstream. Becomes Section 10 of the rendered deliverable. " +
|
||||
'Duplicate calls are rejected.',
|
||||
parameters: SsrfSinksInputSchema,
|
||||
execute: async (_toolCallId, input): Promise<ToolResult> => {
|
||||
SsrfSinksInputSchema.shape,
|
||||
async (input): Promise<ToolResult> => {
|
||||
if (state.ssrf_sinks) return alreadyCalled('set_ssrf_sinks');
|
||||
state.ssrf_sinks = input;
|
||||
return successResult({ set: 'set_ssrf_sinks' });
|
||||
},
|
||||
});
|
||||
);
|
||||
|
||||
const tools: ToolDefinition[] = [
|
||||
setExecutiveSummary,
|
||||
setApplicationIntelligence,
|
||||
setAuthDeepDive,
|
||||
setCodebaseIndexing,
|
||||
setCriticalFilePaths,
|
||||
setXssSinks,
|
||||
setSsrfSinks,
|
||||
];
|
||||
const server: McpSdkServerConfigWithInstance = createSdkMcpServer({
|
||||
name: 'pre-recon-collector',
|
||||
version: '1.0.0',
|
||||
tools: [
|
||||
setExecutiveSummary,
|
||||
setApplicationIntelligence,
|
||||
setAuthDeepDive,
|
||||
setCodebaseIndexing,
|
||||
setCriticalFilePaths,
|
||||
setXssSinks,
|
||||
setSsrfSinks,
|
||||
],
|
||||
});
|
||||
|
||||
function statusOf<K extends PreReconToolName>(key: K): PreReconToolStatus {
|
||||
const flagMap: Record<PreReconToolName, unknown> = {
|
||||
@@ -569,7 +597,7 @@ export function createPreReconCollectorServer(): PreReconCollectorServer {
|
||||
}
|
||||
|
||||
return {
|
||||
tools,
|
||||
server,
|
||||
getAll: (): PreReconData => ({
|
||||
...(state.executive_summary && { executive_summary: state.executive_summary }),
|
||||
...(state.application_intelligence && { application_intelligence: state.application_intelligence }),
|
||||
|
||||
File diff suppressed because it is too large
@@ -5,9 +5,9 @@
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* Vuln Collector tools (factory parameterized by vulnerability class).
|
||||
* Vuln Collector MCP Server (factory parameterized by vulnerability class).
|
||||
*
|
||||
* Exposes 4 one-shot, TypeBox-validated tools per vuln agent (injection, xss,
|
||||
* Exposes 4 one-shot, Zod-validated MCP tools per vuln agent (injection, xss,
|
||||
* auth, ssrf, authz) that feed a deterministic renderer producing
|
||||
* {class}_analysis_deliverable.md:
|
||||
* - set_findings_summary — §1 executive summary + §2 dominant patterns
|
||||
@@ -20,13 +20,14 @@
|
||||
* across classes.
|
||||
*
|
||||
* Skipped tools surface as renderer placeholders, not activity failures.
|
||||
* getCallStatus() exposes the per-run call pattern for logging. Each schema's
|
||||
* field-level descriptions carry the section guidance, so the agent's tool
|
||||
* catalog surfaces it.
|
||||
* getCallStatus() exposes the per-run call pattern for logging. Each Zod
|
||||
* schema's field-level descriptions carry the section guidance, so the SDK
|
||||
* injects it into the agent's tool catalog.
|
||||
*/
|
||||
|
||||
import { defineTool, type ToolDefinition } from '@earendil-works/pi-coding-agent';
|
||||
import { type Static, Type } from 'typebox';
|
||||
import type { McpSdkServerConfigWithInstance } from '@anthropic-ai/claude-agent-sdk';
|
||||
import { createSdkMcpServer, tool } from '@anthropic-ai/claude-agent-sdk';
|
||||
import { type ZodRawShape, z } from 'zod';
|
||||
|
||||
// ============================================================================
|
||||
// CLASS DISCRIMINATOR
|
||||
@@ -45,262 +46,286 @@ export const BLIND_SPOTS_CLASSES: ReadonlySet<VulnClass> = new Set<VulnClass>(['
|
||||
// SHARED SCHEMAS — set_findings_summary, set_safe_vectors, set_blind_spots
|
||||
// ============================================================================
|
||||
|
||||
const PatternSchema = Type.Object({
|
||||
name: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
const PatternSchema = z.object({
|
||||
name: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'Concise pattern name, e.g. "Weak Session Management", "Reflected XSS in Search Parameter", ' +
|
||||
'"Insufficient URL Validation".',
|
||||
}),
|
||||
description: Type.String({
|
||||
minLength: 1,
|
||||
description: 'One- to two-sentence description of the pattern observed in the codebase.',
|
||||
}),
|
||||
implication: Type.String({
|
||||
minLength: 1,
|
||||
description: 'One- to two-sentence implication for exploitation — what does this pattern enable an attacker to do.',
|
||||
}),
|
||||
representative_finding_ids: Type.Array(Type.String({ minLength: 1 }), {
|
||||
minItems: 1,
|
||||
description:
|
||||
'"Insufficient URL Validation".',
|
||||
),
|
||||
description: z.string().min(1).describe('One- to two-sentence description of the pattern observed in the codebase.'),
|
||||
implication: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe('One- to two-sentence implication for exploitation — what does this pattern enable an attacker to do.'),
|
||||
representative_finding_ids: z
|
||||
.array(z.string().min(1))
|
||||
.min(1)
|
||||
.describe(
|
||||
'IDs of findings that exhibit this pattern (e.g. ["AUTH-VULN-01", "AUTH-VULN-02"]). Must match ' +
|
||||
'IDs the agent has assigned in the structured-output exploitation queue.',
|
||||
}),
|
||||
'IDs the agent has assigned in the structured-output exploitation queue.',
|
||||
),
|
||||
});
|
||||
|
||||
export const FindingsSummaryInputSchema = Type.Object({
|
||||
key_outcome: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
export const FindingsSummaryInputSchema = z.object({
|
||||
key_outcome: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'One to two sentences capturing the headline result of your analysis — what was found and its ' +
|
||||
'severity profile (e.g. "Several high-confidence SQL injection vulnerabilities were identified; ' +
|
||||
'all findings have been passed to the exploitation phase"). Becomes Section 1 of the rendered ' +
|
||||
'deliverable.',
|
||||
}),
|
||||
patterns: Type.Array(PatternSchema, {
|
||||
description:
|
||||
'severity profile (e.g. "Several high-confidence SQL injection vulnerabilities were identified; ' +
|
||||
'all findings have been passed to the exploitation phase"). Becomes Section 1 of the rendered ' +
|
||||
'deliverable.',
|
||||
),
|
||||
patterns: z
|
||||
.array(PatternSchema)
|
||||
.describe(
|
||||
'Complete list of dominant patterns observed across findings. Pass all patterns in one call. ' +
|
||||
'Empty array is acceptable if no recurring patterns were observed — the deliverable will render ' +
|
||||
'"No dominant patterns identified" for Section 2 in that case.',
|
||||
}),
|
||||
'Empty array is acceptable if no recurring patterns were observed — the deliverable will render ' +
|
||||
'"No dominant patterns identified" for Section 2 in that case.',
|
||||
),
|
||||
});
|
||||
|
||||
export const SafeVectorInputSchema = Type.Object({
|
||||
subject: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
export const SafeVectorInputSchema = z.object({
|
||||
subject: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'The specific subject of analysis. For injection/xss runs, the input parameter name (e.g. ' +
|
||||
'"username", "redirect_url"). For auth/ssrf runs, the component or flow name (e.g. ' +
|
||||
'"Password Hashing", "Webhook Configuration"). For authz runs, the endpoint (e.g. ' +
|
||||
'"POST /api/auth/logout"). The renderer maps this to the class-appropriate column header.',
|
||||
}),
|
||||
location: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
'"username", "redirect_url"). For auth/ssrf runs, the component or flow name (e.g. ' +
|
||||
'"Password Hashing", "Webhook Configuration"). For authz runs, the endpoint (e.g. ' +
|
||||
'"POST /api/auth/logout"). The renderer maps this to the class-appropriate column header.',
|
||||
),
|
||||
location: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'File path with line number (e.g. "controllers/authController.js:45") or endpoint URL (e.g. ' +
|
||||
'"/profile"). For authz runs, this is the guard location specifically (e.g. ' +
|
||||
'"middleware/auth.js:45"). The renderer maps this to the class-appropriate column header.',
|
||||
}),
|
||||
defense_mechanism: Type.String({
|
||||
minLength: 1,
|
||||
description:
|
||||
'"/profile"). For authz runs, this is the guard location specifically (e.g. ' +
|
||||
'"middleware/auth.js:45"). The renderer maps this to the class-appropriate column header.',
|
||||
),
|
||||
defense_mechanism: z
|
||||
.string()
|
||||
.min(1)
|
||||
.describe(
|
||||
'The robust defense observed (e.g. "Prepared Statement (Parameter Binding)", "HTML Entity ' +
|
||||
'Encoding", "Strict URL Whitelist Validation", "bcrypt.compare for constant-time check").',
|
||||
}),
|
||||
render_context: Type.Optional(
|
||||
Type.Union([Type.String(), Type.Null()], {
|
||||
description:
|
||||
'XSS-only: the DOM render context for the validated vector — one of HTML_BODY, HTML_ATTRIBUTE, ' +
|
||||
'Encoding", "Strict URL Whitelist Validation", "bcrypt.compare for constant-time check").',
|
||||
),
|
||||
render_context: z
|
||||
.string()
|
||||
.nullable()
|
||||
.optional()
|
||||
.describe(
|
||||
'XSS-only: the DOM render context for the validated vector — one of HTML_BODY, HTML_ATTRIBUTE, ' +
|
||||
'JAVASCRIPT_STRING, URL_PARAM, CSS_VALUE. Omit (or pass null) for non-XSS classes; the renderer ' +
|
||||
'only emits this column for the XSS deliverable.',
|
||||
}),
|
||||
),
|
||||
),
|
||||
});
|
||||
|
||||
export const SafeVectorsInputSchema = Type.Object({
|
||||
vectors: Type.Array(SafeVectorInputSchema, {
|
||||
description:
|
||||
export const SafeVectorsInputSchema = z.object({
|
||||
vectors: z
|
||||
.array(SafeVectorInputSchema)
|
||||
.describe(
|
||||
'All input vectors / components / endpoints that were analyzed and confirmed to have robust, ' +
|
||||
'context-appropriate defenses. Empty array is acceptable but unusual — the deliverable will ' +
|
||||
'render "No vectors confirmed secure during analysis" for Section 4 in that case. Becomes ' +
|
||||
'Section 4 of the rendered deliverable. The renderer sorts by (subject, location) before ' +
|
||||
'rendering, so emission order does not affect output.',
|
||||
}),
|
||||
'context-appropriate defenses. Empty array is acceptable but unusual — the deliverable will ' +
|
||||
'render "No vectors confirmed secure during analysis" for Section 4 in that case. Becomes ' +
|
||||
'Section 4 of the rendered deliverable. The renderer sorts by (subject, location) before ' +
|
||||
'rendering, so emission order does not affect output.',
|
||||
),
|
||||
});

export const BlindSpotItemSchema = Type.Object({
heading: Type.String({
minLength: 1,
description:
export const BlindSpotItemSchema = z.object({
heading: z
.string()
.min(1)
.describe(
'Short heading for the blind spot (e.g. "Untraced Asynchronous Flows", ' +
'"Limited Visibility into Stored Procedures", "Minified JavaScript Bundle").',
}),
description: Type.String({
minLength: 1,
description:
'"Limited Visibility into Stored Procedures", "Minified JavaScript Bundle").',
),
description: z
.string()
.min(1)
.describe(
'One to three sentences describing the analysis gap — what could not be traced, why, and what ' +
'the residual risk is.',
}),
'the residual risk is.',
),
});

export const BlindSpotsInputSchema = Type.Object({
items: Type.Array(BlindSpotItemSchema, {
description:
export const BlindSpotsInputSchema = z.object({
items: z
.array(BlindSpotItemSchema)
.describe(
'Analysis constraints, untraced code paths, or other coverage gaps that should be noted. ' +
'Empty array is acceptable on high-coverage runs — the deliverable will render "No analysis ' +
'constraints or blind spots identified" for Section 5 in that case. Becomes Section 5 of the ' +
'rendered deliverable.',
}),
'Empty array is acceptable on high-coverage runs — the deliverable will render "No analysis ' +
'constraints or blind spots identified" for Section 5 in that case. Becomes Section 5 of the ' +
'rendered deliverable.',
),
});

// ============================================================================
// PER-CLASS set_strategic_intelligence SCHEMAS (flat — no nesting)
// ============================================================================

const InjectionStrategicIntelSchema = Type.Object({
defensive_evasion_waf: Type.String({
minLength: 1,
description:
const InjectionStrategicIntelSchema = z.object({
defensive_evasion_waf: z
.string()
.min(1)
.describe(
'WAF behavior observed during analysis: active rules, common payloads blocked, identified ' +
'bypasses (e.g. "WAF blocks UNION SELECT but not time-based blind injection"). Write ' +
'"Not applicable — no WAF observed" if none was detected.',
}),
error_based_potential: Type.String({
minLength: 1,
description:
'bypasses (e.g. "WAF blocks UNION SELECT but not time-based blind injection"). Write ' +
'"Not applicable — no WAF observed" if none was detected.',
),
error_based_potential: z
.string()
.min(1)
.describe(
'Whether endpoints leak verbose database errors that enable error-based injection (e.g. ' +
'"/api/products returns verbose PostgreSQL error messages, prime target for error-based ' +
'exploitation"). Write "Not applicable" if no injection findings exist.',
}),
confirmed_database_technology: Type.String({
minLength: 1,
description:
'"/api/products returns verbose PostgreSQL error messages, prime target for error-based ' +
'exploitation"). Write "Not applicable" if no injection findings exist.',
),
confirmed_database_technology: z
.string()
.min(1)
.describe(
'Database engine(s) confirmed via error syntax or function calls (e.g. "PostgreSQL, confirmed ' +
'via pg_sleep() and verbose error syntax"). Drives payload selection downstream. Write ' +
'"Not applicable" if no DB sinks in scope.',
}),
'via pg_sleep() and verbose error syntax"). Drives payload selection downstream. Write ' +
'"Not applicable" if no DB sinks in scope.',
),
});

const XssStrategicIntelSchema = Type.Object({
csp_analysis: Type.String({
minLength: 1,
description:
const XssStrategicIntelSchema = z.object({
csp_analysis: z
.string()
.min(1)
.describe(
'Content Security Policy observed and its bypassability: current policy text, critical bypasses ' +
"(e.g. \"script-src 'self' https://trusted-cdn.com — the trusted CDN hosts vulnerable AngularJS, " +
'enabling client-side template injection bypass"). Write "Not applicable — no CSP header served" ' +
'if none.',
}),
cookie_security: Type.String({
minLength: 1,
description:
"(e.g. \"script-src 'self' https://trusted-cdn.com — the trusted CDN hosts vulnerable AngularJS, " +
'enabling client-side template injection bypass"). Write "Not applicable — no CSP header served" ' +
'if none.',
),
cookie_security: z
.string()
.min(1)
.describe(
'Session cookie security observations: HttpOnly, Secure, SameSite flags, and storage mechanism ' +
'(e.g. "Primary session cookie `sessionid` is missing HttpOnly; tokens are also stored in ' +
'localStorage, both accessible to JavaScript"). Drives exfiltration strategy.',
}),
'(e.g. "Primary session cookie `sessionid` is missing HttpOnly; tokens are also stored in ' +
'localStorage, both accessible to JavaScript"). Drives exfiltration strategy.',
),
});

const AuthStrategicIntelSchema = Type.Object({
authentication_method: Type.String({
minLength: 1,
description:
const AuthStrategicIntelSchema = z.object({
authentication_method: z
.string()
.min(1)
.describe(
'How users authenticate: JWT, session cookie, OAuth, SAML, etc. Include any algorithm or library ' +
'details (e.g. "JWT (RS256) with hardcoded private key in lib/insecurity.ts:23").',
}),
session_token_details: Type.String({
minLength: 1,
description:
'details (e.g. "JWT (RS256) with hardcoded private key in lib/insecurity.ts:23").',
),
session_token_details: z
.string()
.min(1)
.describe(
'Where tokens live and how they are protected: cookie name, storage mechanism (cookie vs ' +
'localStorage), cookie flags, expiration (e.g. "JWT stored in localStorage under key `token`; ' +
'cookie copy lacks HttpOnly/Secure/SameSite; 6-hour TTL with no revocation").',
}),
password_policy: Type.String({
minLength: 1,
description:
'localStorage), cookie flags, expiration (e.g. "JWT stored in localStorage under key `token`; ' +
'cookie copy lacks HttpOnly/Secure/SameSite; 6-hour TTL with no revocation").',
),
password_policy: z
.string()
.min(1)
.describe(
'Observed server-side password policy and storage: complexity rules, hashing algorithm, salt, ' +
'(e.g. "MD5 without salt via crypto.createHash; no server-side complexity policy; client-side ' +
'5-char minimum trivially bypassed").',
}),
'(e.g. "MD5 without salt via crypto.createHash; no server-side complexity policy; client-side ' +
'5-char minimum trivially bypassed").',
),
});

const SsrfStrategicIntelSchema = Type.Object({
http_client_library: Type.String({
minLength: 1,
description:
const SsrfStrategicIntelSchema = z.object({
http_client_library: z
.string()
.min(1)
.describe(
'HTTP client library/libraries used for outbound requests (e.g. "axios 1.6", "node-fetch", ' +
'"requests", "HttpClient (Spring)"). Include version where it informs known bypass techniques.',
}),
request_architecture: Type.String({
minLength: 1,
description:
'"requests", "HttpClient (Spring)"). Include version where it informs known bypass techniques.',
),
request_architecture: z
.string()
.min(1)
.describe(
'How outbound requests are constructed and routed: proxy/middleware patterns, internal routing ' +
'rules (e.g. "Webhook URLs are POSTed directly without an outbound proxy; redirects are ' +
'followed by default with no maxRedirects limit").',
}),
internal_services: Type.String({
minLength: 1,
description:
'rules (e.g. "Webhook URLs are POSTed directly without an outbound proxy; redirects are ' +
'followed by default with no maxRedirects limit").',
),
internal_services: z
.string()
.min(1)
.describe(
'Internal endpoints, services, or cloud-metadata addresses discovered during analysis that an ' +
'SSRF could reach (e.g. "169.254.169.254 (AWS IMDS), internal admin API at admin.internal:8443, ' +
'PostgreSQL on localhost:5432").',
}),
'SSRF could reach (e.g. "169.254.169.254 (AWS IMDS), internal admin API at admin.internal:8443, ' +
'PostgreSQL on localhost:5432").',
),
});

const AuthzStrategicIntelSchema = Type.Object({
session_management_architecture: Type.String({
minLength: 1,
description:
const AuthzStrategicIntelSchema = z.object({
session_management_architecture: z
.string()
.min(1)
.describe(
'Session and authentication architecture relevant to authorization decisions: where user identity ' +
'comes from, whether the user ID is trusted by downstream guards (e.g. "JWT tokens in cookies; ' +
'user ID extracted from `req.user.id` and used directly in DB queries without ownership ' +
're-validation").',
}),
role_permission_model: Type.String({
minLength: 1,
description:
'comes from, whether the user ID is trusted by downstream guards (e.g. "JWT tokens in cookies; ' +
'user ID extracted from `req.user.id` and used directly in DB queries without ownership ' +
're-validation").',
),
role_permission_model: z
.string()
.min(1)
.describe(
'Roles, capabilities, and where they live: identified roles, their privilege levels, and where ' +
'role/permission data is stored (e.g. "Three roles: user, moderator, admin. Role embedded in ' +
'JWT and database; checks inconsistent — many admin routes only check `req.user` presence").',
}),
resource_access_patterns: Type.String({
minLength: 1,
description:
'role/permission data is stored (e.g. "Three roles: user, moderator, admin. Role embedded in ' +
'JWT and database; checks inconsistent — many admin routes only check `req.user` presence").',
),
resource_access_patterns: z
.string()
.min(1)
.describe(
'How resource IDs flow through the system and ownership patterns: e.g. "Most endpoints use path ' +
'parameters for resource IDs (/api/users/{id}); IDs are passed to DB queries without ownership ' +
'validation". Critical for IDOR exploitation.',
}),
workflow_implementation: Type.String({
minLength: 1,
description:
'parameters for resource IDs (/api/users/{id}); IDs are passed to DB queries without ownership ' +
'validation". Critical for IDOR exploitation.',
),
workflow_implementation: z
.string()
.min(1)
.describe(
'Multi-step processes and state transitions: how workflow stages are tracked, whether prior-state ' +
'checks are enforced (e.g. "Multi-step processes use status fields in database; status ' +
'transitions do not verify prior state completion"). Drives context-based authz exploitation.',
}),
'checks are enforced (e.g. "Multi-step processes use status fields in database; status ' +
'transitions do not verify prior state completion"). Drives context-based authz exploitation.',
),
});

const STRATEGIC_INTEL_SCHEMAS = {
const STRATEGIC_INTEL_SCHEMAS: Record<VulnClass, z.ZodObject<ZodRawShape>> = {
injection: InjectionStrategicIntelSchema,
xss: XssStrategicIntelSchema,
auth: AuthStrategicIntelSchema,
ssrf: SsrfStrategicIntelSchema,
authz: AuthzStrategicIntelSchema,
} as const;
};

// ============================================================================
// EXPORTED TYPES
// ============================================================================

export type Pattern = Static<typeof PatternSchema>;
export type FindingsSummaryInput = Static<typeof FindingsSummaryInputSchema>;
export type SafeVectorInput = Static<typeof SafeVectorInputSchema>;
export type SafeVectorsInput = Static<typeof SafeVectorsInputSchema>;
export type BlindSpotItem = Static<typeof BlindSpotItemSchema>;
export type BlindSpotsInput = Static<typeof BlindSpotsInputSchema>;
export type Pattern = z.infer<typeof PatternSchema>;
export type FindingsSummaryInput = z.infer<typeof FindingsSummaryInputSchema>;
export type SafeVectorInput = z.infer<typeof SafeVectorInputSchema>;
export type SafeVectorsInput = z.infer<typeof SafeVectorsInputSchema>;
export type BlindSpotItem = z.infer<typeof BlindSpotItemSchema>;
export type BlindSpotsInput = z.infer<typeof BlindSpotsInputSchema>;

export type InjectionStrategicIntel = Static<typeof InjectionStrategicIntelSchema>;
export type XssStrategicIntel = Static<typeof XssStrategicIntelSchema>;
export type AuthStrategicIntel = Static<typeof AuthStrategicIntelSchema>;
export type SsrfStrategicIntel = Static<typeof SsrfStrategicIntelSchema>;
export type AuthzStrategicIntel = Static<typeof AuthzStrategicIntelSchema>;
export type InjectionStrategicIntel = z.infer<typeof InjectionStrategicIntelSchema>;
export type XssStrategicIntel = z.infer<typeof XssStrategicIntelSchema>;
export type AuthStrategicIntel = z.infer<typeof AuthStrategicIntelSchema>;
export type SsrfStrategicIntel = z.infer<typeof SsrfStrategicIntelSchema>;
export type AuthzStrategicIntel = z.infer<typeof AuthzStrategicIntelSchema>;

// Discriminated by the agent class context — the renderer reads only the
// sub-fields that apply to the active class.
@@ -338,14 +363,12 @@ export type VulnCallStatus = Readonly<Record<VulnToolName, VulnToolStatus>>;
interface ToolResult {
[x: string]: unknown;
content: Array<{ type: 'text'; text: string }>;
details: Record<string, unknown>;
isError: boolean;
}

function createToolResult(response: { status: string; [key: string]: unknown }): ToolResult {
return {
content: [{ type: 'text' as const, text: JSON.stringify(response, null, 2) }],
details: {},
content: [{ type: 'text', text: JSON.stringify(response, null, 2) }],
isError: response.status === 'error',
};
}
@@ -359,11 +382,11 @@ function errorResult(message: string, errorType = 'ValidationError', retryable =
}

// ============================================================================
// COLLECTOR FACTORY
// SERVER FACTORY
// ============================================================================

export interface VulnCollectorServer {
tools: ToolDefinition[];
server: McpSdkServerConfigWithInstance;
getAll(): VulnCollectorData;
getCallStatus(): VulnCallStatus;
}
@@ -384,76 +407,68 @@ export function createVulnCollector(vulnClass: VulnClass): VulnCollectorServer {
);
}

const setFindingsSummary = defineTool({
name: 'set_findings_summary',
label: 'Set Findings Summary',
description:
'Record the executive summary headline and the dominant vulnerability patterns observed across ' +
const setFindingsSummary = tool(
'set_findings_summary',
'Record the executive summary headline and the dominant vulnerability patterns observed across ' +
'your findings. Call exactly once before terminating. Becomes Section 1 (key outcome) and ' +
'Section 2 (patterns) of the rendered deliverable — this is the load-bearing emission for the ' +
'narrative .md and is required. Duplicate calls return "already called" and are no-ops. Empty ' +
'patterns array is acceptable (renders as "No dominant patterns identified") but key_outcome ' +
'is always required.',
parameters: FindingsSummaryInputSchema,
execute: async (_toolCallId, input): Promise<ToolResult> => {
FindingsSummaryInputSchema.shape,
async (input): Promise<ToolResult> => {
if (state.findings_summary) return alreadyCalled('set_findings_summary');
state.findings_summary = input;
return successResult({ set: 'set_findings_summary' });
},
});
);

const intelSchema = STRATEGIC_INTEL_SCHEMAS[vulnClass];
const setStrategicIntelligence = defineTool({
name: 'set_strategic_intelligence',
label: 'Set Strategic Intelligence',
description:
`Record the environmental and defensive intelligence relevant to exploiting the ${vulnClass} ` +
const setStrategicIntelligence = tool(
'set_strategic_intelligence',
`Record the environmental and defensive intelligence relevant to exploiting the ${vulnClass} ` +
'findings. Call exactly once before terminating. Becomes Section 3 of the rendered deliverable ' +
`and is the section the downstream exploit-${vulnClass} agent reads for strategic context. ` +
'Required. Duplicate calls return "already called" and are no-ops. Write "Not applicable" as ' +
'the field value when a sub-field does not apply to this run (rather than omitting).',
parameters: intelSchema,
execute: async (_toolCallId, input): Promise<ToolResult> => {
intelSchema.shape,
async (input): Promise<ToolResult> => {
if (state.strategic_intelligence) return alreadyCalled('set_strategic_intelligence');
state.strategic_intelligence = input as unknown as StrategicIntelligenceInput;
return successResult({ set: 'set_strategic_intelligence' });
},
});
);

const setSafeVectors = defineTool({
name: 'set_safe_vectors',
label: 'Set Safe Vectors',
description:
'Record the input vectors, components, or endpoints that were analyzed and confirmed to have ' +
const setSafeVectors = tool(
'set_safe_vectors',
'Record the input vectors, components, or endpoints that were analyzed and confirmed to have ' +
'robust, context-appropriate defenses. Call exactly once before terminating. Becomes Section 4 ' +
'of the rendered deliverable. Recommended (empty array is acceptable on runs where no vectors ' +
'were validated as safe, but explicit emission is preferred). The renderer sorts by ' +
'(subject, location) before rendering, so emission order does not affect output. Duplicate ' +
'calls return "already called" and are no-ops.',
parameters: SafeVectorsInputSchema,
execute: async (_toolCallId, input): Promise<ToolResult> => {
SafeVectorsInputSchema.shape,
async (input): Promise<ToolResult> => {
if (state.safe_vectors) return alreadyCalled('set_safe_vectors');
state.safe_vectors = input;
return successResult({ set: 'set_safe_vectors', count: input.vectors.length });
},
});
);

const setBlindSpots = defineTool({
name: 'set_blind_spots',
label: 'Set Blind Spots',
description:
'Record analysis constraints, untraced code paths, or other coverage gaps. Call exactly once ' +
const setBlindSpots = tool(
'set_blind_spots',
'Record analysis constraints, untraced code paths, or other coverage gaps. Call exactly once ' +
'before terminating. Becomes Section 5 of the rendered deliverable. Recommended (empty array ' +
'is acceptable on high-coverage runs, but explicit emission is preferred — readers expect ' +
'either documented gaps or an explicit "no gaps" signal). Duplicate calls return "already ' +
'called" and are no-ops.',
parameters: BlindSpotsInputSchema,
execute: async (_toolCallId, input): Promise<ToolResult> => {
BlindSpotsInputSchema.shape,
async (input): Promise<ToolResult> => {
if (state.blind_spots) return alreadyCalled('set_blind_spots');
state.blind_spots = input;
return successResult({ set: 'set_blind_spots', count: input.items.length });
},
});
);
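Each tool above follows the same "call exactly once" contract: the first call stores its input, and a duplicate call reports "already called" without changing state. Below is a dependency-free TypeScript sketch of that contract. `SketchResult`, `toResult`, and `createSetOnce` are hypothetical names introduced for illustration, and the status reported for a duplicate call is an assumption, since the diff does not show the body of `alreadyCalled`.

```typescript
// Hypothetical result shape, loosely modeled on the ToolResult fields in the diff.
interface SketchResult {
  content: Array<{ type: 'text'; text: string }>;
  isError: boolean;
}

// Mirrors the createToolResult idea: isError is derived from the status field.
function toResult(response: { status: string; [key: string]: unknown }): SketchResult {
  return {
    content: [{ type: 'text', text: JSON.stringify(response, null, 2) }],
    isError: response.status === 'error',
  };
}

// Builds a handler that stores the first payload and ignores later ones.
function createSetOnce<T>(toolName: string) {
  let stored: T | undefined;
  return {
    handle(input: T): SketchResult {
      if (stored !== undefined) {
        // Duplicate call: report it and leave the stored value untouched.
        // Assumption: reported as a non-error status.
        return toResult({ status: 'success', message: `${toolName} already called` });
      }
      stored = input;
      return toResult({ status: 'success', set: toolName });
    },
    get: (): T | undefined => stored,
  };
}

const blindSpots = createSetOnce<{ items: string[] }>('set_blind_spots');
const first = blindSpots.handle({ items: ['Minified JavaScript Bundle'] });
const second = blindSpots.handle({ items: [] });
```

Keeping the stored value in a closure means a retried or repeated tool call cannot overwrite the first emission, which is what makes the duplicate call a no-op.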

// set_blind_spots is withheld from classes without a Section 5 (auth, ssrf).
const tools = [
@@ -463,6 +478,12 @@ export function createVulnCollector(vulnClass: VulnClass): VulnCollectorServer {
...(BLIND_SPOTS_CLASSES.has(vulnClass) ? [setBlindSpots] : []),
];
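The conditional spread above is what withholds `set_blind_spots` from the classes that have no Section 5. Below is a small sketch of the pattern using tool names instead of tool objects. The membership of `BLIND_SPOTS_CLASSES` is inferred from the comment in the diff (auth and ssrf excluded), and the first three entries are an assumption, because the hunk elides the start of the real array.

```typescript
type VulnClass = 'injection' | 'xss' | 'auth' | 'ssrf' | 'authz';

// Assumed membership: the comment in the diff says auth and ssrf have no Section 5.
const BLIND_SPOTS_CLASSES: ReadonlySet<VulnClass> = new Set<VulnClass>(['injection', 'xss', 'authz']);

function toolNamesFor(vulnClass: VulnClass): string[] {
  return [
    // Assumed to be registered for every class (the hunk elides these entries).
    'set_findings_summary',
    'set_strategic_intelligence',
    'set_safe_vectors',
    // Spreading an empty array contributes nothing, so the tool is simply absent.
    ...(BLIND_SPOTS_CLASSES.has(vulnClass) ? ['set_blind_spots'] : []),
  ];
}
```

Building the list this way means an agent for `auth` or `ssrf` never sees the tool at all, rather than seeing it and being told not to call it.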

const server: McpSdkServerConfigWithInstance = createSdkMcpServer({
name: 'vuln-collector',
version: '1.0.0',
tools,
});

function statusOf<K extends VulnToolName>(key: K): VulnToolStatus {
const flagMap: Record<VulnToolName, unknown> = {
set_findings_summary: state.findings_summary,
@@ -474,7 +495,7 @@ export function createVulnCollector(vulnClass: VulnClass): VulnCollectorServer {
}

return {
tools: tools as ToolDefinition[],
server,
getAll: (): VulnCollectorData => ({
...(state.findings_summary && { findings_summary: state.findings_summary }),
...(state.strategic_intelligence && { strategic_intelligence: state.strategic_intelligence }),

@@ -1,7 +1,6 @@
/** Centralized path constants for the worker package */

import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';

/** Worker package root (apps/worker/) resolved from compiled dist/ files */
@@ -10,11 +9,6 @@ const WORKER_ROOT = path.resolve(import.meta.dirname, '..');
export const PROMPTS_DIR = path.join(WORKER_ROOT, 'prompts');
export const CONFIGS_DIR = path.join(WORKER_ROOT, 'configs');

export const PLAYWRIGHT_SKILL_DIR = path.join(os.homedir(), '.claude', 'skills', 'playwright-cli');

/** Compiled pi extension dir that enforces bounded `bash` timeouts (resolved from dist/) */
export const BASH_TIMEOUT_EXTENSION_DIR = path.join(import.meta.dirname, 'ai', 'extensions', 'bash-timeout');

/** Default deliverables subdirectory relative to repoPath */
export const DEFAULT_DELIVERABLES_SUBDIR = '.shannon/deliverables';


@@ -12,7 +12,7 @@
* - Load prompt template using AGENTS[agentName].promptTemplate
* - Create git checkpoint
* - Start audit logging
* - Invoke the pi agent via runPiPrompt
* - Invoke Claude SDK via runClaudePrompt
* - Spending cap check using isSpendingCapBehavior
* - Handle failure (rollback, audit)
* - Validate output using AGENTS[agentName].deliverableFilename
@@ -23,8 +23,8 @@
*/

import { fs, path } from 'zx';
import { type PiPromptResult, runPiPrompt, validateAgentOutput } from '../ai/pi-executor.js';
import { createQueueSubmitTool, getQueueFilename } from '../ai/queue-schemas.js';
import { type ClaudePromptResult, runClaudePrompt, validateAgentOutput } from '../ai/claude-executor.js';
import { getOutputFormat, getQueueFilename } from '../ai/queue-schemas.js';
import type { AuditSession } from '../audit/index.js';
import { authStateFile } from '../audit/utils.js';
import { AGENTS } from '../session-manager.js';
@@ -55,14 +55,14 @@ export interface AgentExecutionInput {
apiKey?: string | undefined;
promptDir?: string | undefined;
providerConfig?: import('../types/config.js').ProviderConfig | undefined;
customTools?: import('@earendil-works/pi-coding-agent').ToolDefinition[];
mcpServers?: Record<string, import('@anthropic-ai/claude-agent-sdk').McpServerConfig>;
// Renders the deliverable to disk; invoked after validation, before the success commit.
writeDeliverable?: (deliverablesPath: string) => Promise<void>;
}

interface FailAgentOpts {
attemptNumber: number;
result: PiPromptResult;
result: ClaudePromptResult;
rollbackReason: string;
errorMessage: string;
errorCode: ErrorCode;
@@ -112,7 +112,7 @@ export class AgentExecutionService {
apiKey,
promptDir,
providerConfig,
customTools,
mcpServers,
writeDeliverable,
} = input;

@@ -167,11 +167,9 @@ export class AgentExecutionService {
// 4. Start audit logging
await auditSession.startAgent(agentName, prompt, attemptNumber);

// 5. Execute agent. Vuln agents get a submit tool that captures the structured
// exploitation queue (pi has no JSON-schema output format).
const submitTool = createQueueSubmitTool(agentName, distributedConfig?.exploit ?? true);
const callerTools = [...(customTools ?? []), ...(submitTool ? [submitTool.tool] : [])];
const result: PiPromptResult = await runPiPrompt(
// 5. Execute agent
const outputFormat = getOutputFormat(agentName, distributedConfig?.exploit ?? true);
const result: ClaudePromptResult = await runClaudePrompt(
prompt,
repoPath,
'', // context
@@ -180,10 +178,11 @@ export class AgentExecutionService {
auditSession,
logger,
AGENTS[agentName].modelTier,
callerTools,
outputFormat,
apiKey,
path.relative(repoPath, deliverablesPath),
providerConfig,
mcpServers,
);

// 6. Spending cap check - defense-in-depth
@@ -217,17 +216,13 @@ export class AgentExecutionService {
});
}

// 8. Write structured output to disk (vuln agents only) from the submit-tool capture
// 8. Write structured output to disk (vuln agents only)
const queueFilename = getQueueFilename(agentName);
if (submitTool && queueFilename) {
const captured = submitTool.getCaptured();
if (captured !== undefined) {
result.structuredOutput = captured; // carry for the validation gate below
await fs.ensureDir(deliverablesPath);
const queuePath = path.join(deliverablesPath, queueFilename);
await fs.writeFile(queuePath, JSON.stringify(captured, null, 2), 'utf8');
logger.info(`Wrote structured output queue to ${queueFilename}`);
}
if (result.structuredOutput !== undefined && queueFilename) {
await fs.ensureDir(deliverablesPath);
const queuePath = path.join(deliverablesPath, queueFilename);
await fs.writeFile(queuePath, JSON.stringify(result.structuredOutput, null, 2), 'utf8');
logger.info(`Wrote structured output queue to ${queueFilename}`);
}

// 9. Validate output
@@ -318,10 +313,10 @@ export class AgentExecutionService {
/**
* Convert AgentEndResult to AgentMetrics for workflow state.
*/
static toMetrics(endResult: AgentEndResult, result: PiPromptResult): AgentMetrics {
static toMetrics(endResult: AgentEndResult, result: ClaudePromptResult): AgentMetrics {
return {
durationMs: endResult.duration_ms,
inputTokens: null, // Not currently exposed by the pi executor
inputTokens: null, // Not currently exposed by SDK wrapper
outputTokens: null,
costUsd: endResult.cost_usd,
numTurns: result.turns ?? null,

@@ -62,7 +62,7 @@ const RETRYABLE_PATTERNS = [
'internal server error',
'service unavailable',
'bad gateway',
// Provider API errors
// Claude API errors
'model unavailable',
'service temporarily unavailable',
'api error',
@@ -160,7 +160,7 @@ function classifyByErrorCode(code: ErrorCode, retryableFromError: boolean): { ty
*
* Classification priority:
* 1. If error is PentestError with ErrorCode, classify by code (reliable)
* 2. Fall through to string matching for external errors (provider, network, etc.)
* 2. Fall through to string matching for external errors (SDK, network, etc.)
*/
export function classifyErrorForTemporal(error: unknown): { type: string; retryable: boolean } {
// === CODE-BASED CLASSIFICATION (Preferred for internal errors) ===
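The doc comment in this hunk describes a two-step priority: classify by error code when the error carries one, otherwise fall back to matching known substrings in the message. Below is a self-contained sketch of that priority order. `CodedError` is a hypothetical stand-in for `PentestError`, the pattern list is a short excerpt of the strings visible in the diff, and the `TransientError` / `UnknownError` type names are invented for illustration.

```typescript
// Hypothetical error type standing in for PentestError; only the fields used here.
class CodedError extends Error {
  code: string;
  retryable: boolean;
  constructor(message: string, code: string, retryable: boolean) {
    super(message);
    this.code = code;
    this.retryable = retryable;
  }
}

// A few patterns copied from the diff; the real list is longer.
const RETRYABLE_PATTERNS = ['internal server error', 'service unavailable', 'bad gateway', 'model unavailable'];

function classify(error: unknown): { type: string; retryable: boolean } {
  // 1. Internal errors carry a code, so trust it.
  if (error instanceof CodedError) {
    return { type: error.code, retryable: error.retryable };
  }
  // 2. External errors only have text: fall back to case-insensitive substring matching.
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  const retryable = RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern));
  return { type: retryable ? 'TransientError' : 'UnknownError', retryable };
}
```

Checking the code first avoids misclassifying an internal, non-retryable error whose message happens to contain one of the retryable phrases.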

@@ -9,7 +9,7 @@
*
* Used when exploit=false: the exploit agents didn't run, so there is no
* `*_exploitation_evidence.md` to concatenate into the report. This module
* reads each `*_exploitation_queue.json` (already validated by the submit tool against the
* reads each `*_exploitation_queue.json` (already SDK-validated against the
* schemas in ../ai/queue-schemas.ts) and writes a `*_findings.md` per class
* in the canonical body shape that report-executive.txt's cleanup expects.
*

@@ -11,8 +11,8 @@
* Services are pure domain logic with no Temporal dependencies.
*/

export type { PiPromptResult } from '../ai/pi-executor.js';
export { runPiPrompt } from '../ai/pi-executor.js';
export type { ClaudePromptResult } from '../ai/claude-executor.js';
export { runClaudePrompt } from '../ai/claude-executor.js';
export type { AgentExecutionInput } from './agent-execution.js';
export { AgentExecutionService } from './agent-execution.js';
export { ConfigLoaderService } from './config-loader.js';

@@ -12,10 +12,10 @@
* time and API costs compared to failing mid-pipeline.
*
* Checks run sequentially, cheapest first:
* 1. Repository path exists and contains .git
* 1. Repository path exists and is a directory
* 2. Config file parses and validates (if provided)
* 3. code_path rules match real entries in the repo (filesystem only)
* 4. Credentials validate via a minimal pi session (API key, OAuth, or Bedrock)
* 4. Credentials validate via Claude Agent SDK query (API key, OAuth, Bedrock, or Vertex AI)
* 5. Target URL resolves, is not link-local (cloud metadata), and is reachable (DNS + HTTP)
*/
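The comment above describes checks that run sequentially, cheapest first, so a bad repository path fails before any credential call is made. Below is a minimal sketch of that short-circuiting loop with a `Result` type. The `Result`, `ok`, and `err` definitions here are assumptions (the real ones live in `../types/result.js` and may differ), `Check` and `runPreflight` are hypothetical names, and the checks are synchronous for brevity where the real ones are async.

```typescript
// Hypothetical Result type; the real one is imported from ../types/result.js.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

function ok<T>(value: T): Result<T, never> {
  return { ok: true, value };
}

function err<E>(error: E): Result<never, E> {
  return { ok: false, error };
}

interface Check {
  name: string;
  run: () => Result<void, string>;
}

// Runs checks in the given order and stops at the first failure.
function runPreflight(checks: Check[]): { result: Result<void, string>; ran: string[] } {
  const ran: string[] = [];
  for (const check of checks) {
    ran.push(check.name);
    const result = check.run();
    if (!result.ok) return { result, ran };
  }
  return { result: ok(undefined), ran };
}

const outcome = runPreflight([
  { name: 'repo', run: () => ok(undefined) },
  { name: 'config', run: () => err('config file failed validation') },
  { name: 'credentials', run: () => ok(undefined) }, // never reached in this run
]);
```

Ordering the list from cheapest to most expensive means the credential check, which costs an API call, only runs once the local filesystem and config checks have passed.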

@@ -25,23 +25,16 @@ import fs from 'node:fs/promises';
import http from 'node:http';
import https from 'node:https';
import net, { type LookupFunction } from 'node:net';
import os from 'node:os';
import {
AuthStorage,
createAgentSession,
ModelRegistry,
SessionManager,
SettingsManager,
} from '@earendil-works/pi-coding-agent';
import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
import { query } from '@anthropic-ai/claude-agent-sdk';
import { glob } from 'zx';
import { resolveEffectiveProvider, resolveModelId } from '../ai/models.js';
import { resolveModel } from '../ai/models.js';
import { parseConfig } from '../config-parser.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import type { Config, Rule } from '../types/config.js';
import { ErrorCode } from '../types/errors.js';
import { err, isErr, ok, type Result } from '../types/result.js';
import { matchesBillingTextPattern } from '../utils/billing-detection.js';
import { PentestError } from './error-handling.js';
import { err, ok, type Result } from '../types/result.js';
import { isRetryableError, PentestError } from './error-handling.js';

const TARGET_URL_TIMEOUT_MS = 10_000;

@@ -85,14 +78,12 @@ function pinnedLookup(addresses: LookupAddress[]): LookupFunction {

// === Repository Validation ===

async function validateRepo(
repoPath: string,
logger: ActivityLogger,
skipGitCheck?: boolean,
): Promise<Result<void, PentestError>> {
async function validateRepo(repoPath: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
logger.info('Checking repository path...', { repoPath });

// 1. Check repo directory exists
// Check repo directory exists. The repo is not required to be a git repository:
// multi-repo targets (a parent directory containing several repos) have no top-level
// .git, and git-based checkpoint/rollback in git-manager already no-ops on non-git dirs.
try {
const stats = await fs.stat(repoPath);
if (!stats.isDirectory()) {
@@ -118,36 +109,6 @@ async function validateRepo(
);
}

// 2. Check .git directory exists (skipped when consumer removes .git after clone)
if (!skipGitCheck) {
try {
const gitStats = await fs.stat(`${repoPath}/.git`);
if (!gitStats.isDirectory()) {
return err(
new PentestError(
`Not a git repository (no .git directory): ${repoPath}`,
'config',
false,
{ repoPath },
ErrorCode.REPO_NOT_FOUND,
),
);
}
} catch {
return err(
new PentestError(
`Not a git repository (no .git directory): ${repoPath}`,
'config',
false,
{ repoPath },
ErrorCode.REPO_NOT_FOUND,
),
);
}
} else {
logger.info('Skipping .git check (skipGitCheck enabled)');
}

logger.info('Repository path OK');
return ok(undefined);
}
|
||||
@@ -247,119 +208,93 @@ async function validateCodePathsExist(
|
||||
|
||||
// === Credential Validation ===
|
||||
|
||||
/** Map provider error text to a human-readable preflight PentestError. */
|
||||
/** Classify a provider error message (thrown or from a failed turn) into a PentestError. */
|
||||
function classifyCredentialError(text: string, authType: string): Result<void, PentestError> {
|
||||
const lower = text.toLowerCase();
|
||||
if (matchesBillingTextPattern(text)) {
|
||||
return err(
|
||||
new PentestError(
|
||||
`Anthropic account has a billing or rate-limit issue during ${authType} validation. Add credits or wait and retry.`,
|
||||
'billing',
|
||||
true,
|
||||
{ authType },
|
||||
ErrorCode.BILLING_ERROR,
|
||||
),
|
||||
);
|
||||
/** Map SDK error type to a human-readable preflight PentestError. */
|
||||
function classifySdkError(sdkError: SDKAssistantMessageError, authType: string): Result<void, PentestError> {
|
||||
switch (sdkError) {
|
||||
case 'authentication_failed':
|
||||
return err(
|
||||
new PentestError(
|
||||
`Invalid ${authType}. Check your credentials in .env and try again.`,
|
||||
'config',
|
||||
false,
|
||||
{ authType, sdkError },
|
||||
ErrorCode.AUTH_FAILED,
|
||||
),
|
||||
);
|
||||
case 'billing_error':
|
||||
return err(
|
||||
new PentestError(
|
||||
`Anthropic account has a billing issue. Add credits or check your billing dashboard.`,
|
||||
'billing',
|
||||
true,
|
||||
{ authType, sdkError },
|
||||
ErrorCode.BILLING_ERROR,
|
||||
),
|
||||
);
|
||||
case 'rate_limit':
|
||||
return err(
|
||||
new PentestError(
|
||||
`Anthropic rate limit or spending cap reached. Wait a few minutes and try again.`,
|
||||
'billing',
|
||||
true,
|
||||
{ authType, sdkError },
|
||||
ErrorCode.BILLING_ERROR,
|
||||
),
|
||||
);
|
||||
case 'server_error':
|
||||
return err(
|
||||
new PentestError(`Anthropic API is temporarily unavailable. Try again shortly.`, 'network', true, {
|
||||
authType,
|
||||
sdkError,
|
||||
}),
|
||||
);
|
||||
case 'overloaded':
|
||||
return err(
|
||||
new PentestError(`Anthropic API is overloaded. Wait a few moments and try again.`, 'network', true, {
|
||||
authType,
|
||||
sdkError,
|
||||
}),
|
||||
);
|
||||
case 'model_not_found':
|
||||
return err(
|
||||
new PentestError(
|
||||
`Configured model is not available for this account. Check ANTHROPIC_*_MODEL in .env.`,
|
||||
'config',
|
||||
false,
|
||||
{ authType, sdkError },
|
||||
),
|
||||
);
|
||||
case 'oauth_org_not_allowed':
|
||||
return err(
|
||||
new PentestError(
|
||||
`This credential's organization is not allowed. Check your ${authType} in .env.`,
|
||||
'config',
|
||||
false,
|
||||
{ authType, sdkError },
|
||||
ErrorCode.AUTH_FAILED,
|
||||
),
|
||||
);
|
||||
default:
|
||||
return err(
|
||||
new PentestError(
|
||||
`${authType} validation failed unexpectedly. Check your credentials in .env.`,
|
||||
'config',
|
||||
false,
|
||||
{ authType, sdkError },
|
||||
ErrorCode.AUTH_FAILED,
|
||||
),
|
||||
);
|
||||
}
|
||||
if (/401|403|invalid[ _-]?api[ _-]?key|unauthorized|authentication|forbidden|not allowed|x-api-key/.test(lower)) {
|
||||
return err(
|
||||
new PentestError(
|
||||
`Invalid ${authType}. Check your credentials in .env and try again.`,
|
||||
'config',
|
||||
false,
|
||||
{ authType },
|
||||
ErrorCode.AUTH_FAILED,
|
||||
),
|
||||
);
|
||||
}
|
||||
if (/model/.test(lower) && /not found|not available|unknown/.test(lower)) {
|
||||
return err(
|
||||
new PentestError(
|
||||
`Configured model is not available for this account. Check ANTHROPIC_*_MODEL in .env.`,
|
||||
'config',
|
||||
false,
|
||||
{ authType },
|
||||
),
|
||||
);
|
||||
}
|
||||
if (
|
||||
/network|timeout|enotfound|econnrefused|fetch failed|getaddrinfo|socket|overloaded|unavailable|50\d/.test(lower)
|
||||
) {
|
||||
return err(
|
||||
new PentestError(`Anthropic API unreachable or temporarily unavailable. Try again shortly.`, 'network', true, {
|
||||
authType,
|
||||
}),
|
||||
);
|
||||
}
|
||||
return err(
|
||||
new PentestError(
|
||||
`${authType} validation failed: ${text.slice(0, 150)}`,
|
||||
'config',
|
||||
false,
|
||||
{ authType },
|
||||
ErrorCode.AUTH_FAILED,
|
||||
),
|
||||
);
|
||||
}
|
||||
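The SDK-based classifier in this hunk maps each error type to an error category and a retryable flag. As a reading aid, the same mapping can be sketched as a lookup table. The categories and flags below are copied from the switch statement above; the names `SDK_ERROR_TABLE`, `FALLBACK`, and `classifySdkErrorKind` are hypothetical, and the sketch leaves out the user-facing messages and `ErrorCode` values.

```typescript
// Hypothetical table-driven variant of the SDK error mapping. Categories and
// retry flags mirror the switch statement in this hunk; names are illustrative.
type Category = 'config' | 'billing' | 'network';

interface Classification {
  category: Category;
  retryable: boolean;
}

const SDK_ERROR_TABLE = new Map<string, Classification>([
  ['authentication_failed', { category: 'config', retryable: false }],
  ['billing_error', { category: 'billing', retryable: true }],
  ['rate_limit', { category: 'billing', retryable: true }],
  ['server_error', { category: 'network', retryable: true }],
  ['overloaded', { category: 'network', retryable: true }],
  ['model_not_found', { category: 'config', retryable: false }],
  ['oauth_org_not_allowed', { category: 'config', retryable: false }],
]);

// Unknown error types are treated like the switch's default branch:
// a non-retryable configuration problem.
const FALLBACK: Classification = { category: 'config', retryable: false };

function classifySdkErrorKind(sdkError: string): Classification {
  return SDK_ERROR_TABLE.get(sdkError) ?? FALLBACK;
}
```

Credential and model problems are marked non-retryable because retrying cannot fix them, while billing, rate-limit, and server-side errors may clear on their own.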
|
||||
/** Minimal pi session probe to validate credentials. An optional baseUrl overrides the endpoint. */
|
||||
async function probeCredentialsWithPi(
|
||||
authType: string,
|
||||
token?: string,
|
||||
baseUrl?: string,
|
||||
): Promise<Result<void, PentestError>> {
|
||||
const authStorage = AuthStorage.inMemory();
|
||||
if (token) authStorage.setRuntimeApiKey('anthropic', token);
|
||||
|
||||
const baseModel = ModelRegistry.create(authStorage).find('anthropic', resolveModelId('small'));
|
||||
if (!baseModel) {
|
||||
return err(
|
||||
new PentestError(
|
||||
`Model not found in pi registry: ${resolveModelId('small')}`,
|
||||
'config',
|
||||
false,
|
||||
{},
|
||||
ErrorCode.AUTH_FAILED,
|
||||
),
|
||||
);
|
||||
}
|
||||
const model = baseUrl ? { ...baseModel, baseUrl } : baseModel;
|
||||
|
||||
let errText: string | undefined;
|
||||
try {
|
||||
const { session } = await createAgentSession({
|
||||
cwd: os.tmpdir(),
|
||||
model,
|
||||
thinkingLevel: 'off',
|
||||
noTools: 'all',
|
||||
authStorage,
|
||||
sessionManager: SessionManager.inMemory(),
|
||||
settingsManager: SettingsManager.inMemory({ retry: { enabled: false }, compaction: { enabled: false } }),
|
||||
});
|
||||
session.subscribe((e) => {
|
||||
if (e.type === 'turn_end' && e.message.role === 'assistant' && e.message.stopReason === 'error') {
|
||||
errText = e.message.errorMessage ?? 'unknown provider error';
|
||||
}
|
||||
});
|
||||
await session.prompt('hi');
|
||||
session.dispose();
|
||||
} catch (error) {
|
||||
errText = error instanceof Error ? error.message : String(error);
|
||||
}
|
||||
|
||||
if (errText) return classifyCredentialError(errText, authType);
|
||||
return ok(undefined);
|
||||
}
|
||||
|
||||
/** Validate credentials via a minimal pi session. */
|
||||
/** Validate credentials via a minimal Claude Agent SDK query. */
|
||||
async function validateCredentials(
|
||||
logger: ActivityLogger,
|
||||
apiKey?: string,
|
||||
providerConfig?: import('../types/config.js').ProviderConfig,
|
||||
): Promise<Result<void, PentestError>> {
|
||||
// 0. If providerConfig is present, credentials are managed by the caller.
|
||||
// The executor/provider layer owns providerConfig resolution — no env preflight needed.
|
||||
// The executor will map providerConfig directly to sdkEnv — no process.env needed.
|
||||
if (providerConfig) {
|
||||
logger.info(
|
||||
`Provider config present (type: ${providerConfig.providerType || 'anthropic_api'}) — skipping env-based credential validation`,
|
||||
@@ -367,19 +302,44 @@ async function validateCredentials(
|
||||
return ok(undefined);
|
||||
}
|
||||
|
||||
// 0b. If apiKey provided via config, set it in env for pi validation
|
||||
// 0b. If apiKey provided via config, set it in env for SDK validation
|
||||
// This avoids requiring process.env.ANTHROPIC_API_KEY when key is threaded via input
|
||||
if (apiKey) {
|
||||
process.env.ANTHROPIC_API_KEY = apiKey;
|
||||
}
|
||||
// 1. Custom base URL — validate endpoint is reachable via SDK query
|
||||
if (process.env.ANTHROPIC_BASE_URL && process.env.ANTHROPIC_AUTH_TOKEN) {
|
||||
const baseUrl = process.env.ANTHROPIC_BASE_URL;
|
||||
logger.info('Validating custom base URL');
|
||||
|
||||
// Resolve the active provider through the same precedence the executor uses, so
|
||||
// preflight validates exactly the credentials the run will use (no drift).
|
||||
const eff = resolveEffectiveProvider(apiKey);
|
||||
try {
|
||||
for await (const message of query({ prompt: 'hi', options: { model: resolveModel('small'), maxTurns: 1 } })) {
|
||||
if (message.type === 'assistant' && message.error) {
|
||||
return classifySdkError(message.error, `custom endpoint (${baseUrl})`);
|
||||
}
|
||||
if (message.type === 'result') {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// 1. Bedrock mode — validate required AWS credentials are present (pi-ai owns the
|
||||
// live AWS auth, so there is no cheap session probe here)
|
||||
if (eff.providerId === 'amazon-bedrock') {
|
||||
logger.info('Custom base URL OK');
|
||||
return ok(undefined);
|
||||
} catch (error) {
|
||||
const message = error instanceof Error ? error.message : String(error);
|
||||
return err(
|
||||
new PentestError(
|
||||
`Custom base URL unreachable: ${baseUrl} — ${message}`,
|
||||
'network',
|
||||
false,
|
||||
{ baseUrl },
|
||||
ErrorCode.AUTH_FAILED,
|
||||
),
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
// 2. Bedrock mode — validate required AWS credentials are present
|
||||
if (process.env.CLAUDE_CODE_USE_BEDROCK === '1') {
|
||||
const required = [
|
||||
'AWS_REGION',
|
||||
'AWS_BEARER_TOKEN_BEDROCK',
|
||||
@@ -403,20 +363,62 @@ async function validateCredentials(
|
||||
return ok(undefined);
|
||||
}
|
||||
|
||||
// 2. Custom base URL — validate the endpoint via a minimal pi session
|
||||
if (eff.baseUrl) {
|
||||
logger.info('Validating custom base URL');
|
||||
const probe = await probeCredentialsWithPi(`custom endpoint (${eff.baseUrl})`, eff.anthropicToken, eff.baseUrl);
|
||||
if (isErr(probe)) return probe;
|
||||
logger.info('Custom base URL OK');
|
||||
// 3. Vertex AI mode — validate required GCP credentials are present
|
||||
if (process.env.CLAUDE_CODE_USE_VERTEX === '1') {
|
||||
const required = [
|
||||
'CLOUD_ML_REGION',
|
||||
'ANTHROPIC_VERTEX_PROJECT_ID',
|
||||
'ANTHROPIC_SMALL_MODEL',
|
||||
'ANTHROPIC_MEDIUM_MODEL',
|
||||
'ANTHROPIC_LARGE_MODEL',
|
||||
];
|
||||
const missing = required.filter((v) => !process.env[v]);
|
||||
if (missing.length > 0) {
|
||||
return err(
|
||||
new PentestError(
|
||||
`Vertex AI mode requires the following env vars in .env: ${missing.join(', ')}`,
|
||||
'config',
|
||||
false,
|
||||
{ missing },
|
||||
ErrorCode.AUTH_FAILED,
|
||||
),
|
||||
);
|
||||
}
|
||||
// Validate service account credentials file is accessible
|
||||
const credPath = process.env.GOOGLE_APPLICATION_CREDENTIALS;
|
||||
if (!credPath) {
|
||||
return err(
|
||||
new PentestError(
|
||||
'Vertex AI mode requires GOOGLE_APPLICATION_CREDENTIALS pointing to a service account key JSON file',
|
||||
'config',
|
||||
false,
|
||||
{},
|
||||
ErrorCode.AUTH_FAILED,
|
||||
),
|
||||
);
|
||||
}
|
||||
try {
|
||||
await fs.access(credPath);
|
||||
} catch {
|
||||
return err(
|
||||
new PentestError(
|
||||
`Service account key file not found at: ${credPath}`,
|
||||
'config',
|
||||
false,
|
||||
{ credPath },
|
||||
ErrorCode.AUTH_FAILED,
|
||||
),
|
||||
);
|
||||
}
|
||||
logger.info('Vertex AI credentials OK');
|
||||
return ok(undefined);
|
||||
}
|
||||
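The Vertex AI branch above only checks that the required variables are set and that the key file is accessible; it does not call the provider. The presence check can be sketched in isolation as follows. The variable list is taken from the hunk, while the helper `findMissingEnvVars` and the sample values are hypothetical.

```typescript
// Hypothetical standalone version of the "required env vars" check.
// An unset variable and an empty string are both treated as missing,
// matching the `!process.env[v]` filter in the hunk above.
function findMissingEnvVars(
  required: readonly string[],
  env: Record<string, string | undefined>,
): string[] {
  return required.filter((name) => !env[name]);
}

// Variable names as listed in the Vertex AI branch.
const VERTEX_REQUIRED = [
  'CLOUD_ML_REGION',
  'ANTHROPIC_VERTEX_PROJECT_ID',
  'ANTHROPIC_SMALL_MODEL',
  'ANTHROPIC_MEDIUM_MODEL',
  'ANTHROPIC_LARGE_MODEL',
] as const;

// Placeholder environment for illustration only.
const sampleEnv = { CLOUD_ML_REGION: 'example-region', ANTHROPIC_SMALL_MODEL: '' };
const missingVars = findMissingEnvVars(VERTEX_REQUIRED, sampleEnv);
```

Passing the environment in as a parameter, instead of reading `process.env` directly, is a choice made here to keep the sketch testable; the code in the diff reads `process.env`.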
|
||||
// 3. Direct Anthropic — require a credential, then validate via a minimal pi session
|
||||
if (!eff.anthropicToken) {
|
||||
// 4. Check that at least one credential is present
|
||||
if (!process.env.ANTHROPIC_API_KEY && !process.env.CLAUDE_CODE_OAUTH_TOKEN && !process.env.ANTHROPIC_AUTH_TOKEN) {
|
||||
return err(
|
||||
new PentestError(
|
||||
'No API credentials found. Set ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN in .env (or use CLAUDE_CODE_USE_BEDROCK=1 for AWS Bedrock)',
|
||||
'No API credentials found. Set ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN in .env (or use CLAUDE_CODE_USE_BEDROCK=1 for AWS Bedrock, or CLAUDE_CODE_USE_VERTEX=1 for Google Vertex AI)',
|
||||
'config',
|
||||
false,
|
||||
{},
|
||||
@@ -425,13 +427,38 @@ async function validateCredentials(
|
||||
);
|
||||
}
|
||||
|
||||
const usingApiKey = Boolean(apiKey ?? process.env.ANTHROPIC_API_KEY);
|
||||
const authType = usingApiKey ? 'API key' : 'OAuth token';
|
||||
logger.info(`Validating ${authType} via pi...`);
|
||||
const probe = await probeCredentialsWithPi(authType, eff.anthropicToken);
|
||||
if (isErr(probe)) return probe;
|
||||
logger.info(`${authType} OK`);
|
||||
return ok(undefined);
|
||||
// 5. Validate via SDK query
|
||||
const authType = process.env.CLAUDE_CODE_OAUTH_TOKEN ? 'OAuth token' : 'API key';
|
||||
logger.info(`Validating ${authType} via SDK...`);
|
||||
|
||||
try {
|
||||
for await (const message of query({ prompt: 'hi', options: { model: resolveModel('small'), maxTurns: 1 } })) {
|
||||
if (message.type === 'assistant' && message.error) {
|
||||
return classifySdkError(message.error, authType);
|
||||
}
|
||||
if (message.type === 'result') {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
logger.info(`${authType} OK`);
|
||||
return ok(undefined);
|
||||
} catch (error) {
|
||||
const message = error instanceof Error ? error.message : String(error);
|
||||
const retryable = isRetryableError(error instanceof Error ? error : new Error(message));
|
||||
|
||||
return err(
|
||||
new PentestError(
|
||||
retryable
|
||||
? `Failed to reach Anthropic API. Check your network connection.`
|
||||
: `${authType} validation failed: ${message}`,
|
||||
retryable ? 'network' : 'config',
|
||||
retryable,
|
||||
{ authType },
|
||||
retryable ? undefined : ErrorCode.AUTH_FAILED,
|
||||
),
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
// === Target URL Validation ===
|
||||
@@ -559,10 +586,10 @@ async function validateTargetUrl(targetUrl: string, logger: ActivityLogger): Pro
|
||||
/**
* Run all preflight checks sequentially (cheapest first).
*
* 1. Repository path exists and contains .git
* 1. Repository path exists and is a directory
* 2. Config file parses and validates (if configPath provided)
* 3. code_path rules match at least one entry in the repo (skipped without config)
* 4. Credentials validate (API key, OAuth, or Bedrock)
* 4. Credentials validate (API key, OAuth, Bedrock, or Vertex AI)
* 5. Target URL is reachable from the container
*
* Returns on first failure.
@@ -572,12 +599,11 @@ export async function runPreflightChecks(
|
||||
repoPath: string,
|
||||
configPath: string | undefined,
|
||||
logger: ActivityLogger,
|
||||
skipGitCheck?: boolean,
|
||||
apiKey?: string,
|
||||
providerConfig?: import('../types/config.js').ProviderConfig,
|
||||
): Promise<Result<void, PentestError>> {
|
||||
// 1. Repository check (free — filesystem only)
|
||||
const repoResult = await validateRepo(repoPath, logger, skipGitCheck);
|
||||
const repoResult = await validateRepo(repoPath, logger);
|
||||
if (!repoResult.ok) {
|
||||
return repoResult;
|
||||
}
|
||||
@@ -601,7 +627,7 @@ export async function runPreflightChecks(
|
||||
}
|
||||
}
|
||||
|
||||
// 4. Credential check (cheap — 1 pi round-trip, skipped when providerConfig present)
|
||||
// 4. Credential check (cheap — 1 SDK round-trip, skipped when providerConfig present)
|
||||
const credResult = await validateCredentials(logger, apiKey, providerConfig);
|
||||
if (!credResult.ok) {
|
||||
return credResult;
|
||||
|
||||
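The preflight runner above executes its checks in order of cost and returns the first failing result. A minimal sketch of that short-circuit pattern is shown below. Only the `.ok` property is taken from the diff; the full `Result` shape, the `Check` type, and `runChecksInOrder` are assumptions made for illustration, and the checks are synchronous here for brevity whereas the real ones are async.

```typescript
// Assumed Result shape: the diff only shows that results expose `.ok`.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

function ok<T>(value: T): Result<T, never> {
  return { ok: true, value };
}

function err<E>(error: E): Result<never, E> {
  return { ok: false, error };
}

// Hypothetical check signature; real preflight checks are async.
type Check = () => Result<void, Error>;

// Run checks in the given order and stop at the first failure, so that
// expensive checks never run when a cheap one has already failed.
function runChecksInOrder(checks: readonly Check[]): Result<void, Error> {
  for (const check of checks) {
    const result = check();
    if (!result.ok) return result;
  }
  return ok(undefined);
}

// Usage: the third check is never reached because the second one fails.
const callLog: string[] = [];
const outcome = runChecksInOrder([
  () => { callLog.push('repo'); return ok(undefined); },
  () => { callLog.push('config'); return err(new Error('bad config')); },
  () => { callLog.push('credentials'); return ok(undefined); },
]);
```

Ordering the list from cheapest to most expensive means a missing directory is reported before any network round-trip is spent on credential validation.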
@@ -13,9 +13,9 @@
|
||||
*/
|
||||
|
||||
import { readFile, rm } from 'node:fs/promises';
|
||||
import { defineTool, type ToolDefinition } from '@earendil-works/pi-coding-agent';
|
||||
import { Type } from 'typebox';
|
||||
import { runPiPrompt } from '../ai/pi-executor.js';
|
||||
import type { JsonSchemaOutputFormat } from '@anthropic-ai/claude-agent-sdk';
|
||||
import { z } from 'zod';
|
||||
import { runClaudePrompt } from '../ai/claude-executor.js';
|
||||
import type { AuditSession } from '../audit/index.js';
|
||||
import { authStateFile } from '../audit/utils.js';
|
||||
import type { ActivityLogger } from '../types/activity-logger.js';
|
||||
@@ -33,38 +33,26 @@ function isAuthFailurePoint(v: unknown): v is AuthFailurePoint {
|
||||
return typeof v === 'string' && (FAILURE_POINTS as readonly string[]).includes(v);
|
||||
}
|
||||
|
||||
interface AuthValidationVerdict {
|
||||
login_success: boolean;
|
||||
failure_point?: AuthFailurePoint;
|
||||
failure_detail?: string;
|
||||
}
|
||||
// NOTE: SDK's AJV validator expects draft-07; Zod defaults to draft-2020-12,
|
||||
// which causes the SDK to silently skip structured output.
|
||||
const AuthValidationSchema = z.object({
|
||||
login_success: z.boolean(),
|
||||
failure_point: z.enum(FAILURE_POINTS).optional(),
|
||||
failure_detail: z
|
||||
.string()
|
||||
.max(250)
|
||||
.optional()
|
||||
.describe(
|
||||
'Free-form 1-2 sentence diagnostic of what the page showed (error messages, page state) when login failed. Required when login_success is false. Mask any sensitive values.',
|
||||
),
|
||||
});
|
||||
|
||||
/** Submit tool capturing the login verdict (pi has no JSON-schema output format). */
|
||||
function createAuthSubmitTool(): { tool: ToolDefinition; getCaptured: () => AuthValidationVerdict | undefined } {
|
||||
let captured: AuthValidationVerdict | undefined;
|
||||
const tool = defineTool({
|
||||
name: 'submit_auth_result',
|
||||
label: 'Submit Auth Result',
|
||||
description: 'Report the login outcome. Call exactly once when the login attempt has concluded.',
|
||||
parameters: Type.Object({
|
||||
login_success: Type.Boolean(),
|
||||
failure_point: Type.Optional(
|
||||
Type.Union([Type.Literal('username_or_password'), Type.Literal('totp_secret'), Type.Literal('out_of_band')]),
|
||||
),
|
||||
failure_detail: Type.Optional(
|
||||
Type.String({
|
||||
description:
|
||||
'Free-form 1-2 sentence diagnostic of what the page showed (error messages, page state) when login failed. Required when login_success is false. Mask any sensitive values.',
|
||||
}),
|
||||
),
|
||||
}),
|
||||
execute: async (_toolCallId, params) => {
|
||||
captured = params as AuthValidationVerdict;
|
||||
return { content: [{ type: 'text' as const, text: 'Auth result recorded.' }], details: {} };
|
||||
},
|
||||
});
|
||||
return { tool, getCaptured: () => captured };
|
||||
}
|
||||
type AuthValidationVerdict = z.infer<typeof AuthValidationSchema>;
|
||||
|
||||
const VALIDATION_SCHEMA: JsonSchemaOutputFormat = {
|
||||
type: 'json_schema',
|
||||
schema: z.toJSONSchema(AuthValidationSchema, { target: 'draft-07' }) as Record<string, unknown>,
|
||||
};
|
||||
|
||||
const AGENT_NAME = 'validate-authentication';
|
||||
|
||||
@@ -122,8 +110,7 @@ export async function validateAuthentication(input: ValidateAuthInput): Promise<
|
||||
await auditSession.startAgent(AGENT_NAME, prompt, attemptNumber);
|
||||
const startTime = Date.now();
|
||||
|
||||
const submit = createAuthSubmitTool();
|
||||
const result = await runPiPrompt(
|
||||
const result = await runClaudePrompt(
|
||||
prompt,
|
||||
repoPath,
|
||||
'',
|
||||
@@ -132,13 +119,11 @@ export async function validateAuthentication(input: ValidateAuthInput): Promise<
|
||||
auditSession,
|
||||
logger,
|
||||
'medium',
|
||||
[submit.tool],
|
||||
VALIDATION_SCHEMA,
|
||||
apiKey,
|
||||
deliverablesSubdir,
|
||||
providerConfig,
|
||||
);
|
||||
const verdict = submit.getCaptured();
|
||||
if (verdict !== undefined) result.structuredOutput = verdict;
|
||||
|
||||
let classification = classifyResult(result, authentication);
|
||||
|
||||
@@ -219,7 +204,7 @@ function countStorageEntries(parsed: unknown, key: 'cookies' | 'origins'): numbe
|
||||
}
|
||||
|
||||
function classifyResult(
|
||||
result: import('../ai/pi-executor.js').PiPromptResult,
|
||||
result: import('../ai/claude-executor.js').ClaudePromptResult,
|
||||
authentication: NonNullable<DistributedConfig['authentication']>,
|
||||
): Result<void, PentestError> {
|
||||
if (!result.success) {
|
||||
|
||||
@@ -130,8 +130,8 @@ export const AGENT_PHASE_MAP: Readonly<Record<AgentName, PhaseName>> = Object.fr
|
||||
// The analysis_deliverable.md is rendered via the writeDeliverable hook, which
|
||||
// AgentExecutionService runs after validateAgentOutput but before the success
|
||||
// commit — so a "both files exist" check here would race the renderer. The
|
||||
// validator only checks queue.json, written by the submit-tool path in
|
||||
// agent-execution.ts before this validator runs.
|
||||
// validator only checks queue.json, written by the SDK structured-output path
|
||||
// in agent-execution.ts before this validator runs.
|
||||
function createVulnValidator(vulnType: VulnType): AgentValidator {
|
||||
return async (sourceDir: string, logger: ActivityLogger): Promise<boolean> => {
|
||||
const queueFile = path.join(sourceDir, `${vulnType}_exploitation_queue.json`);
|
||||
|
||||
@@ -19,7 +19,7 @@ import fs from 'node:fs/promises';
|
||||
import path from 'node:path';
|
||||
import { ApplicationFailure, Context, heartbeat } from '@temporalio/activity';
|
||||
import { writePlaywrightStealthConfig } from '../ai/playwright-config-writer.js';
|
||||
import { writeCodePathPermissionConfig } from '../ai/settings-writer.js';
|
||||
import { writeUserSettingsForCodePathAvoids } from '../ai/settings-writer.js';
|
||||
import { AuditSession } from '../audit/index.js';
|
||||
import type { ResumeAttempt } from '../audit/metrics-tracker.js';
|
||||
import { authStateFile, generateSessionJsonPath, type SessionMetadata } from '../audit/utils.js';
|
||||
@@ -76,7 +76,6 @@ export interface ActivityInput {
|
||||
auditDir?: string;
|
||||
promptDir?: string;
|
||||
sastSarifPath?: string;
|
||||
skipGitCheck?: boolean;
|
||||
providerConfig?: ProviderConfig;
|
||||
}
|
||||
|
||||
@@ -137,7 +136,7 @@ function buildContainerConfig(input: ActivityInput): ContainerConfig {
|
||||
async function runAgentActivity(
|
||||
agentName: AgentName,
|
||||
input: ActivityInput,
|
||||
customTools?: import('@earendil-works/pi-coding-agent').ToolDefinition[],
|
||||
mcpServers?: Record<string, import('@anthropic-ai/claude-agent-sdk').McpServerConfig>,
|
||||
writeDeliverable?: (deliverablesPath: string) => Promise<void>,
|
||||
): Promise<AgentMetrics> {
|
||||
const { repoPath, configPath, pipelineTestingMode = false, workflowId, webUrl } = input;
|
||||
@@ -193,7 +192,7 @@ async function runAgentActivity(
|
||||
...(input.providerConfig !== undefined && { providerConfig: input.providerConfig }),
|
||||
...(input.promptDir !== undefined && { promptDir: input.promptDir }),
|
||||
...(input.configYAML !== undefined && { configYAML: input.configYAML }),
|
||||
...(customTools && { customTools }),
|
||||
...(mcpServers && { mcpServers }),
|
||||
...(writeDeliverable && { writeDeliverable }),
|
||||
},
|
||||
auditSession,
|
||||
@@ -272,7 +271,7 @@ export async function runPreReconAgent(input: ActivityInput): Promise<AgentMetri
|
||||
logger.info(`Wrote pre_recon_deliverable.md from structured data (${markdown.length} bytes)`);
|
||||
};
|
||||
|
||||
return runAgentActivity('pre-recon', input, collector.tools, writeDeliverable);
|
||||
return runAgentActivity('pre-recon', input, { 'pre-recon-collector': collector.server }, writeDeliverable);
|
||||
}
|
||||
|
||||
export async function runReconAgent(input: ActivityInput): Promise<AgentMetrics> {
|
||||
@@ -294,7 +293,7 @@ export async function runReconAgent(input: ActivityInput): Promise<AgentMetrics>
|
||||
logger.info(`Wrote recon_deliverable.md from structured data (${markdown.length} bytes)`);
|
||||
};
|
||||
|
||||
return runAgentActivity('recon', input, collector.tools, writeDeliverable);
|
||||
return runAgentActivity('recon', input, { 'recon-collector': collector.server }, writeDeliverable);
|
||||
}
|
||||
|
||||
async function runVulnAgentWithCollector(
|
||||
@@ -320,7 +319,7 @@ async function runVulnAgentWithCollector(
|
||||
logger.info(`Wrote ${vulnClass}_analysis_deliverable.md from structured data (${markdown.length} bytes)`);
|
||||
};
|
||||
|
||||
return runAgentActivity(agentName, input, collector.tools, writeDeliverable);
|
||||
return runAgentActivity(agentName, input, { 'vuln-collector': collector.server }, writeDeliverable);
|
||||
}
|
||||
|
||||
export async function runInjectionVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
|
||||
@@ -402,7 +401,7 @@ async function runExploitAgentWithCollector(
|
||||
logger.info(`Wrote ${vulnClass}_exploitation_evidence.md from structured data (${markdown.length} bytes)`);
|
||||
};
|
||||
|
||||
return runAgentActivity(agentName, input, collector.tools, writeDeliverable);
|
||||
return runAgentActivity(agentName, input, { 'exploit-collector': collector.server }, writeDeliverable);
|
||||
}
|
||||
|
||||
export async function runInjectionExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
|
||||
@@ -433,12 +432,12 @@ export async function runReportAgent(input: ActivityInput): Promise<AgentMetrics
|
||||
* Preflight validation activity.
*
* Runs cheap checks before any agent execution:
* 1. Repository path exists with .git
* 1. Repository path exists and is a directory
* 2. Config file validates (if provided)
* 3. Credential validation (API key, OAuth, or Bedrock)
* 3. Credential validation (API key, OAuth, Bedrock, or Vertex AI)
* 4. Target URL reachable from the container
*
* NOT using runAgentActivity — preflight doesn't run a full analysis agent.
* NOT using runAgentActivity — preflight doesn't run an agent via the SDK.
*/
export async function runPreflightValidation(input: ActivityInput): Promise<void> {
|
||||
const startTime = Date.now();
|
||||
@@ -458,7 +457,6 @@ export async function runPreflightValidation(input: ActivityInput): Promise<void
|
||||
input.repoPath,
|
||||
input.configPath,
|
||||
logger,
|
||||
input.skipGitCheck,
|
||||
input.apiKey,
|
||||
input.providerConfig,
|
||||
);
|
||||
@@ -637,13 +635,12 @@ export async function syncPlaywrightStealthConfig(input: ActivityInput): Promise
|
||||
}
|
||||
|
||||
/**
* Sync code_path avoid rules into the @gotgenes/pi-permission-system global config
* so pi enforces them at the tool layer for every agent in this run. The executor
* loads the extension when this config is present (see pi-executor).
* Sync code_path avoid rules into Claude's user-scope settings.json so the
* SDK enforces them at the tool layer for every agent in this run.
*
* Runs once per workflow before any analysis agent fires. Config is fixed for the
* lifetime of the workflow, so writing once avoids a parallel-agent race on the
* global config file.
* Runs once per workflow before any agent fires. Config is fixed for the
* lifetime of the workflow, so writing once avoids the parallel-agent race
* on the global ~/.claude/settings.json file.
*/
export async function syncCodePathDenyRules(input: ActivityInput): Promise<void> {
|
||||
const logger = createActivityLogger();
|
||||
@@ -657,12 +654,8 @@ export async function syncCodePathDenyRules(input: ActivityInput): Promise<void>
|
||||
|
||||
const config = configResult.value;
|
||||
const denyCount = (config?.avoid ?? []).filter((r) => r.type === 'code_path').length;
|
||||
await writeCodePathPermissionConfig(config);
|
||||
logger.info(
|
||||
denyCount > 0
|
||||
? `Synced ${denyCount} code_path deny rule(s) to the pi-permission-system config`
|
||||
: 'No code_path deny rules; pi-permission-system config cleared',
|
||||
);
|
||||
await writeUserSettingsForCodePathAvoids(config);
|
||||
logger.info(`Synced code_path deny rules to user settings (${denyCount} entries)`);
|
||||
}
|
||||
|
||||
/**
|
||||
|
||||
@@ -27,8 +27,7 @@ export interface PipelineInput {
|
||||
promptDir?: string; // Override prompt template directory
|
||||
sastSarifPath?: string; // Optional path for consumer-supplied findings input
|
||||
checkpointsEnabled?: boolean; // Enable checkpoint activities (default: false)
|
||||
skipGitCheck?: boolean; // Skip .git directory validation in preflight (e.g. when .git is removed after clone)
|
||||
providerConfig?: ProviderConfig; // LLM provider configuration (Bedrock, custom base URL, etc.)
|
||||
providerConfig?: ProviderConfig; // LLM provider configuration (Bedrock, Vertex, etc.)
|
||||
vulnClasses?: VulnClass[]; // omitted = all five
|
||||
exploit?: boolean; // false skips the exploitation phase
|
||||
}
|
||||
|
||||
@@ -92,7 +92,7 @@ const TESTING_RETRY = {
|
||||
// Activity proxy with production retry configuration (default)
|
||||
const acts = proxyActivities<typeof activities>({
|
||||
startToCloseTimeout: '2 hours',
|
||||
heartbeatTimeout: '60 minutes', // Extended for nested pi task execution
|
||||
heartbeatTimeout: '60 minutes', // Extended for sub-agent execution (SDK blocks event loop during Task tool calls)
|
||||
retry: PRODUCTION_RETRY,
|
||||
});
|
||||
|
||||
@@ -135,7 +135,7 @@ const preflightActs = proxyActivities<typeof activities>({
|
||||
retry: PREFLIGHT_RETRY,
|
||||
});
|
||||
|
||||
// Credential rejection is not retryable; transient provider errors get 3 attempts.
|
||||
// Credential rejection is not retryable; transient SDK errors get 3 attempts.
|
||||
const AUTH_VALIDATION_RETRY = {
|
||||
initialInterval: '10 seconds',
|
||||
maximumInterval: '1 minute',
|
||||
@@ -242,7 +242,6 @@ export async function pentestPipeline(input: PipelineInput): Promise<PipelineSta
|
||||
...(input.auditDir !== undefined && { auditDir: input.auditDir }),
|
||||
...(input.promptDir !== undefined && { promptDir: input.promptDir }),
|
||||
...(input.sastSarifPath !== undefined && { sastSarifPath: input.sastSarifPath }),
|
||||
...(input.skipGitCheck !== undefined && { skipGitCheck: input.skipGitCheck }),
|
||||
...(input.providerConfig !== undefined && { providerConfig: input.providerConfig }),
|
||||
};
|
||||
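The object literal above copies optional fields with the `...(value !== undefined && { key: value })` idiom, which leaves a key out entirely when its value is undefined instead of writing an explicit `undefined`. A small sketch of the idiom follows; the `OptionalDirs` interface and `copyDefinedDirs` function are hypothetical and reuse only two of the field names from the diff.

```typescript
// Hypothetical input type with two of the optional fields from the diff.
interface OptionalDirs {
  auditDir?: string;
  promptDir?: string;
}

// Copy only the fields that are actually set. Spreading `false` adds nothing,
// so an unset field produces no key at all in the result.
function copyDefinedDirs(input: OptionalDirs): OptionalDirs {
  return {
    ...(input.auditDir !== undefined && { auditDir: input.auditDir }),
    ...(input.promptDir !== undefined && { promptDir: input.promptDir }),
  };
}

const copied = copyDefinedDirs({ auditDir: '/tmp/audit' });
const copiedKeys = Object.keys(copied);
```

Omitting the key, as opposed to setting it to `undefined`, keeps the object clean when it is serialized or compared, which is presumably why the workflow builds its activity input this way.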
|
||||
@@ -452,7 +451,7 @@ export async function pentestPipeline(input: PipelineInput): Promise<PipelineSta
|
||||
// === Initialize Deliverables Git ===
|
||||
await a.initDeliverableGit(activityInput);
|
||||
|
||||
// === Sync code_path deny rules ===
|
||||
// === Sync SDK deny rules ===
|
||||
await a.syncCodePathDenyRules(activityInput);
|
||||
|
||||
log.info(`Run scope: vuln_classes=[${selectedVulnClasses.join(', ')}] exploit=${exploit}`);
|
||||
|
||||
@@ -94,9 +94,8 @@ export interface DistributedConfig {
|
||||
/**
* LLM provider configuration for multi-provider support.
*
* Resolved by the pi model/provider layer at execution time. Recognized
* providerType values: 'bedrock', 'custom_base_url', 'anthropic_api'.
* When omitted or 'anthropic_api', falls back to apiKey + ANTHROPIC_API_KEY.
* Maps to SDK environment variables at execution time. When providerType
* is omitted or 'anthropic_api', falls back to apiKey + ANTHROPIC_API_KEY.
*/
export interface ProviderConfig {
|
||||
readonly providerType?: string;
|
||||
@@ -104,6 +103,9 @@ export interface ProviderConfig {
  readonly awsRegion?: string;
  readonly awsAccessKeyId?: string;
  readonly awsSecretAccessKey?: string;
  readonly gcpRegion?: string;
  readonly gcpProjectId?: string;
  readonly gcpCredentialsPath?: string;
  readonly baseUrl?: string;
  readonly authToken?: string;
  readonly modelOverrides?: Record<string, string>;
@@ -125,6 +127,6 @@ export interface ContainerConfig {
  readonly apiKey?: string;
  /** Prompt directory override — when set, prompt manager loads from this path */
  readonly promptDir?: string;
  /** LLM provider configuration for the pi executor */
  /** LLM provider configuration — when set, executor maps to SDK env vars directly */
  readonly providerConfig?: ProviderConfig;
}
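
A minimal sketch of what "maps to SDK environment variables" could look like. The field names come from the `ProviderConfig` interface in this diff and the variable names from the provider docs. The `'vertex'` provider type, the `ANTHROPIC_AUTH_TOKEN` target for `authToken`, the tier-keyed `modelOverrides`, and the function name `providerConfigToEnv` are assumptions for illustration, not the executor's actual code.

```typescript
// Hypothetical sketch: mirrors the ProviderConfig fields shown in the diff.
interface ProviderConfig {
  readonly providerType?: string;
  readonly awsRegion?: string;
  readonly awsAccessKeyId?: string;
  readonly awsSecretAccessKey?: string;
  readonly gcpRegion?: string;
  readonly gcpProjectId?: string;
  readonly gcpCredentialsPath?: string;
  readonly baseUrl?: string;
  readonly authToken?: string;
  readonly modelOverrides?: Record<string, string>;
}

type Env = Record<string, string>;

// Copy a value into the env map only when it is defined.
function setIfDefined(env: Env, key: string, value: string | undefined): void {
  if (value !== undefined) {
    env[key] = value;
  }
}

function providerConfigToEnv(config: ProviderConfig, apiKey?: string): Env {
  const env: Env = {};
  switch (config.providerType) {
    case 'bedrock':
      env['CLAUDE_CODE_USE_BEDROCK'] = '1';
      setIfDefined(env, 'AWS_REGION', config.awsRegion);
      setIfDefined(env, 'AWS_ACCESS_KEY_ID', config.awsAccessKeyId);
      setIfDefined(env, 'AWS_SECRET_ACCESS_KEY', config.awsSecretAccessKey);
      break;
    case 'vertex': // assumed providerType value; not listed in the diff
      env['CLAUDE_CODE_USE_VERTEX'] = '1';
      setIfDefined(env, 'CLOUD_ML_REGION', config.gcpRegion);
      setIfDefined(env, 'ANTHROPIC_VERTEX_PROJECT_ID', config.gcpProjectId);
      setIfDefined(env, 'GOOGLE_APPLICATION_CREDENTIALS', config.gcpCredentialsPath);
      break;
    case 'custom_base_url':
      setIfDefined(env, 'ANTHROPIC_BASE_URL', config.baseUrl);
      setIfDefined(env, 'ANTHROPIC_AUTH_TOKEN', config.authToken); // assumed target
      break;
    default:
      // Omitted or 'anthropic_api': fall back to the plain API key.
      setIfDefined(env, 'ANTHROPIC_API_KEY', apiKey);
  }
  // Assumption: modelOverrides is keyed by tier name ('small' | 'medium' | 'large').
  const overrides = config.modelOverrides ?? {};
  Object.keys(overrides).forEach((tier) => {
    env['ANTHROPIC_' + tier.toUpperCase() + '_MODEL'] = overrides[tier] as string;
  });
  return env;
}
```

Returning a plain map rather than mutating the process environment keeps the mapping easy to test and lets the caller decide how to pass it to the worker.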
@@ -8,8 +8,8 @@
 * Consolidated billing/spending cap detection utilities.
 *
 * Anthropic's spending cap behavior is inconsistent:
 * - Sometimes a proper provider error (billing_error)
 * - Sometimes the agent responds with text about the cap
 * - Sometimes a proper SDK error (billing_error)
 * - Sometimes Claude responds with text about the cap
 * - Sometimes partial billing before cutoff
 *
 * This module provides defense-in-depth detection with shared pattern lists
@@ -17,8 +17,8 @@
 */

/**
 * Text patterns for provider/harness output sniffing (what the agent says).
 * Used by the pi event stream and the behavioral heuristic.
 * Text patterns for SDK output sniffing (what Claude says).
 * Used by message-handlers.ts and the behavioral heuristic.
 */
export const BILLING_TEXT_PATTERNS = [
  'spending cap',
@@ -48,7 +48,7 @@ export const BILLING_API_PATTERNS = [

/**
 * Checks if text matches any billing text pattern.
 * Used for sniffing agent output content for spending cap messages.
 * Used for sniffing SDK output content for spending cap messages.
 */
export function matchesBillingTextPattern(text: string): boolean {
  const lowerText = text.toLowerCase();
@@ -67,7 +67,7 @@ export function matchesBillingApiPattern(message: string): boolean {
/**
 * Behavioral heuristic for detecting spending cap.
 *
 * When the agent hits a spending cap, it often returns a short message
 * When Claude hits a spending cap, it often returns a short message
 * with $0 cost. Legitimate agent work NEVER costs $0 with only 1-2 turns.
 *
* This combines three signals:
+37
-4
@@ -1,6 +1,6 @@
# AI Providers

Shannon Lite works best with Claude models. Anthropic API keys are recommended for most users, and Shannon Lite also supports AWS Bedrock and custom Anthropic-compatible endpoints.
Shannon works best with Claude models. Anthropic API keys are recommended for most users, and Shannon also supports AWS Bedrock, Google Vertex AI, and custom Anthropic-compatible endpoints.

## Anthropic

@@ -20,6 +20,7 @@ Source-build mode can use a `.env` file:

```bash
ANTHROPIC_API_KEY=your-api-key
CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
```

Each tier can be pointed at any Claude model via `ANTHROPIC_SMALL_MODEL` / `ANTHROPIC_MEDIUM_MODEL` / `ANTHROPIC_LARGE_MODEL` (or the setup wizard). If you set a tier to `claude-fable-5`, note that Fable's safety classifiers route cybersecurity tasks to Opus 4.8, so those phases run on Opus 4.8 regardless.
@@ -50,7 +51,7 @@ ANTHROPIC_MEDIUM_MODEL=us.anthropic.claude-sonnet-4-6
ANTHROPIC_LARGE_MODEL=us.anthropic.claude-opus-4-8
```

Shannon Lite uses three model tiers:
Shannon uses three model tiers:

- **small** for summarization
- **medium** for security analysis
@@ -58,12 +59,44 @@ Shannon Lite uses three model tiers:
Set `ANTHROPIC_SMALL_MODEL`, `ANTHROPIC_MEDIUM_MODEL`, and `ANTHROPIC_LARGE_MODEL` to Bedrock model IDs available in your region.
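
The tier lookup described here can be sketched as follows. The variable names and the default model IDs are the ones listed in `.env.example`. The function name `resolveModel` and the choice to throw when a Bedrock tier is unset are assumptions for illustration, since the docs only say the overrides are required for Bedrock.

```typescript
type Tier = 'small' | 'medium' | 'large';

// Defaults as listed in .env.example for direct Anthropic usage.
const DEFAULT_MODELS: Record<Tier, string> = {
  small: 'claude-haiku-4-5-20251001',
  medium: 'claude-sonnet-4-6',
  large: 'claude-opus-4-8',
};

// Hypothetical helper: resolve the model ID for a tier from an env-like map.
function resolveModel(tier: Tier, env: Record<string, string | undefined>): string {
  const variable = 'ANTHROPIC_' + tier.toUpperCase() + '_MODEL';
  const override = env[variable];
  if (override !== undefined && override.trim() !== '') {
    return override;
  }
  if (env['CLAUDE_CODE_USE_BEDROCK'] === '1') {
    // Bedrock model IDs are region-specific, so no default is assumed here.
    throw new Error(variable + ' must be set when using Bedrock');
  }
  return DEFAULT_MODELS[tier];
}
```

Passing the environment in as a parameter, rather than reading a global, keeps the lookup testable with a plain object.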

## Google Vertex AI

Create a service account with the `roles/aiplatform.user` role in the GCP Console, then download a JSON key file.

Run `npx @keygraph/shannon setup` and select **Google Vertex AI**. The wizard prompts for region, project ID, service account key file path, and model IDs. The key file is copied to `~/.shannon/google-sa-key.json`.

Or export environment variables directly:

```bash
export CLAUDE_CODE_USE_VERTEX=1
export CLOUD_ML_REGION=us-east5
export ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your-sa-key.json
export ANTHROPIC_SMALL_MODEL=claude-haiku-4-5@20251001
export ANTHROPIC_MEDIUM_MODEL=claude-sonnet-4-6
export ANTHROPIC_LARGE_MODEL=claude-opus-4-8
```

Source-build `.env` equivalent:

```bash
CLAUDE_CODE_USE_VERTEX=1
CLOUD_ML_REGION=us-east5
ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
GOOGLE_APPLICATION_CREDENTIALS=./credentials/google-sa-key.json
ANTHROPIC_SMALL_MODEL=claude-haiku-4-5@20251001
ANTHROPIC_MEDIUM_MODEL=claude-sonnet-4-6
ANTHROPIC_LARGE_MODEL=claude-opus-4-8
```

Set `CLOUD_ML_REGION=global` for global endpoints, or use a specific region like `us-east5`. Some models may not be available on global endpoints.
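
A small pre-flight check for the Vertex variables above might look like this. It is a sketch only: the variable names are the ones documented in this section, while the function name `missingVertexVars` and the idea of validating before a scan are assumptions, not something this diff shows Shannon doing.

```typescript
// Variables the Vertex AI section asks you to set (model IDs are required for Vertex).
const REQUIRED_VERTEX_VARS: readonly string[] = [
  'CLOUD_ML_REGION',
  'ANTHROPIC_VERTEX_PROJECT_ID',
  'GOOGLE_APPLICATION_CREDENTIALS',
  'ANTHROPIC_SMALL_MODEL',
  'ANTHROPIC_MEDIUM_MODEL',
  'ANTHROPIC_LARGE_MODEL',
];

// Hypothetical helper: list the Vertex variables that are unset or blank.
// Returns an empty list when Vertex mode is not enabled at all.
function missingVertexVars(env: Record<string, string | undefined>): string[] {
  if (env['CLAUDE_CODE_USE_VERTEX'] !== '1') {
    return [];
  }
  return REQUIRED_VERTEX_VARS.filter((name) => (env[name] ?? '').trim() === '');
}
```

In a Node.js script this could be called as `missingVertexVars(process.env)` and the result printed before starting a scan.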

## Custom Base URL

Shannon Lite supports pointing the SDK at an Anthropic-compatible endpoint with `ANTHROPIC_BASE_URL`. For proxy-based routing, use an LLM proxy such as LiteLLM configured to expose an Anthropic-compatible endpoint.
Shannon supports pointing the SDK at an Anthropic-compatible endpoint with `ANTHROPIC_BASE_URL`. For proxy-based routing, use an LLM proxy such as LiteLLM configured to expose an Anthropic-compatible endpoint.

> [!IMPORTANT]
> Only Claude models are officially supported. Shannon Lite's evaluations, internal testing, and agent harness are optimized for Claude. Smaller or alternative models, including non-Claude models routed through a proxy, may not reliably follow Shannon Lite's instructions or tool-use constraints. Use them at your own risk.
> Only Claude models are officially supported. Shannon's evaluations, internal testing, and agent harness are optimized for Claude. Smaller or alternative models, including non-Claude models routed through a proxy, may not reliably follow Shannon's instructions or tool-use constraints. Use them at your own risk.

The experimental `claude-code-router` integration is being removed. If you rely on it, migrate to an Anthropic-compatible proxy such as LiteLLM before upgrading.

@@ -1,6 +1,6 @@
# Configuration

Shannon Lite can run without a configuration file, but configuration enables authenticated testing, scope guidance, rules of engagement, report filtering, and rate-limit tuning.
Shannon can run without a configuration file, but configuration enables authenticated testing, scope guidance, rules of engagement, report filtering, and rate-limit tuning.

## Credential Precedence

@@ -119,7 +119,7 @@ Supported placeholders:
- `$email_password`
- `$email_totp`

At runtime, Shannon Lite replaces these placeholders with the credentials passed in the config.
At runtime, Shannon replaces these placeholders with the credentials passed in the config.

```yaml
login_flow:
@@ -1,8 +1,8 @@
# Coverage and Roadmap

Shannon Lite focuses on exploitable findings that can be validated against a running application.
Shannon focuses on exploitable findings that can be validated against a running application.

## Current Shannon Lite Coverage
## Current Shannon Coverage

- Broken Authentication
- Broken Authorization
@@ -12,12 +12,12 @@ Shannon Lite focuses on exploitable findings that can be validated against a run

## Reporting Philosophy

Shannon Lite follows a proof-by-exploitation model. Findings that cannot be demonstrated with a working proof of concept are not included in the final report.
Shannon follows a proof-by-exploitation model. Findings that cannot be demonstrated with a working proof of concept are not included in the final report.

This reduces speculative noise, but it also means Shannon Lite does not aim to report every possible security issue in a repository. In particular, many dependency, policy, configuration, and broad static-analysis findings are outside the core Shannon Lite workflow.
This reduces speculative noise, but it also means Shannon does not aim to report every possible security issue in a repository. In particular, many dependency, policy, configuration, and broad static-analysis findings are outside the core Shannon workflow.

## Roadmap Direction

Planned coverage areas should continue to live in the repository's canonical roadmap document if one exists. The README should link to that document rather than carrying detailed roadmap history inline.

For organizations that need broader static and organizational coverage now, see [Shannon Pro](shannon-pro.md).
For organizations that need broader static and organizational coverage now, see [the Keygraph platform](keygraph-platform.md).

+6
-4
@@ -11,10 +11,10 @@ This guide covers the source-build workflow, common CLI commands, repository pat

## Clone and Build

Use the source-build workflow if you want to run Shannon Lite from a local clone, modify the open-source CLI, or keep the worker image built locally.
Use the source-build workflow if you want to run Shannon from a local clone, modify the open-source CLI, or keep the worker image built locally.

```bash
# 1. Clone Shannon Lite.
# 1. Clone Shannon.
git clone https://github.com/KeygraphHQ/shannon.git
cd shannon

@@ -33,17 +33,19 @@ At minimum, your `.env` file should include one supported AI provider credential

```bash
ANTHROPIC_API_KEY=your-api-key
CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
```

Environment variables can also be exported directly:

```bash
export ANTHROPIC_API_KEY="your-api-key"
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
```

## Prepare Your Repository

Shannon Lite can scan any repository on your machine. Pass an absolute or relative path with `-r`.
Shannon can scan any repository on your machine. Pass an absolute or relative path with `-r`.

```bash
npx @keygraph/shannon start -u https://example.com -r /path/to/repo
@@ -74,7 +76,7 @@ Open the Temporal Web UI for detailed monitoring:
open http://localhost:8233
```

Stop Shannon Lite:
Stop Shannon:

```bash
npx @keygraph/shannon stop

@@ -1,12 +1,12 @@
# Shannon Pro
# Keygraph Platform

Shannon Pro is Keygraph's commercial continuous pentesting and AppSec platform for teams running security across many repositories, services, and environments. While Shannon Lite is a local white-box pentesting CLI, Shannon Pro is a full platform: it combines parsed-code SAST, source-to-sink analysis, black-box and white-box agentic pentesting, verified remediation, CI/CD gating, SLA tracking, and reporting for security and compliance teams.
The Keygraph platform is Keygraph's commercial continuous pentesting and AppSec platform for teams running security across many repositories, services, and environments. While Shannon is a local white-box pentesting CLI, the Keygraph platform is a complete AppSec system: it combines parsed-code SAST, source-to-sink analysis, black-box and white-box agentic pentesting, verified remediation, CI/CD gating, SLA tracking, and reporting for security and compliance teams.

This repository contains Shannon Lite, the AGPL-3.0 open-source CLI for strictly white-box pentesting. Shannon Pro supports both white-box and black-box agentic pentesting and adds static analysis, finding management, remediation workflows, reporting, and enterprise deployment options.
This repository contains Shannon, the AGPL-3.0 open-source CLI for strictly white-box pentesting. The Keygraph platform supports both white-box and black-box agentic pentesting and adds static analysis, finding management, remediation workflows, reporting, and enterprise deployment options.

## Who Should Consider Shannon Pro
## Who Should Consider the Keygraph Platform

Shannon Pro is intended for organizations that need:
The Keygraph platform is intended for organizations that need:

- Continuous AppSec coverage across many repositories and services
- White-box pentesting when source code is available
@@ -21,7 +21,7 @@ Shannon Pro is intended for organizations that need:

## Full Vulnerability Lifecycle

Shannon Pro is designed to cover the full vulnerability lifecycle, not only discovery:
The Keygraph platform is designed to cover the full vulnerability lifecycle, not only discovery:

1. **Find** exploitable issues with white-box pentesting, black-box pentesting, SAST, SCA, secrets, IaC, container, and business logic testing.
2. **Normalize** results into canonical findings so duplicate scanner outputs become one tracked vulnerability per repository.
@@ -34,9 +34,9 @@ Shannon Pro is designed to cover the full vulnerability lifecycle, not only disc

## Pentesting Modes

Shannon Lite is strictly white-box: it requires access to the target application's source code and repository layout.
Shannon is strictly white-box: it requires access to the target application's source code and repository layout.

Shannon Pro supports two pentesting modes:
The Keygraph platform supports two pentesting modes:

- **White-box agentic pentesting**: Agents use source-code context to understand architecture, identify realistic attack paths, and validate exploitability against the running application.
- **Black-box agentic pentesting**: Agents test deployed applications and APIs without source-code access, useful for third-party surfaces, production-like external validation, or environments where source access is unavailable.
@@ -45,7 +45,7 @@ Both modes follow the same core principle: do not report what might be vulnerabl

## AppSec Coverage

Shannon Pro combines agentic pentesting with broader AppSec coverage:
The Keygraph platform combines agentic pentesting with broader AppSec coverage:

- **Agentic SAST**: Code Property Graph analysis with LLM reasoning for data flow, context, and sanitization decisions.
- **SCA with reachability**: Dependency vulnerability analysis that prioritizes issues reachable from application entry points.
@@ -62,7 +62,7 @@ The result is a finding with proof of exploitability, source context when availa

## Enterprise Deployment

Shannon Pro supports enterprise deployment patterns for teams with strict data, model, and network requirements:
The Keygraph platform supports enterprise deployment patterns for teams with strict data, model, and network requirements:

- **Self-hosted deployments** inside the customer's cloud or infrastructure
- **Air-gapped deployments** for isolated environments
@@ -75,7 +75,7 @@ Deployments can be designed so source code, scan results, prompts, completions,

## Capability Comparison

| Need | Shannon Lite | Shannon Pro |
| Need | Shannon | Keygraph platform |
| --- | --- | --- |
| Licensing | AGPL-3.0 | Commercial |
| White-box pentesting | Yes; source code required | Yes; source-aware testing with platform workflows |
@@ -91,4 +91,4 @@ Deployments can be designed so source code, scan results, prompts, completions,

## Contact

Learn more on the [Keygraph website](https://keygraph.io), start a free trial, book a [Shannon Pro demo](https://cal.com/team/keygraph/shannon-pro), or contact [shannon@keygraph.io](mailto:shannon@keygraph.io).
Learn more on the [Keygraph website](https://keygraph.io), start a free trial, book a [Keygraph demo](https://cal.com/team/keygraph/shannon-pro), or contact [shannon@keygraph.io](mailto:shannon@keygraph.io).
+4
-4
@@ -4,7 +4,7 @@ This guide covers platform-specific notes and Docker networking behavior.

## Windows

Shannon Lite on Windows is supported through WSL2. Native Windows, including Git Bash, is not supported.
Shannon on Windows is supported through WSL2. Native Windows, including Git Bash, is not supported.

### Ensure WSL2

@@ -25,7 +25,7 @@ wsl --set-version <distro-name> 2

Install Docker Desktop on Windows and enable the WSL2 backend under **Settings > General > Use the WSL 2 based engine**.

Run Shannon Lite inside WSL:
Run Shannon inside WSL:

```bash
npx @keygraph/shannon setup
@@ -43,7 +43,7 @@ cp .env.example .env

To access the Temporal Web UI, run `ip addr` inside WSL to find your WSL IP address, then navigate to `http://<wsl-ip>:8233` in your Windows browser.

Windows Defender may flag exploit code in reports as false positives. Add an exclusion for the Shannon Lite directory or use Docker/WSL2 isolation.
Windows Defender may flag exploit code in reports as false positives. Add an exclusion for the Shannon directory or use Docker/WSL2 isolation.

## Linux

@@ -69,7 +69,7 @@ Source-build equivalent:

## Custom Hostnames

If your local stack uses custom hostnames mapped in `/etc/hosts`, Shannon Lite forwards those entries into the worker container at scan start.
If your local stack uses custom hostnames mapped in `/etc/hosts`, Shannon forwards those entries into the worker container at scan start.
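
As a rough illustration of what forwarding host entries involves, the sketch below parses `/etc/hosts` content into `hostname:ip` pairs, the format Docker's `--add-host` flag accepts. The function name `parseHostsFile` is hypothetical, and whether Shannon uses `--add-host` or another mechanism is an assumption; the diff does not show the implementation.

```typescript
// Hypothetical helper: turn /etc/hosts text into "hostname:ip" pairs.
// Comments and blank lines are skipped; every alias on a line gets its own pair.
function parseHostsFile(contents: string): string[] {
  const entries: string[] = [];
  for (const rawLine of contents.split(/\r?\n/)) {
    // Drop trailing comments, then surrounding whitespace.
    const line = rawLine.replace(/#.*$/, '').trim();
    if (line === '') {
      continue;
    }
    const [ip, ...names] = line.split(/\s+/);
    for (const name of names) {
      entries.push(name + ':' + ip);
    }
  }
  return entries;
}
```

A real implementation would likely also filter out loopback names such as `localhost`, which the container already defines for itself.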

To disable forwarding:

+12
-12
@@ -1,18 +1,18 @@
# Safety and Limitations

Read this before running Shannon Lite in a new environment.
Read this before running Shannon in a new environment.

## Authorized Use Only

Shannon Lite is designed for legitimate security auditing. You must have explicit written authorization from the owner of the target system before running Shannon Lite.
Shannon is designed for legitimate security auditing. You must have explicit written authorization from the owner of the target system before running Shannon.

Unauthorized scanning or exploitation of systems you do not own is illegal. Keygraph is not responsible for misuse of Shannon Lite.
Unauthorized scanning or exploitation of systems you do not own is illegal. Keygraph is not responsible for misuse of Shannon.

## Do Not Run on Production

Shannon Lite is not a passive scanner. Exploitation agents actively execute attacks to confirm vulnerabilities. This can mutate application state and data.
Shannon is not a passive scanner. Exploitation agents actively execute attacks to confirm vulnerabilities. This can mutate application state and data.

Do not run Shannon Lite against production systems. Use sandboxed, staging, or local development environments where data integrity is not a concern.
Do not run Shannon against production systems. Use sandboxed, staging, or local development environments where data integrity is not a concern.

Potential mutative effects include:

@@ -23,17 +23,17 @@ Potential mutative effects include:
- Generating unexpected outbound traffic
- Writing exploit artifacts to reports or deliverables

For maximum isolation, run Shannon Lite inside a disposable virtual machine.
For maximum isolation, run Shannon inside a disposable virtual machine.

## LLM and Automation Caveats

- **Verification is required**: Shannon Lite uses a proof-by-exploitation methodology, but final reports can still contain weakly supported or incorrect details. Human review is essential.
- **Model support**: Shannon Lite is officially supported only with Claude models. Alternative models may be incomplete, inaccurate, or unstable.
- **Prompt injection risk**: Do not point Shannon Lite at untrusted or adversarial codebases. AI-powered tools that read source code can be influenced by malicious repository content.
- **Verification is required**: Shannon uses a proof-by-exploitation methodology, but final reports can still contain weakly supported or incorrect details. Human review is essential.
- **Model support**: Shannon is officially supported only with Claude models. Alternative models may be incomplete, inaccurate, or unstable.
- **Prompt injection risk**: Do not point Shannon at untrusted or adversarial codebases. AI-powered tools that read source code can be influenced by malicious repository content.

## Scope of Analysis

Shannon Lite currently targets exploitable vulnerabilities in these classes:
Shannon currently targets exploitable vulnerabilities in these classes:

- Broken Authentication
- Broken Authorization
@@ -41,9 +41,9 @@ Shannon Lite currently targets exploitable vulnerabilities in these classes:
- Cross-Site Scripting
- Server-Side Request Forgery

Shannon Lite's proof-by-exploitation model means it does not report issues it cannot actively exploit, such as many vulnerable dependency, insecure configuration, or broad policy findings.
Shannon's proof-by-exploitation model means it does not report issues it cannot actively exploit, such as many vulnerable dependency, insecure configuration, or broad policy findings.

For broader coverage, Shannon Pro adds black-box and white-box agentic pentesting, graph-based static analysis, SCA reachability, secrets detection, business logic testing, remediation workflows, SLA tracking, and reporting dashboards.
For broader coverage, the Keygraph platform adds black-box and white-box agentic pentesting, graph-based static analysis, SCA reachability, secrets detection, business logic testing, remediation workflows, SLA tracking, and reporting dashboards.

## Cost and Performance

+2
-2
@@ -1,6 +1,6 @@
# Workspaces and Resuming

Shannon Lite uses workspaces to store scan state, logs, prompts, and deliverables. Workspaces allow interrupted or failed runs to resume without re-running completed agents.
Shannon uses workspaces to store scan state, logs, prompts, and deliverables. Workspaces allow interrupted or failed runs to resume without re-running completed agents.

## How Workspaces Work

@@ -13,7 +13,7 @@ Shannon Lite uses workspaces to store scan state, logs, prompts, and deliverable
- Each agent's progress is checkpointed so resumed runs can skip completed work.

> [!NOTE]
> The URL must match the original workspace URL when resuming. Shannon Lite rejects mismatched URLs to prevent cross-target contamination.
> The URL must match the original workspace URL when resuming. Shannon rejects mismatched URLs to prevent cross-target contamination.
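
The resume guard in the note can be pictured with a short sketch. Both function names are hypothetical, and the normalization (trimming whitespace and trailing slashes) is an assumption; Shannon may compare URLs more strictly or more loosely than this.

```typescript
// Hypothetical normalization: ignore surrounding whitespace and trailing slashes.
function normalizeTargetUrl(url: string): string {
  return url.trim().replace(/\/+$/, '');
}

// Hypothetical guard: refuse to resume a workspace against a different target.
function assertSameTarget(workspaceUrl: string, requestedUrl: string): void {
  if (normalizeTargetUrl(workspaceUrl) !== normalizeTargetUrl(requestedUrl)) {
    throw new Error(
      'Workspace was created for ' + workspaceUrl + ' but resume was requested for ' + requestedUrl,
    );
  }
}
```

Failing fast here is what prevents findings from one target being mixed into another target's workspace.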

## Examples

+143
-110
@@ -7,12 +7,12 @@

# File: README.md

>[!NOTE]
> **[Better Steerability, Authentication Improvements, and the Migration to the Pi Harness](https://github.com/KeygraphHQ/shannon/discussions/348)**
> [!NOTE]
> **[Shannon Now Runs on the Pi Harness (Beta) - run it today with `npx @keygraph/shannon@beta`](https://github.com/KeygraphHQ/shannon/discussions/358)**

<div align="center">

<img src="./assets/github-banner.png" alt="Shannon - AI Pentester for Web Applications and APIs" width="100%">
<img src="./assets/github-banner.png" alt="Shannon - AI Pentester by Keygraph" width="100%">

# Shannon - AI Pentester by Keygraph

@@ -21,6 +21,8 @@
Shannon is an autonomous, white-box AI pentester for web applications and APIs. <br />
It analyzes your source code, identifies attack paths, and executes real exploits to prove vulnerabilities before they reach production.

**This repository is Shannon Open Source: the full agent, run locally from your command line.**

---

<a href="https://discord.gg/9ZqQPuhJB7"><img src="./assets/discord.png" height="40" alt="Join Discord"></a>
@@ -35,45 +37,38 @@ It analyzes your source code, identifies attack paths, and executes real exploit
## Table of Contents

- [What is Shannon?](#what-is-shannon)
- [Product Line](#product-line)
- [Shannon Lite in Action](#shannon-lite-in-action)
- [Shannon in Action](#shannon-in-action)
- [Quick Start](#quick-start)
- [Key Capabilities](#key-capabilities)
- [Shannon Lite and Shannon Pro](#shannon-lite-and-shannon-pro)
- [Editions](#editions)
- [Architecture](#architecture)
- [Documentation](#documentation)
- [Safety, Scope, and Limitations](#safety-scope-and-limitations)
- [License and Enterprise Licensing](#license-and-enterprise-licensing)
- [About Keygraph](#about-keygraph)
- [Community and Support](#community-and-support)

## What is Shannon?

Shannon is an AI pentester developed by [Keygraph](https://keygraph.io). It performs white-box security testing of web applications and their underlying APIs by combining source-code analysis with live exploitation.
Shannon is an autonomous AI pentester developed by [Keygraph](https://keygraph.io). It performs white-box security testing of web applications and their underlying APIs by combining source-code analysis with live exploitation.

Shannon analyzes your web application's source code to identify potential attack vectors, then uses browser automation and command-line tools to execute real exploits against the running application and its APIs. Only vulnerabilities with a working proof-of-concept are included in the final report.

Shannon is the agent. This repository is Shannon Open Source, the standalone pentester you run yourself. The same Shannon also powers the [Keygraph platform](https://keygraph.io), Keygraph's commercial pentesting product. See [Editions](#editions) for how the two compare.

### Why Shannon Exists

Thanks to tools like Claude Code and Cursor, your team ships code non-stop. But your penetration test? That happens once a year. This creates a massive security gap. For the other 364 days, you could be unknowingly shipping vulnerabilities to production.

Shannon closes that gap by providing on-demand, automated penetration testing that can run against every build or release.

## Product Line

Shannon is developed by [Keygraph](https://keygraph.io) and available in two editions:

| Edition | License | Best For |
| --- | --- | --- |
| **Shannon Lite** | AGPL-3.0 | Local, strictly white-box testing of applications you own or are authorized to test. |
| **Shannon Pro** | Commercial | Organizations needing a continuous pentesting and AppSec platform with black-box and white-box pentesting, parsed-code SAST, CI/CD gating, verified remediation, SLA tracking, and enterprise deployment. |

## Shannon Lite in Action
## Shannon in Action

<p align="center">
<img src="assets/shannon-action.gif" alt="Shannon Lite running an autonomous pentest" width="100%">
<img src="assets/shannon-action.gif" alt="Shannon running an autonomous pentest" width="100%">
</p>

Sample Shannon Lite penetration test reports from intentionally vulnerable applications:
Sample penetration test reports from intentionally vulnerable applications, produced by Shannon Open Source:

| Target | Summary | Report |
| --- | --- | --- |
@@ -85,14 +80,14 @@ Sample Shannon Lite penetration test reports from intentionally vulnerable appli

### Prerequisites

- **Docker** - required for the worker container.
- **Node.js 18+** - required for the recommended `npx` workflow.
- **AI provider credentials** - Anthropic is recommended; AWS Bedrock and compatible proxy setups are documented separately.
- **Docker**: required for the worker container.
- **Node.js 18+**: required for the recommended `npx` workflow.
- **AI provider credentials**: Anthropic is recommended. AWS Bedrock, Google Vertex AI, and compatible proxy setups are documented separately.

### Run Shannon Lite
### Run Shannon

> [!WARNING]
> Shannon Lite actively executes exploits. Run it only against applications and environments you own or have explicit written authorization to test. Do not run Shannon Lite against production systems.
> Shannon actively executes exploits. Run it only against applications and environments you own or have explicit written authorization to test. Do not run Shannon against production systems.

```bash
# Configure credentials with the interactive wizard.
@@ -102,52 +97,49 @@ npx @keygraph/shannon setup
|
||||
npx @keygraph/shannon start -u https://your-app.com -r /path/to/your-repo
|
||||
```
|
||||
|
||||
Shannon Lite pulls the worker image from Docker Hub, starts the required local infrastructure, mounts the target repository read-only inside an ephemeral worker container, and writes results to a local workspace.
|
||||
Shannon pulls the worker image from Docker Hub, starts the required local infrastructure, mounts the target repository read-only inside an ephemeral worker container, and writes results to a local workspace.
|
||||
|
||||
For source builds, authenticated scans, provider-specific setup, and platform notes, see [Documentation](#documentation).
|
||||
|
||||
## Key Capabilities
|
||||
|
||||
- **Proof-by-exploitation reports**: Shannon Lite reports validated findings with reproducible proof-of-concept steps instead of speculative warnings.
|
||||
- **White-box attack planning**: Shannon Lite uses source-code analysis to guide dynamic testing and focus on realistic attack paths.
|
||||
- **Autonomous execution**: Shannon Lite launches reconnaissance, vulnerability analysis, exploitation, and report generation from a single command.
|
||||
- **Authenticated testing**: Shannon Lite configuration files can describe login flows, test credentials, TOTP, email-based login flows, focus areas, and rules of engagement.
|
||||
- **OWASP-focused coverage**: Shannon Lite targets exploitable Injection, XSS, SSRF, Broken Authentication, and Broken Authorization issues.
|
||||
- **Resumable workspaces**: Shannon Lite can resume interrupted runs without re-running completed agents.
|
||||
- **Proof-by-exploitation reports**: Shannon reports validated findings with reproducible proof-of-concept steps instead of speculative warnings.
|
||||
- **White-box attack planning**: Shannon uses source-code analysis to guide dynamic testing and focus on realistic attack paths.
|
||||
- **Autonomous execution**: Shannon launches reconnaissance, vulnerability analysis, exploitation, and report generation from a single command.
|
||||
- **Authenticated testing**: Configuration files can describe login flows, test credentials, TOTP, email-based login flows, focus areas, and rules of engagement.
|
||||
- **OWASP-focused coverage**: Shannon targets exploitable Injection, XSS, SSRF, Broken Authentication, and Broken Authorization issues.
|
||||
- **Resumable workspaces**: Shannon can resume interrupted runs without re-running completed agents.
|
||||
|
||||
## Shannon Lite and Shannon Pro
|
||||
## Editions
|
||||
|
||||
This repository contains **Shannon Lite**, the AGPL-3.0 open-source CLI for strictly white-box, proof-by-exploitation testing of web applications and APIs you own or are authorized to test. Shannon Lite requires access to the target application's source code and repository layout.
|
||||
Shannon ships in two ways: **Shannon Open Source**, the pentester you run yourself, and the **Keygraph platform**, the commercial pentesting product that runs Shannon continuously and closes the full AppSec lifecycle around it.
|
||||
|
||||
**Shannon Pro** is Keygraph's commercial continuous pentesting and AppSec platform for teams running security across many repositories, services, and environments. While Shannon Lite is a local white-box pentesting CLI, Shannon Pro is a full platform: it combines parsed-code SAST, source-to-sink analysis, black-box and white-box agentic pentesting, verified remediation, CI/CD gating, SLA tracking, and reporting for security and compliance teams.
|
||||
**Shannon Open Source** (this repository) is the standalone pentester: a CLI agent for white-box, proof-by-exploitation testing of web applications and APIs you own or are authorized to test. It reads your source, plans attacks, executes real exploits, and reports only what it can prove. It runs on demand and is complete in that lane. You point it at a target, it pentests, it reports.
|
||||
|
||||
Shannon Pro supports both **white-box and black-box agentic pentesting**: use source-aware testing when code is available, or run autonomous black-box testing against deployed applications and APIs when source access is unavailable or unnecessary.
|
||||
The **Keygraph platform** is the enterprise-ready, continuous pentesting product powered by Shannon. In the Keygraph platform, an enhanced build of Shannon runs continuously in a hardened, orchestrated environment fed by Keygraph's full code-analysis stack. Around that engine, the platform closes the entire vulnerability lifecycle, from analysis to a verified fix:
|
||||
|
||||
Shannon Pro covers the full vulnerability lifecycle: finding exploitable issues, deduplicating and prioritizing them, syncing work into developer workflows, generating verified remediations, re-testing fixes, tracking SLAs, and producing dashboards for security reporting and compliance.
|
||||
- **Analyze**: Code Property Graph SAST, SCA with reachability, secrets, IaC, and container scanning. First-class detection in their own right, and context that sharpens Shannon's attacks.
|
||||
- **Prove**: autonomous black-box and source-aware white-box pentests turn candidate findings into proven, exploited vulnerabilities rather than speculative alerts.
|
||||
- **Manage**: one canonical record per vulnerability per repository, deduplicated across every source, with ownership, status, SLA tracking, dashboards, and bidirectional Jira sync.
|
||||
- **Remediate and verify**: patches written automatically and re-tested against the patched code before delivery, landing in your existing review workflow rather than auto-applied.
|
||||
- **Deploy**: self-hosted and air-gapped environments, strict bring-your-own-key model access, and customer-controlled LLM gateway patterns, so source, results, and model traffic stay inside your perimeter.
|
||||
|
||||
For enterprise deployments, Shannon Pro supports self-hosted and air-gapped environments, strict bring-your-own-key model access, and customer-controlled LLM gateway patterns. Deployments can be designed so source code, scan results, prompts, completions, and model traffic remain inside your security perimeter.
|
||||
Shannon is the proof engine at the center of the Keygraph platform. Shannon Open Source gives you that engine to run yourself. The Keygraph platform surrounds Shannon with continuous analysis, finding management, remediation, verification, and enterprise deployment.
|
||||
|
||||
Shannon Lite is a strong fit for local and project-level white-box testing. Shannon Pro is intended for organizations that need continuous AppSec coverage, black-box and white-box pentesting, centralized triage, verified remediation workflows, compliance-ready reporting, enterprise integrations, and commercial support.
|
||||
|
||||
| Need | Shannon Lite | Shannon Pro |
|
||||
| AppSec lifecycle stage | Shannon Open Source | Keygraph platform |
|
||||
| --- | --- | --- |
|
||||
| License | AGPL-3.0 | Commercial |
|
||||
| White-box pentesting | Yes; source code required | Yes; source-aware testing with platform workflows |
|
||||
| Black-box pentesting | No | Yes; autonomous testing without source-code access |
|
||||
| Code analysis / SAST | Prompting and source pass-through to guide pentesting | Actual code parsing, Code Property Graph analysis, source-to-sink path analysis, and agentic SAST |
|
||||
| AppSec coverage | OWASP-focused agentic pentesting | Agentic pentesting, SAST, SCA, secrets, IaC, containers, and business logic testing |
|
||||
| CI/CD and gating | Manual/local CLI runs | Headless commercial CLI for CI/CD gating across enterprise CI/CD platforms |
|
||||
| Finding lifecycle | Local Markdown reports | Canonical findings, deduplication, ownership, status, SLA tracking, workflow sync, and reporting dashboards |
|
||||
| Remediation | Manual | User-initiated remediation with verification before delivery |
|
||||
| Fix verification | None; manual reruns only | Targeted verification without rerunning the entire scan, completing the remediation lifecycle |
|
||||
| Enterprise deployment | Local CLI and Docker worker | Self-hosted, air-gapped, BYOK, and customer-controlled LLM gateway options |
|
||||
| Support | Community | Commercial support |
|
||||
| Analyze | Basic LLM pass-through of source to plan attacks | Actual codebase parsing, plus Code Property Graph, SAST, SCA with reachability, secrets, IaC, and containers |
|
||||
| Pentest and prove | White-box only, proof by exploitation | Enhanced white-box, plus black-box and grey-box modes, run continuously |
|
||||
| Manage findings | Local Markdown report | Canonical findings system: deduplication across sources, ownership, SLA, dashboards, Jira sync, and professional pentest-grade PDF reports |
|
||||
| Remediate and verify | Fix manually from the report, then re-run the full scan to verify | Automated remediation: opens a PR with the fix, verified by point re-test without re-running the full scan |
|
||||
| Deploy and operate | Local CLI and Docker worker | Self-hosted, air-gapped, BYOK, continuous, enterprise integrations |
|
||||
| License and support | AGPL-3.0, community | Commercial, supported |
|
||||
|
||||
Learn more on the [Keygraph website](https://keygraph.io), read the [Shannon Pro technical overview](docs/shannon-pro.md), start a free trial or book a [Shannon Pro demo](https://cal.com/team/keygraph/shannon-pro), or contact [shannon@keygraph.io](mailto:shannon@keygraph.io).
|
||||
Learn more on the [Keygraph website](https://keygraph.io), read the [Keygraph platform technical overview](docs/keygraph-platform.md), start a free trial or book a [demo](https://cal.com/team/keygraph/shannon-pro), or contact [shannon@keygraph.io](mailto:shannon@keygraph.io).
|
||||
|
||||
## Architecture
|
||||
|
||||
Shannon Lite uses a multi-agent workflow that combines source-code analysis with live exploitation:
|
||||
Shannon uses a multi-agent workflow that combines source-code analysis with live exploitation:
|
||||
|
||||
```text
|
||||
┌──────────────────────┐
|
||||
@@ -203,37 +195,41 @@ Use these guides for operational detail:
|
||||
| --- | --- |
|
||||
| [Source build and CLI commands](docs/development.md) | Cloning, building, common commands, output paths, and local development. |
|
||||
| [Configuration](docs/configuration.md) | Authenticated testing, login flows, rules of engagement, report filters, and rate-limit settings. |
|
||||
| [AI providers](docs/ai-providers.md) | Anthropic, AWS Bedrock, and custom Anthropic-compatible endpoints. |
|
||||
| [AI providers](docs/ai-providers.md) | Anthropic, AWS Bedrock, Google Vertex AI, and custom Anthropic-compatible endpoints. |
|
||||
| [Platforms and networking](docs/platforms.md) | Windows/WSL2, Linux, macOS, Docker networking, local apps, and custom hostnames. |
|
||||
| [Workspaces and resuming](docs/workspaces.md) | Naming workspaces, resuming interrupted scans, and workspace storage. |
|
||||
| [Safety and limitations](docs/safety.md) | Authorized-use requirements, non-production guidance, mutative effects, cost, and model caveats. |
|
||||
| [Coverage and roadmap](docs/coverage-roadmap.md) | Current vulnerability coverage and planned work. |
|
||||
| [Shannon Pro](docs/shannon-pro.md) | Commercial platform, black-box and white-box pentesting, full lifecycle workflows, and enterprise deployment. |
|
||||
| [Keygraph platform](docs/keygraph-platform.md) | The continuous, agentic pentesting platform: code analysis, black-box and white-box testing, finding management, remediation, verification, and enterprise deployment. |
|
||||
|
||||
## Safety, Scope, and Limitations
|
||||
|
||||
Shannon Lite is not a passive scanner. Its exploitation agents can create users, submit forms, mutate application state, trigger outbound requests, and otherwise affect the target system. Use sandboxed, staging, or local development environments with disposable data.
|
||||
Shannon is not a passive scanner. Its exploitation agents can create users, submit forms, mutate application state, trigger outbound requests, and otherwise affect the target system. Use sandboxed, staging, or local development environments with disposable data.
|
||||
|
||||
You are responsible for using Shannon Lite legally and ethically. Do not point Shannon Lite at systems, repositories, or applications you do not own or do not have explicit authorization to test.
|
||||
You are responsible for using Shannon legally and ethically. Do not point Shannon at systems, repositories, or applications you do not own or do not have explicit authorization to test.
|
||||
|
||||
Important limitations:
|
||||
|
||||
- Shannon Lite focuses on actively exploitable issues such as Injection, XSS, SSRF, Broken Authentication, and Broken Authorization. Broader static-analysis findings, including vulnerable dependencies and insecure configurations, are a core focus of Shannon Pro.
|
||||
- Shannon Open Source focuses on actively exploitable issues such as Injection, XSS, SSRF, Broken Authentication, and Broken Authorization. Broader static-analysis coverage, including vulnerable dependencies and insecure configurations, is delivered through the Keygraph platform.
|
||||
- Findings still require human review. LLM-generated reports can contain weakly supported or incorrect details.
|
||||
- Shannon Lite is officially supported with Claude models. Smaller, alternative, or proxied non-Claude models may be incomplete or unstable.
|
||||
- Shannon is officially supported with Claude models. Smaller, alternative, or proxied non-Claude models may be incomplete or unstable.
|
||||
- A full run can take roughly 1 to 1.5 hours and may incur LLM API costs depending on model pricing and application complexity.
|
||||
- Do not scan untrusted or adversarial codebases; AI-powered tools that read source code can be exposed to prompt injection.
|
||||
- Do not scan untrusted or adversarial codebases. AI-powered tools that read source code can be exposed to prompt injection.
|
||||
|
||||
Read the full [Safety and limitations](docs/safety.md) guide before running Shannon Lite in a new environment.
|
||||
Read the full [Safety and limitations](docs/safety.md) guide before running Shannon in a new environment.
|
||||
|
||||
## License and Enterprise Licensing
|
||||
|
||||
Shannon Lite is licensed under the [GNU Affero General Public License v3.0](LICENSE).
|
||||
Shannon Open Source is licensed under the [GNU Affero General Public License v3.0](LICENSE).
|
||||
|
||||
Commercial and enterprise licensing is available for organizations that need different license terms, commercial support, private redistribution, managed-service use, or broader deployment options.
|
||||
Commercial and enterprise licensing is available for organizations that need different license terms, commercial support, private redistribution, managed-service use, or broader deployment options, including the Keygraph platform.
|
||||
|
||||
For commercial licensing, contact [shannon@keygraph.io](mailto:shannon@keygraph.io).
|
||||
|
||||
## About Keygraph
|
||||
|
||||
**Keygraph** is the company behind Shannon. It also builds the **Keygraph platform**, the commercial agentic pentesting product that closes the full AppSec lifecycle and runs an enhanced build of Shannon as its pentesting engine.
|
||||
|
||||
## Community and Support
|
||||
|
||||
**Community office hours** are available for hands-on help with bugs, deployments, and configuration questions.
|
||||
@@ -242,7 +238,7 @@ For commercial licensing, contact [shannon@keygraph.io](mailto:shannon@keygraph.
|
||||
- Asia: Thursday, 2:00 PM IST
|
||||
- [Book a slot](https://cal.com/george-flores-keygraph/shannon-community-office-hours)
|
||||
|
||||
[Join Discord](https://discord.gg/cmctpMBXwE) to ask questions, share feedback, and connect with other Shannon Lite users.
|
||||
[Join Discord](https://discord.gg/cmctpMBXwE) to ask questions, share feedback, and connect with other Shannon users.
|
||||
|
||||
At this time, Keygraph is not accepting external code contributions. Issues are welcome for bug reports and feature requests:
|
||||
|
||||
@@ -276,10 +272,10 @@ This guide covers the source-build workflow, common CLI commands, repository pat
|
||||
|
||||
## Clone and Build
|
||||
|
||||
Use the source-build workflow if you want to run Shannon Lite from a local clone, modify the open-source CLI, or keep the worker image built locally.
|
||||
Use the source-build workflow if you want to run Shannon from a local clone, modify the open-source CLI, or keep the worker image built locally.
|
||||
|
||||
```bash
|
||||
# 1. Clone Shannon Lite.
|
||||
# 1. Clone Shannon.
|
||||
git clone https://github.com/KeygraphHQ/shannon.git
|
||||
cd shannon
|
||||
|
||||
@@ -298,17 +294,19 @@ At minimum, your `.env` file should include one supported AI provider credential
|
||||
|
||||
```bash
|
||||
ANTHROPIC_API_KEY=your-api-key
|
||||
CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
|
||||
```
|
||||
|
||||
Environment variables can also be exported directly:
|
||||
|
||||
```bash
|
||||
export ANTHROPIC_API_KEY="your-api-key"
|
||||
export CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
|
||||
```
|
||||
|
||||
## Prepare Your Repository
|
||||
|
||||
Shannon Lite can scan any repository on your machine. Pass an absolute or relative path with `-r`.
|
||||
Shannon can scan any repository on your machine. Pass an absolute or relative path with `-r`.
|
||||
|
||||
```bash
|
||||
npx @keygraph/shannon start -u https://example.com -r /path/to/repo
|
||||
@@ -339,7 +337,7 @@ Open the Temporal Web UI for detailed monitoring:
|
||||
open http://localhost:8233
|
||||
```
|
||||
|
||||
Stop Shannon Lite:
|
||||
Stop Shannon:
|
||||
|
||||
```bash
|
||||
npx @keygraph/shannon stop
|
||||
@@ -413,7 +411,7 @@ workspaces/{hostname}_{sessionId}/
|
||||
|
||||
# Configuration
|
||||
|
||||
Shannon Lite can run without a configuration file, but configuration enables authenticated testing, scope guidance, rules of engagement, report filtering, and rate-limit tuning.
|
||||
Shannon can run without a configuration file, but configuration enables authenticated testing, scope guidance, rules of engagement, report filtering, and rate-limit tuning.
|
||||
|
||||
## Credential Precedence
|
||||
|
||||
@@ -532,7 +530,7 @@ Supported placeholders:
|
||||
- `$email_password`
|
||||
- `$email_totp`
|
||||
|
||||
At runtime, Shannon Lite replaces these placeholders with the credentials passed in the config.
|
||||
At runtime, Shannon replaces these placeholders with the credentials passed in the config.
|
||||
|
||||
```yaml
|
||||
login_flow:
|
||||
@@ -569,7 +567,7 @@ pipeline:
|
||||
|
||||
# AI Providers
|
||||
|
||||
Shannon Lite works best with Claude models. Anthropic API keys are recommended for most users, and Shannon Lite also supports AWS Bedrock and custom Anthropic-compatible endpoints.
|
||||
Shannon works best with Claude models. Anthropic API keys are recommended for most users, and Shannon also supports AWS Bedrock, Google Vertex AI, and custom Anthropic-compatible endpoints.
|
||||
|
||||
## Anthropic
|
||||
|
||||
@@ -589,8 +587,11 @@ Source-build mode can use a `.env` file:
|
||||
|
||||
```bash
|
||||
ANTHROPIC_API_KEY=your-api-key
|
||||
CLAUDE_CODE_MAX_OUTPUT_TOKENS=64000
|
||||
```
|
||||
|
||||
Each tier can be pointed at any Claude model via `ANTHROPIC_SMALL_MODEL` / `ANTHROPIC_MEDIUM_MODEL` / `ANTHROPIC_LARGE_MODEL` (or the setup wizard). If you set a tier to `claude-fable-5`, note that Fable's safety classifiers route cybersecurity tasks to Opus 4.8, so those phases run on Opus 4.8 regardless.
|
||||
|
||||
## AWS Bedrock
|
||||
|
||||
Run `npx @keygraph/shannon setup` and select **AWS Bedrock**. The wizard prompts for region, bearer token, and model IDs.
|
||||
@@ -617,7 +618,7 @@ ANTHROPIC_MEDIUM_MODEL=us.anthropic.claude-sonnet-4-6
|
||||
ANTHROPIC_LARGE_MODEL=us.anthropic.claude-opus-4-8
|
||||
```
|
||||
|
||||
Shannon Lite uses three model tiers:
|
||||
Shannon uses three model tiers:
|
||||
|
||||
- **small** for summarization
|
||||
- **medium** for security analysis
|
||||
@@ -625,12 +626,44 @@ Shannon Lite uses three model tiers:
|
||||
|
||||
Set `ANTHROPIC_SMALL_MODEL`, `ANTHROPIC_MEDIUM_MODEL`, and `ANTHROPIC_LARGE_MODEL` to Bedrock model IDs available in your region.
|
||||
|
||||
## Google Vertex AI
|
||||
|
||||
Create a service account with the `roles/aiplatform.user` role in the GCP Console, then download a JSON key file.
|
||||
|
||||
Run `npx @keygraph/shannon setup` and select **Google Vertex AI**. The wizard prompts for region, project ID, service account key file path, and model IDs. The key file is copied to `~/.shannon/google-sa-key.json`.
|
||||
|
||||
Or export environment variables directly:
|
||||
|
||||
```bash
|
||||
export CLAUDE_CODE_USE_VERTEX=1
|
||||
export CLOUD_ML_REGION=us-east5
|
||||
export ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
|
||||
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your-sa-key.json
|
||||
export ANTHROPIC_SMALL_MODEL=claude-haiku-4-5@20251001
|
||||
export ANTHROPIC_MEDIUM_MODEL=claude-sonnet-4-6
|
||||
export ANTHROPIC_LARGE_MODEL=claude-opus-4-8
|
||||
```
|
||||
|
||||
Source-build `.env` equivalent:
|
||||
|
||||
```bash
|
||||
CLAUDE_CODE_USE_VERTEX=1
|
||||
CLOUD_ML_REGION=us-east5
|
||||
ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
|
||||
GOOGLE_APPLICATION_CREDENTIALS=./credentials/google-sa-key.json
|
||||
ANTHROPIC_SMALL_MODEL=claude-haiku-4-5@20251001
|
||||
ANTHROPIC_MEDIUM_MODEL=claude-sonnet-4-6
|
||||
ANTHROPIC_LARGE_MODEL=claude-opus-4-8
|
||||
```
|
||||
|
||||
Set `CLOUD_ML_REGION=global` for global endpoints, or use a specific region like `us-east5`. Some models may not be available on global endpoints.
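
Before starting a scan, it can help to confirm that the Vertex AI variables are actually set. The following is a minimal sketch, not part of Shannon: `check_vertex_env` is a hypothetical helper name, and it only checks the four connection variables shown in the Vertex AI examples, plus that the key file path is readable. It does not validate the credentials against GCP.

```bash
# Hypothetical preflight helper (an illustration, not shipped with Shannon).
# Returns 0 when the Vertex AI variables are set and the key file is readable.
check_vertex_env() {
  missing=0
  for name in CLAUDE_CODE_USE_VERTEX CLOUD_ML_REGION \
              ANTHROPIC_VERTEX_PROJECT_ID GOOGLE_APPLICATION_CREDENTIALS; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # The service account key path must point at a readable file.
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] && [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "unreadable key file: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    missing=1
  fi
  return "$missing"
}
```

One way to use it is to source the function in your shell and run `check_vertex_env && npx @keygraph/shannon start -u https://your-app.com -r /path/to/your-repo`, so the scan starts only when the check passes.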
|
||||
|
||||
## Custom Base URL
|
||||
|
||||
Shannon Lite supports pointing the SDK at an Anthropic-compatible endpoint with `ANTHROPIC_BASE_URL`. For proxy-based routing, use an LLM proxy such as LiteLLM configured to expose an Anthropic-compatible endpoint.
|
||||
Shannon supports pointing the SDK at an Anthropic-compatible endpoint with `ANTHROPIC_BASE_URL`. For proxy-based routing, use an LLM proxy such as LiteLLM configured to expose an Anthropic-compatible endpoint.
|
||||
|
||||
> [!IMPORTANT]
|
||||
> Only Claude models are officially supported. Shannon Lite's evaluations, internal testing, and agent harness are optimized for Claude. Smaller or alternative models, including non-Claude models routed through a proxy, may not reliably follow Shannon Lite's instructions or tool-use constraints. Use them at your own risk.
|
||||
> Only Claude models are officially supported. Shannon's evaluations, internal testing, and agent harness are optimized for Claude. Smaller or alternative models, including non-Claude models routed through a proxy, may not reliably follow Shannon's instructions or tool-use constraints. Use them at your own risk.
|
||||
|
||||
The experimental `claude-code-router` integration is being removed. If you rely on it, migrate to an Anthropic-compatible proxy such as LiteLLM before upgrading.
|
||||
|
||||
@@ -664,7 +697,7 @@ This guide covers platform-specific notes and Docker networking behavior.
|
||||
|
||||
## Windows
|
||||
|
||||
Shannon Lite on Windows is supported through WSL2. Native Windows, including Git Bash, is not supported.
|
||||
Shannon on Windows is supported through WSL2. Native Windows, including Git Bash, is not supported.
|
||||
|
||||
### Ensure WSL2
|
||||
|
||||
@@ -685,7 +718,7 @@ wsl --set-version <distro-name> 2
|
||||
|
||||
Install Docker Desktop on Windows and enable the WSL2 backend under **Settings > General > Use the WSL 2 based engine**.
|
||||
|
||||
Run Shannon Lite inside WSL:
|
||||
Run Shannon inside WSL:
|
||||
|
||||
```bash
|
||||
npx @keygraph/shannon setup
|
||||
@@ -703,7 +736,7 @@ cp .env.example .env
|
||||
|
||||
To access the Temporal Web UI, run `ip addr` inside WSL to find your WSL IP address, then navigate to `http://<wsl-ip>:8233` in your Windows browser.
|
||||
|
||||
Windows Defender may flag exploit code in reports as false positives. Add an exclusion for the Shannon Lite directory or use Docker/WSL2 isolation.
|
||||
Windows Defender may flag exploit code in reports as false positives. Add an exclusion for the Shannon directory or use Docker/WSL2 isolation.
|
||||
|
||||
## Linux
|
||||
|
||||
@@ -729,7 +762,7 @@ Source-build equivalent:
|
||||
|
||||
## Custom Hostnames
|
||||
|
||||
If your local stack uses custom hostnames mapped in `/etc/hosts`, Shannon Lite forwards those entries into the worker container at scan start.
|
||||
If your local stack uses custom hostnames mapped in `/etc/hosts`, Shannon forwards those entries into the worker container at scan start.
|
||||
|
||||
To disable forwarding:
|
||||
|
||||
@@ -749,7 +782,7 @@ SHANNON_FORWARD_HOSTS=false
|
||||
|
||||
# Workspaces and Resuming
|
||||
|
||||
Shannon Lite uses workspaces to store scan state, logs, prompts, and deliverables. Workspaces allow interrupted or failed runs to resume without re-running completed agents.
|
||||
Shannon uses workspaces to store scan state, logs, prompts, and deliverables. Workspaces allow interrupted or failed runs to resume without re-running completed agents.
|
||||
|
||||
## How Workspaces Work
|
||||
|
||||
@@ -762,7 +795,7 @@ Shannon Lite uses workspaces to store scan state, logs, prompts, and deliverable
|
||||
- Each agent's progress is checkpointed so resumed runs can skip completed work.
|
||||
|
||||
> [!NOTE]
|
||||
> The URL must match the original workspace URL when resuming. Shannon Lite rejects mismatched URLs to prevent cross-target contamination.
|
||||
> The URL must match the original workspace URL when resuming. Shannon rejects mismatched URLs to prevent cross-target contamination.
|
||||
|
||||
## Examples
|
||||
|
||||
@@ -804,19 +837,19 @@ Source-build equivalents:
|
||||
|
||||
# Safety and Limitations
|
||||
|
||||
Read this before running Shannon Lite in a new environment.
|
||||
Read this before running Shannon in a new environment.
|
||||
|
||||
## Authorized Use Only
|
||||
|
||||
Shannon Lite is designed for legitimate security auditing. You must have explicit written authorization from the owner of the target system before running Shannon Lite.
|
||||
Shannon is designed for legitimate security auditing. You must have explicit written authorization from the owner of the target system before running Shannon.
|
||||
|
||||
Unauthorized scanning or exploitation of systems you do not own is illegal. Keygraph is not responsible for misuse of Shannon Lite.
|
||||
Unauthorized scanning or exploitation of systems you do not own is illegal. Keygraph is not responsible for misuse of Shannon.
|
||||
|
||||
## Do Not Run on Production
|
||||
|
||||
Shannon Lite is not a passive scanner. Exploitation agents actively execute attacks to confirm vulnerabilities. This can mutate application state and data.
|
||||
Shannon is not a passive scanner. Exploitation agents actively execute attacks to confirm vulnerabilities. This can mutate application state and data.
|
||||
|
||||
Do not run Shannon Lite against production systems. Use sandboxed, staging, or local development environments where data integrity is not a concern.
|
||||
Do not run Shannon against production systems. Use sandboxed, staging, or local development environments where data integrity is not a concern.
|
||||
|
||||
Potential mutative effects include:
|
||||
|
||||
@@ -827,17 +860,17 @@ Potential mutative effects include:
|
||||
- Generating unexpected outbound traffic
|
||||
- Writing exploit artifacts to reports or deliverables
|
||||
|
||||
For maximum isolation, run Shannon Lite inside a disposable virtual machine.
|
||||
For maximum isolation, run Shannon inside a disposable virtual machine.
|
||||
|
||||
## LLM and Automation Caveats
|
||||
|
||||
- **Verification is required**: Shannon Lite uses a proof-by-exploitation methodology, but final reports can still contain weakly supported or incorrect details. Human review is essential.
|
||||
- **Model support**: Shannon Lite is officially supported only with Claude models. Alternative models may be incomplete, inaccurate, or unstable.
|
||||
- **Prompt injection risk**: Do not point Shannon Lite at untrusted or adversarial codebases. AI-powered tools that read source code can be influenced by malicious repository content.
|
||||
- **Verification is required**: Shannon uses a proof-by-exploitation methodology, but final reports can still contain weakly supported or incorrect details. Human review is essential.
|
||||
- **Model support**: Shannon is officially supported only with Claude models. Alternative models may be incomplete, inaccurate, or unstable.
|
||||
- **Prompt injection risk**: Do not point Shannon at untrusted or adversarial codebases. AI-powered tools that read source code can be influenced by malicious repository content.
|
||||
|
||||
## Scope of Analysis
|
||||
|
||||
Shannon Lite currently targets exploitable vulnerabilities in these classes:
|
||||
Shannon currently targets exploitable vulnerabilities in these classes:
|
||||
|
||||
- Broken Authentication
|
||||
- Broken Authorization
|
||||
@@ -845,9 +878,9 @@ Shannon Lite currently targets exploitable vulnerabilities in these classes:
|
||||
- Cross-Site Scripting
|
||||
- Server-Side Request Forgery
|
||||
|
||||
Shannon Lite's proof-by-exploitation model means it does not report issues it cannot actively exploit, such as many vulnerable dependency, insecure configuration, or broad policy findings.
|
||||
Shannon's proof-by-exploitation model means it does not report issues it cannot actively exploit, such as many vulnerable dependency, insecure configuration, or broad policy findings.
|
||||
|
||||
For broader coverage, Shannon Pro adds black-box and white-box agentic pentesting, graph-based static analysis, SCA reachability, secrets detection, business logic testing, remediation workflows, SLA tracking, and reporting dashboards.
|
||||
For broader coverage, the Keygraph platform adds black-box and white-box agentic pentesting, graph-based static analysis, SCA reachability, secrets detection, business logic testing, remediation workflows, SLA tracking, and reporting dashboards.
|
||||
|
||||
## Cost and Performance
|
||||
|
||||
@@ -861,9 +894,9 @@ If you use subscription-based model access, consider the rate-limit guidance in
|
||||
|
||||
# Coverage and Roadmap
|
||||
|
||||
Shannon Lite focuses on exploitable findings that can be validated against a running application.
|
||||
Shannon focuses on exploitable findings that can be validated against a running application.
|
||||
|
||||
## Current Shannon Lite Coverage
|
||||
## Current Shannon Coverage
|
||||
|
||||
- Broken Authentication
|
||||
- Broken Authorization
|
||||
@@ -873,29 +906,29 @@ Shannon Lite focuses on exploitable findings that can be validated against a run
|
||||
|
||||
## Reporting Philosophy
|
||||
|
||||
Shannon Lite follows a proof-by-exploitation model. Findings that cannot be demonstrated with a working proof of concept are not included in the final report.
|
||||
Shannon follows a proof-by-exploitation model. Findings that cannot be demonstrated with a working proof of concept are not included in the final report.
|
||||
|
||||
This reduces speculative noise, but it also means Shannon Lite does not aim to report every possible security issue in a repository. In particular, many dependency, policy, configuration, and broad static-analysis findings are outside the core Shannon Lite workflow.
This reduces speculative noise, but it also means Shannon does not aim to report every possible security issue in a repository. In particular, many dependency, policy, configuration, and broad static-analysis findings are outside the core Shannon workflow.

## Roadmap Direction

Planned coverage areas should continue to live in the repository's canonical roadmap document if one exists. The README should link to that document rather than carrying detailed roadmap history inline.

For organizations that need broader static and organizational coverage now, see [Shannon Pro](shannon-pro.md).
For organizations that need broader static and organizational coverage now, see [the Keygraph platform](keygraph-platform.md).

---

# File: docs/shannon-pro.md
# File: docs/keygraph-platform.md

# Shannon Pro
# Keygraph Platform

Shannon Pro is Keygraph's commercial continuous pentesting and AppSec platform for teams running security across many repositories, services, and environments. While Shannon Lite is a local white-box pentesting CLI, Shannon Pro is a full platform: it combines parsed-code SAST, source-to-sink analysis, black-box and white-box agentic pentesting, verified remediation, CI/CD gating, SLA tracking, and reporting for security and compliance teams.
The Keygraph platform is Keygraph's commercial continuous pentesting and AppSec platform for teams running security across many repositories, services, and environments. While Shannon is a local white-box pentesting CLI, the Keygraph platform is a complete AppSec system: it combines parsed-code SAST, source-to-sink analysis, black-box and white-box agentic pentesting, verified remediation, CI/CD gating, SLA tracking, and reporting for security and compliance teams.

This repository contains Shannon Lite, the AGPL-3.0 open-source CLI for strictly white-box pentesting. Shannon Pro supports both white-box and black-box agentic pentesting and adds static analysis, finding management, remediation workflows, reporting, and enterprise deployment options.
This repository contains Shannon, the AGPL-3.0 open-source CLI for strictly white-box pentesting. The Keygraph platform supports both white-box and black-box agentic pentesting and adds static analysis, finding management, remediation workflows, reporting, and enterprise deployment options.

## Who Should Consider Shannon Pro
## Who Should Consider the Keygraph Platform

Shannon Pro is intended for organizations that need:
The Keygraph platform is intended for organizations that need:

- Continuous AppSec coverage across many repositories and services
- White-box pentesting when source code is available
@@ -910,7 +943,7 @@ Shannon Pro is intended for organizations that need:
## Full Vulnerability Lifecycle

Shannon Pro is designed to cover the full vulnerability lifecycle, not only discovery:
The Keygraph platform is designed to cover the full vulnerability lifecycle, not only discovery:

1. **Find** exploitable issues with white-box pentesting, black-box pentesting, SAST, SCA, secrets, IaC, container, and business logic testing.
2. **Normalize** results into canonical findings so duplicate scanner outputs become one tracked vulnerability per repository.
@@ -923,9 +956,9 @@ Shannon Pro is designed to cover the full vulnerability lifecycle, not only disc
## Pentesting Modes

Shannon Lite is strictly white-box: it requires access to the target application's source code and repository layout.
Shannon is strictly white-box: it requires access to the target application's source code and repository layout.

Shannon Pro supports two pentesting modes:
The Keygraph platform supports two pentesting modes:

- **White-box agentic pentesting**: Agents use source-code context to understand architecture, identify realistic attack paths, and validate exploitability against the running application.
- **Black-box agentic pentesting**: Agents test deployed applications and APIs without source-code access, useful for third-party surfaces, production-like external validation, or environments where source access is unavailable.
@@ -934,7 +967,7 @@ Both modes follow the same core principle: do not report what might be vulnerabl
## AppSec Coverage

Shannon Pro combines agentic pentesting with broader AppSec coverage:
The Keygraph platform combines agentic pentesting with broader AppSec coverage:

- **Agentic SAST**: Code Property Graph analysis with LLM reasoning for data flow, context, and sanitization decisions.
- **SCA with reachability**: Dependency vulnerability analysis that prioritizes issues reachable from application entry points.
@@ -951,7 +984,7 @@ The result is a finding with proof of exploitability, source context when availa
## Enterprise Deployment

Shannon Pro supports enterprise deployment patterns for teams with strict data, model, and network requirements:
The Keygraph platform supports enterprise deployment patterns for teams with strict data, model, and network requirements:

- **Self-hosted deployments** inside the customer's cloud or infrastructure
- **Air-gapped deployments** for isolated environments
@@ -964,7 +997,7 @@ Deployments can be designed so source code, scan results, prompts, completions,
## Capability Comparison

| Need | Shannon Lite | Shannon Pro |
| Need | Shannon | Keygraph platform |
| --- | --- | --- |
| Licensing | AGPL-3.0 | Commercial |
| White-box pentesting | Yes; source code required | Yes; source-aware testing with platform workflows |
@@ -980,4 +1013,4 @@ Deployments can be designed so source code, scan results, prompts, completions,
## Contact

Learn more on the [Keygraph website](https://keygraph.io), start a free trial, book a [Shannon Pro demo](https://cal.com/team/keygraph/shannon-pro), or contact [shannon@keygraph.io](mailto:shannon@keygraph.io).
Learn more on the [Keygraph website](https://keygraph.io), start a free trial, book a [Keygraph demo](https://cal.com/team/keygraph/shannon-pro), or contact [shannon@keygraph.io](mailto:shannon@keygraph.io).
@@ -1,36 +1,36 @@
# Shannon

> Shannon is an autonomous AI pentesting project by Keygraph. This repository contains Shannon Lite, the AGPL-3.0 open-source white-box pentesting CLI. Shannon Pro is Keygraph's commercial continuous pentesting and AppSec platform.
> Shannon is an autonomous AI pentesting project by Keygraph. This repository contains Shannon, the AGPL-3.0 open-source white-box pentesting CLI. The Keygraph platform is Keygraph's commercial continuous pentesting and AppSec platform.

Use this file as the concise entry point for AI agents and LLMs reading this repository. For a single combined context file, use [llms-full.txt](llms-full.txt).

## Start Here

- [README](README.md): Main project overview, product line, quick start, Shannon Lite capabilities, Shannon Pro positioning, safety notes, licensing, and support links.
- [README](README.md): Main project overview, editions, quick start, Shannon capabilities, Keygraph platform positioning, safety notes, licensing, and support links.
- [Full Combined Context](llms-full.txt): README and documentation combined into one file for agents that need maximum local context.

## Shannon Lite
## Shannon

- [Development](docs/development.md): Source-build workflow, common CLI commands, repository paths, and output locations.
- [Configuration](docs/configuration.md): Authenticated testing, login flows, rules of engagement, report filters, credential precedence, adaptive thinking, and rate-limit settings.
- [AI Providers](docs/ai-providers.md): Anthropic, AWS Bedrock, and custom Anthropic-compatible endpoint setup.
- [AI Providers](docs/ai-providers.md): Anthropic, AWS Bedrock, Google Vertex AI, and custom Anthropic-compatible endpoint setup.
- [Platforms and Networking](docs/platforms.md): Windows/WSL2, Linux, macOS, Docker networking, local applications, and custom hostnames.
- [Workspaces and Resuming](docs/workspaces.md): Workspace storage, naming, resuming interrupted scans, and examples.
- [Safety and Limitations](docs/safety.md): Authorized-use requirements, non-production guidance, mutative effects, model caveats, scope limits, cost, and performance.
- [Coverage and Roadmap](docs/coverage-roadmap.md): Current Shannon Lite coverage and roadmap direction.
- [Coverage and Roadmap](docs/coverage-roadmap.md): Current Shannon coverage and roadmap direction.

## Shannon Pro
## Keygraph Platform

- [Shannon Pro](docs/shannon-pro.md): Commercial continuous pentesting and AppSec platform, including black-box and white-box pentesting, parsed-code SAST, source-to-sink analysis, remediation workflows, CI/CD gating, SLA tracking, reporting, and enterprise deployment.
- [Keygraph platform](docs/keygraph-platform.md): Commercial continuous pentesting and AppSec platform, including black-box and white-box pentesting, parsed-code SAST, source-to-sink analysis, remediation workflows, CI/CD gating, SLA tracking, reporting, and enterprise deployment.

## External Links

- [Keygraph website](https://keygraph.io): Company and commercial product information.
- [Shannon Pro demo](https://cal.com/team/keygraph/shannon-pro): Demo and trial contact path.
- [Keygraph demo](https://cal.com/team/keygraph/shannon-pro): Demo and trial contact path.
- [Community Discord](https://discord.gg/cmctpMBXwE): Community support and discussion.

## Optional

- [Sample Juice Shop report](sample-reports/shannon-report-juice-shop.md): Shannon Lite sample report for OWASP Juice Shop.
- [Sample c{api}tal API report](sample-reports/shannon-report-capital-api.md): Shannon Lite sample report for c{api}tal API.
- [Sample crAPI report](sample-reports/shannon-report-crapi.md): Shannon Lite sample report for OWASP crAPI.
- [Sample Juice Shop report](sample-reports/shannon-report-juice-shop.md): Shannon sample report for OWASP Juice Shop.
- [Sample c{api}tal API report](sample-reports/shannon-report-capital-api.md): Shannon sample report for c{api}tal API.
- [Sample crAPI report](sample-reports/shannon-report-crapi.md): Shannon sample report for OWASP crAPI.
Generated
+145
-1254
File diff suppressed because it is too large
@@ -1,2 +1,5 @@
packages:
  - "apps/*"

catalog:
  "@anthropic-ai/claude-agent-sdk": ^0.3.173