fix: add file_path parameter to save_deliverable for large reports (#123)

* fix: add file_path parameter to save_deliverable for large reports

Large deliverable reports can exceed output token limits when passed as
inline content. This change allows agents to write reports to disk first
and pass a file_path instead.
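A sketch of that flow from the agent's side (the deliverable type name and file names here are hypothetical, chosen for illustration only):

```typescript
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';

// Step 1: persist the large report to the deliverables directory on disk.
const deliverablesDir = fs.mkdtempSync(path.join(os.tmpdir(), 'deliverables-'));
const report = '# Findings\n' + 'A'.repeat(100_000); // large report body
fs.writeFileSync(path.join(deliverablesDir, 'report.md'), report, 'utf-8');

// Step 2: build the save_deliverable arguments with file_path instead of content.
// 'PENTEST_REPORT' is a placeholder deliverable type, not taken from the codebase.
const toolArgs = { deliverable_type: 'PENTEST_REPORT', file_path: 'report.md' };

// The serialized tool call stays tiny no matter how large the report grows.
const payloadSize = JSON.stringify(toolArgs).length;
```

The key point is that the token cost of the tool call is now proportional to the path length, not the report length.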

Changes:
- Add file_path parameter to save_deliverable MCP tool with path
  traversal protection
- Pass CLAUDE_CODE_MAX_OUTPUT_TOKENS env var to SDK subprocesses
- Fix false positive error detection by extracting only text content
  (not tool_use JSON) when checking for API errors
- Update all prompts to instruct agents to use file_path for large
  reports and stop immediately after completion
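The path traversal protection mentioned above resolves both paths and requires strict containment; a minimal standalone version of the check introduced by this change:

```typescript
import path from 'node:path';

// Containment check as added in this change: resolve both paths, then require
// the target to equal the base or live strictly under it (the trailing
// path.sep defends against prefix tricks like /base-evil matching /base).
function isPathContained(basePath: string, targetPath: string): boolean {
  const resolvedBase = path.resolve(basePath);
  const resolvedTarget = path.resolve(targetPath);
  return resolvedTarget === resolvedBase || resolvedTarget.startsWith(resolvedBase + path.sep);
}

console.log(isPathContained('/srv/deliverables', '/srv/deliverables/report.md'));        // true
console.log(isPathContained('/srv/deliverables', '/srv/deliverables/../../etc/passwd')); // false
console.log(isPathContained('/srv/deliverables', '/srv/deliverables-evil/x'));           // false
```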

* docs: simplify and condense CLAUDE.md

Reduce verbosity while preserving all essential information for AI
assistance. Makes the documentation more scannable and focused.

* feat: add issue number detection to pr command

The /pr command now automatically detects issue numbers from:
1. Explicit arguments (e.g., /pr 123 or /pr 123,456)
2. Branch name patterns (e.g., fix/123-bug, issue-456-feature)

Adds "Closes #X" lines to PR body to auto-close issues on merge.
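The detection order above can be sketched as follows; `detectIssueNumbers` is a hypothetical helper for illustration, not code from this commit:

```typescript
// Explicit arguments win; otherwise fall back to digit runs in the branch name.
function detectIssueNumbers(args: string, branch: string): string[] {
  if (args.trim().length > 0) {
    return args.split(',').map((n) => n.trim()).filter((n) => /^\d+$/.test(n));
  }
  const matches = branch.match(/\d+/g); // e.g., fix/123-bug → ["123"]
  return matches ?? [];
}

console.log(detectIssueNumbers('123,456', 'any-branch')); // ["123", "456"]
console.log(detectIssueNumbers('', 'fix/123-bug'));       // ["123"]
console.log(detectIssueNumbers('', 'main'));              // []
```

Each detected number then becomes one `Closes #<n>` line in the PR body.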

* chore: remove CLAUDE_CODE_MAX_OUTPUT_TOKENS env var handling

No longer needed with the new Claude Agent SDK version.

* fix: restore max_output_tokens error handling
Author: Arjun Malleswaran
Date: 2026-02-11 13:40:49 -08:00 (committed by GitHub)
Parent: 1710bd93f7, commit ae4c4ed402
17 changed files with 293 additions and 328 deletions


@@ -4,10 +4,21 @@ description: Create a PR to main branch using conventional commit style for the
 Create a pull request from the current branch to the `main` branch.
+## Arguments
+The user may provide issue numbers that this PR fixes: `$ARGUMENTS`
+- If provided (e.g., `123` or `123,456`), use these issue numbers
+- If not provided, check the branch name for issue numbers (e.g., `fix/123-bug` or `issue-456-feature` → extract `123` or `456`)
+- If no issues are found, omit the "Closes" section
+## Steps
 First, analyze the current branch to understand what changes have been made:
 1. Run `git log --oneline -10` to see recent commit history and understand commit style
 2. Run `git log main..HEAD --oneline` to see all commits on this branch that will be included in the PR
 3. Run `git diff main...HEAD --stat` to see a summary of file changes
+4. Run `git branch --show-current` to get the branch name for issue detection (if no explicit issues provided)
 Then generate a PR title that:
 - Follows conventional commit format (e.g., `fix:`, `feat:`, `chore:`, `refactor:`)
@@ -16,17 +27,24 @@ Then generate a PR title that:
 Generate a PR body with:
 - A `## Summary` section with 1-3 bullet points describing the changes
+- A `Closes #X` line for each issue number (if any were provided or detected from branch name)
 Finally, create the PR using the gh CLI:
 ```
 gh pr create --base main --title "<generated title>" --body "$(cat <<'EOF'
 ## Summary
 <bullet points>
+Closes #<issue1>
+Closes #<issue2>
 EOF
 )"
 ```
+Note: Omit the "Closes" lines entirely if no issues are associated with this PR.
 IMPORTANT:
 - Do NOT include any Claude Code attribution in the PR
 - Keep the summary concise (1-3 bullet points maximum)
 - Use the conventional commit prefix that best matches the changes (fix, feat, chore, refactor, docs, etc.)
+- The `Closes #X` syntax will automatically close the referenced issues when the PR is merged

CLAUDE.md

@@ -1,321 +1,133 @@
 # CLAUDE.md
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
-## Overview
-This is an AI-powered penetration testing agent designed for defensive security analysis. The tool automates vulnerability assessment by combining external reconnaissance tools with AI-powered code analysis to identify security weaknesses in web applications and their source code.
+AI-powered penetration testing agent for defensive security analysis. Automates vulnerability assessment by combining reconnaissance tools with AI-powered code analysis.
 ## Commands
-### Prerequisites
-- **Docker** - Container runtime
-- **Anthropic API key** - Set in `.env` file
-### Running the Penetration Testing Agent (Docker + Temporal)
+**Prerequisites:** Docker, Anthropic API key in `.env`
 ```bash
-# Configure credentials
-cp .env.example .env
-# Edit .env:
-# ANTHROPIC_API_KEY=your-key
-# Start a pentest workflow
-./shannon start URL=<url> REPO=<name>
-```
-Examples:
-```bash
-./shannon start URL=https://example.com REPO=repo-name
-./shannon start URL=https://example.com REPO=repo-name CONFIG=./configs/my-config.yaml
-./shannon start URL=https://example.com REPO=repo-name OUTPUT=./my-reports
-```
-### Monitoring Progress
-```bash
-./shannon logs # View real-time worker logs
-./shannon query ID=<workflow-id> # Query specific workflow progress
-# Temporal Web UI available at http://localhost:8233
-```
-### Stopping Shannon
-```bash
-./shannon stop # Stop containers (preserves workflow data)
+# Setup
+cp .env.example .env && edit .env # Set ANTHROPIC_API_KEY
+# Prepare repo (REPO is a folder name inside ./repos/, not an absolute path)
+git clone https://github.com/org/repo.git ./repos/my-repo
+# or symlink: ln -s /path/to/existing/repo ./repos/my-repo
+# Run
+./shannon start URL=<url> REPO=my-repo
+./shannon start URL=<url> REPO=my-repo CONFIG=./configs/my-config.yaml
+# Monitor
+./shannon logs # Real-time worker logs
+./shannon query ID=<workflow-id> # Query workflow progress
+# Temporal Web UI: http://localhost:8233
+# Stop
+./shannon stop # Preserves workflow data
 ./shannon stop CLEAN=true # Full cleanup including volumes
-```
-### Options
-```bash
-CONFIG=<file> YAML configuration file for authentication and testing parameters
-OUTPUT=<path> Custom output directory for session folder (default: ./audit-logs/)
-PIPELINE_TESTING=true Use minimal prompts and fast retry intervals (10s instead of 5min)
-REBUILD=true Force Docker rebuild with --no-cache (use when code changes aren't picked up)
-ROUTER=true Route requests through claude-code-router for multi-model support
-```
-### Generate TOTP for Authentication
-TOTP generation is handled automatically via the `generate_totp` MCP tool during authentication flows.
-### Development Commands
-```bash
-# Build TypeScript
+# Build
 npm run build
-# Run with pipeline testing mode (fast, minimal deliverables)
-./shannon start URL=<url> REPO=<name> PIPELINE_TESTING=true
 ```
-## Architecture & Components
+**Options:** `CONFIG=<file>` (YAML config), `OUTPUT=<path>` (default: `./audit-logs/`), `PIPELINE_TESTING=true` (minimal prompts, 10s retries), `REBUILD=true` (force Docker rebuild), `ROUTER=true` (multi-model routing via [claude-code-router](https://github.com/musistudio/claude-code-router))
+## Architecture
 ### Core Modules
-- `src/config-parser.ts` - Handles YAML configuration parsing, validation, and distribution to agents
-- `src/error-handling.ts` - Comprehensive error handling with retry logic and categorized error types
-- `src/tool-checker.ts` - Validates availability of external security tools before execution
-- `src/session-manager.ts` - Agent definitions, execution order, and parallel groups
-- `src/queue-validation.ts` - Validates deliverables and agent prerequisites
+- `src/session-manager.ts` — Agent definitions, execution order, parallel groups
+- `src/ai/claude-executor.ts` — Claude Agent SDK integration with retry logic and git checkpoints
+- `src/config-parser.ts` — YAML config parsing with JSON Schema validation
+- `src/error-handling.ts` — Categorized error types (PentestError, ConfigError, NetworkError) with retry logic
+- `src/tool-checker.ts` — Validates external security tool availability before execution
+- `src/queue-validation.ts` — Deliverable validation and agent prerequisites
-### Temporal Orchestration Layer
-Shannon uses Temporal for durable workflow orchestration:
-- `src/temporal/shared.ts` - Types, interfaces, query definitions
-- `src/temporal/workflows.ts` - Main workflow (pentestPipelineWorkflow)
-- `src/temporal/activities.ts` - Activity implementations with heartbeats
-- `src/temporal/worker.ts` - Worker process entry point
-- `src/temporal/client.ts` - CLI client for starting workflows
-- `src/temporal/query.ts` - Query tool for progress inspection
-Key features:
-- **Crash recovery** - Workflows resume automatically after worker restart
-- **Queryable progress** - Real-time status via `./shannon query` or Temporal Web UI
-- **Intelligent retry** - Distinguishes transient vs permanent errors
-- **Parallel execution** - 5 concurrent agents in vulnerability/exploitation phases
+### Temporal Orchestration
+Durable workflow orchestration with crash recovery, queryable progress, intelligent retry, and parallel execution (5 concurrent agents in vuln/exploit phases).
+- `src/temporal/workflows.ts` — Main workflow (`pentestPipelineWorkflow`)
+- `src/temporal/activities.ts` — Activity implementations with heartbeats
+- `src/temporal/worker.ts` — Worker entry point
+- `src/temporal/client.ts` — CLI client for starting workflows
+- `src/temporal/shared.ts` — Types, interfaces, query definitions
+- `src/temporal/query.ts` — Query tool for progress inspection
-### Five-Phase Testing Workflow
-1. **Pre-Reconnaissance** (`pre-recon`) - External tool scans (nmap, subfinder, whatweb) + source code analysis
-2. **Reconnaissance** (`recon`) - Analysis of initial findings and attack surface mapping
-3. **Vulnerability Analysis** (5 agents run in parallel)
-   - `injection-vuln` - SQL injection, command injection
-   - `xss-vuln` - Cross-site scripting
-   - `auth-vuln` - Authentication bypasses
-   - `authz-vuln` - Authorization flaws
-   - `ssrf-vuln` - Server-side request forgery
-4. **Exploitation** (5 agents run in parallel, only if vulnerabilities found)
-   - `injection-exploit` - Exploit injection vulnerabilities
-   - `xss-exploit` - Exploit XSS vulnerabilities
-   - `auth-exploit` - Exploit authentication issues
-   - `authz-exploit` - Exploit authorization flaws
-   - `ssrf-exploit` - Exploit SSRF vulnerabilities
-5. **Reporting** (`report`) - Executive-level security report generation
+### Five-Phase Pipeline
+1. **Pre-Recon** (`pre-recon`) — External scans (nmap, subfinder, whatweb) + source code analysis
+2. **Recon** (`recon`) — Attack surface mapping from initial findings
+3. **Vulnerability Analysis** (5 parallel agents) — injection, xss, auth, authz, ssrf
+4. **Exploitation** (5 parallel agents, conditional) — Exploits confirmed vulnerabilities
+5. **Reporting** (`report`) — Executive-level security report
-### Configuration System
-The agent supports YAML configuration files with JSON Schema validation:
-- `configs/config-schema.json` - JSON Schema for configuration validation
-- `configs/example-config.yaml` - Template configuration file
-- `configs/juice-shop-config.yaml` - Example configuration for OWASP Juice Shop
-- `configs/keygraph-config.yaml` - Configuration for Keygraph applications
-- `configs/chatwoot-config.yaml` - Configuration for Chatwoot applications
-- `configs/metabase-config.yaml` - Configuration for Metabase applications
-- `configs/cal-com-config.yaml` - Configuration for Cal.com applications
-Configuration includes:
-- Authentication settings (form, SSO, API, basic auth)
-- Multi-factor authentication with TOTP support
-- Custom login flow instructions
-- Application-specific testing parameters
-### Prompt Templates
-The `prompts/` directory contains specialized prompt templates for each testing phase:
-- `pre-recon-code.txt` - Initial code analysis prompts
-- `recon.txt` - Reconnaissance analysis prompts
-- `vuln-*.txt` - Vulnerability assessment prompts (injection, XSS, auth, authz, SSRF)
-- `exploit-*.txt` - Exploitation attempt prompts
-- `report-executive.txt` - Executive report generation prompts
-### Claude Agent SDK Integration
-The agent uses the `@anthropic-ai/claude-agent-sdk` with maximum autonomy configuration:
-- `maxTurns: 10_000` - Allows extensive autonomous analysis
-- `permissionMode: 'bypassPermissions'` - Full system access for thorough testing
-- Playwright MCP integration for web browser automation
-- Working directory set to target local repository
-- Configuration context injection for authenticated testing
-### Authentication & Login Resources
-- `prompts/shared/login-instructions.txt` - Login flow template for all agents
-- TOTP token generation via MCP `generate_totp` tool
-- Support for multi-factor authentication workflows
-- Configurable authentication mechanisms (form, SSO, API, basic)
-### Output & Deliverables
-All analysis results are saved to the `deliverables/` directory within the target local repository, including:
-- Pre-reconnaissance reports with external scan results
-- Vulnerability assessment findings
-- Exploitation attempt results
-- Executive-level security reports with business impact analysis
-### External Tool Dependencies
-The agent integrates with external security tools:
-- `nmap` - Network port scanning
-- `subfinder` - Subdomain discovery
-- `whatweb` - Web technology fingerprinting
-Tools are validated for availability before execution using the tool-checker module.
-### Audit & Metrics System
-The agent implements a crash-safe audit system with the following features:
-**Architecture:**
-- **audit-logs/** (or custom `--output` path): Centralized metrics and forensic logs
-  - `{hostname}_{sessionId}/session.json` - Comprehensive metrics with attempt-level detail
-  - `{hostname}_{sessionId}/prompts/` - Exact prompts used for reproducibility
-  - `{hostname}_{sessionId}/agents/` - Turn-by-turn execution logs
-  - `{hostname}_{sessionId}/deliverables/` - Security reports and findings
-**Crash Safety:**
-- Append-only logging with immediate flush (survives kill -9)
-- Atomic writes for session.json (no partial writes)
-- Event-based logging (tool_start, tool_end, llm_response)
-**Concurrency Safety:**
-- SessionMutex prevents race conditions during parallel agent execution
-- 5x faster execution with parallel vulnerability and exploitation phases
-**Metrics & Reporting:**
-- Phase-level and agent-level timing/cost aggregations
-- Validation results integrated with metrics
+### Supporting Systems
+- **Configuration** — YAML configs in `configs/` with JSON Schema validation (`config-schema.json`). Supports auth settings, MFA/TOTP, and per-app testing parameters
+- **Prompts** — Per-phase templates in `prompts/` with variable substitution (`{{TARGET_URL}}`, `{{CONFIG_CONTEXT}}`). Shared partials in `prompts/shared/` via `prompt-manager.ts`
+- **SDK Integration** — Uses `@anthropic-ai/claude-agent-sdk` with `maxTurns: 10_000` and `bypassPermissions` mode. Playwright MCP for browser automation, TOTP generation via MCP tool. Login flow template at `prompts/shared/login-instructions.txt` supports form, SSO, API, and basic auth
+- **Audit System** — Crash-safe append-only logging in `audit-logs/{hostname}_{sessionId}/`. Tracks session metrics, per-agent logs, prompts, and deliverables
+- **Deliverables** — Saved to `deliverables/` in the target repo via the `save_deliverable` MCP tool
 ## Development Notes
-### Learning from Reference Implementations
-A working POC exists at `/Users/arjunmalleswaran/Code/shannon-pocs` that demonstrates the ideal Temporal + Claude Agent SDK integration. When implementing Temporal features, agents can ask questions in the chat, and the user will relay them to another Claude Code session working in that POC directory.
-**How to use this approach:**
-1. When stuck or unsure about Temporal patterns, write a specific question in the chat
-2. The user will ask an agent working on the POC to answer
-3. The user relays the answer (code snippets, patterns, explanations) back
-4. Apply the learned patterns to Shannon's codebase
-**Example questions to ask:**
-- "How does the POC structure its workflow to handle parallel activities?"
-- "Show me how heartbeats are implemented in the POC's activities"
-- "What retry configuration does the POC use for long-running agent activities?"
-- "How does the POC integrate Claude Agent SDK calls within Temporal activities?"
-**Reference implementation:**
-- **Temporal + Claude Agent SDK**: `/Users/arjunmalleswaran/Code/shannon-pocs` - working implementation demonstrating workflows, activities, worker setup, and SDK integration
 ### Adding a New Agent
-1. Define the agent in `src/session-manager.ts` (add to `AGENT_QUEUE` and appropriate parallel group)
-2. Create prompt template in `prompts/` (e.g., `vuln-newtype.txt` or `exploit-newtype.txt`)
+1. Define agent in `src/session-manager.ts` (add to `AGENT_QUEUE` and parallel group)
+2. Create prompt template in `prompts/` (e.g., `vuln-newtype.txt`)
 3. Add activity function in `src/temporal/activities.ts`
 4. Register activity in `src/temporal/workflows.ts` within the appropriate phase
 ### Modifying Prompts
-- Prompt templates use variable substitution: `{{TARGET_URL}}`, `{{CONFIG_CONTEXT}}`, `{{LOGIN_INSTRUCTIONS}}`
-- Shared partials in `prompts/shared/` are included via `prompt-manager.ts`
-- Test changes with `PIPELINE_TESTING=true` for faster iteration
+- Variable substitution: `{{TARGET_URL}}`, `{{CONFIG_CONTEXT}}`, `{{LOGIN_INSTRUCTIONS}}`
+- Shared partials in `prompts/shared/` included via `prompt-manager.ts`
+- Test with `PIPELINE_TESTING=true` for fast iteration
 ### Key Design Patterns
-- **Configuration-Driven Architecture**: YAML configs with JSON Schema validation
-- **Modular Error Handling**: Categorized error types with retry logic
-- **SDK-First Approach**: Heavy reliance on Claude Agent SDK for autonomous AI operations
-- **Progressive Analysis**: Each phase builds on previous phase results
-### Error Handling Strategy
-The application uses a comprehensive error handling system with:
-- Categorized error types (PentestError, ConfigError, NetworkError, etc.)
-- Automatic retry logic for transient failures (3 attempts per agent)
-- Graceful degradation when external tools are unavailable
-- Detailed error logging and user-friendly error messages
-### Testing Mode
-The agent includes a testing mode that skips external tool execution for faster development cycles:
-```bash
-./shannon start URL=<url> REPO=<name> PIPELINE_TESTING=true
-```
-### Security Focus
-This is explicitly designed as a **defensive security tool** for:
-- Vulnerability assessment
-- Security analysis
-- Penetration testing
-- Security report generation
-The tool should only be used on systems you own or have explicit permission to test.
-## Key Files & Directories
-**Entry Points:**
-- `src/temporal/workflows.ts` - Temporal workflow definition
-- `src/temporal/activities.ts` - Activity implementations with heartbeats
-- `src/temporal/worker.ts` - Worker process entry point
-- `src/temporal/client.ts` - CLI client for starting workflows
-**Core Logic:**
-- `src/session-manager.ts` - Agent definitions, execution order, parallel groups
-- `src/ai/claude-executor.ts` - Claude Agent SDK integration
-- `src/config-parser.ts` - YAML config parsing with JSON Schema validation
-- `src/audit/` - Crash-safe logging and metrics system
-**Configuration:**
-- `shannon` - CLI script for running pentests
-- `docker-compose.yml` - Temporal server + worker containers
-- `configs/` - YAML configs with `config-schema.json` for validation
-- `configs/router-config.json` - Router service configuration for multi-model support
-- `prompts/` - AI prompt templates (`vuln-*.txt`, `exploit-*.txt`, etc.)
-**Output:**
-- `audit-logs/{hostname}_{sessionId}/` - Session metrics, agent logs, deliverables
-### Router Mode (Multi-Model Support)
-Shannon supports routing Claude Agent SDK requests through alternative LLM providers via [claude-code-router](https://github.com/musistudio/claude-code-router).
-**Enable router mode:**
-```bash
-./shannon start URL=<url> REPO=<name> ROUTER=true
-```
-**Supported Providers:**
-| Provider | Models | Use Case |
-|----------|--------|----------|
-| OpenAI | `gpt-5.2`, `gpt-5-mini` | Good tool use, balanced cost/performance |
-| OpenRouter | `google/gemini-3-flash-preview` | Access to Gemini 3 models via single API |
-**Configuration (in .env):**
-```bash
-# OpenAI
-OPENAI_API_KEY=sk-your-key
-ROUTER_DEFAULT=openai,gpt-5.2
-# OpenRouter
-OPENROUTER_API_KEY=sk-or-your-key
-ROUTER_DEFAULT=openrouter,google/gemini-3-flash-preview
-```
-**Note:** Shannon is optimized for Anthropic's Claude models. Alternative providers are useful for cost savings during development but may produce varying results.
+- **Configuration-Driven** — YAML configs with JSON Schema validation
+- **Progressive Analysis** — Each phase builds on previous results
+- **SDK-First** — Claude Agent SDK handles autonomous analysis
+- **Modular Error Handling** — Categorized errors with automatic retry (3 attempts per agent)
+### Security
+Defensive security tool only. Use only on systems you own or have explicit permission to test.
+## Code Style Guidelines
+### Clarity Over Brevity
+- Optimize for readability, not line count — three clear lines beat one dense expression
+- Use descriptive names that convey intent
+- Prefer explicit logic over clever one-liners
+### Structure
+- Keep functions focused on a single responsibility
+- Use early returns and guard clauses instead of deep nesting
+- Never use nested ternary operators — use if/else or switch
+- Extract complex conditions into well-named boolean variables
+### TypeScript Conventions
+- Use `function` keyword for top-level functions (not arrow functions)
+- Explicit return type annotations on exported/top-level functions
+- Prefer `readonly` for data that shouldn't be mutated
+### Avoid
+- Combining multiple concerns into a single function to "save lines"
+- Dense callback chains when sequential logic is clearer
+- Sacrificing readability for DRY — some repetition is fine if clearer
+- Abstractions for one-time operations
+## Key Files
+**Entry Points:** `src/temporal/workflows.ts`, `src/temporal/activities.ts`, `src/temporal/worker.ts`, `src/temporal/client.ts`
+**Core Logic:** `src/session-manager.ts`, `src/ai/claude-executor.ts`, `src/config-parser.ts`, `src/audit/`
+**Config:** `shannon` (CLI), `docker-compose.yml`, `configs/`, `prompts/`
 ## Troubleshooting
-### Common Issues
-- **"Repository not found"**: Ensure target local directory exists and is accessible
-### Temporal & Docker Issues
-- **"Temporal not ready"**: Wait for health check or run `docker compose logs temporal`
-- **Worker not processing**: Ensure worker container is running with `docker compose ps`
-- **Reset workflow state**: `./shannon stop CLEAN=true` removes all Temporal data and volumes
-- **Local apps unreachable**: Use `host.docker.internal` instead of `localhost` for URLs
-- **Container permissions**: On Linux, may need `sudo` for docker commands
-### External Tool Dependencies
-Missing tools can be skipped using `PIPELINE_TESTING=true` mode during development:
-- `nmap` - Network scanning
-- `subfinder` - Subdomain discovery
-- `whatweb` - Web technology detection
-### Diagnostic & Utility Scripts
-```bash
-# View Temporal workflow history
-open http://localhost:8233
-```
+- **"Repository not found"** — `REPO` must be a folder name inside `./repos/`, not an absolute path. Clone or symlink your repo there first: `ln -s /path/to/repo ./repos/my-repo`
+- **"Temporal not ready"** — Wait for health check or `docker compose logs temporal`
+- **Worker not processing** — Check `docker compose ps`
+- **Reset state** — `./shannon stop CLEAN=true`
+- **Local apps unreachable** — Use `host.docker.internal` instead of `localhost`
+- **Missing tools** — Use `PIPELINE_TESTING=true` to skip nmap/subfinder/whatweb (graceful degradation)
+- **Container permissions** — On Linux, may need `sudo` for docker commands
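The `{{VARIABLE}}` substitution that the prompt-template docs describe can be sketched as a single replace pass; the real `prompt-manager.ts` implementation is not shown in this diff, so treat this as an assumption about its behavior:

```typescript
// Replace {{NAME}} placeholders with values, leaving unknown placeholders intact.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name: string) => vars[name] ?? match);
}

const rendered = renderPrompt('Scan {{TARGET_URL}} using {{CONFIG_CONTEXT}}', {
  TARGET_URL: 'https://example.com',
  CONFIG_CONTEXT: 'form auth',
});
console.log(rendered); // "Scan https://example.com using form auth"
```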


@@ -16,6 +16,8 @@
 import { tool } from '@anthropic-ai/claude-agent-sdk';
 import { z } from 'zod';
+import fs from 'node:fs';
+import path from 'node:path';
 import { DeliverableType, DELIVERABLE_FILENAMES, isQueueType } from '../types/deliverables.js';
 import { createToolResult, type ToolResult, type SaveDeliverableResponse } from '../types/tool-responses.js';
 import { validateQueueJson } from '../validation/queue-validator.js';
@@ -27,13 +29,68 @@ import { createValidationError, createGenericError } from '../utils/error-format
  */
 export const SaveDeliverableInputSchema = z.object({
   deliverable_type: z.nativeEnum(DeliverableType).describe('Type of deliverable to save'),
-  content: z.string().min(1).describe('File content (markdown for analysis/evidence, JSON for queues)'),
+  content: z.string().min(1).optional().describe('File content (markdown for analysis/evidence, JSON for queues). Optional if file_path is provided.'),
+  file_path: z.string().optional().describe('Path to a file whose contents should be used as the deliverable content. Relative paths are resolved against the deliverables directory. Use this instead of content for large reports to avoid output token limits.'),
 });
 export type SaveDeliverableInput = z.infer<typeof SaveDeliverableInputSchema>;
 /**
- * Create save_deliverable handler with targetDir captured in closure
+ * Check if a path is contained within a base directory.
+ * Prevents path traversal attacks (e.g., ../../../etc/passwd).
+ */
+function isPathContained(basePath: string, targetPath: string): boolean {
+  const resolvedBase = path.resolve(basePath);
+  const resolvedTarget = path.resolve(targetPath);
+  return resolvedTarget === resolvedBase || resolvedTarget.startsWith(resolvedBase + path.sep);
+}
+/**
+ * Resolve deliverable content from either inline content or a file path.
+ * Returns the content string on success, or a ToolResult error on failure.
+ */
+function resolveContent(
+  args: SaveDeliverableInput,
+  targetDir: string,
+): string | ToolResult {
+  if (args.content) {
+    return args.content;
+  }
+  if (!args.file_path) {
+    return createToolResult(createValidationError(
+      'Either "content" or "file_path" must be provided',
+      true,
+      { deliverableType: args.deliverable_type },
+    ));
+  }
+  const resolvedPath = path.isAbsolute(args.file_path)
+    ? args.file_path
+    : path.resolve(targetDir, args.file_path);
+  // Security: Prevent path traversal outside targetDir
+  if (!isPathContained(targetDir, resolvedPath)) {
+    return createToolResult(createValidationError(
+      `Path "${args.file_path}" resolves outside allowed directory`,
+      false,
+      { deliverableType: args.deliverable_type, allowedBase: targetDir },
+    ));
+  }
+  try {
+    return fs.readFileSync(resolvedPath, 'utf-8');
+  } catch (readError) {
+    return createToolResult(createValidationError(
+      `Failed to read file at ${resolvedPath}: ${readError instanceof Error ? readError.message : String(readError)}`,
+      true,
+      { deliverableType: args.deliverable_type, filePath: resolvedPath },
+    ));
+  }
+}
+/**
+ * Create save_deliverable handler with targetDir captured in closure.
  *
  * This factory pattern ensures each MCP server instance has its own targetDir,
 * preventing race conditions when multiple workflows run in parallel.
@@ -41,29 +98,28 @@ export type SaveDeliverableInput = z.infer<typeof SaveDeliverableInputSchema>;
 function createSaveDeliverableHandler(targetDir: string) {
   return async function saveDeliverable(args: SaveDeliverableInput): Promise<ToolResult> {
     try {
-      const { deliverable_type, content } = args;
+      const { deliverable_type } = args;
+      const contentOrError = resolveContent(args, targetDir);
+      if (typeof contentOrError !== 'string') {
+        return contentOrError;
+      }
+      const content = contentOrError;
+      // Validate queue JSON if applicable
       if (isQueueType(deliverable_type)) {
         const queueValidation = validateQueueJson(content);
         if (!queueValidation.valid) {
-          const errorResponse = createValidationError(
+          return createToolResult(createValidationError(
             queueValidation.message ?? 'Invalid queue JSON',
             true,
-            {
-              deliverableType: deliverable_type,
-              expectedFormat: '{"vulnerabilities": [...]}',
-            }
-          );
-          return createToolResult(errorResponse);
+            { deliverableType: deliverable_type, expectedFormat: '{"vulnerabilities": [...]}' },
+          ));
         }
       }
-      // Get filename and save file (targetDir captured from closure)
       const filename = DELIVERABLE_FILENAMES[deliverable_type];
       const filepath = saveDeliverableFile(targetDir, filename, content);
-      // Success response
       const successResponse: SaveDeliverableResponse = {
         status: 'success',
         message: `Deliverable saved successfully: $(unknown)`,
@@ -74,13 +130,11 @@ function createSaveDeliverableHandler(targetDir: string) {
       return createToolResult(successResponse);
     } catch (error) {
-      const errorResponse = createGenericError(
+      return createToolResult(createGenericError(
         error,
         false,
-        { deliverableType: args.deliverable_type }
-      );
-      return createToolResult(errorResponse);
+        { deliverableType: args.deliverable_type },
+      ));
     }
   };
 }
@@ -94,7 +148,7 @@ function createSaveDeliverableHandler(targetDir: string) {
 export function createSaveDeliverableTool(targetDir: string) {
   return tool(
     'save_deliverable',
-    'Saves deliverable files with automatic validation. Queue files must have {"vulnerabilities": [...]} structure.',
+    'Saves deliverable files with automatic validation. Queue files must have {"vulnerabilities": [...]} structure. For large reports, write the file to disk first then pass file_path instead of inline content to avoid output token limits.',
     SaveDeliverableInputSchema.shape,
     createSaveDeliverableHandler(targetDir)
   );
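A trimmed, standalone re-implementation of the content-resolution precedence shown in the diff above (error handling simplified to thrown exceptions, and the containment check omitted, purely for illustration):

```typescript
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';

// Inline content wins; otherwise file_path is required, and relative paths
// resolve against the deliverables directory.
function resolveContentSketch(
  args: { content?: string; file_path?: string },
  targetDir: string,
): string {
  if (args.content) return args.content;
  if (!args.file_path) throw new Error('Either "content" or "file_path" must be provided');
  const resolved = path.isAbsolute(args.file_path)
    ? args.file_path
    : path.resolve(targetDir, args.file_path);
  return fs.readFileSync(resolved, 'utf-8');
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'deliv-'));
fs.writeFileSync(path.join(dir, 'report.md'), '# Report');
console.log(resolveContentSketch({ file_path: 'report.md' }, dir)); // "# Report"
console.log(resolveContentSketch({ content: 'inline' }, dir));      // "inline"
```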


@@ -146,8 +146,10 @@ You are the **Identity Compromise Specialist** - proving tangible impact of brok
- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
  - **Parameters:**
    - `deliverable_type`: "AUTH_EVIDENCE" (required)
    - `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
    - `content`: Inline content string (optional, use only for small content)
  - **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **Task Agent:** Mandatory coder-executor for brute force batches, credential stuffing, token replay automation, and any scripted workflow.
- **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
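The `file_path`-or-`content` contract these parameters describe can be sketched on the tool-server side as follows (hypothetical field handling using Node's `fs`, not the PR's exact code):

```typescript
import * as fs from 'node:fs';

// Hypothetical sketch of how the tool might load deliverable content:
// prefer a file already written to disk, fall back to small inline content.
function loadDeliverableContent(args: { file_path?: string; content?: string }): string {
  if (args.file_path) {
    // Large reports are read from disk instead of being emitted as output tokens.
    return fs.readFileSync(args.file_path, 'utf8');
  }
  if (args.content !== undefined) {
    return args.content;
  }
  throw new Error('Either file_path or content must be provided');
}
```

Reading from disk keeps the report out of the agent's output stream entirely; only the short tool call itself consumes output tokens.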
@@ -406,9 +408,11 @@ If [blocker] were bypassed/removed:
COMPLETION REQUIREMENTS (ALL must be satisfied):
1. Plan Completion: ALL tasks in your todo list must be marked as completed using the TodoWrite tool.
2. Deliverable Generation: The required deliverable must be successfully saved using the save_deliverable MCP tool:
   - Evidence report: Write to `deliverables/auth_exploitation_evidence.md`, then call `save_deliverable` with `deliverable_type: "AUTH_EVIDENCE"` and `file_path` (not inline `content`)
CRITICAL WARNING: Announcing completion before every item in deliverables/auth_exploitation_queue.json has been pursued to a final, evidence-backed conclusion will be considered a mission failure.
ONLY AFTER fulfilling these exhaustive requirements, announce "AUTH EXPLOITATION COMPLETE" and stop.
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
</conclusion_trigger>


@@ -133,8 +133,10 @@ You are the **Privilege Escalation Specialist** - proving tangible impact of bro
- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
  - **Parameters:**
    - `deliverable_type`: "AUTHZ_EVIDENCE" (required)
    - `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
    - `content`: Inline content string (optional, use only for small content)
  - **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **Task Agent:** Mandatory coder-executor for IDOR sweeps, role escalation loops, and workflow bypass automation.
- **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
@@ -408,9 +410,11 @@ If [blocker] were bypassed/removed:
COMPLETION REQUIREMENTS (ALL must be satisfied):
1. Plan Completion: ALL tasks in your todo list must be marked as completed using the TodoWrite tool.
2. Deliverable Generation: The required deliverable must be successfully saved using the save_deliverable MCP tool:
   - Evidence report: Write to `deliverables/authz_exploitation_evidence.md`, then call `save_deliverable` with `deliverable_type: "AUTHZ_EVIDENCE"` and `file_path` (not inline `content`)
CRITICAL WARNING: Announcing completion before every item in deliverables/authz_exploitation_queue.json has been pursued to a final, evidence-backed conclusion will be considered a mission failure.
ONLY AFTER fulfilling these exhaustive requirements, announce "AUTHZ EXPLOITATION COMPLETE" and stop.
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
</conclusion_trigger>


@@ -115,8 +115,10 @@ You are the **Proof-of-Impact Generator** - converting theoretical injection fla
- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
  - **Parameters:**
    - `deliverable_type`: "INJECTION_EVIDENCE" (required)
    - `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
    - `content`: Inline content string (optional, use only for small content)
  - **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **Task Agent:** Mandatory coder-executor for any custom scripting beyond single ad-hoc commands.
- **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
@@ -435,9 +437,11 @@ If [blocker] were bypassed/removed:
COMPLETION REQUIREMENTS (ALL must be satisfied):
1. **Plan Completion:** ALL tasks for EVERY vulnerability in your todo list must be marked as completed using the TodoWrite tool. **No vulnerability or task can be left unaddressed.**
2. **Deliverable Generation:** The required deliverable must be successfully saved using the save_deliverable MCP tool:
   - Evidence report: Write to `deliverables/injection_exploitation_evidence.md`, then call `save_deliverable` with `deliverable_type: "INJECTION_EVIDENCE"` and `file_path` (not inline `content`)
**CRITICAL WARNING:** Announcing completion before every item in `deliverables/injection_exploitation_queue.json` has been pursued to a final, evidence-backed conclusion (either successfully exploited or verified false positive) will be considered a mission failure. Superficial testing is not acceptable.
ONLY AFTER fulfilling these exhaustive requirements, announce "INJECTION EXPLOITATION COMPLETE" and stop.
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
</conclusion_trigger>


@@ -132,8 +132,10 @@ You are the **Network Boundary Breaker** - proving tangible impact of SSRF vulne
- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
  - **Parameters:**
    - `deliverable_type`: "SSRF_EVIDENCE" (required)
    - `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
    - `content`: Inline content string (optional, use only for small content)
  - **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** Useful for complex multi-step SSRF exploitation that requires browser context or JavaScript execution.
- **Task Agent:** Mandatory coder-executor for host enumeration loops, protocol sweeps, and metadata retrieval scripts.
@@ -485,9 +487,11 @@ If [blocker] were bypassed/removed:
COMPLETION REQUIREMENTS (ALL must be satisfied):
1. Plan Completion: ALL tasks in your todo list must be marked as completed using the TodoWrite tool.
2. Deliverable Generation: The required deliverable must be successfully saved using the save_deliverable MCP tool:
   - Evidence report: Write to `deliverables/ssrf_exploitation_evidence.md`, then call `save_deliverable` with `deliverable_type: "SSRF_EVIDENCE"` and `file_path` (not inline `content`)
CRITICAL WARNING: Announcing completion before every item in deliverables/ssrf_exploitation_queue.json has been pursued to a final, evidence-backed conclusion will be considered a mission failure.
ONLY AFTER fulfilling these exhaustive requirements, announce "SSRF EXPLOITATION COMPLETE" and stop.
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
</conclusion_trigger>


@@ -135,8 +135,10 @@ You are the **Client-Side Impact Demonstrator** - converting theoretical XSS fla
- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
  - **Parameters:**
    - `deliverable_type`: "XSS_EVIDENCE" (required)
    - `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
    - `content`: Inline content string (optional, use only for small content)
  - **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **Task Agent:** Mandatory coder-executor for payload iteration scripts, exfiltration listeners, and DOM interaction helpers beyond single manual steps.
- **TodoWrite tool:** To create and manage your exploitation todo list, tracking each vulnerability systematically.
@@ -425,10 +427,11 @@ If [blocker] were bypassed/removed:
COMPLETION REQUIREMENTS (ALL must be satisfied):
- Todo List Completion: ALL vulnerabilities from the exploitation queue must have been processed and marked as completed in your todo list.
- Deliverable Generation: The required deliverable must be successfully saved using the save_deliverable MCP tool:
  - Evidence report: Write to `deliverables/xss_exploitation_evidence.md`, then call `save_deliverable` with `deliverable_type: "XSS_EVIDENCE"` and `file_path` (not inline `content`)
**CRITICAL WARNING:** Announcing completion before every item in `deliverables/xss_exploitation_queue.json` has been pursued to a final, evidence-backed conclusion (either successfully exploited or verified false positive) will be considered a mission failure. Superficial testing is not acceptable.
ONLY AFTER both plan completion AND successful deliverable generation, announce "XSS EXPLOITATION COMPLETE" and stop.
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
</conclusion_trigger>


@@ -81,9 +81,11 @@ You are the **Code Intelligence Gatherer** and **Architectural Foundation Builde
- **save_deliverable (MCP Tool):** Saves your final deliverable file with automatic validation.
  - **Parameters:**
    - `deliverable_type`: "CODE_ANALYSIS" (required)
    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
    - `content`: Inline content string (optional, use only for small content like JSON queues)
  - **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
  - **Usage:** Write your report to disk first, then call with `file_path`. The tool handles correct naming and file validation automatically.
  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
</available_tools>
@@ -127,7 +129,7 @@ After Phase 1 completes, launch all three vulnerability-focused agents in parall
- Create the `outputs/schemas/` directory using mkdir -p
- Copy all discovered schema files to `outputs/schemas/` with descriptive names
- Include schema locations in your attack surface analysis
- Write your report to `deliverables/code_analysis_deliverable.md`, then call `save_deliverable` with `deliverable_type: "CODE_ANALYSIS"` and `file_path: "deliverables/code_analysis_deliverable.md"` (do NOT use inline `content`)
**EXECUTION PATTERN:**
1. **Use TodoWrite to create task list** tracking: Phase 1 agents, Phase 2 agents, and report synthesis
@@ -385,10 +387,12 @@ A component is **out-of-scope** if it **cannot** be invoked through the running
- Phase 3: Synthesis and report generation completed
2. **Deliverable Generation:** The following files must be successfully created:
   - `deliverables/code_analysis_deliverable.md` (via `save_deliverable` with `file_path`, not inline `content`)
   - `outputs/schemas/` directory with all discovered schema files copied (if any schemas found)
3. **TodoWrite Completion:** All tasks in your todo list must be marked as completed
**ONLY AFTER** all three requirements are satisfied, announce "**PRE-RECON CODE ANALYSIS COMPLETE**" and stop.
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
</conclusion_trigger>


@@ -63,8 +63,10 @@ Please use these tools for the following use cases:
- **save_deliverable (MCP Tool):** Saves your reconnaissance deliverable file.
  - **Parameters:**
    - `deliverable_type`: "RECON" (required)
    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
    - `content`: Inline content string (optional, use only for small content like JSON queues)
  - **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
**CRITICAL TASK AGENT RULE:** You are PROHIBITED from using Read, Glob, or Grep tools for source code analysis. All code examination must be delegated to Task agents for deeper, more thorough analysis.
@@ -365,5 +367,13 @@ CRITICAL: Only include sources tracing to dangerous sinks (shell, DB, file ops,
</deliverable_instructions>
<conclusion_trigger>
**DELIVERABLE SAVING:**
1. Write your report to `deliverables/recon_deliverable.md`
2. Call `save_deliverable` with `deliverable_type: "RECON"` and `file_path: "deliverables/recon_deliverable.md"`
**WARNING:** Do NOT pass your report as inline `content` — it will exceed output token limits. Always use `file_path`.
Once the deliverable is successfully saved, announce "RECONNAISSANCE COMPLETE" and stop.
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
</conclusion_trigger>


@@ -80,9 +80,11 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
- **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
  - **Parameters:**
    - `deliverable_type`: "AUTH_ANALYSIS" or "AUTH_QUEUE" (required)
    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
    - `content`: Inline content string (use only for small content like JSON queues)
  - **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
  - **Usage:** For analysis reports, write to disk first, then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** To interact with the live web application to understand multi-step flows like password reset or registration.
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each endpoint/flow that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
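The queue validation the tool description promises (queue files must have a `{"vulnerabilities": [...]}` structure) can be sketched as follows (hypothetical helper, not the PR's actual validator):

```typescript
// Hypothetical sketch of the queue validation the tool description implies:
// a queue deliverable must be JSON with a top-level "vulnerabilities" array.
function isValidQueue(raw: string): boolean {
  try {
    const parsed: unknown = JSON.parse(raw);
    return (
      typeof parsed === 'object' &&
      parsed !== null &&
      Array.isArray((parsed as { vulnerabilities?: unknown }).vulnerabilities)
    );
  } catch {
    return false; // not JSON at all
  }
}
```

Validating at save time means a malformed queue fails fast inside the analysis phase rather than silently breaking the exploitation-phase handoff.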
@@ -252,8 +254,10 @@ This file serves as the handoff mechanism and must always be created to signal c
1. **Systematic Analysis:** ALL relevant API endpoints and user-facing features identified in the reconnaissance deliverable must be analyzed for AuthN/AuthZ flaws.
2. **Deliverable Generation:** Both required deliverables must be successfully saved using the save_deliverable MCP tool:
   - Analysis report: Write to `deliverables/auth_analysis_deliverable.md`, then call `save_deliverable` with `deliverable_type: "AUTH_ANALYSIS"` and `file_path` (not inline `content`)
   - Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "AUTH_QUEUE"` and `content: {"vulnerabilities": [...]}`
**ONLY AFTER** both systematic analysis AND successful deliverable generation, announce "**AUTH ANALYSIS COMPLETE**" and stop.
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
</conclusion_trigger>


@@ -83,9 +83,11 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
- **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
  - **Parameters:**
    - `deliverable_type`: "AUTHZ_ANALYSIS" or "AUTHZ_QUEUE" (required)
    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
    - `content`: Inline content string (use only for small content like JSON queues)
  - **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
  - **Usage:** For analysis reports, write to disk first, then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** To interact with the live web application to understand multi-step flows and role-based access controls.
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each endpoint that needs authorization analysis. Mark items as "in_progress" when working on them and "completed" when done.
@@ -355,10 +357,12 @@ This file serves as the handoff mechanism and must always be created to signal c
1. **Todo Completion:** ALL tasks in your TodoWrite list must be marked as "completed" 1. **Todo Completion:** ALL tasks in your TodoWrite list must be marked as "completed"
2. **Deliverable Generation:** Both required deliverables must be successfully saved using save_deliverable MCP tool: 2. **Deliverable Generation:** Both required deliverables must be successfully saved using save_deliverable MCP tool:
- Analysis report: Use `save_deliverable` MCP tool with `deliverable_type: "AUTHZ_ANALYSIS"` and your report as `content` - Analysis report: Write to `deliverables/authz_analysis_deliverable.md`, then call `save_deliverable` with `deliverable_type: "AUTHZ_ANALYSIS"` and `file_path` (not inline `content`)
- Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "AUTHZ_QUEUE"` and `content: {"vulnerabilities": [...]}` - Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "AUTHZ_QUEUE"` and `content: {"vulnerabilities": [...]}`
**ONLY AFTER** both todo completion AND successful deliverable generation, announce "**AUTHORIZATION ANALYSIS COMPLETE**" and stop. **ONLY AFTER** both todo completion AND successful deliverable generation, announce "**AUTHORIZATION ANALYSIS COMPLETE**" and stop.
**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
**FAILURE TO COMPLETE TODOS = INCOMPLETE ANALYSIS** - You will be considered to have failed the mission if you generate deliverables before completing comprehensive testing of all authorization vectors. **FAILURE TO COMPLETE TODOS = INCOMPLETE ANALYSIS** - You will be considered to have failed the mission if you generate deliverables before completing comprehensive testing of all authorization vectors.
</conclusion_trigger> </conclusion_trigger>
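The commit message mentions path-traversal protection for the new `file_path` parameter. A minimal sketch of such a guard is below; the helper name and exact rejection behavior are assumptions for illustration, not the actual MCP server code:

```typescript
import * as path from "path";

// Hypothetical sketch of the path-traversal guard the commit describes:
// a file_path passed to save_deliverable must resolve inside the
// deliverables directory, otherwise the tool call is rejected.
function resolveDeliverablePath(baseDir: string, filePath: string): string {
  const resolved = path.resolve(baseDir, filePath);
  const base = path.resolve(baseDir) + path.sep;
  if (!resolved.startsWith(base)) {
    throw new Error(`file_path escapes deliverables directory: ${filePath}`);
  }
  return resolved;
}
```

Under this sketch, `resolveDeliverablePath("deliverables", "../../etc/passwd")` throws, while a plain report filename resolves normally.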

---

```diff
@@ -83,9 +83,11 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
 - **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
   - **Parameters:**
     - `deliverable_type`: "INJECTION_ANALYSIS" or "INJECTION_QUEUE" (required)
-    - `content`: Your markdown report or JSON queue (required)
+    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
+    - `content`: Inline content string (use only for small content like JSON queues)
   - **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
-  - **Usage:** Call the tool with your deliverable type and content. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
+  - **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
+  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
 - **{{MCP_SERVER}} (Playwright):** To interact with the live web application to understand multi-step flows like password reset or registration.
 - **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each injection source that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
@@ -362,10 +364,12 @@ This file serves as the handoff mechanism to the Exploitation phase and must alw
 1. **Todo Completion:** ALL tasks in your TodoWrite list must be marked as "completed"
 2. **Deliverable Generation:** Both required deliverables must be successfully saved using save_deliverable MCP tool:
-   - Analysis report: Use `save_deliverable` MCP tool with `deliverable_type: "INJECTION_ANALYSIS"` and your report as `content`
+   - Analysis report: Write to `deliverables/injection_analysis_deliverable.md`, then call `save_deliverable` with `deliverable_type: "INJECTION_ANALYSIS"` and `file_path` (not inline `content`)
    - Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "INJECTION_QUEUE"` and `content: {"vulnerabilities": [...]}`
 **ONLY AFTER** both todo completion AND successful deliverable generation, announce "**INJECTION ANALYSIS COMPLETE**" and stop.
+**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
 **FAILURE TO COMPLETE TODOS = INCOMPLETE ANALYSIS** - You will be considered to have failed the mission if you generate deliverables before completing comprehensive testing of all input vectors.
 </conclusion_trigger>
```

---

```diff
@@ -79,9 +79,11 @@ An **exploitable vulnerability** is a data flow where user-controlled input infl
 - **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
   - **Parameters:**
     - `deliverable_type`: "SSRF_ANALYSIS" or "SSRF_QUEUE" (required)
-    - `content`: Your markdown report or JSON queue (required)
+    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
+    - `content`: Inline content string (use only for small content like JSON queues)
   - **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
-  - **Usage:** Call the tool with your deliverable type and content. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
+  - **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
+  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
 - **{{MCP_SERVER}} (Playwright):** To interact with the live web application to understand multi-step flows that might involve URL redirection or proxy functionality.
 - **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each SSRF sink that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
@@ -299,8 +301,10 @@ This file serves as the handoff mechanism and must always be created to signal c
 1. **Systematic Analysis:** ALL relevant API endpoints and request-making features identified in the reconnaissance deliverable must be analyzed for SSRF vulnerabilities.
 2. **Deliverable Generation:** Both required deliverables must be successfully saved using save_deliverable MCP tool:
-   - Analysis report: Use `save_deliverable` MCP tool with `deliverable_type: "SSRF_ANALYSIS"` and your report as `content`
+   - Analysis report: Write to `deliverables/ssrf_analysis_deliverable.md`, then call `save_deliverable` with `deliverable_type: "SSRF_ANALYSIS"` and `file_path` (not inline `content`)
    - Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "SSRF_QUEUE"` and `content: {"vulnerabilities": [...]}`
 **ONLY AFTER** both systematic analysis AND successful deliverable generation, announce "**SSRF ANALYSIS COMPLETE**" and stop.
+**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
 </conclusion_trigger>
```

---

```diff
@@ -84,9 +84,11 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
 - **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
   - **Parameters:**
     - `deliverable_type`: "XSS_ANALYSIS" or "XSS_QUEUE" (required)
-    - `content`: Your markdown report or JSON queue (required)
+    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
+    - `content`: Inline content string (use only for small content like JSON queues)
   - **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
-  - **Usage:** Call the tool with your deliverable type and content. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
+  - **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
+  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
 </available_tools>
@@ -288,8 +290,10 @@ COMPLETION REQUIREMENTS (ALL must be satisfied):
 1. Systematic Analysis: ALL input vectors identified from the reconnaissance deliverable must be analyzed.
 2. Deliverable Generation: Both required deliverables must be successfully saved using save_deliverable MCP tool:
-   - Analysis report: Use `save_deliverable` MCP tool with `deliverable_type: "XSS_ANALYSIS"` and your report as `content`
+   - Analysis report: Write to `deliverables/xss_analysis_deliverable.md`, then call `save_deliverable` with `deliverable_type: "XSS_ANALYSIS"` and `file_path` (not inline `content`)
    - Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "XSS_QUEUE"` and `content: {"vulnerabilities": [...]}`
 ONLY AFTER both systematic analysis AND successful deliverable generation, announce "XSS ANALYSIS COMPLETE" and stop.
+**CRITICAL:** After announcing completion, STOP IMMEDIATELY. Do NOT output summaries, recaps, or explanations of your work — the deliverable contains everything needed.
 </conclusion_trigger>
```

---

```diff
@@ -218,6 +218,16 @@ export async function runClaudePrompt(
   console.log(chalk.blue(`  Running Claude Code: ${description}...`));
   const mcpServers = buildMcpServers(sourceDir, agentName);
+  // Build env vars to pass to SDK subprocesses
+  const sdkEnv: Record<string, string> = {};
+  if (process.env.ANTHROPIC_API_KEY) {
+    sdkEnv.ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY;
+  }
+  if (process.env.CLAUDE_CODE_OAUTH_TOKEN) {
+    sdkEnv.CLAUDE_CODE_OAUTH_TOKEN = process.env.CLAUDE_CODE_OAUTH_TOKEN;
+  }
   const options = {
     model: 'claude-sonnet-4-5-20250929',
     maxTurns: 10_000,
@@ -225,6 +235,7 @@ export async function runClaudePrompt(
     permissionMode: 'bypassPermissions' as const,
     allowDangerouslySkipPermissions: true,
     mcpServers,
+    env: sdkEnv,
   };
   if (!execContext.useCleanOutput) {
```
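The hunk above forwards credentials one by one. The same allowlist pattern can be factored into a helper; this is a sketch, with the helper name assumed and the allowlist contents taken from this diff:

```typescript
// Forward only an explicit allowlist of environment variables to the SDK
// subprocess, so the rest of the host environment (and any secrets in it)
// is never leaked to child processes.
const SDK_ENV_ALLOWLIST = ["ANTHROPIC_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN"] as const;

function buildSdkEnv(env: Record<string, string | undefined>): Record<string, string> {
  const sdkEnv: Record<string, string> = {};
  for (const key of SDK_ENV_ALLOWLIST) {
    const value = env[key];
    if (value) sdkEnv[key] = value; // skip unset or empty vars entirely
  }
  return sdkEnv;
}

console.log(buildSdkEnv({ ANTHROPIC_API_KEY: "sk-test", PATH: "/usr/bin" }));
// prints an object containing only ANTHROPIC_API_KEY; PATH is not forwarded
```

An allowlist is preferable to passing `process.env` wholesale: new secrets added to the host environment stay private by default.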

---

```diff
@@ -50,6 +50,20 @@ export function extractMessageContent(message: AssistantMessage): string {
   return String(messageContent.content);
 }
+// Extracts only text content (no tool_use JSON) to avoid false positives in error detection
+export function extractTextOnlyContent(message: AssistantMessage): string {
+  const messageContent = message.message;
+  if (Array.isArray(messageContent.content)) {
+    return messageContent.content
+      .filter((c: ContentBlock) => c.type === 'text' || c.text)
+      .map((c: ContentBlock) => c.text || '')
+      .join('\n');
+  }
+  return String(messageContent.content);
+}
 export function detectApiError(content: string): ApiErrorDetection {
   if (!content || typeof content !== 'string') {
     return { detected: false };
@@ -175,11 +189,14 @@ export function handleAssistantMessage(
   const cleanedContent = filterJsonToolCalls(content);
   // Prefer structured error field from SDK, fall back to text-sniffing
+  // Use text-only content for error detection to avoid false positives
+  // from tool_use JSON (e.g. security reports containing "usage limit")
   let errorDetection: ApiErrorDetection;
   if (message.error) {
     errorDetection = handleStructuredError(message.error, content);
   } else {
-    errorDetection = detectApiError(content);
+    const textOnlyContent = extractTextOnlyContent(message);
+    errorDetection = detectApiError(textOnlyContent);
   }
   const result: AssistantResult = {
```
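To see why this fix works, here is a self-contained rendition of the text-only filter with a minimal repro of the false positive it prevents. The block shapes are simplified from the SDK's content array, and the function is a stand-in rather than the real `extractTextOnlyContent`:

```typescript
type ContentBlock = { type: string; text?: string; input?: unknown };

// Simplified stand-in for extractTextOnlyContent: keep only text blocks,
// so strings inside tool_use payloads (e.g. a security report mentioning
// "usage limit") never reach the error-sniffing regexes.
function extractTextOnly(content: ContentBlock[] | string): string {
  if (Array.isArray(content)) {
    return content
      .filter((c) => c.type === "text" || c.text)
      .map((c) => c.text || "")
      .join("\n");
  }
  return String(content);
}

const blocks: ContentBlock[] = [
  { type: "text", text: "Analysis complete." },
  // Previously, stringifying this whole block would let "usage limit"
  // trip the API-error detector even though no error occurred.
  { type: "tool_use", input: { report: "app shows a usage limit banner" } },
];
console.log(extractTextOnly(blocks)); // prints "Analysis complete."
```

The `tool_use` block carries no `text` field, so the filter drops it and only genuine assistant text is inspected for error signatures.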