feat: add npx CLI with monorepo, CI/CD, and ephemeral worker architecture (#256)

* feat: integrate npx CLI, CI/CD, and ephemeral worker architecture

Bring in changes from shannon-npx: npx-distributable CLI package (cli/),
semantic-release CI/CD workflows, ephemeral per-scan worker containers,
TOML config support, setup wizard, and workspace management.

Preserves all shannon-only changes: security hardening (localhost-bound
ports, MCP env allowlist, path traversal guard), updated benchmarks
(XBEN 19/31/35/44), README assets, and prompt injection disclaimer.

Applies security hardening to cli/infra/compose.yml as well.

* refactor: migrate to Turborepo + pnpm + Biome monorepo

Restructure into apps/worker, apps/cli, packages/mcp-server with
Turborepo task orchestration, pnpm workspaces, Biome linting/formatting,
and tsdown CLI bundling.

Key changes:
- src/ -> apps/worker/src/, cli/ -> apps/cli/, mcp-server/ -> packages/mcp-server/
- prompts/ and configs/ moved into apps/worker/
- npm replaced with pnpm, package-lock.json replaced with pnpm-lock.yaml
- Dockerfile updated for pnpm-based builds
- CLI logs command rewritten with chokidar for cross-platform reliability
- Router health checking added for auto-detected router mode
- Centralized path resolution via apps/worker/src/paths.ts

* fix: resolve all biome warnings and formatting issues

- Remove unnecessary non-null assertions where values are guaranteed
- Replace array index access with .at() for safer element retrieval
- Use local variables to avoid repeated process.env lookups
- Replace any types with unknown in functional utilities
- Use nullish coalescing for TOTP hash byte access
- Auto-format security patches to match biome config

* fix: pin pnpm to 10.12.1 in Dockerfile for catalog support

* fix: handle Esc cancellation in Bedrock setup flow

Replace p.group() with individual prompts and per-field cancel checks,
matching the pattern used by all other provider setup flows.

* feat: add optional model customization to Anthropic setup

* fix: resolve Docker bind mount permission errors on Linux

Use entrypoint-based UID remapping instead of --user flag so the
container's pentest user matches the host UID/GID, keeping bind-mounted
volumes writable. Git config moved to --system level to survive remapping.

* fix: show resumed workflow ID in splash screen URL

When resuming a workflow, the Temporal Web UI link pointed to the old
(terminated) workflow ID. Now extracts "New Workflow ID" from the resume
header in workflow.log, falling back to the original ID for fresh scans.

* style: fix biome formatting in docker.ts

* fix: align TypeScript config types with JSON Schema

- SuccessCondition.type: use schema values (url_contains,
  element_present, url_equals_exactly, text_contains) instead of
  stale values (url, cookie, element, redirect)
- Authentication.login_flow: mark optional to match schema which
  does not require it
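
A minimal sketch of what the aligned types might look like. The four `type` values and the optional `login_flow` come from the message above; the `value` and `success_condition` fields and the guard function are hypothetical, added only for illustration.

```typescript
// Values taken from the commit message; they mirror the JSON Schema enum.
const SUCCESS_CONDITION_TYPES = [
  "url_contains",
  "element_present",
  "url_equals_exactly",
  "text_contains",
] as const;

type SuccessConditionType = (typeof SUCCESS_CONDITION_TYPES)[number];

interface SuccessCondition {
  type: SuccessConditionType;
  value: string; // hypothetical field, for illustration only
}

interface Authentication {
  success_condition: SuccessCondition; // hypothetical field name
  login_flow?: string[]; // optional, since the schema does not require it
}

// Runtime guard so untyped config input can be narrowed safely.
function isSuccessConditionType(v: unknown): v is SuccessConditionType {
  return (
    typeof v === "string" &&
    (SUCCESS_CONDITION_TYPES as readonly string[]).indexOf(v) !== -1
  );
}
```

Deriving the union from a single `as const` array keeps the runtime check and the compile-time type from drifting apart.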

* feat: mark GitHub release as latest during rollback

* fix: use native ARM64 runners for Docker multi-platform builds

Replace QEMU emulation with parallel native builds using a matrix
strategy (ubuntu-latest for amd64, ubuntu-24.04-arm for arm64).
Each platform pushes by digest, then a merge job creates the
multi-arch manifest list before signing with cosign.

* fix: resolve SessionMutex race condition with 3+ concurrent waiters
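
The message does not describe the fix itself. As a hedged illustration only, one pattern that stays correct for any number of waiters is a promise chain in which each caller queues behind the previous one. `ChainMutex` and `demoMutex` below are hypothetical names, not the project's `SessionMutex`.

```typescript
// Hypothetical promise-chain mutex; not the project's SessionMutex.
class ChainMutex {
  // Tail of the queue: resolves when the most recent holder releases.
  private tail: Promise<void> = Promise.resolve();

  // Resolves with a release function once all earlier callers have released.
  lock(): Promise<() => void> {
    let release!: () => void;
    const next = new Promise<void>((resolve) => {
      release = resolve;
    });
    const ready = this.tail.then(() => release);
    // Each caller waits on its own predecessor, so waiters are admitted
    // strictly one at a time, however many there are.
    this.tail = next;
    return ready;
  }
}

// Runs four concurrent workers and reports the peak number inside the lock.
async function demoMutex(): Promise<number> {
  const mutex = new ChainMutex();
  let active = 0;
  let maxActive = 0;
  const worker = async (): Promise<void> => {
    const release = await mutex.lock();
    active += 1;
    maxActive = Math.max(maxActive, active);
    await Promise.resolve(); // yield while holding the lock
    active -= 1;
    release();
  };
  await Promise.all([worker(), worker(), worker(), worker()]);
  return maxActive;
}
```

Because no two waiters ever await the same promise, releasing the lock wakes exactly one of them.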

* fix: skip POSIX permission check on Windows

writeFileSync mode option is ignored on Windows, so config.toml
gets 0o666 and the guard rejects it.
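
A sketch of how such a guard could skip the check on Windows. The function name and the exact bit mask are assumptions; the message only says the guard rejects `0o666`.

```typescript
// Hypothetical guard: returns true when the config file's mode is acceptable.
function isModeAcceptable(mode: number, platform: string): boolean {
  // Windows ignores POSIX mode bits, so the check is skipped there.
  if (platform === "win32") return true;
  // Elsewhere, reject any group or other permission bits.
  return (mode & 0o077) === 0;
}
```

In real code `platform` would come from `process.platform` and `mode` from `fs.statSync(path).mode`; they are parameters here so the sketch stays self-contained.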

* fix: resolve unsubstituted placeholders in report prompt

Remove unused {{GITHUB_URL}} placeholder and wire up {{AUTH_CONTEXT}}
with structured auth context (login type, username, URL, MFA status).
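
To illustrate how leftover placeholders can be caught, here is a hedged sketch of `{{NAME}}` substitution that reports any name without a value. `renderPrompt` is a hypothetical helper, not the project's prompt manager.

```typescript
interface RenderResult {
  text: string;
  missing: string[]; // placeholder names that had no value
}

// Replaces {{NAME}} placeholders and records the ones left unsubstituted.
function renderPrompt(template: string, vars: Record<string, string | undefined>): RenderResult {
  const missing: string[] = [];
  const text = template.replace(/\{\{([A-Z0-9_]+)\}\}/g, (whole: string, name: string) => {
    const value = vars[name];
    if (value === undefined) {
      missing.push(name);
      return whole; // leave the placeholder visible
    }
    return value;
  });
  return { text, missing };
}
```

A caller could fail fast when `missing` is non-empty instead of sending a prompt that still contains raw placeholders.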

* fix: remove duplicate environment gate from merge-docker job

Move DOCKERHUB_USERNAME from vars to secrets so merge-docker can access
credentials without its own environment scope. This eliminates the
redundant double approval since build-docker already gates on
release-publish.

* fix: replace POSIX sleep binary with cross-platform async sleep

execFileSync('sleep') is unavailable on Windows. Use node:timers/promises
setTimeout instead, making ensureInfra async.
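
A minimal sketch of the pattern, assuming a polling loop similar in spirit to `ensureInfra`. `waitUntilReady` and its parameters are hypothetical; only the `node:timers/promises` import reflects the message above.

```typescript
import { setTimeout as sleep } from "node:timers/promises";

// Polls `check` up to `attempts` times, pausing `delayMs` between tries.
// No external `sleep` binary is needed, so this also works on Windows.
async function waitUntilReady(
  check: () => boolean,
  attempts: number,
  delayMs: number,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if (check()) return true;
    await sleep(delayMs);
  }
  return false;
}
```

The trade-off is that every caller up the stack has to become `async`.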

* fix: use session.json for workflow ID on resume instead of parsing workflow.log

On resume, workflow.log already exists with stale headers from the
previous run. The CLI poll found '====' immediately and extracted the
old workflow ID, producing a wrong Temporal Web UI URL.

Read the workflow ID from session.json instead — the worker writes
resume attempts there atomically. For fresh runs, poll until
originalWorkflowId appears. For resumes, poll until a new
resumeAttempts entry is appended.
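
The selection logic described above can be sketched as a pure function over the parsed `session.json`. `originalWorkflowId` and `resumeAttempts` are named in the message; the `workflowId` field on each attempt and the `knownAttempts` counter are assumptions.

```typescript
interface ResumeAttempt {
  workflowId: string; // hypothetical field name
}

interface SessionFile {
  originalWorkflowId?: string;
  resumeAttempts?: ResumeAttempt[];
}

// Returns the workflow ID to display, or undefined if the caller should keep polling.
// `knownAttempts` is the number of resume entries seen before this run started.
function pickWorkflowId(
  session: SessionFile,
  isResume: boolean,
  knownAttempts: number,
): string | undefined {
  if (!isResume) return session.originalWorkflowId;
  const attempts = session.resumeAttempts ?? [];
  if (attempts.length <= knownAttempts) return undefined; // nothing new yet
  return attempts[attempts.length - 1]?.workflowId;
}
```

Comparing against the earlier entry count is what keeps a stale entry from a previous resume from being picked up.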

* feat: add custom base URL support for Anthropic-compatible proxies

Support ANTHROPIC_BASE_URL + ANTHROPIC_AUTH_TOKEN to route SDK requests
through LiteLLM or any Anthropic-compatible proxy. Adds TUI wizard
option, TOML config mapping, credential validation, and preflight
endpoint reachability check via SDK query.

* fix: remove environment gates and add NPM_TOKEN to publish step

* feat: add beta release and rollback workflows with cosign signing

* fix: remove redundant checkout and pnpm steps from beta release workflow

* docs: normalize README commands to mode-neutral shorthand

Add a substitution note after Quick Start sections so all subsequent
examples use bare `shannon` instead of mixing `./shannon` and
`npx @keygraph/shannon`. Mode-specific commands (build, update,
uninstall) get inline annotations. Also fixes a broken command in the
Custom Base URL section.

* fix: remove redundant `update` command

Image is already auto-pulled by `ensureImage()` during `start` when the
pinned version tag is missing locally. Manual `update` was unnecessary.

* docs: add CLI package README stub

* docs: update README setup instructions for dual CLI modes

* docs: update announcement banner to npx availability

* feat: migrate from MCP tools to CLI based tools (#252)

* feat: migrate from MCP tools to CLI tools

* fix: restore browser action emoji formatters for CLI output

Adapt formatBrowserAction for playwright-cli commands, replacing the old
mcp__playwright__browser_* tool name matching removed during migration.

* fix: mount credential file to fixed container path for Vertex AI

GOOGLE_APPLICATION_CREDENTIALS was forwarded as-is to the container,
causing the relative host path to resolve against the repo mount
instead of the credentials mount. Now both local and npx modes mount
the resolved file to /app/credentials/google-sa-key.json and rewrite
the env var to match.
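
A hedged sketch of the mount-and-rewrite step. The container path is taken from the message; the helper name and the read-only `:ro` suffix are assumptions.

```typescript
// Fixed location inside the container, as described in the commit message.
const CONTAINER_CREDENTIALS_PATH = "/app/credentials/google-sa-key.json";

// Builds the `docker run` flags that mount the resolved host file and
// point GOOGLE_APPLICATION_CREDENTIALS at the in-container path.
function credentialDockerArgs(resolvedHostPath: string): string[] {
  return [
    "-v",
    `${resolvedHostPath}:${CONTAINER_CREDENTIALS_PATH}:ro`, // :ro is an assumption
    "-e",
    `GOOGLE_APPLICATION_CREDENTIALS=${CONTAINER_CREDENTIALS_PATH}`,
  ];
}
```

Resolving the host path to an absolute path before calling this helper is what removes the dependence on the working directory.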

* feat: add git awareness and optional description field to config

* fix: drop redundant --ipc host flag from worker container

* fix: align announcement banner URL with main branch

* feat: add target URL reachability preflight check (#254)

* Moving asset benchmark graph image to this folder

* Move benchmark results to benchmark repo

Windows Defender flags exploit code in the pentest reports as false positives, forcing every Windows user to add a Defender exclusion just to clone Shannon.

* Updated README

* fix: case-insensitive grep for semantic-release version probe
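
For illustration, the same case-insensitive probe expressed in TypeScript. The workflow itself uses `grep`; `probeNextVersion` is a hypothetical helper.

```typescript
// Extracts X.Y.Z from a semantic-release dry-run log, ignoring letter case.
function probeNextVersion(log: string): string | undefined {
  const match = /the next release version is (\d+\.\d+\.\d+)/i.exec(log);
  return match?.[1];
}
```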

* fix: harden supply chain security (#255)

* fix: patch smol-toml and tsdown vulnerabilities

Update smol-toml 1.6.0→1.6.1 (DoS via recursive comment parsing) and
tsdown 0.21.2→0.21.5 (picomatch ReDoS + method injection).

* fix: pin all unpinned dependency versions in Dockerfile

Pins subfinder v2.13.0, WhatWeb v0.6.3 (switched from git clone to
release tarball), schemathesis 4.13.0, addressable 2.8.9,
claude-code 2.1.84, and playwright-cli 0.1.1 for reproducible builds.

* fix: pin GitHub Actions to commit SHAs for supply chain security

* fix: pin GitHub Actions to commit SHAs in beta and rollback workflows
ezl-keygraph
2026-03-27 02:34:29 +05:30
committed by GitHub
parent 0d172f5e32
commit bc8fd203ed
4058 changed files with 7774 additions and 1189080 deletions
+6 -6
@@ -22,21 +22,21 @@ You are debugging an issue. Follow this structured approach to avoid spinning in
**Session audit logs:**
```bash
# Find most recent session
ls -lt audit-logs/ | head -5
ls -lt workspaces/ | head -5
# Check session metrics and errors
cat audit-logs/<session>/session.json | jq '.errors, .agentMetrics'
cat workspaces/<session>/session.json | jq '.errors, .agentMetrics'
# Check agent execution logs
ls -lt audit-logs/<session>/agents/
cat audit-logs/<session>/agents/<latest>.log
ls -lt workspaces/<session>/agents/
cat workspaces/<session>/agents/<latest>.log
```
## Step 3: Trace the Call Path
For Shannon, trace through these layers:
1. **Temporal Client** `src/temporal/client.ts` - Workflow initiation
1. **Worker + Client** `src/temporal/worker.ts` - Combined worker + workflow submission
2. **Workflow** `src/temporal/workflows.ts` - Pipeline orchestration
3. **Activities** `src/temporal/activities.ts` - Thin wrappers: heartbeat, error classification
4. **Container** `src/services/container.ts` - Per-workflow DI
@@ -72,7 +72,7 @@ For Shannon, trace through these layers:
npx playwright install chromium
# Check MCP server startup (look for connection errors)
grep -i "mcp\|playwright" audit-logs/<session>/agents/*.log
grep -i "mcp\|playwright" workspaces/<session>/agents/*.log
```
**Git State Issues:**
+8
@@ -46,6 +46,14 @@ temp/
ehthumbs.db
Thumbs.db
# CLI package (runs on host, not in container)
# Keep apps/cli/package.json so pnpm workspaces resolve
apps/cli/src/
apps/cli/dist/
apps/cli/infra/
apps/cli/tsconfig.json
apps/cli/tsdown.config.ts
# Docker files (avoid recursive copying)
Dockerfile*
docker-compose*.yml
+1 -1
@@ -71,7 +71,7 @@ ANTHROPIC_API_KEY=your-api-key-here
# CLAUDE_CODE_USE_VERTEX=1
# CLOUD_ML_REGION=us-east5
# ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
# GOOGLE_APPLICATION_CREDENTIALS=./credentials/gcp-sa-key.json
# GOOGLE_APPLICATION_CREDENTIALS=./credentials/google-sa-key.json
# =============================================================================
# Available Models
+1 -1
@@ -69,7 +69,7 @@ body:
Issues without this information may be difficult to triage.
- Check the audit logs at: `./audit-logs/target_url_shannon-123/workflow.log`
- Check the logs at: `./workspaces/target_url_shannon-123/workflow.log`
Use `grep` or search to identify errors.
Paste the relevant error output below.
- Temporal:
+13 -13
@@ -19,7 +19,7 @@ jobs:
steps:
- name: Setup Node.js
uses: actions/setup-node@v6
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
with:
node-version: 24
registry-url: https://registry.npmjs.org
@@ -61,20 +61,20 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
- name: Log in to Docker Hub
uses: docker/login-action@v4
uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Build and push by digest
id: build
uses: docker/build-push-action@v7
uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
with:
context: .
platforms: ${{ matrix.platform }}
@@ -89,7 +89,7 @@ jobs:
touch "/tmp/digests/${digest#sha256:}"
- name: Upload digest
uses: actions/upload-artifact@v6
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: digests-${{ matrix.platform == 'linux/amd64' && 'amd64' || 'arm64' }}
path: /tmp/digests/*
@@ -108,17 +108,17 @@ jobs:
steps:
- name: Download digests
uses: actions/download-artifact@v6
uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
with:
path: /tmp/digests
pattern: digests-*
merge-multiple: true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v4
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
- name: Log in to Docker Hub
uses: docker/login-action@v4
uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
@@ -138,7 +138,7 @@ jobs:
echo "digest=$DIGEST" >> "$GITHUB_OUTPUT"
- name: Install cosign
uses: sigstore/cosign-installer@v4.1.0
uses: sigstore/cosign-installer@ba7bc0a3fef59531c69a25acd34668d6d3fe6f22 # v4.1.0
- name: Sign Docker image
run: cosign sign --yes "keygraph/shannon@${{ steps.inspect.outputs.digest }}"
@@ -161,13 +161,13 @@ jobs:
steps:
- name: Checkout
uses: actions/checkout@v6
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install pnpm
uses: pnpm/action-setup@v4
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
- name: Configure npm registry
uses: actions/setup-node@v6
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
with:
node-version: 24
registry-url: https://registry.npmjs.org
+241
@@ -0,0 +1,241 @@
name: Release
on:
workflow_dispatch:
permissions:
contents: read
concurrency:
group: release-main
cancel-in-progress: false
jobs:
preflight:
name: Preflight
runs-on: ubuntu-latest
permissions:
contents: write
outputs:
should_release: ${{ steps.probe.outputs.should_release }}
version: ${{ steps.probe.outputs.version }}
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
- name: Install pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
- name: Setup Node.js
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
with:
node-version: 24
cache: 'pnpm'
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Probe semantic-release
id: probe
shell: bash
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
set -euo pipefail
npx semantic-release@25 --dry-run --no-ci 2>&1 | tee semantic-release.log
if grep -qi "the next release version is" semantic-release.log; then
echo "should_release=true" >> "$GITHUB_OUTPUT"
VERSION=$(grep -oiE "the next release version is [0-9]+\.[0-9]+\.[0-9]+" semantic-release.log | grep -oE "[0-9]+\.[0-9]+\.[0-9]+")
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
else
echo "should_release=false" >> "$GITHUB_OUTPUT"
fi
build-docker:
name: Build Docker (${{ matrix.platform }})
needs: preflight
if: needs.preflight.outputs.should_release == 'true'
permissions:
contents: read
strategy:
fail-fast: true
matrix:
include:
- platform: linux/amd64
runner: ubuntu-latest
- platform: linux/arm64
runner: ubuntu-24.04-arm
runs-on: ${{ matrix.runner }}
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
- name: Log in to Docker Hub
uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Build and push by digest
id: build
uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
with:
context: .
platforms: ${{ matrix.platform }}
provenance: mode=max
sbom: true
outputs: type=image,name=keygraph/shannon,push-by-digest=true,name-canonical=true,push=true
- name: Export digest
run: |
mkdir -p /tmp/digests
digest="${{ steps.build.outputs.digest }}"
touch "/tmp/digests/${digest#sha256:}"
- name: Upload digest
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: digests-${{ matrix.platform == 'linux/amd64' && 'amd64' || 'arm64' }}
path: /tmp/digests/*
if-no-files-found: error
retention-days: 1
merge-docker:
name: Push Docker manifests
needs: [preflight, build-docker]
runs-on: ubuntu-latest
permissions:
contents: read
id-token: write
outputs:
digest: ${{ steps.inspect.outputs.digest }}
steps:
- name: Download digests
uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
with:
path: /tmp/digests
pattern: digests-*
merge-multiple: true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
- name: Log in to Docker Hub
uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Create manifest list and push
working-directory: /tmp/digests
run: |
docker buildx imagetools create \
--tag "keygraph/shannon:${{ needs.preflight.outputs.version }}" \
--tag "keygraph/shannon:latest" \
$(printf 'keygraph/shannon@sha256:%s ' *)
- name: Inspect image
id: inspect
run: |
docker buildx imagetools inspect "keygraph/shannon:${{ needs.preflight.outputs.version }}"
DIGEST="sha256:$(docker buildx imagetools inspect --raw "keygraph/shannon:${{ needs.preflight.outputs.version }}" | sha256sum | cut -d' ' -f1)"
echo "digest=$DIGEST" >> "$GITHUB_OUTPUT"
- name: Install cosign
uses: sigstore/cosign-installer@ba7bc0a3fef59531c69a25acd34668d6d3fe6f22 # v4.1.0
- name: Sign Docker image
run: cosign sign --yes "keygraph/shannon@${{ steps.inspect.outputs.digest }}"
- name: Verify Docker image signature
run: |
sleep 10
cosign verify \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
--certificate-identity https://github.com/${{ github.repository }}/.github/workflows/release.yml@${{ github.ref }} \
"keygraph/shannon@${{ steps.inspect.outputs.digest }}"
publish-npm:
name: Publish npm
needs: [preflight, merge-docker]
runs-on: ubuntu-latest
permissions:
contents: read
id-token: write
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
- name: Configure npm registry
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
with:
node-version: 24
registry-url: https://registry.npmjs.org
cache: 'pnpm'
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Set CLI package version
run: cd apps/cli && npm version "${{ needs.preflight.outputs.version }}" --no-git-tag-version --allow-same-version
- name: Sync lockfile with bumped version
run: pnpm install --lockfile-only
- name: Build CLI
run: pnpm --filter @keygraph/shannon run build
- name: Publish npm package
working-directory: apps/cli
env:
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
run: |
if npm view "@keygraph/shannon@${{ needs.preflight.outputs.version }}" version 2>/dev/null; then
echo "Version already published, skipping"
else
pnpm publish --access public --no-git-checks
fi
release:
name: Create GitHub release
needs: [preflight, publish-npm]
runs-on: ubuntu-latest
permissions:
contents: write
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
- name: Install pnpm
uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
- name: Setup Node.js
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
with:
node-version: 24
cache: 'pnpm'
- name: Install dependencies
run: pnpm install --frozen-lockfile
- name: Create GitHub release
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: npx semantic-release@25
+1 -1
@@ -38,7 +38,7 @@ jobs:
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
- name: Setup Node.js
uses: actions/setup-node@v6
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
with:
node-version: 24
registry-url: https://registry.npmjs.org
+129
@@ -0,0 +1,129 @@
name: Rollback
on:
workflow_dispatch:
inputs:
version:
description: "Version to move npm latest and Docker latest to (example: 1.4.2)"
required: true
type: string
permissions:
contents: write
concurrency:
group: rollback-latest-${{ github.event.inputs.version }}
cancel-in-progress: false
jobs:
rollback:
name: Roll back npm, Docker, and GitHub release latest
runs-on: ubuntu-latest
steps:
- name: Checkout tags
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
- name: Fetch all tags
run: git fetch --force --tags
- name: Validate target version
id: target
shell: bash
env:
RAW_VERSION: ${{ inputs.version }}
run: |
set -euo pipefail
VERSION="${RAW_VERSION#v}"
case "$VERSION" in
''|*[!0-9.]*)
echo "Invalid version: $VERSION"
exit 1
;;
esac
if ! [[ "$VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
echo "Version must be in semver format X.Y.Z"
exit 1
fi
if ! git rev-parse "refs/tags/v$VERSION" >/dev/null 2>&1; then
echo "Git tag v$VERSION does not exist"
exit 1
fi
echo "version=$VERSION" >> "$GITHUB_OUTPUT"
- name: Setup Node.js
uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
with:
node-version: 24
registry-url: https://registry.npmjs.org
- name: Verify npm package version exists
run: npm view "@keygraph/shannon@${{ steps.target.outputs.version }}" version
- name: Show current npm dist-tags
env:
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
run: npm dist-tag ls @keygraph/shannon
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
- name: Log in to Docker Hub
uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Verify Docker image tag exists
run: docker buildx imagetools inspect "keygraph/shannon:${{ steps.target.outputs.version }}"
- name: Install cosign
uses: sigstore/cosign-installer@ba7bc0a3fef59531c69a25acd34668d6d3fe6f22 # v4.1.0
- name: Verify Docker image signature before rollback
run: |
cosign verify \
--certificate-oidc-issuer https://token.actions.githubusercontent.com \
--certificate-identity "https://github.com/${{ github.repository }}/.github/workflows/release.yml@refs/heads/main" \
"keygraph/shannon:${{ steps.target.outputs.version }}"
- name: Move Docker latest
run: |
docker buildx imagetools create \
--tag "keygraph/shannon:latest" \
"keygraph/shannon:${{ steps.target.outputs.version }}"
- name: Move npm latest
env:
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
run: npm dist-tag add "@keygraph/shannon@${{ steps.target.outputs.version }}" latest
- name: Mark GitHub release as latest
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: gh release edit "v${{ steps.target.outputs.version }}" --latest
- name: Show final npm dist-tags
env:
NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
run: npm dist-tag ls @keygraph/shannon
- name: Verify Docker latest now points to target
run: docker buildx imagetools inspect "keygraph/shannon:latest"
- name: Write summary
run: |
{
echo "## Rollback latest"
echo ""
echo "- Target version: \`${{ steps.target.outputs.version }}\`"
echo "- npm package: \`@keygraph/shannon\`"
echo "- Docker image: \`keygraph/shannon\`"
echo "- GitHub release: \`v${{ steps.target.outputs.version }}\` marked as latest"
} >> "$GITHUB_STEP_SUMMARY"
+2 -1
@@ -1,6 +1,7 @@
node_modules/
.env
audit-logs/
workspaces/
credentials/
dist/
repos/
.turbo/
+2
@@ -0,0 +1,2 @@
auto-install-peers=true
strict-peer-dependencies=false
+21
@@ -0,0 +1,21 @@
{
"branches": ["main"],
"plugins": [
"@semantic-release/commit-analyzer",
"@semantic-release/release-notes-generator",
[
"@semantic-release/npm",
{
"npmPublish": false
}
],
[
"@semantic-release/github",
{
"successCommentCondition": false,
"failCommentCondition": false,
"releasedLabels": false
}
]
]
}
+137 -52
@@ -4,60 +4,137 @@ AI-powered penetration testing agent for defensive security analysis. Automates
## Commands
**Prerequisites:** Docker, Anthropic API key in `.env`
**Prerequisites:** Docker, AI provider credentials (`.env` for local, `shn setup` or env vars for npx)
### Dual CLI
Shannon supports two CLI modes, auto-detected based on the current working directory:
| | **npx** (`npx @keygraph/shannon`) | **Local** (`./shannon`) |
|---|---|---|
| **Install** | Zero-install via npm | Clone the repo |
| **Image** | Pulled from Docker Hub (`keygraph/shannon:latest`) | Built locally (`shannon-worker`) |
| **State** | `~/.shannon/` | Project directory |
| **Credentials** | `~/.shannon/config.toml` (via `shn setup`) or env vars | `./.env` |
| **Config** | `~/.shannon/config.toml` (via `shn setup`) | N/A |
| **Prompts** | Bundled in Docker image | Mounted from `./apps/worker/prompts/` (live-editable) |
Mode auto-detection: local mode activates when env var `SHANNON_LOCAL=1` is set by the `./shannon` entry point (`apps/cli/src/mode.ts`). Otherwise npx mode.
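A minimal sketch of the detection rule, assuming it reduces to a single environment check. `detectMode` and the `Mode` type are hypothetical names; the actual logic lives in `apps/cli/src/mode.ts`.

```typescript
type Mode = "local" | "npx";

// Local mode only when the ./shannon entry point has set SHANNON_LOCAL=1.
function detectMode(env: Record<string, string | undefined>): Mode {
  return env["SHANNON_LOCAL"] === "1" ? "local" : "npx";
}
```

In the CLI the argument would be `process.env`; passing it in keeps the function easy to test.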
### npx Quick Start
```bash
# Configure credentials (interactive wizard)
npx @keygraph/shannon setup
# Or export env vars directly (non-interactive / CI)
export ANTHROPIC_API_KEY=your-key
# Run
npx @keygraph/shannon start -u <url> -r /path/to/repo
```
### Local (Development) Quick Start
```bash
# Setup
cp .env.example .env && edit .env # Set ANTHROPIC_API_KEY
echo "ANTHROPIC_API_KEY=your-key" > .env
# Prepare repo (REPO is a folder name inside ./repos/, not an absolute path)
git clone https://github.com/org/repo.git ./repos/my-repo
# or symlink: ln -s /path/to/existing/repo ./repos/my-repo
# Build (auto-runs if image missing)
./shannon build
# Run
./shannon start URL=<url> REPO=my-repo
./shannon start URL=<url> REPO=my-repo CONFIG=./configs/my-config.yaml
./shannon start -u <url> -r my-repo
./shannon start -u <url> -r my-repo -c ./apps/worker/configs/my-config.yaml
./shannon start -u <url> -r /any/path/to/repo
```
### Common Commands
```bash
# Setup (npx mode only — one-time credential configuration)
npx @keygraph/shannon setup
# Workspaces & Resume
./shannon start URL=<url> REPO=my-repo WORKSPACE=my-audit # New named workspace
./shannon start URL=<url> REPO=my-repo WORKSPACE=my-audit # Resume (same command)
./shannon start URL=<url> REPO=my-repo WORKSPACE=<auto-name> # Resume auto-named run
./shannon workspaces # List all workspaces
./shannon start -u <url> -r my-repo -w my-audit # New named workspace
./shannon start -u <url> -r my-repo -w my-audit # Resume (same command)
./shannon workspaces # List all workspaces
# Monitor
./shannon logs # Real-time worker logs
./shannon logs <workspace> # Tail workflow log
./shannon status # Show running workers
# Temporal Web UI: http://localhost:8233
# Stop
./shannon stop # Preserves workflow data
./shannon stop CLEAN=true # Full cleanup including volumes
./shannon stop # Preserves workflow data
./shannon stop --clean # Full cleanup including volumes (confirms first)
# Build
npm run build
# Image management
./shannon build [--no-cache] # Local mode: build worker image
npx @keygraph/shannon uninstall # npx mode: remove ~/.shannon/ (confirms first)
# Build TypeScript (development)
pnpm run build # Build all packages via Turborepo
pnpm run check # Type-check all packages
pnpm biome # Biome lint + format + import sorting check
pnpm biome:fix # Auto-fix lint, format, and import sorting
```
**Options:** `CONFIG=<file>` (YAML config), `OUTPUT=<path>` (default: `./audit-logs/`), `WORKSPACE=<name>` (named workspace; auto-resumes if exists), `PIPELINE_TESTING=true` (minimal prompts, 10s retries), `REBUILD=true` (force Docker rebuild), `ROUTER=true` (multi-model routing via [claude-code-router](https://github.com/musistudio/claude-code-router))
**Monorepo tooling:** pnpm workspaces, Turborepo for task orchestration, Biome for linting/formatting. TypeScript compiler options shared via `tsconfig.base.json` at the root. All packages extend it, overriding only `rootDir` and `outDir`. Shared devDependencies (`typescript`, `@types/node`, `turbo`, `@biomejs/biome`) are hoisted to the root workspace.
**Options:** `-c <file>` (YAML config), `-o <path>` (output directory), `-w <name>` (named workspace; auto-resumes if exists), `--pipeline-testing` (minimal prompts, 10s retries), `--router` (multi-model routing via [claude-code-router](https://github.com/musistudio/claude-code-router))
## Architecture
### Core Modules
- `src/session-manager.ts` — Agent definitions (`AGENTS` record). Agent types in `src/types/agents.ts`
- `src/config-parser.ts` — YAML config parsing with JSON Schema validation
- `src/ai/claude-executor.ts` — Claude Agent SDK integration with retry logic
- `src/services/` — Business logic layer (Temporal-agnostic). Activities delegate here. Key: `agent-execution.ts`, `error-handling.ts`, `container.ts`
- `src/types/` — Consolidated types: `Result<T,E>`, `ErrorCode`, `AgentName`, `ActivityLogger`, etc.
- `src/utils/` — Shared utilities (file I/O, formatting, concurrency)
### Monorepo Layout
```
apps/cli/ — @keygraph/shannon (published to npm, bundled with tsdown)
apps/worker/ — @shannon/worker (private, Temporal worker + pipeline logic)
```
### CLI Package (`apps/cli/`)
Published as `@keygraph/shannon` on npm. Contains only Docker orchestration logic — no Temporal SDK, business logic, or prompts. Bundled with tsdown for single-file ESM output.
- `apps/cli/src/index.ts` — CLI dispatcher (`setup`, `start`, `stop`, `logs`, `workspaces`, `status`, `build`, `uninstall`, `info`)
- `apps/cli/src/mode.ts` — Auto-detection: local mode if `SHANNON_LOCAL=1` env var is set
- `apps/cli/src/docker.ts` — Compose lifecycle, image pull/build, ephemeral `docker run` worker spawning
- `apps/cli/src/home.ts` — State directory management (`~/.shannon/` for npx, `./` for local)
- `apps/cli/src/env.ts` — `.env` loading, TOML fallback (npx only) via `apps/cli/src/config/resolver.ts`, credential validation, env flag building
- `apps/cli/src/config/resolver.ts` — Cascading config (npx only): env vars → `~/.shannon/config.toml` (parsed with `smol-toml`)
- `apps/cli/src/config/writer.ts` — TOML serialization and secure file persistence (0o600)
- `apps/cli/src/commands/setup.ts` — Interactive TUI wizard (`@clack/prompts`) for provider credential setup (npx only)
- `apps/cli/src/paths.ts` — Repo/config path resolution (bare name → `./repos/<name>`, or any absolute/relative path)
- `apps/cli/src/commands/` — Command handlers
- `apps/cli/infra/compose.yml` — Bundled Temporal + router compose file for npx mode
- `apps/cli/tsdown.config.ts` — tsdown bundler config
- `shannon` — Node.js entry point (`#!/usr/bin/env node`) that delegates to `apps/cli/dist/index.mjs`
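The cascading lookup listed for `resolver.ts` can be sketched as follows, assuming the parsed TOML has already been flattened to the same keys as the environment variables. The flattening, the helper name, and the empty-string rule are all assumptions.

```typescript
type Values = Record<string, string | undefined>;

// Hypothetical resolver: environment variables win, the parsed TOML file is the fallback.
function resolveSetting(key: string, env: Values, tomlConfig: Values): string | undefined {
  const fromEnv = env[key];
  // Treat an empty string as "not set" (an assumption of this sketch).
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv;
  return tomlConfig[key];
}
```

With this order, a CI job can override a stored credential by exporting a variable, without touching `~/.shannon/config.toml`.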
### Docker Architecture
Infra (Temporal + router) runs via `docker-compose.yml`. Workers are ephemeral `docker run --rm` containers, one per scan, each with a unique task queue and isolated volume mounts.
- `docker-compose.yml` — Infra only: `shannon-temporal` (port 7233/8233) and `shannon-router` (port 3456, optional via profile). Network: `shannon-net`
- `Dockerfile` — 2-stage build (builder + Chainguard Wolfi runtime). Uses pnpm. Entrypoint: `CMD ["node", "apps/worker/dist/temporal/worker.js"]`
- No `docker-compose.docker.yml` — host gateway handled via `--add-host` flag in CLI
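As a rough sketch of the ephemeral worker model, the arguments for one `docker run --rm` invocation might be assembled like this. The `shannon-net` network and the `--add-host` flag come from the text above; the `TASK_QUEUE` variable name, the `/app/workspaces` mount target, and the helper itself are hypothetical.

```typescript
interface WorkerRunOptions {
  image: string; // e.g. "shannon-worker" in local mode
  taskQueue: string; // unique per scan
  workspaceDir: string; // host directory for this scan's output
}

// Builds the argument list for one ephemeral worker container.
function buildWorkerRunArgs(opts: WorkerRunOptions): string[] {
  return [
    "run",
    "--rm", // container is removed when the scan finishes
    "--network",
    "shannon-net",
    "--add-host",
    "host.docker.internal:host-gateway",
    "-e",
    `TASK_QUEUE=${opts.taskQueue}`, // hypothetical variable name
    "-v",
    `${opts.workspaceDir}:/app/workspaces`, // hypothetical container path
    opts.image,
  ];
}
```

Passing the arguments as an array to something like `child_process.spawn("docker", args)` avoids shell quoting problems with paths that contain spaces.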
### Worker Package (`apps/worker/`)
- `apps/worker/src/paths.ts` — Centralized path constants (`PROMPTS_DIR`, `CONFIGS_DIR`, `WORKSPACES_DIR`)
- `apps/worker/src/session-manager.ts` — Agent definitions (`AGENTS` record). Agent types in `apps/worker/src/types/agents.ts`
- `apps/worker/src/config-parser.ts` — YAML config parsing with JSON Schema validation
- `apps/worker/src/ai/claude-executor.ts` — Claude Agent SDK integration with retry logic
- `apps/worker/src/services/` — Business logic layer (Temporal-agnostic). Activities delegate here. Key: `agent-execution.ts`, `error-handling.ts`, `container.ts`
- `apps/worker/src/types/` — Consolidated types: `Result<T,E>`, `ErrorCode`, `AgentName`, `ActivityLogger`, etc.
- `apps/worker/src/utils/` — Shared utilities (file I/O, formatting, concurrency)
### Temporal Orchestration
Durable workflow orchestration with crash recovery, queryable progress, intelligent retry, and parallel execution (5 concurrent agents in vuln/exploit phases).
- `src/temporal/workflows.ts` — Main workflow (`pentestPipelineWorkflow`)
- `src/temporal/activities.ts` — Thin wrappers — heartbeat loop, error classification, container lifecycle. Business logic delegated to `src/services/`
- `src/temporal/activity-logger.ts` — `TemporalActivityLogger` implementation of `ActivityLogger` interface
- `src/temporal/summary-mapper.ts` — Maps `PipelineSummary` to `WorkflowSummary`
- `src/temporal/worker.ts` — Worker entry point
- `src/temporal/client.ts` — CLI client for starting workflows
- `src/temporal/shared.ts` — Types, interfaces, query definitions
- `apps/worker/src/temporal/workflows.ts` — Main workflow (`pentestPipelineWorkflow`)
- `apps/worker/src/temporal/activities.ts` — Thin wrappers — heartbeat loop, error classification, container lifecycle. Business logic delegated to `apps/worker/src/services/`
- `apps/worker/src/temporal/activity-logger.ts` — `TemporalActivityLogger` implementation of `ActivityLogger` interface
- `apps/worker/src/temporal/summary-mapper.ts` — Maps `PipelineSummary` to `WorkflowSummary`
- `apps/worker/src/temporal/worker.ts` — Combined worker + client entry point (per-invocation task queue, submits workflow, waits for result)
- `apps/worker/src/temporal/shared.ts` — Types, interfaces, query definitions
### Five-Phase Pipeline
1. **Pre-Recon** (`pre-recon`) — External scans (nmap, subfinder, whatweb) + source code analysis
@@ -67,39 +144,43 @@ Durable workflow orchestration with crash recovery, queryable progress, intellig
5. **Reporting** (`report`) — Executive-level security report
### Supporting Systems
- **Configuration** — YAML configs in `configs/` with JSON Schema validation (`config-schema.json`). Supports auth settings, MFA/TOTP, and per-app testing parameters
- **Prompts** — Per-phase templates in `prompts/` with variable substitution (`{{TARGET_URL}}`, `{{CONFIG_CONTEXT}}`). Shared partials in `prompts/shared/` via `src/services/prompt-manager.ts`
- **SDK Integration** — Uses `@anthropic-ai/claude-agent-sdk` with `maxTurns: 10_000` and `bypassPermissions` mode. Playwright MCP for browser automation, TOTP generation via MCP tool. Login flow template at `prompts/shared/login-instructions.txt` supports form, SSO, API, and basic auth
- **Audit System** — Crash-safe append-only logging in `audit-logs/{hostname}_{sessionId}/`. Tracks session metrics, per-agent logs, prompts, and deliverables. WorkflowLogger (`audit/workflow-logger.ts`) provides unified human-readable per-workflow logs, backed by LogStream (`audit/log-stream.ts`) shared stream primitive
- **Deliverables** — Saved to `deliverables/` in the target repo via the `save_deliverable` MCP tool
- **Workspaces & Resume** — Named workspaces via `WORKSPACE=<name>` or auto-named from URL+timestamp. Resume passes `--workspace` to the Temporal client (`src/temporal/client.ts`), which loads `session.json` to detect completed agents. `loadResumeState()` in `src/temporal/activities.ts` validates deliverable existence, restores git checkpoints, and cleans up incomplete deliverables. Workspace listing via `src/temporal/workspaces.ts`
- **Configuration** — YAML configs in `apps/worker/configs/` with JSON Schema validation (`config-schema.json`). Supports auth settings, MFA/TOTP, and per-app testing parameters. Credential resolution — local mode: env vars → `./.env`; npx mode: env vars → `~/.shannon/config.toml` (via `shn setup`)
- **Prompts** — Per-phase templates in `apps/worker/prompts/` with variable substitution (`{{TARGET_URL}}`, `{{CONFIG_CONTEXT}}`). Shared partials in `apps/worker/prompts/shared/` via `apps/worker/src/services/prompt-manager.ts`
- **SDK Integration** — Uses `@anthropic-ai/claude-agent-sdk` with `maxTurns: 10_000` and `bypassPermissions` mode. Browser automation via `playwright-cli` with session isolation (`-s=<session>`). TOTP generation via `generate-totp` CLI tool. Login flow template at `apps/worker/prompts/shared/login-instructions.txt` supports form, SSO, API, and basic auth
- **Audit System** — Crash-safe append-only logging in `workspaces/{hostname}_{sessionId}/`. Tracks session metrics, per-agent logs, prompts, and deliverables. WorkflowLogger (`apps/worker/src/audit/workflow-logger.ts`) provides unified human-readable per-workflow logs, backed by LogStream (`apps/worker/src/audit/log-stream.ts`) shared stream primitive
- **Deliverables** — Saved to `deliverables/` in the target repo via the `save-deliverable` CLI script (`apps/worker/src/scripts/save-deliverable.ts`)
- **Workspaces & Resume** — Named workspaces via `-w <name>` or auto-named from URL+timestamp. Resume detects completed agents via `session.json`. `loadResumeState()` in `apps/worker/src/temporal/activities.ts` validates deliverable existence, restores git checkpoints, and cleans up incomplete deliverables. Workspace listing via `apps/worker/src/temporal/workspaces.ts`
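
The credential resolution order described under **Configuration** (environment variables first, then a file-backed fallback) can be sketched as below. `resolveCredential` and its parameters are hypothetical names for illustration; the fallback map stands in for values already parsed from `./.env` or `~/.shannon/config.toml`, and the real loader may treat empty values differently.

```typescript
/**
 * Resolve one credential: a non-empty environment variable wins,
 * otherwise fall back to the value loaded from the config file.
 */
function resolveCredential(
  name: string,
  env: Record<string, string | undefined>,
  fileValues: Record<string, string | undefined>,
): string | undefined {
  const fromEnv = env[name];
  // Assumption: an empty string in the environment counts as "not set".
  if (fromEnv !== undefined && fromEnv !== '') return fromEnv;
  return fileValues[name];
}

// Usage: pass process.env in real code; plain objects keep the example testable.
const apiKey = resolveCredential(
  'ANTHROPIC_API_KEY',
  { ANTHROPIC_API_KEY: '' },
  { ANTHROPIC_API_KEY: 'value-from-config-file' },
);
console.log(apiKey);
```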
## Development Notes
### Adding a New Agent
1. Define agent in `src/session-manager.ts` (add to `AGENTS` record). `ALL_AGENTS`/`AgentName` types live in `src/types/agents.ts`
2. Create prompt template in `prompts/` (e.g., `vuln-newtype.txt`)
3. Two-layer pattern: add a thin activity wrapper in `src/temporal/activities.ts` (heartbeat + error classification). `AgentExecutionService` in `src/services/agent-execution.ts` handles the agent lifecycle automatically via the `AGENTS` registry
4. Register activity in `src/temporal/workflows.ts` within the appropriate phase
1. Define agent in `apps/worker/src/session-manager.ts` (add to `AGENTS` record). `ALL_AGENTS`/`AgentName` types live in `apps/worker/src/types/agents.ts`
2. Create prompt template in `apps/worker/prompts/` (e.g., `vuln-newtype.txt`)
3. Two-layer pattern: add a thin activity wrapper in `apps/worker/src/temporal/activities.ts` (heartbeat + error classification). `AgentExecutionService` in `apps/worker/src/services/agent-execution.ts` handles the agent lifecycle automatically via the `AGENTS` registry
4. Register activity in `apps/worker/src/temporal/workflows.ts` within the appropriate phase
### Modifying Prompts
- Variable substitution: `{{TARGET_URL}}`, `{{CONFIG_CONTEXT}}`, `{{LOGIN_INSTRUCTIONS}}`
- Shared partials in `prompts/shared/` included via `src/services/prompt-manager.ts`
- Test with `PIPELINE_TESTING=true` for fast iteration
- Shared partials in `apps/worker/prompts/shared/` included via `apps/worker/src/services/prompt-manager.ts`
- Test with `--pipeline-testing` for fast iteration
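
The `{{NAME}}` substitution described above can be illustrated with a small sketch. `renderPrompt` is a hypothetical function, not the API of `prompt-manager.ts`; failing on an unknown variable is one possible design choice, made here so that a typo in a template surfaces immediately instead of reaching the agent as literal braces.

```typescript
/** Replace {{NAME}} placeholders; throw if a placeholder has no value. */
function renderPrompt(template: string, vars: Record<string, string | undefined>): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (_match: string, name: string) => {
    const value = vars[name];
    if (value === undefined) {
      throw new Error(`Missing prompt variable: ${name}`);
    }
    return value;
  });
}

// Usage with a made-up template line.
const rendered = renderPrompt('Target: {{TARGET_URL}}', {
  TARGET_URL: 'http://host.docker.internal:3000',
});
console.log(rendered);
```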
### Key Design Patterns
- **Configuration-Driven** — YAML configs with JSON Schema validation
- **Progressive Analysis** — Each phase builds on previous results
- **SDK-First** — Claude Agent SDK handles autonomous analysis
- **Modular Error Handling** — `ErrorCode` enum, `Result<T,E>` for explicit error propagation, automatic retry (3 attempts per agent)
- **Services Boundary** — Activities are thin Temporal wrappers; `src/services/` owns business logic, accepts `ActivityLogger`, returns `Result<T,E>`. No Temporal imports in services
- **DI Container** — Per-workflow in `src/services/container.ts`. `AuditSession` excluded (parallel safety)
- **Services Boundary** — Activities are thin Temporal wrappers; `apps/worker/src/services/` owns business logic, accepts `ActivityLogger`, returns `Result<T,E>`. No Temporal imports in services
- **DI Container** — Per-workflow in `apps/worker/src/services/container.ts`. `AuditSession` excluded (parallel safety)
- **Ephemeral Workers** — Each scan runs in its own `docker run --rm` container with a per-invocation task queue. Temporal routes activities by queue name, so per-scan queues ensure activities never land on a worker with the wrong repo mounted
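
As an illustration of the **Modular Error Handling** pattern, here is a minimal sketch of a `Result<T,E>` type combined with a bounded retry. The `AgentError` shape with its `retryable` flag and the `withRetry` helper are assumptions for this example (the project's `ErrorCode` enum members are not shown in this document), and the function is synchronous only to keep the sketch short.

```typescript
// Explicit success/failure union, so callers must check `ok` before use.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Hypothetical error shape; the real code classifies errors via `ErrorCode`.
interface AgentError {
  retryable: boolean;
  message: string;
}

/** Run `run` up to `maxAttempts` times, stopping on success or a non-retryable error. */
function withRetry<T>(
  run: (attempt: number) => Result<T, AgentError>,
  maxAttempts = 3,
): Result<T, AgentError> {
  let last: Result<T, AgentError> = {
    ok: false,
    error: { retryable: false, message: 'no attempts made' },
  };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = run(attempt);
    if (last.ok || !last.error.retryable) return last;
  }
  return last; // every attempt failed with a retryable error
}
```

Returning the last failure instead of throwing keeps error propagation explicit at the call site, which is the point of the `Result<T,E>` convention.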
### Security
Defensive security tool only. Use only on systems you own or have explicit permission to test.
## Code Style Guidelines
### Formatting
Biome handles formatting and linting. Run `pnpm biome:fix` to auto-fix. Config in `biome.json`: single quotes, semicolons, trailing commas, 2-space indent, 120 char line width.
### Clarity Over Brevity
- Optimize for readability, not line count — three clear lines beat one dense expression
- Use descriptive names that convey intent
@@ -142,18 +223,22 @@ Comments must be **timeless** — no references to this conversation, refactorin
## Key Files
**Entry Points:** `src/temporal/workflows.ts`, `src/temporal/activities.ts`, `src/temporal/worker.ts`, `src/temporal/client.ts`
**CLI:** `shannon` (entry point), `apps/cli/src/index.ts` (dispatcher), `apps/cli/src/docker.ts` (orchestration), `apps/cli/src/mode.ts` (auto-detection)
**Core Logic:** `src/session-manager.ts`, `src/ai/claude-executor.ts`, `src/config-parser.ts`, `src/services/`, `src/audit/`
**Entry Points:** `apps/worker/src/temporal/workflows.ts`, `apps/worker/src/temporal/activities.ts`, `apps/worker/src/temporal/worker.ts`
**Config:** `shannon` (CLI), `docker-compose.yml`, `configs/`, `prompts/`
**Core Logic:** `apps/worker/src/session-manager.ts`, `apps/worker/src/ai/claude-executor.ts`, `apps/worker/src/config-parser.ts`, `apps/worker/src/services/`, `apps/worker/src/audit/`
**Config:** `docker-compose.yml`, `apps/cli/infra/compose.yml`, `apps/worker/configs/`, `apps/worker/prompts/`, `tsconfig.base.json` (shared compiler options), `turbo.json`, `biome.json`
**CI/CD:** `.github/workflows/release.yml` (Docker Hub push + npm publish + GitHub release, manual dispatch)
## Troubleshooting
- **"Repository not found"** — `REPO` must be a folder name inside `./repos/`, not an absolute path. Clone or symlink your repo there first: `ln -s /path/to/repo ./repos/my-repo`
- **"Repository not found"** — Pass a bare name (`-r my-repo`) for `./repos/my-repo`, or a path (`-r /path/to/repo`) for any directory
- **"Temporal not ready"** — Wait for health check or `docker compose logs temporal`
- **Worker not processing** — Check `docker compose ps`
- **Reset state** — `./shannon stop CLEAN=true`
- **Worker not processing** — Check `docker ps --filter "name=shannon-worker-"`
- **Reset state** — `./shannon stop --clean`
- **Local apps unreachable** — Use `host.docker.internal` instead of `localhost`
- **Missing tools** — Use `PIPELINE_TESTING=true` to skip nmap/subfinder/whatweb (graceful degradation)
- **Missing tools** — Use `--pipeline-testing` to skip nmap/subfinder/whatweb (graceful degradation)
- **Container permissions** — On Linux, may need `sudo` for docker commands
+55 -37
@@ -38,17 +38,38 @@ ENV CGO_ENABLED=1
RUN mkdir -p $GOPATH/bin
# Install Go-based security tools
RUN go install -v github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest
# Install WhatWeb from GitHub (Ruby-based tool)
RUN git clone --depth 1 https://github.com/urbanadventurer/WhatWeb.git /opt/whatweb && \
RUN go install -v github.com/projectdiscovery/subfinder/v2/cmd/subfinder@v2.13.0
# Install WhatWeb from release tarball (Ruby-based tool)
RUN curl -sL https://github.com/urbanadventurer/WhatWeb/archive/refs/tags/v0.6.3.tar.gz | tar xz -C /opt && \
mv /opt/WhatWeb-0.6.3 /opt/whatweb && \
chmod +x /opt/whatweb/whatweb && \
gem install addressable && \
gem install addressable -v 2.8.9 && \
echo '#!/bin/bash' > /usr/local/bin/whatweb && \
echo 'cd /opt/whatweb && exec ./whatweb "$@"' >> /usr/local/bin/whatweb && \
chmod +x /usr/local/bin/whatweb
# Install Python-based tools
RUN pip3 install --no-cache-dir schemathesis
RUN pip3 install --no-cache-dir schemathesis==4.13.0
# Install pnpm
RUN npm install -g pnpm@10.12.1
# Build Node.js application in builder to avoid QEMU emulation failures in CI
WORKDIR /app
# Copy workspace manifests for install layer caching
COPY package.json pnpm-workspace.yaml pnpm-lock.yaml .npmrc ./
COPY apps/worker/package.json ./apps/worker/
COPY apps/cli/package.json ./apps/cli/
RUN pnpm install --frozen-lockfile
COPY . .
# Build worker. CLI not needed in Docker
RUN pnpm --filter @shannon/worker run build
RUN pnpm prune --prod
# Runtime stage - Minimal production image
FROM cgr.dev/chainguard/wolfi-base:latest AS runtime
@@ -95,67 +116,64 @@ COPY --from=builder /opt/whatweb /opt/whatweb
COPY --from=builder /usr/local/bin/whatweb /usr/local/bin/whatweb
# Install WhatWeb Ruby dependencies in runtime stage
RUN gem install addressable
RUN gem install addressable -v 2.8.9
# Copy Python packages from builder
COPY --from=builder /usr/lib/python3.*/site-packages /usr/lib/python3.12/site-packages
COPY --from=builder /usr/bin/schemathesis /usr/bin/
# Create non-root user for security
# Create non-root user
RUN addgroup -g 1001 pentest && \
adduser -u 1001 -G pentest -s /bin/bash -D pentest
# System-level git config (survives UID remapping in entrypoint)
RUN git config --system user.email "agent@localhost" && \
git config --system user.name "Pentest Agent" && \
git config --system --add safe.directory '*'
# Set working directory
WORKDIR /app
# Copy package files first for better caching
COPY package*.json ./
COPY mcp-server/package*.json ./mcp-server/
# Copy only what the worker needs (skip CLI source, infra, tsdown artifacts)
COPY --from=builder /app/package.json /app/pnpm-workspace.yaml /app/pnpm-lock.yaml /app/.npmrc /app/
COPY --from=builder /app/node_modules /app/node_modules
COPY --from=builder /app/apps/worker /app/apps/worker
COPY --from=builder /app/apps/cli/package.json /app/apps/cli/package.json
# Install Node.js dependencies (including devDependencies for TypeScript build)
RUN npm ci && \
cd mcp-server && npm ci && cd .. && \
npm cache clean --force
RUN npm install -g @anthropic-ai/claude-code@2.1.84 @playwright/cli@0.1.1
RUN mkdir -p /tmp/.claude/skills && \
playwright-cli install --skills && \
cp -r .claude/skills/playwright-cli /tmp/.claude/skills/ && \
rm -rf .claude
# Copy application source code
COPY . .
# Build TypeScript (mcp-server first, then main project)
RUN cd mcp-server && npm run build && cd .. && npm run build
# Remove devDependencies after build to reduce image size
RUN npm prune --production && \
cd mcp-server && npm prune --production
RUN npm install -g @anthropic-ai/claude-code
# Symlink CLI tools onto PATH
RUN ln -s /app/apps/worker/dist/scripts/save-deliverable.js /usr/local/bin/save-deliverable && \
chmod +x /app/apps/worker/dist/scripts/save-deliverable.js && \
ln -s /app/apps/worker/dist/scripts/generate-totp.js /usr/local/bin/generate-totp && \
chmod +x /app/apps/worker/dist/scripts/generate-totp.js
# Create directories for session data and ensure proper permissions
RUN mkdir -p /app/sessions /app/deliverables /app/repos /app/configs && \
RUN mkdir -p /app/sessions /app/deliverables /app/repos /app/workspaces && \
mkdir -p /tmp/.cache /tmp/.config /tmp/.npm && \
chmod 777 /app && \
chmod 777 /tmp/.cache && \
chmod 777 /tmp/.config && \
chmod 777 /tmp/.npm && \
chown -R pentest:pentest /app
chown -R pentest:pentest /app /tmp/.claude
# Switch to non-root user
USER pentest
COPY entrypoint.sh /app/entrypoint.sh
RUN chmod +x /app/entrypoint.sh
# Set environment variables
ENV NODE_ENV=production
ENV PATH="/usr/local/bin:$PATH"
ENV SHANNON_DOCKER=true
ENV PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1
ENV PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/usr/bin/chromium-browser
ENV PLAYWRIGHT_MCP_EXECUTABLE_PATH=/usr/bin/chromium-browser
ENV npm_config_cache=/tmp/.npm
ENV HOME=/tmp
ENV XDG_CACHE_HOME=/tmp/.cache
ENV XDG_CONFIG_HOME=/tmp/.config
# Configure Git identity and trust all directories
RUN git config --global user.email "agent@localhost" && \
git config --global user.name "Pentest Agent" && \
git config --global --add safe.directory '*'
# Set entrypoint
ENTRYPOINT ["node", "dist/shannon.js"]
ENTRYPOINT ["/app/entrypoint.sh"]
CMD ["node", "apps/worker/dist/temporal/worker.js"]
+420 -252
File diff suppressed because it is too large
+3
@@ -0,0 +1,3 @@
src/
tsconfig.json
node_modules/
+22
@@ -0,0 +1,22 @@
<div align="center">
<img src="https://raw.githubusercontent.com/KeygraphHQ/shannon/main/assets/github-banner.png" alt="Shannon — AI Pentester for Web Applications and APIs" width="100%">
# Shannon — AI Pentester by Keygraph
Shannon is an autonomous, white-box AI pentester for web applications and APIs. <br />
It analyzes your source code, identifies attack vectors, and executes real exploits to prove vulnerabilities before they reach production.
---
<a href="https://github.com/KeygraphHQ/shannon/discussions/categories/announcements"><img src="https://raw.githubusercontent.com/KeygraphHQ/shannon/main/assets/announcements.png" height="40" alt="Announcements"></a>
<a href="https://discord.gg/9ZqQPuhJB7"><img src="https://raw.githubusercontent.com/KeygraphHQ/shannon/main/assets/discord.png" height="40" alt="Join Discord"></a>
<a href="https://keygraph.io/"><img src="https://raw.githubusercontent.com/KeygraphHQ/shannon/main/assets/Keygraph_Button.png" height="40" alt="Visit Keygraph.io"></a>
<a href="https://www.linkedin.com/company/keygraph/"><img src="https://raw.githubusercontent.com/KeygraphHQ/shannon/main/assets/linkedin.png" height="40" alt="Follow Us on LinkedIn"></a>
---
**Full README and usage guide**
[https://github.com/KeygraphHQ/shannon#readme](https://github.com/KeygraphHQ/shannon#readme)
</div>
+50
@@ -0,0 +1,50 @@
networks:
default:
name: shannon-net
services:
temporal:
image: temporalio/temporal:latest
container_name: shannon-temporal
command: ["server", "start-dev", "--db-filename", "/home/temporal/temporal.db", "--ip", "0.0.0.0"]
ports:
- "127.0.0.1:7233:7233"
- "127.0.0.1:8233:8233"
volumes:
- temporal-data:/home/temporal
healthcheck:
test: ["CMD", "temporal", "operator", "cluster", "health", "--address", "localhost:7233"]
interval: 10s
timeout: 5s
retries: 10
start_period: 30s
router:
image: node:20-slim
container_name: shannon-router
profiles: ["router"]
command: >
sh -c "apt-get update && apt-get install -y gettext-base &&
npm install -g @musistudio/claude-code-router &&
mkdir -p /root/.claude-code-router &&
envsubst < /config/router-config.json > /root/.claude-code-router/config.json &&
ccr start"
ports:
- "127.0.0.1:3456:3456"
volumes:
- ./router-config.json:/config/router-config.json:ro
environment:
- HOST=0.0.0.0
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
- OPENAI_API_KEY=${OPENAI_API_KEY:-}
- OPENROUTER_API_KEY=${OPENROUTER_API_KEY:-}
- ROUTER_DEFAULT=${ROUTER_DEFAULT:-openai,gpt-4o}
healthcheck:
test: ["CMD", "node", "-e", "require('http').get('http://localhost:3456/health', r => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"]
interval: 10s
timeout: 5s
retries: 5
start_period: 30s
volumes:
temporal-data:
@@ -19,9 +19,7 @@
"name": "openrouter",
"api_base_url": "https://openrouter.ai/api/v1/chat/completions",
"api_key": "$OPENROUTER_API_KEY",
"models": [
"google/gemini-3-flash-preview"
],
"models": ["google/gemini-3-flash-preview"],
"transformer": {
"use": ["openrouter"]
}
+50
@@ -0,0 +1,50 @@
{
"name": "@keygraph/shannon",
"version": "0.0.0",
"description": "Shannon - Autonomous white-box AI pentester for web applications and APIs by Keygraph",
"type": "module",
"main": "dist/index.mjs",
"bin": {
"shannon": "dist/index.mjs"
},
"files": [
"dist",
"infra"
],
"scripts": {
"build": "tsdown",
"check": "tsc --noEmit",
"clean": "rm -rf dist"
},
"dependencies": {
"@clack/prompts": "^1.1.0",
"chokidar": "^5.0.0",
"dotenv": "^17.3.1",
"smol-toml": "^1.6.1"
},
"keywords": [
"security",
"pentest",
"penetration-testing",
"vulnerability-assessment",
"ai",
"white-box",
"owasp",
"exploitation",
"appsec",
"keygraph"
],
"author": "",
"license": "AGPL-3.0-only",
"repository": {
"type": "git",
"url": "git+https://github.com/KeygraphHQ/shannon.git",
"directory": "apps/cli"
},
"engines": {
"node": ">=18"
},
"devDependencies": {
"tsdown": "^0.21.5"
}
}
+19
@@ -0,0 +1,19 @@
/**
* `shannon build` command — build the worker Docker image locally.
* Only available in local mode (running from cloned repository).
*/
import { buildImage } from '../docker.js';
import { isLocal } from '../mode.js';
export function build(noCache: boolean): void {
if (!isLocal()) {
console.error('ERROR: Build is only available when running from the Shannon repository');
console.error(' (Dockerfile not found in current directory)');
console.error('');
console.error('For npx usage, run: shannon update');
process.exit(1);
}
buildImage(noCache);
}
+106
@@ -0,0 +1,106 @@
/**
* `shannon logs` command — tail a workspace's workflow log.
*
* Uses chokidar for reliable cross-platform file watching and
* bounded synchronous reads to prevent duplicate output.
*/
import fs from 'node:fs';
import path from 'node:path';
import { watch } from 'chokidar';
import { getWorkspacesDir } from '../home.js';
// Match the exact line the worker writes — anchored to prevent false positives from agent output
const COMPLETION_PATTERN = /^Workflow (COMPLETED|FAILED)$/m;
/** Read a byte range from a file and return it as a UTF-8 string. */
function readRange(filePath: string, start: number, end: number): string {
const length = end - start;
const buffer = Buffer.alloc(length);
const fd = fs.openSync(filePath, 'r');
try {
fs.readSync(fd, buffer, 0, length, start);
} finally {
fs.closeSync(fd);
}
return buffer.toString('utf-8');
}
/** Resolve a workspace ID to its workflow.log path, or exit with an error. */
function resolveLogFile(workspaceId: string): string {
const workspacesDir = getWorkspacesDir();
// 1. Direct match
const directPath = path.join(workspacesDir, workspaceId, 'workflow.log');
if (fs.existsSync(directPath)) return directPath;
// 2. Resume workflow ID (e.g. workspace_resume_123)
const resumeBase = workspaceId.replace(/_resume_\d+$/, '');
if (resumeBase !== workspaceId) {
const resumePath = path.join(workspacesDir, resumeBase, 'workflow.log');
if (fs.existsSync(resumePath)) return resumePath;
}
// 3. Named workspace ID (e.g. workspace_shannon-123)
const namedBase = workspaceId.replace(/_shannon-\d+$/, '');
if (namedBase !== workspaceId) {
const namedPath = path.join(workspacesDir, namedBase, 'workflow.log');
if (fs.existsSync(namedPath)) return namedPath;
}
console.error(`ERROR: Workflow log not found for: ${workspaceId}`);
console.error('');
console.error('Possible causes:');
console.error(" - Workflow hasn't started yet");
console.error(' - Workspace ID is incorrect');
console.error('');
console.error('Check the Temporal Web UI at http://localhost:8233 for workflow details');
process.exit(1);
}
export function logs(workspaceId: string): void {
const logFile = resolveLogFile(workspaceId);
let position = 0;
/**
* Output any new content appended since the last read.
* Returns true when the workflow completion marker is detected.
*/
function flush(): boolean {
try {
const { size } = fs.statSync(logFile);
if (size <= position) return false;
const data = readRange(logFile, position, size);
process.stdout.write(data);
position = size;
return COMPLETION_PATTERN.test(data);
} catch {
// File deleted or unreadable — treat as done
return true;
}
}
console.log(`Tailing workflow log: ${logFile}`);
// 1. Output existing content
if (flush()) {
process.exit(0);
}
// 2. Watch for appended content via chokidar
const watcher = watch(logFile, { persistent: true });
const shutdown = (): void => {
watcher.close().finally(() => process.exit(0));
// Safety net — force exit if watcher.close() stalls
setTimeout(() => process.exit(0), 1000).unref();
};
watcher.on('change', () => {
if (flush()) shutdown();
});
process.on('SIGINT', shutdown);
}
+350
@@ -0,0 +1,350 @@
/**
* `shn setup` — interactive TUI wizard for one-time credential configuration.
*
* Walks the user through selecting a provider and entering credentials,
* then persists everything to ~/.shannon/config.toml with 0o600 permissions.
*/
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';
import * as p from '@clack/prompts';
import { type ShannonConfig, saveConfig } from '../config/writer.js';
const SHANNON_HOME = path.join(os.homedir(), '.shannon');
type Provider = 'anthropic' | 'custom_base_url' | 'bedrock' | 'vertex' | 'router';
export async function setup(): Promise<void> {
p.intro('Shannon Setup');
// 1. Select provider
const provider = await p.select({
message: 'Select your AI provider',
options: [
{ value: 'anthropic' as const, label: 'Claude Direct', hint: 'recommended' },
{ value: 'custom_base_url' as const, label: 'Custom Base URL', hint: 'proxies, gateways' },
{ value: 'bedrock' as const, label: 'Claude via AWS Bedrock' },
{ value: 'vertex' as const, label: 'Claude via Google Vertex AI' },
{ value: 'router' as const, label: 'Router', hint: 'experimental' },
],
});
if (p.isCancel(provider)) return cancelAndExit();
const config = await setupProvider(provider as Provider);
// 2. Save config
saveConfig(config);
const configPath = path.join(SHANNON_HOME, 'config.toml');
p.log.success(`Configuration saved to ${configPath}`);
p.outro('Run `npx @keygraph/shannon start` to begin a scan.');
}
async function setupProvider(provider: Provider): Promise<ShannonConfig> {
switch (provider) {
case 'anthropic':
return setupAnthropic();
case 'custom_base_url':
return setupCustomBaseUrl();
case 'bedrock':
return setupBedrock();
case 'vertex':
return setupVertex();
case 'router':
return setupRouter();
}
}
// === Provider Setup Flows ===
async function setupAnthropic(): Promise<ShannonConfig> {
const authMethod = await p.select({
message: 'Authentication method',
options: [
{ value: 'api_key' as const, label: 'API Key' },
{ value: 'oauth' as const, label: 'OAuth Token' },
],
});
if (p.isCancel(authMethod)) return cancelAndExit();
const config: ShannonConfig = {};
if (authMethod === 'oauth') {
const token = await promptSecret('Enter your OAuth token');
config.anthropic = { oauth_token: token };
} else {
const apiKey = await promptSecret('Enter your Anthropic API key');
config.anthropic = { api_key: apiKey };
}
const customizeModels = await p.confirm({
message:
'Do you want to change the default models?\n' +
' Small - claude-haiku-4-5-20251001\n' +
' Medium - claude-sonnet-4-6\n' +
' Large - claude-opus-4-6',
initialValue: false,
});
if (p.isCancel(customizeModels)) return cancelAndExit();
if (customizeModels) {
const small = await p.text({
message: 'Small model ID',
initialValue: 'claude-haiku-4-5-20251001',
validate: required('Small model ID is required'),
});
if (p.isCancel(small)) return cancelAndExit();
const medium = await p.text({
message: 'Medium model ID',
initialValue: 'claude-sonnet-4-6',
validate: required('Medium model ID is required'),
});
if (p.isCancel(medium)) return cancelAndExit();
const large = await p.text({
message: 'Large model ID',
initialValue: 'claude-opus-4-6',
validate: required('Large model ID is required'),
});
if (p.isCancel(large)) return cancelAndExit();
config.models = { small, medium, large };
}
return config;
}
async function setupCustomBaseUrl(): Promise<ShannonConfig> {
const baseUrl = await p.text({
message: 'Endpoint URL',
placeholder: 'https://your-proxy.example.com',
validate: (value) => {
if (!value) return 'Endpoint URL is required';
try {
new URL(value);
} catch {
return 'Must be a valid URL';
}
return undefined;
},
});
if (p.isCancel(baseUrl)) return cancelAndExit();
const authToken = await promptSecret('Enter the auth token for the custom endpoint');
const config: ShannonConfig = {
custom_base_url: { base_url: baseUrl, auth_token: authToken },
};
const customizeModels = await p.confirm({
message:
'Do you want to change the default models?\n' +
' Small - claude-haiku-4-5-20251001\n' +
' Medium - claude-sonnet-4-6\n' +
' Large - claude-opus-4-6',
initialValue: false,
});
if (p.isCancel(customizeModels)) return cancelAndExit();
if (customizeModels) {
const small = await p.text({
message: 'Small model ID',
initialValue: 'claude-haiku-4-5-20251001',
validate: required('Small model ID is required'),
});
if (p.isCancel(small)) return cancelAndExit();
const medium = await p.text({
message: 'Medium model ID',
initialValue: 'claude-sonnet-4-6',
validate: required('Medium model ID is required'),
});
if (p.isCancel(medium)) return cancelAndExit();
const large = await p.text({
message: 'Large model ID',
initialValue: 'claude-opus-4-6',
validate: required('Large model ID is required'),
});
if (p.isCancel(large)) return cancelAndExit();
config.models = { small, medium, large };
}
return config;
}
async function setupBedrock(): Promise<ShannonConfig> {
const region = await p.text({
message: 'AWS Region',
placeholder: 'us-east-1',
validate: required('AWS Region is required'),
});
if (p.isCancel(region)) return cancelAndExit();
const token = await promptSecret('Enter your AWS Bearer Token');
const small = await p.text({
message: 'Small model ID',
placeholder: 'us.anthropic.claude-haiku-4-5-20251001-v1:0',
validate: required('Small model ID is required'),
});
if (p.isCancel(small)) return cancelAndExit();
const medium = await p.text({
message: 'Medium model ID',
placeholder: 'us.anthropic.claude-sonnet-4-6',
validate: required('Medium model ID is required'),
});
if (p.isCancel(medium)) return cancelAndExit();
const large = await p.text({
message: 'Large model ID',
placeholder: 'us.anthropic.claude-opus-4-6',
validate: required('Large model ID is required'),
});
if (p.isCancel(large)) return cancelAndExit();
return {
bedrock: { use: true, region, token },
models: { small, medium, large },
};
}
async function setupVertex(): Promise<ShannonConfig> {
// 1. Collect region and project ID
const region = await p.text({
message: 'Google Cloud region',
placeholder: 'us-east5',
validate: required('Region is required'),
});
if (p.isCancel(region)) return cancelAndExit();
const projectId = await p.text({
message: 'GCP Project ID',
validate: required('Project ID is required'),
});
if (p.isCancel(projectId)) return cancelAndExit();
// 2. File picker for service account key
p.log.info('Select the path to your GCP Service Account JSON key file.');
const keySourcePath = await p.path({
message: 'Service Account JSON key file',
validate: (value) => {
if (!value) return 'Path is required';
if (!fs.existsSync(value)) return 'File not found';
if (!value.endsWith('.json')) return 'Must be a .json file';
return undefined;
},
});
if (p.isCancel(keySourcePath)) return cancelAndExit();
// 3. Copy key to ~/.shannon/ and lock permissions
const destPath = path.join(SHANNON_HOME, 'google-sa-key.json');
fs.mkdirSync(SHANNON_HOME, { recursive: true });
fs.copyFileSync(keySourcePath, destPath);
fs.chmodSync(destPath, 0o600);
p.log.success(`Key copied to ${destPath} (permissions: 0600)`);
// 4. Model tiers
const models = await p.group({
small: () =>
p.text({
message: 'Small model ID',
placeholder: 'claude-haiku-4-5@20251001',
validate: required('Small model ID is required'),
}),
medium: () =>
p.text({
message: 'Medium model ID',
placeholder: 'claude-sonnet-4-6',
validate: required('Medium model ID is required'),
}),
large: () =>
p.text({
message: 'Large model ID',
placeholder: 'claude-opus-4-6',
validate: required('Large model ID is required'),
}),
});
if (p.isCancel(models)) return cancelAndExit();
return {
vertex: {
use: true,
region,
project_id: projectId,
key_path: destPath,
},
models: { small: models.small, medium: models.medium, large: models.large },
};
}
async function setupRouter(): Promise<ShannonConfig> {
const routerProvider = await p.select({
message: 'Router provider',
options: [
{ value: 'openai' as const, label: 'OpenAI' },
{ value: 'openrouter' as const, label: 'OpenRouter' },
],
});
if (p.isCancel(routerProvider)) return cancelAndExit();
const apiKey = await promptSecret(
routerProvider === 'openai' ? 'Enter your OpenAI API key' : 'Enter your OpenRouter API key',
);
let defaultModel: string;
if (routerProvider === 'openai') {
const model = await p.select({
message: 'Default model',
options: [
{ value: 'gpt-5.2' as const, label: 'GPT-5.2' },
{ value: 'gpt-5-mini' as const, label: 'GPT-5 Mini' },
],
});
if (p.isCancel(model)) return cancelAndExit();
defaultModel = `openai,${model}`;
} else {
const model = await p.select({
message: 'Default model',
options: [{ value: 'google/gemini-3-flash-preview' as const, label: 'Google Gemini 3 Flash Preview' }],
});
if (p.isCancel(model)) return cancelAndExit();
defaultModel = `openrouter,${model}`;
}
const router: ShannonConfig['router'] = { default: defaultModel };
if (routerProvider === 'openai') {
router.openai_key = apiKey;
} else {
router.openrouter_key = apiKey;
}
return { router };
}
// === Helpers ===
async function promptSecret(message: string): Promise<string> {
const value = await p.password({
message,
validate: required(`${message.replace(/^Enter /, '')} is required`),
});
if (p.isCancel(value)) return cancelAndExit();
return value;
}
function required(errorMessage: string): (value: string | undefined) => string | undefined {
return (value) => {
if (!value) return errorMessage;
return undefined;
};
}
function cancelAndExit(): never {
p.cancel('Setup cancelled.');
process.exit(0);
}
@@ -0,0 +1,226 @@
/**
* `shannon start` command — launch a pentest scan.
*
* Handles both local mode (local build, ./workspaces/, mounted prompts)
* and npx mode (Docker Hub pull, ~/.shannon/).
*/
import { execFileSync } from 'node:child_process';
import fs from 'node:fs';
import path from 'node:path';
import { ensureImage, ensureInfra, randomSuffix, spawnWorker } from '../docker.js';
import { buildEnvFlags, isRouterConfigured, loadEnv, validateCredentials } from '../env.js';
import { getCredentialsPath, getWorkspacesDir, initHome } from '../home.js';
import { isLocal } from '../mode.js';
import { ensureDeliverables, resolveConfig, resolveRepo } from '../paths.js';
import { displaySplash } from '../splash.js';
export interface StartArgs {
url: string;
repo: string;
config?: string;
workspace?: string;
output?: string;
pipelineTesting: boolean;
router: boolean;
version: string;
}
export async function start(args: StartArgs): Promise<void> {
// 1. Initialize state directories and load env
initHome();
loadEnv();
// 2. Validate credentials and auto-detect router mode
const creds = validateCredentials();
if (!creds.valid) {
console.error(`ERROR: ${creds.error}`);
process.exit(1);
}
const useRouter = args.router || isRouterConfigured();
// 3. Resolve paths
const repo = resolveRepo(args.repo);
const config = args.config ? resolveConfig(args.config) : undefined;
ensureDeliverables(repo.hostPath);
// 4. Ensure workspaces dir is writable by container user (UID 1001)
const workspacesDir = getWorkspacesDir();
fs.mkdirSync(workspacesDir, { recursive: true });
fs.chmodSync(workspacesDir, 0o777);
// 5. Handle router env
if (useRouter) {
process.env.ANTHROPIC_BASE_URL = 'http://shannon-router:3456';
process.env.ANTHROPIC_AUTH_TOKEN = 'shannon-router-key';
}
// 6. Ensure image (auto-build in dev, pull in npx) and start infra
ensureImage(args.version);
await ensureInfra(useRouter);
// 7. Generate unique task queue and container name
const suffix = randomSuffix();
const taskQueue = `shannon-${suffix}`;
const containerName = `shannon-worker-${suffix}`;
// 8. Generate workspace name if not provided
const workspace =
args.workspace ?? `${new URL(args.url).hostname.replace(/[^a-zA-Z0-9-]/g, '-')}_shannon-${Date.now()}`;
// 9. Resolve credentials — mount single file to fixed container path
const credentialsPath = getCredentialsPath();
const hasCredentials = fs.existsSync(credentialsPath);
if (hasCredentials) {
process.env.GOOGLE_APPLICATION_CREDENTIALS = '/app/credentials/google-sa-key.json';
}
// 10. Resolve output directory
const outputDir = args.output ? path.resolve(args.output) : undefined;
if (outputDir) {
fs.mkdirSync(outputDir, { recursive: true });
}
// 11. Resolve prompts directory (local mode only)
const promptsDir = isLocal() ? path.resolve('apps/worker/prompts') : undefined;
// 12. Display splash screen
displaySplash(isLocal() ? undefined : args.version);
// 13. Spawn worker container
const proc = spawnWorker({
version: args.version,
url: args.url,
repo,
workspacesDir,
taskQueue,
containerName,
envFlags: buildEnvFlags(),
...(config && { config }),
...(hasCredentials && { credentials: credentialsPath }),
...(promptsDir && { promptsDir }),
...(outputDir && { outputDir }),
...(workspace && { workspace }),
...(args.pipelineTesting && { pipelineTesting: true }),
});
// 14. Wait for workflow to register, then display info
proc.on('error', (err) => {
console.error(`Failed to start worker: ${err.message}`);
process.exit(1);
});
// Detect whether this is a fresh workspace or a resume by checking session.json existence
const sessionJson = path.join(workspacesDir, workspace, 'session.json');
const isResume = fs.existsSync(sessionJson);
let initialResumeCount = 0;
if (isResume) {
try {
const session = JSON.parse(fs.readFileSync(sessionJson, 'utf-8'));
initialResumeCount = session.session?.resumeAttempts?.length ?? 0;
} catch {
// Corrupted file — worker will handle validation
}
}
// Poll for workflow to register in session.json
process.stdout.write('Waiting for workflow to start...');
let workflowId = '';
let started = false;
let attempts = 0;
const pollInterval = setInterval(() => {
attempts++;
if (attempts > 60) {
clearInterval(pollInterval);
process.stdout.write('\n');
console.error('Timeout waiting for workflow to start');
process.exit(1);
}
try {
const session = JSON.parse(fs.readFileSync(sessionJson, 'utf-8'));
const resumeAttempts: { workflowId: string }[] = session.session?.resumeAttempts ?? [];
// Fresh: session.json appears with originalWorkflowId. Resume: new resumeAttempts entry.
const ready = isResume ? resumeAttempts.length > initialResumeCount : !!session.session?.originalWorkflowId;
if (ready) {
clearInterval(pollInterval);
started = true;
// Latest workflow ID: last resume attempt, or originalWorkflowId for fresh scans
workflowId = resumeAttempts.at(-1)?.workflowId ?? session.session?.originalWorkflowId ?? '';
// Clear waiting line and show info
process.stdout.write('\r\x1b[K');
printInfo(args, useRouter, workspace, workflowId, repo.hostPath, workspacesDir);
return;
}
} catch {
// File doesn't exist yet
}
process.stdout.write('.');
}, 2000);
// Stop the worker container on exit only if the workflow has not registered yet
let cleaned = false;
const cleanup = (): void => {
if (cleaned || started) return;
cleaned = true;
clearInterval(pollInterval);
console.log(`\nStopping worker ${containerName}...`);
try {
execFileSync('docker', ['stop', containerName], { stdio: 'pipe' });
} catch {
// Container may have already exited
}
};
process.on('SIGINT', () => {
cleanup();
process.exit(0);
});
process.on('SIGTERM', () => {
cleanup();
process.exit(0);
});
process.on('exit', cleanup);
}
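The polling loop in `start` decides readiness differently for fresh scans and for resumes. A minimal sketch of that decision as a pure function, assuming only the `session.json` fields the loop reads; `SessionFile` and `detectWorkflowId` are hypothetical names for illustration and are not part of this PR:

```typescript
// Hypothetical shape: only the fields the polling loop reads from session.json.
interface SessionFile {
  session?: {
    originalWorkflowId?: string;
    resumeAttempts?: { workflowId: string }[];
  };
}

// Returns the workflow ID once the worker has registered, or undefined while still waiting.
function detectWorkflowId(
  file: SessionFile,
  isResume: boolean,
  initialResumeCount: number,
): string | undefined {
  const attempts = file.session?.resumeAttempts ?? [];
  const original = file.session?.originalWorkflowId;
  // Fresh scan: originalWorkflowId appears. Resume: a new resumeAttempts entry appears.
  const ready = isResume ? attempts.length > initialResumeCount : !!original;
  if (!ready) return undefined;
  // Prefer the newest resume attempt; fall back to the original workflow for fresh scans.
  return attempts[attempts.length - 1]?.workflowId ?? original ?? '';
}
```

Keeping the check free of file and timer access would let it be unit-tested separately from the `setInterval` loop.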
function printInfo(
args: StartArgs,
routerActive: boolean,
workspace: string,
workflowId: string,
repoPath: string,
workspacesDir: string,
): void {
const logsCmd = isLocal() ? `./shannon logs ${workspace}` : `npx @keygraph/shannon logs ${workspace}`;
const reportsPath = path.join(workspacesDir, workspace);
console.log(` Target: ${args.url}`);
console.log(` Repository: ${repoPath}`);
console.log(` Workspace: ${workspace}`);
if (args.config) {
console.log(` Config: ${path.resolve(args.config)}`);
}
if (args.pipelineTesting) {
console.log(' Mode: Pipeline Testing');
}
if (routerActive) {
console.log(' Router: Enabled');
}
console.log('');
console.log(' Monitor:');
if (workflowId) {
console.log(` Web UI: http://localhost:8233/namespaces/default/workflows/${workflowId}`);
} else {
console.log(' Web UI: http://localhost:8233');
}
console.log(` Logs: ${logsCmd}`);
console.log('');
console.log(' Output:');
console.log(` Reports: ${reportsPath}/`);
console.log('');
}
@@ -0,0 +1,24 @@
/**
* `shannon status` command — show running workers and Temporal health.
*/
import { isTemporalReady, listRunningWorkers } from '../docker.js';
export function status(): void {
// 1. Temporal health
const temporalUp = isTemporalReady();
console.log(`Temporal: ${temporalUp ? 'running' : 'not running'}`);
if (temporalUp) {
console.log(' Web UI: http://localhost:8233');
}
console.log('');
// 2. Running workers
const workers = listRunningWorkers();
if (workers) {
console.log('Workers:');
console.log(workers);
} else {
console.log('Workers: none running');
}
}
@@ -0,0 +1,21 @@
/**
* `shannon stop` command — stop workers and infrastructure.
*/
import * as p from '@clack/prompts';
import { stopInfra, stopWorkers } from '../docker.js';
export async function stop(clean: boolean): Promise<void> {
if (clean) {
const confirmed = await p.confirm({
message: 'This will stop all running scans and remove the Temporal data. Continue?',
});
if (p.isCancel(confirmed) || !confirmed) {
p.cancel('Aborted.');
process.exit(0);
}
}
stopWorkers();
stopInfra(clean);
}
@@ -0,0 +1,37 @@
/**
* `shannon uninstall` command: remove ~/.shannon/ after confirmation (npx only).
*/
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';
import * as p from '@clack/prompts';
import { stopInfra, stopWorkers } from '../docker.js';
const SHANNON_HOME = path.join(os.homedir(), '.shannon');
export async function uninstall(): Promise<void> {
p.intro('Shannon Uninstall');
if (!fs.existsSync(SHANNON_HOME)) {
p.log.info('Nothing to remove. Shannon is not configured on this machine.');
p.outro('Done.');
return;
}
const confirmed = await p.confirm({
message: 'This will permanently remove all past scan data, saved configurations, and API keys. Continue?',
});
if (p.isCancel(confirmed) || !confirmed) {
p.cancel('Aborted.');
process.exit(0);
}
// Stop any running containers first
stopWorkers();
stopInfra(false);
fs.rmSync(SHANNON_HOME, { recursive: true, force: true });
p.log.success('All Shannon data has been removed.');
p.outro('Shannon has been uninstalled. Run `npx @keygraph/shannon setup` to start fresh.');
}
@@ -0,0 +1,35 @@
/**
* `shannon workspaces` command — list all workspaces.
*/
import { execFileSync } from 'node:child_process';
import os from 'node:os';
import { getWorkerImage } from '../docker.js';
import { getWorkspacesDir } from '../home.js';
export function workspaces(version: string): void {
const workspacesDir = getWorkspacesDir();
const image = getWorkerImage(version);
try {
execFileSync(
'docker',
[
'run',
'--rm',
'-v',
`${workspacesDir}:/app/workspaces`,
'-e',
'WORKSPACES_DIR=/app/workspaces',
image,
'node',
'apps/worker/dist/temporal/workspaces.js',
],
{ stdio: 'inherit', ...(os.platform() === 'win32' && { env: { ...process.env, MSYS_NO_PATHCONV: '1' } }) },
);
} catch {
console.error('ERROR: Failed to list workspaces. Is the Docker image available?');
console.error(` Run: docker pull ${image}`);
process.exit(1);
}
}
@@ -0,0 +1,300 @@
/**
* Configuration resolver with environment-first, TOML-fallback precedence.
*
* Priority: process.env > ~/.shannon/config.toml
* Env var names match .env.example exactly; TOML uses nested sections.
*/
import fs from 'node:fs';
import { parse as parseTOML } from 'smol-toml';
import { getConfigFile } from '../home.js';
import { getMode } from '../mode.js';
// === TOML ↔ Env Mapping ===
type TOMLType = 'string' | 'number' | 'boolean';
interface ConfigMapping {
readonly env: string;
readonly toml: string;
readonly type: TOMLType;
}
/** Maps every supported env var to its TOML path (section.key) and expected type. */
const CONFIG_MAP: readonly ConfigMapping[] = [
// Core
{ env: 'CLAUDE_CODE_MAX_OUTPUT_TOKENS', toml: 'core.max_tokens', type: 'number' },
// Anthropic
{ env: 'ANTHROPIC_API_KEY', toml: 'anthropic.api_key', type: 'string' },
{ env: 'CLAUDE_CODE_OAUTH_TOKEN', toml: 'anthropic.oauth_token', type: 'string' },
// Bedrock
{ env: 'CLAUDE_CODE_USE_BEDROCK', toml: 'bedrock.use', type: 'boolean' },
{ env: 'AWS_REGION', toml: 'bedrock.region', type: 'string' },
{ env: 'AWS_BEARER_TOKEN_BEDROCK', toml: 'bedrock.token', type: 'string' },
// Vertex
{ env: 'CLAUDE_CODE_USE_VERTEX', toml: 'vertex.use', type: 'boolean' },
{ env: 'CLOUD_ML_REGION', toml: 'vertex.region', type: 'string' },
{ env: 'ANTHROPIC_VERTEX_PROJECT_ID', toml: 'vertex.project_id', type: 'string' },
{ env: 'GOOGLE_APPLICATION_CREDENTIALS', toml: 'vertex.key_path', type: 'string' },
// Custom Base URL
{ env: 'ANTHROPIC_BASE_URL', toml: 'custom_base_url.base_url', type: 'string' },
{ env: 'ANTHROPIC_AUTH_TOKEN', toml: 'custom_base_url.auth_token', type: 'string' },
// Router
{ env: 'ROUTER_DEFAULT', toml: 'router.default', type: 'string' },
{ env: 'OPENAI_API_KEY', toml: 'router.openai_key', type: 'string' },
{ env: 'OPENROUTER_API_KEY', toml: 'router.openrouter_key', type: 'string' },
// Model tiers
{ env: 'ANTHROPIC_SMALL_MODEL', toml: 'models.small', type: 'string' },
{ env: 'ANTHROPIC_MEDIUM_MODEL', toml: 'models.medium', type: 'string' },
{ env: 'ANTHROPIC_LARGE_MODEL', toml: 'models.large', type: 'string' },
] as const;
// === TOML Parsing ===
type TOMLValue = string | number | boolean;
type TOMLSection = Record<string, TOMLValue>;
type TOMLConfig = Record<string, TOMLSection>;
/** Read a nested TOML value by dotted path (e.g. "anthropic.api_key"). */
function getTomlValue(config: TOMLConfig, path: string): string | undefined {
const [section, key] = path.split('.');
if (!section || !key) return undefined;
const sectionObj = config[section];
if (!sectionObj || typeof sectionObj !== 'object') return undefined;
const value = sectionObj[key];
if (value === undefined || value === null) return undefined;
// NOTE: env.ts checks bedrock/vertex via `=== '1'`, so booleans must map to "1"/"0"
if (typeof value === 'boolean') return value ? '1' : '0';
return String(value);
}
/** Parse the global TOML config file, returning null if it doesn't exist. */
function loadTOML(): TOMLConfig | null {
const configPath = getConfigFile();
if (!fs.existsSync(configPath)) return null;
// Config contains secrets — refuse to read if group or others have any access.
// Skip on Windows where POSIX permissions are not supported.
if (process.platform !== 'win32') {
const mode = fs.statSync(configPath).mode;
if (mode & 0o077) {
const actual = (mode & 0o777).toString(8).padStart(3, '0');
console.error(`\nInsecure permissions (${actual}) on ${configPath}. Run: chmod 600 ${configPath}\n`);
process.exit(1);
}
}
try {
const content = fs.readFileSync(configPath, 'utf-8');
return parseTOML(content) as TOMLConfig;
} catch (err) {
const message = err instanceof Error ? err.message : String(err);
console.error(`\nFailed to parse ${configPath}: ${message}`);
console.error(`\nRun 'npx @keygraph/shannon setup' to reconfigure.\n`);
process.exit(1);
}
}
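The permission gate in `loadTOML` comes down to a single bitmask test on the file mode. A small sketch of that test in isolation; `isInsecureMode` and `formatMode` are illustrative names, not functions from this PR:

```typescript
// True when group or others hold any permission bit (read, write, or execute).
function isInsecureMode(mode: number): boolean {
  return (mode & 0o077) !== 0;
}

// Render only the permission bits as three octal digits, dropping the file-type bits.
function formatMode(mode: number): string {
  return (mode & 0o777).toString(8).padStart(3, '0');
}
```

A regular file created with mode `0o600` passes the check, while the common default of `0o644` is rejected because group and others can read it.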
// === Validation ===
/** Build a lookup of allowed keys per section from CONFIG_MAP. */
function buildSchema(): Map<string, Map<string, TOMLType>> {
const schema = new Map<string, Map<string, TOMLType>>();
for (const mapping of CONFIG_MAP) {
const [section, key] = mapping.toml.split('.');
if (!section || !key) continue;
let keys = schema.get(section);
if (!keys) {
keys = new Map();
schema.set(section, keys);
}
keys.set(key, mapping.type);
}
return schema;
}
/** Check that a provider section has all required fields and dependencies. */
function validateProviderFields(config: TOMLConfig, provider: string, errors: string[]): void {
const section = config[provider] as Record<string, unknown> | undefined;
if (!section) return;
const keys = Object.keys(section);
switch (provider) {
case 'anthropic':
if (!keys.includes('api_key') && !keys.includes('oauth_token')) {
errors.push('[anthropic] requires either api_key or oauth_token');
}
break;
case 'custom_base_url': {
const required = ['base_url', 'auth_token'];
const missing = required.filter((k) => !keys.includes(k));
if (missing.length > 0) {
errors.push(`[custom_base_url] missing required keys: ${missing.join(', ')}`);
}
break;
}
case 'bedrock': {
const required = ['use', 'region', 'token'];
const missing = required.filter((k) => !keys.includes(k));
if (missing.length > 0) {
errors.push(`[bedrock] missing required keys: ${missing.join(', ')}`);
}
validateModelTiers(config, 'bedrock', errors);
break;
}
case 'vertex': {
const required = ['use', 'region', 'project_id', 'key_path'];
const missing = required.filter((k) => !keys.includes(k));
if (missing.length > 0) {
errors.push(`[vertex] missing required keys: ${missing.join(', ')}`);
}
validateModelTiers(config, 'vertex', errors);
break;
}
case 'router': {
if (!keys.includes('default')) {
errors.push('[router] missing required key: default');
}
if (!keys.includes('openai_key') && !keys.includes('openrouter_key')) {
errors.push('[router] requires either openai_key or openrouter_key');
}
const models = config.models as Record<string, unknown> | undefined;
if (models && typeof models === 'object' && Object.keys(models).length > 0) {
errors.push('[models] is not supported with [router]');
}
break;
}
}
}
/** Bedrock and Vertex require a [models] section with all three tiers. */
function validateModelTiers(config: TOMLConfig, provider: string, errors: string[]): void {
const models = config.models as Record<string, unknown> | undefined;
if (!models || typeof models !== 'object') {
errors.push(`[${provider}] requires a [models] section with small, medium, and large`);
return;
}
const required = ['small', 'medium', 'large'];
const missing = required.filter((k) => !Object.keys(models).includes(k));
if (missing.length > 0) {
errors.push(`[models] missing required keys for ${provider}: ${missing.join(', ')}`);
}
}
/**
* Validate a parsed TOML config against the known schema.
* Returns an array of human-readable error messages (empty = valid).
*/
function validateConfig(config: TOMLConfig): string[] {
const schema = buildSchema();
const errors: string[] = [];
for (const [section, sectionObj] of Object.entries(config)) {
// 1. Reject unknown sections
const allowedKeys = schema.get(section);
if (!allowedKeys) {
const known = [...schema.keys()].join(', ');
errors.push(`Unknown section [${section}]. Valid sections: ${known}`);
continue;
}
// 2. Section value must be a table
if (!sectionObj || typeof sectionObj !== 'object') {
errors.push(`[${section}] must be a table, got ${typeof sectionObj}`);
continue;
}
// 3. Validate each key in the section
for (const [key, value] of Object.entries(sectionObj as Record<string, unknown>)) {
const expectedType = allowedKeys.get(key);
if (!expectedType) {
const known = [...allowedKeys.keys()].join(', ');
errors.push(`Unknown key "${key}" in [${section}]. Valid keys: ${known}`);
continue;
}
if (typeof value !== expectedType) {
errors.push(`[${section}].${key} must be ${expectedType}, got ${typeof value}`);
continue;
}
// Reject empty strings — they pass type checks but are never useful
if (typeof value === 'string' && value.trim() === '') {
errors.push(`[${section}].${key} must not be empty`);
}
}
}
// 4. Only one provider section allowed (ignore empty sections)
const PROVIDER_SECTIONS = ['anthropic', 'custom_base_url', 'bedrock', 'vertex', 'router'] as const;
const present = PROVIDER_SECTIONS.filter((s) => {
const section = config[s];
return section && typeof section === 'object' && Object.keys(section).length > 0;
});
if (present.length > 1) {
errors.push(
`Multiple providers configured: [${present.join('], [')}]. Only one provider section is allowed at a time`,
);
}
// 5. Required fields per provider
const singleProvider = present.length === 1 ? present[0] : undefined;
if (singleProvider) {
validateProviderFields(config, singleProvider, errors);
}
return errors;
}
// === Public API ===
/**
* Resolve all config values into process.env (npx mode only).
*
* For each mapped variable: if not already set in the environment,
* look it up in ~/.shannon/config.toml and inject it into process.env.
* Local mode uses .env exclusively — TOML is skipped.
* Exits with an error if the TOML contains unknown or invalid keys.
*/
export function resolveConfig(): void {
if (getMode() === 'local') return;
const toml = loadTOML();
if (!toml) return;
// Validate before injecting
const errors = validateConfig(toml);
if (errors.length > 0) {
console.error('\nInvalid configuration:');
for (const err of errors) {
console.error(` - ${err}`);
}
console.error(`\nRun 'npx @keygraph/shannon setup' to reconfigure.\n`);
process.exit(1);
}
for (const mapping of CONFIG_MAP) {
if (process.env[mapping.env]) continue;
const value = getTomlValue(toml, mapping.toml);
if (value) {
process.env[mapping.env] = value;
}
}
}
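The env-first, TOML-fallback precedence used by `resolveConfig` can be sketched without touching `process.env` or the filesystem. This is an illustration under simplifying assumptions (no validation, a plain object standing in for the environment); `applyFallback`, `Env`, `Toml`, and `Mapping` are hypothetical names:

```typescript
type Env = Record<string, string | undefined>;
type Toml = Record<string, Record<string, string | number | boolean>>;

interface Mapping {
  env: string; // environment variable name
  toml: string; // dotted "section.key" path
}

// Return a copy of env where unset variables are filled in from the TOML config.
function applyFallback(env: Env, toml: Toml, mappings: Mapping[]): Env {
  const result: Env = { ...env };
  for (const { env: name, toml: dotted } of mappings) {
    if (result[name]) continue; // exported variables always win
    const [section, key] = dotted.split('.');
    if (!section || !key) continue;
    const raw = toml[section]?.[key];
    if (raw === undefined) continue;
    // Booleans become "1"/"0" to match the `=== '1'` provider checks in env.ts.
    const value = typeof raw === 'boolean' ? (raw ? '1' : '0') : String(raw);
    if (value) result[name] = value;
  }
  return result;
}
```

With `AWS_REGION` exported and a `[bedrock]` table in the TOML, the exported region is kept while `bedrock.use = true` is injected as `CLAUDE_CODE_USE_BEDROCK=1`.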
@@ -0,0 +1,30 @@
/** TOML config writer for ~/.shannon/config.toml. */
import fs from 'node:fs';
import path from 'node:path';
import { stringify } from 'smol-toml';
import { getConfigFile } from '../home.js';
// === Types ===
export interface ShannonConfig {
core?: { max_tokens?: number };
anthropic?: { api_key?: string; oauth_token?: string };
custom_base_url?: { base_url?: string; auth_token?: string };
bedrock?: { use?: boolean; region?: string; token?: string };
vertex?: { use?: boolean; region?: string; project_id?: string; key_path?: string };
router?: { default?: string; openai_key?: string; openrouter_key?: string };
models?: { small?: string; medium?: string; large?: string };
}
// === File Operations ===
/** Write the config to ~/.shannon/config.toml with 0o600 permissions. */
export function saveConfig(config: ShannonConfig): void {
const configPath = getConfigFile();
const dir = path.dirname(configPath);
fs.mkdirSync(dir, { recursive: true });
const content = stringify(config);
fs.writeFileSync(configPath, content, { mode: 0o600 });
}
@@ -0,0 +1,317 @@
/**
* Docker orchestration — compose lifecycle, network, image pull/build, worker spawning.
*
* Local mode: builds locally, uses docker-compose.yml from repo root, mounts prompts.
* NPX mode: pulls from Docker Hub, uses bundled compose.yml.
*/
import { type ChildProcess, execFileSync, spawn } from 'node:child_process';
import crypto from 'node:crypto';
import os from 'node:os';
import path from 'node:path';
import { setTimeout as sleep } from 'node:timers/promises';
import { fileURLToPath } from 'node:url';
import { getMode } from './mode.js';
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const NPX_IMAGE_REPO = 'keygraph/shannon';
const DEV_IMAGE = 'shannon-worker';
export function getWorkerImage(version: string): string {
return getMode() === 'local' ? DEV_IMAGE : `${NPX_IMAGE_REPO}:${version}`;
}
function getComposeFile(): string {
return getMode() === 'local'
? path.resolve('docker-compose.yml')
: path.resolve(__dirname, '..', 'infra', 'compose.yml');
}
/** Generate an 8-char random hex suffix for container/queue names. */
export function randomSuffix(): string {
return crypto.randomBytes(4).toString('hex');
}
/** Run a command silently, return true if it succeeds. */
function runQuiet(cmd: string, args: string[]): boolean {
try {
execFileSync(cmd, args, { stdio: 'pipe' });
return true;
} catch {
return false;
}
}
/** Run a command and return stdout, or empty string on failure. */
function runOutput(cmd: string, args: string[]): string {
try {
return execFileSync(cmd, args, { stdio: 'pipe', encoding: 'utf-8' }).trim();
} catch {
return '';
}
}
/**
* Check if Temporal is running and healthy.
*/
export function isTemporalReady(): boolean {
const output = runOutput('docker', [
'exec',
'shannon-temporal',
'temporal',
'operator',
'cluster',
'health',
'--address',
'localhost:7233',
]);
return output.includes('SERVING');
}
/** Check if the router container is running and healthy. */
function isRouterReady(): boolean {
const status = runOutput('docker', ['inspect', '--format', '{{.State.Health.Status}}', 'shannon-router']);
return status === 'healthy';
}
/**
* Ensure Temporal (and optionally router) are running via compose.
* If Temporal is already up but router is needed and missing, starts router only.
*/
export async function ensureInfra(useRouter: boolean): Promise<void> {
const temporalReady = isTemporalReady();
const routerNeeded = useRouter && !isRouterReady();
if (temporalReady && !routerNeeded) {
return;
}
const composeFile = getComposeFile();
const composeArgs = ['compose', '-f', composeFile];
if (useRouter) composeArgs.push('--profile', 'router');
composeArgs.push('up', '-d');
if (temporalReady && routerNeeded) {
console.log('Starting router...');
} else {
console.log('Starting Shannon infrastructure...');
}
execFileSync('docker', composeArgs, { stdio: 'inherit' });
// Wait for Temporal if it wasn't already running
if (!temporalReady) {
console.log('Waiting for Temporal to be ready...');
for (let i = 0; i < 30; i++) {
if (isTemporalReady()) {
console.log('Temporal is ready!');
break;
}
if (i === 29) {
console.error('Timeout waiting for Temporal');
process.exit(1);
}
await sleep(2000);
}
}
// Wait for router if needed
if (routerNeeded) {
console.log('Waiting for router to be ready...');
for (let i = 0; i < 15; i++) {
if (isRouterReady()) {
console.log('Router is ready!');
return;
}
await sleep(2000);
}
console.error('Timeout waiting for router');
process.exit(1);
}
}
/**
* Build the worker image locally (local mode only).
*/
export function buildImage(noCache: boolean): void {
console.log(`Building ${DEV_IMAGE}...`);
const args = ['build'];
if (noCache) args.push('--no-cache');
args.push('-t', DEV_IMAGE, '.');
execFileSync('docker', args, { stdio: 'inherit' });
console.log(`Build complete: ${DEV_IMAGE}`);
}
/**
* Ensure the worker image is available.
* Local mode: auto-builds if missing. NPX mode: pulls from Docker Hub.
*/
export function ensureImage(version: string): void {
const image = getWorkerImage(version);
const exists = runQuiet('docker', ['image', 'inspect', image]);
if (exists) return;
if (getMode() === 'local') {
console.log('Worker image not found, building...');
buildImage(false);
} else {
console.log(`Pulling ${image}...`);
try {
execFileSync('docker', ['pull', image], { stdio: 'inherit' });
} catch {
console.error(`\nERROR: Failed to pull ${image}`);
console.error('The image may not be available for your platform yet.');
console.error('Check https://hub.docker.com/r/keygraph/shannon for available tags.');
process.exit(1);
}
pruneOldImages(version);
}
}
/**
* Detect if --add-host is needed (Linux without Podman).
* macOS has host.docker.internal built in.
*/
function addHostFlag(): string[] {
if (os.platform() === 'linux') {
const hasPodman = runQuiet('which', ['podman']);
if (!hasPodman) {
return ['--add-host', 'host.docker.internal:host-gateway'];
}
}
return [];
}
export interface WorkerOptions {
version: string;
url: string;
repo: { hostPath: string; containerPath: string };
workspacesDir: string;
taskQueue: string;
containerName: string;
envFlags: string[];
config?: { hostPath: string; containerPath: string };
credentials?: string;
promptsDir?: string;
outputDir?: string;
workspace?: string;
pipelineTesting?: boolean;
}
/**
* Spawn the worker container in detached mode and return the `docker run` client process.
*/
export function spawnWorker(opts: WorkerOptions): ChildProcess {
const args = ['run', '-d', '--rm', '--name', opts.containerName, '--network', 'shannon-net'];
// Add host flag for Linux
args.push(...addHostFlag());
// UID remapping for Linux bind mounts
if (os.platform() === 'linux' && process.getuid && process.getgid) {
args.push('-e', `SHANNON_HOST_UID=${process.getuid()}`, '-e', `SHANNON_HOST_GID=${process.getgid()}`);
}
// Volume mounts
args.push('-v', `${opts.workspacesDir}:/app/workspaces`);
args.push('-v', `${opts.repo.hostPath}:${opts.repo.containerPath}`);
// Local mode: mount prompts for live editing
if (opts.promptsDir) {
args.push('-v', `${opts.promptsDir}:/app/apps/worker/prompts:ro`);
}
if (opts.config) {
args.push('-v', `${opts.config.hostPath}:${opts.config.containerPath}:ro`);
}
// Output directory for deliverables copy
if (opts.outputDir) {
args.push('-v', `${opts.outputDir}:/app/output`);
}
// Mount credentials file to fixed container path
if (opts.credentials) {
args.push('-v', `${opts.credentials}:/app/credentials/google-sa-key.json:ro`);
}
// Environment
args.push(...opts.envFlags);
// Container settings
args.push('--shm-size', '2gb', '--security-opt', 'seccomp=unconfined');
// Image
args.push(getWorkerImage(opts.version));
// Worker command
args.push('node', 'apps/worker/dist/temporal/worker.js', opts.url, opts.repo.containerPath);
args.push('--task-queue', opts.taskQueue);
if (opts.config) {
args.push('--config', opts.config.containerPath);
}
if (opts.outputDir) {
args.push('--output', '/app/output');
}
if (opts.workspace) {
args.push('--workspace', opts.workspace);
}
if (opts.pipelineTesting) {
args.push('--pipeline-testing');
}
// Prevent MSYS/Git Bash from converting Unix paths (e.g. /repos/my-repo) to Windows paths
return spawn('docker', args, {
stdio: 'pipe',
...(os.platform() === 'win32' && { env: { ...process.env, MSYS_NO_PATHCONV: '1' } }),
});
}
/**
* Stop all running shannon-worker-* containers.
*/
export function stopWorkers(): void {
const workers = runOutput('docker', ['ps', '-q', '--filter', 'name=shannon-worker-']);
if (!workers) return;
const ids = workers.split('\n').filter(Boolean);
console.log('Stopping worker containers...');
execFileSync('docker', ['stop', ...ids], { stdio: 'inherit' });
}
/**
* Tear down the compose stack.
*/
export function stopInfra(clean: boolean): void {
const composeFile = getComposeFile();
const args = ['compose', '-f', composeFile, '--profile', 'router', 'down'];
if (clean) args.push('-v');
execFileSync('docker', args, { stdio: 'inherit' });
}
/**
* Remove old keygraph/shannon images that don't match the current version.
*/
function pruneOldImages(currentVersion: string): void {
const output = runOutput('docker', ['images', NPX_IMAGE_REPO, '--format', '{{.Tag}}']);
if (!output) return;
const currentTag = currentVersion;
const stale = output.split('\n').filter((tag) => tag && tag !== currentTag);
for (const tag of stale) {
runQuiet('docker', ['rmi', `${NPX_IMAGE_REPO}:${tag}`]);
}
}
/**
* List running worker containers.
*/
export function listRunningWorkers(): string {
return runOutput('docker', [
'ps',
'--filter',
'name=shannon-worker-',
'--format',
'table {{.Names}}\t{{.Status}}\t{{.RunningFor}}',
]);
}
@@ -0,0 +1,171 @@
/**
* Environment variable loading and credential validation.
*
* Local mode: loads ./.env via dotenv.
* NPX mode: fills gaps from ~/.shannon/config.toml (no .env).
*/
import dotenv from 'dotenv';
import { resolveConfig } from './config/resolver.js';
import { getMode } from './mode.js';
/** Environment variables forwarded to worker containers. */
const FORWARD_VARS = [
'ANTHROPIC_API_KEY',
'ANTHROPIC_BASE_URL',
'ANTHROPIC_AUTH_TOKEN',
'ROUTER_DEFAULT',
'CLAUDE_CODE_OAUTH_TOKEN',
'CLAUDE_CODE_USE_BEDROCK',
'AWS_REGION',
'AWS_BEARER_TOKEN_BEDROCK',
'CLAUDE_CODE_USE_VERTEX',
'CLOUD_ML_REGION',
'ANTHROPIC_VERTEX_PROJECT_ID',
'GOOGLE_APPLICATION_CREDENTIALS',
'ANTHROPIC_SMALL_MODEL',
'ANTHROPIC_MEDIUM_MODEL',
'ANTHROPIC_LARGE_MODEL',
'CLAUDE_CODE_MAX_OUTPUT_TOKENS',
'OPENAI_API_KEY',
'OPENROUTER_API_KEY',
] as const;
/**
* Load credentials into process.env.
* Local mode: loads ./.env via dotenv.
* NPX mode: fills gaps from ~/.shannon/config.toml.
* Exported env vars always take precedence in both modes.
*/
export function loadEnv(): void {
if (getMode() === 'local') {
dotenv.config({ path: '.env', quiet: true });
} else {
resolveConfig();
}
}
/**
* Build `-e KEY=VALUE` flags for docker run, only for set variables.
*/
export function buildEnvFlags(): string[] {
const flags: string[] = ['-e', 'TEMPORAL_ADDRESS=shannon-temporal:7233'];
for (const key of FORWARD_VARS) {
const value = process.env[key];
if (value) {
flags.push('-e', `${key}=${value}`);
}
}
return flags;
}
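`buildEnvFlags` works as an allowlist: only variables named in `FORWARD_VARS` can reach the worker container, and unset or empty ones are skipped. A minimal sketch of the same idea with the environment passed in as a parameter, which is an assumption made here for testability; `envFlags` is a hypothetical name:

```typescript
// Build `-e KEY=VALUE` pairs for docker run from an allowlist of variable names.
function envFlags(
  env: Record<string, string | undefined>,
  allowlist: readonly string[],
): string[] {
  const flags: string[] = [];
  for (const key of allowlist) {
    const value = env[key];
    // Skip unset and empty values so the container never sees blank variables.
    if (value) flags.push('-e', `${key}=${value}`);
  }
  return flags;
}
```

Variables outside the allowlist are never forwarded, even when they are set in the caller's environment.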
interface CredentialValidation {
valid: boolean;
error?: string;
mode: 'api-key' | 'oauth' | 'custom-base-url' | 'bedrock' | 'vertex' | 'router';
}
/** Check if router credentials are present in the environment. */
export function isRouterConfigured(): boolean {
return !!(process.env.ROUTER_DEFAULT && (process.env.OPENAI_API_KEY || process.env.OPENROUTER_API_KEY));
}
/** Check if a custom Anthropic-compatible base URL is configured. */
function isCustomBaseUrlConfigured(): boolean {
return !!(process.env.ANTHROPIC_BASE_URL && process.env.ANTHROPIC_AUTH_TOKEN);
}
/** Detect which providers are configured via environment variables. */
function detectProviders(): string[] {
const providers: string[] = [];
if (process.env.ANTHROPIC_API_KEY) providers.push('Anthropic API key');
if (process.env.CLAUDE_CODE_OAUTH_TOKEN) providers.push('Anthropic OAuth');
if (isCustomBaseUrlConfigured()) providers.push('Custom Base URL');
if (process.env.CLAUDE_CODE_USE_BEDROCK === '1') providers.push('AWS Bedrock');
if (process.env.CLAUDE_CODE_USE_VERTEX === '1') providers.push('Google Vertex');
if (isRouterConfigured()) providers.push('Router');
return providers;
}
/**
* Validate that exactly one authentication method is configured.
*/
export function validateCredentials(): CredentialValidation {
// Reject multiple providers
const providers = detectProviders();
if (providers.length > 1) {
return {
valid: false,
mode: 'api-key',
error: `Multiple providers detected: ${providers.join(', ')}. Only one provider can be active at a time.`,
};
}
if (process.env.ANTHROPIC_API_KEY) {
return { valid: true, mode: 'api-key' };
}
if (process.env.CLAUDE_CODE_OAUTH_TOKEN) {
return { valid: true, mode: 'oauth' };
}
if (isCustomBaseUrlConfigured()) {
// Set auth token as API key so the SDK can initialize
process.env.ANTHROPIC_API_KEY = process.env.ANTHROPIC_AUTH_TOKEN;
return { valid: true, mode: 'custom-base-url' };
}
if (process.env.CLAUDE_CODE_USE_BEDROCK === '1') {
const missing: string[] = [];
if (!process.env.AWS_REGION) missing.push('AWS_REGION');
if (!process.env.AWS_BEARER_TOKEN_BEDROCK) missing.push('AWS_BEARER_TOKEN_BEDROCK');
if (!process.env.ANTHROPIC_SMALL_MODEL) missing.push('ANTHROPIC_SMALL_MODEL');
if (!process.env.ANTHROPIC_MEDIUM_MODEL) missing.push('ANTHROPIC_MEDIUM_MODEL');
if (!process.env.ANTHROPIC_LARGE_MODEL) missing.push('ANTHROPIC_LARGE_MODEL');
if (missing.length > 0) {
return {
valid: false,
mode: 'bedrock',
error: `Bedrock mode requires: ${missing.join(', ')}`,
};
}
return { valid: true, mode: 'bedrock' };
}
if (process.env.CLAUDE_CODE_USE_VERTEX === '1') {
const missing: string[] = [];
if (!process.env.CLOUD_ML_REGION) missing.push('CLOUD_ML_REGION');
if (!process.env.ANTHROPIC_VERTEX_PROJECT_ID) missing.push('ANTHROPIC_VERTEX_PROJECT_ID');
if (!process.env.ANTHROPIC_SMALL_MODEL) missing.push('ANTHROPIC_SMALL_MODEL');
if (!process.env.ANTHROPIC_MEDIUM_MODEL) missing.push('ANTHROPIC_MEDIUM_MODEL');
if (!process.env.ANTHROPIC_LARGE_MODEL) missing.push('ANTHROPIC_LARGE_MODEL');
if (missing.length > 0) {
return {
valid: false,
mode: 'vertex',
error: `Vertex AI mode requires: ${missing.join(', ')}`,
};
}
if (!process.env.GOOGLE_APPLICATION_CREDENTIALS) {
return {
valid: false,
mode: 'vertex',
error: 'Vertex AI mode requires GOOGLE_APPLICATION_CREDENTIALS',
};
}
return { valid: true, mode: 'vertex' };
}
if (isRouterConfigured()) {
// Set a placeholder so the worker doesn't reject the missing key
process.env.ANTHROPIC_API_KEY = 'router-mode';
return { valid: true, mode: 'router' };
}
const hint =
getMode() === 'local'
? `No credentials found. Set ANTHROPIC_API_KEY in .env or export it.`
: `Authentication not configured. Export variables or run 'npx @keygraph/shannon setup'.`;
return {
valid: false,
mode: 'api-key',
error: hint,
};
}
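The one-provider rule in `validateCredentials` can be illustrated in isolation. The sketch below is a simplified stand-in, not the code from this PR: it takes the environment as a plain object instead of reading `process.env`, covers only four of the six providers, and the names `Env`, `detectProvidersFrom`, and `checkSingleProvider` are hypothetical.

```typescript
// Hypothetical helper types and names; only the env variable names come from the diff.
type Env = Record<string, string | undefined>;

// Simplified detection: the PR's detectProviders() also checks custom base URL and router setups.
function detectProvidersFrom(env: Env): string[] {
  const providers: string[] = [];
  if (env['ANTHROPIC_API_KEY']) providers.push('Anthropic API key');
  if (env['CLAUDE_CODE_OAUTH_TOKEN']) providers.push('Anthropic OAuth');
  if (env['CLAUDE_CODE_USE_BEDROCK'] === '1') providers.push('AWS Bedrock');
  if (env['CLAUDE_CODE_USE_VERTEX'] === '1') providers.push('Google Vertex');
  return providers;
}

// Accept exactly one detected provider; reject zero or several.
function checkSingleProvider(env: Env): { valid: boolean; error?: string } {
  const providers = detectProvidersFrom(env);
  if (providers.length > 1) {
    return { valid: false, error: `Multiple providers detected: ${providers.join(', ')}` };
  }
  if (providers.length === 0) {
    return { valid: false, error: 'No credentials found' };
  }
  return { valid: true };
}

console.log(checkSingleProvider({ ANTHROPIC_API_KEY: 'sk-test' }));
console.log(checkSingleProvider({ ANTHROPIC_API_KEY: 'sk-test', CLAUDE_CODE_USE_BEDROCK: '1' }));
```

The flag-style providers match only the exact string `'1'`, so `CLAUDE_CODE_USE_BEDROCK=true` would not count as configured, in this sketch or in the PR code.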
@@ -0,0 +1,52 @@
/**
* Shannon state directory management.
*
* Local mode (cloned repo): uses ./workspaces/, ./credentials/
* NPX mode: uses ~/.shannon/workspaces/, ~/.shannon/
*/
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';
import { getMode } from './mode.js';
const SHANNON_HOME = path.join(os.homedir(), '.shannon');
export function getConfigFile(): string {
return path.join(SHANNON_HOME, 'config.toml');
}
export function getWorkspacesDir(): string {
return getMode() === 'local' ? path.resolve('workspaces') : path.join(SHANNON_HOME, 'workspaces');
}
/**
* Resolve the Vertex credentials file path.
*
* Checks GOOGLE_APPLICATION_CREDENTIALS env var first (may be set by TOML resolver),
* then falls back to mode-appropriate default location.
*/
export function getCredentialsPath(): string {
const envPath = process.env.GOOGLE_APPLICATION_CREDENTIALS;
if (envPath && fs.existsSync(envPath)) return path.resolve(envPath);
if (getMode() === 'local') {
return path.resolve('credentials', 'google-sa-key.json');
}
return path.join(SHANNON_HOME, 'google-sa-key.json');
}
/**
* Initialize state directories.
* Local mode: creates ./workspaces/ and ./credentials/
* NPX mode: creates ~/.shannon/workspaces/
*/
export function initHome(): void {
if (getMode() === 'local') {
fs.mkdirSync(path.resolve('workspaces'), { recursive: true });
fs.mkdirSync(path.resolve('credentials'), { recursive: true });
} else {
fs.mkdirSync(path.join(SHANNON_HOME, 'workspaces'), { recursive: true });
}
}
@@ -0,0 +1,239 @@
/**
* Shannon CLI — AI Penetration Testing Framework
*
* Unified CLI supporting two modes:
* Local mode: Run from cloned repo — builds locally, mounts prompts, uses ./workspaces/
* NPX mode: Run via npx — pulls from Docker Hub, uses ~/.shannon/
*
* Mode is selected by the SHANNON_LOCAL environment variable (see getMode in ./mode.js):
* the root ./shannon entry point sets SHANNON_LOCAL=1; without it the CLI runs in NPX mode.
*/
import fs from 'node:fs';
import path from 'node:path';
import { fileURLToPath } from 'node:url';
import { build } from './commands/build.js';
import { logs } from './commands/logs.js';
import { setup } from './commands/setup.js';
import { start } from './commands/start.js';
import { status } from './commands/status.js';
import { stop } from './commands/stop.js';
import { uninstall } from './commands/uninstall.js';
import { workspaces } from './commands/workspaces.js';
import { getMode } from './mode.js';
import { displaySplash } from './splash.js';
const __dirname = path.dirname(fileURLToPath(import.meta.url));
function getVersion(): string {
try {
const pkgPath = path.join(__dirname, '..', 'package.json');
const pkg = JSON.parse(fs.readFileSync(pkgPath, 'utf-8')) as { version?: string };
return pkg.version || '1.0.0';
} catch {
return '1.0.0';
}
}
function showHelp(): void {
const mode = getMode();
const prefix = mode === 'local' ? './shannon' : 'npx @keygraph/shannon';
console.log(`
Shannon - AI Penetration Testing Framework
Usage:${
mode === 'local'
? ''
: `
${prefix} setup Configure credentials`
}
${prefix} start --url <url> --repo <path> [options] Start a pentest scan
${prefix} stop [--clean] Stop all containers
${prefix} workspaces List all workspaces
${prefix} logs <workspace> Tail workflow log
${prefix} status Show running workers${
mode === 'local'
? `
${prefix} build [--no-cache] Build worker image`
: `
${prefix} uninstall Remove ~/.shannon/ and all data`
}
${prefix} info Show splash screen
${prefix} help Show this help
Options for 'start':
-u, --url <url> Target URL (required)
-r, --repo <path> Repository path${mode === 'local' ? ' or bare name' : ''} (required)
-c, --config <path> Configuration file (YAML)
-o, --output <path> Copy deliverables to this directory after run
-w, --workspace <name> Named workspace (auto-resumes if exists)
--pipeline-testing Use minimal prompts for fast testing
--router Route requests through claude-code-router
Examples:
${prefix} start -u https://example.com -r ${mode === 'local' ? 'my-repo' : './my-repo'}
${prefix} start -u https://example.com -r /path/to/repo -c config.yaml -w q1-audit
${prefix} logs q1-audit
${prefix} stop --clean
${
mode === 'local'
? `
State directory: ./workspaces/`
: `
State directory: ~/.shannon/`
}
Monitor workflows at http://localhost:8233
`);
}
interface ParsedStartArgs {
url: string;
repo: string;
config?: string;
workspace?: string;
output?: string;
pipelineTesting: boolean;
router: boolean;
}
function parseStartArgs(argv: string[]): ParsedStartArgs {
let url = '';
let repo = '';
let config: string | undefined;
let workspace: string | undefined;
let output: string | undefined;
let pipelineTesting = false;
let router = false;
for (let i = 0; i < argv.length; i++) {
const arg = argv[i];
const next = argv[i + 1];
switch (arg) {
case '-u':
case '--url':
if (next && !next.startsWith('-')) {
url = next;
i++;
}
break;
case '-r':
case '--repo':
if (next && !next.startsWith('-')) {
repo = next;
i++;
}
break;
case '-c':
case '--config':
if (next && !next.startsWith('-')) {
config = next;
i++;
}
break;
case '-w':
case '--workspace':
if (next && !next.startsWith('-')) {
workspace = next;
i++;
}
break;
case '-o':
case '--output':
if (next && !next.startsWith('-')) {
output = next;
i++;
}
break;
case '--pipeline-testing':
pipelineTesting = true;
break;
case '--router':
router = true;
break;
default:
console.error(`Unknown option: ${arg}`);
console.error(`Run "${getMode() === 'local' ? './shannon' : 'npx @keygraph/shannon'} help" for usage`);
process.exit(1);
}
}
if (!url || !repo) {
console.error('ERROR: --url and --repo are required');
console.error(`Usage: ${getMode() === 'local' ? './shannon' : 'npx @keygraph/shannon'} start -u <url> -r <path>`);
process.exit(1);
}
return {
url,
repo,
pipelineTesting,
router,
...(config && { config }),
...(workspace && { workspace }),
...(output && { output }),
};
}
// === Main Dispatch ===
const args = process.argv.slice(2);
const command = args[0];
switch (command) {
case 'start': {
const parsed = parseStartArgs(args.slice(1));
await start({ ...parsed, version: getVersion() });
break;
}
case 'stop':
stop(args.includes('--clean'));
break;
case 'logs': {
const workspaceId = args[1];
if (!workspaceId) {
console.error('ERROR: Workspace ID is required');
console.error(`Usage: ${getMode() === 'local' ? './shannon' : 'npx @keygraph/shannon'} logs <workspace>`);
process.exit(1);
}
logs(workspaceId);
break;
}
case 'workspaces':
workspaces(getVersion());
break;
case 'status':
status();
break;
case 'setup':
if (getMode() === 'local') {
console.error('ERROR: setup is only available in npx mode. In local mode, use .env');
process.exit(1);
}
setup();
break;
case 'build':
build(args.includes('--no-cache'));
break;
case 'uninstall':
if (getMode() === 'local') {
console.error('ERROR: uninstall is only available in npx mode.');
process.exit(1);
}
uninstall();
break;
case 'info':
displaySplash(getMode() === 'local' ? undefined : getVersion());
break;
case 'help':
case '--help':
case '-h':
case undefined:
showHelp();
break;
default:
console.error(`Unknown command: ${command}`);
showHelp();
process.exit(1);
}
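`parseStartArgs` consumes the token after a value flag only when that token does not itself start with `-`. A minimal sketch of that lookahead rule follows; `readFlagValue` is a hypothetical helper written for illustration and does not appear in the PR.

```typescript
// Hypothetical helper: returns the value following `long` or `short`,
// or undefined when the flag is absent or the next token looks like another flag.
function readFlagValue(argv: string[], long: string, short: string): string | undefined {
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === long || argv[i] === short) {
      const next = argv[i + 1];
      if (next !== undefined && !next.startsWith('-')) return next;
    }
  }
  return undefined;
}

const argv = ['-u', 'https://example.com', '--repo', '--router'];
console.log(readFlagValue(argv, '--url', '-u'));  // the URL is picked up
console.log(readFlagValue(argv, '--repo', '-r')); // '--router' is not taken as the repo value
```

One consequence of this rule is that a value flag with no value is skipped without a message of its own. For `--url` and `--repo` the later required-argument check reports the problem; for `--config`, `--workspace`, and `--output` the flag is simply ignored.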
@@ -0,0 +1,25 @@
/**
* Runtime mode detection — local (build from source) vs npx (Docker Hub).
*
* The root `./shannon` entry point sets SHANNON_LOCAL=1 before importing.
* When run via npx, `cli/dist/index.js` is executed directly without it.
*/
export type Mode = 'local' | 'npx';
let cachedMode: Mode | undefined;
export function getMode(): Mode {
if (cachedMode !== undefined) return cachedMode;
cachedMode = process.env.SHANNON_LOCAL === '1' ? 'local' : 'npx';
return cachedMode;
}
export function setMode(mode: Mode): void {
cachedMode = mode;
}
export function isLocal(): boolean {
return getMode() === 'local';
}
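Because `getMode` caches its first answer, later changes to `SHANNON_LOCAL` have no effect within the same process. The sketch below reproduces that behavior under two assumptions that are not part of the PR: the environment is passed in as an argument, and a hypothetical `resetModeCache` exists to clear the cache.

```typescript
type Mode = 'local' | 'npx';
type Env = Record<string, string | undefined>;

let cachedMode: Mode | undefined;

// Same decision as getMode(), but the environment is a parameter (an assumption for testability).
function getModeFrom(env: Env): Mode {
  if (cachedMode !== undefined) return cachedMode;
  cachedMode = env['SHANNON_LOCAL'] === '1' ? 'local' : 'npx';
  return cachedMode;
}

// Hypothetical reset hook; the PR exposes setMode(mode) but nothing that clears the cache.
function resetModeCache(): void {
  cachedMode = undefined;
}

console.log(getModeFrom({ SHANNON_LOCAL: '1' })); // first call decides the mode
console.log(getModeFrom({}));                      // still the cached mode
resetModeCache();
console.log(getModeFrom({}));                      // re-evaluated after the reset
```

Any value other than the exact string `'1'` selects NPX mode, so `SHANNON_LOCAL=true` does not enable local mode.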
@@ -0,0 +1,87 @@
/**
* Path resolution for --repo and --config arguments.
*
* Local mode supports bare repo names (e.g. "my-repo" → ./repos/my-repo).
* Both modes resolve relative paths against CWD.
*/
import fs from 'node:fs';
import path from 'node:path';
import { isLocal } from './mode.js';
export interface MountPair {
hostPath: string;
containerPath: string;
}
/**
* Resolve --repo to absolute path and container mount.
* Local mode: bare names (no leading / or .) must exist under ./repos/<name>; otherwise the CLI exits with an error.
*/
export function resolveRepo(repoArg: string): MountPair {
let hostPath: string;
if (isLocal() && !repoArg.startsWith('/') && !repoArg.startsWith('.')) {
// Bare name — check ./repos/<name> for backward compatibility
const barePath = path.resolve('repos', repoArg);
if (fs.existsSync(barePath)) {
hostPath = barePath;
} else {
console.error(`ERROR: Repository not found at ./repos/${repoArg}`);
console.error('');
console.error('Place your target repository under the ./repos/ directory,');
console.error('or pass an absolute/relative path: -r /path/to/repo');
process.exit(1);
}
} else {
hostPath = path.resolve(repoArg);
}
if (!fs.existsSync(hostPath)) {
console.error(`ERROR: Repository not found: ${hostPath}`);
process.exit(1);
}
if (!fs.statSync(hostPath).isDirectory()) {
console.error(`ERROR: Not a directory: ${hostPath}`);
process.exit(1);
}
const basename = path.basename(hostPath);
return {
hostPath,
containerPath: `/repos/${basename}`,
};
}
/**
* Resolve --config to absolute path and container mount.
*/
export function resolveConfig(configArg: string): MountPair {
const hostPath = path.resolve(configArg);
if (!fs.existsSync(hostPath)) {
console.error(`ERROR: Config file not found: ${hostPath}`);
process.exit(1);
}
if (!fs.statSync(hostPath).isFile()) {
console.error(`ERROR: Not a file: ${hostPath}`);
process.exit(1);
}
const basename = path.basename(hostPath);
return {
hostPath,
containerPath: `/app/configs/${basename}`,
};
}
/**
* Ensure the deliverables directory exists and is writable by the container user.
*/
export function ensureDeliverables(repoHostPath: string): void {
const deliverables = path.join(repoHostPath, 'deliverables');
fs.mkdirSync(deliverables, { recursive: true });
fs.chmodSync(deliverables, 0o777);
}
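The bare-name test in `resolveRepo` is a pure prefix check, which produces a couple of edge cases worth seeing. The sketch below isolates that check; `isBareRepoName` is a hypothetical name, and the real function additionally requires local mode before applying it.

```typescript
// Hypothetical helper mirroring the prefix test in resolveRepo.
function isBareRepoName(repoArg: string): boolean {
  return !repoArg.startsWith('/') && !repoArg.startsWith('.');
}

const samples = ['my-repo', './my-repo', '../my-repo', '/srv/my-repo', 'nested/my-repo', '.hidden'];
for (const arg of samples) {
  const kind = isBareRepoName(arg) ? 'bare (looked up under ./repos/)' : 'path (resolved against CWD)';
  console.log(`${arg} -> ${kind}`);
}
```

Under this rule `nested/my-repo` counts as a bare name and is looked up at `./repos/nested/my-repo`, while a directory literally named `.hidden` is treated as a relative path.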
@@ -0,0 +1,50 @@
/**
* Splash screen display — pure terminal output, no npm dependencies.
*/
export function displaySplash(version?: string): void {
const GOLD = '\x1b[38;2;244;197;66m';
const CYAN = '\x1b[36;1m';
const WHITE = '\x1b[1;37m';
const GRAY = '\x1b[0;37m';
const YELLOW = '\x1b[1;33m';
const RESET = '\x1b[0m';
const B = `${CYAN}\u2551${RESET}`;
const S67 = ' '.repeat(67);
const HR = '\u2550'.repeat(67);
const lines = [
'',
` ${CYAN}\u2554${HR}\u2557${RESET}`,
` ${B}${S67}${B}`,
` ${B} ${GOLD}\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2557\u2588\u2588\u2557 \u2588\u2588\u2557 \u2588\u2588\u2588\u2588\u2588\u2557 \u2588\u2588\u2588\u2557 \u2588\u2588\u2557\u2588\u2588\u2588\u2557 \u2588\u2588\u2557 \u2588\u2588\u2588\u2588\u2588\u2588\u2557 \u2588\u2588\u2588\u2557 \u2588\u2588\u2557${RESET} ${B}`,
` ${B} ${GOLD}\u2588\u2588\u2554\u2550\u2550\u2550\u2550\u255D\u2588\u2588\u2551 \u2588\u2588\u2551\u2588\u2588\u2554\u2550\u2550\u2588\u2588\u2557\u2588\u2588\u2588\u2588\u2557 \u2588\u2588\u2551\u2588\u2588\u2588\u2588\u2557 \u2588\u2588\u2551\u2588\u2588\u2554\u2550\u2550\u2550\u2588\u2588\u2557\u2588\u2588\u2588\u2588\u2557 \u2588\u2588\u2551${RESET} ${B}`,
` ${B} ${GOLD}\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2557\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2551\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2551\u2588\u2588\u2554\u2588\u2588\u2557 \u2588\u2588\u2551\u2588\u2588\u2554\u2588\u2588\u2557 \u2588\u2588\u2551\u2588\u2588\u2551 \u2588\u2588\u2551\u2588\u2588\u2554\u2588\u2588\u2557 \u2588\u2588\u2551${RESET} ${B}`,
` ${B} ${GOLD}\u255A\u2550\u2550\u2550\u2550\u2588\u2588\u2551\u2588\u2588\u2554\u2550\u2550\u2588\u2588\u2551\u2588\u2588\u2554\u2550\u2550\u2588\u2588\u2551\u2588\u2588\u2551\u255A\u2588\u2588\u2557\u2588\u2588\u2551\u2588\u2588\u2551\u255A\u2588\u2588\u2557\u2588\u2588\u2551\u2588\u2588\u2551 \u2588\u2588\u2551\u2588\u2588\u2551\u255A\u2588\u2588\u2557\u2588\u2588\u2551${RESET} ${B}`,
` ${B} ${GOLD}\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2551\u2588\u2588\u2551 \u2588\u2588\u2551\u2588\u2588\u2551 \u2588\u2588\u2551\u2588\u2588\u2551 \u255A\u2588\u2588\u2588\u2588\u2551\u2588\u2588\u2551 \u255A\u2588\u2588\u2588\u2588\u2551\u255A\u2588\u2588\u2588\u2588\u2588\u2588\u2554\u255D\u2588\u2588\u2551 \u255A\u2588\u2588\u2588\u2588\u2551${RESET} ${B}`,
` ${B} ${GOLD}\u255A\u2550\u2550\u2550\u2550\u2550\u2550\u255D\u255A\u2550\u255D \u255A\u2550\u255D\u255A\u2550\u255D \u255A\u2550\u255D\u255A\u2550\u255D \u255A\u2550\u2550\u2550\u255D\u255A\u2550\u255D \u255A\u2550\u2550\u2550\u255D \u255A\u2550\u2550\u2550\u2550\u2550\u255D \u255A\u2550\u255D \u255A\u2550\u2550\u2550\u255D${RESET} ${B}`,
` ${B}${S67}${B}`,
` ${B} ${CYAN}\u2554\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2557${RESET} ${B}`,
` ${B} ${CYAN}\u2551${RESET} ${WHITE}AI Penetration Testing Framework${RESET} ${CYAN}\u2551${RESET} ${B}`,
` ${B} ${CYAN}\u255A\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u255D${RESET} ${B}`,
` ${B}${S67}${B}`,
];
if (version) {
const verStr = `v${version}`;
const verPadLeft = Math.floor((67 - verStr.length) / 2);
const verPadRight = 67 - verStr.length - verPadLeft;
lines.push(` ${B}${' '.repeat(verPadLeft)}${GRAY}${verStr}${RESET}${' '.repeat(verPadRight)}${B}`);
}
lines.push(
` ${B}${S67}${B}`,
` ${B} ${YELLOW}\uD83D\uDD10 DEFENSIVE SECURITY ONLY \uD83D\uDD10${RESET} ${B}`,
` ${B}${S67}${B}`,
` ${CYAN}\u255A${HR}\u255D${RESET}`,
'',
);
console.log(lines.join('\n'));
}
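The version line is centered by splitting the free width into a floored left pad and a remainder right pad. The sketch below generalizes that arithmetic; `centerIn` is a hypothetical helper, and the clamp to zero is an addition that the PR code does not have.

```typescript
// Hypothetical helper; displaySplash inlines this arithmetic for a fixed width of 67.
function centerIn(text: string, width: number): string {
  // Clamp to zero: String.prototype.repeat throws a RangeError on negative counts.
  const free = Math.max(0, width - text.length);
  const left = Math.floor(free / 2);
  const right = free - left; // any odd leftover column goes to the right side
  return ' '.repeat(left) + text + ' '.repeat(right);
}

console.log(`[${centerIn('v1.2.3', 20)}]`);
```

Without the clamp, a version string longer than the box width would make the pad counts negative and `repeat` would throw. That is unlikely for a semver string, but it is a cheap guard.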
@@ -0,0 +1,9 @@
{
"extends": "../../tsconfig.base.json",
"compilerOptions": {
"rootDir": "./src",
"outDir": "./dist"
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist"]
}
@@ -0,0 +1,11 @@
import { defineConfig } from 'tsdown';
export default defineConfig({
entry: ['src/index.ts'],
format: 'esm',
target: 'node18',
outDir: 'dist',
clean: true,
deps: { neverBundle: ['@clack/prompts', 'dotenv', 'smol-toml'] },
banner: { js: '#!/usr/bin/env node' },
});
@@ -122,12 +122,20 @@
"type": "object",
"description": "Deprecated: Use 'authentication' section instead",
"deprecated": true
},
+"description": {
+"type": "string",
+"description": "Description of the target environment, its deployment context, and any information that helps guide the security assessment",
+"minLength": 1,
+"maxLength": 500,
+"pattern": "\\S"
+}
},
"anyOf": [
-{"required": ["authentication"]},
-{"required": ["rules"]},
-{"required": ["authentication", "rules"]}
+{ "required": ["authentication"] },
+{ "required": ["rules"] },
+{ "required": ["authentication", "rules"] },
+{ "required": ["description"] }
],
"additionalProperties": false,
"$defs": {
@@ -157,4 +165,4 @@
"additionalProperties": false
}
}
-}
+}
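The new `description` property combines three keywords: `minLength: 1`, `maxLength: 500`, and the unanchored pattern `\S`, which requires at least one non-whitespace character anywhere in the string. The sketch below checks those three constraints by hand. It is illustrative only: `describeProblem` is a hypothetical function, and it does not use a JSON Schema validator such as ajv.

```typescript
// Hypothetical checker for the schema keywords on `description`.
// Returns null when the value is acceptable, otherwise a short reason.
function describeProblem(value: unknown): string | null {
  if (typeof value !== 'string') return 'must be a string';
  // JSON Schema string length counts Unicode code points, not UTF-16 code units.
  const length = Array.from(value).length;
  if (length < 1) return 'must not be empty';
  if (length > 500) return 'must be at most 500 characters';
  if (!/\S/.test(value)) return 'must contain a non-whitespace character';
  return null;
}

console.log(describeProblem('Next.js e-commerce app on PostgreSQL.'));
console.log(describeProblem('   '));
```

The pattern is what rejects a whitespace-only description. `minLength` alone would accept a single space.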
@@ -1,6 +1,9 @@
# Example configuration file for pentest-agent
# Copy this file and modify it for your specific testing needs
+# Description of the target environment (optional, max 500 chars)
+description: "Next.js e-commerce app on PostgreSQL. Local dev environment — .env files contain local-only credentials, not deployed to production."
authentication:
login_type: form # Options: 'form' or 'sso'
login_url: "https://example.com/login"
@@ -0,0 +1,26 @@
{
"name": "@shannon/worker",
"version": "0.0.0",
"private": true,
"type": "module",
"scripts": {
"build": "tsc",
"check": "tsc --noEmit",
"clean": "rm -rf dist"
},
"dependencies": {
"@anthropic-ai/claude-agent-sdk": "catalog:",
"@temporalio/activity": "^1.11.0",
"@temporalio/client": "^1.11.0",
"@temporalio/worker": "^1.11.0",
"@temporalio/workflow": "^1.11.0",
"ajv": "^8.12.0",
"ajv-formats": "^2.1.1",
"dotenv": "^16.4.5",
"js-yaml": "^4.1.0",
"zx": "^8.0.0"
},
"devDependencies": {
"@types/js-yaml": "^4.0.9"
}
}
@@ -141,15 +141,13 @@ Before beginning exploitation, read these strategic intelligence files in order:
You are the **Identity Compromise Specialist** - proving tangible impact of broken authentication through successful account takeover and session hijacking.
</system_architecture>
-<available_tools>
-- **{{MCP_SERVER}} (Playwright):** Essential for interacting with multi-step authentication flows, injecting stolen session cookies, and verifying account takeover in a real browser context.
-- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
-- **Parameters:**
-- `deliverable_type`: "AUTH_EVIDENCE" (required)
-- `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
-- `content`: Inline content string (optional, use only for small content)
-- **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
-- **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
+<cli_tools>
+- **Browser Automation (playwright-cli skill):** Essential for interacting with multi-step authentication flows, injecting stolen session cookies, and verifying account takeover in a real browser context. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+- **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
+- **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+- **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+- **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **Task Agent:** Mandatory coder-executor for brute force batches, credential stuffing, token replay automation, and any scripted workflow.
- **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
@@ -158,7 +156,7 @@ You are the **Identity Compromise Specialist** - proving tangible impact of brok
**CRITICAL TASK AGENT WORKFLOW:**
- Delegate every custom script or multi-step automation to the Task Agent; do not hand-write scripts in Bash.
- Keep requests ≤15 lines and use the template below so the Task Agent has clear inputs and success criteria.
-</available_tools>
+</cli_tools>
## Task Agent Scripting Requirements
Use the Task Agent to execute all bulk or repetitive auth attacks. Manual commands are for quick verification only.
@@ -314,7 +312,7 @@ When your attack plan is fully explored (all checkboxes are marked `[x]`), your
**1. Your Specialist Deliverable**
-First, synthesize all evidence into your specialist section and save it using the save_deliverable script with type AUTH_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment.
+First, synthesize all evidence into your specialist section and save it using the save-deliverable CLI with --type AUTH_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment.
**Section Ordering & Format Requirements:**
- Use the exact vulnerability IDs from `deliverables/auth_exploitation_queue.json`
@@ -407,12 +405,12 @@ If [blocker] were bypassed/removed:
COMPLETION REQUIREMENTS (ALL must be satisfied):
1. Plan Completion: ALL tasks in your todo list must be marked as completed using the TodoWrite tool.
-2. Deliverable Generation: The required deliverable must be successfully saved using save_deliverable script:
+2. Deliverable Generation: The required deliverable must be successfully saved using the save-deliverable CLI:
- **CHUNKED WRITING (MANDATORY):**
1. Use the **Write** tool to create `deliverables/auth_exploitation_evidence.md` with the title and first major section
2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
3. Repeat step 2 for all remaining sections
-4. Call `save_deliverable` with `deliverable_type: "AUTH_EVIDENCE"` and `file_path: "deliverables/auth_exploitation_evidence.md"`
+4. Run `save-deliverable` with `--type AUTH_EVIDENCE --file-path "deliverables/auth_exploitation_evidence.md"`
**WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.
CRITICAL WARNING: Announcing completion before every item in deliverables/auth_exploitation_queue.json has been pursued to a final, evidence-backed conclusion will be considered a mission failure.
@@ -128,15 +128,13 @@ Before beginning exploitation, read these strategic intelligence files in order:
You are the **Privilege Escalation Specialist** - proving tangible impact of broken authorization through access control bypass and unauthorized data access.
</system_architecture>
-<available_tools>
-- **{{MCP_SERVER}} (Playwright):** Essential for interacting with complex authorization flows, testing role-based access controls in browser contexts, and verifying privilege escalation through UI elements.
-- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
-- **Parameters:**
-- `deliverable_type`: "AUTHZ_EVIDENCE" (required)
-- `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
-- `content`: Inline content string (optional, use only for small content)
-- **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
-- **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
+<cli_tools>
+- **Browser Automation (playwright-cli skill):** Essential for interacting with complex authorization flows, testing role-based access controls in browser contexts, and verifying privilege escalation through UI elements. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+- **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
+- **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+- **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+- **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **Task Agent:** Mandatory coder-executor for IDOR sweeps, role escalation loops, and workflow bypass automation.
- **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
@@ -145,7 +143,7 @@ You are the **Privilege Escalation Specialist** - proving tangible impact of bro
**CRITICAL TASK AGENT WORKFLOW:**
- Delegate every multi-user iteration, role toggle test, or workflow automation script to the Task Agent—never handcraft these scripts yourself.
- Keep requests ≤15 lines and adhere to the template below so the Task Agent can act deterministically.
-</available_tools>
+</cli_tools>
## Task Agent Scripting Requirements
All repeated authorization tests must run through the Task Agent.
@@ -320,7 +318,7 @@ When your attack plan is fully explored (all checkboxes are marked `[x]`), your
**1. Your Specialist Deliverable**
-First, synthesize all evidence into your specialist section and save it using the save_deliverable script with type AUTHZ_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment and MUST use the following structure:
+First, synthesize all evidence into your specialist section and save it using the save-deliverable CLI with --type AUTHZ_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment and MUST use the following structure:
**Section Ordering & Format Requirements:**
- Use the exact vulnerability IDs from `deliverables/authz_exploitation_queue.json`
@@ -409,12 +407,12 @@ If [blocker] were bypassed/removed:
COMPLETION REQUIREMENTS (ALL must be satisfied):
1. Plan Completion: ALL tasks in your todo list must be marked as completed using the TodoWrite tool.
-2. Deliverable Generation: The required deliverable must be successfully saved using save_deliverable script:
+2. Deliverable Generation: The required deliverable must be successfully saved using the save-deliverable CLI:
- **CHUNKED WRITING (MANDATORY):**
1. Use the **Write** tool to create `deliverables/authz_exploitation_evidence.md` with the title and first major section
2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
3. Repeat step 2 for all remaining sections
-4. Call `save_deliverable` with `deliverable_type: "AUTHZ_EVIDENCE"` and `file_path: "deliverables/authz_exploitation_evidence.md"`
+4. Run `save-deliverable` with `--type AUTHZ_EVIDENCE --file-path "deliverables/authz_exploitation_evidence.md"`
**WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.
CRITICAL WARNING: Announcing completion before every item in deliverables/authz_exploitation_queue.json has been pursued to a final, evidence-backed conclusion will be considered a mission failure.
@@ -110,15 +110,13 @@ Before beginning exploitation, read these strategic intelligence files in order:
You are the **Proof-of-Impact Generator** - converting theoretical injection flaws into undeniable evidence of compromise through data extraction and command execution.
</system_architecture>
-<available_tools>
-- **{{MCP_SERVER}} (Playwright):** For testing injection vulnerabilities through browser interactions when needed.
-- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
-- **Parameters:**
-- `deliverable_type`: "INJECTION_EVIDENCE" (required)
-- `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
-- `content`: Inline content string (optional, use only for small content)
-- **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
-- **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
+<cli_tools>
+- **Browser Automation (playwright-cli skill):** For testing injection vulnerabilities through browser interactions when needed. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+- **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
+- **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+- **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+- **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **Task Agent:** Mandatory coder-executor for any custom scripting beyond single ad-hoc commands.
- **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
@@ -127,7 +125,7 @@ You are the **Proof-of-Impact Generator** - converting theoretical injection fla
**CRITICAL TASK AGENT WORKFLOW:**
- Task Agent must author and run every custom script, payload loop, or enumeration workflow. Do not craft standalone scripts in Bash or other tools.
- Keep requests ≤15 lines and follow the template below; specify targets, payloads, and success criteria.
-</available_tools>
+</cli_tools>
<methodology_and_domain_expertise>
@@ -344,7 +342,7 @@ When your attack plan is fully explored (all checkboxes are marked `[x]`), your
**1. Your Specialist Deliverable**
-First, synthesize all of your evidence into your specialist section and save it using the save_deliverable script with type INJECTION_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment.
+First, synthesize all of your evidence into your specialist section and save it using the save-deliverable CLI with --type INJECTION_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment.
Your section MUST use the following structure precisely:
@@ -436,12 +434,12 @@ If [blocker] were bypassed/removed:
COMPLETION REQUIREMENTS (ALL must be satisfied):
1. **Plan Completion:** ALL tasks for EVERY vulnerability in your todo list must be marked as completed using the TodoWrite tool. **No vulnerability or task can be left unaddressed.**
2. **Deliverable Generation:** The required deliverable must be successfully saved using save_deliverable MCP tool:
2. **Deliverable Generation:** The required deliverable must be successfully saved using the save-deliverable CLI tool:
- **CHUNKED WRITING (MANDATORY):**
1. Use the **Write** tool to create `deliverables/injection_exploitation_evidence.md` with the title and first major section
2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
3. Repeat step 2 for all remaining sections
4. Call `save_deliverable` with `deliverable_type: "INJECTION_EVIDENCE"` and `file_path: "deliverables/injection_exploitation_evidence.md"`
4. Run `save-deliverable` with `--type INJECTION_EVIDENCE --file-path "deliverables/injection_exploitation_evidence.md"`
**WARNING:** Do NOT write the entire report in a single tool call; doing so exceeds the 32K output token limit. Split it into multiple Write/Edit operations.
**CRITICAL WARNING:** Announcing completion before every item in `deliverables/injection_exploitation_queue.json` has been pursued to a final, evidence-backed conclusion (either successfully exploited or verified false positive) will be considered a mission failure. Superficial testing is not acceptable.
@@ -128,16 +128,14 @@ Before beginning exploitation, read these strategic intelligence files in order:
You are the **Network Boundary Breaker** - proving tangible impact of SSRF vulnerabilities through internal service access and network reconnaissance.
</system_architecture>
<available_tools>
- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
- **Parameters:**
- `deliverable_type`: "SSRF_EVIDENCE" (required)
- `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
- `content`: Inline content string (optional, use only for small content)
- **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
- **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
<cli_tools>
- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
- **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
- **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
- **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
- **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** Useful for complex multi-step SSRF exploitation that requires browser context or JavaScript execution.
- **Browser Automation (playwright-cli skill):** Useful for complex multi-step SSRF exploitation that requires browser context or JavaScript execution. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
- **Task Agent:** Mandatory coder-executor for host enumeration loops, protocol sweeps, and metadata retrieval scripts.
- **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
- **Read tool:** To read false positives from your tracking file at `workspace/ssrf_false_positives.md`.
@@ -145,7 +143,7 @@ You are the **Network Boundary Breaker** - proving tangible impact of SSRF vulne
**CRITICAL TASK AGENT WORKFLOW:**
- Delegate every automated scan (internal hosts, cloud metadata, port sweeps) to the Task Agent; do not handcraft scripts locally.
- Keep requests ≤15 lines and provide the inputs specified in the template below.
</available_tools>
</cli_tools>
## Task Agent Scripting Requirements
Use the Task Agent to drive all SSRF automation efforts.
@@ -397,7 +395,7 @@ When your attack plan is fully explored (all checkboxes are marked `[x]`), your
**1. Your Specialist Deliverable**
First, synthesize all evidence into your specialist section and save it using the save_deliverable script with type SSRF_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment and MUST use the following structure:
First, synthesize all evidence into your specialist section and save it using the save-deliverable CLI with --type SSRF_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment and MUST use the following structure:
**Section Ordering & Format Requirements:**
- Use the exact vulnerability IDs from `deliverables/ssrf_exploitation_queue.json`
@@ -486,12 +484,12 @@ If [blocker] were bypassed/removed:
COMPLETION REQUIREMENTS (ALL must be satisfied):
1. Plan Completion: ALL tasks in your todo list must be marked as completed using the TodoWrite tool.
2. Deliverable Generation: The required deliverable must be successfully saved using save_deliverable script:
2. Deliverable Generation: The required deliverable must be successfully saved using the save-deliverable CLI:
- **CHUNKED WRITING (MANDATORY):**
1. Use the **Write** tool to create `deliverables/ssrf_exploitation_evidence.md` with the title and first major section
2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
3. Repeat step 2 for all remaining sections
4. Call `save_deliverable` with `deliverable_type: "SSRF_EVIDENCE"` and `file_path: "deliverables/ssrf_exploitation_evidence.md"`
4. Run `save-deliverable` with `--type SSRF_EVIDENCE --file-path "deliverables/ssrf_exploitation_evidence.md"`
**WARNING:** Do NOT write the entire report in a single tool call; doing so exceeds the 32K output token limit. Split it into multiple Write/Edit operations.
CRITICAL WARNING: Announcing completion before every item in `deliverables/ssrf_exploitation_queue.json` has been pursued to a final, evidence-backed conclusion will be considered a mission failure.
@@ -130,15 +130,13 @@ Before beginning exploitation, read these strategic intelligence files in order:
You are the **Client-Side Impact Demonstrator** - converting theoretical XSS flaws into undeniable evidence of compromise through session hijacking and unauthorized actions.
</system_architecture>
<available_tools>
- **{{MCP_SERVER}} (Playwright):** Your primary tool for testing DOM-based and Stored XSS, confirming script execution in a real browser context, and interacting with the application post-exploitation.
- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
- **Parameters:**
- `deliverable_type`: "XSS_EVIDENCE" (required)
- `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
- `content`: Inline content string (optional, use only for small content)
- **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
- **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
<cli_tools>
- **Browser Automation (playwright-cli skill):** Your primary tool for testing DOM-based and Stored XSS, confirming script execution in a real browser context, and interacting with the application post-exploitation. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
- **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
- **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
- **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
- **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **Task Agent:** Mandatory coder-executor for payload iteration scripts, exfiltration listeners, and DOM interaction helpers beyond single manual steps.
- **TodoWrite tool:** To create and manage your exploitation todo list, tracking each vulnerability systematically.
@@ -147,7 +145,7 @@ You are the **Client-Side Impact Demonstrator** - converting theoretical XSS fla
**CRITICAL TASK AGENT WORKFLOW:**
- Delegate every automated payload sweep, browser interaction loop, or listener setup to the Task Agent—do not craft standalone scripts manually.
- Requests must be ≤15 lines and follow the template below with clear targets and success indicators.
</available_tools>
</cli_tools>
## Task Agent Scripting Requirements
All repetitive payload testing or data capture must run through the Task Agent.
@@ -329,7 +327,7 @@ POTENTIAL (Report in separate subsection):
When all todos in your list are marked as completed, your mission is complete. You MUST generate two final files.
1. Your Specialist Deliverable
First, synthesize all of your evidence into your specialist section and save it using the save_deliverable script with type XSS_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment.
First, synthesize all of your evidence into your specialist section and save it using the save-deliverable CLI with --type XSS_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment.
Your section MUST use the following structure precisely:
**Section Ordering & Format Requirements:**
@@ -426,12 +424,12 @@ If [blocker] were bypassed/removed:
COMPLETION REQUIREMENTS (ALL must be satisfied):
- Todo List Completion: ALL vulnerabilities from the exploitation queue must have been processed and marked as completed in your todo list.
- Deliverable Generation: The required deliverable must be successfully saved using save_deliverable MCP tool:
- Deliverable Generation: The required deliverable must be successfully saved using the save-deliverable CLI tool:
- **CHUNKED WRITING (MANDATORY):**
1. Use the **Write** tool to create `deliverables/xss_exploitation_evidence.md` with the title and first major section
2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
3. Repeat step 2 for all remaining sections
4. Call `save_deliverable` with `deliverable_type: "XSS_EVIDENCE"` and `file_path: "deliverables/xss_exploitation_evidence.md"`
4. Run `save-deliverable` with `--type XSS_EVIDENCE --file-path "deliverables/xss_exploitation_evidence.md"`
**WARNING:** Do NOT write the entire report in a single tool call; doing so exceeds the 32K output token limit. Split it into multiple Write/Edit operations.
**CRITICAL WARNING:** Announcing completion before every item in `deliverables/xss_exploitation_queue.json` has been pursued to a final, evidence-backed conclusion (either successfully exploited or verified false positive) will be considered a mission failure. Superficial testing is not acceptable.
@@ -0,0 +1,30 @@
## 🧪 Pipeline Testing: Session Isolation Test for Authentication Exploitation Agent
**Playwright Session:** Using session `{{PLAYWRIGHT_SESSION}}` for browser automation testing.
### Step 1: Test Browser Isolation
1. Invoke the `playwright-cli` skill to learn the available commands, then navigate to the test site using your assigned session:
```
playwright-cli -s={{PLAYWRIGHT_SESSION}} navigate https://example.com
```
2. Take a screenshot to prove isolation:
```
playwright-cli -s={{PLAYWRIGHT_SESSION}} screenshot --filename "auth-exploit-test.png"
```
3. Log the test results:
```
console.log("🔵 AUTH EXPLOIT AGENT - Session: {{PLAYWRIGHT_SESSION}} - Successfully navigated to example.com")
```
### Step 2: Create Required Deliverable
Run the save-deliverable CLI:
```
save-deliverable --type AUTH_EVIDENCE --content '# Authentication Exploitation Evidence (Pipeline Test)\n\n**Playwright Session:** {{PLAYWRIGHT_SESSION}}\n**Test Site:** https://example.com\n**Screenshot:** auth-exploit-test.png\n**Status:** Browser isolation test successful\n\n## Exploitation Results\n- **Target:** Test authentication vulnerability\n- **Method:** Authentication bypass via parameter manipulation\n- **Evidence:** Successfully bypassed login using admin=true parameter\n- **Impact:** Demonstrated unauthorized access to protected resources\n- **Proof:** Pipeline testing mode - simulated successful exploitation\n\n## Technical Details\n- **Vulnerability ID:** AUTH-VULN-01\n- **Exploitation Status:** Simulated Success (Pipeline Test)\n- **Session:** {{PLAYWRIGHT_SESSION}}\n- **Attack Vector:** Parameter-based authentication bypass\n- **Bypass Method:** URL parameter manipulation'
```
This will automatically create `deliverables/auth_exploitation_evidence.md`.
### Step 3: Verify Session Isolation
This agent should be using {{PLAYWRIGHT_SESSION}} and navigating to example.com independently of other parallel exploitation agents.
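One way to make the "always pass `-s=<session>`" rule hard to forget is to build every command through a single helper. The sketch below is illustrative only: `playwrightArgs` is a hypothetical function, `agent1` is a made-up session name, and the `navigate` and `screenshot --filename` subcommands are taken from the steps above rather than from the playwright-cli documentation.

```typescript
// Hypothetical helper: prepends the session flag to every playwright-cli invocation.
function playwrightArgs(session: string, command: string, ...rest: string[]): string[] {
  if (session.length === 0) {
    // An empty session would silently share browser state, so fail early.
    throw new Error("session name must not be empty");
  }
  return ["playwright-cli", `-s=${session}`, command, ...rest];
}

// Example argv arrays for the two steps in this test ("agent1" is a placeholder).
const navigateArgs = playwrightArgs("agent1", "navigate", "https://example.com");
const screenshotArgs = playwrightArgs("agent1", "screenshot", "--filename", "auth-exploit-test.png");
```

Returning an argv array rather than a joined string avoids shell-quoting problems when a filename or URL contains spaces.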
@@ -0,0 +1,30 @@
## 🧪 Pipeline Testing: Session Isolation Test for Authorization Exploitation Agent
**Playwright Session:** Using session `{{PLAYWRIGHT_SESSION}}` for browser automation testing.
### Step 1: Test Browser Isolation
1. Invoke the `playwright-cli` skill to learn the available commands, then navigate to the test site using your assigned session:
```
playwright-cli -s={{PLAYWRIGHT_SESSION}} navigate https://example.com
```
2. Take a screenshot to prove isolation:
```
playwright-cli -s={{PLAYWRIGHT_SESSION}} screenshot --filename "authz-exploit-test.png"
```
3. Log the test results:
```
console.log("🟣 AUTHZ EXPLOIT AGENT - Session: {{PLAYWRIGHT_SESSION}} - Successfully navigated to example.com")
```
### Step 2: Create Required Deliverable
Run the save-deliverable CLI:
```
save-deliverable --type AUTHZ_EVIDENCE --content '# Authorization Exploitation Evidence (Pipeline Test)\n\n**Playwright Session:** {{PLAYWRIGHT_SESSION}}\n**Test Site:** https://example.com\n**Screenshot:** authz-exploit-test.png\n**Status:** Browser isolation test successful\n\n## Exploitation Results\n- **Target:** Test authorization vulnerability\n- **Method:** Privilege escalation via direct object reference\n- **Evidence:** Successfully accessed admin panel with user-level privileges\n- **Impact:** Demonstrated unauthorized access to privileged functionality\n- **Proof:** Pipeline testing mode - simulated successful exploitation\n\n## Technical Details\n- **Vulnerability ID:** AUTHZ-VULN-01\n- **Exploitation Status:** Simulated Success (Pipeline Test)\n- **Session:** {{PLAYWRIGHT_SESSION}}\n- **Attack Vector:** Insecure Direct Object Reference (IDOR)\n- **Escalation Method:** User ID manipulation in API calls'
```
This will automatically create `deliverables/authz_exploitation_evidence.md`.
### Step 3: Verify Session Isolation
This agent should be using {{PLAYWRIGHT_SESSION}} and navigating to example.com independently of other parallel exploitation agents.
@@ -0,0 +1,30 @@
## 🧪 Pipeline Testing: Session Isolation Test for Injection Exploitation Agent
**Playwright Session:** Using session `{{PLAYWRIGHT_SESSION}}` for browser automation testing.
### Step 1: Test Browser Isolation
1. Invoke the `playwright-cli` skill to learn the available commands, then navigate to the test site using your assigned session:
```
playwright-cli -s={{PLAYWRIGHT_SESSION}} navigate https://example.com
```
2. Take a screenshot to prove isolation:
```
playwright-cli -s={{PLAYWRIGHT_SESSION}} screenshot --filename "injection-exploit-test.png"
```
3. Log the test results:
```
console.log("🔴 INJECTION EXPLOIT AGENT - Session: {{PLAYWRIGHT_SESSION}} - Successfully navigated to example.com")
```
### Step 2: Create Required Deliverable
Run the save-deliverable CLI:
```
save-deliverable --type INJECTION_EVIDENCE --content '# Injection Exploitation Evidence (Pipeline Test)\n\n**Playwright Session:** {{PLAYWRIGHT_SESSION}}\n**Test Site:** https://example.com\n**Screenshot:** injection-exploit-test.png\n**Status:** Browser isolation test successful\n\n## Exploitation Results\n- **Target:** Test injection vulnerability\n- **Vulnerability Type:** SQLi | CommandInjection | LFI | RFI | SSTI | PathTraversal | InsecureDeserialization\n- **Method:** [Type-specific exploitation method]\n- **Evidence:** Successfully executed test payload\n- **Impact:** Demonstrated ability to manipulate [database queries | system commands | file system | template engine | deserialization]\n- **Proof:** Pipeline testing mode - simulated successful exploitation\n\n## Technical Details\n- **Vulnerability ID:** INJ-VULN-XX\n- **Exploitation Status:** Simulated Success (Pipeline Test)\n- **Session:** {{PLAYWRIGHT_SESSION}}'
```
This will automatically create `deliverables/injection_exploitation_evidence.md`.
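The type-to-file naming used across these prompts can be summarized in a small lookup. This is a sketch inferred only from the paths that appear in this diff; it is not the CLI's implementation, and `DELIVERABLE_PATHS` and `expectedPath` are hypothetical names.

```typescript
// Paths as they appear in the prompts in this diff; other types are not covered here.
const DELIVERABLE_PATHS = new Map<string, string>([
  ["AUTH_EVIDENCE", "deliverables/auth_exploitation_evidence.md"],
  ["AUTHZ_EVIDENCE", "deliverables/authz_exploitation_evidence.md"],
  ["INJECTION_EVIDENCE", "deliverables/injection_exploitation_evidence.md"],
  ["SSRF_EVIDENCE", "deliverables/ssrf_exploitation_evidence.md"],
  ["XSS_EVIDENCE", "deliverables/xss_exploitation_evidence.md"],
  ["CODE_ANALYSIS", "deliverables/code_analysis_deliverable.md"],
]);

// Returns the file a given --type is expected to produce, or undefined if unknown.
function expectedPath(deliverableType: string): string | undefined {
  return DELIVERABLE_PATHS.get(deliverableType);
}
```

A check like this can be useful in a pipeline test to confirm that the `filepath` reported by the CLI matches the file the prompt says will be created.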
### Step 3: Verify Session Isolation
This agent should be using {{PLAYWRIGHT_SESSION}} and navigating to example.com independently of other parallel exploitation agents.
@@ -0,0 +1,30 @@
## 🧪 Pipeline Testing: Session Isolation Test for SSRF Exploitation Agent
**Playwright Session:** Using session `{{PLAYWRIGHT_SESSION}}` for browser automation testing.
### Step 1: Test Browser Isolation
1. Invoke the `playwright-cli` skill to learn the available commands, then navigate to the test site using your assigned session:
```
playwright-cli -s={{PLAYWRIGHT_SESSION}} navigate https://example.com
```
2. Take a screenshot to prove isolation:
```
playwright-cli -s={{PLAYWRIGHT_SESSION}} screenshot --filename "ssrf-exploit-test.png"
```
3. Log the test results:
```
console.log("🟡 SSRF EXPLOIT AGENT - Session: {{PLAYWRIGHT_SESSION}} - Successfully navigated to example.com")
```
### Step 2: Create Required Deliverable
Run the save-deliverable CLI:
```
save-deliverable --type SSRF_EVIDENCE --content '# SSRF Exploitation Evidence (Pipeline Test)\n\n**Playwright Session:** {{PLAYWRIGHT_SESSION}}\n**Test Site:** https://example.com\n**Screenshot:** ssrf-exploit-test.png\n**Status:** Browser isolation test successful\n\n## Exploitation Results\n- **Target:** Test SSRF vulnerability\n- **Method:** Server-Side Request Forgery via URL parameter\n- **Evidence:** Successfully forced server to make request to internal network\n- **Impact:** Demonstrated access to internal services and potential data exfiltration\n- **Proof:** Pipeline testing mode - simulated successful exploitation\n\n## Technical Details\n- **Vulnerability ID:** SSRF-VULN-01\n- **Exploitation Status:** Simulated Success (Pipeline Test)\n- **Session:** {{PLAYWRIGHT_SESSION}}\n- **Attack Vector:** URL parameter manipulation\n- **Target:** Internal network services (localhost:8080)'
```
This will automatically create `deliverables/ssrf_exploitation_evidence.md`.
### Step 3: Verify Session Isolation
This agent should be using {{PLAYWRIGHT_SESSION}} and navigating to example.com independently of other parallel exploitation agents.
@@ -0,0 +1,30 @@
## 🧪 Pipeline Testing: Session Isolation Test for XSS Exploitation Agent
**Playwright Session:** Using session `{{PLAYWRIGHT_SESSION}}` for browser automation testing.
### Step 1: Test Browser Isolation
1. Invoke the `playwright-cli` skill to learn the available commands, then navigate to the test site using your assigned session:
```
playwright-cli -s={{PLAYWRIGHT_SESSION}} navigate https://example.com
```
2. Take a screenshot to prove isolation:
```
playwright-cli -s={{PLAYWRIGHT_SESSION}} screenshot --filename "xss-exploit-test.png"
```
3. Log the test results:
```
console.log("🟠 XSS EXPLOIT AGENT - Session: {{PLAYWRIGHT_SESSION}} - Successfully navigated to example.com")
```
### Step 2: Create Required Deliverable
Run the save-deliverable CLI:
```
save-deliverable --type XSS_EVIDENCE --content '# XSS Exploitation Evidence (Pipeline Test)\n\n**Playwright Session:** {{PLAYWRIGHT_SESSION}}\n**Test Site:** https://example.com\n**Screenshot:** xss-exploit-test.png\n**Status:** Browser isolation test successful\n\n## Exploitation Results\n- **Target:** Test XSS vulnerability\n- **Method:** Reflected XSS via search parameter\n- **Evidence:** Successfully executed payload `<script>alert('\''XSS'\'')</script>`\n- **Impact:** Demonstrated JavaScript code execution in user context\n- **Proof:** Pipeline testing mode - simulated successful exploitation\n\n## Technical Details\n- **Vulnerability ID:** XSS-VULN-01\n- **Exploitation Status:** Simulated Success (Pipeline Test)\n- **Session:** {{PLAYWRIGHT_SESSION}}\n- **Attack Vector:** Reflected XSS in search functionality'
```
This will automatically create `deliverables/xss_exploitation_evidence.md`.
### Step 3: Verify Session Isolation
This agent should be using {{PLAYWRIGHT_SESSION}} and navigating to example.com independently of other parallel exploitation agents.
@@ -0,0 +1 @@
Run: `save-deliverable --type CODE_ANALYSIS --content 'Pre-recon analysis complete'`. Then say "Done".
@@ -0,0 +1 @@
Run: `save-deliverable --type RECON --content 'Reconnaissance analysis complete'`. Then say "Done".
@@ -0,0 +1,12 @@
Please complete these tasks using your CLI tools:
1. Navigate to https://example.net and take a screenshot:
- Invoke the `playwright-cli` skill to learn the available commands
- Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to navigate to https://example.net
- Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to take a screenshot
2. Save an analysis deliverable:
- Run: `save-deliverable --type AUTH_ANALYSIS --content '# Auth Analysis Report\n\nAnalysis complete. No authentication vulnerabilities identified.'`
3. Save a queue deliverable:
- Run: `save-deliverable --type AUTH_QUEUE --content '{"vulnerabilities": []}'`
@@ -0,0 +1,12 @@
Please complete these tasks using your CLI tools:
1. Navigate to https://jsonplaceholder.typicode.com and take a screenshot:
- Invoke the `playwright-cli` skill to learn the available commands
- Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to navigate to https://jsonplaceholder.typicode.com
- Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to take a screenshot
2. Save an analysis deliverable:
- Run: `save-deliverable --type AUTHZ_ANALYSIS --content '# Authorization Analysis Report\n\nAnalysis complete. No authorization vulnerabilities identified.'`
3. Save a queue deliverable:
- Run: `save-deliverable --type AUTHZ_QUEUE --content '{"vulnerabilities": []}'`
@@ -0,0 +1,12 @@
Please complete these tasks using your CLI tools:
1. Navigate to https://example.com and take a screenshot:
- Invoke the `playwright-cli` skill to learn the available commands
- Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to navigate to https://example.com
- Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to take a screenshot
2. Save an analysis deliverable:
- Run: `save-deliverable --type INJECTION_ANALYSIS --content '# Injection Analysis Report\n\nAnalysis complete. No injection vulnerabilities identified.'`
3. Save a queue deliverable:
- Run: `save-deliverable --type INJECTION_QUEUE --content '{"vulnerabilities": []}'`
@@ -0,0 +1,12 @@
Please complete these tasks using your CLI tools:
1. Navigate to https://httpbin.org and take a screenshot:
- Invoke the `playwright-cli` skill to learn the available commands
- Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to navigate to https://httpbin.org
- Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to take a screenshot
2. Save an analysis deliverable:
- Run: `save-deliverable --type SSRF_ANALYSIS --content '# SSRF Analysis Report\n\nAnalysis complete. No SSRF vulnerabilities identified.'`
3. Save a queue deliverable:
- Run: `save-deliverable --type SSRF_QUEUE --content '{"vulnerabilities": []}'`
@@ -0,0 +1,12 @@
Please complete these tasks using your CLI tools:
1. Navigate to https://example.org and take a screenshot:
- Invoke the `playwright-cli` skill to learn the available commands
- Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to navigate to https://example.org
- Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to take a screenshot
2. Save an analysis deliverable:
- Run: `save-deliverable --type XSS_ANALYSIS --content '# XSS Analysis Report\n\nAnalysis complete. No XSS vulnerabilities identified.'`
3. Save a queue deliverable:
- Run: `save-deliverable --type XSS_QUEUE --content '{"vulnerabilities": []}'`
@@ -18,9 +18,14 @@ Objective: Your task is to analyze the provided source code to generate a securi
- Identify trust boundaries, privilege escalation paths, and data flow security concerns
- Include specific examples from the code when discussing security concerns
- At the end of your report, you MUST include a section listing all the critical file paths mentioned in your analysis.
- **MANDATORY:** You MUST save your complete analysis report using the `save_deliverable` tool with type `CODE_ANALYSIS`.
- **MANDATORY:** You MUST save your complete analysis report using the `save-deliverable` CLI tool with --type CODE_ANALYSIS.
**GIT AWARENESS:**
Read `.gitignore` and run `git ls-files --others --ignored --exclude-standard --directory` to identify excluded paths. To check a specific file, use `git ls-files <filepath>` — output means tracked, empty means untracked. Only flag tracked files as vulnerabilities. Untracked files relevant to security (e.g., secrets, credentials, sensitive configs) may be noted as informational.
</critical>
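The tracked-versus-untracked rule in the GIT AWARENESS note reduces to a check on whether `git ls-files <filepath>` printed anything. A minimal sketch, assuming the command's stdout has already been captured as a string; `classifyByTracking` and the `Finding` labels are hypothetical names.

```typescript
// Hypothetical labels for how a file-based finding should be reported.
type Finding = "vulnerability-candidate" | "informational";

// `git ls-files <filepath>` prints the path when the file is tracked and nothing otherwise.
function classifyByTracking(lsFilesOutput: string): Finding {
  const isTracked = lsFilesOutput.trim().length > 0;
  return isTracked ? "vulnerability-candidate" : "informational";
}
```

Trimming before the length check keeps a stray trailing newline from being read as "tracked".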
{{DESCRIPTION}}
<system_architecture>
**PENTESTING WORKFLOW - YOUR POSITION:**
@@ -69,7 +74,7 @@ You are the **Code Intelligence Gatherer** and **Architectural Foundation Builde
- **NO SHARED CONTEXT FILE EXISTS YET** - you are establishing the initial technical intelligence
</starting_context>
<available_tools>
<cli_tools>
**CRITICAL TOOL USAGE GUIDANCE:**
- PREFER the Task Agent for comprehensive source code analysis to leverage specialized code review capabilities.
- Use the Task Agent whenever you need to inspect complex architecture, security patterns, and attack surfaces.
@@ -78,16 +83,13 @@ You are the **Code Intelligence Gatherer** and **Architectural Foundation Builde
**Available Tools:**
- **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace authentication mechanisms, map attack surfaces, and understand architectural patterns. MANDATORY for all source code analysis.
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create todo items for each phase and agent that needs execution. Mark items as "in_progress" when working on them and "completed" when done.
- **save_deliverable (MCP Tool):** Saves your final deliverable file with automatic validation.
- **Parameters:**
- `deliverable_type`: "CODE_ANALYSIS" (required)
- `file_path`: Path to the file you wrote to disk (preferred for large reports)
- `content`: Inline content string (optional, use only for small content like JSON queues)
- **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
- **Usage:** Write your report to disk first, then call with `file_path`. The tool handles correct naming and file validation automatically.
- **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
- **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
- **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
- **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
- **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
</available_tools>
</cli_tools>
<task_agent_strategy>
**MANDATORY TASK AGENT USAGE:** You MUST use Task agents for ALL code analysis. Direct file reading is PROHIBITED.
@@ -133,7 +135,7 @@ After Phase 1 completes, launch all three vulnerability-focused agents in parall
1. Use the **Write** tool to create `deliverables/code_analysis_deliverable.md` with the title and first major section
2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
3. Repeat step 2 for all remaining sections
4. Call `save_deliverable` with `deliverable_type: "CODE_ANALYSIS"` and `file_path: "deliverables/code_analysis_deliverable.md"`
4. Run `save-deliverable` with `--type CODE_ANALYSIS --file-path "deliverables/code_analysis_deliverable.md"`
- **WARNING:** Do NOT write the entire report in a single tool call; doing so exceeds the 32K output token limit. Split it into multiple Write/Edit operations.
**EXECUTION PATTERN:**
@@ -392,7 +394,7 @@ A component is **out-of-scope** if it **cannot** be invoked through the running
- Phase 3: Synthesis and report generation completed
2. **Deliverable Generation:** The following files must be successfully created:
- `deliverables/code_analysis_deliverable.md` (via `save_deliverable` with `file_path`, not inline `content`)
- `deliverables/code_analysis_deliverable.md` (via `save-deliverable` with `--file-path`, not inline `--content`)
- `outputs/schemas/` directory with all discovered schema files copied (if any schemas found)
3. **TodoWrite Completion:** All tasks in your todo list must be marked as completed
@@ -12,6 +12,7 @@ Your analysis must begin by reading and fully comprehending the initial intellig
<target>
URL: {{WEB_URL}}
{{DESCRIPTION}}
</target>
<rules>
@@ -55,22 +56,19 @@ A component is **out-of-scope** if it **cannot** be invoked through the running
**EXTERNAL ATTACKER CONTEXT:** Analyze from the perspective of an external attacker with NO internal network access, VPN access, or administrative privileges. Focus on vulnerabilities exploitable via public internet.
</attacker_perspective>
<available_tools>
<cli_tools>
Please use these tools for the following use cases:
- Task tool: **MANDATORY for ALL source code analysis.** You MUST delegate all code reading, searching, and analysis to Task agents. DO NOT use Read, Glob, or Grep tools for source code.
- {{MCP_SERVER}} (Playwright): To interact with the live web application at the target.
- **CRITICAL RULE:** For all browser interactions, you MUST use the {{MCP_SERVER}} (Playwright).
- **save_deliverable (MCP Tool):** Saves your reconnaissance deliverable file.
- **Parameters:**
- `deliverable_type`: "RECON" (required)
- `file_path`: Path to the file you wrote to disk (preferred for large reports)
- `content`: Inline content string (optional, use only for small content like JSON queues)
- **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
- **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
- **Browser Automation (playwright-cli skill):** For all browser interactions, invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
- **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
- **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
- **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
- **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
**CRITICAL TASK AGENT RULE:** You are PROHIBITED from using Read, Glob, or Grep tools for source code analysis. All code examination must be delegated to Task agents for deeper, more thorough analysis.
</available_tools>
</cli_tools>
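The JSON that `save-deliverable` prints to stdout can be treated as a small tagged union. The following is a minimal sketch that assumes only the fields shown in the **Returns** line above; the names `SaveResult`, `parseSaveResult`, and `shouldRetry` are hypothetical and not part of the CLI.

```typescript
// Hypothetical types mirroring the two JSON shapes listed under "Returns".
type SaveSuccess = { status: "success"; filepath: string; validated: boolean };
type SaveError = { status: "error"; message: string; retryable: boolean };
type SaveResult = SaveSuccess | SaveError;

// Parse the CLI's stdout; throws if it matches neither documented shape.
function parseSaveResult(stdout: string): SaveResult {
  const data: unknown = JSON.parse(stdout.trim());
  if (typeof data !== "object" || data === null) {
    throw new Error("save-deliverable output is not a JSON object");
  }
  const obj = data as Record<string, unknown>;
  const filepath = obj["filepath"];
  const message = obj["message"];
  if (obj["status"] === "success" && typeof filepath === "string") {
    return { status: "success", filepath, validated: obj["validated"] === true };
  }
  if (obj["status"] === "error" && typeof message === "string") {
    return { status: "error", message, retryable: obj["retryable"] === true };
  }
  throw new Error("unrecognized save-deliverable output shape");
}

// Retry only when the CLI reported an error and flagged it as retryable.
function shouldRetry(result: SaveResult): boolean {
  return result.status === "error" && result.retryable;
}
```

Treating a missing `validated` or `retryable` field as `false` is a conservative choice made for this sketch, not documented CLI behavior.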
<system_architecture>
**PENTESTING WORKFLOW - YOUR POSITION:**
@@ -112,7 +110,7 @@ You must follow this methodical four-step process:
- In your thoughts, create a preliminary list of known technologies, subdomains, open ports, and key code modules.
2. **Interactive Application Exploration:**
- Use `{{MCP_SERVER}}__browser_navigate` to navigate to the target.
- Invoke the `playwright-cli` skill, then use it with `-s={{PLAYWRIGHT_SESSION}}` to navigate to the target.
- Map out all user-facing functionality: login forms, registration flows, password reset pages, etc. Document the multi-step processes.
- Observe the network requests to identify primary API calls.
@@ -136,7 +134,7 @@ You must follow this methodical four-step process:
</systematic_approach>
<deliverable_instructions>
When you have a complete understanding of the attack surface, you MUST synthesize all of your findings into a single, detailed Markdown report and save it using the save_deliverable script with type RECON.
When you have a complete understanding of the attack surface, you MUST synthesize all of your findings into a single, detailed Markdown report and save it using the save-deliverable CLI with --type RECON.
Your report MUST use the following structure precisely:
@@ -372,9 +370,9 @@ CRITICAL: Only include sources tracing to dangerous sinks (shell, DB, file ops,
- Use the **Write** tool to create `deliverables/recon_deliverable.md` with the title and first major section
- Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
- Repeat for all remaining sections
2. Call `save_deliverable` with `deliverable_type: "RECON"` and `file_path: "deliverables/recon_deliverable.md"`
2. Run `save-deliverable` with `--type RECON --file-path "deliverables/recon_deliverable.md"`
**WARNING:** Do NOT write the entire report in a single tool call; doing so exceeds the 32K output token limit. Split it into multiple Write/Edit operations. Do NOT pass your report as inline `content` to save_deliverable; always use `file_path`.
**WARNING:** Do NOT write the entire report in a single tool call; doing so exceeds the 32K output token limit. Split it into multiple Write/Edit operations. Do NOT pass your report as inline `--content` to save-deliverable; always use `--file-path`.
Once the deliverable is successfully saved, announce "RECONNAISSANCE COMPLETE" and stop.
@@ -21,7 +21,7 @@ IMPORTANT: You are MODIFYING an existing file, not creating a new one.
<target>
URL: {{WEB_URL}}
Repository: {{GITHUB_URL}}
{{DESCRIPTION}}
</target>
<context>
@@ -16,7 +16,7 @@ Execute the login flow based on the login_type specified in the configuration:
2. Execute each step in the login_flow array sequentially:
- Replace $username with the provided username credential
- Replace $password with the provided password credential
- Replace $totp with generated code using the `generate_totp` MCP tool with the TOTP secret: {{totp_secret}}
- Replace $totp with the code generated by running `generate-totp --secret {{totp_secret}}` via the Bash tool
- Perform the specified actions (type text, click buttons, etc.)
3. Wait for page navigation/loading to complete after each critical step
4. Handle any consent dialogs or "Continue as [user]" prompts by clicking appropriate buttons
@@ -30,7 +30,7 @@ Execute the login flow based on the login_type specified in the configuration:
- Handle account selection if prompted
- Replace $username with the provided username credential in provider login
- Replace $password with the provided password credential in provider login
- Replace $totp with generated code using the `generate_totp` MCP tool with the TOTP secret: {{totp_secret}}
- Replace $totp with the code generated by running `generate-totp --secret {{totp_secret}}` via the Bash tool
- Handle OAuth consent screens by clicking "Allow", "Accept", or "Continue", and ticking checkboxes as needed.
- Handle "Continue as [username]" dialogs by clicking "Continue"
3. Wait for OAuth callback and final redirect to complete
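For reference, a time-based one-time code of the kind substituted for `$totp` can be computed as in the sketch below. The internals of `generate-totp` are not shown here, so this is only the standard RFC 6238 algorithm under assumed defaults (base32 secret, HMAC-SHA-1, 6 digits, 30-second step); the function names are hypothetical.

```typescript
import { createHmac } from "node:crypto";

// Decode an RFC 4648 base32 string (padding and whitespace are ignored).
function base32Decode(input: string): Buffer {
  const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
  const clean = input.toUpperCase().replace(/[=\s]/g, "");
  const bytes: number[] = [];
  let bits = 0;
  let value = 0;
  for (let i = 0; i < clean.length; i++) {
    const idx = alphabet.indexOf(clean.charAt(i));
    if (idx === -1) throw new Error("invalid base32 character");
    value = (value << 5) | idx;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((value >>> bits) & 0xff);
      value &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return Buffer.from(bytes);
}

// RFC 6238 TOTP: HOTP (RFC 4226) applied to the current time-step counter.
function totp(secretBase32: string, unixSeconds: number, digits = 6, stepSeconds = 30): string {
  const counter = Math.floor(unixSeconds / stepSeconds);
  const msg = Buffer.alloc(8); // 8-byte big-endian counter
  msg.writeUInt32BE(Math.floor(counter / 2 ** 32), 0);
  msg.writeUInt32BE(counter >>> 0, 4);
  const hash = createHmac("sha1", base32Decode(secretBase32)).update(msg).digest();
  const offset = hash.readUInt8(hash.length - 1) & 0x0f; // dynamic truncation
  const code = hash.readUInt32BE(offset) & 0x7fffffff;
  return (code % 10 ** digits).toString().padStart(digits, "0");
}
```

Calling `totp(secret, Math.floor(Date.now() / 1000))` yields the code for the current 30-second window; a code generated near the end of a window may expire before the login form is submitted.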
@@ -68,7 +68,7 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
</exploitable_vulnerability_definition>
</definitions>
<available_tools>
<cli_tools>
**CRITICAL TOOL USAGE RESTRICTIONS:**
- NEVER use the Read tool for application source code analysis—delegate every code review to the Task Agent.
@@ -77,23 +77,20 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
**Available Tools:**
- **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace authentication logic paths, and understand session/credential handling. MANDATORY for all source code analysis.
- **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
- **Parameters:**
- `deliverable_type`: "AUTH_ANALYSIS" or "AUTH_QUEUE" (required)
- `file_path`: Path to the file you wrote to disk (preferred for large reports)
- `content`: Inline content string (use only for small content like JSON queues)
- **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
- **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
- **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
- **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
- **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
- **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
- **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** To interact with the live web application to understand multi-step flows like password reset or registration.
- **Browser Automation (playwright-cli skill):** For browser interactions with the live web application (multi-step flows like password reset or registration), invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each endpoint/flow that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
</available_tools>
</cli_tools>
<data_format_specifications>
<exploitation_queue_format>
**Purpose:** Defines the structure for an "exploitation queue" saved via the save_deliverable script with type AUTH_QUEUE.
**Purpose:** Defines the structure for an "exploitation queue" saved via the save-deliverable CLI with --type AUTH_QUEUE.
**Structure:** The `vulnerability` JSON object MUST follow this exact format:
{
"ID": "unique ID for each vulnerability json (AUTH-VULN-XX)",
@@ -184,7 +181,7 @@ Rule: when uncertain, round down (favor Medium/Low) to minimize false positives.
# Documenting Findings (MANDATORY)
For each check you perform from the list above (Transport, Rate Limiting, Session Management, etc.), you must make a final **verdict**.
- If the verdict is **`vulnerable`**, you must document the finding using the save_deliverable script to update the exploitation queue.
- If the verdict is **`vulnerable`**, you must document the finding using the save-deliverable CLI to update the exploitation queue.
- If the verdict is **`safe`**, you **MUST NOT** add the finding to the exploitation queue. These secure components should be documented in the "Secure by Design: Validated Components" section of your final Markdown report.
</methodology>
@@ -202,7 +199,7 @@ For each check you perform from the list above (Transport, Rate Limiting, Sessio
When you have systematically analyzed all relevant endpoints and logic paths, you MUST generate two final files. Follow these instructions precisely.
**1. Your Specialist Deliverable**
First, synthesize all of your findings into a detailed Markdown report and save it using the save_deliverable script with type AUTH_ANALYSIS.
First, synthesize all of your findings into a detailed Markdown report and save it using the save-deliverable CLI with --type AUTH_ANALYSIS.
Your report MUST use the following structure precisely:
---
@@ -242,9 +239,9 @@ These components were analyzed and found to have robust defenses. They are low-p
<exploitation_queue_requirements>
**Exploitation Queue (MANDATORY)**
Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save_deliverable MCP tool:
- **If vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "AUTH_QUEUE"` and `content: {"vulnerabilities": [...]}` with each confirmed hypothesis following the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
- **If no vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "AUTH_QUEUE"` and `content: {"vulnerabilities": []}`.
Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save-deliverable CLI tool:
- **If vulnerabilities found:** Run the `save-deliverable` CLI with `--type AUTH_QUEUE --content '{"vulnerabilities": [...]}'`, listing each confirmed hypothesis in the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if the vulnerability is exploitable via the public internet without internal access.
- **If no vulnerabilities found:** Run the `save-deliverable` CLI with `--type AUTH_QUEUE --content '{"vulnerabilities": []}'`.
- **QUEUE INCLUSION CRITERIA:** ONLY include vulnerabilities where `externally_exploitable = true`. Exclude any vulnerability requiring internal network access, VPN, or direct server access.
This file serves as the handoff mechanism and must always be created to signal completion.
</exploitation_queue_requirements>
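The inclusion rule and the required envelope can be expressed together. This is a minimal sketch, assuming findings are held as objects carrying at least the `ID` and `externally_exploitable` fields; `QueueEntry` and `buildQueueContent` are hypothetical names, and the remaining fields of the `exploitation_queue_format` are passed through untouched.

```typescript
// Hypothetical minimal shape: only the fields needed for filtering are typed.
interface QueueEntry {
  ID: string;
  externally_exploitable: boolean;
  [field: string]: unknown;
}

// Keep only externally exploitable findings and wrap them in the
// {"vulnerabilities": [...]} envelope expected for queue deliverables.
function buildQueueContent(findings: QueueEntry[]): string {
  const vulnerabilities = findings.filter((f) => f.externally_exploitable === true);
  return JSON.stringify({ vulnerabilities });
}
```

An empty input still produces `{"vulnerabilities":[]}`, so the queue file is created even when nothing qualifies.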
@@ -253,14 +250,14 @@ This file serves as the handoff mechanism and must always be created to signal c
**COMPLETION REQUIREMENTS (ALL must be satisfied):**
1. **Systematic Analysis:** ALL relevant API endpoints and user-facing features identified in the reconnaissance deliverable must be analyzed for AuthN/AuthZ flaws.
2. **Deliverable Generation:** Both required deliverables must be successfully saved using save_deliverable MCP tool:
2. **Deliverable Generation:** Both required deliverables must be successfully saved using the save-deliverable CLI tool:
- **CHUNKED WRITING (MANDATORY):**
1. Use the **Write** tool to create `deliverables/auth_analysis_deliverable.md` with the title and first major section
2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
3. Repeat step 2 for all remaining sections
4. Call `save_deliverable` with `deliverable_type: "AUTH_ANALYSIS"` and `file_path: "deliverables/auth_analysis_deliverable.md"`
4. Run `save-deliverable` with `--type AUTH_ANALYSIS --file-path "deliverables/auth_analysis_deliverable.md"`
**WARNING:** Do NOT write the entire report in a single tool call; doing so exceeds the 32K output token limit. Split it into multiple Write/Edit operations.
- Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "AUTH_QUEUE"` and `content: {"vulnerabilities": [...]}`
- Exploitation queue: Run the `save-deliverable` CLI with `--type AUTH_QUEUE --content '{"vulnerabilities": [...]}'`
**ONLY AFTER** both systematic analysis AND successful deliverable generation, announce "**AUTH ANALYSIS COMPLETE**" and stop.
@@ -71,7 +71,7 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
</exploitable_vulnerability_definition>
</definitions>
<available_tools>
<cli_tools>
**CRITICAL TOOL USAGE RESTRICTIONS:**
- NEVER use the Read tool for application source code analysis—delegate every code review to the Task Agent.
@@ -80,23 +80,20 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
**Available Tools:**
- **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace authorization logic paths, and understand permission models. MANDATORY for all source code analysis.
- **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
- **Parameters:**
- `deliverable_type`: "AUTHZ_ANALYSIS" or "AUTHZ_QUEUE" (required)
- `file_path`: Path to the file you wrote to disk (preferred for large reports)
- `content`: Inline content string (use only for small content like JSON queues)
- **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
- **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
- **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
- **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
- **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
- **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
- **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** To interact with the live web application to understand multi-step flows and role-based access controls.
- **Browser Automation (playwright-cli skill):** For browser interactions with the live web application (multi-step flows and role-based access controls), invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each endpoint that needs authorization analysis. Mark items as "in_progress" when working on them and "completed" when done.
</available_tools>
</cli_tools>
<data_format_specifications>
<exploitation_queue_format>
**Purpose:** Defines the structure for an "exploitation queue" saved via the save_deliverable script with type AUTHZ_QUEUE.
**Purpose:** Defines the structure for an "exploitation queue" saved via the save-deliverable CLI with --type AUTHZ_QUEUE.
**Structure:** The `vulnerability` JSON object MUST follow this exact format:
{
"ID": "unique ID for each vulnerability (e.g., AUTHZ-VULN-01)",
@@ -241,7 +238,7 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
For each analysis you perform from the lists above, you must make a final **verdict**:
- If the verdict is **`vulnerable`**, you must document the finding using the save_deliverable script to update the exploitation queue.
- If the verdict is **`vulnerable`**, you must document the finding using the save-deliverable CLI to update the exploitation queue.
- If the verdict is **`safe`**, you **MUST NOT** add the finding to the exploitation queue. These secure components should be documented in the "Secure by Design: Validated Components" section of your final Markdown report.
</methodology>
@@ -279,7 +276,7 @@ When you have systematically analyzed all relevant endpoints and logic paths, yo
**1. Your Specialist Deliverable**
First, synthesize all of your findings into a single, detailed Markdown report and save it using the save_deliverable script with type AUTHZ_ANALYSIS. This report is the official record of your work.
First, synthesize all of your findings into a single, detailed Markdown report and save it using the save-deliverable CLI with --type AUTHZ_ANALYSIS. This report is the official record of your work.
Your report MUST use the following structure precisely:
@@ -345,9 +342,9 @@ examples:
<exploitation_queue_requirements>
**Exploitation Queue (MANDATORY)**
Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save_deliverable MCP tool:
- **If vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "AUTHZ_QUEUE"` and `content: {"vulnerabilities": [...]}` with each confirmed hypothesis following the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
- **If no vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "AUTHZ_QUEUE"` and `content: {"vulnerabilities": []}`.
Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save-deliverable CLI tool:
- **If vulnerabilities found:** Run the `save-deliverable` CLI with `--type AUTHZ_QUEUE --content '{"vulnerabilities": [...]}'`, listing each confirmed hypothesis in the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if the vulnerability is exploitable via the public internet without internal access.
- **If no vulnerabilities found:** Run the `save-deliverable` CLI with `--type AUTHZ_QUEUE --content '{"vulnerabilities": []}'`.
- **QUEUE INCLUSION CRITERIA:** ONLY include vulnerabilities where `externally_exploitable = true`. Exclude any vulnerability requiring internal network access, VPN, or direct server access.
This file serves as the handoff mechanism and must always be created to signal completion.
</exploitation_queue_requirements>
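Because the queue JSON is passed inside single quotes, any apostrophe in a finding's text would end the shell argument early. The sketch below shows one way to quote the payload for a POSIX shell; `shellSingleQuote` and `saveQueueCommand` are hypothetical helpers, and the sketch assumes the `--type` value is a fixed identifier such as `AUTHZ_QUEUE` that needs no quoting.

```typescript
// Wrap a string in single quotes for a POSIX shell, rewriting each embedded
// apostrophe as '\'' (close quote, escaped apostrophe, reopen quote).
function shellSingleQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

// Assemble the command line for saving an inline JSON queue.
function saveQueueCommand(type: string, queueJson: string): string {
  return `save-deliverable --type ${type} --content ${shellSingleQuote(queueJson)}`;
}
```

When a queue contains long free-text fields, writing it to disk and using `--file-path` avoids shell quoting altogether.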
@@ -356,14 +353,14 @@ This file serves as the handoff mechanism and must always be created to signal c
**COMPLETION REQUIREMENTS (ALL must be satisfied):**
1. **Todo Completion:** ALL tasks in your TodoWrite list must be marked as "completed"
2. **Deliverable Generation:** Both required deliverables must be successfully saved using save_deliverable MCP tool:
2. **Deliverable Generation:** Both required deliverables must be successfully saved using the save-deliverable CLI tool:
- **CHUNKED WRITING (MANDATORY):**
1. Use the **Write** tool to create `deliverables/authz_analysis_deliverable.md` with the title and first major section
2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
3. Repeat step 2 for all remaining sections
4. Call `save_deliverable` with `deliverable_type: "AUTHZ_ANALYSIS"` and `file_path: "deliverables/authz_analysis_deliverable.md"`
4. Run `save-deliverable` with `--type AUTHZ_ANALYSIS --file-path "deliverables/authz_analysis_deliverable.md"`
**WARNING:** Do NOT write the entire report in a single tool call; doing so exceeds the 32K output token limit. Split it into multiple Write/Edit operations.
- Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "AUTHZ_QUEUE"` and `content: {"vulnerabilities": [...]}`
- Exploitation queue: Run the `save-deliverable` CLI with `--type AUTHZ_QUEUE --content '{"vulnerabilities": [...]}'`
**ONLY AFTER** both todo completion AND successful deliverable generation, announce "**AUTHORIZATION ANALYSIS COMPLETE**" and stop.
@@ -71,7 +71,7 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
</exploitable_vulnerability_definition>
</definitions>
<available_tools>
<cli_tools>
**CRITICAL TOOL USAGE RESTRICTIONS:**
- NEVER use the Read tool for application source code analysis—delegate every code review to the Task Agent.
@@ -80,23 +80,20 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
**Available Tools:**
- **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, map query/command construction paths, and verify sanitization coverage. MANDATORY for all source code analysis.
- **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
- **Parameters:**
- `deliverable_type`: "INJECTION_ANALYSIS" or "INJECTION_QUEUE" (required)
- `file_path`: Path to the file you wrote to disk (preferred for large reports)
- `content`: Inline content string (use only for small content like JSON queues)
- **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
- **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
- **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
- **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
- **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
- **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
- **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** To interact with the live web application to understand multi-step flows like password reset or registration.
- **Browser Automation (playwright-cli skill):** For browser interactions with the live web application (multi-step flows like password reset or registration), invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each injection source that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
</available_tools>
</cli_tools>
<data_format_specifications>
<exploitation_queue_format>
**Purpose:** Defines the structure for an "exploitation queue" saved via the save_deliverable script with type INJECTION_QUEUE.
**Purpose:** Defines the structure for an "exploitation queue" saved via the save-deliverable CLI with --type INJECTION_QUEUE.
**Structure:** The `vulnerability` JSON object MUST follow this exact format:
{
@@ -351,10 +348,10 @@ These input vectors were traced and confirmed to have robust, context-appropriat
<exploitation_queue_requirements>
**Exploitation Queue (MANDATORY)**
Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save_deliverable MCP tool:
Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save-deliverable CLI tool:
- **If vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "INJECTION_QUEUE"` and `content: {"vulnerabilities": [...]}` with each exploitable injection vulnerability (verdict: "vulnerable") following the exploitation_queue_format
- **If no vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "INJECTION_QUEUE"` and `content: {"vulnerabilities": []}`
- **If vulnerabilities found:** Run the `save-deliverable` CLI with `--type INJECTION_QUEUE --content '{"vulnerabilities": [...]}'`, listing each exploitable injection vulnerability (verdict: "vulnerable") in the `exploitation_queue_format`.
- **If no vulnerabilities found:** Run the `save-deliverable` CLI with `--type INJECTION_QUEUE --content '{"vulnerabilities": []}'`.
This file serves as the handoff mechanism to the Exploitation phase and must always be created to signal completion of your analysis.
</exploitation_queue_requirements>
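Before invoking the CLI, a queue payload can be checked locally for the one structural rule stated in this prompt: a JSON object with a `vulnerabilities` array. This is a minimal sketch; `isValidQueue` is a hypothetical helper, and the validation that `save-deliverable` itself performs may be stricter.

```typescript
// Return true only if the text parses as a JSON object whose
// "vulnerabilities" property is an array (possibly empty).
function isValidQueue(json: string): boolean {
  let data: unknown;
  try {
    data = JSON.parse(json);
  } catch {
    return false; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return false;
  return Array.isArray((data as Record<string, unknown>)["vulnerabilities"]);
}
```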
@@ -363,14 +360,14 @@ This file serves as the handoff mechanism to the Exploitation phase and must alw
**COMPLETION REQUIREMENTS (ALL must be satisfied):**
1. **Todo Completion:** ALL tasks in your TodoWrite list must be marked as "completed"
2. **Deliverable Generation:** Both required deliverables must be successfully saved using save_deliverable MCP tool:
2. **Deliverable Generation:** Both required deliverables must be successfully saved using the save-deliverable CLI tool:
- **CHUNKED WRITING (MANDATORY):**
1. Use the **Write** tool to create `deliverables/injection_analysis_deliverable.md` with the title and first major section
2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
3. Repeat step 2 for all remaining sections
4. Call `save_deliverable` with `deliverable_type: "INJECTION_ANALYSIS"` and `file_path: "deliverables/injection_analysis_deliverable.md"`
4. Run `save-deliverable` with `--type INJECTION_ANALYSIS --file-path "deliverables/injection_analysis_deliverable.md"`
**WARNING:** Do NOT write the entire report in a single tool call; doing so exceeds the 32K output token limit. Split it into multiple Write/Edit operations.
- Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "INJECTION_QUEUE"` and `content: {"vulnerabilities": [...]}`
- Exploitation queue: Run the `save-deliverable` CLI with `--type INJECTION_QUEUE --content '{"vulnerabilities": [...]}'`
**ONLY AFTER** both todo completion AND successful deliverable generation, announce "**INJECTION ANALYSIS COMPLETE**" and stop.
@@ -67,7 +67,7 @@ An **exploitable vulnerability** is a data flow where user-controlled input infl
</exploitable_vulnerability_definition>
</definitions>
<available_tools>
<cli_tools>
**CRITICAL TOOL USAGE RESTRICTIONS:**
- NEVER use the Read tool for application source code analysis—delegate every code review to the Task Agent.
@@ -76,23 +76,20 @@ An **exploitable vulnerability** is a data flow where user-controlled input infl
**Available Tools:**
- **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace data flows, and understand HTTP client usage. MANDATORY for all source code analysis.
- **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
- **Parameters:**
- `deliverable_type`: "SSRF_ANALYSIS" or "SSRF_QUEUE" (required)
- `file_path`: Path to the file you wrote to disk (preferred for large reports)
- `content`: Inline content string (use only for small content like JSON queues)
- **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
- **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
- **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
- **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
- **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
- **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
- **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** To interact with the live web application to understand multi-step flows that might involve URL redirection or proxy functionality.
- **Browser Automation (playwright-cli skill):** For browser interactions with the live web application (multi-step flows involving URL redirection or proxy functionality), invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each SSRF sink that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
</available_tools>
</cli_tools>
<data_format_specifications>
<exploitation_queue_format>
**Purpose:** Defines the structure for an "exploitation queue" saved via the save_deliverable script with type SSRF_QUEUE.
**Purpose:** Defines the structure for an "exploitation queue" saved via the save-deliverable CLI with --type SSRF_QUEUE.
**Structure:** The `vulnerability` JSON object MUST follow this exact format:
{
"ID": "unique ID for each vulnerability json (SSRF-VULN-XX)",
@@ -231,7 +228,7 @@ Rule: when uncertain, round down (favor Medium/Low) to minimize false positives.
# Documenting Findings (MANDATORY)
For each check you perform from the list above, you must make a final **verdict**.
- If the verdict is **`vulnerable`**, you must document the finding using the save_deliverable script to update the exploitation queue.
- If the verdict is **`vulnerable`**, you must document the finding using the save-deliverable CLI to update the exploitation queue.
- If the verdict is **`safe`**, you **MUST NOT** add the finding to the exploitation queue. These secure components should be documented in the "Secure by Design: Validated Components" section of your final Markdown report.
</methodology>
@@ -249,7 +246,7 @@ For each check you perform from the list above, you must make a final **verdict*
When you have systematically analyzed all relevant endpoints and request-making functions, you MUST generate two final files. Follow these instructions precisely.
**1. Your Specialist Deliverable**
First, synthesize all of your findings into a detailed Markdown report and save it using the save_deliverable script with type SSRF_ANALYSIS.
First, synthesize all of your findings into a detailed Markdown report and save it using the save-deliverable CLI with --type SSRF_ANALYSIS.
Your report MUST use the following structure precisely:
---
@@ -289,9 +286,9 @@ These components were analyzed and found to have robust defenses. They are low-p
<exploitation_queue_requirements>
**Exploitation Queue (MANDATORY)**
Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save_deliverable MCP tool:
- **If vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "SSRF_QUEUE"` and `content: {"vulnerabilities": [...]}` with each confirmed hypothesis following the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
- **If no vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "SSRF_QUEUE"` and `content: {"vulnerabilities": []}`.
Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save-deliverable CLI tool:
- **If vulnerabilities found:** Run save-deliverable CLI with `--type SSRF_QUEUE --content '{"vulnerabilities": [...]}'` with each confirmed hypothesis following the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
- **If no vulnerabilities found:** Run save-deliverable CLI with `--type SSRF_QUEUE --content '{"vulnerabilities": []}'`.
- **QUEUE INCLUSION CRITERIA:** ONLY include vulnerabilities where `externally_exploitable = true`. Exclude any vulnerability requiring internal network access, VPN, or direct server access.
This file serves as the handoff mechanism and must always be created to signal completion.
</exploitation_queue_requirements>
@@ -300,14 +297,14 @@ This file serves as the handoff mechanism and must always be created to signal c
**COMPLETION REQUIREMENTS (ALL must be satisfied):**
1. **Systematic Analysis:** ALL relevant API endpoints and request-making features identified in the reconnaissance deliverable must be analyzed for SSRF vulnerabilities.
2. **Deliverable Generation:** Both required deliverables must be successfully saved using save_deliverable MCP tool:
2. **Deliverable Generation:** Both required deliverables must be successfully saved using the save-deliverable CLI tool:
- **CHUNKED WRITING (MANDATORY):**
1. Use the **Write** tool to create `deliverables/ssrf_analysis_deliverable.md` with the title and first major section
2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
3. Repeat step 2 for all remaining sections
4. Call `save_deliverable` with `deliverable_type: "SSRF_ANALYSIS"` and `file_path: "deliverables/ssrf_analysis_deliverable.md"`
4. Run `save-deliverable` with `--type SSRF_ANALYSIS --file-path "deliverables/ssrf_analysis_deliverable.md"`
**WARNING:** Do NOT write the entire report in a single tool call, because doing so exceeds the 32K output token limit. Split into multiple Write/Edit operations.
- Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "SSRF_QUEUE"` and `content: {"vulnerabilities": [...]}`
- Exploitation queue: Run save-deliverable CLI with `--type SSRF_QUEUE --content '{"vulnerabilities": [...]}'`
**ONLY AFTER** both systematic analysis AND successful deliverable generation, announce "**SSRF ANALYSIS COMPLETE**" and stop.
@@ -68,7 +68,7 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
</exploitable_vulnerability_definition>
</definitions>
<available_tools>
<cli_tools>
**CRITICAL TOOL USAGE RESTRICTIONS:**
- NEVER use the Read tool for application source code analysis - ALWAYS delegate to Task agents for examining .js, .ts, .py, .php files and application logic. You MAY use Read
@@ -79,23 +79,20 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
**Available Tools:**
- **Task Agent (Code Analysis):** MANDATORY for all source code analysis and data flow tracing. Use this instead of Read tool for examining application code, models, controllers, and templates.
- **Terminal (curl):** MANDATORY for testing HTTP-based XSS vectors and observing raw HTML responses. Use for reflected XSS testing and JSONP injection testing.
- **{{MCP_SERVER}} (Playwright):** MANDATORY for testing DOM-based XSS and form submission vectors. Use for stored XSS testing and client-side payload execution verification.
- **Browser Automation (playwright-cli skill):** MANDATORY for testing DOM-based XSS and form submission vectors. Invoke the `playwright-cli` skill to learn available commands. Use for stored XSS testing and client-side payload execution verification. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
- **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each sink you need to analyze.
- **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
- **Parameters:**
- `deliverable_type`: "XSS_ANALYSIS" or "XSS_QUEUE" (required)
- `file_path`: Path to the file you wrote to disk (preferred for large reports)
- `content`: Inline content string (use only for small content like JSON queues)
- **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
- **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
- **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
- **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
- **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
- **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
- **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
- **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
</available_tools>
</cli_tools>
<data_format_specifications>
<exploitation_queue_format>
Purpose: Defines the structure for an "exploitation queue" saved via the save_deliverable script with type XSS_QUEUE.
Purpose: Defines the structure for an "exploitation queue" saved via the save-deliverable CLI with --type XSS_QUEUE.
Structure: The vulnerability JSON object MUST follow this exact format:
{
"ID": "unique ID for each vulnerability json (XSS-VULN-XX)",
@@ -180,7 +177,7 @@ This rulebook is used for the **Early Termination** check in Step 2.
- Include both safe and vulnerable paths to demonstrate **full coverage**.
- Craft a minimal `witness_payload` that proves control over the render context.
- For every path analyzed, you must document the outcome. The location of the documentation depends on the verdict:
- If the verdict is 'vulnerable', you MUST use the save_deliverable script to save the finding to the exploitation queue, including complete source-to-sink information.
- If the verdict is 'vulnerable', you MUST use the save-deliverable CLI to save the finding to the exploitation queue, including complete source-to-sink information.
- If the verdict is 'safe', you MUST NOT add it to the exploitation queue. Instead, you will document these secure paths in the "Vectors Analyzed and Confirmed Secure" table of your final analysis report.
- For vulnerable findings, craft a minimal witness_payload that proves control over the render context.
@@ -272,13 +269,13 @@ These input vectors were traced and confirmed to have robust, context-appropriat
## Exploitation Queue (MANDATORY)
Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save_deliverable MCP tool.
Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save-deliverable CLI tool.
- **If exploitable vulnerabilities found:**
Use `save_deliverable` MCP tool with `deliverable_type: "XSS_QUEUE"` and `content: {"vulnerabilities": [...]}` with each exploitable XSS vulnerability (verdict: "vulnerable") following the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
Run save-deliverable CLI with `--type XSS_QUEUE --content '{"vulnerabilities": [...]}'` with each exploitable XSS vulnerability (verdict: "vulnerable") following the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
- **If no exploitable vulnerabilities found:**
Use `save_deliverable` MCP tool with `deliverable_type: "XSS_QUEUE"` and `content: {"vulnerabilities": []}`
Run save-deliverable CLI with `--type XSS_QUEUE --content '{"vulnerabilities": []}'`
- **QUEUE INCLUSION CRITERIA:** ONLY include vulnerabilities where `externally_exploitable = true`. Exclude any vulnerability requiring internal network access, VPN, or direct server access.
@@ -289,14 +286,14 @@ This file is the mandatory handoff to the Exploitation phase.
COMPLETION REQUIREMENTS (ALL must be satisfied):
1. Systematic Analysis: ALL input vectors identified from the reconnaissance deliverable must be analyzed.
2. Deliverable Generation: Both required deliverables must be successfully saved using save_deliverable MCP tool:
2. Deliverable Generation: Both required deliverables must be successfully saved using the save-deliverable CLI tool:
- **CHUNKED WRITING (MANDATORY):**
1. Use the **Write** tool to create `deliverables/xss_analysis_deliverable.md` with the title and first major section
2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
3. Repeat step 2 for all remaining sections
4. Call `save_deliverable` with `deliverable_type: "XSS_ANALYSIS"` and `file_path: "deliverables/xss_analysis_deliverable.md"`
4. Run `save-deliverable` with `--type XSS_ANALYSIS --file-path "deliverables/xss_analysis_deliverable.md"`
**WARNING:** Do NOT write the entire report in a single tool call, because doing so exceeds the 32K output token limit. Split into multiple Write/Edit operations.
- Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "XSS_QUEUE"` and `content: {"vulnerabilities": [...]}`
- Exploitation queue: Run save-deliverable CLI with `--type XSS_QUEUE --content '{"vulnerabilities": [...]}'`
ONLY AFTER both systematic analysis AND successful deliverable generation, announce "XSS ANALYSIS COMPLETE" and stop.
@@ -6,26 +6,21 @@
// Production Claude agent execution with retry, git checkpoints, and audit logging
import { fs, path } from 'zx';
import { query } from '@anthropic-ai/claude-agent-sdk';
import { fs, path } from 'zx';
import type { AuditSession } from '../audit/index.js';
import { isRetryableError, PentestError } from '../services/error-handling.js';
import { isSpendingCapBehavior } from '../utils/billing-detection.js';
import { Timer } from '../utils/metrics.js';
import { formatTimestamp } from '../utils/formatting.js';
import { AGENT_VALIDATORS, MCP_AGENT_MAPPING } from '../session-manager.js';
import { AuditSession } from '../audit/index.js';
import { createShannonHelperServer } from '../../mcp-server/dist/index.js';
import { AGENTS } from '../session-manager.js';
import type { AgentName } from '../types/index.js';
import { dispatchMessage } from './message-handlers.js';
import { detectExecutionContext, formatErrorOutput, formatCompletionMessage } from './output-formatters.js';
import { createProgressManager } from './progress-manager.js';
import { createAuditLogger } from './audit-logger.js';
import { getActualModelName } from './router-utils.js';
import { resolveModel, type ModelTier } from './models.js';
import { AGENT_VALIDATORS } from '../session-manager.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import { isSpendingCapBehavior } from '../utils/billing-detection.js';
import { formatTimestamp } from '../utils/formatting.js';
import { Timer } from '../utils/metrics.js';
import { createAuditLogger } from './audit-logger.js';
import { dispatchMessage } from './message-handlers.js';
import { type ModelTier, resolveModel } from './models.js';
import { detectExecutionContext, formatCompletionMessage, formatErrorOutput } from './output-formatters.js';
import { createProgressManager } from './progress-manager.js';
import { getActualModelName } from './router-utils.js';
declare global {
var SHANNON_DISABLE_LOADER: boolean | undefined;
@@ -46,89 +41,6 @@ export interface ClaudePromptResult {
retryable?: boolean | undefined;
}
interface StdioMcpServer {
type: 'stdio';
command: string;
args: string[];
env: Record<string, string>;
}
type McpServer = ReturnType<typeof createShannonHelperServer> | StdioMcpServer;
// Configures MCP servers for agent execution, with Docker-specific Chromium handling
function buildMcpServers(
sourceDir: string,
agentName: string | null,
logger: ActivityLogger
): Record<string, McpServer> {
// 1. Create the shannon-helper server (always present)
const shannonHelperServer = createShannonHelperServer(sourceDir);
const mcpServers: Record<string, McpServer> = {
'shannon-helper': shannonHelperServer,
};
// 2. Look up the agent's Playwright MCP mapping
if (agentName) {
const promptTemplate = AGENTS[agentName as AgentName].promptTemplate;
const playwrightMcpName = MCP_AGENT_MAPPING[promptTemplate as keyof typeof MCP_AGENT_MAPPING] || null;
if (playwrightMcpName) {
logger.info(`Assigned ${agentName} -> ${playwrightMcpName}`);
const userDataDir = `/tmp/${playwrightMcpName}`;
// 3. Configure Playwright MCP args with Docker/local browser handling
const isDocker = process.env.SHANNON_DOCKER === 'true';
const mcpArgs: string[] = [
'@playwright/mcp@0.0.68',
'--isolated',
'--user-data-dir', userDataDir,
];
if (isDocker) {
mcpArgs.push('--executable-path', '/usr/bin/chromium-browser');
mcpArgs.push('--browser', 'chromium');
}
// NOTE: Explicit allowlist — the Playwright MCP subprocess must not inherit
// secrets (API keys, AWS tokens) from the parent process.
const MCP_ENV_ALLOWLIST = [
'PATH', 'HOME', 'NODE_PATH', 'DISPLAY',
'PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH',
] as const;
const envVars: Record<string, string> = {
PLAYWRIGHT_HEADLESS: 'true',
...(isDocker && { PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: '1' }),
};
for (const key of MCP_ENV_ALLOWLIST) {
if (process.env[key]) {
envVars[key] = process.env[key]!;
}
}
for (const [key, value] of Object.entries(process.env)) {
if (key.startsWith('XDG_') && value !== undefined) {
envVars[key] = value;
}
}
mcpServers[playwrightMcpName] = {
type: 'stdio' as const,
command: 'npx',
args: mcpArgs,
env: envVars,
};
}
}
// 4. Return configured servers
return mcpServers;
}
function outputLines(lines: string[]): void {
for (const line of lines) {
console.log(line);
@@ -139,7 +51,7 @@ async function writeErrorLog(
err: Error & { code?: string; status?: number },
sourceDir: string,
fullPrompt: string,
duration: number
duration: number,
): Promise<void> {
try {
const errorLog = {
@@ -150,17 +62,17 @@ async function writeErrorLog(
message: err.message,
code: err.code,
status: err.status,
stack: err.stack
stack: err.stack,
},
context: {
sourceDir,
prompt: fullPrompt.slice(0, 200) + '...',
retryable: isRetryableError(err)
prompt: `${fullPrompt.slice(0, 200)}...`,
retryable: isRetryableError(err),
},
duration
duration,
};
const logPath = path.join(sourceDir, 'error.log');
await fs.appendFile(logPath, JSON.stringify(errorLog) + '\n');
await fs.appendFile(logPath, `${JSON.stringify(errorLog)}\n`);
} catch {
// Best-effort error log writing - don't propagate failures
}
@@ -170,7 +82,7 @@ export async function validateAgentOutput(
result: ClaudePromptResult,
agentName: string | null,
sourceDir: string,
logger: ActivityLogger
logger: ActivityLogger,
): Promise<boolean> {
logger.info(`Validating ${agentName} agent output`);
@@ -202,7 +114,6 @@ export async function validateAgentOutput(
}
return validationResult;
} catch (error) {
const errMsg = error instanceof Error ? error.message : String(error);
logger.error(`Validation failed with error: ${errMsg}`);
@@ -217,10 +128,10 @@ export async function runClaudePrompt(
sourceDir: string,
context: string = '',
description: string = 'Claude analysis',
agentName: string | null = null,
_agentName: string | null = null,
auditSession: AuditSession | null = null,
logger: ActivityLogger,
modelTier: ModelTier = 'medium'
modelTier: ModelTier = 'medium',
): Promise<ClaudePromptResult> {
// 1. Initialize timing and prompt
const timer = new Timer(`agent-${description.toLowerCase().replace(/\s+/g, '-')}`);
@@ -230,16 +141,13 @@ export async function runClaudePrompt(
const execContext = detectExecutionContext(description);
const progress = createProgressManager(
{ description, useCleanOutput: execContext.useCleanOutput },
global.SHANNON_DISABLE_LOADER ?? false
global.SHANNON_DISABLE_LOADER ?? false,
);
const auditLogger = createAuditLogger(auditSession);
logger.info(`Running Claude Code: ${description}...`);
// 3. Configure MCP servers
const mcpServers = buildMcpServers(sourceDir, agentName, logger);
// 4. Build env vars to pass to SDK subprocesses
// 3. Build env vars to pass to SDK subprocesses
const sdkEnv: Record<string, string> = {
CLAUDE_CODE_MAX_OUTPUT_TOKENS: process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS || '64000',
};
@@ -258,21 +166,25 @@ export async function runClaudePrompt(
'ANTHROPIC_SMALL_MODEL',
'ANTHROPIC_MEDIUM_MODEL',
'ANTHROPIC_LARGE_MODEL',
'HOME',
'PATH',
'PLAYWRIGHT_MCP_EXECUTABLE_PATH',
];
for (const name of passthroughVars) {
if (process.env[name]) {
sdkEnv[name] = process.env[name]!;
const val = process.env[name];
if (val) {
sdkEnv[name] = val;
}
}
// 5. Configure SDK options
// 4. Configure SDK options
const options = {
model: resolveModel(modelTier),
maxTurns: 10_000,
cwd: sourceDir,
permissionMode: 'bypassPermissions' as const,
allowDangerouslySkipPermissions: true,
mcpServers,
settingSources: ['user'] as ('user' | 'project' | 'local')[],
env: sdkEnv,
};
@@ -293,7 +205,7 @@ export async function runClaudePrompt(
fullPrompt,
options,
{ execContext, description, progress, auditLogger, logger },
timer
timer,
);
turnCount = messageLoopResult.turnCount;
@@ -309,7 +221,7 @@ export async function runClaudePrompt(
throw new PentestError(
`Spending cap likely reached (turns=${turnCount}, cost=$0): ${result?.slice(0, 100)}`,
'billing',
true // Retryable - Temporal will use 5-30 min backoff
true, // Retryable - Temporal will use 5-30 min backoff
);
}
@@ -330,9 +242,8 @@ export async function runClaudePrompt(
cost: totalCost,
model,
partialCost: totalCost,
apiErrorDetected
apiErrorDetected,
};
} catch (error) {
// 9. Handle errors — log, write error file, return failure
const duration = timer.stop();
@@ -347,16 +258,15 @@ export async function runClaudePrompt(
return {
error: err.message,
errorType: err.constructor.name,
prompt: fullPrompt.slice(0, 100) + '...',
prompt: `${fullPrompt.slice(0, 100)}...`,
success: false,
duration,
cost: totalCost,
retryable: isRetryableError(err)
retryable: isRetryableError(err),
};
}
}
interface MessageLoopResult {
turnCount: number;
result: string | null;
@@ -377,7 +287,7 @@ async function processMessageStream(
fullPrompt: string,
options: NonNullable<Parameters<typeof query>[0]['options']>,
deps: MessageLoopDeps,
timer: Timer
timer: Timer,
): Promise<MessageLoopResult> {
const { execContext, description, progress, auditLogger, logger } = deps;
const HEARTBEAT_INTERVAL = 30000;
@@ -402,11 +312,13 @@ async function processMessageStream(
turnCount++;
}
const dispatchResult = await dispatchMessage(
message as { type: string; subtype?: string },
turnCount,
{ execContext, description, progress, auditLogger, logger }
);
const dispatchResult = await dispatchMessage(message as { type: string; subtype?: string }, turnCount, {
execContext,
description,
progress,
auditLogger,
logger,
});
if (dispatchResult.type === 'throw') {
throw dispatchResult.error;
@@ -4,35 +4,35 @@
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
import { PentestError } from '../services/error-handling.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import { ErrorCode } from '../types/errors.js';
import { matchesBillingTextPattern } from '../utils/billing-detection.js';
import { filterJsonToolCalls } from './output-formatters.js';
import { formatTimestamp } from '../utils/formatting.js';
import { getActualModelName } from './router-utils.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import type { AuditLogger } from './audit-logger.js';
import {
filterJsonToolCalls,
formatAssistantOutput,
formatResultOutput,
formatToolUseOutput,
formatToolResultOutput,
formatToolUseOutput,
} from './output-formatters.js';
import type { AuditLogger } from './audit-logger.js';
import type { ProgressManager } from './progress-manager.js';
import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
import { getActualModelName } from './router-utils.js';
import type {
AssistantMessage,
ResultMessage,
ToolUseMessage,
ToolResultMessage,
AssistantResult,
ResultData,
ToolUseData,
ToolResultData,
ApiErrorDetection,
AssistantMessage,
AssistantResult,
ContentBlock,
SystemInitMessage,
ExecutionContext,
ResultData,
ResultMessage,
SystemInitMessage,
ToolResultData,
ToolResultMessage,
ToolUseData,
ToolUseMessage,
} from './types.js';
// Handles both array and string content formats from SDK
@@ -40,9 +40,7 @@ function extractMessageContent(message: AssistantMessage): string {
const messageContent = message.message;
if (Array.isArray(messageContent.content)) {
return messageContent.content
.map((c: ContentBlock) => c.text || JSON.stringify(c))
.join('\n');
return messageContent.content.map((c: ContentBlock) => c.text || JSON.stringify(c)).join('\n');
}
return String(messageContent.content);
@@ -81,7 +79,7 @@ function detectApiError(content: string): ApiErrorDetection {
'billing',
true, // RETRYABLE - Temporal will use 5-30 min backoff
{},
ErrorCode.SPENDING_CAP_REACHED
ErrorCode.SPENDING_CAP_REACHED,
),
};
}
@@ -104,10 +102,7 @@ function detectApiError(content: string): ApiErrorDetection {
}
// Maps SDK structured error types to our error handling.
function handleStructuredError(
errorType: SDKAssistantMessageError,
content: string
): ApiErrorDetection {
function handleStructuredError(errorType: SDKAssistantMessageError, content: string): ApiErrorDetection {
switch (errorType) {
case 'billing_error':
return {
@@ -117,7 +112,7 @@ function handleStructuredError(
'billing',
true, // Retryable with backoff
{},
ErrorCode.INSUFFICIENT_CREDITS
ErrorCode.INSUFFICIENT_CREDITS,
),
};
case 'rate_limit':
@@ -128,7 +123,7 @@ function handleStructuredError(
'network',
true, // Retryable with backoff
{},
ErrorCode.API_RATE_LIMITED
ErrorCode.API_RATE_LIMITED,
),
};
case 'authentication_failed':
@@ -137,7 +132,7 @@ function handleStructuredError(
shouldThrow: new PentestError(
`Authentication failed: ${content.slice(0, 100)}`,
'config',
false // Not retryable - needs API key fix
false, // Not retryable - needs API key fix
),
};
case 'server_error':
@@ -146,7 +141,7 @@ function handleStructuredError(
shouldThrow: new PentestError(
`Server error (structured): ${content.slice(0, 100)}`,
'network',
true // Retryable
true, // Retryable
),
};
case 'invalid_request':
@@ -155,7 +150,7 @@ function handleStructuredError(
shouldThrow: new PentestError(
`Invalid request: ${content.slice(0, 100)}`,
'config',
false // Not retryable - needs code fix
false, // Not retryable - needs code fix
),
};
case 'max_output_tokens':
@@ -164,19 +159,15 @@ function handleStructuredError(
shouldThrow: new PentestError(
`Max output tokens reached: ${content.slice(0, 100)}`,
'billing',
true // Retryable - may succeed with different content
true, // Retryable - may succeed with different content
),
};
case 'unknown':
default:
return { detected: true };
}
}
function handleAssistantMessage(
message: AssistantMessage,
turnCount: number
): AssistantResult {
function handleAssistantMessage(message: AssistantMessage, turnCount: number): AssistantResult {
const content = extractMessageContent(message);
const cleanedContent = filterJsonToolCalls(content);
@@ -246,8 +237,7 @@ function handleToolUseMessage(message: ToolUseMessage): ToolUseData {
// Truncates long results for display (500 char limit), preserves full content for logging
function handleToolResultMessage(message: ToolResultMessage): ToolResultData {
const content = message.content;
const contentStr =
typeof content === 'string' ? content : JSON.stringify(content, null, 2);
const contentStr = typeof content === 'string' ? content : JSON.stringify(content, null, 2);
const displayContent =
contentStr.length > 500
@@ -284,7 +274,7 @@ export interface MessageDispatchDeps {
export async function dispatchMessage(
message: { type: string; subtype?: string },
turnCount: number,
deps: MessageDispatchDeps
deps: MessageDispatchDeps,
): Promise<MessageDispatchAction> {
const { execContext, description, progress, auditLogger, logger } = deps;
@@ -298,12 +288,7 @@ export async function dispatchMessage(
if (assistantResult.cleanedContent.trim()) {
progress.stop();
outputLines(formatAssistantOutput(
assistantResult.cleanedContent,
execContext,
turnCount,
description
));
outputLines(formatAssistantOutput(assistantResult.cleanedContent, execContext, turnCount, description));
progress.start();
}
@@ -323,10 +308,6 @@ export async function dispatchMessage(
const actualModel = getActualModelName(initMsg.model);
if (!execContext.useCleanOutput) {
logger.info(`Model: ${actualModel}, Permission: ${initMsg.permissionMode}`);
if (initMsg.mcp_servers && initMsg.mcp_servers.length > 0) {
const mcpStatus = initMsg.mcp_servers.map(s => `${s.name}(${s.status})`).join(', ');
logger.info(`MCP: ${mcpStatus}`);
}
}
// Return actual model for tracking in audit logs
return { type: 'continue', model: actualModel };
@@ -4,8 +4,8 @@
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
import { extractAgentType, formatDuration } from '../utils/formatting.js';
import { AGENTS } from '../session-manager.js';
import { extractAgentType, formatDuration } from '../utils/formatting.js';
import type { ExecutionContext, ResultData } from './types.js';
interface ToolCallInput {
@@ -16,6 +16,7 @@ interface ToolCallInput {
text?: string;
action?: string;
description?: string;
command?: string;
todos?: Array<{
status: string;
content: string;
@@ -76,6 +77,80 @@ function extractDomain(url: string): string {
}
}
/**
* Format playwright-cli commands into clean progress indicators
*/
function formatBrowserAction(command: string): string | null {
// Extract subcommand after optional session flag (e.g., "playwright-cli -s=session1 goto https://example.com")
const match = command.match(/playwright-cli\s+(?:-s=\S+\s+)?(\S+)(?:\s+(.*))?/);
if (!match) return null;
const subcommand = match[1];
const args = match[2] || '';
switch (subcommand) {
case 'open':
case 'goto': {
const domain = args.trim() ? extractDomain(args.trim()) : '';
return domain ? `🌐 Navigating to ${domain}` : '🌐 Opening browser';
}
case 'go-back':
return '⬅️ Going back';
case 'go-forward':
return '➡️ Going forward';
case 'reload':
return '🔄 Reloading page';
case 'click':
case 'dblclick':
return `🖱️ Clicking ${(args || 'element').slice(0, 25)}`;
case 'hover':
return `👆 Hovering over ${(args || 'element').slice(0, 20)}`;
case 'type':
return `⌨️ Typing ${(args || 'text').slice(0, 20)}`;
case 'press':
case 'keydown':
case 'keyup':
return `⌨️ Pressing ${args || 'key'}`;
case 'fill':
return `📝 Filling ${(args || 'field').slice(0, 25)}`;
case 'select':
return '📋 Selecting dropdown option';
case 'check':
case 'uncheck':
return `☑️ ${subcommand === 'check' ? 'Checking' : 'Unchecking'} ${(args || 'element').slice(0, 20)}`;
case 'upload':
return '📁 Uploading file';
case 'drag':
return '🖱️ Dragging element';
case 'snapshot':
return '📸 Taking page snapshot';
case 'screenshot':
return '📸 Taking screenshot';
case 'eval':
case 'run-code':
return '🔍 Running JavaScript analysis';
case 'console':
return '📜 Checking console logs';
case 'network':
return '🌐 Analyzing network traffic';
case 'tab-list':
case 'tab-new':
case 'tab-close':
case 'tab-select':
return `🗂️ ${subcommand.replace('tab-', '')} browser tab`;
case 'dialog-accept':
return '💬 Accepting dialog';
case 'dialog-dismiss':
return '💬 Dismissing dialog';
case 'pdf':
return '📄 Saving page as PDF';
case 'resize':
return `🖥️ Resizing browser ${args || ''}`.trim();
default:
return `🌐 Browser: ${subcommand}`;
}
}
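Reviewer note: the parsing step at the top of formatBrowserAction can be exercised on its own. The sketch below is illustrative only and is not part of this change; `parseBrowserCommand` and `ParsedBrowserCommand` are hypothetical names, and the only thing taken from the diff is the regular expression. It shows how the optional `-s=<session>` flag is skipped before the subcommand and its arguments are captured.

```typescript
// Hypothetical helper (not in the diff): splits a playwright-cli command line
// into its subcommand and the remaining argument string.
interface ParsedBrowserCommand {
  subcommand: string;
  args: string;
}

function parseBrowserCommand(command: string): ParsedBrowserCommand | null {
  // Same pattern as formatBrowserAction: skip an optional "-s=<session>" flag,
  // capture the next token as the subcommand, then capture the rest (if any).
  const match = command.match(/playwright-cli\s+(?:-s=\S+\s+)?(\S+)(?:\s+(.*))?/);
  if (!match) return null;
  return { subcommand: match[1] ?? '', args: match[2] ?? '' };
}

// A command with a session flag and an argument
const withSession = parseBrowserCommand('playwright-cli -s=session1 goto https://example.com');
// A command with neither a session flag nor arguments
const withoutArgs = parseBrowserCommand('playwright-cli snapshot');
// A Bash command that does not invoke playwright-cli at all
const unrelated = parseBrowserCommand('ls -la');

console.log(withSession, withoutArgs, unrelated);
```

Commands that do not mention playwright-cli return null, which matches the early `return null` in formatBrowserAction; the caller in filterJsonToolCalls then emits no progress line for them.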
/**
* Summarize TodoWrite updates into clean progress indicators
*/
@@ -89,118 +164,20 @@ function summarizeTodoUpdate(input: ToolCallInput | undefined): string | null {
const inProgress = todos.filter((t) => t.status === 'in_progress');
// Show recently completed tasks
if (completed.length > 0) {
const recent = completed[completed.length - 1]!;
const recent = completed.at(-1);
if (recent) {
return `${recent.content}`;
}
// Show current in-progress task
if (inProgress.length > 0) {
const current = inProgress[0]!;
const current = inProgress.at(0);
if (current) {
return `🔄 ${current.content}`;
}
return null;
}
/**
* Format browser tool calls into clean progress indicators
*/
function formatBrowserAction(toolCall: ToolCall): string {
const toolName = toolCall.name;
const input = toolCall.input || {};
// Core Browser Operations
if (toolName === 'mcp__playwright__browser_navigate') {
const url = input.url || '';
const domain = extractDomain(url);
return `🌐 Navigating to ${domain}`;
}
if (toolName === 'mcp__playwright__browser_navigate_back') {
return `⬅️ Going back`;
}
// Page Interaction
if (toolName === 'mcp__playwright__browser_click') {
const element = input.element || 'element';
return `🖱️ Clicking ${element.slice(0, 25)}`;
}
if (toolName === 'mcp__playwright__browser_hover') {
const element = input.element || 'element';
return `👆 Hovering over ${element.slice(0, 20)}`;
}
if (toolName === 'mcp__playwright__browser_type') {
const element = input.element || 'field';
return `⌨️ Typing in ${element.slice(0, 20)}`;
}
if (toolName === 'mcp__playwright__browser_press_key') {
const key = input.key || 'key';
return `⌨️ Pressing ${key}`;
}
// Form Handling
if (toolName === 'mcp__playwright__browser_fill_form') {
const fieldCount = input.fields?.length || 0;
return `📝 Filling ${fieldCount} form fields`;
}
if (toolName === 'mcp__playwright__browser_select_option') {
return `📋 Selecting dropdown option`;
}
if (toolName === 'mcp__playwright__browser_file_upload') {
return `📁 Uploading file`;
}
// Page Analysis
if (toolName === 'mcp__playwright__browser_snapshot') {
return `📸 Taking page snapshot`;
}
if (toolName === 'mcp__playwright__browser_take_screenshot') {
return `📸 Taking screenshot`;
}
if (toolName === 'mcp__playwright__browser_evaluate') {
return `🔍 Running JavaScript analysis`;
}
// Waiting & Monitoring
if (toolName === 'mcp__playwright__browser_wait_for') {
if (input.text) {
return `⏳ Waiting for "${input.text.slice(0, 20)}"`;
}
return `⏳ Waiting for page response`;
}
if (toolName === 'mcp__playwright__browser_console_messages') {
return `📜 Checking console logs`;
}
if (toolName === 'mcp__playwright__browser_network_requests') {
return `🌐 Analyzing network traffic`;
}
// Tab Management
if (toolName === 'mcp__playwright__browser_tabs') {
const action = input.action || 'managing';
return `🗂️ ${action} browser tab`;
}
// Dialog Handling
if (toolName === 'mcp__playwright__browser_handle_dialog') {
return `💬 Handling browser dialog`;
}
// Fallback for any missed tools
const actionType = toolName.split('_').pop();
return `🌐 Browser: ${actionType}`;
}
/**
* Filter out JSON tool calls from content, with special handling for Task calls
*/
@@ -241,17 +218,16 @@ export function filterJsonToolCalls(content: string | null | undefined): string
continue;
}
// Special handling for browser tool calls
if (toolCall.name.startsWith('mcp__playwright__browser_')) {
const browserAction = formatBrowserAction(toolCall);
if (browserAction) {
processedLines.push(browserAction);
// Special handling for browser tool calls (playwright-cli via Bash)
if (toolCall.name === 'Bash') {
const command = toolCall.input?.command || '';
if (command.includes('playwright-cli')) {
const browserAction = formatBrowserAction(command);
if (browserAction) {
processedLines.push(browserAction);
}
}
continue;
}
// Hide all other tool calls (Read, Write, Grep, etc.)
continue;
} catch {
// If JSON parsing fails, treat as regular text
processedLines.push(line);
@@ -266,8 +242,7 @@ export function filterJsonToolCalls(content: string | null | undefined): string
}
export function detectExecutionContext(description: string): ExecutionContext {
const isParallelExecution =
description.includes('vuln agent') || description.includes('exploit agent');
const isParallelExecution = description.includes('vuln agent') || description.includes('exploit agent');
const useCleanOutput =
description.includes('Pre-recon agent') ||
@@ -287,7 +262,7 @@ export function formatAssistantOutput(
cleanedContent: string,
context: ExecutionContext,
turnCount: number,
description: string
description: string,
): string[] {
if (!cleanedContent.trim()) {
return [];
@@ -341,7 +316,7 @@ export function formatErrorOutput(
description: string,
duration: number,
sourceDir: string,
isRetryable: boolean
isRetryable: boolean,
): string[] {
const lines: string[] = [];
@@ -374,7 +349,7 @@ export function formatCompletionMessage(
context: ExecutionContext,
description: string,
turnCount: number,
duration: number
duration: number,
): string {
if (context.isParallelExecution) {
const prefix = getAgentPrefix(description);
@@ -388,10 +363,7 @@ export function formatCompletionMessage(
return ` Claude Code completed: ${description} (${turnCount} turns) in ${formatDuration(duration)}`;
}
export function formatToolUseOutput(
toolName: string,
input: Record<string, unknown> | undefined
): string[] {
export function formatToolUseOutput(toolName: string, input: Record<string, unknown> | undefined): string[] {
const lines: string[] = [];
lines.push(`\n Using Tool: ${toolName}`);
@@ -63,10 +63,7 @@ class NullProgressManager implements ProgressManager {
}
// Returns no-op when disabled
export function createProgressManager(
context: ProgressContext,
disableLoader: boolean
): ProgressManager {
export function createProgressManager(context: ProgressContext, disableLoader: boolean): ProgressManager {
if (!context.useCleanOutput || disableLoader) {
return new NullProgressManager();
}
@@ -25,4 +25,3 @@ export function getActualModelName(sdkReportedModel?: string): string | undefine
// Fall back to SDK-reported model
return sdkReportedModel;
}
@@ -53,7 +53,6 @@ export interface ContentBlock {
text?: string;
}
export interface AssistantMessage {
type: 'assistant';
error?: SDKAssistantMessageError;
@@ -93,10 +92,8 @@ export interface SystemInitMessage {
subtype: 'init';
model?: string;
permissionMode?: string;
mcp_servers?: Array<{ name: string; status: string }>;
}
export interface UserMessage {
type: 'user';
}
@@ -11,15 +11,15 @@
* crash-safe audit logging.
*/
import { AgentLogger } from './logger.js';
import { WorkflowLogger, type AgentLogDetails, type WorkflowSummary } from './workflow-logger.js';
import { MetricsTracker } from './metrics-tracker.js';
import { initializeAuditStructure, type SessionMetadata } from './utils.js';
import { formatTimestamp } from '../utils/formatting.js';
import { SessionMutex } from '../utils/concurrency.js';
import type { AgentEndResult } from '../types/index.js';
import { PentestError } from '../services/error-handling.js';
import { ErrorCode } from '../types/errors.js';
import type { AgentEndResult } from '../types/index.js';
import { SessionMutex } from '../utils/concurrency.js';
import { formatTimestamp } from '../utils/formatting.js';
import { AgentLogger } from './logger.js';
import { MetricsTracker } from './metrics-tracker.js';
import { initializeAuditStructure, type SessionMetadata } from './utils.js';
import { type AgentLogDetails, WorkflowLogger, type WorkflowSummary } from './workflow-logger.js';
// Global mutex instance
const sessionMutex = new SessionMutex();
@@ -47,7 +47,7 @@ export class AuditSession {
'config',
false,
{ field: 'sessionMetadata.id' },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
if (!this.sessionMetadata.webUrl) {
@@ -56,7 +56,7 @@ export class AuditSession {
'config',
false,
{ field: 'sessionMetadata.webUrl' },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
@@ -82,8 +82,8 @@ export class AuditSession {
// Initialize metrics tracker (loads or creates session.json)
await this.metricsTracker.initialize(workflowId);
// Initialize workflow logger
await this.workflowLogger.initialize();
// Initialize workflow logger with actual Temporal workflow ID
await this.workflowLogger.initialize(workflowId);
this.initialized = true;
}
@@ -100,11 +100,7 @@ export class AuditSession {
/**
* Start agent execution
*/
async startAgent(
agentName: string,
promptContent: string,
attemptNumber: number = 1
): Promise<void> {
async startAgent(agentName: string, promptContent: string, attemptNumber: number = 1): Promise<void> {
await this.ensureInitialized();
// 1. Save prompt snapshot (only on first attempt)
@@ -140,7 +136,7 @@ export class AuditSession {
'validation',
false,
{},
ErrorCode.AGENT_EXECUTION_FAILED
ErrorCode.AGENT_EXECUTION_FAILED,
);
}
@@ -152,18 +148,10 @@ export class AuditSession {
const agentName = this.currentAgentName || 'unknown';
switch (eventType) {
case 'tool_start':
await this.workflowLogger.logToolStart(
agentName,
String(data.toolName || ''),
data.parameters
);
await this.workflowLogger.logToolStart(agentName, String(data.toolName || ''), data.parameters);
break;
case 'llm_response':
await this.workflowLogger.logLlmResponse(
agentName,
Number(data.turn || 0),
String(data.content || '')
);
await this.workflowLogger.logLlmResponse(agentName, Number(data.turn || 0), String(data.content || ''));
break;
// tool_end and error events are intentionally not logged to workflow log
// to reduce noise - the agent completion message captures the outcome
@@ -266,11 +254,7 @@ export class AuditSession {
* @param terminatedWorkflows - IDs of workflows that were terminated
* @param checkpointHash - Git checkpoint hash that was restored
*/
async addResumeAttempt(
workflowId: string,
terminatedWorkflows: string[],
checkpointHash?: string
): Promise<void> {
async addResumeAttempt(workflowId: string, terminatedWorkflows: string[], checkpointHash?: string): Promise<void> {
await this.ensureInitialized();
const unlock = await sessionMutex.lock(this.sessionId);
@@ -12,8 +12,8 @@
* and proper cleanup.
*/
import fs from 'fs';
import path from 'path';
import fs from 'node:fs';
import path from 'node:path';
import { ensureDirectory } from '../utils/file-io.js';
/**
@@ -103,7 +103,7 @@ export class LogStream {
}
return new Promise((resolve) => {
this.stream!.end(() => {
this.stream?.end(() => {
this._isOpen = false;
this.stream = null;
resolve();
@@ -11,14 +11,10 @@
* Uses LogStream for stream management with backpressure handling.
*/
import {
generateLogPath,
generatePromptPath,
type SessionMetadata,
} from './utils.js';
import { atomicWrite } from '../utils/file-io.js';
import { formatTimestamp } from '../utils/formatting.js';
import { LogStream } from './log-stream.js';
import { generateLogPath, generatePromptPath, type SessionMetadata } from './utils.js';
interface LogEvent {
type: string;
@@ -103,11 +99,7 @@ export class AgentLogger {
* Save prompt snapshot to prompts directory
* Static method - doesn't require logger instance
*/
static async savePrompt(
sessionMetadata: SessionMetadata,
agentName: string,
promptContent: string
): Promise<void> {
static async savePrompt(sessionMetadata: SessionMetadata, agentName: string, promptContent: string): Promise<void> {
const promptPath = generatePromptPath(sessionMetadata, agentName);
// Create header with metadata
@@ -11,16 +11,13 @@
* Tracks attempt-level data for complete forensic trail.
*/
import {
generateSessionJsonPath,
type SessionMetadata,
} from './utils.js';
import { atomicWrite, readJson, fileExists } from '../utils/file-io.js';
import { formatTimestamp, calculatePercentage } from '../utils/formatting.js';
import { AGENT_PHASE_MAP, type PhaseName } from '../session-manager.js';
import { PentestError } from '../services/error-handling.js';
import { AGENT_PHASE_MAP, type PhaseName } from '../session-manager.js';
import { ErrorCode } from '../types/errors.js';
import type { AgentName, AgentEndResult } from '../types/index.js';
import type { AgentEndResult, AgentName } from '../types/index.js';
import { atomicWrite, fileExists, readJson } from '../utils/file-io.js';
import { calculatePercentage, formatTimestamp } from '../utils/formatting.js';
import { generateSessionJsonPath, type SessionMetadata } from './utils.js';
interface AttemptData {
attempt_number: number;
@@ -166,7 +163,7 @@ export class MetricsTracker {
'validation',
false,
{},
ErrorCode.AGENT_EXECUTION_FAILED
ErrorCode.AGENT_EXECUTION_FAILED,
);
}
@@ -254,18 +251,14 @@ export class MetricsTracker {
* @param terminatedWorkflows - IDs of workflows that were terminated
* @param checkpointHash - Git checkpoint hash that was restored
*/
async addResumeAttempt(
workflowId: string,
terminatedWorkflows: string[],
checkpointHash?: string
): Promise<void> {
async addResumeAttempt(workflowId: string, terminatedWorkflows: string[], checkpointHash?: string): Promise<void> {
if (!this.data) {
throw new PentestError(
'MetricsTracker not initialized',
'validation',
false,
{},
ErrorCode.AGENT_EXECUTION_FAILED
ErrorCode.AGENT_EXECUTION_FAILED,
);
}
@@ -307,15 +300,10 @@ export class MetricsTracker {
const agents = this.data.metrics.agents;
// Only count successful agents
const successfulAgents = Object.entries(agents).filter(
([, data]) => data.status === 'success'
);
const successfulAgents = Object.entries(agents).filter(([, data]) => data.status === 'success');
// Calculate total duration and cost
const totalDuration = successfulAgents.reduce(
(sum, [, data]) => sum + data.final_duration_ms,
0
);
const totalDuration = successfulAgents.reduce((sum, [, data]) => sum + data.final_duration_ms, 0);
const totalCost = successfulAgents.reduce((sum, [, data]) => sum + data.total_cost_usd, 0);
@@ -329,15 +317,13 @@ export class MetricsTracker {
/**
* Calculate phase-level metrics
*/
private calculatePhaseMetrics(
successfulAgents: Array<[string, AgentAuditMetrics]>
): Record<string, PhaseMetrics> {
private calculatePhaseMetrics(successfulAgents: Array<[string, AgentAuditMetrics]>): Record<string, PhaseMetrics> {
const phases: Record<PhaseName, AgentAuditMetrics[]> = {
'pre-recon': [],
'recon': [],
recon: [],
'vulnerability-analysis': [],
'exploitation': [],
'reporting': [],
exploitation: [],
reporting: [],
};
// Group agents by phase using imported AGENT_PHASE_MAP
@@ -350,6 +336,7 @@ export class MetricsTracker {
// Calculate metrics per phase
const phaseMetrics: Record<string, PhaseMetrics> = {};
// biome-ignore lint/style/noNonNullAssertion: called from recalculateAggregations which guards this.data
const totalDuration = this.data!.metrics.total_duration_ms;
for (const [phaseName, agentList] of Object.entries(phases)) {
@@ -11,22 +11,15 @@
* All functions are pure and crash-safe.
*/
import fs from 'fs/promises';
import path from 'path';
import { fileURLToPath } from 'url';
import fs from 'node:fs/promises';
import path from 'node:path';
import { WORKSPACES_DIR } from '../paths.js';
import { ensureDirectory } from '../utils/file-io.js';
export type { SessionMetadata } from '../types/audit.js';
import type { SessionMetadata } from '../types/audit.js';
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
// Get Shannon repository root
const SHANNON_ROOT = path.resolve(__dirname, '..', '..');
const AUDIT_LOGS_DIR = path.join(SHANNON_ROOT, 'audit-logs');
/**
* Extract and sanitize hostname from URL for use in identifiers
*/
@@ -44,11 +37,11 @@ export function generateSessionIdentifier(sessionMetadata: SessionMetadata): str
/**
* Generate path to audit log directory for a session
* Uses custom outputPath if provided, otherwise defaults to AUDIT_LOGS_DIR
* Uses custom outputPath if provided, otherwise defaults to WORKSPACES_DIR
*/
export function generateAuditPath(sessionMetadata: SessionMetadata): string {
const sessionIdentifier = generateSessionIdentifier(sessionMetadata);
const baseDir = sessionMetadata.outputPath || AUDIT_LOGS_DIR;
const baseDir = sessionMetadata.outputPath || WORKSPACES_DIR;
return path.join(baseDir, sessionIdentifier);
}
@@ -59,7 +52,7 @@ export function generateLogPath(
sessionMetadata: SessionMetadata,
agentName: string,
timestamp: number,
attemptNumber: number
attemptNumber: number,
): string {
const auditPath = generateAuditPath(sessionMetadata);
const filename = `${timestamp}_${agentName}_attempt-${attemptNumber}.log`;
@@ -92,7 +85,7 @@ export function generateWorkflowLogPath(sessionMetadata: SessionMetadata): strin
/**
* Initialize audit directory structure for a session
* Creates: audit-logs/{sessionId}/, agents/, prompts/, deliverables/
* Creates: workspaces/{sessionId}/, agents/, prompts/, deliverables/
*/
export async function initializeAuditStructure(sessionMetadata: SessionMetadata): Promise<void> {
const auditPath = generateAuditPath(sessionMetadata);
@@ -107,13 +100,10 @@ export async function initializeAuditStructure(sessionMetadata: SessionMetadata)
}
/**
* Copy deliverable files from repo to audit-logs for self-contained audit trail.
* Copy deliverable files from repo to workspaces for self-contained audit trail.
* No-ops if source directory doesn't exist. Idempotent and parallel-safe.
*/
export async function copyDeliverablesToAudit(
sessionMetadata: SessionMetadata,
repoPath: string
): Promise<void> {
export async function copyDeliverablesToAudit(sessionMetadata: SessionMetadata, repoPath: string): Promise<void> {
const sourceDir = path.join(repoPath, 'deliverables');
const destDir = path.join(generateAuditPath(sessionMetadata), 'deliverables');
@@ -11,10 +11,10 @@
* Optimized for `tail -f` viewing during concurrent workflow execution.
*/
import fs from 'fs/promises';
import { generateWorkflowLogPath, type SessionMetadata } from './utils.js';
import fs from 'node:fs/promises';
import { formatDuration, formatTimestamp } from '../utils/formatting.js';
import { LogStream } from './log-stream.js';
import { generateWorkflowLogPath, type SessionMetadata } from './utils.js';
export interface AgentLogDetails {
attemptNumber?: number;
@@ -44,6 +44,7 @@ export interface WorkflowSummary {
export class WorkflowLogger {
private readonly sessionMetadata: SessionMetadata;
private readonly logStream: LogStream;
private workflowId: string | undefined;
constructor(sessionMetadata: SessionMetadata) {
this.sessionMetadata = sessionMetadata;
@@ -54,7 +55,11 @@ export class WorkflowLogger {
/**
* Initialize the log stream (creates file and writes header)
*/
async initialize(): Promise<void> {
async initialize(workflowId?: string): Promise<void> {
if (workflowId) {
this.workflowId = workflowId;
}
if (this.logStream.isOpen) {
return;
}
@@ -76,7 +81,7 @@ export class WorkflowLogger {
`================================================================================`,
`Shannon Pentest - Workflow Log`,
`================================================================================`,
`Workflow ID: ${this.sessionMetadata.id}`,
`Workflow ID: ${this.workflowId ?? this.sessionMetadata.id}`,
`Target URL: ${this.sessionMetadata.webUrl}`,
`Started: ${formatTimestamp()}`,
`================================================================================`,
@@ -142,11 +147,7 @@ export class WorkflowLogger {
/**
* Log an agent event
*/
async logAgent(
agentName: string,
event: 'start' | 'end',
details?: AgentLogDetails
): Promise<void> {
async logAgent(agentName: string, event: 'start' | 'end', details?: AgentLogDetails): Promise<void> {
await this.ensureInitialized();
let message: string;
@@ -155,7 +156,7 @@ export class WorkflowLogger {
const attempt = details?.attemptNumber ?? 1;
message = `${agentName}: Starting (attempt ${attempt})`;
} else {
const parts: string[] = [agentName + ':'];
const parts: string[] = [`${agentName}:`];
if (details?.success === false) {
parts.push('Failed');
@@ -208,7 +209,7 @@ export class WorkflowLogger {
*/
private truncate(str: string, maxLen: number): string {
if (str.length <= maxLen) return str;
return str.slice(0, maxLen - 3) + '...';
return `${str.slice(0, maxLen - 3)}...`;
}
/**
@@ -259,22 +260,6 @@ export class WorkflowLogger {
return String(p.url);
}
break;
case 'mcp__playwright__browser_navigate':
if (p.url) {
return String(p.url);
}
break;
case 'mcp__playwright__browser_click':
if (p.selector) {
return this.truncate(String(p.selector), 60);
}
break;
case 'mcp__playwright__browser_type':
if (p.selector) {
const text = p.text ? `: "${this.truncate(String(p.text), 30)}"` : '';
return `${this.truncate(String(p.selector), 40)}${text}`;
}
break;
}
// Default: show first string-valued param truncated
@@ -322,11 +307,9 @@ export class WorkflowLogger {
const label = 'Error: ';
const indent = ' '.repeat(label.length);
const lines = segments.map((segment, i) =>
i === 0 ? `${label}${segment.trim()}` : `${indent}${segment.trim()}`
);
const lines = segments.map((segment, i) => (i === 0 ? `${label}${segment.trim()}` : `${indent}${segment.trim()}`));
return lines.join('\n') + '\n';
return `${lines.join('\n')}\n`;
}
/**
@@ -337,35 +320,40 @@ export class WorkflowLogger {
const status = summary.status === 'completed' ? 'COMPLETED' : 'FAILED';
await this.logStream.write('\n');
await this.logStream.write(`================================================================================\n`);
await this.logStream.write(`Workflow ${status}\n`);
await this.logStream.write(`────────────────────────────────────────\n`);
await this.logStream.write(`Workflow ID: ${this.sessionMetadata.id}\n`);
await this.logStream.write(`Status: ${summary.status}\n`);
await this.logStream.write(`Duration: ${formatDuration(summary.totalDurationMs)}\n`);
await this.logStream.write(`Total Cost: $${summary.totalCostUsd.toFixed(4)}\n`);
await this.logStream.write(`Agents: ${summary.completedAgents.length} completed\n`);
const lines: string[] = [
'',
'================================================================================',
`Workflow ${status}`,
'────────────────────────────────────────',
`Workflow ID: ${this.workflowId ?? this.sessionMetadata.id}`,
`Status: ${summary.status}`,
`Duration: ${formatDuration(summary.totalDurationMs)}`,
`Total Cost: $${summary.totalCostUsd.toFixed(4)}`,
`Agents: ${summary.completedAgents.length} completed`,
];
if (summary.error) {
await this.logStream.write(this.formatErrorBlock(summary.error));
lines.push(this.formatErrorBlock(summary.error).trimEnd());
}
await this.logStream.write(`\n`);
await this.logStream.write(`Agent Breakdown:\n`);
lines.push('');
lines.push('Agent Breakdown:');
for (const agentName of summary.completedAgents) {
const metrics = summary.agentMetrics[agentName];
if (metrics) {
const duration = formatDuration(metrics.durationMs);
const cost = metrics.costUsd !== null ? `$${metrics.costUsd.toFixed(4)}` : 'N/A';
await this.logStream.write(` - ${agentName} (${duration}, ${cost})\n`);
lines.push(` - ${agentName} (${duration}, ${cost})`);
} else {
await this.logStream.write(` - ${agentName}\n`);
lines.push(` - ${agentName}`);
}
}
await this.logStream.write(`================================================================================\n`);
lines.push('================================================================================');
// Single atomic write to prevent interleaved/duplicate output in log tailers
await this.logStream.write(`${lines.join('\n')}\n`);
}
/**
@@ -4,19 +4,14 @@
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
import { createRequire } from 'module';
import { fs } from 'zx';
import yaml from 'js-yaml';
import { Ajv, type ValidateFunction, type ErrorObject } from 'ajv';
import { createRequire } from 'node:module';
import { Ajv, type ErrorObject, type ValidateFunction } from 'ajv';
import type { FormatsPlugin } from 'ajv-formats';
import yaml from 'js-yaml';
import { fs } from 'zx';
import { PentestError } from './services/error-handling.js';
import type { Authentication, Config, DistributedConfig, Rule } from './types/config.js';
import { ErrorCode } from './types/errors.js';
import type {
Config,
Rule,
Authentication,
DistributedConfig,
} from './types/config.js';
// Handle ESM/CJS interop for ajv-formats using require
const require = createRequire(import.meta.url);
@@ -35,12 +30,10 @@ try {
validateSchema = ajv.compile(configSchema);
} catch (error) {
const errMsg = error instanceof Error ? error.message : String(error);
throw new PentestError(
`Failed to load configuration schema: ${errMsg}`,
'config',
false,
{ schemaPath: '../configs/config-schema.json', originalError: errMsg }
);
throw new PentestError(`Failed to load configuration schema: ${errMsg}`, 'config', false, {
schemaPath: '../configs/config-schema.json',
originalError: errMsg,
});
}
const DANGEROUS_PATTERNS: RegExp[] = [
@@ -185,7 +178,7 @@ export const parseConfig = async (configPath: string): Promise<Config> => {
'config',
false,
{ configPath },
ErrorCode.CONFIG_NOT_FOUND
ErrorCode.CONFIG_NOT_FOUND,
);
}
@@ -198,7 +191,7 @@ export const parseConfig = async (configPath: string): Promise<Config> => {
'config',
false,
{ configPath, fileSize: stats.size, maxFileSize },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
@@ -211,7 +204,7 @@ export const parseConfig = async (configPath: string): Promise<Config> => {
'config',
false,
{ configPath },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
@@ -230,7 +223,7 @@ export const parseConfig = async (configPath: string): Promise<Config> => {
'config',
false,
{ configPath, originalError: errMsg },
ErrorCode.CONFIG_PARSE_ERROR
ErrorCode.CONFIG_PARSE_ERROR,
);
}
@@ -241,7 +234,7 @@ export const parseConfig = async (configPath: string): Promise<Config> => {
'config',
false,
{ configPath },
ErrorCode.CONFIG_PARSE_ERROR
ErrorCode.CONFIG_PARSE_ERROR,
);
}
@@ -260,7 +253,7 @@ export const parseConfig = async (configPath: string): Promise<Config> => {
'config',
false,
{ configPath, originalError: errMsg },
ErrorCode.CONFIG_PARSE_ERROR
ErrorCode.CONFIG_PARSE_ERROR,
);
}
};
@@ -272,7 +265,7 @@ const validateConfig = (config: Config): void => {
'config',
false,
{},
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
@@ -282,7 +275,7 @@ const validateConfig = (config: Config): void => {
'config',
false,
{},
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
@@ -295,20 +288,18 @@ const validateConfig = (config: Config): void => {
'config',
false,
{ validationErrors: errorMessages },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
performSecurityValidation(config);
if (!config.rules && !config.authentication) {
if (!config.rules && !config.authentication && !config.description) {
console.warn(
'⚠️ Configuration file contains no rules or authentication. The pentest will run without any scoping restrictions or login capabilities.'
'⚠️ Configuration file contains no rules, authentication, or description. The pentest will run without any scoping restrictions or login capabilities.',
);
} else if (config.rules && !config.rules.avoid && !config.rules.focus) {
console.warn(
'⚠️ Configuration file contains no rules. The pentest will run without any scoping restrictions.'
);
console.warn('⚠️ Configuration file contains no rules. The pentest will run without any scoping restrictions.');
}
};
@@ -325,7 +316,7 @@ const performSecurityValidation = (config: Config): void => {
'config',
false,
{ field: 'login_url', pattern: pattern.source },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
}
@@ -339,7 +330,7 @@ const performSecurityValidation = (config: Config): void => {
'config',
false,
{ field: 'credentials.username', pattern: pattern.source },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
if (pattern.test(auth.credentials.password)) {
@@ -348,7 +339,7 @@ const performSecurityValidation = (config: Config): void => {
'config',
false,
{ field: 'credentials.password', pattern: pattern.source },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
}
@@ -363,7 +354,7 @@ const performSecurityValidation = (config: Config): void => {
'config',
false,
{ field: `login_flow[${index}]`, pattern: pattern.source },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
}
@@ -379,6 +370,20 @@ const performSecurityValidation = (config: Config): void => {
checkForDuplicates(config.rules.focus || [], 'focus');
checkForConflicts(config.rules.avoid, config.rules.focus);
}
if (config.description) {
for (const pattern of DANGEROUS_PATTERNS) {
if (pattern.test(config.description)) {
throw new PentestError(
`description contains potentially dangerous pattern: ${pattern.source}`,
'config',
false,
{ field: 'description', pattern: pattern.source },
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
}
}
};
const validateRulesSecurity = (rules: Rule[] | undefined, ruleType: string): void => {
@@ -392,7 +397,7 @@ const validateRulesSecurity = (rules: Rule[] | undefined, ruleType: string): voi
'config',
false,
{ field: `rules.${ruleType}[${index}].url_path`, pattern: pattern.source },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
if (pattern.test(rule.description)) {
@@ -401,7 +406,7 @@ const validateRulesSecurity = (rules: Rule[] | undefined, ruleType: string): voi
'config',
false,
{ field: `rules.${ruleType}[${index}].description`, pattern: pattern.source },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
}
@@ -421,7 +426,7 @@ const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number):
'config',
false,
{ field, ruleType: rule.type },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
break;
@@ -435,7 +440,7 @@ const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number):
'config',
false,
{ field, ruleType: rule.type },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
// Must contain at least one dot for domains
@@ -445,7 +450,7 @@ const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number):
'config',
false,
{ field, ruleType: rule.type },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
break;
@@ -458,7 +463,7 @@ const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number):
'config',
false,
{ field, ruleType: rule.type, allowedMethods },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
break;
@@ -471,7 +476,7 @@ const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number):
'config',
false,
{ field, ruleType: rule.type },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
break;
@@ -483,7 +488,7 @@ const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number):
'config',
false,
{ field, ruleType: rule.type },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
break;
@@ -500,7 +505,7 @@ const checkForDuplicates = (rules: Rule[], ruleType: string): void => {
'config',
false,
{ field: `rules.${ruleType}[${index}]`, ruleType: rule.type, urlPath: rule.url_path },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
seen.add(key);
@@ -518,7 +523,7 @@ const checkForConflicts = (avoidRules: Rule[] = [], focusRules: Rule[] = []): vo
'config',
false,
{ field: `rules.focus[${index}]`, urlPath: rule.url_path },
ErrorCode.CONFIG_VALIDATION_FAILED
ErrorCode.CONFIG_VALIDATION_FAILED,
);
}
});
@@ -536,11 +541,13 @@ export const distributeConfig = (config: Config | null): DistributedConfig => {
const avoid = config?.rules?.avoid || [];
const focus = config?.rules?.focus || [];
const authentication = config?.authentication || null;
const description = config?.description?.trim() || '';
return {
avoid: avoid.map(sanitizeRule),
focus: focus.map(sanitizeRule),
authentication: authentication ? sanitizeAuthentication(authentication) : null,
description,
};
};
@@ -0,0 +1,30 @@
/** Centralized path constants for the worker package */
import fs from 'node:fs';
import path from 'node:path';
/** Worker package root (apps/worker/) resolved from compiled dist/ files */
const WORKER_ROOT = path.resolve(import.meta.dirname, '..');
export const PROMPTS_DIR = path.join(WORKER_ROOT, 'prompts');
export const CONFIGS_DIR = path.join(WORKER_ROOT, 'configs');
/**
* Repository root — walk up from WORKER_ROOT looking for pnpm-workspace.yaml.
* Falls back to two levels up (apps/worker/ → repo root) if not found.
*/
function findRepoRoot(): string {
let dir = WORKER_ROOT;
for (let i = 0; i < 5; i++) {
if (fs.existsSync(path.join(dir, 'pnpm-workspace.yaml'))) {
return dir;
}
const parent = path.dirname(dir);
if (parent === dir) break;
dir = parent;
}
return path.resolve(WORKER_ROOT, '..', '..');
}
const REPO_ROOT = findRepoRoot();
export const WORKSPACES_DIR = path.join(REPO_ROOT, 'workspaces');
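The marker-file walk used by `findRepoRoot` above can be tried in isolation. The sketch below is only an illustration under stated assumptions: `findUp`, its `maxLevels` parameter, and the temporary directory layout are hypothetical and not part of this change. It returns `null` rather than a fixed fallback path so the "not found" case is visible.

```typescript
import fs from 'node:fs';
import os from 'node:os';
import path from 'node:path';

/**
 * Hypothetical helper (not in the PR): walk up from startDir looking for a
 * marker file. Returns the first directory that contains it, or null once
 * maxLevels directories have been checked or the filesystem root is reached.
 */
function findUp(startDir: string, marker: string, maxLevels = 5): string | null {
  let dir = path.resolve(startDir);
  for (let i = 0; i < maxLevels; i++) {
    if (fs.existsSync(path.join(dir, marker))) {
      return dir;
    }
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return null;
}

// Throwaway layout for the demo: <tmp>/pnpm-workspace.yaml and <tmp>/apps/worker/
const demoRoot = fs.realpathSync(fs.mkdtempSync(path.join(os.tmpdir(), 'find-up-')));
fs.writeFileSync(path.join(demoRoot, 'pnpm-workspace.yaml'), 'packages: []\n');
const workerDir = path.join(demoRoot, 'apps', 'worker');
fs.mkdirSync(workerDir, { recursive: true });

// Found two levels above the starting directory
const foundRoot = findUp(workerDir, 'pnpm-workspace.yaml');
// Only workerDir and apps/ are checked here, so the search gives up
const missing = findUp(workerDir, 'find-up-demo-no-such-marker.yaml', 2);

console.log({ foundRoot, missing });
```

Capping the number of levels keeps the lookup cheap and predictable when the package is installed somewhere without a workspace file, which appears to be why the code above also carries a fixed two-levels-up fallback.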
@@ -37,7 +37,7 @@ export class ProgressIndicator {
}
// Clear the spinner line
process.stdout.write('\r' + ' '.repeat(this.message.length + 5) + '\r');
process.stdout.write(`\r${' '.repeat(this.message.length + 5)}\r`);
this.isRunning = false;
}
@@ -0,0 +1,137 @@
#!/usr/bin/env node
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
/**
* generate-totp CLI
*
* Generates 6-digit TOTP codes for authentication.
* Replaces the MCP generate_totp tool.
* Based on RFC 6238 (TOTP) and RFC 4226 (HOTP).
*
* Usage:
* generate-totp --secret JBSWY3DPEHPK3PXP
*/
import { createHmac } from 'node:crypto';
// === Base32 Decoding ===
function base32Decode(encoded: string): Buffer {
const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
const cleanInput = encoded.toUpperCase().replace(/[^A-Z2-7]/g, '');
if (cleanInput.length === 0) {
throw new Error('TOTP secret is empty after cleaning');
}
const output: number[] = [];
let bits = 0;
let value = 0;
for (const char of cleanInput) {
const index = alphabet.indexOf(char);
if (index === -1) {
throw new Error(`Invalid base32 character: ${char}`);
}
value = (value << 5) | index;
bits += 5;
if (bits >= 8) {
output.push((value >>> (bits - 8)) & 255);
bits -= 8;
}
}
return Buffer.from(output);
}
// === TOTP Generation (RFC 6238) ===
function generateHOTP(secret: string, counter: number, digits: number = 6): string {
const key = base32Decode(secret);
// Convert counter to 8-byte buffer (big-endian)
const counterBuffer = Buffer.alloc(8);
counterBuffer.writeBigUInt64BE(BigInt(counter));
// Generate HMAC-SHA1
const hmac = createHmac('sha1', key);
hmac.update(counterBuffer);
const hash = hmac.digest();
// Dynamic truncation (SHA-1 always produces 20 bytes)
const lastByte = hash[hash.length - 1] ?? 0;
const offset = lastByte & 0x0f;
const code =
(((hash[offset] ?? 0) & 0x7f) << 24) |
(((hash[offset + 1] ?? 0) & 0xff) << 16) |
(((hash[offset + 2] ?? 0) & 0xff) << 8) |
((hash[offset + 3] ?? 0) & 0xff);
return (code % 10 ** digits).toString().padStart(digits, '0');
}
function generateTOTP(secret: string, timeStep: number = 30, digits: number = 6): string {
const counter = Math.floor(Date.now() / 1000 / timeStep);
return generateHOTP(secret, counter, digits);
}
// === Argument Parsing ===
function parseSecret(argv: string[]): string {
for (let i = 2; i < argv.length; i++) {
const next = argv[i + 1];
if (argv[i] === '--secret' && next) {
return next;
}
}
return '';
}
// === Main ===
function main(): void {
const secret = parseSecret(process.argv);
if (!secret) {
console.log(JSON.stringify({ status: 'error', message: 'Missing required --secret argument', retryable: false }));
process.exit(1);
}
const base32Regex = /^[A-Z2-7]+$/i;
if (!base32Regex.test(secret)) {
console.log(
JSON.stringify({
status: 'error',
message: 'Secret must be base32-encoded (characters A-Z and 2-7)',
retryable: false,
}),
);
process.exit(1);
}
try {
const totpCode = generateTOTP(secret);
const expiresIn = 30 - (Math.floor(Date.now() / 1000) % 30);
console.log(
JSON.stringify({
status: 'success',
totpCode,
expiresIn,
}),
);
} catch (error) {
const msg = error instanceof Error ? error.message : String(error);
console.log(JSON.stringify({ status: 'error', message: `TOTP generation failed: ${msg}`, retryable: false }));
process.exit(1);
}
}
main();
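A quick way to sanity-check the dynamic truncation in generate-totp is to run the same HMAC-SHA1 steps against the published RFC 4226 Appendix D test vectors. The sketch below is illustrative only: `hotpFromKey` is a hypothetical helper that is not part of this change, and it takes the raw key bytes directly (skipping base32 decoding) so it can use the RFC's ASCII test key.

```typescript
import { createHmac } from 'node:crypto';

// Hypothetical helper (not in the PR): HOTP over raw key bytes, no base32 step.
function hotpFromKey(key: Buffer, counter: number, digits: number = 6): string {
  // Counter as an 8-byte big-endian buffer, as in generateHOTP
  const counterBuffer = Buffer.alloc(8);
  counterBuffer.writeBigUInt64BE(BigInt(counter));

  const hash = createHmac('sha1', key).update(counterBuffer).digest();

  // Dynamic truncation: low nibble of the last byte selects a 4-byte window
  const offset = (hash[hash.length - 1] ?? 0) & 0x0f;
  // Read 32 bits big-endian and clear the sign bit (31-bit result)
  const code = hash.readUInt32BE(offset) & 0x7fffffff;

  return (code % 10 ** digits).toString().padStart(digits, '0');
}

// RFC 4226 Appendix D test key: the ASCII string "12345678901234567890"
const rfcKey = Buffer.from('12345678901234567890', 'ascii');
console.log(hotpFromKey(rfcKey, 0)); // RFC 4226 lists 755224 for counter 0
console.log(hotpFromKey(rfcKey, 1)); // RFC 4226 lists 287082 for counter 1
```

TOTP is the same computation with the counter set to `Math.floor(Date.now() / 1000 / timeStep)`, so a match on the HOTP vectors covers the truncation logic for both.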
@@ -0,0 +1,191 @@
#!/usr/bin/env node
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
/**
* save-deliverable CLI
*
* Standalone script to save deliverable files with validation.
* Replaces the MCP save_deliverable tool.
*
* Usage:
* node save-deliverable.js --type INJECTION_QUEUE --content '{"vulnerabilities": [...]}'
* node save-deliverable.js --type INJECTION_ANALYSIS --file-path deliverables/injection_analysis_deliverable.md
*/
import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join, resolve } from 'node:path';
import { DELIVERABLE_FILENAMES, type DeliverableType, isQueueType } from '../types/deliverables.js';
// === Argument Parsing ===
interface ParsedArgs {
type: string;
content?: string;
filePath?: string;
}
function parseArgs(argv: string[]): ParsedArgs {
const args: ParsedArgs = { type: '' };
for (let i = 2; i < argv.length; i++) {
const arg = argv[i];
const next = argv[i + 1];
if (arg === '--type' && next) {
args.type = next;
i++;
} else if (arg === '--content' && next) {
args.content = next;
i++;
} else if (arg === '--file-path' && next) {
args.filePath = next;
i++;
}
}
return args;
}
// === Queue Validation ===
interface ValidationResult {
valid: boolean;
message?: string;
}
function validateQueueJson(content: string): ValidationResult {
try {
const parsed = JSON.parse(content) as unknown;
if (typeof parsed !== 'object' || parsed === null) {
return {
valid: false,
message: `Invalid queue structure: Expected an object. Got: ${typeof parsed}`,
};
}
const obj = parsed as Record<string, unknown>;
if (!('vulnerabilities' in obj)) {
return {
valid: false,
message: `Invalid queue structure: Missing 'vulnerabilities' property. Expected: {"vulnerabilities": [...]}`,
};
}
if (!Array.isArray(obj.vulnerabilities)) {
return {
valid: false,
message: `Invalid queue structure: 'vulnerabilities' must be an array. Expected: {"vulnerabilities": [...]}`,
};
}
return { valid: true };
} catch (error) {
return {
valid: false,
message: `Invalid JSON: ${error instanceof Error ? error.message : String(error)}`,
};
}
}
// === File Operations ===
function saveDeliverableFile(targetDir: string, filename: string, content: string): string {
const deliverablesDir = join(targetDir, 'deliverables');
const filepath = join(deliverablesDir, filename);
try {
mkdirSync(deliverablesDir, { recursive: true });
} catch {
throw new Error(`Cannot create deliverables directory at ${deliverablesDir}`);
}
writeFileSync(filepath, content, 'utf8');
return filepath;
}
// === Main ===
function main(): void {
const args = parseArgs(process.argv);
// 1. Validate --type
if (!args.type) {
console.log(JSON.stringify({ status: 'error', message: 'Missing required --type argument', retryable: false }));
process.exit(1);
}
const deliverableType = args.type as DeliverableType;
const filename = DELIVERABLE_FILENAMES[deliverableType];
if (!filename) {
console.log(
JSON.stringify({ status: 'error', message: `Unknown deliverable type: ${args.type}`, retryable: false }),
);
process.exit(1);
}
// 2. Resolve content from --content or --file-path
let content: string;
if (args.content) {
content = args.content;
} else if (args.filePath) {
// Path traversal protection: must resolve inside cwd
const cwd = process.cwd();
const resolved = resolve(cwd, args.filePath);
if (!resolved.startsWith(`${cwd}/`) && resolved !== cwd) {
console.log(
JSON.stringify({ status: 'error', message: `Path traversal detected: ${args.filePath}`, retryable: false }),
);
process.exit(1);
}
try {
content = readFileSync(resolved, 'utf8');
} catch (error) {
const msg = error instanceof Error ? error.message : String(error);
console.log(JSON.stringify({ status: 'error', message: `Failed to read file: ${msg}`, retryable: true }));
process.exit(1);
}
} else {
console.log(
JSON.stringify({
status: 'error',
message: 'Either --content or --file-path is required',
retryable: false,
}),
);
process.exit(1);
}
// 3. Validate queue types
let validated = false;
if (isQueueType(args.type)) {
const validation = validateQueueJson(content);
if (!validation.valid) {
console.log(JSON.stringify({ status: 'error', message: validation.message, retryable: true }));
process.exit(1);
}
validated = true;
}
// 4. Save the file
try {
const targetDir = process.cwd();
const filepath = saveDeliverableFile(targetDir, filename, content);
console.log(JSON.stringify({ status: 'success', filepath, validated }));
} catch (error) {
const msg = error instanceof Error ? error.message : String(error);
console.log(JSON.stringify({ status: 'error', message: `Failed to save: ${msg}`, retryable: true }));
process.exit(1);
}
}
main();
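The `--file-path` guard in save-deliverable compares string prefixes using a literal `/`, which presumably suits the Linux worker container. As a rough alternative sketch, the same containment check can be expressed with `path.relative`, which avoids hard-coding the separator. `isInsideDir` is a hypothetical name introduced here for illustration and does not appear in this change.

```typescript
import { isAbsolute, relative, resolve, sep } from 'node:path';

// Hypothetical helper (not in the PR): true when `candidate`, resolved against
// `baseDir`, is `baseDir` itself or a path beneath it.
function isInsideDir(baseDir: string, candidate: string): boolean {
  const base = resolve(baseDir);
  const rel = relative(base, resolve(base, candidate));

  // An empty relative path means the candidate is the base directory itself
  if (rel === '') return true;

  // Reject anything that climbs out of base ("..", "../x") or that cannot be
  // expressed relative to base at all (e.g. another drive on Windows)
  return rel !== '..' && !rel.startsWith(`..${sep}`) && !isAbsolute(rel);
}

console.log(isInsideDir('/work', 'deliverables/report.md'));
console.log(isInsideDir('/work', '../etc/passwd'));
```

Like the original check, this is purely lexical: it does not follow symlinks, so a symlink inside the working directory that points elsewhere would still pass.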
@@ -21,29 +21,20 @@
* No Temporal dependencies - pure domain logic.
*/
import type { ActivityLogger } from '../types/activity-logger.js';
import { Result, ok, err, isErr } from '../types/result.js';
import { ErrorCode, type PentestErrorType } from '../types/errors.js';
import { PentestError } from './error-handling.js';
import { isSpendingCapBehavior } from '../utils/billing-detection.js';
import { type ClaudePromptResult, runClaudePrompt, validateAgentOutput } from '../ai/claude-executor.js';
import type { AuditSession } from '../audit/index.js';
import { AGENTS } from '../session-manager.js';
import { loadPrompt } from './prompt-manager.js';
import {
runClaudePrompt,
validateAgentOutput,
type ClaudePromptResult,
} from '../ai/claude-executor.js';
import {
createGitCheckpoint,
commitGitSuccess,
rollbackGitWorkspace,
getGitCommitHash,
} from './git-manager.js';
import { AuditSession } from '../audit/index.js';
import type { AgentEndResult } from '../types/audit.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import type { AgentName } from '../types/agents.js';
import type { ConfigLoaderService } from './config-loader.js';
import type { AgentEndResult } from '../types/audit.js';
import { ErrorCode, type PentestErrorType } from '../types/errors.js';
import type { AgentMetrics } from '../types/metrics.js';
import { err, isErr, ok, type Result } from '../types/result.js';
import { isSpendingCapBehavior } from '../utils/billing-detection.js';
import type { ConfigLoaderService } from './config-loader.js';
import { PentestError } from './error-handling.js';
import { commitGitSuccess, createGitCheckpoint, getGitCommitHash, rollbackGitWorkspace } from './git-manager.js';
import { loadPrompt } from './prompt-manager.js';
/**
* Input for agent execution.
@@ -94,7 +85,7 @@ export class AgentExecutionService {
agentName: AgentName,
input: AgentExecutionInput,
auditSession: AuditSession,
logger: ActivityLogger
logger: ActivityLogger,
): Promise<Result<AgentEndResult, PentestError>> {
const { webUrl, repoPath, configPath, pipelineTestingMode = false, attemptNumber } = input;
@@ -109,13 +100,7 @@ export class AgentExecutionService {
const promptTemplate = AGENTS[agentName].promptTemplate;
let prompt: string;
try {
prompt = await loadPrompt(
promptTemplate,
{ webUrl, repoPath },
distributedConfig,
pipelineTestingMode,
logger
);
prompt = await loadPrompt(promptTemplate, { webUrl, repoPath }, distributedConfig, pipelineTestingMode, logger);
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
return err(
@@ -124,8 +109,8 @@ export class AgentExecutionService {
'prompt',
false,
{ agentName, promptTemplate, originalError: errorMessage },
ErrorCode.PROMPT_LOAD_FAILED
)
ErrorCode.PROMPT_LOAD_FAILED,
),
);
}
@@ -140,8 +125,8 @@ export class AgentExecutionService {
'filesystem',
false,
{ agentName, repoPath, originalError: errorMessage },
ErrorCode.GIT_CHECKPOINT_FAILED
)
ErrorCode.GIT_CHECKPOINT_FAILED,
),
);
}
@@ -157,7 +142,7 @@ export class AgentExecutionService {
agentName,
auditSession,
logger,
AGENTS[agentName].modelTier
AGENTS[agentName].modelTier,
);
// 6. Spending cap check - defense-in-depth
@@ -165,7 +150,8 @@ export class AgentExecutionService {
const resultText = result.result || '';
if (isSpendingCapBehavior(result.turns ?? 0, result.cost || 0, resultText)) {
return this.failAgent(agentName, repoPath, auditSession, logger, {
attemptNumber, result,
attemptNumber,
result,
rollbackReason: 'spending cap detected',
errorMessage: `Spending cap likely reached: ${resultText.slice(0, 100)}`,
errorCode: ErrorCode.SPENDING_CAP_REACHED,
@@ -179,7 +165,8 @@ export class AgentExecutionService {
// 7. Handle execution failure
if (!result.success) {
return this.failAgent(agentName, repoPath, auditSession, logger, {
attemptNumber, result,
attemptNumber,
result,
rollbackReason: 'execution failure',
errorMessage: result.error || 'Agent execution failed',
errorCode: ErrorCode.AGENT_EXECUTION_FAILED,
@@ -193,7 +180,8 @@ export class AgentExecutionService {
const validationPassed = await validateAgentOutput(result, agentName, repoPath, logger);
if (!validationPassed) {
return this.failAgent(agentName, repoPath, auditSession, logger, {
attemptNumber, result,
attemptNumber,
result,
rollbackReason: 'validation failure',
errorMessage: `Agent ${agentName} failed output validation`,
errorCode: ErrorCode.OUTPUT_VALIDATION_FAILED,
@@ -225,7 +213,7 @@ export class AgentExecutionService {
repoPath: string,
auditSession: AuditSession,
logger: ActivityLogger,
opts: FailAgentOpts
opts: FailAgentOpts,
): Promise<Result<AgentEndResult, PentestError>> {
await rollbackGitWorkspace(repoPath, opts.rollbackReason, logger);
@@ -239,15 +227,7 @@ export class AgentExecutionService {
};
await auditSession.endAgent(agentName, endResult);
return err(
new PentestError(
opts.errorMessage,
opts.category,
opts.retryable,
opts.context,
opts.errorCode
)
);
return err(new PentestError(opts.errorMessage, opts.category, opts.retryable, opts.context, opts.errorCode));
}
/**
@@ -267,7 +247,7 @@ export class AgentExecutionService {
agentName: AgentName,
input: AgentExecutionInput,
auditSession: AuditSession,
logger: ActivityLogger
logger: ActivityLogger,
): Promise<AgentEndResult> {
const result = await this.execute(agentName, input, auditSession, logger);
if (isErr(result)) {
@@ -11,11 +11,11 @@
* Pure service with no Temporal dependencies.
*/
import { parseConfig, distributeConfig } from '../config-parser.js';
import { PentestError } from './error-handling.js';
import { Result, ok, err } from '../types/result.js';
import { ErrorCode } from '../types/errors.js';
import { distributeConfig, parseConfig } from '../config-parser.js';
import type { DistributedConfig } from '../types/config.js';
import { ErrorCode } from '../types/errors.js';
import { err, ok, type Result } from '../types/result.js';
import { PentestError } from './error-handling.js';
/**
* Service for loading and distributing configuration files.
@@ -52,8 +52,8 @@ export class ConfigLoaderService {
'config',
false,
{ configPath, originalError: errorMessage },
errorCode
)
errorCode,
),
);
}
}
@@ -64,9 +64,7 @@ export class ConfigLoaderService {
* @param configPath - Optional path to the YAML configuration file
* @returns Result containing DistributedConfig (or null) on success, PentestError on failure
*/
async loadOptional(
configPath: string | undefined
): Promise<Result<DistributedConfig | null, PentestError>> {
async loadOptional(configPath: string | undefined): Promise<Result<DistributedConfig | null, PentestError>> {
if (!configPath) {
return ok(null);
}
@@ -75,10 +75,7 @@ const containers = new Map<string, Container>();
* @param sessionMetadata - Session metadata for audit paths
* @returns Container instance for the workflow
*/
export function getOrCreateContainer(
workflowId: string,
sessionMetadata: SessionMetadata
): Container {
export function getOrCreateContainer(workflowId: string, sessionMetadata: SessionMetadata): Container {
let container = containers.get(workflowId);
if (!container) {
@@ -4,16 +4,8 @@
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
import {
ErrorCode,
type PentestErrorType,
type PentestErrorContext,
type PromptErrorResult,
} from '../types/errors.js';
import {
matchesBillingApiPattern,
matchesBillingTextPattern,
} from '../utils/billing-detection.js';
import { ErrorCode, type PentestErrorContext, type PentestErrorType, type PromptErrorResult } from '../types/errors.js';
import { matchesBillingApiPattern, matchesBillingTextPattern } from '../utils/billing-detection.js';
export class PentestError extends Error {
override name = 'PentestError' as const;
@@ -29,7 +21,7 @@ export class PentestError extends Error {
type: PentestErrorType,
retryable: boolean = false,
context: PentestErrorContext = {},
code?: ErrorCode
code?: ErrorCode,
) {
super(message);
this.type = type;
@@ -42,18 +34,13 @@ export class PentestError extends Error {
}
}
export function handlePromptError(
promptName: string,
error: Error
): PromptErrorResult {
export function handlePromptError(promptName: string, error: Error): PromptErrorResult {
return {
success: false,
error: new PentestError(
`Failed to load prompt '${promptName}': ${error.message}`,
'prompt',
false,
{ promptName, originalError: error.message }
),
error: new PentestError(`Failed to load prompt '${promptName}': ${error.message}`, 'prompt', false, {
promptName,
originalError: error.message,
}),
};
}
@@ -76,7 +63,6 @@ const RETRYABLE_PATTERNS = [
'service unavailable',
'bad gateway',
// Claude API errors
'mcp server',
'model unavailable',
'service temporarily unavailable',
'api error',
@@ -111,10 +97,7 @@ export function isRetryableError(error: Error): boolean {
* Classifies errors by ErrorCode for reliable, code-based classification.
* Used when error is a PentestError with a specific ErrorCode.
*/
function classifyByErrorCode(
code: ErrorCode,
retryableFromError: boolean
): { type: string; retryable: boolean } {
function classifyByErrorCode(code: ErrorCode, retryableFromError: boolean): { type: string; retryable: boolean } {
switch (code) {
// Billing errors - retryable (wait for cap reset or credits added)
case ErrorCode.SPENDING_CAP_REACHED:
@@ -206,49 +189,30 @@ export function classifyErrorForTemporal(error: unknown): { type: string; retrya
}
// Permission (403) - access won't be granted
if (
message.includes('permission') ||
message.includes('forbidden') ||
message.includes('403')
) {
if (message.includes('permission') || message.includes('forbidden') || message.includes('403')) {
return { type: 'PermissionError', retryable: false };
}
// === OUTPUT VALIDATION ERRORS (Retryable) ===
// Agent didn't produce expected deliverables - retry may succeed
// IMPORTANT: Must come BEFORE generic 'validation' check below
if (
message.includes('failed output validation') ||
message.includes('output validation failed')
) {
if (message.includes('failed output validation') || message.includes('output validation failed')) {
return { type: 'OutputValidationError', retryable: true };
}
// Invalid Request (400) - malformed request is permanent
// Note: Checked AFTER billing and AFTER output validation
if (
message.includes('invalid_request_error') ||
message.includes('malformed') ||
message.includes('validation')
) {
if (message.includes('invalid_request_error') || message.includes('malformed') || message.includes('validation')) {
return { type: 'InvalidRequestError', retryable: false };
}
// Request Too Large (413) - won't fit no matter how many retries
if (
message.includes('request_too_large') ||
message.includes('too large') ||
message.includes('413')
) {
if (message.includes('request_too_large') || message.includes('too large') || message.includes('413')) {
return { type: 'RequestTooLargeError', retryable: false };
}
// Configuration errors - missing files need manual fix
if (
message.includes('enoent') ||
message.includes('no such file') ||
message.includes('cli not installed')
) {
if (message.includes('enoent') || message.includes('no such file') || message.includes('cli not installed')) {
return { type: 'ConfigurationError', retryable: false };
}
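The comment above about checking output validation before the generic 'validation' match is easiest to see in isolation. The following is a reduced, hypothetical stand-in for the message-based branch of `classifyErrorForTemporal`; the lowercasing step and the `UnknownError` fallback are assumptions made for this sketch, not taken from the change.

```typescript
interface Classification {
  type: string;
  retryable: boolean;
}

// Hypothetical, reduced version of the message-based classification.
function classifyMessage(rawMessage: string): Classification {
  const message = rawMessage.toLowerCase(); // assumption: matching is case-insensitive

  // Specific check first: a missing deliverable may succeed on retry
  if (message.includes('failed output validation') || message.includes('output validation failed')) {
    return { type: 'OutputValidationError', retryable: true };
  }

  // Generic check second: 'validation' alone is treated as a malformed request
  if (message.includes('invalid_request_error') || message.includes('malformed') || message.includes('validation')) {
    return { type: 'InvalidRequestError', retryable: false };
  }

  return { type: 'UnknownError', retryable: false }; // fallback assumed for the sketch
}

console.log(classifyMessage('Agent recon failed output validation'));
console.log(classifyMessage('Schema validation error'));
```

If the two checks were swapped, the first message would match the generic 'validation' substring and be classified as non-retryable.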
@@ -13,13 +13,9 @@
* No Temporal dependencies - this is pure business logic.
*/
import {
validateQueueSafe,
type VulnType,
type ExploitationDecision,
} from './queue-validation.js';
import { isOk } from '../types/result.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import { isOk } from '../types/result.js';
import { type ExploitationDecision, type VulnType, validateQueueSafe } from './queue-validation.js';
/**
* Service for checking exploitation queue decisions.
@@ -46,7 +42,7 @@ export class ExploitationCheckerService {
if (isOk(result)) {
const decision = result.value;
logger.info(
`${vulnType}: ${decision.shouldExploit ? `${decision.vulnerabilityCount} vulnerabilities found` : 'no vulnerabilities, skipping exploitation'}`
`${vulnType}: ${decision.shouldExploit ? `${decision.vulnerabilityCount} vulnerabilities found` : 'no vulnerabilities, skipping exploitation'}`,
);
return decision;
}
@@ -5,9 +5,9 @@
// as published by the Free Software Foundation.
import { $ } from 'zx';
import { PentestError } from './error-handling.js';
import { ErrorCode } from '../types/errors.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import { ErrorCode } from '../types/errors.js';
import { PentestError } from './error-handling.js';
/**
* Check if a directory is a git repository.
@@ -31,15 +31,8 @@ interface GitOperationResult {
/**
* Get list of changed files from git status --porcelain output
*/
async function getChangedFiles(
sourceDir: string,
operationDescription: string
): Promise<string[]> {
const status = await executeGitCommandWithRetry(
['git', 'status', '--porcelain'],
sourceDir,
operationDescription
);
async function getChangedFiles(sourceDir: string, operationDescription: string): Promise<string[]> {
const status = await executeGitCommandWithRetry(['git', 'status', '--porcelain'], sourceDir, operationDescription);
return status.stdout
.trim()
.split('\n')
@@ -55,14 +48,15 @@ function logChangeSummary(
messageWithoutChanges: string,
logger: ActivityLogger,
level: 'info' | 'warn' = 'info',
maxToShow: number = 5
maxToShow: number = 5,
): void {
if (changes.length > 0) {
const msg = messageWithChanges.replace('{count}', String(changes.length));
const fileList = changes.slice(0, maxToShow).map((c) => ` ${c}`).join(', ');
const suffix = changes.length > maxToShow
? ` ... and ${changes.length - maxToShow} more files`
: '';
const fileList = changes
.slice(0, maxToShow)
.map((c) => ` ${c}`)
.join(', ');
const suffix = changes.length > maxToShow ? ` ... and ${changes.length - maxToShow} more files` : '';
logger[level](`${msg} ${fileList}${suffix}`);
} else {
logger[level](messageWithoutChanges);
@@ -101,7 +95,7 @@ class GitSemaphore {
if (!this.running && this.queue.length > 0) {
this.running = true;
const resolve = this.queue.shift();
resolve!();
resolve?.();
}
}
}
@@ -125,7 +119,7 @@ export async function executeGitCommandWithRetry(
commandArgs: string[],
sourceDir: string,
description: string,
maxRetries: number = 5
maxRetries: number = 5,
): Promise<{ stdout: string; stderr: string }> {
await gitSemaphore.acquire();
@@ -139,11 +133,11 @@ export async function executeGitCommandWithRetry(
const errMsg = error instanceof Error ? error.message : String(error);
if (isGitLockError(errMsg) && attempt < maxRetries) {
const delay = Math.pow(2, attempt - 1) * 1000;
const delay = 2 ** (attempt - 1) * 1000;
// executeGitCommandWithRetry is also called outside activity context
// (e.g., from resume logic), so we use console.warn as a fallback here
console.warn(
`Git lock conflict during ${description} (attempt ${attempt}/${maxRetries}). Retrying in ${delay}ms...`
`Git lock conflict during ${description} (attempt ${attempt}/${maxRetries}). Retrying in ${delay}ms...`,
);
await new Promise((resolve) => setTimeout(resolve, delay));
continue;
@@ -157,7 +151,7 @@ export async function executeGitCommandWithRetry(
'filesystem',
true, // Retryable - transient git lock issues
{ maxRetries, description },
ErrorCode.GIT_CHECKPOINT_FAILED
ErrorCode.GIT_CHECKPOINT_FAILED,
);
} finally {
gitSemaphore.release();
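For reference, the lock-retry delay `2 ** (attempt - 1) * 1000` doubles on each attempt. The sketch below lists the waits under the assumption that `attempt` starts at 1 (the loop header is not shown in this hunk) and that a retry only happens while `attempt < maxRetries`; `lockRetryDelayMs` is a hypothetical name used here for illustration.

```typescript
// Hypothetical helper mirroring the delay expression in executeGitCommandWithRetry.
function lockRetryDelayMs(attempt: number): number {
  // ** binds tighter than *, so this is (2 ** (attempt - 1)) * 1000
  return 2 ** (attempt - 1) * 1000;
}

const maxRetries = 5;

// Retries are only scheduled for attempts 1..maxRetries-1 (assumed 1-based)
const delays = Array.from({ length: maxRetries - 1 }, (_, i) => lockRetryDelayMs(i + 1));
const totalMs = delays.reduce((sum, d) => sum + d, 0);

console.log(delays); // [1000, 2000, 4000, 8000]
console.log(totalMs); // 15000
```

Under those assumptions, a persistently locked repository is retried for about 15 seconds in total before the error is raised.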
@@ -168,7 +162,7 @@ export async function executeGitCommandWithRetry(
export async function rollbackGitWorkspace(
sourceDir: string,
reason: string = 'retry preparation',
logger: ActivityLogger
logger: ActivityLogger,
): Promise<GitOperationResult> {
// Skip git operations if not a git repository
if (!(await isGitRepository(sourceDir))) {
@@ -180,16 +174,8 @@ export async function rollbackGitWorkspace(
try {
const changes = await getChangedFiles(sourceDir, 'status check for rollback');
await executeGitCommandWithRetry(
['git', 'reset', '--hard', 'HEAD'],
sourceDir,
'hard reset for rollback'
);
await executeGitCommandWithRetry(
['git', 'clean', '-fd'],
sourceDir,
'cleaning untracked files for rollback'
);
await executeGitCommandWithRetry(['git', 'reset', '--hard', 'HEAD'], sourceDir, 'hard reset for rollback');
await executeGitCommandWithRetry(['git', 'clean', '-fd'], sourceDir, 'cleaning untracked files for rollback');
logChangeSummary(
changes,
@@ -197,7 +183,7 @@ export async function rollbackGitWorkspace(
'Rollback completed - no changes to remove',
logger,
'info',
3
3,
);
return { success: true };
} catch (error) {
@@ -210,7 +196,7 @@ export async function rollbackGitWorkspace(
'filesystem',
false, // Non-retryable - rollback is best-effort cleanup
{ sourceDir, reason },
ErrorCode.GIT_ROLLBACK_FAILED
ErrorCode.GIT_ROLLBACK_FAILED,
),
};
}
@@ -221,7 +207,7 @@ export async function createGitCheckpoint(
sourceDir: string,
description: string,
attempt: number,
logger: ActivityLogger
logger: ActivityLogger,
): Promise<GitOperationResult> {
// Skip git operations if not a git repository
if (!(await isGitRepository(sourceDir))) {
@@ -248,7 +234,7 @@ export async function createGitCheckpoint(
await executeGitCommandWithRetry(
['git', 'commit', '-m', `📍 Checkpoint: ${description} (attempt ${attempt})`, '--allow-empty'],
sourceDir,
'creating commit'
'creating commit',
);
// 4. Log result
@@ -268,7 +254,7 @@ export async function createGitCheckpoint(
export async function commitGitSuccess(
sourceDir: string,
description: string,
logger: ActivityLogger
logger: ActivityLogger,
): Promise<GitOperationResult> {
// Skip git operations if not a git repository
if (!(await isGitRepository(sourceDir))) {
@@ -280,22 +266,18 @@ export async function commitGitSuccess(
try {
const changes = await getChangedFiles(sourceDir, 'status check for success commit');
await executeGitCommandWithRetry(
['git', 'add', '-A'],
sourceDir,
'staging changes for success commit'
);
await executeGitCommandWithRetry(['git', 'add', '-A'], sourceDir, 'staging changes for success commit');
await executeGitCommandWithRetry(
['git', 'commit', '-m', `${description}: completed successfully`, '--allow-empty'],
sourceDir,
'creating success commit'
'creating success commit',
);
logChangeSummary(
changes,
'Success commit created with {count} file changes:',
'Empty success commit created (agent made no file changes)',
logger
logger,
);
return { success: true };
} catch (error) {
@@ -11,13 +11,12 @@
* Services are pure domain logic with no Temporal dependencies.
*/
export { Container, getOrCreateContainer, removeContainer } from './container.js';
export type { ContainerDependencies } from './container.js';
export type { AgentExecutionInput } from './agent-execution.js';
export { AgentExecutionService } from './agent-execution.js';
export { ConfigLoaderService } from './config-loader.js';
export type { ContainerDependencies } from './container.js';
export { Container, getOrCreateContainer, removeContainer } from './container.js';
export { ExploitationCheckerService } from './exploitation-checker.js';
export { AgentExecutionService } from './agent-execution.js';
export type { AgentExecutionInput } from './agent-execution.js';
export { assembleFinalReport, injectModelIntoReport } from './reporting.js';
export { loadPrompt } from './prompt-manager.js';
export { assembleFinalReport, injectModelIntoReport } from './reporting.js';
@@ -15,24 +15,31 @@
* 1. Repository path exists and contains .git
* 2. Config file parses and validates (if provided)
* 3. Credentials validate via Claude Agent SDK query (API key, OAuth, Bedrock, Vertex AI, or router mode)
* 4. Target URL is reachable from the container (DNS + HTTP)
*/
import fs from 'fs/promises';
import { query } from '@anthropic-ai/claude-agent-sdk';
import { lookup } from 'node:dns/promises';
import fs from 'node:fs/promises';
import http from 'node:http';
import https from 'node:https';
import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
import { PentestError, isRetryableError } from './error-handling.js';
import { ErrorCode } from '../types/errors.js';
import { type Result, ok, err } from '../types/result.js';
import { parseConfig } from '../config-parser.js';
import { query } from '@anthropic-ai/claude-agent-sdk';
import { resolveModel } from '../ai/models.js';
import { parseConfig } from '../config-parser.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import { ErrorCode } from '../types/errors.js';
import { err, ok, type Result } from '../types/result.js';
import { isRetryableError, PentestError } from './error-handling.js';
const TARGET_URL_TIMEOUT_MS = 10_000;
function isLoopbackAddress(address: string): boolean {
return address === '127.0.0.1' || address === '::1' || address === '0.0.0.0';
}
// === Repository Validation ===
async function validateRepo(
repoPath: string,
logger: ActivityLogger
): Promise<Result<void, PentestError>> {
async function validateRepo(repoPath: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
logger.info('Checking repository path...', { repoPath });
// 1. Check repo directory exists
@@ -45,8 +52,8 @@ async function validateRepo(
'config',
false,
{ repoPath },
ErrorCode.REPO_NOT_FOUND
)
ErrorCode.REPO_NOT_FOUND,
),
);
}
} catch {
@@ -56,8 +63,8 @@ async function validateRepo(
'config',
false,
{ repoPath },
ErrorCode.REPO_NOT_FOUND
)
ErrorCode.REPO_NOT_FOUND,
),
);
}
@@ -71,8 +78,8 @@ async function validateRepo(
'config',
false,
{ repoPath },
ErrorCode.REPO_NOT_FOUND
)
ErrorCode.REPO_NOT_FOUND,
),
);
}
} catch {
@@ -82,8 +89,8 @@ async function validateRepo(
'config',
false,
{ repoPath },
ErrorCode.REPO_NOT_FOUND
)
ErrorCode.REPO_NOT_FOUND,
),
);
}
@@ -93,10 +100,7 @@ async function validateRepo(
// === Config Validation ===
async function validateConfig(
configPath: string,
logger: ActivityLogger
): Promise<Result<void, PentestError>> {
async function validateConfig(configPath: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
logger.info('Validating configuration file...', { configPath });
try {
@@ -114,8 +118,8 @@ async function validateConfig(
'config',
false,
{ configPath },
ErrorCode.CONFIG_VALIDATION_FAILED
)
ErrorCode.CONFIG_VALIDATION_FAILED,
),
);
}
}
@@ -123,43 +127,60 @@ async function validateConfig(
// === Credential Validation ===
/** Map SDK error type to a human-readable preflight PentestError. */
function classifySdkError(
sdkError: SDKAssistantMessageError,
authType: string
): Result<void, PentestError> {
function classifySdkError(sdkError: SDKAssistantMessageError, authType: string): Result<void, PentestError> {
switch (sdkError) {
case 'authentication_failed':
return err(new PentestError(
`Invalid ${authType}. Check your credentials in .env and try again.`,
'config', false, { authType, sdkError }, ErrorCode.AUTH_FAILED
));
return err(
new PentestError(
`Invalid ${authType}. Check your credentials in .env and try again.`,
'config',
false,
{ authType, sdkError },
ErrorCode.AUTH_FAILED,
),
);
case 'billing_error':
return err(new PentestError(
`Anthropic account has a billing issue. Add credits or check your billing dashboard.`,
'billing', true, { authType, sdkError }, ErrorCode.BILLING_ERROR
));
return err(
new PentestError(
`Anthropic account has a billing issue. Add credits or check your billing dashboard.`,
'billing',
true,
{ authType, sdkError },
ErrorCode.BILLING_ERROR,
),
);
case 'rate_limit':
return err(new PentestError(
`Anthropic rate limit or spending cap reached. Wait a few minutes and try again.`,
'billing', true, { authType, sdkError }, ErrorCode.BILLING_ERROR
));
return err(
new PentestError(
`Anthropic rate limit or spending cap reached. Wait a few minutes and try again.`,
'billing',
true,
{ authType, sdkError },
ErrorCode.BILLING_ERROR,
),
);
case 'server_error':
return err(new PentestError(
`Anthropic API is temporarily unavailable. Try again shortly.`,
'network', true, { authType, sdkError }
));
return err(
new PentestError(`Anthropic API is temporarily unavailable. Try again shortly.`, 'network', true, {
authType,
sdkError,
}),
);
default:
return err(new PentestError(
`${authType} validation failed unexpectedly. Check your credentials in .env.`,
'config', false, { authType, sdkError }, ErrorCode.AUTH_FAILED
));
return err(
new PentestError(
`${authType} validation failed unexpectedly. Check your credentials in .env.`,
'config',
false,
{ authType, sdkError },
ErrorCode.AUTH_FAILED,
),
);
}
}
/** Validate credentials via a minimal Claude Agent SDK query. */
async function validateCredentials(
logger: ActivityLogger
): Promise<Result<void, PentestError>> {
async function validateCredentials(logger: ActivityLogger): Promise<Result<void, PentestError>> {
// 1. Custom base URL — validate endpoint is reachable via SDK query
if (process.env.ANTHROPIC_BASE_URL) {
const baseUrl = process.env.ANTHROPIC_BASE_URL;
@@ -185,16 +206,22 @@ async function validateCredentials(
'network',
false,
{ baseUrl },
ErrorCode.AUTH_FAILED
)
ErrorCode.AUTH_FAILED,
),
);
}
}
// 2. Bedrock mode — validate required AWS credentials are present
if (process.env.CLAUDE_CODE_USE_BEDROCK === '1') {
const required = ['AWS_REGION', 'AWS_BEARER_TOKEN_BEDROCK', 'ANTHROPIC_SMALL_MODEL', 'ANTHROPIC_MEDIUM_MODEL', 'ANTHROPIC_LARGE_MODEL'];
const missing = required.filter(v => !process.env[v]);
const required = [
'AWS_REGION',
'AWS_BEARER_TOKEN_BEDROCK',
'ANTHROPIC_SMALL_MODEL',
'ANTHROPIC_MEDIUM_MODEL',
'ANTHROPIC_LARGE_MODEL',
];
const missing = required.filter((v) => !process.env[v]);
if (missing.length > 0) {
return err(
new PentestError(
@@ -202,8 +229,8 @@ async function validateCredentials(
'config',
false,
{ missing },
ErrorCode.AUTH_FAILED
)
ErrorCode.AUTH_FAILED,
),
);
}
logger.info('Bedrock credentials OK');
@@ -212,8 +239,14 @@ async function validateCredentials(
// 3. Vertex AI mode — validate required GCP credentials are present
if (process.env.CLAUDE_CODE_USE_VERTEX === '1') {
const required = ['CLOUD_ML_REGION', 'ANTHROPIC_VERTEX_PROJECT_ID', 'ANTHROPIC_SMALL_MODEL', 'ANTHROPIC_MEDIUM_MODEL', 'ANTHROPIC_LARGE_MODEL'];
const missing = required.filter(v => !process.env[v]);
const required = [
'CLOUD_ML_REGION',
'ANTHROPIC_VERTEX_PROJECT_ID',
'ANTHROPIC_SMALL_MODEL',
'ANTHROPIC_MEDIUM_MODEL',
'ANTHROPIC_LARGE_MODEL',
];
const missing = required.filter((v) => !process.env[v]);
if (missing.length > 0) {
return err(
new PentestError(
@@ -221,8 +254,8 @@ async function validateCredentials(
'config',
false,
{ missing },
ErrorCode.AUTH_FAILED
)
ErrorCode.AUTH_FAILED,
),
);
}
// Validate service account credentials file is accessible
@@ -234,8 +267,8 @@ async function validateCredentials(
'config',
false,
{},
ErrorCode.AUTH_FAILED
)
ErrorCode.AUTH_FAILED,
),
);
}
try {
@@ -247,8 +280,8 @@ async function validateCredentials(
'config',
false,
{ credPath },
ErrorCode.AUTH_FAILED
)
ErrorCode.AUTH_FAILED,
),
);
}
logger.info('Vertex AI credentials OK');
@@ -263,8 +296,8 @@ async function validateCredentials(
'config',
false,
{},
ErrorCode.AUTH_FAILED
)
ErrorCode.AUTH_FAILED,
),
);
}
@@ -296,8 +329,113 @@ async function validateCredentials(
retryable ? 'network' : 'config',
retryable,
{ authType },
retryable ? undefined : ErrorCode.AUTH_FAILED
)
retryable ? undefined : ErrorCode.AUTH_FAILED,
),
);
}
}
// === Target URL Validation ===
/** HTTP HEAD with TLS verification disabled — we check reachability, not certificate validity. */
function httpHead(url: string, timeoutMs: number): Promise<number> {
return new Promise((resolve, reject) => {
const parsed = new URL(url);
const isHttps = parsed.protocol === 'https:';
const transport = isHttps ? https : http;
const req = transport.request(
url,
{
method: 'HEAD',
timeout: timeoutMs,
...(isHttps && { rejectUnauthorized: false }),
},
(res) => {
res.resume();
resolve(res.statusCode ?? 0);
},
);
req.on('timeout', () => {
req.destroy();
reject(new Error(`Connection timed out after ${timeoutMs}ms`));
});
req.on('error', reject);
req.end();
});
}
/** Check that the target URL is reachable from inside the container. */
async function validateTargetUrl(targetUrl: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
logger.info('Checking target URL reachability...', { targetUrl });
// 1. Parse URL
let parsed: URL;
try {
parsed = new URL(targetUrl);
} catch {
return err(
new PentestError(
`Invalid target URL: ${targetUrl}`,
'config',
false,
{ targetUrl },
ErrorCode.TARGET_UNREACHABLE,
),
);
}
// 2. DNS lookup — detect loopback addresses early for a better hint
const hostname = parsed.hostname;
let resolvedAddress: string | undefined;
try {
const result = await lookup(hostname);
resolvedAddress = result.address;
} catch {
return err(
new PentestError(
`Target URL ${targetUrl} is not reachable. Verify the URL is correct and the site is up.`,
'network',
false,
{ targetUrl, hostname },
ErrorCode.TARGET_UNREACHABLE,
),
);
}
// 3. HTTP reachability check
try {
await httpHead(targetUrl, TARGET_URL_TIMEOUT_MS);
logger.info('Target URL OK');
return ok(undefined);
} catch (error) {
const isLoopback = isLoopbackAddress(resolvedAddress);
const detail = error instanceof Error ? error.message : String(error);
if (isLoopback) {
const suggestion = targetUrl.replace(hostname, 'host.docker.internal');
return err(
new PentestError(
`Target URL ${targetUrl} resolves to ${resolvedAddress} (loopback) and is not reachable. ` +
`For local services, use host.docker.internal instead of ${hostname} (e.g., ${suggestion})`,
'network',
false,
{ targetUrl, resolvedAddress, hostname },
ErrorCode.TARGET_UNREACHABLE,
),
);
}
return err(
new PentestError(
`Target URL ${targetUrl} is not reachable: ${detail}`,
'network',
false,
{ targetUrl, resolvedAddress },
ErrorCode.TARGET_UNREACHABLE,
),
);
}
}
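Note on the loopback hint above: `isLoopbackAddress` is called in `validateTargetUrl`, but its definition is not part of the visible hunks. The following is a minimal sketch of what such a helper might look like, not the project's actual implementation. `suggestDockerHost` is a hypothetical name; it shows one way to build the `host.docker.internal` suggestion through the `URL` API rather than a string replace, so only the host component is rewritten.

```typescript
/**
 * Hypothetical stand-in for the isLoopbackAddress helper referenced in the diff.
 * Assumption: it receives the address string returned by the DNS lookup.
 */
function isLoopbackAddress(address: string | undefined): boolean {
  if (!address) return false;
  const normalized = address.toLowerCase();
  // IPv4 loopback block 127.0.0.0/8
  if (/^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(normalized)) return true;
  // IPv6 loopback, plus IPv4-mapped IPv6 loopback (::ffff:127.x.x.x)
  return normalized === '::1' || /^::ffff:127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(normalized);
}

/**
 * Hypothetical helper: swap only the host component for host.docker.internal,
 * leaving scheme, port, path, and query untouched.
 */
function suggestDockerHost(targetUrl: string): string {
  const url = new URL(targetUrl);
  url.hostname = 'host.docker.internal';
  return url.toString();
}
```

With these definitions, `suggestDockerHost('http://localhost:3000/app')` keeps the port and path and changes only the host.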
@@ -310,13 +448,15 @@ async function validateCredentials(
* 1. Repository path exists and contains .git
* 2. Config file parses and validates (if configPath provided)
* 3. Credentials validate (API key, OAuth, or router mode)
* 4. Target URL is reachable from the container
*
* Returns on first failure.
*/
export async function runPreflightChecks(
targetUrl: string,
repoPath: string,
configPath: string | undefined,
logger: ActivityLogger
logger: ActivityLogger,
): Promise<Result<void, PentestError>> {
// 1. Repository check (free — filesystem only)
const repoResult = await validateRepo(repoPath, logger);
@@ -338,6 +478,12 @@ export async function runPreflightChecks(
return credResult;
}
// 4. Target URL reachability check (cheap — 1 HTTP round-trip)
const urlResult = await validateTargetUrl(targetUrl, logger);
if (!urlResult.ok) {
return urlResult;
}
logger.info('All preflight checks passed');
return ok(undefined);
}
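Note on the "returns on first failure" behavior of `runPreflightChecks`: the checks run in order from cheapest to most expensive, and the first failing `Result` is returned unchanged. The sketch below illustrates that pattern in isolation. The `Result`, `ok`, and `err` definitions here are simplified assumptions modeled on how the diff uses them, and `runChecksInOrder` is a hypothetical name, not a function from this PR.

```typescript
// Simplified stand-ins for the Result helpers used in the diff (assumed shape).
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };
const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

type Check = () => Promise<Result<void, Error>>;

/** Run checks sequentially and stop at the first one that fails. */
async function runChecksInOrder(checks: Check[]): Promise<Result<void, Error>> {
  for (const check of checks) {
    const result = await check();
    if (!result.ok) {
      return result; // later (more expensive) checks never run
    }
  }
  return ok(undefined);
}
```

Ordering the array as repository, config, credentials, then target URL would mirror the sequence documented in the doc comment above, so a missing `.git` directory fails before any network request is made.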

Some files were not shown because too many files have changed in this diff.