feat: add npx CLI with monorepo, CI/CD, and ephemeral worker architecture (#256)

* feat: integrate npx CLI, CI/CD, and ephemeral worker architecture

  Bring in changes from shannon-npx: npx-distributable CLI package (cli/), semantic-release CI/CD workflows, ephemeral per-scan worker containers, TOML config support, setup wizard, and workspace management. Preserves all shannon-only changes: security hardening (localhost-bound ports, MCP env allowlist, path traversal guard), updated benchmarks (XBEN 19/31/35/44), README assets, and prompt injection disclaimer. Applies security hardening to cli/infra/compose.yml as well.

* refactor: migrate to Turborepo + pnpm + Biome monorepo

  Restructure into apps/worker, apps/cli, packages/mcp-server with Turborepo task orchestration, pnpm workspaces, Biome linting/formatting, and tsdown CLI bundling. Key changes:
  - src/ -> apps/worker/src/, cli/ -> apps/cli/, mcp-server/ -> packages/mcp-server/
  - prompts/ and configs/ moved into apps/worker/
  - npm replaced with pnpm, package-lock.json replaced with pnpm-lock.yaml
  - Dockerfile updated for pnpm-based builds
  - CLI logs command rewritten with chokidar for cross-platform reliability
  - Router health checking added for auto-detected router mode
  - Centralized path resolution via apps/worker/src/paths.ts

* fix: resolve all biome warnings and formatting issues

  - Remove unnecessary non-null assertions where values are guaranteed
  - Replace array index access with .at() for safer element retrieval
  - Use local variables to avoid repeated process.env lookups
  - Replace any types with unknown in functional utilities
  - Use nullish coalescing for TOTP hash byte access
  - Auto-format security patches to match biome config

* fix: pin pnpm to 10.12.1 in Dockerfile for catalog support

* fix: handle Esc cancellation in Bedrock setup flow

  Replace p.group() with individual prompts and per-field cancel checks, matching the pattern used by all other provider setup flows.

* feat: add optional model customization to Anthropic setup

* fix: resolve Docker bind mount permission errors on Linux

  Use entrypoint-based UID remapping instead of --user flag so the container's pentest user matches the host UID/GID, keeping bind-mounted volumes writable. Git config moved to --system level to survive remapping.

* fix: show resumed workflow ID in splash screen URL

  When resuming a workflow, the Temporal Web UI link pointed to the old (terminated) workflow ID. Now extracts "New Workflow ID" from the resume header in workflow.log, falling back to the original ID for fresh scans.

* style: fix biome formatting in docker.ts

* fix: align TypeScript config types with JSON Schema

  - SuccessCondition.type: use schema values (url_contains, element_present, url_equals_exactly, text_contains) instead of stale values (url, cookie, element, redirect)
  - Authentication.login_flow: mark optional to match schema which does not require it

* feat: mark GitHub release as latest during rollback

* fix: use native ARM64 runners for Docker multi-platform builds

  Replace QEMU emulation with parallel native builds using a matrix strategy (ubuntu-latest for amd64, ubuntu-24.04-arm for arm64). Each platform pushes by digest, then a merge job creates the multi-arch manifest list before signing with cosign.

* fix: resolve SessionMutex race condition with 3+ concurrent waiters

* fix: skip POSIX permission check on Windows

  writeFileSync mode option is ignored on Windows, so config.toml gets 0o666 and the guard rejects it.

* fix: resolve unsubstituted placeholders in report prompt

  Remove unused {{GITHUB_URL}} placeholder and wire up {{AUTH_CONTEXT}} with structured auth context (login type, username, URL, MFA status).

* fix: remove duplicate environment gate from merge-docker job

  Move DOCKERHUB_USERNAME from vars to secrets so merge-docker can access credentials without its own environment scope. This eliminates the redundant double approval since build-docker already gates on release-publish.

* fix: replace POSIX sleep binary with cross-platform async sleep

  execFileSync('sleep') is unavailable on Windows. Use node:timers/promises setTimeout instead, making ensureInfra async.

* fix: use session.json for workflow ID on resume instead of parsing workflow.log

  On resume, workflow.log already exists with stale headers from the previous run. The CLI poll found '====' immediately and extracted the old workflow ID, producing a wrong Temporal Web UI URL. Read the workflow ID from session.json instead; the worker writes resume attempts there atomically. For fresh runs, poll until originalWorkflowId appears. For resumes, poll until a new resumeAttempts entry is appended.

* feat: add custom base URL support for Anthropic-compatible proxies

  Support ANTHROPIC_BASE_URL + ANTHROPIC_AUTH_TOKEN to route SDK requests through LiteLLM or any Anthropic-compatible proxy. Adds TUI wizard option, TOML config mapping, credential validation, and preflight endpoint reachability check via SDK query.

* fix: remove environment gates and add NPM_TOKEN to publish step

* feat: add beta release and rollback workflows with cosign signing

* fix: remove redundant checkout and pnpm steps from beta release workflow

* docs: normalize README commands to mode-neutral shorthand

  Add a substitution note after Quick Start sections so all subsequent examples use bare `shannon` instead of mixing `./shannon` and `npx @keygraph/shannon`. Mode-specific commands (build, update, uninstall) get inline annotations. Also fixes a broken command in the Custom Base URL section.

* fix: remove redundant `update` command

  Image is already auto-pulled by `ensureImage()` during `start` when the pinned version tag is missing locally. Manual `update` was unnecessary.

* docs: add CLI package README stub

* docs: update README setup instructions for dual CLI modes

* docs: update announcement banner to npx availability

* feat: migrate from MCP tools to CLI based tools (#252)

* feat: migrate from MCP tools to CLI tools

* fix: restore browser action emoji formatters for CLI output

  Adapt formatBrowserAction for playwright-cli commands, replacing the old mcp__playwright__browser_* tool name matching removed during migration.

* fix: mount credential file to fixed container path for Vertex AI

  GOOGLE_APPLICATION_CREDENTIALS was forwarded as-is to the container, causing the relative host path to resolve against the repo mount instead of the credentials mount. Now both local and npx modes mount the resolved file to /app/credentials/google-sa-key.json and rewrite the env var to match.

* feat: add git awareness and optional description field to config

* fix: drop redundant --ipc host flag from worker container

* fix: align announcement banner URL with main branch

* feat: add target URL reachability preflight check (#254)

* Moving asset benchmark graph image to this folder

* Move benchmark results to benchmark repo

  Windows Defender flags exploit code in the pentest reports as false positives, forcing every Windows user to add a Defender exclusion just to clone Shannon.

* Updated README

* fix: case-insensitive grep for semantic-release version probe

* fix: harden supply chain security (#255)

* fix: patch smol-toml and tsdown vulnerabilities

  Update smol-toml 1.6.0→1.6.1 (DoS via recursive comment parsing) and tsdown 0.21.2→0.21.5 (picomatch ReDoS + method injection).

* fix: pin all unpinned dependency versions in Dockerfile

  Pins subfinder v2.13.0, WhatWeb v0.6.3 (switched from git clone to release tarball), schemathesis 4.13.0, addressable 2.8.9, claude-code 2.1.84, and playwright-cli 0.1.1 for reproducible builds.

* fix: pin GitHub Actions to commit SHAs for supply chain security

* fix: pin GitHub Actions to commit SHAs in beta and rollback workflows
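
Illustrative note on "fix: skip POSIX permission check on Windows": the message says the guard rejected config.toml because Windows ignores the `writeFileSync` mode option and the file ends up as 0o666. A minimal sketch of what such a platform-aware guard might look like follows. The function name `isConfigModeAcceptable` and the 0o077 mask are assumptions for illustration, not the code in this commit.

```typescript
// Hypothetical sketch, not the actual guard from apps/cli.
// `mode` is a raw stat mode (may include file-type bits, e.g. 0o100600);
// `platform` is a value such as process.platform ("win32", "linux", "darwin").
function isConfigModeAcceptable(mode: number, platform: string): boolean {
  // Per the commit message, POSIX mode bits are not honored on Windows
  // (the file shows up as 0o666), so the check is skipped there.
  if (platform === "win32") {
    return true;
  }
  // Assumed policy on POSIX: reject any group/other permission bits,
  // which accepts 0o600 and rejects 0o644 or 0o666.
  return (mode & 0o077) === 0;
}
```

In a real CLI this would presumably be called with the `mode` from `fs.statSync(path)` and `process.platform`. Taking the platform as a parameter keeps the check pure and easy to unit-test on any host.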
2026-05-22 16:49:46 +02:00 · 2026-03-27 02:34:29 +05:30
parent 0d172f5e32
commit bc8fd203ed
4058 changed files with 7774 additions and 1189080 deletions
@@ -22,21 +22,21 @@ You are debugging an issue. Follow this structured approach to avoid spinning in
 **Session audit logs:**
 ```bash
 # Find most recent session
-ls -lt audit-logs/ | head -5
+ls -lt workspaces/ | head -5

 # Check session metrics and errors
-cat audit-logs/<session>/session.json | jq '.errors, .agentMetrics'
+cat workspaces/<session>/session.json | jq '.errors, .agentMetrics'

 # Check agent execution logs
-ls -lt audit-logs/<session>/agents/
-cat audit-logs/<session>/agents/<latest>.log
+ls -lt workspaces/<session>/agents/
+cat workspaces/<session>/agents/<latest>.log
 ```

 ## Step 3: Trace the Call Path

 For Shannon, trace through these layers:

-1. **Temporal Client** → `src/temporal/client.ts` - Workflow initiation
+1. **Worker + Client** → `src/temporal/worker.ts` - Combined worker + workflow submission
 2. **Workflow** → `src/temporal/workflows.ts` - Pipeline orchestration
 3. **Activities** → `src/temporal/activities.ts` - Thin wrappers: heartbeat, error classification
 4. **Container** → `src/services/container.ts` - Per-workflow DI
@@ -72,7 +72,7 @@ For Shannon, trace through these layers:
 npx playwright install chromium

 # Check MCP server startup (look for connection errors)
-grep -i "mcp\|playwright" audit-logs/<session>/agents/*.log
+grep -i "mcp\|playwright" workspaces/<session>/agents/*.log
 ```

 **Git State Issues:**
@@ -46,6 +46,14 @@ temp/
 ehthumbs.db
 Thumbs.db

+# CLI package (runs on host, not in container)
+# Keep apps/cli/package.json so pnpm workspaces resolve
+apps/cli/src/
+apps/cli/dist/
+apps/cli/infra/
+apps/cli/tsconfig.json
+apps/cli/tsdown.config.ts
+
 # Docker files (avoid recursive copying)
 Dockerfile*
 docker-compose*.yml
@@ -71,7 +71,7 @@ ANTHROPIC_API_KEY=your-api-key-here
 # CLAUDE_CODE_USE_VERTEX=1
 # CLOUD_ML_REGION=us-east5
 # ANTHROPIC_VERTEX_PROJECT_ID=your-gcp-project-id
-# GOOGLE_APPLICATION_CREDENTIALS=./credentials/gcp-sa-key.json
+# GOOGLE_APPLICATION_CREDENTIALS=./credentials/google-sa-key.json

 # =============================================================================
 # Available Models
@@ -69,7 +69,7 @@ body:

        Issues without this information may be difficult to triage.

-        - Check the audit logs at: `./audit-logs/target_url_shannon-123/workflow.log`
+        - Check the logs at: `./workspaces/target_url_shannon-123/workflow.log`
          Use `grep` or search to identify errors.
          Paste the relevant error output below.
        - Temporal:
@@ -19,7 +19,7 @@ jobs:

    steps:
      - name: Setup Node.js
-        uses: actions/setup-node@v6
+        uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
        with:
          node-version: 24
          registry-url: https://registry.npmjs.org
@@ -61,20 +61,20 @@ jobs:

    steps:
      - name: Checkout
-        uses: actions/checkout@v6
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v4
+        uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0

      - name: Log in to Docker Hub
-        uses: docker/login-action@v4
+        uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push by digest
        id: build
-        uses: docker/build-push-action@v7
+        uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
        with:
          context: .
          platforms: ${{ matrix.platform }}
@@ -89,7 +89,7 @@ jobs:
          touch "/tmp/digests/${digest#sha256:}"

      - name: Upload digest
-        uses: actions/upload-artifact@v6
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
        with:
          name: digests-${{ matrix.platform == 'linux/amd64' && 'amd64' || 'arm64' }}
          path: /tmp/digests/*
@@ -108,17 +108,17 @@ jobs:

    steps:
      - name: Download digests
-        uses: actions/download-artifact@v6
+        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
        with:
          path: /tmp/digests
          pattern: digests-*
          merge-multiple: true

      - name: Set up Docker Buildx
-        uses: docker/setup-buildx-action@v4
+        uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0

      - name: Log in to Docker Hub
-        uses: docker/login-action@v4
+        uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
@@ -138,7 +138,7 @@ jobs:
          echo "digest=$DIGEST" >> "$GITHUB_OUTPUT"

      - name: Install cosign
-        uses: sigstore/cosign-installer@v4.1.0
+        uses: sigstore/cosign-installer@ba7bc0a3fef59531c69a25acd34668d6d3fe6f22 # v4.1.0

      - name: Sign Docker image
        run: cosign sign --yes "keygraph/shannon@${{ steps.inspect.outputs.digest }}"
@@ -161,13 +161,13 @@ jobs:

    steps:
      - name: Checkout
-        uses: actions/checkout@v6
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install pnpm
-        uses: pnpm/action-setup@v4
+        uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0

      - name: Configure npm registry
-        uses: actions/setup-node@v6
+        uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
        with:
          node-version: 24
          registry-url: https://registry.npmjs.org
@@ -0,0 +1,241 @@
+name: Release
+
+on:
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+concurrency:
+  group: release-main
+  cancel-in-progress: false
+
+jobs:
+  preflight:
+    name: Preflight
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+    outputs:
+      should_release: ${{ steps.probe.outputs.should_release }}
+      version: ${{ steps.probe.outputs.version }}
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          fetch-depth: 0
+
+      - name: Install pnpm
+        uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
+
+      - name: Setup Node.js
+        uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
+        with:
+          node-version: 24
+          cache: 'pnpm'
+
+      - name: Install dependencies
+        run: pnpm install --frozen-lockfile
+
+      - name: Probe semantic-release
+        id: probe
+        shell: bash
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          set -euo pipefail
+
+          npx semantic-release@25 --dry-run --no-ci 2>&1 | tee semantic-release.log
+
+          if grep -qi "the next release version is" semantic-release.log; then
+            echo "should_release=true" >> "$GITHUB_OUTPUT"
+            VERSION=$(grep -oiE "the next release version is [0-9]+\.[0-9]+\.[0-9]+" semantic-release.log | grep -oE "[0-9]+\.[0-9]+\.[0-9]+")
+            echo "version=$VERSION" >> "$GITHUB_OUTPUT"
+          else
+            echo "should_release=false" >> "$GITHUB_OUTPUT"
+          fi
+
+  build-docker:
+    name: Build Docker (${{ matrix.platform }})
+    needs: preflight
+    if: needs.preflight.outputs.should_release == 'true'
+    permissions:
+      contents: read
+    strategy:
+      fail-fast: true
+      matrix:
+        include:
+          - platform: linux/amd64
+            runner: ubuntu-latest
+          - platform: linux/arm64
+            runner: ubuntu-24.04-arm
+    runs-on: ${{ matrix.runner }}
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
+
+      - name: Log in to Docker Hub
+        uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      - name: Build and push by digest
+        id: build
+        uses: docker/build-push-action@d08e5c354a6adb9ed34480a06d141179aa583294 # v7.0.0
+        with:
+          context: .
+          platforms: ${{ matrix.platform }}
+          provenance: mode=max
+          sbom: true
+          outputs: type=image,name=keygraph/shannon,push-by-digest=true,name-canonical=true,push=true
+
+      - name: Export digest
+        run: |
+          mkdir -p /tmp/digests
+          digest="${{ steps.build.outputs.digest }}"
+          touch "/tmp/digests/${digest#sha256:}"
+
+      - name: Upload digest
+        uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
+        with:
+          name: digests-${{ matrix.platform == 'linux/amd64' && 'amd64' || 'arm64' }}
+          path: /tmp/digests/*
+          if-no-files-found: error
+          retention-days: 1
+
+  merge-docker:
+    name: Push Docker manifests
+    needs: [preflight, build-docker]
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      id-token: write
+    outputs:
+      digest: ${{ steps.inspect.outputs.digest }}
+
+    steps:
+      - name: Download digests
+        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
+        with:
+          path: /tmp/digests
+          pattern: digests-*
+          merge-multiple: true
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
+
+      - name: Log in to Docker Hub
+        uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      - name: Create manifest list and push
+        working-directory: /tmp/digests
+        run: |
+          docker buildx imagetools create \
+            --tag "keygraph/shannon:${{ needs.preflight.outputs.version }}" \
+            --tag "keygraph/shannon:latest" \
+            $(printf 'keygraph/shannon@sha256:%s ' *)
+
+      - name: Inspect image
+        id: inspect
+        run: |
+          docker buildx imagetools inspect "keygraph/shannon:${{ needs.preflight.outputs.version }}"
+          DIGEST="sha256:$(docker buildx imagetools inspect --raw "keygraph/shannon:${{ needs.preflight.outputs.version }}" | sha256sum | cut -d' ' -f1)"
+          echo "digest=$DIGEST" >> "$GITHUB_OUTPUT"
+
+      - name: Install cosign
+        uses: sigstore/cosign-installer@ba7bc0a3fef59531c69a25acd34668d6d3fe6f22 # v4.1.0
+
+      - name: Sign Docker image
+        run: cosign sign --yes "keygraph/shannon@${{ steps.inspect.outputs.digest }}"
+
+      - name: Verify Docker image signature
+        run: |
+          sleep 10
+          cosign verify \
+            --certificate-oidc-issuer https://token.actions.githubusercontent.com \
+            --certificate-identity https://github.com/${{ github.repository }}/.github/workflows/release.yml@${{ github.ref }} \
+            "keygraph/shannon@${{ steps.inspect.outputs.digest }}"
+
+  publish-npm:
+    name: Publish npm
+    needs: [preflight, merge-docker]
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      id-token: write
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+
+      - name: Install pnpm
+        uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
+
+      - name: Configure npm registry
+        uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
+        with:
+          node-version: 24
+          registry-url: https://registry.npmjs.org
+          cache: 'pnpm'
+
+      - name: Install dependencies
+        run: pnpm install --frozen-lockfile
+
+      - name: Set CLI package version
+        run: cd apps/cli && npm version "${{ needs.preflight.outputs.version }}" --no-git-tag-version --allow-same-version
+
+      - name: Sync lockfile with bumped version
+        run: pnpm install --lockfile-only
+
+      - name: Build CLI
+        run: pnpm --filter @keygraph/shannon run build
+
+      - name: Publish npm package
+        working-directory: apps/cli
+        env:
+          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
+        run: |
+          if npm view "@keygraph/shannon@${{ needs.preflight.outputs.version }}" version 2>/dev/null; then
+            echo "Version already published, skipping"
+          else
+            pnpm publish --access public --no-git-checks
+          fi
+
+  release:
+    name: Create GitHub release
+    needs: [preflight, publish-npm]
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+
+    steps:
+      - name: Checkout
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          fetch-depth: 0
+
+      - name: Install pnpm
+        uses: pnpm/action-setup@fc06bc1257f339d1d5d8b3a19a8cae5388b55320 # v4.4.0
+
+      - name: Setup Node.js
+        uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
+        with:
+          node-version: 24
+          cache: 'pnpm'
+
+      - name: Install dependencies
+        run: pnpm install --frozen-lockfile
+
+      - name: Create GitHub release
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: npx semantic-release@25
@@ -38,7 +38,7 @@ jobs:
          echo "version=$VERSION" >> "$GITHUB_OUTPUT"

      - name: Setup Node.js
-        uses: actions/setup-node@v6
+        uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
        with:
          node-version: 24
          registry-url: https://registry.npmjs.org
@@ -0,0 +1,129 @@
+name: Rollback
+
+on:
+  workflow_dispatch:
+    inputs:
+      version:
+        description: "Version to move npm latest and Docker latest to (example: 1.4.2)"
+        required: true
+        type: string
+
+permissions:
+  contents: write
+
+concurrency:
+  group: rollback-latest-${{ github.event.inputs.version }}
+  cancel-in-progress: false
+
+jobs:
+  rollback:
+    name: Roll back npm, Docker, and GitHub release latest
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout tags
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+        with:
+          fetch-depth: 0
+
+      - name: Fetch all tags
+        run: git fetch --force --tags
+
+      - name: Validate target version
+        id: target
+        shell: bash
+        env:
+          RAW_VERSION: ${{ inputs.version }}
+        run: |
+          set -euo pipefail
+
+          VERSION="${RAW_VERSION#v}"
+
+          case "$VERSION" in
+            ''|*[!0-9.]*)
+              echo "Invalid version: $VERSION"
+              exit 1
+              ;;
+          esac
+
+          if ! [[ "$VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
+            echo "Version must be in semver format X.Y.Z"
+            exit 1
+          fi
+
+          if ! git rev-parse "refs/tags/v$VERSION" >/dev/null 2>&1; then
+            echo "Git tag v$VERSION does not exist"
+            exit 1
+          fi
+
+          echo "version=$VERSION" >> "$GITHUB_OUTPUT"
+
+      - name: Setup Node.js
+        uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0
+        with:
+          node-version: 24
+          registry-url: https://registry.npmjs.org
+
+      - name: Verify npm package version exists
+        run: npm view "@keygraph/shannon@${{ steps.target.outputs.version }}" version
+
+      - name: Show current npm dist-tags
+        env:
+          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
+        run: npm dist-tag ls @keygraph/shannon
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@4d04d5d9486b7bd6fa91e7baf45bbb4f8b9deedd # v4.0.0
+
+      - name: Log in to Docker Hub
+        uses: docker/login-action@b45d80f862d83dbcd57f89517bcf500b2ab88fb2 # v4.0.0
+        with:
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      - name: Verify Docker image tag exists
+        run: docker buildx imagetools inspect "keygraph/shannon:${{ steps.target.outputs.version }}"
+
+      - name: Install cosign
+        uses: sigstore/cosign-installer@ba7bc0a3fef59531c69a25acd34668d6d3fe6f22 # v4.1.0
+
+      - name: Verify Docker image signature before rollback
+        run: |
+          cosign verify \
+            --certificate-oidc-issuer https://token.actions.githubusercontent.com \
+            --certificate-identity "https://github.com/${{ github.repository }}/.github/workflows/release.yml@refs/heads/main" \
+            "keygraph/shannon:${{ steps.target.outputs.version }}"
+
+      - name: Move Docker latest
+        run: |
+          docker buildx imagetools create \
+            --tag "keygraph/shannon:latest" \
+            "keygraph/shannon:${{ steps.target.outputs.version }}"
+
+      - name: Move npm latest
+        env:
+          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
+        run: npm dist-tag add "@keygraph/shannon@${{ steps.target.outputs.version }}" latest
+
+      - name: Mark GitHub release as latest
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: gh release edit "v${{ steps.target.outputs.version }}" --latest
+
+      - name: Show final npm dist-tags
+        env:
+          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
+        run: npm dist-tag ls @keygraph/shannon
+
+      - name: Verify Docker latest now points to target
+        run: docker buildx imagetools inspect "keygraph/shannon:latest"
+
+      - name: Write summary
+        run: |
+          {
+            echo "## Rollback latest"
+            echo ""
+            echo "- Target version: \`${{ steps.target.outputs.version }}\`"
+            echo "- npm package: \`@keygraph/shannon\`"
+            echo "- Docker image: \`keygraph/shannon\`"
+            echo "- GitHub release: \`v${{ steps.target.outputs.version }}\` marked as latest"
+          } >> "$GITHUB_STEP_SUMMARY"
@@ -1,6 +1,7 @@
 node_modules/
 .env
-audit-logs/
+workspaces/
 credentials/
 dist/
 repos/
+.turbo/
@@ -0,0 +1,2 @@
+auto-install-peers=true
+strict-peer-dependencies=false
@@ -0,0 +1,21 @@
+{
+  "branches": ["main"],
+  "plugins": [
+    "@semantic-release/commit-analyzer",
+    "@semantic-release/release-notes-generator",
+    [
+      "@semantic-release/npm",
+      {
+        "npmPublish": false
+      }
+    ],
+    [
+      "@semantic-release/github",
+      {
+        "successCommentCondition": false,
+        "failCommentCondition": false,
+        "releasedLabels": false
+      }
+    ]
+  ]
+}
@@ -4,60 +4,137 @@ AI-powered penetration testing agent for defensive security analysis. Automates

 ## Commands

-**Prerequisites:** Docker, Anthropic API key in `.env`
+**Prerequisites:** Docker, AI provider credentials (`.env` for local, `shn setup` or env vars for npx)
+
+### Dual CLI
+
+Shannon supports two CLI modes, auto-detected based on the current working directory:
+
+| | **npx** (`npx @keygraph/shannon`) | **Local** (`./shannon`) |
+|---|---|---|
+| **Install** | Zero-install via npm | Clone the repo |
+| **Image** | Pulled from Docker Hub (`keygraph/shannon:latest`) | Built locally (`shannon-worker`) |
+| **State** | `~/.shannon/` | Project directory |
+| **Credentials** | `~/.shannon/config.toml` (via `shn setup`) or env vars | `./.env` |
+| **Config** | `~/.shannon/config.toml` (via `shn setup`) | N/A |
+| **Prompts** | Bundled in Docker image | Mounted from `./apps/worker/prompts/` (live-editable) |
+
+Mode auto-detection: local mode activates when env var `SHANNON_LOCAL=1` is set by the `./shannon` entry point (`apps/cli/src/mode.ts`). Otherwise npx mode.
+
+### npx Quick Start
+
+```bash
+# Configure credentials (interactive wizard)
+npx @keygraph/shannon setup
+
+# Or export env vars directly (non-interactive / CI)
+export ANTHROPIC_API_KEY=your-key
+
+# Run
+npx @keygraph/shannon start -u <url> -r /path/to/repo
+```
+
+### Local (Development) Quick Start

 ```bash
 # Setup
-cp .env.example .env && edit .env  # Set ANTHROPIC_API_KEY
+echo "ANTHROPIC_API_KEY=your-key" > .env

-# Prepare repo (REPO is a folder name inside ./repos/, not an absolute path)
-git clone https://github.com/org/repo.git ./repos/my-repo
-# or symlink: ln -s /path/to/existing/repo ./repos/my-repo
+# Build (auto-runs if image missing)
+./shannon build

 # Run
-./shannon start URL=<url> REPO=my-repo
-./shannon start URL=<url> REPO=my-repo CONFIG=./configs/my-config.yaml
+./shannon start -u <url> -r my-repo
+./shannon start -u <url> -r my-repo -c ./apps/worker/configs/my-config.yaml
+./shannon start -u <url> -r /any/path/to/repo
+```
+
+### Common Commands
+
+```bash
+# Setup (npx mode only — one-time credential configuration)
+npx @keygraph/shannon setup

 # Workspaces & Resume
-./shannon start URL=<url> REPO=my-repo WORKSPACE=my-audit    # New named workspace
-./shannon start URL=<url> REPO=my-repo WORKSPACE=my-audit    # Resume (same command)
-./shannon start URL=<url> REPO=my-repo WORKSPACE=<auto-name> # Resume auto-named run
-./shannon workspaces                                          # List all workspaces
+./shannon start -u <url> -r my-repo -w my-audit    # New named workspace
+./shannon start -u <url> -r my-repo -w my-audit    # Resume (same command)
+./shannon workspaces                                 # List all workspaces

 # Monitor
-./shannon logs                      # Real-time worker logs
+./shannon logs <workspace>            # Tail workflow log
+./shannon status                      # Show running workers
 # Temporal Web UI: http://localhost:8233

 # Stop
-./shannon stop                      # Preserves workflow data
-./shannon stop CLEAN=true           # Full cleanup including volumes
+./shannon stop                        # Preserves workflow data
+./shannon stop --clean                # Full cleanup including volumes (confirms first)

-# Build
-npm run build
+# Image management
+./shannon build [--no-cache]          # Local mode: build worker image
+npx @keygraph/shannon uninstall             # npx mode: remove ~/.shannon/ (confirms first)
+
+# Build TypeScript (development)
+pnpm run build                       # Build all packages via Turborepo
+pnpm run check                       # Type-check all packages
+pnpm biome                           # Biome lint + format + import sorting check
+pnpm biome:fix                       # Auto-fix lint, format, and import sorting
 ```

-**Options:** `CONFIG=<file>` (YAML config), `OUTPUT=<path>` (default: `./audit-logs/`), `WORKSPACE=<name>` (named workspace; auto-resumes if exists), `PIPELINE_TESTING=true` (minimal prompts, 10s retries), `REBUILD=true` (force Docker rebuild), `ROUTER=true` (multi-model routing via [claude-code-router](https://github.com/musistudio/claude-code-router))
+**Monorepo tooling:** pnpm workspaces, Turborepo for task orchestration, Biome for linting/formatting. TypeScript compiler options shared via `tsconfig.base.json` at the root. All packages extend it, overriding only `rootDir` and `outDir`. Shared devDependencies (`typescript`, `@types/node`, `turbo`, `@biomejs/biome`) are hoisted to the root workspace.
+
+**Options:** `-c <file>` (YAML config), `-o <path>` (output directory), `-w <name>` (named workspace; auto-resumes if exists), `--pipeline-testing` (minimal prompts, 10s retries), `--router` (multi-model routing via [claude-code-router](https://github.com/musistudio/claude-code-router))

 ## Architecture

-### Core Modules
- `src/session-manager.ts` — Agent definitions (`AGENTS` record). Agent types in `src/types/agents.ts`
- `src/config-parser.ts` — YAML config parsing with JSON Schema validation
- `src/ai/claude-executor.ts` — Claude Agent SDK integration with retry logic
- `src/services/` — Business logic layer (Temporal-agnostic). Activities delegate here. Key: `agent-execution.ts`, `error-handling.ts`, `container.ts`
- `src/types/` — Consolidated types: `Result<T,E>`, `ErrorCode`, `AgentName`, `ActivityLogger`, etc.
- `src/utils/` — Shared utilities (file I/O, formatting, concurrency)
+### Monorepo Layout
+
+```
+apps/cli/        — @keygraph/shannon (published to npm, bundled with tsdown)
+apps/worker/     — @shannon/worker (private, Temporal worker + pipeline logic)
+```
+
+### CLI Package (`apps/cli/`)
+Published as `@keygraph/shannon` on npm. Contains only Docker orchestration logic — no Temporal SDK, business logic, or prompts. Bundled with tsdown for single-file ESM output.
+
+- `apps/cli/src/index.ts` — CLI dispatcher (`setup`, `start`, `stop`, `logs`, `workspaces`, `status`, `build`, `uninstall`, `info`)
+- `apps/cli/src/mode.ts` — Auto-detection: local mode if `SHANNON_LOCAL=1` env var is set
+- `apps/cli/src/docker.ts` — Compose lifecycle, image pull/build, ephemeral `docker run` worker spawning
+- `apps/cli/src/home.ts` — State directory management (`~/.shannon/` for npx, `./` for local)
+- `apps/cli/src/env.ts` — `.env` loading, TOML fallback (npx only) via `apps/cli/src/config/resolver.ts`, credential validation, env flag building
+- `apps/cli/src/config/resolver.ts` — Cascading config (npx only): env vars → `~/.shannon/config.toml` (parsed with `smol-toml`)
+- `apps/cli/src/config/writer.ts` — TOML serialization and secure file persistence (0o600)
+- `apps/cli/src/commands/setup.ts` — Interactive TUI wizard (`@clack/prompts`) for provider credential setup (npx only)
+- `apps/cli/src/paths.ts` — Repo/config path resolution (bare name → `./repos/<name>`, or any absolute/relative path)
+- `apps/cli/src/commands/` — Command handlers
+- `apps/cli/infra/compose.yml` — Bundled Temporal + router compose file for npx mode
+- `apps/cli/tsdown.config.ts` — tsdown bundler config
+- `shannon` — Node.js entry point (`#!/usr/bin/env node`) that delegates to `apps/cli/dist/index.mjs`
+
+### Docker Architecture
+Infra (Temporal + router) runs via `docker-compose.yml`. Workers are ephemeral `docker run --rm` containers, one per scan, each with a unique task queue and isolated volume mounts.
+
+- `docker-compose.yml` — Infra only: `shannon-temporal` (port 7233/8233) and `shannon-router` (port 3456, optional via profile). Network: `shannon-net`
+- `Dockerfile` — 2-stage build (builder + Chainguard Wolfi runtime). Uses pnpm. Entrypoint: `/app/entrypoint.sh`; default command: `CMD ["node", "apps/worker/dist/temporal/worker.js"]`
+- No `docker-compose.docker.yml` — host gateway handled via `--add-host` flag in CLI
+
+### Worker Package (`apps/worker/`)
+- `apps/worker/src/paths.ts` — Centralized path constants (`PROMPTS_DIR`, `CONFIGS_DIR`, `WORKSPACES_DIR`)
+- `apps/worker/src/session-manager.ts` — Agent definitions (`AGENTS` record). Agent types in `apps/worker/src/types/agents.ts`
+- `apps/worker/src/config-parser.ts` — YAML config parsing with JSON Schema validation
+- `apps/worker/src/ai/claude-executor.ts` — Claude Agent SDK integration with retry logic
+- `apps/worker/src/services/` — Business logic layer (Temporal-agnostic). Activities delegate here. Key: `agent-execution.ts`, `error-handling.ts`, `container.ts`
+- `apps/worker/src/types/` — Consolidated types: `Result<T,E>`, `ErrorCode`, `AgentName`, `ActivityLogger`, etc.
+- `apps/worker/src/utils/` — Shared utilities (file I/O, formatting, concurrency)

 ### Temporal Orchestration
 Durable workflow orchestration with crash recovery, queryable progress, intelligent retry, and parallel execution (5 concurrent agents in vuln/exploit phases).

- `src/temporal/workflows.ts` — Main workflow (`pentestPipelineWorkflow`)
- `src/temporal/activities.ts` — Thin wrappers — heartbeat loop, error classification, container lifecycle. Business logic delegated to `src/services/`
- `src/temporal/activity-logger.ts` — `TemporalActivityLogger` implementation of `ActivityLogger` interface
- `src/temporal/summary-mapper.ts` — Maps `PipelineSummary` to `WorkflowSummary`
- `src/temporal/worker.ts` — Worker entry point
- `src/temporal/client.ts` — CLI client for starting workflows
- `src/temporal/shared.ts` — Types, interfaces, query definitions
+- `apps/worker/src/temporal/workflows.ts` — Main workflow (`pentestPipelineWorkflow`)
+- `apps/worker/src/temporal/activities.ts` — Thin wrappers — heartbeat loop, error classification, container lifecycle. Business logic delegated to `apps/worker/src/services/`
+- `apps/worker/src/temporal/activity-logger.ts` — `TemporalActivityLogger` implementation of `ActivityLogger` interface
+- `apps/worker/src/temporal/summary-mapper.ts` — Maps `PipelineSummary` to `WorkflowSummary`
+- `apps/worker/src/temporal/worker.ts` — Combined worker + client entry point (per-invocation task queue, submits workflow, waits for result)
+- `apps/worker/src/temporal/shared.ts` — Types, interfaces, query definitions
 ### Five-Phase Pipeline

 1. **Pre-Recon** (`pre-recon`) — External scans (nmap, subfinder, whatweb) + source code analysis
@@ -67,39 +144,43 @@ Durable workflow orchestration with crash recovery, queryable progress, intellig
 5. **Reporting** (`report`) — Executive-level security report

 ### Supporting Systems
- **Configuration** — YAML configs in `configs/` with JSON Schema validation (`config-schema.json`). Supports auth settings, MFA/TOTP, and per-app testing parameters
- **Prompts** — Per-phase templates in `prompts/` with variable substitution (`{{TARGET_URL}}`, `{{CONFIG_CONTEXT}}`). Shared partials in `prompts/shared/` via `src/services/prompt-manager.ts`
- **SDK Integration** — Uses `@anthropic-ai/claude-agent-sdk` with `maxTurns: 10_000` and `bypassPermissions` mode. Playwright MCP for browser automation, TOTP generation via MCP tool. Login flow template at `prompts/shared/login-instructions.txt` supports form, SSO, API, and basic auth
- **Audit System** — Crash-safe append-only logging in `audit-logs/{hostname}_{sessionId}/`. Tracks session metrics, per-agent logs, prompts, and deliverables. WorkflowLogger (`audit/workflow-logger.ts`) provides unified human-readable per-workflow logs, backed by LogStream (`audit/log-stream.ts`) shared stream primitive
- **Deliverables** — Saved to `deliverables/` in the target repo via the `save_deliverable` MCP tool
- **Workspaces & Resume** — Named workspaces via `WORKSPACE=<name>` or auto-named from URL+timestamp. Resume passes `--workspace` to the Temporal client (`src/temporal/client.ts`), which loads `session.json` to detect completed agents. `loadResumeState()` in `src/temporal/activities.ts` validates deliverable existence, restores git checkpoints, and cleans up incomplete deliverables. Workspace listing via `src/temporal/workspaces.ts`
+- **Configuration** — YAML configs in `apps/worker/configs/` with JSON Schema validation (`config-schema.json`). Supports auth settings, MFA/TOTP, and per-app testing parameters. Credential resolution — local mode: env vars → `./.env`; npx mode: env vars → `~/.shannon/config.toml` (via `npx @keygraph/shannon setup`)
+- **Prompts** — Per-phase templates in `apps/worker/prompts/` with variable substitution (`{{TARGET_URL}}`, `{{CONFIG_CONTEXT}}`). Shared partials in `apps/worker/prompts/shared/` via `apps/worker/src/services/prompt-manager.ts`
+- **SDK Integration** — Uses `@anthropic-ai/claude-agent-sdk` with `maxTurns: 10_000` and `bypassPermissions` mode. Browser automation via `playwright-cli` with session isolation (`-s=<session>`). TOTP generation via `generate-totp` CLI tool. Login flow template at `apps/worker/prompts/shared/login-instructions.txt` supports form, SSO, API, and basic auth
+- **Audit System** — Crash-safe append-only logging in `workspaces/{hostname}_{sessionId}/`. Tracks session metrics, per-agent logs, prompts, and deliverables. WorkflowLogger (`apps/worker/src/audit/workflow-logger.ts`) provides unified human-readable per-workflow logs, backed by LogStream (`apps/worker/src/audit/log-stream.ts`) shared stream primitive
+- **Deliverables** — Saved to `deliverables/` in the target repo via the `save-deliverable` CLI script (`apps/worker/src/scripts/save-deliverable.ts`)
+- **Workspaces & Resume** — Named workspaces via `-w <name>` or auto-named from URL+timestamp. Resume detects completed agents via `session.json`. `loadResumeState()` in `apps/worker/src/temporal/activities.ts` validates deliverable existence, restores git checkpoints, and cleans up incomplete deliverables. Workspace listing via `apps/worker/src/temporal/workspaces.ts`

 ## Development Notes

 ### Adding a New Agent
-1. Define agent in `src/session-manager.ts` (add to `AGENTS` record). `ALL_AGENTS`/`AgentName` types live in `src/types/agents.ts`
-2. Create prompt template in `prompts/` (e.g., `vuln-newtype.txt`)
-3. Two-layer pattern: add a thin activity wrapper in `src/temporal/activities.ts` (heartbeat + error classification). `AgentExecutionService` in `src/services/agent-execution.ts` handles the agent lifecycle automatically via the `AGENTS` registry
-4. Register activity in `src/temporal/workflows.ts` within the appropriate phase
+1. Define agent in `apps/worker/src/session-manager.ts` (add to `AGENTS` record). `ALL_AGENTS`/`AgentName` types live in `apps/worker/src/types/agents.ts`
+2. Create prompt template in `apps/worker/prompts/` (e.g., `vuln-newtype.txt`)
+3. Two-layer pattern: add a thin activity wrapper in `apps/worker/src/temporal/activities.ts` (heartbeat + error classification). `AgentExecutionService` in `apps/worker/src/services/agent-execution.ts` handles the agent lifecycle automatically via the `AGENTS` registry
+4. Register activity in `apps/worker/src/temporal/workflows.ts` within the appropriate phase

 ### Modifying Prompts
 - Variable substitution: `{{TARGET_URL}}`, `{{CONFIG_CONTEXT}}`, `{{LOGIN_INSTRUCTIONS}}`
- Shared partials in `prompts/shared/` included via `src/services/prompt-manager.ts`
- Test with `PIPELINE_TESTING=true` for fast iteration
+- Shared partials in `apps/worker/prompts/shared/` included via `apps/worker/src/services/prompt-manager.ts`
+- Test with `--pipeline-testing` for fast iteration

 ### Key Design Patterns
 - **Configuration-Driven** — YAML configs with JSON Schema validation
 - **Progressive Analysis** — Each phase builds on previous results
 - **SDK-First** — Claude Agent SDK handles autonomous analysis
 - **Modular Error Handling** — `ErrorCode` enum, `Result<T,E>` for explicit error propagation, automatic retry (3 attempts per agent)
- **Services Boundary** — Activities are thin Temporal wrappers; `src/services/` owns business logic, accepts `ActivityLogger`, returns `Result<T,E>`. No Temporal imports in services
- **DI Container** — Per-workflow in `src/services/container.ts`. `AuditSession` excluded (parallel safety)
+- **Services Boundary** — Activities are thin Temporal wrappers; `apps/worker/src/services/` owns business logic, accepts `ActivityLogger`, returns `Result<T,E>`. No Temporal imports in services
+- **DI Container** — Per-workflow in `apps/worker/src/services/container.ts`. `AuditSession` excluded (parallel safety)
+- **Ephemeral Workers** — Each scan runs in its own `docker run --rm` container with a per-invocation task queue. Temporal routes activities by queue name, so per-scan queues ensure activities never land on a worker with the wrong repo mounted

 ### Security
 Defensive security tool only. Use only on systems you own or have explicit permission to test.

 ## Code Style Guidelines

+### Formatting
+Biome handles formatting and linting. Run `pnpm biome:fix` to auto-fix. Config in `biome.json`: single quotes, semicolons, trailing commas, 2-space indent, 120 char line width.
+
 ### Clarity Over Brevity
 - Optimize for readability, not line count — three clear lines beat one dense expression
 - Use descriptive names that convey intent
@@ -142,18 +223,22 @@ Comments must be **timeless** — no references to this conversation, refactorin

 ## Key Files

-**Entry Points:** `src/temporal/workflows.ts`, `src/temporal/activities.ts`, `src/temporal/worker.ts`, `src/temporal/client.ts`
+**CLI:** `shannon` (entry point), `apps/cli/src/index.ts` (dispatcher), `apps/cli/src/docker.ts` (orchestration), `apps/cli/src/mode.ts` (auto-detection)

-**Core Logic:** `src/session-manager.ts`, `src/ai/claude-executor.ts`, `src/config-parser.ts`, `src/services/`, `src/audit/`
+**Entry Points:** `apps/worker/src/temporal/workflows.ts`, `apps/worker/src/temporal/activities.ts`, `apps/worker/src/temporal/worker.ts`

-**Config:** `shannon` (CLI), `docker-compose.yml`, `configs/`, `prompts/`
+**Core Logic:** `apps/worker/src/session-manager.ts`, `apps/worker/src/ai/claude-executor.ts`, `apps/worker/src/config-parser.ts`, `apps/worker/src/services/`, `apps/worker/src/audit/`
+
+**Config:** `docker-compose.yml`, `apps/cli/infra/compose.yml`, `apps/worker/configs/`, `apps/worker/prompts/`, `tsconfig.base.json` (shared compiler options), `turbo.json`, `biome.json`
+
+**CI/CD:** `.github/workflows/release.yml` (Docker Hub push + npm publish + GitHub release, manual dispatch)

 ## Troubleshooting

- **"Repository not found"** — `REPO` must be a folder name inside `./repos/`, not an absolute path. Clone or symlink your repo there first: `ln -s /path/to/repo ./repos/my-repo`
+- **"Repository not found"** — Pass a bare name (`-r my-repo`) for `./repos/my-repo`, or a path (`-r /path/to/repo`) for any directory
 - **"Temporal not ready"** — Wait for health check or `docker compose logs temporal`
- **Worker not processing** — Check `docker compose ps`
- **Reset state** — `./shannon stop CLEAN=true`
+- **Worker not processing** — Check `docker ps --filter "name=shannon-worker-"`
+- **Reset state** — `./shannon stop --clean`
 - **Local apps unreachable** — Use `host.docker.internal` instead of `localhost`
- **Missing tools** — Use `PIPELINE_TESTING=true` to skip nmap/subfinder/whatweb (graceful degradation)
+- **Missing tools** — Use `--pipeline-testing` to skip nmap/subfinder/whatweb (graceful degradation)
 - **Container permissions** — On Linux, may need `sudo` for docker commands
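
The repo argument rule described in the CLAUDE.md changes above (bare name maps to `./repos/<name>`, anything else is treated as a path) can be sketched roughly as follows. This is an illustration only, not the contents of `apps/cli/src/paths.ts`: the function name `resolveRepoPath` is hypothetical, and the test for what counts as a "bare name" (no path separator, no leading dot) is an assumption.

```typescript
import path from 'node:path';

/**
 * Hypothetical helper: resolve the value passed to `-r` to a directory.
 * Assumption: a "bare name" contains no path separator and does not start
 * with a dot; every other value is treated as an absolute or relative path.
 */
export function resolveRepoPath(input: string, cwd: string = process.cwd()): string {
  const isBareName = !input.includes('/') && !input.includes(path.sep) && !input.startsWith('.');

  if (isBareName) {
    // Bare name: look inside ./repos/ under the working directory
    return path.resolve(cwd, 'repos', input);
  }

  // Path: path.resolve keeps absolute paths as-is and anchors relative ones at cwd
  return path.resolve(cwd, input);
}
```

Under these assumptions, `-r my-repo` would point at `./repos/my-repo`, while `-r /path/to/repo` or `-r ./some/dir` would be used as given.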
@@ -38,17 +38,38 @@ ENV CGO_ENABLED=1
 RUN mkdir -p $GOPATH/bin

 # Install Go-based security tools
-RUN go install -v github.com/projectdiscovery/subfinder/v2/cmd/subfinder@latest
-# Install WhatWeb from GitHub (Ruby-based tool)
-RUN git clone --depth 1 https://github.com/urbanadventurer/WhatWeb.git /opt/whatweb && \
+RUN go install -v github.com/projectdiscovery/subfinder/v2/cmd/subfinder@v2.13.0
+# Install WhatWeb from release tarball (Ruby-based tool)
+RUN curl -sL https://github.com/urbanadventurer/WhatWeb/archive/refs/tags/v0.6.3.tar.gz | tar xz -C /opt && \
+    mv /opt/WhatWeb-0.6.3 /opt/whatweb && \
    chmod +x /opt/whatweb/whatweb && \
-    gem install addressable && \
+    gem install addressable -v 2.8.9 && \
    echo '#!/bin/bash' > /usr/local/bin/whatweb && \
    echo 'cd /opt/whatweb && exec ./whatweb "$@"' >> /usr/local/bin/whatweb && \
    chmod +x /usr/local/bin/whatweb

 # Install Python-based tools
-RUN pip3 install --no-cache-dir schemathesis
+RUN pip3 install --no-cache-dir schemathesis==4.13.0
+
+# Install pnpm
+RUN npm install -g pnpm@10.12.1
+
+# Build Node.js application in builder to avoid QEMU emulation failures in CI
+WORKDIR /app
+
+# Copy workspace manifests for install layer caching
+COPY package.json pnpm-workspace.yaml pnpm-lock.yaml .npmrc ./
+COPY apps/worker/package.json ./apps/worker/
+COPY apps/cli/package.json ./apps/cli/
+
+RUN pnpm install --frozen-lockfile
+
+COPY . .
+
+# Build worker. CLI not needed in Docker
+RUN pnpm --filter @shannon/worker run build
+
+RUN pnpm prune --prod

 # Runtime stage - Minimal production image
 FROM cgr.dev/chainguard/wolfi-base:latest AS runtime
@@ -95,67 +116,64 @@ COPY --from=builder /opt/whatweb /opt/whatweb
 COPY --from=builder /usr/local/bin/whatweb /usr/local/bin/whatweb

 # Install WhatWeb Ruby dependencies in runtime stage
-RUN gem install addressable
+RUN gem install addressable -v 2.8.9

 # Copy Python packages from builder
 COPY --from=builder /usr/lib/python3.*/site-packages /usr/lib/python3.12/site-packages
 COPY --from=builder /usr/bin/schemathesis /usr/bin/

-# Create non-root user for security
+# Create non-root user
 RUN addgroup -g 1001 pentest && \
    adduser -u 1001 -G pentest -s /bin/bash -D pentest

+# System-level git config (survives UID remapping in entrypoint)
+RUN git config --system user.email "agent@localhost" && \
+    git config --system user.name "Pentest Agent" && \
+    git config --system --add safe.directory '*'
+
 # Set working directory
 WORKDIR /app

-# Copy package files first for better caching
-COPY package*.json ./
-COPY mcp-server/package*.json ./mcp-server/
+# Copy only what the worker needs (skip CLI source, infra, tsdown artifacts)
+COPY --from=builder /app/package.json /app/pnpm-workspace.yaml /app/pnpm-lock.yaml /app/.npmrc /app/
+COPY --from=builder /app/node_modules /app/node_modules
+COPY --from=builder /app/apps/worker /app/apps/worker
+COPY --from=builder /app/apps/cli/package.json /app/apps/cli/package.json

-# Install Node.js dependencies (including devDependencies for TypeScript build)
-RUN npm ci && \
-    cd mcp-server && npm ci && cd .. && \
-    npm cache clean --force
+RUN npm install -g @anthropic-ai/claude-code@2.1.84 @playwright/cli@0.1.1
+RUN mkdir -p /tmp/.claude/skills && \
+    playwright-cli install --skills && \
+    cp -r .claude/skills/playwright-cli /tmp/.claude/skills/ && \
+    rm -rf .claude

-# Copy application source code
-COPY . .
-
-# Build TypeScript (mcp-server first, then main project)
-RUN cd mcp-server && npm run build && cd .. && npm run build
-
-# Remove devDependencies after build to reduce image size
-RUN npm prune --production && \
-    cd mcp-server && npm prune --production
-
-RUN npm install -g @anthropic-ai/claude-code
+# Symlink CLI tools onto PATH
+RUN ln -s /app/apps/worker/dist/scripts/save-deliverable.js /usr/local/bin/save-deliverable && \
+    chmod +x /app/apps/worker/dist/scripts/save-deliverable.js && \
+    ln -s /app/apps/worker/dist/scripts/generate-totp.js /usr/local/bin/generate-totp && \
+    chmod +x /app/apps/worker/dist/scripts/generate-totp.js

 # Create directories for session data and ensure proper permissions
-RUN mkdir -p /app/sessions /app/deliverables /app/repos /app/configs && \
+RUN mkdir -p /app/sessions /app/deliverables /app/repos /app/workspaces && \
    mkdir -p /tmp/.cache /tmp/.config /tmp/.npm && \
    chmod 777 /app && \
    chmod 777 /tmp/.cache && \
    chmod 777 /tmp/.config && \
    chmod 777 /tmp/.npm && \
-    chown -R pentest:pentest /app
+    chown -R pentest:pentest /app /tmp/.claude

-# Switch to non-root user
-USER pentest
+COPY entrypoint.sh /app/entrypoint.sh
+RUN chmod +x /app/entrypoint.sh

 # Set environment variables
 ENV NODE_ENV=production
 ENV PATH="/usr/local/bin:$PATH"
 ENV SHANNON_DOCKER=true
 ENV PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1
-ENV PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH=/usr/bin/chromium-browser
+ENV PLAYWRIGHT_MCP_EXECUTABLE_PATH=/usr/bin/chromium-browser
 ENV npm_config_cache=/tmp/.npm
 ENV HOME=/tmp
 ENV XDG_CACHE_HOME=/tmp/.cache
 ENV XDG_CONFIG_HOME=/tmp/.config

-# Configure Git identity and trust all directories
-RUN git config --global user.email "agent@localhost" && \
-    git config --global user.name "Pentest Agent" && \
-    git config --global --add safe.directory '*'
-
-# Set entrypoint
-ENTRYPOINT ["node", "dist/shannon.js"]
+ENTRYPOINT ["/app/entrypoint.sh"]
+CMD ["node", "apps/worker/dist/temporal/worker.js"]
@@ -0,0 +1,3 @@
+src/
+tsconfig.json
+node_modules/
@@ -0,0 +1,22 @@
+<div align="center">
+
+<img src="https://raw.githubusercontent.com/KeygraphHQ/shannon/main/assets/github-banner.png" alt="Shannon — AI Pentester for Web Applications and APIs" width="100%">
+
+# Shannon — AI Pentester by Keygraph
+
+Shannon is an autonomous, white-box AI pentester for web applications and APIs. <br />
+It analyzes your source code, identifies attack vectors, and executes real exploits to prove vulnerabilities before they reach production.
+
+---
+
+<a href="https://github.com/KeygraphHQ/shannon/discussions/categories/announcements"><img src="https://raw.githubusercontent.com/KeygraphHQ/shannon/main/assets/announcements.png" height="40" alt="Announcements"></a>
+<a href="https://discord.gg/9ZqQPuhJB7"><img src="https://raw.githubusercontent.com/KeygraphHQ/shannon/main/assets/discord.png" height="40" alt="Join Discord"></a>
+<a href="https://keygraph.io/"><img src="https://raw.githubusercontent.com/KeygraphHQ/shannon/main/assets/Keygraph_Button.png" height="40" alt="Visit Keygraph.io"></a>
+<a href="https://www.linkedin.com/company/keygraph/"><img src="https://raw.githubusercontent.com/KeygraphHQ/shannon/main/assets/linkedin.png" height="40" alt="Follow Us on LinkedIn"></a>
+
+---
+
+**Full README and usage guide**  
+[https://github.com/KeygraphHQ/shannon#readme](https://github.com/KeygraphHQ/shannon#readme)
+
+</div>
@@ -0,0 +1,50 @@
+networks:
+  default:
+    name: shannon-net
+
+services:
+  temporal:
+    image: temporalio/temporal:latest
+    container_name: shannon-temporal
+    command: ["server", "start-dev", "--db-filename", "/home/temporal/temporal.db", "--ip", "0.0.0.0"]
+    ports:
+      - "127.0.0.1:7233:7233"
+      - "127.0.0.1:8233:8233"
+    volumes:
+      - temporal-data:/home/temporal
+    healthcheck:
+      test: ["CMD", "temporal", "operator", "cluster", "health", "--address", "localhost:7233"]
+      interval: 10s
+      timeout: 5s
+      retries: 10
+      start_period: 30s
+
+  router:
+    image: node:20-slim
+    container_name: shannon-router
+    profiles: ["router"]
+    command: >
+      sh -c "apt-get update && apt-get install -y gettext-base &&
+             npm install -g @musistudio/claude-code-router &&
+             mkdir -p /root/.claude-code-router &&
+             envsubst < /config/router-config.json > /root/.claude-code-router/config.json &&
+             ccr start"
+    ports:
+      - "127.0.0.1:3456:3456"
+    volumes:
+      - ./router-config.json:/config/router-config.json:ro
+    environment:
+      - HOST=0.0.0.0
+      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
+      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
+      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY:-}
+      - ROUTER_DEFAULT=${ROUTER_DEFAULT:-openai,gpt-4o}
+    healthcheck:
+      test: ["CMD", "node", "-e", "require('http').get('http://localhost:3456/health', r => process.exit(r.statusCode === 200 ? 0 : 1)).on('error', () => process.exit(1))"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+      start_period: 30s
+
+volumes:
+  temporal-data:
@@ -19,9 +19,7 @@
      "name": "openrouter",
      "api_base_url": "https://openrouter.ai/api/v1/chat/completions",
      "api_key": "$OPENROUTER_API_KEY",
-      "models": [
-        "google/gemini-3-flash-preview"
-      ],
+      "models": ["google/gemini-3-flash-preview"],
      "transformer": {
        "use": ["openrouter"]
      }
@@ -0,0 +1,50 @@
+{
+  "name": "@keygraph/shannon",
+  "version": "0.0.0",
+  "description": "Shannon - Autonomous white-box AI pentester for web applications and APIs by Keygraph",
+  "type": "module",
+  "main": "dist/index.mjs",
+  "bin": {
+    "shannon": "dist/index.mjs"
+  },
+  "files": [
+    "dist",
+    "infra"
+  ],
+  "scripts": {
+    "build": "tsdown",
+    "check": "tsc --noEmit",
+    "clean": "rm -rf dist"
+  },
+  "dependencies": {
+    "@clack/prompts": "^1.1.0",
+    "chokidar": "^5.0.0",
+    "dotenv": "^17.3.1",
+    "smol-toml": "^1.6.1"
+  },
+  "keywords": [
+    "security",
+    "pentest",
+    "penetration-testing",
+    "vulnerability-assessment",
+    "ai",
+    "white-box",
+    "owasp",
+    "exploitation",
+    "appsec",
+    "keygraph"
+  ],
+  "author": "",
+  "license": "AGPL-3.0-only",
+  "repository": {
+    "type": "git",
+    "url": "git+https://github.com/KeygraphHQ/shannon.git",
+    "directory": "apps/cli"
+  },
+  "engines": {
+    "node": ">=18"
+  },
+  "devDependencies": {
+    "tsdown": "^0.21.5"
+  }
+}
@@ -0,0 +1,19 @@
+/**
+ * `shannon build` command — build the worker Docker image locally.
+ * Only available in local mode (running from cloned repository).
+ */
+
+import { buildImage } from '../docker.js';
+import { isLocal } from '../mode.js';
+
+export function build(noCache: boolean): void {
+  if (!isLocal()) {
+    console.error('ERROR: Build is only available when running from the Shannon repository');
+    console.error('  (Dockerfile not found in current directory)');
+    console.error('');
+    console.error('For npx usage, run: shannon update');
+    process.exit(1);
+  }
+
+  buildImage(noCache);
+}
@@ -0,0 +1,106 @@
+/**
+ * `shannon logs` command — tail a workspace's workflow log.
+ *
+ * Uses chokidar for reliable cross-platform file watching and
+ * bounded synchronous reads to prevent duplicate output.
+ */
+
+import fs from 'node:fs';
+import path from 'node:path';
+import { watch } from 'chokidar';
+import { getWorkspacesDir } from '../home.js';
+
+// Match the exact line the worker writes — anchored to prevent false positives from agent output
+const COMPLETION_PATTERN = /^Workflow (COMPLETED|FAILED)$/m;
+
+/** Read a byte range from a file and return it as a UTF-8 string. */
+function readRange(filePath: string, start: number, end: number): string {
+  const length = end - start;
+  const buffer = Buffer.alloc(length);
+  const fd = fs.openSync(filePath, 'r');
+  try {
+    fs.readSync(fd, buffer, 0, length, start);
+  } finally {
+    fs.closeSync(fd);
+  }
+  return buffer.toString('utf-8');
+}
+
+/** Resolve a workspace ID to its workflow.log path, or exit with an error. */
+function resolveLogFile(workspaceId: string): string {
+  const workspacesDir = getWorkspacesDir();
+
+  // 1. Direct match
+  const directPath = path.join(workspacesDir, workspaceId, 'workflow.log');
+  if (fs.existsSync(directPath)) return directPath;
+
+  // 2. Resume workflow ID (e.g. workspace_resume_123)
+  const resumeBase = workspaceId.replace(/_resume_\d+$/, '');
+  if (resumeBase !== workspaceId) {
+    const resumePath = path.join(workspacesDir, resumeBase, 'workflow.log');
+    if (fs.existsSync(resumePath)) return resumePath;
+  }
+
+  // 3. Named workspace ID (e.g. workspace_shannon-123)
+  const namedBase = workspaceId.replace(/_shannon-\d+$/, '');
+  if (namedBase !== workspaceId) {
+    const namedPath = path.join(workspacesDir, namedBase, 'workflow.log');
+    if (fs.existsSync(namedPath)) return namedPath;
+  }
+
+  console.error(`ERROR: Workflow log not found for: ${workspaceId}`);
+  console.error('');
+  console.error('Possible causes:');
+  console.error("  - Workflow hasn't started yet");
+  console.error('  - Workspace ID is incorrect');
+  console.error('');
+  console.error('Check the Temporal Web UI at http://localhost:8233 for workflow details');
+  process.exit(1);
+}
+
+export function logs(workspaceId: string): void {
+  const logFile = resolveLogFile(workspaceId);
+  let position = 0;
+
+  /**
+   * Output any new content appended since the last read.
+   * Returns true when the workflow completion marker is detected.
+   */
+  function flush(): boolean {
+    try {
+      const { size } = fs.statSync(logFile);
+      if (size <= position) return false;
+
+      const data = readRange(logFile, position, size);
+      process.stdout.write(data);
+      position = size;
+
+      return COMPLETION_PATTERN.test(data);
+    } catch {
+      // File deleted or unreadable — treat as done
+      return true;
+    }
+  }
+
+  console.log(`Tailing workflow log: ${logFile}`);
+
+  // 1. Output existing content
+  if (flush()) {
+    process.exit(0);
+  }
+
+  // 2. Watch for appended content via chokidar
+  const watcher = watch(logFile, { persistent: true });
+
+  const shutdown = (): void => {
+    watcher.close().finally(() => process.exit(0));
+    // Safety net — force exit if watcher.close() stalls
+    setTimeout(() => process.exit(0), 1000).unref();
+  };
+
+  watcher.on('change', () => {
+    if (flush()) shutdown();
+  });
+
+  process.on('SIGINT', shutdown);
+}
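
The bounded-read approach used by `logs` above (track a byte offset, read only the range between the old offset and the current file size) can be isolated into a small sketch. The name `createTailReader` is hypothetical and not part of the CLI; the sketch only shows why repeated change events do not produce duplicate output.

```typescript
import fs from 'node:fs';

/** Read bytes [start, end) of a file and return them as a UTF-8 string. */
function readRange(filePath: string, start: number, end: number): string {
  const length = end - start;
  const buffer = Buffer.alloc(length);
  const fd = fs.openSync(filePath, 'r');
  try {
    fs.readSync(fd, buffer, 0, length, start);
  } finally {
    fs.closeSync(fd);
  }
  return buffer.toString('utf-8');
}

/**
 * Hypothetical helper: returns a function that yields only the content
 * appended since the previous call, or '' when nothing new was written.
 */
export function createTailReader(filePath: string): () => string {
  let position = 0;

  return () => {
    const { size } = fs.statSync(filePath);
    // Nothing appended since the last read
    if (size <= position) return '';

    const data = readRange(filePath, position, size);
    // Advance the offset so the same bytes are never emitted twice
    position = size;
    return data;
  };
}
```

Each call returns at most the bytes between the stored offset and the file size at that moment, so calling it from every watcher event is safe. A read that happens to end in the middle of a multi-byte UTF-8 character would decode imperfectly; this sketch does not handle that case.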
@@ -0,0 +1,350 @@
+/**
+ * `shannon setup` — interactive TUI wizard for one-time credential configuration.
+ *
+ * Walks the user through selecting a provider and entering credentials,
+ * then persists everything to ~/.shannon/config.toml with 0o600 permissions.
+ */
+
+import fs from 'node:fs';
+import os from 'node:os';
+import path from 'node:path';
+import * as p from '@clack/prompts';
+import { type ShannonConfig, saveConfig } from '../config/writer.js';
+
+const SHANNON_HOME = path.join(os.homedir(), '.shannon');
+
+type Provider = 'anthropic' | 'custom_base_url' | 'bedrock' | 'vertex' | 'router';
+
+export async function setup(): Promise<void> {
+  p.intro('Shannon Setup');
+
+  // 1. Select provider
+  const provider = await p.select({
+    message: 'Select your AI provider',
+    options: [
+      { value: 'anthropic' as const, label: 'Claude Direct', hint: 'recommended' },
+      { value: 'custom_base_url' as const, label: 'Custom Base URL', hint: 'proxies, gateways' },
+      { value: 'bedrock' as const, label: 'Claude via AWS Bedrock' },
+      { value: 'vertex' as const, label: 'Claude via Google Vertex AI' },
+      { value: 'router' as const, label: 'Router', hint: 'experimental' },
+    ],
+  });
+  if (p.isCancel(provider)) return cancelAndExit();
+
+  const config = await setupProvider(provider as Provider);
+
+  // 2. Save config
+  saveConfig(config);
+
+  const configPath = path.join(SHANNON_HOME, 'config.toml');
+  p.log.success(`Configuration saved to ${configPath}`);
+  p.outro('Run `npx @keygraph/shannon start` to begin a scan.');
+}
+
+async function setupProvider(provider: Provider): Promise<ShannonConfig> {
+  switch (provider) {
+    case 'anthropic':
+      return setupAnthropic();
+    case 'custom_base_url':
+      return setupCustomBaseUrl();
+    case 'bedrock':
+      return setupBedrock();
+    case 'vertex':
+      return setupVertex();
+    case 'router':
+      return setupRouter();
+  }
+}
+
+// === Provider Setup Flows ===
+
+async function setupAnthropic(): Promise<ShannonConfig> {
+  const authMethod = await p.select({
+    message: 'Authentication method',
+    options: [
+      { value: 'api_key' as const, label: 'API Key' },
+      { value: 'oauth' as const, label: 'OAuth Token' },
+    ],
+  });
+  if (p.isCancel(authMethod)) return cancelAndExit();
+
+  const config: ShannonConfig = {};
+
+  if (authMethod === 'oauth') {
+    const token = await promptSecret('Enter your OAuth token');
+    config.anthropic = { oauth_token: token };
+  } else {
+    const apiKey = await promptSecret('Enter your Anthropic API key');
+    config.anthropic = { api_key: apiKey };
+  }
+
+  const customizeModels = await p.confirm({
+    message:
+      'Do you want to change the default models?\n' +
+      '    Small  - claude-haiku-4-5-20251001\n' +
+      '    Medium - claude-sonnet-4-6\n' +
+      '    Large  - claude-opus-4-6',
+    initialValue: false,
+  });
+  if (p.isCancel(customizeModels)) return cancelAndExit();
+
+  if (customizeModels) {
+    const small = await p.text({
+      message: 'Small model ID',
+      initialValue: 'claude-haiku-4-5-20251001',
+      validate: required('Small model ID is required'),
+    });
+    if (p.isCancel(small)) return cancelAndExit();
+
+    const medium = await p.text({
+      message: 'Medium model ID',
+      initialValue: 'claude-sonnet-4-6',
+      validate: required('Medium model ID is required'),
+    });
+    if (p.isCancel(medium)) return cancelAndExit();
+
+    const large = await p.text({
+      message: 'Large model ID',
+      initialValue: 'claude-opus-4-6',
+      validate: required('Large model ID is required'),
+    });
+    if (p.isCancel(large)) return cancelAndExit();
+
+    config.models = { small, medium, large };
+  }
+
+  return config;
+}
+
+async function setupCustomBaseUrl(): Promise<ShannonConfig> {
+  const baseUrl = await p.text({
+    message: 'Endpoint URL',
+    placeholder: 'https://your-proxy.example.com',
+    validate: (value) => {
+      if (!value) return 'Endpoint URL is required';
+      try {
+        new URL(value);
+      } catch {
+        return 'Must be a valid URL';
+      }
+      return undefined;
+    },
+  });
+  if (p.isCancel(baseUrl)) return cancelAndExit();
+
+  const authToken = await promptSecret('Enter the auth token for the custom endpoint');
+
+  const config: ShannonConfig = {
+    custom_base_url: { base_url: baseUrl, auth_token: authToken },
+  };
+
+  const customizeModels = await p.confirm({
+    message:
+      'Do you want to change the default models?\n' +
+      '    Small  - claude-haiku-4-5-20251001\n' +
+      '    Medium - claude-sonnet-4-6\n' +
+      '    Large  - claude-opus-4-6',
+    initialValue: false,
+  });
+  if (p.isCancel(customizeModels)) return cancelAndExit();
+
+  if (customizeModels) {
+    const small = await p.text({
+      message: 'Small model ID',
+      initialValue: 'claude-haiku-4-5-20251001',
+      validate: required('Small model ID is required'),
+    });
+    if (p.isCancel(small)) return cancelAndExit();
+
+    const medium = await p.text({
+      message: 'Medium model ID',
+      initialValue: 'claude-sonnet-4-6',
+      validate: required('Medium model ID is required'),
+    });
+    if (p.isCancel(medium)) return cancelAndExit();
+
+    const large = await p.text({
+      message: 'Large model ID',
+      initialValue: 'claude-opus-4-6',
+      validate: required('Large model ID is required'),
+    });
+    if (p.isCancel(large)) return cancelAndExit();
+
+    config.models = { small, medium, large };
+  }
+
+  return config;
+}
+
+async function setupBedrock(): Promise<ShannonConfig> {
+  const region = await p.text({
+    message: 'AWS Region',
+    placeholder: 'us-east-1',
+    validate: required('AWS Region is required'),
+  });
+  if (p.isCancel(region)) return cancelAndExit();
+
+  const token = await promptSecret('Enter your AWS Bearer Token');
+
+  const small = await p.text({
+    message: 'Small model ID',
+    placeholder: 'us.anthropic.claude-haiku-4-5-20251001-v1:0',
+    validate: required('Small model ID is required'),
+  });
+  if (p.isCancel(small)) return cancelAndExit();
+
+  const medium = await p.text({
+    message: 'Medium model ID',
+    placeholder: 'us.anthropic.claude-sonnet-4-6',
+    validate: required('Medium model ID is required'),
+  });
+  if (p.isCancel(medium)) return cancelAndExit();
+
+  const large = await p.text({
+    message: 'Large model ID',
+    placeholder: 'us.anthropic.claude-opus-4-6',
+    validate: required('Large model ID is required'),
+  });
+  if (p.isCancel(large)) return cancelAndExit();
+
+  return {
+    bedrock: { use: true, region, token },
+    models: { small, medium, large },
+  };
+}
+
+async function setupVertex(): Promise<ShannonConfig> {
+  // 1. Collect region and project ID
+  const region = await p.text({
+    message: 'Google Cloud region',
+    placeholder: 'us-east5',
+    validate: required('Region is required'),
+  });
+  if (p.isCancel(region)) return cancelAndExit();
+
+  const projectId = await p.text({
+    message: 'GCP Project ID',
+    validate: required('Project ID is required'),
+  });
+  if (p.isCancel(projectId)) return cancelAndExit();
+
+  // 2. File picker for service account key
+  p.log.info('Select the path to your GCP Service Account JSON key file.');
+  const keySourcePath = await p.path({
+    message: 'Service Account JSON key file',
+    validate: (value) => {
+      if (!value) return 'Path is required';
+      if (!fs.existsSync(value)) return 'File not found';
+      if (!value.endsWith('.json')) return 'Must be a .json file';
+      return undefined;
+    },
+  });
+  if (p.isCancel(keySourcePath)) return cancelAndExit();
+
+  // 3. Copy key to ~/.shannon/ and lock permissions
+  const destPath = path.join(SHANNON_HOME, 'google-sa-key.json');
+  fs.mkdirSync(SHANNON_HOME, { recursive: true });
+  fs.copyFileSync(keySourcePath, destPath);
+  fs.chmodSync(destPath, 0o600);
+  p.log.success(`Key copied to ${destPath} (permissions: 0600)`);
+
+  // 4. Model tiers
+  const models = await p.group({
+    small: () =>
+      p.text({
+        message: 'Small model ID',
+        placeholder: 'claude-haiku-4-5@20251001',
+        validate: required('Small model ID is required'),
+      }),
+    medium: () =>
+      p.text({
+        message: 'Medium model ID',
+        placeholder: 'claude-sonnet-4-6',
+        validate: required('Medium model ID is required'),
+      }),
+    large: () =>
+      p.text({
+        message: 'Large model ID',
+        placeholder: 'claude-opus-4-6',
+        validate: required('Large model ID is required'),
+      }),
+  });
+  if (p.isCancel(models)) return cancelAndExit();
+
+  return {
+    vertex: {
+      use: true,
+      region,
+      project_id: projectId,
+      key_path: destPath,
+    },
+    models: { small: models.small, medium: models.medium, large: models.large },
+  };
+}
+
+async function setupRouter(): Promise<ShannonConfig> {
+  const routerProvider = await p.select({
+    message: 'Router provider',
+    options: [
+      { value: 'openai' as const, label: 'OpenAI' },
+      { value: 'openrouter' as const, label: 'OpenRouter' },
+    ],
+  });
+  if (p.isCancel(routerProvider)) return cancelAndExit();
+
+  const apiKey = await promptSecret(
+    routerProvider === 'openai' ? 'Enter your OpenAI API key' : 'Enter your OpenRouter API key',
+  );
+
+  let defaultModel: string;
+  if (routerProvider === 'openai') {
+    const model = await p.select({
+      message: 'Default model',
+      options: [
+        { value: 'gpt-5.2' as const, label: 'GPT-5.2' },
+        { value: 'gpt-5-mini' as const, label: 'GPT-5 Mini' },
+      ],
+    });
+    if (p.isCancel(model)) return cancelAndExit();
+    defaultModel = `openai,${model}`;
+  } else {
+    const model = await p.select({
+      message: 'Default model',
+      options: [{ value: 'google/gemini-3-flash-preview' as const, label: 'Google Gemini 3 Flash Preview' }],
+    });
+    if (p.isCancel(model)) return cancelAndExit();
+    defaultModel = `openrouter,${model}`;
+  }
+
+  const router: ShannonConfig['router'] = { default: defaultModel };
+  if (routerProvider === 'openai') {
+    router.openai_key = apiKey;
+  } else {
+    router.openrouter_key = apiKey;
+  }
+
+  return { router };
+}
+
+// === Helpers ===
+
+async function promptSecret(message: string): Promise<string> {
+  const value = await p.password({
+    message,
+    validate: required(`${message.replace(/^Enter /, '')} is required`),
+  });
+  if (p.isCancel(value)) return cancelAndExit();
+  return value;
+}
+
+function required(errorMessage: string): (value: string | undefined) => string | undefined {
+  return (value) => {
+    if (!value) return errorMessage;
+    return undefined;
+  };
+}
+
+function cancelAndExit(): never {
+  p.cancel('Setup cancelled.');
+  process.exit(0);
+}
@@ -0,0 +1,226 @@
+/**
+ * `shannon start` command — launch a pentest scan.
+ *
+ * Handles both local mode (local build, ./workspaces/, mounted prompts)
+ * and npx mode (Docker Hub pull, ~/.shannon/).
+ */
+
+import { execFileSync } from 'node:child_process';
+import fs from 'node:fs';
+import path from 'node:path';
+import { ensureImage, ensureInfra, randomSuffix, spawnWorker } from '../docker.js';
+import { buildEnvFlags, isRouterConfigured, loadEnv, validateCredentials } from '../env.js';
+import { getCredentialsPath, getWorkspacesDir, initHome } from '../home.js';
+import { isLocal } from '../mode.js';
+import { ensureDeliverables, resolveConfig, resolveRepo } from '../paths.js';
+import { displaySplash } from '../splash.js';
+
+export interface StartArgs {
+  url: string;
+  repo: string;
+  config?: string;
+  workspace?: string;
+  output?: string;
+  pipelineTesting: boolean;
+  router: boolean;
+  version: string;
+}
+
+export async function start(args: StartArgs): Promise<void> {
+  // 1. Initialize state directories and load env
+  initHome();
+  loadEnv();
+
+  // 2. Validate credentials and auto-detect router mode
+  const creds = validateCredentials();
+  if (!creds.valid) {
+    console.error(`ERROR: ${creds.error}`);
+    process.exit(1);
+  }
+  const useRouter = args.router || isRouterConfigured();
+
+  // 3. Resolve paths
+  const repo = resolveRepo(args.repo);
+  const config = args.config ? resolveConfig(args.config) : undefined;
+  ensureDeliverables(repo.hostPath);
+
+  // 4. Ensure workspaces dir is writable by container user (UID 1001)
+  const workspacesDir = getWorkspacesDir();
+  fs.mkdirSync(workspacesDir, { recursive: true });
+  fs.chmodSync(workspacesDir, 0o777);
+
+  // 5. Handle router env
+  if (useRouter) {
+    process.env.ANTHROPIC_BASE_URL = 'http://shannon-router:3456';
+    process.env.ANTHROPIC_AUTH_TOKEN = 'shannon-router-key';
+  }
+
+  // 6. Ensure image (auto-build in dev, pull in npx) and start infra
+  ensureImage(args.version);
+  await ensureInfra(useRouter);
+
+  // 7. Generate unique task queue and container name
+  const suffix = randomSuffix();
+  const taskQueue = `shannon-${suffix}`;
+  const containerName = `shannon-worker-${suffix}`;
+
+  // 8. Generate workspace name if not provided
+  const workspace =
+    args.workspace ?? `${new URL(args.url).hostname.replace(/[^a-zA-Z0-9-]/g, '-')}_shannon-${Date.now()}`;
+
+  // 9. Resolve credentials — mount single file to fixed container path
+  const credentialsPath = getCredentialsPath();
+  const hasCredentials = fs.existsSync(credentialsPath);
+
+  if (hasCredentials) {
+    process.env.GOOGLE_APPLICATION_CREDENTIALS = '/app/credentials/google-sa-key.json';
+  }
+
+  // 10. Resolve output directory
+  const outputDir = args.output ? path.resolve(args.output) : undefined;
+  if (outputDir) {
+    fs.mkdirSync(outputDir, { recursive: true });
+  }
+
+  // 11. Resolve prompts directory (local mode only)
+  const promptsDir = isLocal() ? path.resolve('apps/worker/prompts') : undefined;
+
+  // 12. Display splash screen
+  displaySplash(isLocal() ? undefined : args.version);
+
+  // 13. Spawn worker container
+  const proc = spawnWorker({
+    version: args.version,
+    url: args.url,
+    repo,
+    workspacesDir,
+    taskQueue,
+    containerName,
+    envFlags: buildEnvFlags(),
+    ...(config && { config }),
+    ...(hasCredentials && { credentials: credentialsPath }),
+    ...(promptsDir && { promptsDir }),
+    ...(outputDir && { outputDir }),
+    ...(workspace && { workspace }),
+    ...(args.pipelineTesting && { pipelineTesting: true }),
+  });
+
+  // 14. Wait for workflow to register, then display info
+  proc.on('error', (err) => {
+    console.error(`Failed to start worker: ${err.message}`);
+    process.exit(1);
+  });
+
+  // Detect whether this is a fresh workspace or a resume by checking session.json existence
+  const sessionJson = path.join(workspacesDir, workspace, 'session.json');
+  const isResume = fs.existsSync(sessionJson);
+  let initialResumeCount = 0;
+  if (isResume) {
+    try {
+      const session = JSON.parse(fs.readFileSync(sessionJson, 'utf-8'));
+      initialResumeCount = session.session?.resumeAttempts?.length ?? 0;
+    } catch {
+      // Corrupted file — worker will handle validation
+    }
+  }
+
+  // Poll for workflow to register in session.json
+  process.stdout.write('Waiting for workflow to start...');
+  let workflowId = '';
+  let started = false;
+  let attempts = 0;
+  const pollInterval = setInterval(() => {
+    attempts++;
+    if (attempts > 60) {
+      clearInterval(pollInterval);
+      process.stdout.write('\n');
+      console.error('Timeout waiting for workflow to start');
+      process.exit(1);
+    }
+
+    try {
+      const session = JSON.parse(fs.readFileSync(sessionJson, 'utf-8'));
+      const resumeAttempts: { workflowId: string }[] = session.session?.resumeAttempts ?? [];
+
+      // Fresh: session.json appears with originalWorkflowId. Resume: new resumeAttempts entry.
+      const ready = isResume ? resumeAttempts.length > initialResumeCount : !!session.session?.originalWorkflowId;
+
+      if (ready) {
+        clearInterval(pollInterval);
+        started = true;
+
+        // Latest workflow ID: last resume attempt, or originalWorkflowId for fresh scans
+        workflowId = resumeAttempts.at(-1)?.workflowId ?? session.session?.originalWorkflowId ?? '';
+
+        // Clear waiting line and show info
+        process.stdout.write('\r\x1b[K');
+        printInfo(args, useRouter, workspace, workflowId, repo.hostPath, workspacesDir);
+        return;
+      }
+    } catch {
+      // File doesn't exist yet
+    }
+    process.stdout.write('.');
+  }, 2000);
+
+  // Stop the worker container on exit only if the workflow has not registered yet
+  let cleaned = false;
+  const cleanup = (): void => {
+    if (cleaned || started) return;
+    cleaned = true;
+    clearInterval(pollInterval);
+    console.log(`\nStopping worker ${containerName}...`);
+    try {
+      execFileSync('docker', ['stop', containerName], { stdio: 'pipe' });
+    } catch {
+      // Container may have already exited
+    }
+  };
+
+  process.on('SIGINT', () => {
+    cleanup();
+    process.exit(0);
+  });
+  process.on('SIGTERM', () => {
+    cleanup();
+    process.exit(0);
+  });
+  process.on('exit', cleanup);
+}
+
+function printInfo(
+  args: StartArgs,
+  routerActive: boolean,
+  workspace: string,
+  workflowId: string,
+  repoPath: string,
+  workspacesDir: string,
+): void {
+  const logsCmd = isLocal() ? `./shannon logs ${workspace}` : `npx @keygraph/shannon logs ${workspace}`;
+  const reportsPath = path.join(workspacesDir, workspace);
+
+  console.log(`  Target:     ${args.url}`);
+  console.log(`  Repository: ${repoPath}`);
+  console.log(`  Workspace:  ${workspace}`);
+  if (args.config) {
+    console.log(`  Config:     ${path.resolve(args.config)}`);
+  }
+  if (args.pipelineTesting) {
+    console.log('  Mode:       Pipeline Testing');
+  }
+  if (routerActive) {
+    console.log('  Router:     Enabled');
+  }
+  console.log('');
+  console.log('  Monitor:');
+  if (workflowId) {
+    console.log(`    Web UI:  http://localhost:8233/namespaces/default/workflows/${workflowId}`);
+  } else {
+    console.log('    Web UI:  http://localhost:8233');
+  }
+  console.log(`    Logs:    ${logsCmd}`);
+  console.log('');
+  console.log('  Output:');
+  console.log(`    Reports: ${reportsPath}/`);
+  console.log('');
+}
@@ -0,0 +1,24 @@
+/**
+ * `shannon status` command — show running workers and Temporal health.
+ */
+
+import { isTemporalReady, listRunningWorkers } from '../docker.js';
+
+export function status(): void {
+  // 1. Temporal health
+  const temporalUp = isTemporalReady();
+  console.log(`Temporal: ${temporalUp ? 'running' : 'not running'}`);
+  if (temporalUp) {
+    console.log('  Web UI: http://localhost:8233');
+  }
+  console.log('');
+
+  // 2. Running workers
+  const workers = listRunningWorkers();
+  if (workers) {
+    console.log('Workers:');
+    console.log(workers);
+  } else {
+    console.log('Workers: none running');
+  }
+}
@@ -0,0 +1,21 @@
+/**
+ * `shannon stop` command — stop workers and infrastructure.
+ */
+
+import * as p from '@clack/prompts';
+import { stopInfra, stopWorkers } from '../docker.js';
+
+export async function stop(clean: boolean): Promise<void> {
+  if (clean) {
+    const confirmed = await p.confirm({
+      message: 'This will stop all running scans and remove the Temporal data. Continue?',
+    });
+    if (p.isCancel(confirmed) || !confirmed) {
+      p.cancel('Aborted.');
+      process.exit(0);
+    }
+  }
+
+  stopWorkers();
+  stopInfra(clean);
+}
@@ -0,0 +1,37 @@
+/**
+ * `shannon uninstall` command: remove ~/.shannon/ after confirmation (npx only).
+ */
+
+import fs from 'node:fs';
+import os from 'node:os';
+import path from 'node:path';
+import * as p from '@clack/prompts';
+import { stopInfra, stopWorkers } from '../docker.js';
+
+const SHANNON_HOME = path.join(os.homedir(), '.shannon');
+
+export async function uninstall(): Promise<void> {
+  p.intro('Shannon Uninstall');
+
+  if (!fs.existsSync(SHANNON_HOME)) {
+    p.log.info('Nothing to remove. Shannon is not configured on this machine.');
+    p.outro('Done.');
+    return;
+  }
+
+  const confirmed = await p.confirm({
+    message: 'This will permanently remove all past scan data, saved configurations, and API keys. Continue?',
+  });
+  if (p.isCancel(confirmed) || !confirmed) {
+    p.cancel('Aborted.');
+    process.exit(0);
+  }
+
+  // Stop any running containers first
+  stopWorkers();
+  stopInfra(false);
+
+  fs.rmSync(SHANNON_HOME, { recursive: true, force: true });
+  p.log.success('All Shannon data has been removed.');
+  p.outro('Shannon has been uninstalled. Run `npx @keygraph/shannon setup` to start fresh.');
+}
@@ -0,0 +1,35 @@
+/**
+ * `shannon workspaces` command — list all workspaces.
+ */
+
+import { execFileSync } from 'node:child_process';
+import os from 'node:os';
+import { getWorkerImage } from '../docker.js';
+import { getWorkspacesDir } from '../home.js';
+
+export function workspaces(version: string): void {
+  const workspacesDir = getWorkspacesDir();
+  const image = getWorkerImage(version);
+
+  try {
+    execFileSync(
+      'docker',
+      [
+        'run',
+        '--rm',
+        '-v',
+        `${workspacesDir}:/app/workspaces`,
+        '-e',
+        'WORKSPACES_DIR=/app/workspaces',
+        image,
+        'node',
+        'apps/worker/dist/temporal/workspaces.js',
+      ],
+      { stdio: 'inherit', ...(os.platform() === 'win32' && { env: { ...process.env, MSYS_NO_PATHCONV: '1' } }) },
+    );
+  } catch {
+    console.error('ERROR: Failed to list workspaces. Is the Docker image available?');
+    console.error(`  Run: docker pull ${image}`);
+    process.exit(1);
+  }
+}
@@ -0,0 +1,300 @@
+/**
+ * Configuration resolver with environment-first, TOML-fallback precedence.
+ *
+ * Priority: process.env > ~/.shannon/config.toml
+ * Env var names match .env.example exactly; TOML uses nested sections.
+ */
+
+import fs from 'node:fs';
+import { parse as parseTOML } from 'smol-toml';
+import { getConfigFile } from '../home.js';
+import { getMode } from '../mode.js';
+
+// === TOML ↔ Env Mapping ===
+
+type TOMLType = 'string' | 'number' | 'boolean';
+
+interface ConfigMapping {
+  readonly env: string;
+  readonly toml: string;
+  readonly type: TOMLType;
+}
+
+/** Maps every supported env var to its TOML path (section.key) and expected type. */
+const CONFIG_MAP: readonly ConfigMapping[] = [
+  // Core
+  { env: 'CLAUDE_CODE_MAX_OUTPUT_TOKENS', toml: 'core.max_tokens', type: 'number' },
+
+  // Anthropic
+  { env: 'ANTHROPIC_API_KEY', toml: 'anthropic.api_key', type: 'string' },
+  { env: 'CLAUDE_CODE_OAUTH_TOKEN', toml: 'anthropic.oauth_token', type: 'string' },
+
+  // Bedrock
+  { env: 'CLAUDE_CODE_USE_BEDROCK', toml: 'bedrock.use', type: 'boolean' },
+  { env: 'AWS_REGION', toml: 'bedrock.region', type: 'string' },
+  { env: 'AWS_BEARER_TOKEN_BEDROCK', toml: 'bedrock.token', type: 'string' },
+
+  // Vertex
+  { env: 'CLAUDE_CODE_USE_VERTEX', toml: 'vertex.use', type: 'boolean' },
+  { env: 'CLOUD_ML_REGION', toml: 'vertex.region', type: 'string' },
+  { env: 'ANTHROPIC_VERTEX_PROJECT_ID', toml: 'vertex.project_id', type: 'string' },
+  { env: 'GOOGLE_APPLICATION_CREDENTIALS', toml: 'vertex.key_path', type: 'string' },
+
+  // Custom Base URL
+  { env: 'ANTHROPIC_BASE_URL', toml: 'custom_base_url.base_url', type: 'string' },
+  { env: 'ANTHROPIC_AUTH_TOKEN', toml: 'custom_base_url.auth_token', type: 'string' },
+
+  // Router
+  { env: 'ROUTER_DEFAULT', toml: 'router.default', type: 'string' },
+  { env: 'OPENAI_API_KEY', toml: 'router.openai_key', type: 'string' },
+  { env: 'OPENROUTER_API_KEY', toml: 'router.openrouter_key', type: 'string' },
+
+  // Model tiers
+  { env: 'ANTHROPIC_SMALL_MODEL', toml: 'models.small', type: 'string' },
+  { env: 'ANTHROPIC_MEDIUM_MODEL', toml: 'models.medium', type: 'string' },
+  { env: 'ANTHROPIC_LARGE_MODEL', toml: 'models.large', type: 'string' },
+] as const;
+
+// === TOML Parsing ===
+
+type TOMLValue = string | number | boolean;
+type TOMLSection = Record<string, TOMLValue>;
+type TOMLConfig = Record<string, TOMLSection>;
+
+/** Read a nested TOML value by dotted path (e.g. "anthropic.api_key"). */
+function getTomlValue(config: TOMLConfig, path: string): string | undefined {
+  const [section, key] = path.split('.');
+  if (!section || !key) return undefined;
+
+  const sectionObj = config[section];
+  if (!sectionObj || typeof sectionObj !== 'object') return undefined;
+
+  const value = sectionObj[key];
+  if (value === undefined || value === null) return undefined;
+
+  // NOTE: env.ts checks bedrock/vertex via `=== '1'`, so booleans must map to "1"/"0"
+  if (typeof value === 'boolean') return value ? '1' : '0';
+
+  return String(value);
+}
+
+/** Parse the global TOML config file, returning null if it doesn't exist. */
+function loadTOML(): TOMLConfig | null {
+  const configPath = getConfigFile();
+  if (!fs.existsSync(configPath)) return null;
+
+  // Config contains secrets — refuse to read if group or others have any access.
+  // Skip on Windows where POSIX permissions are not supported.
+  if (process.platform !== 'win32') {
+    const mode = fs.statSync(configPath).mode;
+    if (mode & 0o077) {
+      const actual = (mode & 0o777).toString(8).padStart(3, '0');
+      console.error(`\nInsecure permissions (${actual}) on ${configPath}. Run: chmod 600 ${configPath}\n`);
+      process.exit(1);
+    }
+  }
+
+  try {
+    const content = fs.readFileSync(configPath, 'utf-8');
+    return parseTOML(content) as TOMLConfig;
+  } catch (err) {
+    const message = err instanceof Error ? err.message : String(err);
+    console.error(`\nFailed to parse ${configPath}: ${message}`);
+    console.error(`\nRun 'npx @keygraph/shannon setup' to reconfigure.\n`);
+    process.exit(1);
+  }
+}
+
+// === Validation ===
+
+/** Build a lookup of allowed keys per section from CONFIG_MAP. */
+function buildSchema(): Map<string, Map<string, TOMLType>> {
+  const schema = new Map<string, Map<string, TOMLType>>();
+  for (const mapping of CONFIG_MAP) {
+    const [section, key] = mapping.toml.split('.');
+    if (!section || !key) continue;
+
+    let keys = schema.get(section);
+    if (!keys) {
+      keys = new Map();
+      schema.set(section, keys);
+    }
+    keys.set(key, mapping.type);
+  }
+  return schema;
+}
+
+/** Check that a provider section has all required fields and dependencies. */
+function validateProviderFields(config: TOMLConfig, provider: string, errors: string[]): void {
+  const section = config[provider] as Record<string, unknown> | undefined;
+  if (!section) return;
+  const keys = Object.keys(section);
+
+  switch (provider) {
+    case 'anthropic':
+      if (!keys.includes('api_key') && !keys.includes('oauth_token')) {
+        errors.push('[anthropic] requires either api_key or oauth_token');
+      }
+      break;
+
+    case 'custom_base_url': {
+      const required = ['base_url', 'auth_token'];
+      const missing = required.filter((k) => !keys.includes(k));
+      if (missing.length > 0) {
+        errors.push(`[custom_base_url] missing required keys: ${missing.join(', ')}`);
+      }
+      break;
+    }
+
+    case 'bedrock': {
+      const required = ['use', 'region', 'token'];
+      const missing = required.filter((k) => !keys.includes(k));
+      if (missing.length > 0) {
+        errors.push(`[bedrock] missing required keys: ${missing.join(', ')}`);
+      }
+      validateModelTiers(config, 'bedrock', errors);
+      break;
+    }
+
+    case 'vertex': {
+      const required = ['use', 'region', 'project_id', 'key_path'];
+      const missing = required.filter((k) => !keys.includes(k));
+      if (missing.length > 0) {
+        errors.push(`[vertex] missing required keys: ${missing.join(', ')}`);
+      }
+      validateModelTiers(config, 'vertex', errors);
+      break;
+    }
+
+    case 'router': {
+      if (!keys.includes('default')) {
+        errors.push('[router] missing required key: default');
+      }
+      if (!keys.includes('openai_key') && !keys.includes('openrouter_key')) {
+        errors.push('[router] requires either openai_key or openrouter_key');
+      }
+      const models = config.models as Record<string, unknown> | undefined;
+      if (models && typeof models === 'object' && Object.keys(models).length > 0) {
+        errors.push('[models] is not supported with [router]');
+      }
+      break;
+    }
+  }
+}
+
+/** Bedrock and Vertex require a [models] section with all three tiers. */
+function validateModelTiers(config: TOMLConfig, provider: string, errors: string[]): void {
+  const models = config.models as Record<string, unknown> | undefined;
+  if (!models || typeof models !== 'object') {
+    errors.push(`[${provider}] requires a [models] section with small, medium, and large`);
+    return;
+  }
+
+  const required = ['small', 'medium', 'large'];
+  const missing = required.filter((k) => !Object.keys(models).includes(k));
+  if (missing.length > 0) {
+    errors.push(`[models] missing required keys for ${provider}: ${missing.join(', ')}`);
+  }
+}
+
+/**
+ * Validate a parsed TOML config against the known schema.
+ * Returns an array of human-readable error messages (empty = valid).
+ */
+function validateConfig(config: TOMLConfig): string[] {
+  const schema = buildSchema();
+  const errors: string[] = [];
+
+  for (const [section, sectionObj] of Object.entries(config)) {
+    // 1. Reject unknown sections
+    const allowedKeys = schema.get(section);
+    if (!allowedKeys) {
+      const known = [...schema.keys()].join(', ');
+      errors.push(`Unknown section [${section}]. Valid sections: ${known}`);
+      continue;
+    }
+
+    // 2. Section value must be a table
+    if (!sectionObj || typeof sectionObj !== 'object') {
+      errors.push(`[${section}] must be a table, got ${typeof sectionObj}`);
+      continue;
+    }
+
+    // 3. Validate each key in the section
+    for (const [key, value] of Object.entries(sectionObj as Record<string, unknown>)) {
+      const expectedType = allowedKeys.get(key);
+      if (!expectedType) {
+        const known = [...allowedKeys.keys()].join(', ');
+        errors.push(`Unknown key "${key}" in [${section}]. Valid keys: ${known}`);
+        continue;
+      }
+
+      if (typeof value !== expectedType) {
+        errors.push(`[${section}].${key} must be ${expectedType}, got ${typeof value}`);
+        continue;
+      }
+
+      // Reject empty strings — they pass type checks but are never useful
+      if (typeof value === 'string' && value.trim() === '') {
+        errors.push(`[${section}].${key} must not be empty`);
+      }
+    }
+  }
+
+  // 4. Only one provider section allowed (ignore empty sections)
+  const PROVIDER_SECTIONS = ['anthropic', 'custom_base_url', 'bedrock', 'vertex', 'router'] as const;
+  const present = PROVIDER_SECTIONS.filter((s) => {
+    const section = config[s];
+    return section && typeof section === 'object' && Object.keys(section).length > 0;
+  });
+  if (present.length > 1) {
+    errors.push(
+      `Multiple providers configured: [${present.join('], [')}]. Only one provider section is allowed at a time`,
+    );
+  }
+
+  // 5. Required fields per provider
+  const singleProvider = present.length === 1 ? present[0] : undefined;
+  if (singleProvider) {
+    validateProviderFields(config, singleProvider, errors);
+  }
+
+  return errors;
+}
+
+// === Public API ===
+
+/**
+ * Resolve all config values into process.env (npx mode only).
+ *
+ * For each mapped variable: if not already set in the environment,
+ * look it up in ~/.shannon/config.toml and inject it into process.env.
+ * Local mode uses .env exclusively — TOML is skipped.
+ * Exits with an error if the TOML contains unknown or invalid keys.
+ */
+export function resolveConfig(): void {
+  if (getMode() === 'local') return;
+
+  const toml = loadTOML();
+  if (!toml) return;
+
+  // Validate before injecting
+  const errors = validateConfig(toml);
+  if (errors.length > 0) {
+    console.error('\nInvalid configuration:');
+    for (const err of errors) {
+      console.error(`  - ${err}`);
+    }
+    console.error(`\nRun 'npx @keygraph/shannon setup' to reconfigure.\n`);
+    process.exit(1);
+  }
+
+  for (const mapping of CONFIG_MAP) {
+    if (process.env[mapping.env]) continue;
+
+    const value = getTomlValue(toml, mapping.toml);
+    if (value) {
+      process.env[mapping.env] = value;
+    }
+  }
+}
@@ -0,0 +1,30 @@
+/** TOML config writer for ~/.shannon/config.toml. */
+
+import fs from 'node:fs';
+import path from 'node:path';
+import { stringify } from 'smol-toml';
+import { getConfigFile } from '../home.js';
+
+// === Types ===
+
+export interface ShannonConfig {
+  core?: { max_tokens?: number };
+  anthropic?: { api_key?: string; oauth_token?: string };
+  custom_base_url?: { base_url?: string; auth_token?: string };
+  bedrock?: { use?: boolean; region?: string; token?: string };
+  vertex?: { use?: boolean; region?: string; project_id?: string; key_path?: string };
+  router?: { default?: string; openai_key?: string; openrouter_key?: string };
+  models?: { small?: string; medium?: string; large?: string };
+}
+
+// === File Operations ===
+
+/** Write the config to ~/.shannon/config.toml with 0o600 permissions. */
+export function saveConfig(config: ShannonConfig): void {
+  const configPath = getConfigFile();
+  const dir = path.dirname(configPath);
+  fs.mkdirSync(dir, { recursive: true });
+
+  const content = stringify(config);
+  fs.writeFileSync(configPath, content, { mode: 0o600 });
+}
@@ -0,0 +1,317 @@
+/**
+ * Docker orchestration — compose lifecycle, network, image pull/build, worker spawning.
+ *
+ * Local mode: builds locally, uses docker-compose.yml from repo root, mounts prompts.
+ * NPX mode: pulls from Docker Hub, uses bundled compose.yml.
+ */
+
+import { type ChildProcess, execFileSync, spawn } from 'node:child_process';
+import crypto from 'node:crypto';
+import os from 'node:os';
+import path from 'node:path';
+import { setTimeout as sleep } from 'node:timers/promises';
+import { fileURLToPath } from 'node:url';
+import { getMode } from './mode.js';
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+
+const NPX_IMAGE_REPO = 'keygraph/shannon';
+const DEV_IMAGE = 'shannon-worker';
+
+export function getWorkerImage(version: string): string {
+  return getMode() === 'local' ? DEV_IMAGE : `${NPX_IMAGE_REPO}:${version}`;
+}
+
+function getComposeFile(): string {
+  return getMode() === 'local'
+    ? path.resolve('docker-compose.yml')
+    : path.resolve(__dirname, '..', 'infra', 'compose.yml');
+}
+
+/** Generate an 8-char random hex suffix for container/queue names. */
+export function randomSuffix(): string {
+  return crypto.randomBytes(4).toString('hex');
+}
+
+/** Run a command silently, return true if it succeeds. */
+function runQuiet(cmd: string, args: string[]): boolean {
+  try {
+    execFileSync(cmd, args, { stdio: 'pipe' });
+    return true;
+  } catch {
+    return false;
+  }
+}
+
+/** Run a command and return stdout, or empty string on failure. */
+function runOutput(cmd: string, args: string[]): string {
+  try {
+    return execFileSync(cmd, args, { stdio: 'pipe', encoding: 'utf-8' }).trim();
+  } catch {
+    return '';
+  }
+}
+
+/**
+ * Check if Temporal is running and healthy.
+ */
+export function isTemporalReady(): boolean {
+  const output = runOutput('docker', [
+    'exec',
+    'shannon-temporal',
+    'temporal',
+    'operator',
+    'cluster',
+    'health',
+    '--address',
+    'localhost:7233',
+  ]);
+  return output.includes('SERVING');
+}
+
+/** Check if the router container is running and healthy. */
+function isRouterReady(): boolean {
+  const status = runOutput('docker', ['inspect', '--format', '{{.State.Health.Status}}', 'shannon-router']);
+  return status === 'healthy';
+}
+
+/**
+ * Ensure Temporal (and optionally router) are running via compose.
+ * If Temporal is already up but router is needed and missing, starts router only.
+ */
+export async function ensureInfra(useRouter: boolean): Promise<void> {
+  const temporalReady = isTemporalReady();
+  const routerNeeded = useRouter && !isRouterReady();
+
+  if (temporalReady && !routerNeeded) {
+    return;
+  }
+
+  const composeFile = getComposeFile();
+  const composeArgs = ['compose', '-f', composeFile];
+  if (useRouter) composeArgs.push('--profile', 'router');
+  composeArgs.push('up', '-d');
+
+  if (temporalReady && routerNeeded) {
+    console.log('Starting router...');
+  } else {
+    console.log('Starting Shannon infrastructure...');
+  }
+  execFileSync('docker', composeArgs, { stdio: 'inherit' });
+
+  // Wait for Temporal if it wasn't already running
+  if (!temporalReady) {
+    console.log('Waiting for Temporal to be ready...');
+    for (let i = 0; i < 30; i++) {
+      if (isTemporalReady()) {
+        console.log('Temporal is ready!');
+        break;
+      }
+      if (i === 29) {
+        console.error('Timeout waiting for Temporal');
+        process.exit(1);
+      }
+      await sleep(2000);
+    }
+  }
+
+  // Wait for router if needed
+  if (routerNeeded) {
+    console.log('Waiting for router to be ready...');
+    for (let i = 0; i < 15; i++) {
+      if (isRouterReady()) {
+        console.log('Router is ready!');
+        return;
+      }
+      await sleep(2000);
+    }
+    console.error('Timeout waiting for router');
+    process.exit(1);
+  }
+}
+
+/**
+ * Build the worker image locally (local mode only).
+ */
+export function buildImage(noCache: boolean): void {
+  console.log(`Building ${DEV_IMAGE}...`);
+  const args = ['build'];
+  if (noCache) args.push('--no-cache');
+  args.push('-t', DEV_IMAGE, '.');
+  execFileSync('docker', args, { stdio: 'inherit' });
+  console.log(`Build complete: ${DEV_IMAGE}`);
+}
+
+/**
+ * Ensure the worker image is available.
+ * Local mode: auto-builds if missing. NPX mode: pulls from Docker Hub.
+ */
+export function ensureImage(version: string): void {
+  const image = getWorkerImage(version);
+  const exists = runQuiet('docker', ['image', 'inspect', image]);
+  if (exists) return;
+
+  if (getMode() === 'local') {
+    console.log('Worker image not found, building...');
+    buildImage(false);
+  } else {
+    console.log(`Pulling ${image}...`);
+    try {
+      execFileSync('docker', ['pull', image], { stdio: 'inherit' });
+    } catch {
+      console.error(`\nERROR: Failed to pull ${image}`);
+      console.error('The image may not be available for your platform yet.');
+      console.error('Check https://hub.docker.com/r/keygraph/shannon for available tags.');
+      process.exit(1);
+    }
+    pruneOldImages(version);
+  }
+}
+
+/**
+ * Detect if --add-host is needed (Linux without Podman).
+ * Docker Desktop (macOS, Windows) has host.docker.internal built in.
+ */
+function addHostFlag(): string[] {
+  if (os.platform() === 'linux') {
+    const hasPodman = runQuiet('which', ['podman']);
+    if (!hasPodman) {
+      return ['--add-host', 'host.docker.internal:host-gateway'];
+    }
+  }
+  return [];
+}
+
+export interface WorkerOptions {
+  version: string;
+  url: string;
+  repo: { hostPath: string; containerPath: string };
+  workspacesDir: string;
+  taskQueue: string;
+  containerName: string;
+  envFlags: string[];
+  config?: { hostPath: string; containerPath: string };
+  credentials?: string;
+  promptsDir?: string;
+  outputDir?: string;
+  workspace?: string;
+  pipelineTesting?: boolean;
+}
+
+/**
+ * Start the worker container in detached mode and return the `docker run` child process.
+ */
+export function spawnWorker(opts: WorkerOptions): ChildProcess {
+  const args = ['run', '-d', '--rm', '--name', opts.containerName, '--network', 'shannon-net'];
+
+  // Add host flag for Linux
+  args.push(...addHostFlag());
+
+  // UID remapping for Linux bind mounts
+  if (os.platform() === 'linux' && process.getuid && process.getgid) {
+    args.push('-e', `SHANNON_HOST_UID=${process.getuid()}`, '-e', `SHANNON_HOST_GID=${process.getgid()}`);
+  }
+
+  // Volume mounts
+  args.push('-v', `${opts.workspacesDir}:/app/workspaces`);
+  args.push('-v', `${opts.repo.hostPath}:${opts.repo.containerPath}`);
+
+  // Local mode: mount prompts for live editing
+  if (opts.promptsDir) {
+    args.push('-v', `${opts.promptsDir}:/app/apps/worker/prompts:ro`);
+  }
+
+  if (opts.config) {
+    args.push('-v', `${opts.config.hostPath}:${opts.config.containerPath}:ro`);
+  }
+
+  // Output directory for deliverables copy
+  if (opts.outputDir) {
+    args.push('-v', `${opts.outputDir}:/app/output`);
+  }
+
+  // Mount credentials file to fixed container path
+  if (opts.credentials) {
+    args.push('-v', `${opts.credentials}:/app/credentials/google-sa-key.json:ro`);
+  }
+
+  // Environment
+  args.push(...opts.envFlags);
+
+  // Container settings
+  args.push('--shm-size', '2gb', '--security-opt', 'seccomp=unconfined');
+
+  // Image
+  args.push(getWorkerImage(opts.version));
+
+  // Worker command
+  args.push('node', 'apps/worker/dist/temporal/worker.js', opts.url, opts.repo.containerPath);
+  args.push('--task-queue', opts.taskQueue);
+  if (opts.config) {
+    args.push('--config', opts.config.containerPath);
+  }
+  if (opts.outputDir) {
+    args.push('--output', '/app/output');
+  }
+  if (opts.workspace) {
+    args.push('--workspace', opts.workspace);
+  }
+  if (opts.pipelineTesting) {
+    args.push('--pipeline-testing');
+  }
+
+  // Prevent MSYS/Git Bash from converting Unix paths (e.g. /repos/my-repo) to Windows paths
+  return spawn('docker', args, {
+    stdio: 'pipe',
+    ...(os.platform() === 'win32' && { env: { ...process.env, MSYS_NO_PATHCONV: '1' } }),
+  });
+}
+
+/**
+ * Stop all running shannon-worker-* containers.
+ */
+export function stopWorkers(): void {
+  const workers = runOutput('docker', ['ps', '-q', '--filter', 'name=shannon-worker-']);
+  if (!workers) return;
+
+  const ids = workers.split('\n').filter(Boolean);
+  console.log('Stopping worker containers...');
+  execFileSync('docker', ['stop', ...ids], { stdio: 'inherit' });
+}
+
+/**
+ * Tear down the compose stack.
+ */
+export function stopInfra(clean: boolean): void {
+  const composeFile = getComposeFile();
+  const args = ['compose', '-f', composeFile, '--profile', 'router', 'down'];
+  if (clean) args.push('-v');
+  execFileSync('docker', args, { stdio: 'inherit' });
+}
+
+/**
+ * Remove old keygraph/shannon images that don't match the current version.
+ */
+function pruneOldImages(currentVersion: string): void {
+  const output = runOutput('docker', ['images', NPX_IMAGE_REPO, '--format', '{{.Tag}}']);
+  if (!output) return;
+
+  // Dangling images report the tag "<none>", which cannot be removed by name
+  const stale = output.split('\n').filter((tag) => tag && tag !== '<none>' && tag !== currentVersion);
+  for (const tag of stale) {
+    runQuiet('docker', ['rmi', `${NPX_IMAGE_REPO}:${tag}`]);
+  }
+}
+
+/**
+ * List running worker containers.
+ */
+export function listRunningWorkers(): string {
+  return runOutput('docker', [
+    'ps',
+    '--filter',
+    'name=shannon-worker-',
+    '--format',
+    'table {{.Names}}\t{{.Status}}\t{{.RunningFor}}',
+  ]);
+}
@@ -0,0 +1,171 @@
+/**
+ * Environment variable loading and credential validation.
+ *
+ * Local mode: loads ./.env via dotenv.
+ * NPX mode: fills gaps from ~/.shannon/config.toml (no .env).
+ */
+
+import dotenv from 'dotenv';
+import { resolveConfig } from './config/resolver.js';
+import { getMode } from './mode.js';
+
+/** Environment variables forwarded to worker containers. */
+const FORWARD_VARS = [
+  'ANTHROPIC_API_KEY',
+  'ANTHROPIC_BASE_URL',
+  'ANTHROPIC_AUTH_TOKEN',
+  'ROUTER_DEFAULT',
+  'CLAUDE_CODE_OAUTH_TOKEN',
+  'CLAUDE_CODE_USE_BEDROCK',
+  'AWS_REGION',
+  'AWS_BEARER_TOKEN_BEDROCK',
+  'CLAUDE_CODE_USE_VERTEX',
+  'CLOUD_ML_REGION',
+  'ANTHROPIC_VERTEX_PROJECT_ID',
+  'GOOGLE_APPLICATION_CREDENTIALS',
+  'ANTHROPIC_SMALL_MODEL',
+  'ANTHROPIC_MEDIUM_MODEL',
+  'ANTHROPIC_LARGE_MODEL',
+  'CLAUDE_CODE_MAX_OUTPUT_TOKENS',
+  'OPENAI_API_KEY',
+  'OPENROUTER_API_KEY',
+] as const;
+
+/**
+ * Load credentials into process.env.
+ * Local mode: loads ./.env via dotenv.
+ * NPX mode: fills gaps from ~/.shannon/config.toml.
+ * Exported env vars always take precedence in both modes.
+ */
+export function loadEnv(): void {
+  if (getMode() === 'local') {
+    dotenv.config({ path: '.env', quiet: true });
+  } else {
+    resolveConfig();
+  }
+}
+
+/**
+ * Build `-e KEY=VALUE` flags for docker run, only for set variables.
+ */
+export function buildEnvFlags(): string[] {
+  const flags: string[] = ['-e', 'TEMPORAL_ADDRESS=shannon-temporal:7233'];
+
+  for (const key of FORWARD_VARS) {
+    const value = process.env[key];
+    if (value) {
+      flags.push('-e', `${key}=${value}`);
+    }
+  }
+
+  return flags;
+}
+
+interface CredentialValidation {
+  valid: boolean;
+  error?: string;
+  mode: 'api-key' | 'oauth' | 'custom-base-url' | 'bedrock' | 'vertex' | 'router';
+}
+
+/** Check if router credentials are present in the environment. */
+export function isRouterConfigured(): boolean {
+  return !!(process.env.ROUTER_DEFAULT && (process.env.OPENAI_API_KEY || process.env.OPENROUTER_API_KEY));
+}
+
+/** Check if a custom Anthropic-compatible base URL is configured. */
+function isCustomBaseUrlConfigured(): boolean {
+  return !!(process.env.ANTHROPIC_BASE_URL && process.env.ANTHROPIC_AUTH_TOKEN);
+}
+
+/** Detect which providers are configured via environment variables. */
+function detectProviders(): string[] {
+  const providers: string[] = [];
+  if (process.env.ANTHROPIC_API_KEY) providers.push('Anthropic API key');
+  if (process.env.CLAUDE_CODE_OAUTH_TOKEN) providers.push('Anthropic OAuth');
+  if (isCustomBaseUrlConfigured()) providers.push('Custom Base URL');
+  if (process.env.CLAUDE_CODE_USE_BEDROCK === '1') providers.push('AWS Bedrock');
+  if (process.env.CLAUDE_CODE_USE_VERTEX === '1') providers.push('Google Vertex');
+  if (isRouterConfigured()) providers.push('Router');
+  return providers;
+}
+
+/**
+ * Validate that exactly one authentication method is configured. Sets ANTHROPIC_API_KEY as a side effect in custom base URL and router modes.
+ */
+export function validateCredentials(): CredentialValidation {
+  // Reject multiple providers
+  const providers = detectProviders();
+  if (providers.length > 1) {
+    return {
+      valid: false,
+      mode: 'api-key',
+      error: `Multiple providers detected: ${providers.join(', ')}. Only one provider can be active at a time.`,
+    };
+  }
+
+  if (process.env.ANTHROPIC_API_KEY) {
+    return { valid: true, mode: 'api-key' };
+  }
+  if (process.env.CLAUDE_CODE_OAUTH_TOKEN) {
+    return { valid: true, mode: 'oauth' };
+  }
+  if (isCustomBaseUrlConfigured()) {
+    // Set auth token as API key so the SDK can initialize
+    process.env.ANTHROPIC_API_KEY = process.env.ANTHROPIC_AUTH_TOKEN;
+    return { valid: true, mode: 'custom-base-url' };
+  }
+  if (process.env.CLAUDE_CODE_USE_BEDROCK === '1') {
+    const missing: string[] = [];
+    if (!process.env.AWS_REGION) missing.push('AWS_REGION');
+    if (!process.env.AWS_BEARER_TOKEN_BEDROCK) missing.push('AWS_BEARER_TOKEN_BEDROCK');
+    if (!process.env.ANTHROPIC_SMALL_MODEL) missing.push('ANTHROPIC_SMALL_MODEL');
+    if (!process.env.ANTHROPIC_MEDIUM_MODEL) missing.push('ANTHROPIC_MEDIUM_MODEL');
+    if (!process.env.ANTHROPIC_LARGE_MODEL) missing.push('ANTHROPIC_LARGE_MODEL');
+    if (missing.length > 0) {
+      return {
+        valid: false,
+        mode: 'bedrock',
+        error: `Bedrock mode requires: ${missing.join(', ')}`,
+      };
+    }
+    return { valid: true, mode: 'bedrock' };
+  }
+  if (process.env.CLAUDE_CODE_USE_VERTEX === '1') {
+    const missing: string[] = [];
+    if (!process.env.CLOUD_ML_REGION) missing.push('CLOUD_ML_REGION');
+    if (!process.env.ANTHROPIC_VERTEX_PROJECT_ID) missing.push('ANTHROPIC_VERTEX_PROJECT_ID');
+    if (!process.env.ANTHROPIC_SMALL_MODEL) missing.push('ANTHROPIC_SMALL_MODEL');
+    if (!process.env.ANTHROPIC_MEDIUM_MODEL) missing.push('ANTHROPIC_MEDIUM_MODEL');
+    if (!process.env.ANTHROPIC_LARGE_MODEL) missing.push('ANTHROPIC_LARGE_MODEL');
+    if (missing.length > 0) {
+      return {
+        valid: false,
+        mode: 'vertex',
+        error: `Vertex AI mode requires: ${missing.join(', ')}`,
+      };
+    }
+    if (!process.env.GOOGLE_APPLICATION_CREDENTIALS) {
+      return {
+        valid: false,
+        mode: 'vertex',
+        error: 'Vertex AI mode requires GOOGLE_APPLICATION_CREDENTIALS',
+      };
+    }
+    return { valid: true, mode: 'vertex' };
+  }
+  if (isRouterConfigured()) {
+    // Set a placeholder so the worker doesn't reject the missing key
+    process.env.ANTHROPIC_API_KEY = 'router-mode';
+    return { valid: true, mode: 'router' };
+  }
+
+  const hint =
+    getMode() === 'local'
+      ? `No credentials found. Set ANTHROPIC_API_KEY in .env or export it.`
+      : `Authentication not configured. Export variables or run 'npx @keygraph/shannon setup'.`;
+  return {
+    valid: false,
+    mode: 'api-key',
+    error: hint,
+  };
+}
@@ -0,0 +1,52 @@
+/**
+ * Shannon state directory management.
+ *
+ * Local mode (cloned repo): uses ./workspaces/, ./credentials/
+ * NPX mode: uses ~/.shannon/workspaces/, ~/.shannon/
+ */
+
+import fs from 'node:fs';
+import os from 'node:os';
+import path from 'node:path';
+import { getMode } from './mode.js';
+
+const SHANNON_HOME = path.join(os.homedir(), '.shannon');
+
+export function getConfigFile(): string {
+  return path.join(SHANNON_HOME, 'config.toml');
+}
+
+export function getWorkspacesDir(): string {
+  return getMode() === 'local' ? path.resolve('workspaces') : path.join(SHANNON_HOME, 'workspaces');
+}
+
+/**
+ * Resolve the Vertex credentials file path.
+ *
+ * Checks GOOGLE_APPLICATION_CREDENTIALS env var first (may be set by TOML resolver),
+ * then falls back to mode-appropriate default location.
+ */
+export function getCredentialsPath(): string {
+  const envPath = process.env.GOOGLE_APPLICATION_CREDENTIALS;
+  if (envPath && fs.existsSync(envPath)) return path.resolve(envPath);
+
+  if (getMode() === 'local') {
+    return path.resolve('credentials', 'google-sa-key.json');
+  }
+
+  return path.join(SHANNON_HOME, 'google-sa-key.json');
+}
+
+/**
+ * Initialize state directories.
+ * Local mode: creates ./workspaces/ and ./credentials/
+ * NPX mode: creates ~/.shannon/workspaces/
+ */
+export function initHome(): void {
+  if (getMode() === 'local') {
+    fs.mkdirSync(path.resolve('workspaces'), { recursive: true });
+    fs.mkdirSync(path.resolve('credentials'), { recursive: true });
+  } else {
+    fs.mkdirSync(path.join(SHANNON_HOME, 'workspaces'), { recursive: true });
+  }
+}
@@ -0,0 +1,239 @@
+/**
+ * Shannon CLI — AI Penetration Testing Framework
+ *
+ * Unified CLI supporting two modes:
+ *   Local mode: Run from cloned repo — builds locally, mounts prompts, uses ./workspaces/
+ *   NPX mode:   Run via npx — pulls from Docker Hub, uses ~/.shannon/
+ *
+ * Mode is determined by the SHANNON_LOCAL environment variable, which the root
+ * ./shannon entry point sets to 1 before importing the CLI (see mode.ts).
+ */
+
+import fs from 'node:fs';
+import path from 'node:path';
+import { fileURLToPath } from 'node:url';
+import { build } from './commands/build.js';
+import { logs } from './commands/logs.js';
+import { setup } from './commands/setup.js';
+import { start } from './commands/start.js';
+import { status } from './commands/status.js';
+import { stop } from './commands/stop.js';
+import { uninstall } from './commands/uninstall.js';
+import { workspaces } from './commands/workspaces.js';
+import { getMode } from './mode.js';
+import { displaySplash } from './splash.js';
+
+const __dirname = path.dirname(fileURLToPath(import.meta.url));
+
+function getVersion(): string {
+  try {
+    const pkgPath = path.join(__dirname, '..', 'package.json');
+    const pkg = JSON.parse(fs.readFileSync(pkgPath, 'utf-8')) as { version?: string };
+    return pkg.version || '1.0.0';
+  } catch {
+    return '1.0.0';
+  }
+}
+
+function showHelp(): void {
+  const mode = getMode();
+  const prefix = mode === 'local' ? './shannon' : 'npx @keygraph/shannon';
+
+  console.log(`
+Shannon - AI Penetration Testing Framework
+
+Usage:${
+    mode === 'local'
+      ? ''
+      : `
+  ${prefix} setup                                       Configure credentials`
+  }
+  ${prefix} start --url <url> --repo <path> [options]   Start a pentest scan
+  ${prefix} stop [--clean]                               Stop all containers
+  ${prefix} workspaces                                   List all workspaces
+  ${prefix} logs <workspace>                             Tail workflow log
+  ${prefix} status                                       Show running workers${
+    mode === 'local'
+      ? `
+  ${prefix} build [--no-cache]                           Build worker image`
+      : `
+  ${prefix} uninstall                                    Remove ~/.shannon/ and all data`
+  }
+  ${prefix} info                                         Show splash screen
+  ${prefix} help                                         Show this help
+
+Options for 'start':
+  -u, --url <url>           Target URL (required)
+  -r, --repo <path>         Repository path${mode === 'local' ? ' or bare name' : ''} (required)
+  -c, --config <path>       Configuration file (YAML)
+  -o, --output <path>       Copy deliverables to this directory after run
+  -w, --workspace <name>    Named workspace (auto-resumes if exists)
+      --pipeline-testing    Use minimal prompts for fast testing
+      --router              Route requests through claude-code-router
+
+Examples:
+  ${prefix} start -u https://example.com -r ${mode === 'local' ? 'my-repo' : './my-repo'}
+  ${prefix} start -u https://example.com -r /path/to/repo -c config.yaml -w q1-audit
+  ${prefix} logs q1-audit
+  ${prefix} stop --clean
+${
+  mode === 'local'
+    ? `
+State directory: ./workspaces/`
+    : `
+State directory: ~/.shannon/`
+}
+Monitor workflows at http://localhost:8233
+`);
+}
+
+interface ParsedStartArgs {
+  url: string;
+  repo: string;
+  config?: string;
+  workspace?: string;
+  output?: string;
+  pipelineTesting: boolean;
+  router: boolean;
+}
+
+function parseStartArgs(argv: string[]): ParsedStartArgs {
+  let url = '';
+  let repo = '';
+  let config: string | undefined;
+  let workspace: string | undefined;
+  let output: string | undefined;
+  let pipelineTesting = false;
+  let router = false;
+
+  for (let i = 0; i < argv.length; i++) {
+    const arg = argv[i];
+    const next = argv[i + 1];
+
+    switch (arg) {
+      case '-u':
+      case '--url':
+        if (next && !next.startsWith('-')) {
+          url = next;
+          i++;
+        }
+        break;
+      case '-r':
+      case '--repo':
+        if (next && !next.startsWith('-')) {
+          repo = next;
+          i++;
+        }
+        break;
+      case '-c':
+      case '--config':
+        if (next && !next.startsWith('-')) {
+          config = next;
+          i++;
+        }
+        break;
+      case '-w':
+      case '--workspace':
+        if (next && !next.startsWith('-')) {
+          workspace = next;
+          i++;
+        }
+        break;
+      case '-o':
+      case '--output':
+        if (next && !next.startsWith('-')) {
+          output = next;
+          i++;
+        }
+        break;
+      case '--pipeline-testing':
+        pipelineTesting = true;
+        break;
+      case '--router':
+        router = true;
+        break;
+      default:
+        console.error(`Unknown option: ${arg}`);
+        console.error(`Run "${getMode() === 'local' ? './shannon' : 'npx @keygraph/shannon'} help" for usage`);
+        process.exit(1);
+    }
+  }
+
+  if (!url || !repo) {
+    console.error('ERROR: --url and --repo are required');
+    console.error(`Usage: ${getMode() === 'local' ? './shannon' : 'npx @keygraph/shannon'} start -u <url> -r <path>`);
+    process.exit(1);
+  }
+
+  return {
+    url,
+    repo,
+    pipelineTesting,
+    router,
+    ...(config && { config }),
+    ...(workspace && { workspace }),
+    ...(output && { output }),
+  };
+}
+
+// === Main Dispatch ===
+
+const args = process.argv.slice(2);
+const command = args[0];
+
+switch (command) {
+  case 'start': {
+    const parsed = parseStartArgs(args.slice(1));
+    await start({ ...parsed, version: getVersion() });
+    break;
+  }
+  case 'stop':
+    stop(args.includes('--clean'));
+    break;
+  case 'logs': {
+    const workspaceId = args[1];
+    if (!workspaceId) {
+      console.error('ERROR: Workspace ID is required');
+      console.error(`Usage: ${getMode() === 'local' ? './shannon' : 'npx @keygraph/shannon'} logs <workspace>`);
+      process.exit(1);
+    }
+    logs(workspaceId);
+    break;
+  }
+  case 'workspaces':
+    workspaces(getVersion());
+    break;
+  case 'status':
+    status();
+    break;
+  case 'setup':
+    if (getMode() === 'local') {
+      console.error('ERROR: setup is only available in npx mode. In local mode, use .env');
+      process.exit(1);
+    }
+    setup();
+    break;
+  case 'build':
+    build(args.includes('--no-cache'));
+    break;
+  case 'uninstall':
+    if (getMode() === 'local') {
+      console.error('ERROR: uninstall is only available in npx mode.');
+      process.exit(1);
+    }
+    uninstall();
+    break;
+  case 'info':
+    displaySplash(getMode() === 'local' ? undefined : getVersion());
+    break;
+  case 'help':
+  case '--help':
+  case '-h':
+  case undefined:
+    showHelp();
+    break;
+  default:
+    console.error(`Unknown command: ${command}`);
+    showHelp();
+    process.exit(1);
+}
@@ -0,0 +1,25 @@
+/**
+ * Runtime mode detection — local (build from source) vs npx (Docker Hub).
+ *
+ * The root `./shannon` entry point sets SHANNON_LOCAL=1 before importing.
+ * When run via npx, the bundled `dist/index.js` is executed directly without it.
+ */
+
+export type Mode = 'local' | 'npx';
+
+let cachedMode: Mode | undefined;
+
+export function getMode(): Mode {
+  if (cachedMode !== undefined) return cachedMode;
+
+  cachedMode = process.env.SHANNON_LOCAL === '1' ? 'local' : 'npx';
+  return cachedMode;
+}
+
+export function setMode(mode: Mode): void {
+  cachedMode = mode;
+}
+
+export function isLocal(): boolean {
+  return getMode() === 'local';
+}
@@ -0,0 +1,87 @@
+/**
+ * Path resolution for --repo and --config arguments.
+ *
+ * Local mode supports bare repo names (e.g. "my-repo" → ./repos/my-repo).
+ * Both modes resolve relative paths against CWD.
+ */
+
+import fs from 'node:fs';
+import path from 'node:path';
+import { isLocal } from './mode.js';
+
+export interface MountPair {
+  hostPath: string;
+  containerPath: string;
+}
+
+/**
+ * Resolve --repo to absolute path and container mount.
+ * Local mode: bare names (no / or . prefix) resolve to ./repos/<name>; the CLI exits if that directory is missing.
+ */
+export function resolveRepo(repoArg: string): MountPair {
+  let hostPath: string;
+
+  if (isLocal() && !repoArg.startsWith('/') && !repoArg.startsWith('.')) {
+    // Bare name — check ./repos/<name> for backward compatibility
+    const barePath = path.resolve('repos', repoArg);
+    if (fs.existsSync(barePath)) {
+      hostPath = barePath;
+    } else {
+      console.error(`ERROR: Repository not found at ./repos/${repoArg}`);
+      console.error('');
+      console.error('Place your target repository under the ./repos/ directory,');
+      console.error('or pass an absolute path or a ./relative path: -r /path/to/repo');
+      process.exit(1);
+    }
+  } else {
+    hostPath = path.resolve(repoArg);
+  }
+
+  if (!fs.existsSync(hostPath)) {
+    console.error(`ERROR: Repository not found: ${hostPath}`);
+    process.exit(1);
+  }
+
+  if (!fs.statSync(hostPath).isDirectory()) {
+    console.error(`ERROR: Not a directory: ${hostPath}`);
+    process.exit(1);
+  }
+
+  const basename = path.basename(hostPath);
+  return {
+    hostPath,
+    containerPath: `/repos/${basename}`,
+  };
+}
+
+/**
+ * Resolve --config to absolute path and container mount.
+ */
+export function resolveConfig(configArg: string): MountPair {
+  const hostPath = path.resolve(configArg);
+
+  if (!fs.existsSync(hostPath)) {
+    console.error(`ERROR: Config file not found: ${hostPath}`);
+    process.exit(1);
+  }
+
+  if (!fs.statSync(hostPath).isFile()) {
+    console.error(`ERROR: Not a file: ${hostPath}`);
+    process.exit(1);
+  }
+
+  const basename = path.basename(hostPath);
+  return {
+    hostPath,
+    containerPath: `/app/configs/${basename}`,
+  };
+}
+
+/**
+ * Ensure the deliverables directory exists and is writable by the container user.
+ */
+export function ensureDeliverables(repoHostPath: string): void {
+  const deliverables = path.join(repoHostPath, 'deliverables');
+  fs.mkdirSync(deliverables, { recursive: true });
+  fs.chmodSync(deliverables, 0o777);
+}
@@ -0,0 +1,50 @@
+/**
+ * Splash screen display — pure terminal output, no npm dependencies.
+ */
+
+export function displaySplash(version?: string): void {
+  const GOLD = '\x1b[38;2;244;197;66m';
+  const CYAN = '\x1b[36;1m';
+  const WHITE = '\x1b[1;37m';
+  const GRAY = '\x1b[0;37m';
+  const YELLOW = '\x1b[1;33m';
+  const RESET = '\x1b[0m';
+
+  const B = `${CYAN}\u2551${RESET}`;
+  const S67 = ' '.repeat(67);
+  const HR = '\u2550'.repeat(67);
+
+  const lines = [
+    '',
+    `  ${CYAN}\u2554${HR}\u2557${RESET}`,
+    `  ${B}${S67}${B}`,
+    `  ${B}  ${GOLD}\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2557\u2588\u2588\u2557  \u2588\u2588\u2557 \u2588\u2588\u2588\u2588\u2588\u2557 \u2588\u2588\u2588\u2557   \u2588\u2588\u2557\u2588\u2588\u2588\u2557   \u2588\u2588\u2557 \u2588\u2588\u2588\u2588\u2588\u2588\u2557 \u2588\u2588\u2588\u2557   \u2588\u2588\u2557${RESET}  ${B}`,
+    `  ${B}  ${GOLD}\u2588\u2588\u2554\u2550\u2550\u2550\u2550\u255D\u2588\u2588\u2551  \u2588\u2588\u2551\u2588\u2588\u2554\u2550\u2550\u2588\u2588\u2557\u2588\u2588\u2588\u2588\u2557  \u2588\u2588\u2551\u2588\u2588\u2588\u2588\u2557  \u2588\u2588\u2551\u2588\u2588\u2554\u2550\u2550\u2550\u2588\u2588\u2557\u2588\u2588\u2588\u2588\u2557  \u2588\u2588\u2551${RESET}  ${B}`,
+    `  ${B}  ${GOLD}\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2557\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2551\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2551\u2588\u2588\u2554\u2588\u2588\u2557 \u2588\u2588\u2551\u2588\u2588\u2554\u2588\u2588\u2557 \u2588\u2588\u2551\u2588\u2588\u2551   \u2588\u2588\u2551\u2588\u2588\u2554\u2588\u2588\u2557 \u2588\u2588\u2551${RESET}  ${B}`,
+    `  ${B}  ${GOLD}\u255A\u2550\u2550\u2550\u2550\u2588\u2588\u2551\u2588\u2588\u2554\u2550\u2550\u2588\u2588\u2551\u2588\u2588\u2554\u2550\u2550\u2588\u2588\u2551\u2588\u2588\u2551\u255A\u2588\u2588\u2557\u2588\u2588\u2551\u2588\u2588\u2551\u255A\u2588\u2588\u2557\u2588\u2588\u2551\u2588\u2588\u2551   \u2588\u2588\u2551\u2588\u2588\u2551\u255A\u2588\u2588\u2557\u2588\u2588\u2551${RESET}  ${B}`,
+    `  ${B}  ${GOLD}\u2588\u2588\u2588\u2588\u2588\u2588\u2588\u2551\u2588\u2588\u2551  \u2588\u2588\u2551\u2588\u2588\u2551  \u2588\u2588\u2551\u2588\u2588\u2551 \u255A\u2588\u2588\u2588\u2588\u2551\u2588\u2588\u2551 \u255A\u2588\u2588\u2588\u2588\u2551\u255A\u2588\u2588\u2588\u2588\u2588\u2588\u2554\u255D\u2588\u2588\u2551 \u255A\u2588\u2588\u2588\u2588\u2551${RESET}  ${B}`,
+    `  ${B}  ${GOLD}\u255A\u2550\u2550\u2550\u2550\u2550\u2550\u255D\u255A\u2550\u255D  \u255A\u2550\u255D\u255A\u2550\u255D  \u255A\u2550\u255D\u255A\u2550\u255D  \u255A\u2550\u2550\u2550\u255D\u255A\u2550\u255D  \u255A\u2550\u2550\u2550\u255D \u255A\u2550\u2550\u2550\u2550\u2550\u255D \u255A\u2550\u255D  \u255A\u2550\u2550\u2550\u255D${RESET}  ${B}`,
+    `  ${B}${S67}${B}`,
+    `  ${B}              ${CYAN}\u2554\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2557${RESET}               ${B}`,
+    `  ${B}              ${CYAN}\u2551${RESET}  ${WHITE}AI Penetration Testing Framework${RESET}  ${CYAN}\u2551${RESET}               ${B}`,
+    `  ${B}              ${CYAN}\u255A\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u2550\u255D${RESET}               ${B}`,
+    `  ${B}${S67}${B}`,
+  ];
+
+  if (version) {
+    const verStr = `v${version}`;
+    const verPadLeft = Math.floor((67 - verStr.length) / 2);
+    const verPadRight = 67 - verStr.length - verPadLeft;
+    lines.push(`  ${B}${' '.repeat(verPadLeft)}${GRAY}${verStr}${RESET}${' '.repeat(verPadRight)}${B}`);
+  }
+
+  lines.push(
+    `  ${B}${S67}${B}`,
+    `  ${B}                    ${YELLOW}\uD83D\uDD10 DEFENSIVE SECURITY ONLY \uD83D\uDD10${RESET}                  ${B}`,
+    `  ${B}${S67}${B}`,
+    `  ${CYAN}\u255A${HR}\u255D${RESET}`,
+    '',
+  );
+
+  console.log(lines.join('\n'));
+}
@@ -0,0 +1,9 @@
+{
+  "extends": "../../tsconfig.base.json",
+  "compilerOptions": {
+    "rootDir": "./src",
+    "outDir": "./dist"
+  },
+  "include": ["src/**/*"],
+  "exclude": ["node_modules", "dist"]
+}
@@ -0,0 +1,11 @@
+import { defineConfig } from 'tsdown';
+
+export default defineConfig({
+  entry: ['src/index.ts'],
+  format: 'esm',
+  target: 'node18',
+  outDir: 'dist',
+  clean: true,
+  deps: { neverBundle: ['@clack/prompts', 'dotenv', 'smol-toml'] },
+  banner: { js: '#!/usr/bin/env node' },
+});
@@ -122,12 +122,20 @@
      "type": "object",
      "description": "Deprecated: Use 'authentication' section instead",
      "deprecated": true
+    },
+    "description": {
+      "type": "string",
+      "description": "Description of the target environment, its deployment context, and any information that helps guide the security assessment",
+      "minLength": 1,
+      "maxLength": 500,
+      "pattern": "\\S"
    }
  },
  "anyOf": [
-    {"required": ["authentication"]},
-    {"required": ["rules"]},
-    {"required": ["authentication", "rules"]}
+    { "required": ["authentication"] },
+    { "required": ["rules"] },
+    { "required": ["authentication", "rules"] },
+    { "required": ["description"] }
  ],
  "additionalProperties": false,
  "$defs": {
@@ -157,4 +165,4 @@
      "additionalProperties": false
    }
  }
-}
+}
@@ -1,6 +1,9 @@
 # Example configuration file for pentest-agent
 # Copy this file and modify it for your specific testing needs

+# Description of the target environment (optional, max 500 chars)
+description: "Next.js e-commerce app on PostgreSQL. Local dev environment — .env files contain local-only credentials, not deployed to production."
+
 authentication:
  login_type: form  # Options: 'form' or 'sso'
  login_url: "https://example.com/login"
@@ -0,0 +1,26 @@
+{
+  "name": "@shannon/worker",
+  "version": "0.0.0",
+  "private": true,
+  "type": "module",
+  "scripts": {
+    "build": "tsc",
+    "check": "tsc --noEmit",
+    "clean": "rm -rf dist"
+  },
+  "dependencies": {
+    "@anthropic-ai/claude-agent-sdk": "catalog:",
+    "@temporalio/activity": "^1.11.0",
+    "@temporalio/client": "^1.11.0",
+    "@temporalio/worker": "^1.11.0",
+    "@temporalio/workflow": "^1.11.0",
+    "ajv": "^8.12.0",
+    "ajv-formats": "^2.1.1",
+    "dotenv": "^16.4.5",
+    "js-yaml": "^4.1.0",
+    "zx": "^8.0.0"
+  },
+  "devDependencies": {
+    "@types/js-yaml": "^4.0.9"
+  }
+}
@@ -141,15 +141,13 @@ Before beginning exploitation, read these strategic intelligence files in order:
 You are the **Identity Compromise Specialist** - proving tangible impact of broken authentication through successful account takeover and session hijacking.
 </system_architecture>

-<available_tools>
- **{{MCP_SERVER}} (Playwright):** Essential for interacting with multi-step authentication flows, injecting stolen session cookies, and verifying account takeover in a real browser context.
- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
-  - **Parameters:**
-    - `deliverable_type`: "AUTH_EVIDENCE" (required)
-    - `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
-    - `content`: Inline content string (optional, use only for small content)
-  - **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
-  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
+<cli_tools>
+- **Browser Automation (playwright-cli skill):** Essential for interacting with multi-step authentication flows, injecting stolen session cookies, and verifying account takeover in a real browser context. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+  - **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `save-deliverable --type <TYPE> --content '<json>'`
+  - **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+  - **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+  - **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
 - **Task Agent:** Mandatory coder-executor for brute force batches, credential stuffing, token replay automation, and any scripted workflow.
 - **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
@@ -158,7 +156,7 @@ You are the **Identity Compromise Specialist** - proving tangible impact of brok
 **CRITICAL TASK AGENT WORKFLOW:**
 - Delegate every custom script or multi-step automation to the Task Agent; do not hand-write scripts in Bash.
 - Keep requests ≤15 lines and use the template below so the Task Agent has clear inputs and success criteria.
-</available_tools>
+</cli_tools>

 ## Task Agent Scripting Requirements
 Use the Task Agent to execute all bulk or repetitive auth attacks. Manual commands are for quick verification only.
@@ -314,7 +312,7 @@ When your attack plan is fully explored (all checkboxes are marked `[x]`), your

 **1. Your Specialist Deliverable**

-First, synthesize all evidence into your specialist section and save it using the save_deliverable script with type AUTH_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment.
+First, synthesize all evidence into your specialist section and save it using the `save-deliverable` CLI with `--type AUTH_EVIDENCE`. This section will be combined with other specialist reports to form the complete security assessment.

 **Section Ordering & Format Requirements:**
 - Use the exact vulnerability IDs from `deliverables/auth_exploitation_queue.json`
@@ -407,12 +405,12 @@ If [blocker] were bypassed/removed:

 COMPLETION REQUIREMENTS (ALL must be satisfied):
 	1. Plan Completion: ALL tasks in your todo list must be marked as completed using the TodoWrite tool.
-	2. Deliverable Generation: The required deliverable must be successfully saved using save_deliverable script:
+	2. Deliverable Generation: The required deliverable must be successfully saved using the save-deliverable CLI:
 	  - **CHUNKED WRITING (MANDATORY):**
 	    1. Use the **Write** tool to create `deliverables/auth_exploitation_evidence.md` with the title and first major section
 	    2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
 	    3. Repeat step 2 for all remaining sections
-	    4. Call `save_deliverable` with `deliverable_type: "AUTH_EVIDENCE"` and `file_path: "deliverables/auth_exploitation_evidence.md"`
+	    4. Run `save-deliverable` with `--type AUTH_EVIDENCE --file-path "deliverables/auth_exploitation_evidence.md"`
 	    **WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.

 CRITICAL WARNING: Announcing completion before every item in deliverables/auth_exploitation_queue.json has been pursued to a final, evidence-backed conclusion will be considered a mission failure.
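The prompt above documents two JSON shapes that `save-deliverable` prints to stdout. A caller that wants to act on the result (for example, retrying when `retryable` is true) could model them as a discriminated union. This is a minimal sketch under that assumption: `SaveResult` and `parseSaveResult` are hypothetical names, not part of the CLI, and only the field names are taken from the prompt text.

```typescript
// Result shapes copied from the prompt text; everything else here is illustrative.
type SaveResult =
  | { status: "success"; filepath: string; validated: boolean }
  | { status: "error"; message: string; retryable: boolean };

// Parse the CLI's stdout; throws if it matches neither documented shape.
function parseSaveResult(stdout: string): SaveResult {
  const data: unknown = JSON.parse(stdout.trim());
  if (typeof data !== "object" || data === null) {
    throw new Error("save-deliverable output is not a JSON object");
  }
  const { status, filepath, message, validated, retryable } = data as Record<string, unknown>;
  if (status === "success" && typeof filepath === "string") {
    return { status: "success", filepath, validated: validated === true };
  }
  if (status === "error" && typeof message === "string") {
    return { status: "error", message, retryable: retryable === true };
  }
  throw new Error("unrecognized save-deliverable output");
}

const result = parseSaveResult(
  '{"status":"success","filepath":"deliverables/auth_exploitation_evidence.md","validated":true}',
);
console.log(result.status); // success
```

Switching on `status` lets the type checker narrow to the matching branch, so `filepath` is only reachable on success and `retryable` only on error.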
@@ -128,15 +128,13 @@ Before beginning exploitation, read these strategic intelligence files in order:
 You are the **Privilege Escalation Specialist** - proving tangible impact of broken authorization through access control bypass and unauthorized data access.
 </system_architecture>

-<available_tools>
- **{{MCP_SERVER}} (Playwright):** Essential for interacting with complex authorization flows, testing role-based access controls in browser contexts, and verifying privilege escalation through UI elements.
- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
-  - **Parameters:**
-    - `deliverable_type`: "AUTHZ_EVIDENCE" (required)
-    - `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
-    - `content`: Inline content string (optional, use only for small content)
-  - **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
-  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
+<cli_tools>
+- **Browser Automation (playwright-cli skill):** Essential for interacting with complex authorization flows, testing role-based access controls in browser contexts, and verifying privilege escalation through UI elements. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+  - **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `save-deliverable --type <TYPE> --content '<json>'`
+  - **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+  - **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+  - **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
 - **Task Agent:** Mandatory coder-executor for IDOR sweeps, role escalation loops, and workflow bypass automation.
 - **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
@@ -145,7 +143,7 @@ You are the **Privilege Escalation Specialist** - proving tangible impact of bro
 **CRITICAL TASK AGENT WORKFLOW:**
 - Delegate every multi-user iteration, role toggle test, or workflow automation script to the Task Agent—never handcraft these scripts yourself.
 - Keep requests ≤15 lines and adhere to the template below so the Task Agent can act deterministically.
-</available_tools>
+</cli_tools>

 ## Task Agent Scripting Requirements
 All repeated authorization tests must run through the Task Agent.
@@ -320,7 +318,7 @@ When your attack plan is fully explored (all checkboxes are marked `[x]`), your

 **1. Your Specialist Deliverable**

-First, synthesize all evidence into your specialist section and save it using the save_deliverable script with type AUTHZ_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment and MUST use the following structure:
+First, synthesize all evidence into your specialist section and save it using the `save-deliverable` CLI with `--type AUTHZ_EVIDENCE`. This section will be combined with other specialist reports to form the complete security assessment and MUST use the following structure:

 **Section Ordering & Format Requirements:**
 - Use the exact vulnerability IDs from `deliverables/authz_exploitation_queue.json`
@@ -409,12 +407,12 @@ If [blocker] were bypassed/removed:

 COMPLETION REQUIREMENTS (ALL must be satisfied):
 	1. Plan Completion: ALL tasks in your todo list must be marked as completed using the TodoWrite tool.
-	2. Deliverable Generation: The required deliverable must be successfully saved using save_deliverable script:
+	2. Deliverable Generation: The required deliverable must be successfully saved using the save-deliverable CLI:
 	  - **CHUNKED WRITING (MANDATORY):**
 	    1. Use the **Write** tool to create `deliverables/authz_exploitation_evidence.md` with the title and first major section
 	    2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
 	    3. Repeat step 2 for all remaining sections
-	    4. Call `save_deliverable` with `deliverable_type: "AUTHZ_EVIDENCE"` and `file_path: "deliverables/authz_exploitation_evidence.md"`
+	    4. Run `save-deliverable` with `--type AUTHZ_EVIDENCE --file-path "deliverables/authz_exploitation_evidence.md"`
 	    **WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.

 CRITICAL WARNING: Announcing completion before every item in deliverables/authz_exploitation_queue.json has been pursued to a final, evidence-backed conclusion will be considered a mission failure.
@@ -110,15 +110,13 @@ Before beginning exploitation, read these strategic intelligence files in order:
 You are the **Proof-of-Impact Generator** - converting theoretical injection flaws into undeniable evidence of compromise through data extraction and command execution.
 </system_architecture>

-<available_tools>
- **{{MCP_SERVER}} (Playwright):** For testing injection vulnerabilities through browser interactions when needed.
- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
-  - **Parameters:**
-    - `deliverable_type`: "INJECTION_EVIDENCE" (required)
-    - `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
-    - `content`: Inline content string (optional, use only for small content)
-  - **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
-  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
+<cli_tools>
+- **Browser Automation (playwright-cli skill):** For testing injection vulnerabilities through browser interactions when needed. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+  - **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `save-deliverable --type <TYPE> --content '<json>'`
+  - **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+  - **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+  - **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
 - **Task Agent:** Mandatory coder-executor for any custom scripting beyond single ad-hoc commands.
 - **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
@@ -127,7 +125,7 @@ You are the **Proof-of-Impact Generator** - converting theoretical injection fla
 **CRITICAL TASK AGENT WORKFLOW:**
 - Task Agent must author and run every custom script, payload loop, or enumeration workflow. Do not craft standalone scripts in Bash or other tools.
 - Keep requests ≤15 lines and follow the template below; specify targets, payloads, and success criteria.
-</available_tools>
+</cli_tools>

 <methodology_and_domain_expertise>

@@ -344,7 +342,7 @@ When your attack plan is fully explored (all checkboxes are marked `[x]`), your

 **1. Your Specialist Deliverable**

-First, synthesize all of your evidence into your specialist section and save it using the save_deliverable script with type INJECTION_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment.
+First, synthesize all of your evidence into your specialist section and save it using the `save-deliverable` CLI with `--type INJECTION_EVIDENCE`. This section will be combined with other specialist reports to form the complete security assessment.

 Your section MUST use the following structure precisely:

@@ -436,12 +434,12 @@ If [blocker] were bypassed/removed:

 COMPLETION REQUIREMENTS (ALL must be satisfied):
 1.  **Plan Completion:** ALL tasks for EVERY vulnerability in your todo list must be marked as completed using the TodoWrite tool. **No vulnerability or task can be left unaddressed.**
-2.  **Deliverable Generation:** The required deliverable must be successfully saved using save_deliverable MCP tool:
+2.  **Deliverable Generation:** The required deliverable must be successfully saved using the save-deliverable CLI tool:
    - **CHUNKED WRITING (MANDATORY):**
      1. Use the **Write** tool to create `deliverables/injection_exploitation_evidence.md` with the title and first major section
      2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
      3. Repeat step 2 for all remaining sections
-      4. Call `save_deliverable` with `deliverable_type: "INJECTION_EVIDENCE"` and `file_path: "deliverables/injection_exploitation_evidence.md"`
+      4. Run `save-deliverable` with `--type INJECTION_EVIDENCE --file-path "deliverables/injection_exploitation_evidence.md"`
      **WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.

 **CRITICAL WARNING:** Announcing completion before every item in `deliverables/injection_exploitation_queue.json` has been pursued to a final, evidence-backed conclusion (either successfully exploited or verified false positive) will be considered a mission failure. Superficial testing is not acceptable.
@@ -128,16 +128,14 @@ Before beginning exploitation, read these strategic intelligence files in order:
 You are the **Network Boundary Breaker** - proving tangible impact of SSRF vulnerabilities through internal service access and network reconnaissance.
 </system_architecture>

-<available_tools>
- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
-  - **Parameters:**
-    - `deliverable_type`: "SSRF_EVIDENCE" (required)
-    - `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
-    - `content`: Inline content string (optional, use only for small content)
-  - **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
-  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
+<cli_tools>
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+  - **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `save-deliverable --type <TYPE> --content '<json>'`
+  - **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+  - **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+  - **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** Useful for complex multi-step SSRF exploitation that requires browser context or JavaScript execution.
+- **Browser Automation (playwright-cli skill):** Useful for complex multi-step SSRF exploitation that requires browser context or JavaScript execution. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
 - **Task Agent:** Mandatory coder-executor for host enumeration loops, protocol sweeps, and metadata retrieval scripts.
 - **TodoWrite tool:** Essential for managing your exploitation task list. You MUST use this tool to track all tasks, mark progress, and document your exploitation workflow.
 - **Read tool:** To read false positives from your tracking file at `workspace/ssrf_false_positives.md`.
@@ -145,7 +143,7 @@ You are the **Network Boundary Breaker** - proving tangible impact of SSRF vulne
 **CRITICAL TASK AGENT WORKFLOW:**
 - Delegate every automated scan (internal hosts, cloud metadata, port sweeps) to the Task Agent; do not handcraft scripts locally.
 - Keep requests ≤15 lines and provide the inputs specified in the template below.
-</available_tools>
+</cli_tools>

 ## Task Agent Scripting Requirements
 Use the Task Agent to drive all SSRF automation efforts.
@@ -397,7 +395,7 @@ When your attack plan is fully explored (all checkboxes are marked `[x]`), your

 **1. Your Specialist Deliverable**

-First, synthesize all evidence into your specialist section and save it using the save_deliverable script with type SSRF_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment and MUST use the following structure:
+First, synthesize all evidence into your specialist section and save it using the `save-deliverable` CLI with `--type SSRF_EVIDENCE`. This section will be combined with other specialist reports to form the complete security assessment and MUST use the following structure:

 **Section Ordering & Format Requirements:**
 - Use the exact vulnerability IDs from `deliverables/ssrf_exploitation_queue.json`
@@ -486,12 +484,12 @@ If [blocker] were bypassed/removed:

 COMPLETION REQUIREMENTS (ALL must be satisfied):
 	1. Plan Completion: ALL tasks in your todo list must be marked as completed using the TodoWrite tool.
-	2. Deliverable Generation: The required deliverable must be successfully saved using save_deliverable script:
+	2. Deliverable Generation: The required deliverable must be successfully saved using the save-deliverable CLI:
 	  - **CHUNKED WRITING (MANDATORY):**
 	    1. Use the **Write** tool to create `deliverables/ssrf_exploitation_evidence.md` with the title and first major section
 	    2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
 	    3. Repeat step 2 for all remaining sections
-	    4. Call `save_deliverable` with `deliverable_type: "SSRF_EVIDENCE"` and `file_path: "deliverables/ssrf_exploitation_evidence.md"`
+	    4. Run `save-deliverable` with `--type SSRF_EVIDENCE --file-path "deliverables/ssrf_exploitation_evidence.md"`
 	    **WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.

 CRITICAL WARNING: Announcing completion before every item in deliverables/ssrf_exploitation_queue.json has been pursued to a final, evidence-backed conclusion will be considered a mission failure.
@@ -130,15 +130,13 @@ Before beginning exploitation, read these strategic intelligence files in order:
 You are the **Client-Side Impact Demonstrator** - converting theoretical XSS flaws into undeniable evidence of compromise through session hijacking and unauthorized actions.
 </system_architecture>

-<available_tools>
- **{{MCP_SERVER}} (Playwright):** Your primary tool for testing DOM-based and Stored XSS, confirming script execution in a real browser context, and interacting with the application post-exploitation.
- **save_deliverable (MCP Tool):** Saves exploitation evidence files.
-  - **Parameters:**
-    - `deliverable_type`: "XSS_EVIDENCE" (required)
-    - `file_path`: Path to the evidence report file you wrote to disk (preferred for large reports)
-    - `content`: Inline content string (optional, use only for small content)
-  - **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
-  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for evidence reports.
+<cli_tools>
+- **Browser Automation (playwright-cli skill):** Your primary tool for testing DOM-based and Stored XSS, confirming script execution in a real browser context, and interacting with the application post-exploitation. Invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+  - **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `save-deliverable --type <TYPE> --content '<json>'`
+  - **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+  - **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+  - **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
 - **Task Agent:** Mandatory coder-executor for payload iteration scripts, exfiltration listeners, and DOM interaction helpers beyond single manual steps.
 - **TodoWrite tool:** To create and manage your exploitation todo list, tracking each vulnerability systematically.
@@ -147,7 +145,7 @@ You are the **Client-Side Impact Demonstrator** - converting theoretical XSS fla
 **CRITICAL TASK AGENT WORKFLOW:**
 - Delegate every automated payload sweep, browser interaction loop, or listener setup to the Task Agent—do not craft standalone scripts manually.
 - Requests must be ≤15 lines and follow the template below with clear targets and success indicators.
-</available_tools>
+</cli_tools>

 ## Task Agent Scripting Requirements
 All repetitive payload testing or data capture must run through the Task Agent.
@@ -329,7 +327,7 @@ POTENTIAL (Report in separate subsection):
 When all todos in your list are marked as completed, your mission is complete. You MUST generate two final files.

 1. Your Specialist Deliverable  
-First, synthesize all of your evidence into your specialist section and save it using the save_deliverable script with type XSS_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment.  
+First, synthesize all of your evidence into your specialist section and save it using the save-deliverable CLI with --type XSS_EVIDENCE. This section will be combined with other specialist reports to form the complete security assessment.  
 Your section MUST use the following structure precisely:

 **Section Ordering & Format Requirements:**
@@ -426,12 +424,12 @@ If [blocker] were bypassed/removed:

 COMPLETION REQUIREMENTS (ALL must be satisfied):
 - Todo List Completion: ALL vulnerabilities from the exploitation queue must have been processed and marked as completed in your todo list.
- Deliverable Generation: The required deliverable must be successfully saved using save_deliverable MCP tool:
+- Deliverable Generation: The required deliverable must be successfully saved using the save-deliverable CLI tool:
  - **CHUNKED WRITING (MANDATORY):**
    1. Use the **Write** tool to create `deliverables/xss_exploitation_evidence.md` with the title and first major section
    2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
    3. Repeat step 2 for all remaining sections
-    4. Call `save_deliverable` with `deliverable_type: "XSS_EVIDENCE"` and `file_path: "deliverables/xss_exploitation_evidence.md"`
+    4. Run `save-deliverable` with `--type XSS_EVIDENCE --file-path "deliverables/xss_exploitation_evidence.md"`
    **WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.

 **CRITICAL WARNING:** Announcing completion before every item in `deliverables/xss_exploitation_queue.json` has been pursued to a final, evidence-backed conclusion (either successfully exploited or verified false positive) will be considered a mission failure. Superficial testing is not acceptable.
@@ -0,0 +1,30 @@
+## 🧪 Pipeline Testing: Session Isolation Test for Authentication Exploitation Agent
+
+**Playwright Session:** Using session `{{PLAYWRIGHT_SESSION}}` for browser automation testing.
+
+### Step 1: Test Browser Isolation
+1. Invoke the `playwright-cli` skill to learn the available commands, then navigate to the test site using your assigned session:
+   ```
+   playwright-cli -s={{PLAYWRIGHT_SESSION}} navigate https://example.com
+   ```
+
+2. Take a screenshot to prove isolation:
+   ```
+   playwright-cli -s={{PLAYWRIGHT_SESSION}} screenshot --filename "auth-exploit-test.png"
+   ```
+
+3. Log the test results:
+   ```
+   console.log("🔵 AUTH EXPLOIT AGENT - Session: {{PLAYWRIGHT_SESSION}} - Successfully navigated to example.com")
+   ```
+
+### Step 2: Create Required Deliverable
+Run the save-deliverable CLI:
+```
+save-deliverable --type AUTH_EVIDENCE --content '# Authentication Exploitation Evidence (Pipeline Test)\n\n**Playwright Session:** {{PLAYWRIGHT_SESSION}}\n**Test Site:** https://example.com\n**Screenshot:** auth-exploit-test.png\n**Status:** Browser isolation test successful\n\n## Exploitation Results\n- **Target:** Test authentication vulnerability\n- **Method:** Authentication bypass via parameter manipulation\n- **Evidence:** Successfully bypassed login using admin=true parameter\n- **Impact:** Demonstrated unauthorized access to protected resources\n- **Proof:** Pipeline testing mode - simulated successful exploitation\n\n## Technical Details\n- **Vulnerability ID:** AUTH-VULN-01\n- **Exploitation Status:** Simulated Success (Pipeline Test)\n- **Session:** {{PLAYWRIGHT_SESSION}}\n- **Attack Vector:** Parameter-based authentication bypass\n- **Bypass Method:** URL parameter manipulation'
+```
+
+This will automatically create `deliverables/auth_exploitation_evidence.md`.
+
+### Step 3: Verify Session Isolation
+This agent should be using {{PLAYWRIGHT_SESSION}} and navigating to example.com independently of other parallel exploitation agents.
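The pipeline-testing prompt above depends on `{{PLAYWRIGHT_SESSION}}` being substituted before the agent sees it. One way such substitution could work, while also catching placeholders that were never wired up, is sketched below. `renderPrompt` and the session name `agent1` are assumptions for illustration; the worker's actual prompt loader is not shown in this diff.

```typescript
// Hypothetical renderer: replaces {{NAME}} tokens and fails on any left unresolved.
function renderPrompt(template: string, vars: Record<string, string>): string {
  const missing = new Set<string>();
  const rendered = template.replace(
    /\{\{([A-Z0-9_]+)\}\}/g,
    (match: string, name: string): string => {
      const value: string | undefined = vars[name];
      if (value === undefined) {
        missing.add(name); // remember the gap, keep the token visible
        return match;
      }
      return value;
    },
  );
  if (missing.size > 0) {
    throw new Error(`Unresolved placeholders: ${Array.from(missing).join(", ")}`);
  }
  return rendered;
}

// "agent1" is an illustrative session name, not one defined by the project.
const command = renderPrompt(
  "playwright-cli -s={{PLAYWRIGHT_SESSION}} navigate https://example.com",
  { PLAYWRIGHT_SESSION: "agent1" },
);
console.log(command); // playwright-cli -s=agent1 navigate https://example.com
```

Failing loudly on an unresolved token avoids sending a literal `{{...}}` string to an agent, which would otherwise end up in the session flag verbatim.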
@@ -0,0 +1,30 @@
+## 🧪 Pipeline Testing: Session Isolation Test for Authorization Exploitation Agent
+
+**Playwright Session:** Using session `{{PLAYWRIGHT_SESSION}}` for browser automation testing.
+
+### Step 1: Test Browser Isolation
+1. Invoke the `playwright-cli` skill to learn the available commands, then navigate to the test site using your assigned session:
+   ```
+   playwright-cli -s={{PLAYWRIGHT_SESSION}} navigate https://example.com
+   ```
+
+2. Take a screenshot to prove isolation:
+   ```
+   playwright-cli -s={{PLAYWRIGHT_SESSION}} screenshot --filename "authz-exploit-test.png"
+   ```
+
+3. Log the test results:
+   ```
+   console.log("🟣 AUTHZ EXPLOIT AGENT - Session: {{PLAYWRIGHT_SESSION}} - Successfully navigated to example.com")
+   ```
+
+### Step 2: Create Required Deliverable
+Run the save-deliverable CLI:
+```
+save-deliverable --type AUTHZ_EVIDENCE --content '# Authorization Exploitation Evidence (Pipeline Test)\n\n**Playwright Session:** {{PLAYWRIGHT_SESSION}}\n**Test Site:** https://example.com\n**Screenshot:** authz-exploit-test.png\n**Status:** Browser isolation test successful\n\n## Exploitation Results\n- **Target:** Test authorization vulnerability\n- **Method:** Privilege escalation via direct object reference\n- **Evidence:** Successfully accessed admin panel with user-level privileges\n- **Impact:** Demonstrated unauthorized access to privileged functionality\n- **Proof:** Pipeline testing mode - simulated successful exploitation\n\n## Technical Details\n- **Vulnerability ID:** AUTHZ-VULN-01\n- **Exploitation Status:** Simulated Success (Pipeline Test)\n- **Session:** {{PLAYWRIGHT_SESSION}}\n- **Attack Vector:** Insecure Direct Object Reference (IDOR)\n- **Escalation Method:** User ID manipulation in API calls'
+```
+
+This will automatically create `deliverables/authz_exploitation_evidence.md`.
+
+### Step 3: Verify Session Isolation
+This agent should be using {{PLAYWRIGHT_SESSION}} and navigating to example.com independently of other parallel exploitation agents.
@@ -0,0 +1,30 @@
+## 🧪 Pipeline Testing: Session Isolation Test for Injection Exploitation Agent
+
+**Playwright Session:** Using session `{{PLAYWRIGHT_SESSION}}` for browser automation testing.
+
+### Step 1: Test Browser Isolation
+1. Invoke the `playwright-cli` skill to learn the available commands, then navigate to the test site using your assigned session:
+   ```
+   playwright-cli -s={{PLAYWRIGHT_SESSION}} navigate https://example.com
+   ```
+
+2. Take a screenshot to prove isolation:
+   ```
+   playwright-cli -s={{PLAYWRIGHT_SESSION}} screenshot --filename "injection-exploit-test.png"
+   ```
+
+3. Log the test results:
+   ```
+   console.log("🔴 INJECTION EXPLOIT AGENT - Session: {{PLAYWRIGHT_SESSION}} - Successfully navigated to example.com")
+   ```
+
+### Step 2: Create Required Deliverable
+Run the save-deliverable CLI:
+```
+save-deliverable --type INJECTION_EVIDENCE --content '# Injection Exploitation Evidence (Pipeline Test)\n\n**Playwright Session:** {{PLAYWRIGHT_SESSION}}\n**Test Site:** https://example.com\n**Screenshot:** injection-exploit-test.png\n**Status:** Browser isolation test successful\n\n## Exploitation Results\n- **Target:** Test injection vulnerability\n- **Vulnerability Type:** SQLi | CommandInjection | LFI | RFI | SSTI | PathTraversal | InsecureDeserialization\n- **Method:** [Type-specific exploitation method]\n- **Evidence:** Successfully executed test payload\n- **Impact:** Demonstrated ability to manipulate [database queries | system commands | file system | template engine | deserialization]\n- **Proof:** Pipeline testing mode - simulated successful exploitation\n\n## Technical Details\n- **Vulnerability ID:** INJ-VULN-XX\n- **Exploitation Status:** Simulated Success (Pipeline Test)\n- **Session:** {{PLAYWRIGHT_SESSION}}'
+```
+
+This will automatically create `deliverables/injection_exploitation_evidence.md`.
+
+### Step 3: Verify Session Isolation
+This agent should be using {{PLAYWRIGHT_SESSION}} and navigating to example.com independently of other parallel exploitation agents.
@@ -0,0 +1,30 @@
+## 🧪 Pipeline Testing: Session Isolation Test for SSRF Exploitation Agent
+
+**Playwright Session:** Using session `{{PLAYWRIGHT_SESSION}}` for browser automation testing.
+
+### Step 1: Test Browser Isolation
+1. Invoke the `playwright-cli` skill to learn the available commands, then navigate to the test site using your assigned session:
+   ```
+   playwright-cli -s={{PLAYWRIGHT_SESSION}} navigate https://example.com
+   ```
+
+2. Take a screenshot to prove isolation:
+   ```
+   playwright-cli -s={{PLAYWRIGHT_SESSION}} screenshot --filename "ssrf-exploit-test.png"
+   ```
+
+3. Log the test results:
+   ```
+   console.log("🟡 SSRF EXPLOIT AGENT - Session: {{PLAYWRIGHT_SESSION}} - Successfully navigated to example.com")
+   ```
+
+### Step 2: Create Required Deliverable
+Run the save-deliverable CLI:
+```
+save-deliverable --type SSRF_EVIDENCE --content '# SSRF Exploitation Evidence (Pipeline Test)\n\n**Playwright Session:** {{PLAYWRIGHT_SESSION}}\n**Test Site:** https://example.com\n**Screenshot:** ssrf-exploit-test.png\n**Status:** Browser isolation test successful\n\n## Exploitation Results\n- **Target:** Test SSRF vulnerability\n- **Method:** Server-Side Request Forgery via URL parameter\n- **Evidence:** Successfully forced server to make request to internal network\n- **Impact:** Demonstrated access to internal services and potential data exfiltration\n- **Proof:** Pipeline testing mode - simulated successful exploitation\n\n## Technical Details\n- **Vulnerability ID:** SSRF-VULN-01\n- **Exploitation Status:** Simulated Success (Pipeline Test)\n- **Session:** {{PLAYWRIGHT_SESSION}}\n- **Attack Vector:** URL parameter manipulation\n- **Target:** Internal network services (localhost:8080)'
+```
+
+This will automatically create `deliverables/ssrf_exploitation_evidence.md`.
+
+### Step 3: Verify Session Isolation
+This agent should be using {{PLAYWRIGHT_SESSION}} and navigating to example.com independently of other parallel exploitation agents.
@@ -0,0 +1,30 @@
+## 🧪 Pipeline Testing: Session Isolation Test for XSS Exploitation Agent
+
+**Playwright Session:** Using session `{{PLAYWRIGHT_SESSION}}` for browser automation testing.
+
+### Step 1: Test Browser Isolation
+1. Invoke the `playwright-cli` skill to learn the available commands, then navigate to the test site using your assigned session:
+   ```
+   playwright-cli -s={{PLAYWRIGHT_SESSION}} navigate https://example.com
+   ```
+
+2. Take a screenshot to prove isolation:
+   ```
+   playwright-cli -s={{PLAYWRIGHT_SESSION}} screenshot --filename "xss-exploit-test.png"
+   ```
+
+3. Log the test results:
+   ```
+   console.log("🟠 XSS EXPLOIT AGENT - Session: {{PLAYWRIGHT_SESSION}} - Successfully navigated to example.com")
+   ```
+
+### Step 2: Create Required Deliverable
+Run the save-deliverable CLI:
+```
+save-deliverable --type XSS_EVIDENCE --content '# XSS Exploitation Evidence (Pipeline Test)\n\n**Playwright Session:** {{PLAYWRIGHT_SESSION}}\n**Test Site:** https://example.com\n**Screenshot:** xss-exploit-test.png\n**Status:** Browser isolation test successful\n\n## Exploitation Results\n- **Target:** Test XSS vulnerability\n- **Method:** Reflected XSS via search parameter\n- **Evidence:** Successfully executed payload `<script>alert('\''XSS'\'')</script>`\n- **Impact:** Demonstrated JavaScript code execution in user context\n- **Proof:** Pipeline testing mode - simulated successful exploitation\n\n## Technical Details\n- **Vulnerability ID:** XSS-VULN-01\n- **Exploitation Status:** Simulated Success (Pipeline Test)\n- **Session:** {{PLAYWRIGHT_SESSION}}\n- **Attack Vector:** Reflected XSS in search functionality'
+```
+
+This will automatically create `deliverables/xss_exploitation_evidence.md`.
+
+### Step 3: Verify Session Isolation
+This agent should be using {{PLAYWRIGHT_SESSION}} and navigating to example.com independently of other parallel exploitation agents.
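The XSS evidence command above embeds a single quote inside a single-quoted `--content` argument using the `'\''` idiom (close the quote, emit an escaped quote, reopen the quote). A minimal sketch of that quoting rule, assuming the command is run by a POSIX shell; `shellSingleQuote` is a hypothetical helper, not part of this change:

```typescript
// Hypothetical helper (not from the repository): wraps text in single quotes
// for a POSIX shell, rewriting each embedded ' as '\'' so the shell sees it
// as a literal quote character.
function shellSingleQuote(text: string): string {
  return "'" + text.replace(/'/g, "'\\''") + "'";
}

// Example: build a --content argument that contains single quotes.
const payload = "<script>alert('XSS')</script>";
const contentArg = shellSingleQuote(payload);
console.log(`save-deliverable --type XSS_EVIDENCE --content ${contentArg}`);
```

Text without single quotes passes through unchanged apart from the surrounding quotes.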
@@ -0,0 +1 @@
+Run: `save-deliverable --type CODE_ANALYSIS --content 'Pre-recon analysis complete'`. Then say "Done".
@@ -0,0 +1 @@
+Run: `save-deliverable --type RECON --content 'Reconnaissance analysis complete'`. Then say "Done".
@@ -0,0 +1,12 @@
+Please complete these tasks using your CLI tools:
+
+1. Navigate to https://example.net and take a screenshot:
+   - Invoke the `playwright-cli` skill to learn the available commands
+   - Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to navigate to https://example.net
+   - Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to take a screenshot
+
+2. Save an analysis deliverable:
+   - Run: `save-deliverable --type AUTH_ANALYSIS --content '# Auth Analysis Report\n\nAnalysis complete. No authentication vulnerabilities identified.'`
+
+3. Save a queue deliverable:
+   - Run: `save-deliverable --type AUTH_QUEUE --content '{"vulnerabilities": []}'`
@@ -0,0 +1,12 @@
+Please complete these tasks using your CLI tools:
+
+1. Navigate to https://jsonplaceholder.typicode.com and take a screenshot:
+   - Invoke the `playwright-cli` skill to learn the available commands
+   - Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to navigate to https://jsonplaceholder.typicode.com
+   - Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to take a screenshot
+
+2. Save an analysis deliverable:
+   - Run: `save-deliverable --type AUTHZ_ANALYSIS --content '# Authorization Analysis Report\n\nAnalysis complete. No authorization vulnerabilities identified.'`
+
+3. Save a queue deliverable:
+   - Run: `save-deliverable --type AUTHZ_QUEUE --content '{"vulnerabilities": []}'`
@@ -0,0 +1,12 @@
+Please complete these tasks using your CLI tools:
+
+1. Navigate to https://example.com and take a screenshot:
+   - Invoke the `playwright-cli` skill to learn the available commands
+   - Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to navigate to https://example.com
+   - Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to take a screenshot
+
+2. Save an analysis deliverable:
+   - Run: `save-deliverable --type INJECTION_ANALYSIS --content '# Injection Analysis Report\n\nAnalysis complete. No injection vulnerabilities identified.'`
+
+3. Save a queue deliverable:
+   - Run: `save-deliverable --type INJECTION_QUEUE --content '{"vulnerabilities": []}'`
@@ -0,0 +1,12 @@
+Please complete these tasks using your CLI tools:
+
+1. Navigate to https://httpbin.org and take a screenshot:
+   - Invoke the `playwright-cli` skill to learn the available commands
+   - Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to navigate to https://httpbin.org
+   - Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to take a screenshot
+
+2. Save an analysis deliverable:
+   - Run: `save-deliverable --type SSRF_ANALYSIS --content '# SSRF Analysis Report\n\nAnalysis complete. No SSRF vulnerabilities identified.'`
+
+3. Save a queue deliverable:
+   - Run: `save-deliverable --type SSRF_QUEUE --content '{"vulnerabilities": []}'`
@@ -0,0 +1,12 @@
+Please complete these tasks using your CLI tools:
+
+1. Navigate to https://example.org and take a screenshot:
+   - Invoke the `playwright-cli` skill to learn the available commands
+   - Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to navigate to https://example.org
+   - Use `playwright-cli -s={{PLAYWRIGHT_SESSION}}` to take a screenshot
+
+2. Save an analysis deliverable:
+   - Run: `save-deliverable --type XSS_ANALYSIS --content '# XSS Analysis Report\n\nAnalysis complete. No XSS vulnerabilities identified.'`
+
+3. Save a queue deliverable:
+   - Run: `save-deliverable --type XSS_QUEUE --content '{"vulnerabilities": []}'`
@@ -18,9 +18,14 @@ Objective: Your task is to analyze the provided source code to generate a securi
 - Identify trust boundaries, privilege escalation paths, and data flow security concerns
 - Include specific examples from the code when discussing security concerns
 - At the end of your report, you MUST include a section listing all the critical file paths mentioned in your analysis.
- **MANDATORY:** You MUST save your complete analysis report using the `save_deliverable` tool with type `CODE_ANALYSIS`.
+- **MANDATORY:** You MUST save your complete analysis report using the `save-deliverable` CLI tool with `--type CODE_ANALYSIS`.
+
+**GIT AWARENESS:**
+Read `.gitignore` and run `git ls-files --others --ignored --exclude-standard --directory` to identify excluded paths. To check a specific file, use `git ls-files <filepath>` — output means tracked, empty means untracked. Only flag tracked files as vulnerabilities. Untracked files relevant to security (e.g., secrets, credentials, sensitive configs) may be noted as informational.
 </critical>

+{{DESCRIPTION}}
+
 <system_architecture>
 **PENTESTING WORKFLOW - YOUR POSITION:**

@@ -69,7 +74,7 @@ You are the **Code Intelligence Gatherer** and **Architectural Foundation Builde
 - **NO SHARED CONTEXT FILE EXISTS YET** - you are establishing the initial technical intelligence
 </starting_context>

-<available_tools>
+<cli_tools>
 **CRITICAL TOOL USAGE GUIDANCE:**
 - PREFER the Task Agent for comprehensive source code analysis to leverage specialized code review capabilities.
 - Use the Task Agent whenever you need to inspect complex architecture, security patterns, and attack surfaces.
@@ -78,16 +83,13 @@ You are the **Code Intelligence Gatherer** and **Architectural Foundation Builde
 **Available Tools:**
 - **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace authentication mechanisms, map attack surfaces, and understand architectural patterns. MANDATORY for all source code analysis.
 - **TodoWrite Tool:** Use this to create and manage your analysis task list. Create todo items for each phase and agent that needs execution. Mark items as "in_progress" when working on them and "completed" when done.
- **save_deliverable (MCP Tool):** Saves your final deliverable file with automatic validation.
-  - **Parameters:**
-    - `deliverable_type`: "CODE_ANALYSIS" (required)
-    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
-    - `content`: Inline content string (optional, use only for small content like JSON queues)
-  - **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
-  - **Usage:** Write your report to disk first, then call with `file_path`. The tool handles correct naming and file validation automatically.
-  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+  - **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
+  - **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+  - **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+  - **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
-</available_tools>
+</cli_tools>

 <task_agent_strategy>
 **MANDATORY TASK AGENT USAGE:** You MUST use Task agents for ALL code analysis. Direct file reading is PROHIBITED.
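The `<cli_tools>` hunk above documents that `save-deliverable` prints a JSON result to stdout, with either a success or an error shape. A caller might interpret that output roughly as follows. This is a sketch under the assumption that only the fields shown in the prompt are present; the type and function names are hypothetical.

```typescript
// Result shapes taken from the prompt text; names below are illustrative only.
type SaveDeliverableResult =
  | { status: "success"; filepath: string; validated: boolean }
  | { status: "error"; message: string; retryable: boolean };

// Parses the stdout of a save-deliverable run into a typed result.
// Throws if the output is not one of the two documented shapes.
function parseSaveDeliverableOutput(stdout: string): SaveDeliverableResult {
  const parsed: unknown = JSON.parse(stdout.trim());
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("save-deliverable output is not a JSON object");
  }
  const record = parsed as Record<string, unknown>;
  if (record.status === "success" && typeof record.filepath === "string") {
    return {
      status: "success",
      filepath: record.filepath,
      validated: record.validated === true,
    };
  }
  if (record.status === "error" && typeof record.message === "string") {
    return {
      status: "error",
      message: record.message,
      retryable: record.retryable === true,
    };
  }
  throw new Error("Unrecognized save-deliverable output");
}
```

A caller could retry only when the result is an error with `retryable` set to `true`.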
@@ -133,7 +135,7 @@ After Phase 1 completes, launch all three vulnerability-focused agents in parall
  1. Use the **Write** tool to create `deliverables/code_analysis_deliverable.md` with the title and first major section
  2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
  3. Repeat step 2 for all remaining sections
-  4. Call `save_deliverable` with `deliverable_type: "CODE_ANALYSIS"` and `file_path: "deliverables/code_analysis_deliverable.md"`
+  4. Run `save-deliverable` with `--type CODE_ANALYSIS --file-path "deliverables/code_analysis_deliverable.md"`
 - **WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.

 **EXECUTION PATTERN:**
@@ -392,7 +394,7 @@ A component is **out-of-scope** if it **cannot** be invoked through the running
   - Phase 3: Synthesis and report generation completed

 2. **Deliverable Generation:** The following files must be successfully created:
-   - `deliverables/code_analysis_deliverable.md` (via `save_deliverable` with `file_path`, not inline `content`)
+   - `deliverables/code_analysis_deliverable.md` (via `save-deliverable` with `--file-path`, not inline `--content`)
   - `outputs/schemas/` directory with all discovered schema files copied (if any schemas found)

 3. **TodoWrite Completion:** All tasks in your todo list must be marked as completed
@@ -12,6 +12,7 @@ Your analysis must begin by reading and fully comprehending the initial intellig

 <target>
 URL: {{WEB_URL}}
+{{DESCRIPTION}}
 </target>

 <rules>
@@ -55,22 +56,19 @@ A component is **out-of-scope** if it **cannot** be invoked through the running
 **EXTERNAL ATTACKER CONTEXT:** Analyze from the perspective of an external attacker with NO internal network access, VPN access, or administrative privileges. Focus on vulnerabilities exploitable via public internet.
 </attacker_perspective>

-<available_tools>
+<cli_tools>
 Please use these tools for the following use cases:
 - Task tool: **MANDATORY for ALL source code analysis.** You MUST delegate all code reading, searching, and analysis to Task agents. DO NOT use Read, Glob, or Grep tools for source code.
- {{MCP_SERVER}} (Playwright): To interact with the live web application at the target.
-  - **CRITICAL RULE:** For all browser interactions, you MUST use the {{MCP_SERVER}} (Playwright).
- **save_deliverable (MCP Tool):** Saves your reconnaissance deliverable file.
-  - **Parameters:**
-    - `deliverable_type`: "RECON" (required)
-    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
-    - `content`: Inline content string (optional, use only for small content like JSON queues)
-  - **Returns:** `{ status: "success", filepath: "..." }` on success or `{ status: "error", message: "..." }` on failure
-  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
+- **Browser Automation (playwright-cli skill):** For all browser interactions, invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+  - **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
+  - **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+  - **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+  - **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.

 **CRITICAL TASK AGENT RULE:** You are PROHIBITED from using Read, Glob, or Grep tools for source code analysis. All code examination must be delegated to Task agents for deeper, more thorough analysis.
-</available_tools>
+</cli_tools>

 <system_architecture>
 **PENTESTING WORKFLOW - YOUR POSITION:**
@@ -112,7 +110,7 @@ You must follow this methodical four-step process:
    - In your thoughts, create a preliminary list of known technologies, subdomains, open ports, and key code modules.

 2.  **Interactive Application Exploration:**
-    - Use `{{MCP_SERVER}}__browser_navigate` to navigate to the target.
+    - Invoke the `playwright-cli` skill, then use it with `-s={{PLAYWRIGHT_SESSION}}` to navigate to the target.
    - Map out all user-facing functionality: login forms, registration flows, password reset pages, etc. Document the multi-step processes.
    - Observe the network requests to identify primary API calls.

@@ -136,7 +134,7 @@ You must follow this methodical four-step process:
 </systematic_approach>

 <deliverable_instructions>
-When you have a complete understanding of the attack surface, you MUST synthesize all of your findings into a single, detailed Markdown report and save it using the save_deliverable script with type RECON.
+When you have a complete understanding of the attack surface, you MUST synthesize all of your findings into a single, detailed Markdown report and save it using the `save-deliverable` CLI with `--type RECON`.

 Your report MUST use the following structure precisely:

@@ -372,9 +370,9 @@ CRITICAL: Only include sources tracing to dangerous sinks (shell, DB, file ops,
   - Use the **Write** tool to create `deliverables/recon_deliverable.md` with the title and first major section
   - Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
   - Repeat for all remaining sections
-2. Call `save_deliverable` with `deliverable_type: "RECON"` and `file_path: "deliverables/recon_deliverable.md"`
+2. Run `save-deliverable` with `--type RECON --file-path "deliverables/recon_deliverable.md"`

-**WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations. Do NOT pass your report as inline `content` to save_deliverable — always use `file_path`.
+**WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations. Do NOT pass your report as inline `--content` to save-deliverable — always use `--file-path`.

 Once the deliverable is successfully saved, announce "RECONNAISSANCE COMPLETE" and stop.

@@ -21,7 +21,7 @@ IMPORTANT: You are MODIFYING an existing file, not creating a new one.

 <target>
 URL: {{WEB_URL}}
-Repository: {{GITHUB_URL}}
+{{DESCRIPTION}}
 </target>

 <context>
@@ -16,7 +16,7 @@ Execute the login flow based on the login_type specified in the configuration:
 2. Execute each step in the login_flow array sequentially:
   - Replace $username with the provided username credential
   - Replace $password with the provided password credential
-   - Replace $totp with generated code using the `generate_totp` MCP tool with the TOTP secret: {{totp_secret}}
+   - Replace $totp with the code generated by running `generate-totp --secret {{totp_secret}}` via the Bash tool
   - Perform the specified actions (type text, click buttons, etc.)
 3. Wait for page navigation/loading to complete after each critical step
 4. Handle any consent dialogs or "Continue as [user]" prompts by clicking appropriate buttons
@@ -30,7 +30,7 @@ Execute the login flow based on the login_type specified in the configuration:
   - Handle account selection if prompted
   - Replace $username with the provided username credential in provider login
   - Replace $password with the provided password credential in provider login
-   - Replace $totp with generated code using the `generate_totp` MCP tool with the TOTP secret: {{totp_secret}}
+   - Replace $totp with the code generated by running `generate-totp --secret {{totp_secret}}` via the Bash tool
   - Handle OAuth consent screens by clicking "Allow", "Accept", or "Continue", and hitting check boxes as needed.
   - Handle "Continue as [username]" dialogs by clicking "Continue"
 3. Wait for OAuth callback and final redirect to complete
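The login-flow hunks above replace the `generate_totp` MCP tool with a `generate-totp --secret ...` CLI call. The CLI's implementation is not shown in this diff; the sketch below is the standard RFC 6238 computation (HMAC-SHA1, 30-second step), included only to illustrate what such a tool typically computes. The function name and its parameters are assumptions, and it takes the already-decoded key bytes rather than a Base32 secret.

```typescript
import { createHmac } from "node:crypto";

// Illustrative RFC 6238 TOTP (not the repository's generate-totp code).
// key: raw secret bytes; unixSeconds: current time in seconds.
function totp(key: Buffer, unixSeconds: number, digits = 6, stepSeconds = 30): string {
  // Number of whole time steps since the Unix epoch, as an 8-byte big-endian counter.
  const counter = Math.floor(unixSeconds / stepSeconds);
  const message = Buffer.alloc(8);
  message.writeBigUInt64BE(BigInt(counter));

  const hash = createHmac("sha1", key).update(message).digest();

  // Dynamic truncation (RFC 4226): the low nibble of the last byte picks the offset.
  const offset = hash.readUInt8(hash.length - 1) & 0x0f;
  const binary = hash.readUInt32BE(offset) & 0x7fffffff;

  return (binary % 10 ** digits).toString().padStart(digits, "0");
}

// Usage: a code for the current time with a placeholder key.
const exampleKey = Buffer.from("12345678901234567890", "ascii");
console.log(totp(exampleKey, Math.floor(Date.now() / 1000)));
```

A real secret is usually distributed Base32-encoded and would need decoding before being passed as `key`.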
@@ -68,7 +68,7 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
 </exploitable_vulnerability_definition>
 </definitions>

-<available_tools>
+<cli_tools>

 **CRITICAL TOOL USAGE RESTRICTIONS:**
 - NEVER use the Read tool for application source code analysis—delegate every code review to the Task Agent.
@@ -77,23 +77,20 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a

 **Available Tools:**
 - **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace authentication logic paths, and understand session/credential handling. MANDATORY for all source code analysis.
- **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
-  - **Parameters:**
-    - `deliverable_type`: "AUTH_ANALYSIS" or "AUTH_QUEUE" (required)
-    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
-    - `content`: Inline content string (use only for small content like JSON queues)
-  - **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
-  - **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
-  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+  - **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
+  - **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+  - **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+  - **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** To interact with the live web application to understand multi-step flows like password reset or registration.
+- **Browser Automation (playwright-cli skill):** For browser interactions with the live web application (multi-step flows like password reset or registration), invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
 - **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each endpoint/flow that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
-</available_tools>
+</cli_tools>

 <data_format_specifications>

 <exploitation_queue_format>
-**Purpose:** Defines the structure for a "exploitation queue" saved via the save_deliverable script with type AUTH_QUEUE.
+**Purpose:** Defines the structure for an "exploitation queue" saved via the `save-deliverable` CLI with `--type AUTH_QUEUE`.
 **Structure:** The `vulnerability` JSON object MUST follow this exact format:
 	{
 		"ID": "unique ID for each vulnerability json (AUTH-VULN-XX)",
@@ -184,7 +181,7 @@ Rule: when uncertain, round down (favor Medium/Low) to minimize false positives.
 # Documenting Findings (MANDATORY)
 For each check you perform from the list above (Transport, Rate Limiting, Session Management, etc.), you must make a final **verdict**.

- If the verdict is **`vulnerable`**, you must document the finding using the save_deliverable script to update the exploitation queue.
+- If the verdict is **`vulnerable`**, you must document the finding using the save-deliverable CLI to update the exploitation queue.
 - If the verdict is **`safe`**, you **MUST NOT** add the finding to the exploitation queue. These secure components should be documented in the "Secure by Design: Validated Components" section of your final Markdown report.
 </methodology>

@@ -202,7 +199,7 @@ For each check you perform from the list above (Transport, Rate Limiting, Sessio
 When you have systematically analyzed all relevant endpoints and logic paths, you MUST generate two final files. Follow these instructions precisely.

 **1. Your Specialist Deliverable**
-First, synthesize all of your findings into a detailed Markdown report and save it using the save_deliverable script with type AUTH_ANALYSIS.
+First, synthesize all of your findings into a detailed Markdown report and save it using the `save-deliverable` CLI with `--type AUTH_ANALYSIS`.
 Your report MUST use the following structure precisely:

 ---
@@ -242,9 +239,9 @@ These components were analyzed and found to have robust defenses. They are low-p

 <exploitation_queue_requirements>
 **Exploitation Queue (MANDATORY)**
-Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save_deliverable MCP tool:
- **If vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "AUTH_QUEUE"` and `content: {"vulnerabilities": [...]}` with each confirmed hypothesis following the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
- **If no vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "AUTH_QUEUE"` and `content: {"vulnerabilities": []}`.
+Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save-deliverable CLI tool:
+- **If vulnerabilities found:** Run the `save-deliverable` CLI with `--type AUTH_QUEUE --content '{"vulnerabilities": [...]}'`, with each confirmed hypothesis following the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
+- **If no vulnerabilities found:** Run the `save-deliverable` CLI with `--type AUTH_QUEUE --content '{"vulnerabilities": []}'`.
 - **QUEUE INCLUSION CRITERIA:** ONLY include vulnerabilities where `externally_exploitable = true`. Exclude any vulnerability requiring internal network access, VPN, or direct server access.
 This file serves as the handoff mechanism and must always be created to signal completion.
 </exploitation_queue_requirements>
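The queue rules above (always emit a queue, include only entries with `externally_exploitable = true`) can be sketched as a small payload builder. This is an illustration under stated assumptions: only the `ID` and `externally_exploitable` fields come from the prompt, while the `Finding` type and `buildQueuePayload` name are hypothetical.

```typescript
// Minimal view of a queue entry; the real exploitation_queue_format has more fields.
interface Finding {
  ID: string;
  externally_exploitable: boolean;
  [extra: string]: unknown;
}

// Builds the JSON string for --content. An empty findings list still yields
// {"vulnerabilities": []}, because the queue file must always be created.
function buildQueuePayload(findings: Finding[]): string {
  const vulnerabilities = findings.filter(
    (finding) => finding.externally_exploitable === true,
  );
  return JSON.stringify({ vulnerabilities });
}

const payload = buildQueuePayload([
  { ID: "AUTH-VULN-01", externally_exploitable: true },
  { ID: "AUTH-VULN-02", externally_exploitable: false },
]);
console.log(payload);
```

The resulting string would be passed as the `--content` value with `--type AUTH_QUEUE`, quoted appropriately for the shell.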
@@ -253,14 +250,14 @@ This file serves as the handoff mechanism and must always be created to signal c
 **COMPLETION REQUIREMENTS (ALL must be satisfied):**

 1.  **Systematic Analysis:** ALL relevant API endpoints and user-facing features identified in the reconnaissance deliverable must be analyzed for AuthN/AuthZ flaws.
-2.  **Deliverable Generation:** Both required deliverables must be successfully saved using save_deliverable MCP tool:
+2.  **Deliverable Generation:** Both required deliverables must be successfully saved using the save-deliverable CLI tool:
    - **CHUNKED WRITING (MANDATORY):**
      1. Use the **Write** tool to create `deliverables/auth_analysis_deliverable.md` with the title and first major section
      2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
      3. Repeat step 2 for all remaining sections
-      4. Call `save_deliverable` with `deliverable_type: "AUTH_ANALYSIS"` and `file_path: "deliverables/auth_analysis_deliverable.md"`
+      4. Run `save-deliverable` with `--type AUTH_ANALYSIS --file-path "deliverables/auth_analysis_deliverable.md"`
      **WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.
-    - Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "AUTH_QUEUE"` and `content: {"vulnerabilities": [...]}`
+    - Exploitation queue: Run the `save-deliverable` CLI with `--type AUTH_QUEUE --content '{"vulnerabilities": [...]}'`

 **ONLY AFTER** both systematic analysis AND successful deliverable generation, announce "**AUTH ANALYSIS COMPLETE**" and stop.

@@ -71,7 +71,7 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a
 </exploitable_vulnerability_definition>
 </definitions>

-<available_tools>
+<cli_tools>

 **CRITICAL TOOL USAGE RESTRICTIONS:**
 - NEVER use the Read tool for application source code analysis—delegate every code review to the Task Agent.
@@ -80,23 +80,20 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a

 **Available Tools:**
 - **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace authorization logic paths, and understand permission models. MANDATORY for all source code analysis.
- **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
-  - **Parameters:**
-    - `deliverable_type`: "AUTHZ_ANALYSIS" or "AUTHZ_QUEUE" (required)
-    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
-    - `content`: Inline content string (use only for small content like JSON queues)
-  - **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
-  - **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
-  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+  - **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
+  - **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+  - **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+  - **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** To interact with the live web application to understand multi-step flows and role-based access controls.
+- **Browser Automation (playwright-cli skill):** For browser interactions with the live web application (multi-step flows and role-based access controls), invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
 - **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each endpoint that needs authorization analysis. Mark items as "in_progress" when working on them and "completed" when done.
-</available_tools>
+</cli_tools>

 <data_format_specifications>

 <exploitation_queue_format>
-**Purpose:** Defines the structure for a "exploitation queue" saved via the save_deliverable script with type AUTHZ_QUEUE.
+**Purpose:** Defines the structure for an "exploitation queue" saved via the `save-deliverable` CLI with `--type AUTHZ_QUEUE`.
 **Structure:** The `vulnerability` JSON object MUST follow this exact format:
 	{
 		"ID": "unique ID for each vulnerability (e.g., AUTHZ-VULN-01)",
@@ -241,7 +238,7 @@ An **exploitable vulnerability** is a logical flaw in the code that represents a

 For each analysis you perform from the lists above, you must make a final **verdict**:

- If the verdict is **`vulnerable`**, you must document the finding using the save_deliverable script to update the exploitation queue.
+- If the verdict is **`vulnerable`**, you must document the finding using the save-deliverable CLI to update the exploitation queue.
 - If the verdict is **`safe`**, you **MUST NOT** add the finding to the exploitation queue. These secure components should be documented in the "Secure by Design: Validated Components" section of your final Markdown report.
 </methodology>

@@ -279,7 +276,7 @@ When you have systematically analyzed all relevant endpoints and logic paths, yo

 **1. Your Specialist Deliverable**

-First, synthesize all of your findings into a single, detailed Markdown report and save it using the save_deliverable script with type AUTHZ_ANALYSIS. This report is the official record of your work.
+First, synthesize all of your findings into a single, detailed Markdown report and save it using the save-deliverable CLI with --type AUTHZ_ANALYSIS. This report is the official record of your work.

 Your report MUST use the following structure precisely:

@@ -345,9 +342,9 @@ examples:

 <exploitation_queue_requirements>
 **Exploitation Queue (MANDATORY)**
-Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save_deliverable MCP tool:
- **If vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "AUTHZ_QUEUE"` and `content: {"vulnerabilities": [...]}` with each confirmed hypothesis following the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
- **If no vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "AUTHZ_QUEUE"` and `content: {"vulnerabilities": []}`.
+Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save-deliverable CLI tool:
+- **If vulnerabilities found:** Run the save-deliverable CLI with `--type AUTHZ_QUEUE --content '{"vulnerabilities": [...]}'`, listing each confirmed hypothesis in the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
+- **If no vulnerabilities found:** Run the save-deliverable CLI with `--type AUTHZ_QUEUE --content '{"vulnerabilities": []}'`.
 - **QUEUE INCLUSION CRITERIA:** ONLY include vulnerabilities where `externally_exploitable = true`. Exclude any vulnerability requiring internal network access, VPN, or direct server access.
 This file serves as the handoff mechanism and must always be created to signal completion.
 </exploitation_queue_requirements>
@@ -356,14 +353,14 @@ This file serves as the handoff mechanism and must always be created to signal c
 **COMPLETION REQUIREMENTS (ALL must be satisfied):**

 1. **Todo Completion:** ALL tasks in your TodoWrite list must be marked as "completed"
-2. **Deliverable Generation:** Both required deliverables must be successfully saved using save_deliverable MCP tool:
+2. **Deliverable Generation:** Both required deliverables must be successfully saved using the save-deliverable CLI tool:
   - **CHUNKED WRITING (MANDATORY):**
     1. Use the **Write** tool to create `deliverables/authz_analysis_deliverable.md` with the title and first major section
     2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
     3. Repeat step 2 for all remaining sections
-     4. Call `save_deliverable` with `deliverable_type: "AUTHZ_ANALYSIS"` and `file_path: "deliverables/authz_analysis_deliverable.md"`
+     4. Run `save-deliverable` with `--type AUTHZ_ANALYSIS --file-path "deliverables/authz_analysis_deliverable.md"`
     **WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.
-   - Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "AUTHZ_QUEUE"` and `content: {"vulnerabilities": [...]}`
+   - Exploitation queue: Run the save-deliverable CLI with `--type AUTHZ_QUEUE --content '{"vulnerabilities": [...]}'`

 **ONLY AFTER** both todo completion AND successful deliverable generation, announce "**AUTHORIZATION ANALYSIS COMPLETE**" and stop.

@@ -71,7 +71,7 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
 </exploitable_vulnerability_definition>
 </definitions>

-<available_tools>
+<cli_tools>

 **CRITICAL TOOL USAGE RESTRICTIONS:**
 - NEVER use the Read tool for application source code analysis—delegate every code review to the Task Agent.
@@ -80,23 +80,20 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en

 **Available Tools:**
 - **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, map query/command construction paths, and verify sanitization coverage. MANDATORY for all source code analysis.
- **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
-  - **Parameters:**
-    - `deliverable_type`: "INJECTION_ANALYSIS" or "INJECTION_QUEUE" (required)
-    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
-    - `content`: Inline content string (use only for small content like JSON queues)
-  - **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
-  - **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
-  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+  - **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
+  - **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+  - **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+  - **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** To interact with the live web application to understand multi-step flows like password reset or registration.
+- **Browser Automation (playwright-cli skill):** For browser interactions with the live web application (multi-step flows like password reset or registration), invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
 - **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each injection source that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
-</available_tools>
+</cli_tools>

 <data_format_specifications>

  <exploitation_queue_format>
-  **Purpose:** Defines the structure for a "exploitation queue" saved via the save_deliverable script with type INJECTION_QUEUE.
+  **Purpose:** Defines the structure for an "exploitation queue" saved via the save-deliverable CLI with --type INJECTION_QUEUE.

  **Structure:** The `vulnerability` JSON object MUST follow this exact format:
 		{
@@ -351,10 +348,10 @@ These input vectors were traced and confirmed to have robust, context-appropriat
 <exploitation_queue_requirements>
 **Exploitation Queue (MANDATORY)**

-Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save_deliverable MCP tool:
+Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save-deliverable CLI tool:

- **If vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "INJECTION_QUEUE"` and `content: {"vulnerabilities": [...]}` with each exploitable injection vulnerability (verdict: "vulnerable") following the exploitation_queue_format
- **If no vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "INJECTION_QUEUE"` and `content: {"vulnerabilities": []}`
+- **If vulnerabilities found:** Run the save-deliverable CLI with `--type INJECTION_QUEUE --content '{"vulnerabilities": [...]}'`, listing each exploitable injection vulnerability (verdict: "vulnerable") in the exploitation_queue_format
+- **If no vulnerabilities found:** Run the save-deliverable CLI with `--type INJECTION_QUEUE --content '{"vulnerabilities": []}'`

 This file serves as the handoff mechanism to the Exploitation phase and must always be created to signal completion of your analysis.
 </exploitation_queue_requirements>
@@ -363,14 +360,14 @@ This file serves as the handoff mechanism to the Exploitation phase and must alw
 **COMPLETION REQUIREMENTS (ALL must be satisfied):**

 1. **Todo Completion:** ALL tasks in your TodoWrite list must be marked as "completed"
-2. **Deliverable Generation:** Both required deliverables must be successfully saved using save_deliverable MCP tool:
+2. **Deliverable Generation:** Both required deliverables must be successfully saved using the save-deliverable CLI tool:
   - **CHUNKED WRITING (MANDATORY):**
     1. Use the **Write** tool to create `deliverables/injection_analysis_deliverable.md` with the title and first major section
     2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
     3. Repeat step 2 for all remaining sections
-     4. Call `save_deliverable` with `deliverable_type: "INJECTION_ANALYSIS"` and `file_path: "deliverables/injection_analysis_deliverable.md"`
+     4. Run `save-deliverable` with `--type INJECTION_ANALYSIS --file-path "deliverables/injection_analysis_deliverable.md"`
     **WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.
-   - Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "INJECTION_QUEUE"` and `content: {"vulnerabilities": [...]}`
+   - Exploitation queue: Run the save-deliverable CLI with `--type INJECTION_QUEUE --content '{"vulnerabilities": [...]}'`

 **ONLY AFTER** both todo completion AND successful deliverable generation, announce "**INJECTION ANALYSIS COMPLETE**" and stop.

@@ -67,7 +67,7 @@ An **exploitable vulnerability** is a data flow where user-controlled input infl
 </exploitable_vulnerability_definition>
 </definitions>

-<available_tools>
+<cli_tools>

 **CRITICAL TOOL USAGE RESTRICTIONS:**
 - NEVER use the Read tool for application source code analysis—delegate every code review to the Task Agent.
@@ -76,23 +76,20 @@ An **exploitable vulnerability** is a data flow where user-controlled input infl

 **Available Tools:**
 - **Task Agent (Code Analysis):** Your primary tool. Use it to ask targeted questions about the source code, trace data flows, and understand HTTP client usage. MANDATORY for all source code analysis.
- **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
-  - **Parameters:**
-    - `deliverable_type`: "SSRF_ANALYSIS" or "SSRF_QUEUE" (required)
-    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
-    - `content`: Inline content string (use only for small content like JSON queues)
-  - **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
-  - **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
-  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+  - **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
+  - **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+  - **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+  - **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
- **{{MCP_SERVER}} (Playwright):** To interact with the live web application to understand multi-step flows that might involve URL redirection or proxy functionality.
+- **Browser Automation (playwright-cli skill):** For browser interactions with the live web application (multi-step flows involving URL redirection or proxy functionality), invoke the `playwright-cli` skill to learn available commands. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
 - **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each SSRF sink that needs analysis. Mark items as "in_progress" when working on them and "completed" when done.
-</available_tools>
+</cli_tools>

 <data_format_specifications>

 <exploitation_queue_format>
-**Purpose:** Defines the structure for a "exploitation queue" saved via the save_deliverable script with type SSRF_QUEUE.
+**Purpose:** Defines the structure for an "exploitation queue" saved via the save-deliverable CLI with --type SSRF_QUEUE.
 **Structure:** The `vulnerability` JSON object MUST follow this exact format:
 	{
 		"ID": "unique ID for each vulnerability json (SSRF-VULN-XX)",
@@ -231,7 +228,7 @@ Rule: when uncertain, round down (favor Medium/Low) to minimize false positives.
 # Documenting Findings (MANDATORY)
 For each check you perform from the list above, you must make a final **verdict**.

- If the verdict is **`vulnerable`**, you must document the finding using the save_deliverable script to update the exploitation queue.
+- If the verdict is **`vulnerable`**, you must document the finding using the save-deliverable CLI to update the exploitation queue.
 - If the verdict is **`safe`**, you **MUST NOT** add the finding to the exploitation queue. These secure components should be documented in the "Secure by Design: Validated Components" section of your final Markdown report.
 </methodology>

@@ -249,7 +246,7 @@ For each check you perform from the list above, you must make a final **verdict*
 When you have systematically analyzed all relevant endpoints and request-making functions, you MUST generate two final files. Follow these instructions precisely.

 **1. Your Specialist Deliverable**
-First, synthesize all of your findings into a detailed Markdown report and save it using the save_deliverable script with type SSRF_ANALYSIS.
+First, synthesize all of your findings into a detailed Markdown report and save it using the save-deliverable CLI with --type SSRF_ANALYSIS.
 Your report MUST use the following structure precisely:

 ---
@@ -289,9 +286,9 @@ These components were analyzed and found to have robust defenses. They are low-p

 <exploitation_queue_requirements>
 **Exploitation Queue (MANDATORY)**
-Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save_deliverable MCP tool:
- **If vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "SSRF_QUEUE"` and `content: {"vulnerabilities": [...]}` with each confirmed hypothesis following the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
- **If no vulnerabilities found:** Use `save_deliverable` MCP tool with `deliverable_type: "SSRF_QUEUE"` and `content: {"vulnerabilities": []}`.
+Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save-deliverable CLI tool:
+- **If vulnerabilities found:** Run the save-deliverable CLI with `--type SSRF_QUEUE --content '{"vulnerabilities": [...]}'`, listing each confirmed hypothesis in the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
+- **If no vulnerabilities found:** Run the save-deliverable CLI with `--type SSRF_QUEUE --content '{"vulnerabilities": []}'`.
 - **QUEUE INCLUSION CRITERIA:** ONLY include vulnerabilities where `externally_exploitable = true`. Exclude any vulnerability requiring internal network access, VPN, or direct server access.
 This file serves as the handoff mechanism and must always be created to signal completion.
 </exploitation_queue_requirements>
@@ -300,14 +297,14 @@ This file serves as the handoff mechanism and must always be created to signal c
 **COMPLETION REQUIREMENTS (ALL must be satisfied):**

 1.  **Systematic Analysis:** ALL relevant API endpoints and request-making features identified in the reconnaissance deliverable must be analyzed for SSRF vulnerabilities.
-2.  **Deliverable Generation:** Both required deliverables must be successfully saved using save_deliverable MCP tool:
+2.  **Deliverable Generation:** Both required deliverables must be successfully saved using the save-deliverable CLI tool:
    - **CHUNKED WRITING (MANDATORY):**
      1. Use the **Write** tool to create `deliverables/ssrf_analysis_deliverable.md` with the title and first major section
      2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
      3. Repeat step 2 for all remaining sections
-      4. Call `save_deliverable` with `deliverable_type: "SSRF_ANALYSIS"` and `file_path: "deliverables/ssrf_analysis_deliverable.md"`
+      4. Run `save-deliverable` with `--type SSRF_ANALYSIS --file-path "deliverables/ssrf_analysis_deliverable.md"`
      **WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.
-    - Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "SSRF_QUEUE"` and `content: {"vulnerabilities": [...]}`
+    - Exploitation queue: Run the save-deliverable CLI with `--type SSRF_QUEUE --content '{"vulnerabilities": [...]}'`

 **ONLY AFTER** both systematic analysis AND successful deliverable generation, announce "**SSRF ANALYSIS COMPLETE**" and stop.

@@ -68,7 +68,7 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
 </exploitable_vulnerability_definition>
 </definitions>

-<available_tools>
+<cli_tools>

 **CRITICAL TOOL USAGE RESTRICTIONS:**
 - NEVER use the Read tool for application source code analysis - ALWAYS delegate to Task agents for examining .js, .ts, .py, .php files and application logic. You MAY use Read
@@ -79,23 +79,20 @@ An **exploitable vulnerability** is a confirmed source-to-sink path where the en
 **Available Tools:**
 - **Task Agent (Code Analysis):** MANDATORY for all source code analysis and data flow tracing. Use this instead of Read tool for examining application code, models, controllers, and templates.
 - **Terminal (curl):** MANDATORY for testing HTTP-based XSS vectors and observing raw HTML responses. Use for reflected XSS testing and JSONP injection testing.
- **{{MCP_SERVER}} (Playwright):** MANDATORY for testing DOM-based XSS and form submission vectors. Use for stored XSS testing and client-side payload execution verification.
+- **Browser Automation (playwright-cli skill):** MANDATORY for testing DOM-based XSS and form submission vectors. Invoke the `playwright-cli` skill to learn available commands. Use for stored XSS testing and client-side payload execution verification. Always pass `-s={{PLAYWRIGHT_SESSION}}` to every command for session isolation.
 - **TodoWrite Tool:** Use this to create and manage your analysis task list. Create a todo item for each sink you need to analyze.
- **save_deliverable (MCP Tool):** Saves deliverable files with automatic validation.
-  - **Parameters:**
-    - `deliverable_type`: "XSS_ANALYSIS" or "XSS_QUEUE" (required)
-    - `file_path`: Path to the file you wrote to disk (preferred for large reports)
-    - `content`: Inline content string (use only for small content like JSON queues)
-  - **Returns:** `{ status: "success", filepath: "...", validated: true/false }` on success or `{ status: "error", message: "...", errorType: "...", retryable: true/false }` on failure
-  - **Usage:** For analysis reports, write to disk first then call with `file_path`. For JSON queues, you may pass inline `content`. Queue files must have `{"vulnerabilities": [...]}` structure and will be validated automatically.
-  - **WARNING:** Do NOT pass large reports as inline `content` — this will exceed output token limits and cause agent failure. Always use `file_path` for analysis reports.
+- **save-deliverable (CLI Tool):** Saves your deliverable files with automatic validation.
+  - **Usage:** `save-deliverable --type <TYPE> --file-path <path>` or `--content '<json>'`
+  - **Returns:** JSON to stdout: `{"status":"success","filepath":"...","validated":true}` or `{"status":"error","message":"...","retryable":true}`
+  - **For large reports:** Write to disk first, then use `--file-path`. Do NOT pass large reports via `--content`.
+  - **For JSON queues:** You may use `--content '{"vulnerabilities": [...]}'`. Queue files are validated automatically.
 - **Bash tool:** Use for creating directories, copying files, and other shell commands as needed.
-</available_tools>
+</cli_tools>

 <data_format_specifications>

 <exploitation_queue_format>
-Purpose: Defines the structure for a "exploitation queue" saved via the save_deliverable script with type XSS_QUEUE.
+Purpose: Defines the structure for an "exploitation queue" saved via the save-deliverable CLI with --type XSS_QUEUE.
 Structure: The vulnerability JSON object MUST follow this exact format:
 	{
 		"ID": "unique ID for each vulnerability json (XSS-VULN-XX)",
@@ -180,7 +177,7 @@ This rulebook is used for the **Early Termination** check in Step 2.
 - Include both safe and vulnerable paths to demonstrate **full coverage**.  
 - Craft a minimal `witness_payload` that proves control over the render context.  
 - For every path analyzed, you must document the outcome. The location of the documentation depends on the verdict:
-		- If the verdict is 'vulnerable', you MUST use the save_deliverable script to save the finding to the exploitation queue, including complete source-to-sink information.
+		- If the verdict is 'vulnerable', you MUST use the save-deliverable CLI to save the finding to the exploitation queue, including complete source-to-sink information.
 		- If the verdict is 'safe', you MUST NOT add it to the exploitation queue. Instead, you will document these secure paths in the "Vectors Analyzed and Confirmed Secure" table of your final analysis report.
 - For vulnerable findings, craft a minimal witness_payload that proves control over the render context.

@@ -272,13 +269,13 @@ These input vectors were traced and confirmed to have robust, context-appropriat

 ## Exploitation Queue (MANDATORY)

-Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save_deliverable MCP tool.
+Regardless of whether vulnerabilities are found, you MUST create the exploitation queue using the save-deliverable CLI tool.

 - **If exploitable vulnerabilities found:**
-  Use `save_deliverable` MCP tool with `deliverable_type: "XSS_QUEUE"` and `content: {"vulnerabilities": [...]}` with each exploitable XSS vulnerability (verdict: "vulnerable") following the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.
+  Run the save-deliverable CLI with `--type XSS_QUEUE --content '{"vulnerabilities": [...]}'`, listing each exploitable XSS vulnerability (verdict: "vulnerable") in the `exploitation_queue_format`. Set `externally_exploitable` to `true` ONLY if exploitable via public internet without internal access.

 - **If no exploitable vulnerabilities found:**
-  Use `save_deliverable` MCP tool with `deliverable_type: "XSS_QUEUE"` and `content: {"vulnerabilities": []}`
+  Run the save-deliverable CLI with `--type XSS_QUEUE --content '{"vulnerabilities": []}'`

 - **QUEUE INCLUSION CRITERIA:** ONLY include vulnerabilities where `externally_exploitable = true`. Exclude any vulnerability requiring internal network access, VPN, or direct server access.

@@ -289,14 +286,14 @@ This file is the mandatory handoff to the Exploitation phase.
 COMPLETION REQUIREMENTS (ALL must be satisfied):

 1. Systematic Analysis: ALL input vectors identified from the reconnaissance deliverable must be analyzed.
-2. Deliverable Generation: Both required deliverables must be successfully saved using save_deliverable MCP tool:
+2. Deliverable Generation: Both required deliverables must be successfully saved using the save-deliverable CLI tool:
   - **CHUNKED WRITING (MANDATORY):**
     1. Use the **Write** tool to create `deliverables/xss_analysis_deliverable.md` with the title and first major section
     2. Use the **Edit** tool to append each remaining section — match the last few lines of the file, then replace with those lines plus the new section content
     3. Repeat step 2 for all remaining sections
-     4. Call `save_deliverable` with `deliverable_type: "XSS_ANALYSIS"` and `file_path: "deliverables/xss_analysis_deliverable.md"`
+     4. Run `save-deliverable` with `--type XSS_ANALYSIS --file-path "deliverables/xss_analysis_deliverable.md"`
     **WARNING:** Do NOT write the entire report in a single tool call — exceeds 32K output token limit. Split into multiple Write/Edit operations.
-   - Exploitation queue: Use `save_deliverable` MCP tool with `deliverable_type: "XSS_QUEUE"` and `content: {"vulnerabilities": [...]}`
+   - Exploitation queue: Run the save-deliverable CLI with `--type XSS_QUEUE --content '{"vulnerabilities": [...]}'`

 ONLY AFTER both systematic analysis AND successful deliverable generation, announce "XSS ANALYSIS COMPLETE" and stop.

@@ -6,26 +6,21 @@

 // Production Claude agent execution with retry, git checkpoints, and audit logging

-import { fs, path } from 'zx';
 import { query } from '@anthropic-ai/claude-agent-sdk';
-
+import { fs, path } from 'zx';
+import type { AuditSession } from '../audit/index.js';
 import { isRetryableError, PentestError } from '../services/error-handling.js';
-import { isSpendingCapBehavior } from '../utils/billing-detection.js';
-import { Timer } from '../utils/metrics.js';
-import { formatTimestamp } from '../utils/formatting.js';
-import { AGENT_VALIDATORS, MCP_AGENT_MAPPING } from '../session-manager.js';
-import { AuditSession } from '../audit/index.js';
-import { createShannonHelperServer } from '../../mcp-server/dist/index.js';
-import { AGENTS } from '../session-manager.js';
-import type { AgentName } from '../types/index.js';
-
-import { dispatchMessage } from './message-handlers.js';
-import { detectExecutionContext, formatErrorOutput, formatCompletionMessage } from './output-formatters.js';
-import { createProgressManager } from './progress-manager.js';
-import { createAuditLogger } from './audit-logger.js';
-import { getActualModelName } from './router-utils.js';
-import { resolveModel, type ModelTier } from './models.js';
+import { AGENT_VALIDATORS } from '../session-manager.js';
 import type { ActivityLogger } from '../types/activity-logger.js';
+import { isSpendingCapBehavior } from '../utils/billing-detection.js';
+import { formatTimestamp } from '../utils/formatting.js';
+import { Timer } from '../utils/metrics.js';
+import { createAuditLogger } from './audit-logger.js';
+import { dispatchMessage } from './message-handlers.js';
+import { type ModelTier, resolveModel } from './models.js';
+import { detectExecutionContext, formatCompletionMessage, formatErrorOutput } from './output-formatters.js';
+import { createProgressManager } from './progress-manager.js';
+import { getActualModelName } from './router-utils.js';

 declare global {
  var SHANNON_DISABLE_LOADER: boolean | undefined;
@@ -46,89 +41,6 @@ export interface ClaudePromptResult {
  retryable?: boolean | undefined;
 }

-interface StdioMcpServer {
-  type: 'stdio';
-  command: string;
-  args: string[];
-  env: Record<string, string>;
-}
-
-type McpServer = ReturnType<typeof createShannonHelperServer> | StdioMcpServer;
-
-// Configures MCP servers for agent execution, with Docker-specific Chromium handling
-function buildMcpServers(
-  sourceDir: string,
-  agentName: string | null,
-  logger: ActivityLogger
-): Record<string, McpServer> {
-  // 1. Create the shannon-helper server (always present)
-  const shannonHelperServer = createShannonHelperServer(sourceDir);
-
-  const mcpServers: Record<string, McpServer> = {
-    'shannon-helper': shannonHelperServer,
-  };
-
-  // 2. Look up the agent's Playwright MCP mapping
-  if (agentName) {
-    const promptTemplate = AGENTS[agentName as AgentName].promptTemplate;
-    const playwrightMcpName = MCP_AGENT_MAPPING[promptTemplate as keyof typeof MCP_AGENT_MAPPING] || null;
-
-    if (playwrightMcpName) {
-      logger.info(`Assigned ${agentName} -> ${playwrightMcpName}`);
-
-      const userDataDir = `/tmp/${playwrightMcpName}`;
-
-      // 3. Configure Playwright MCP args with Docker/local browser handling
-      const isDocker = process.env.SHANNON_DOCKER === 'true';
-
-      const mcpArgs: string[] = [
-        '@playwright/mcp@0.0.68',
-        '--isolated',
-        '--user-data-dir', userDataDir,
-      ];
-
-      if (isDocker) {
-        mcpArgs.push('--executable-path', '/usr/bin/chromium-browser');
-        mcpArgs.push('--browser', 'chromium');
-      }
-
-      // NOTE: Explicit allowlist — the Playwright MCP subprocess must not inherit
-      // secrets (API keys, AWS tokens) from the parent process.
-      const MCP_ENV_ALLOWLIST = [
-        'PATH', 'HOME', 'NODE_PATH', 'DISPLAY',
-        'PLAYWRIGHT_CHROMIUM_EXECUTABLE_PATH',
-      ] as const;
-
-      const envVars: Record<string, string> = {
-        PLAYWRIGHT_HEADLESS: 'true',
-        ...(isDocker && { PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: '1' }),
-      };
-
-      for (const key of MCP_ENV_ALLOWLIST) {
-        if (process.env[key]) {
-          envVars[key] = process.env[key]!;
-        }
-      }
-
-      for (const [key, value] of Object.entries(process.env)) {
-        if (key.startsWith('XDG_') && value !== undefined) {
-          envVars[key] = value;
-        }
-      }
-
-      mcpServers[playwrightMcpName] = {
-        type: 'stdio' as const,
-        command: 'npx',
-        args: mcpArgs,
-        env: envVars,
-      };
-    }
-  }
-
-  // 4. Return configured servers
-  return mcpServers;
-}
-
 function outputLines(lines: string[]): void {
  for (const line of lines) {
    console.log(line);
@@ -139,7 +51,7 @@ async function writeErrorLog(
  err: Error & { code?: string; status?: number },
  sourceDir: string,
  fullPrompt: string,
-  duration: number
+  duration: number,
 ): Promise<void> {
  try {
    const errorLog = {
@@ -150,17 +62,17 @@ async function writeErrorLog(
        message: err.message,
        code: err.code,
        status: err.status,
-        stack: err.stack
+        stack: err.stack,
      },
      context: {
        sourceDir,
-        prompt: fullPrompt.slice(0, 200) + '...',
-        retryable: isRetryableError(err)
+        prompt: `${fullPrompt.slice(0, 200)}...`,
+        retryable: isRetryableError(err),
      },
-      duration
+      duration,
    };
    const logPath = path.join(sourceDir, 'error.log');
-    await fs.appendFile(logPath, JSON.stringify(errorLog) + '\n');
+    await fs.appendFile(logPath, `${JSON.stringify(errorLog)}\n`);
  } catch {
    // Best-effort error log writing - don't propagate failures
  }
@@ -170,7 +82,7 @@ export async function validateAgentOutput(
  result: ClaudePromptResult,
  agentName: string | null,
  sourceDir: string,
-  logger: ActivityLogger
+  logger: ActivityLogger,
 ): Promise<boolean> {
  logger.info(`Validating ${agentName} agent output`);

@@ -202,7 +114,6 @@ export async function validateAgentOutput(
    }

    return validationResult;
-
  } catch (error) {
    const errMsg = error instanceof Error ? error.message : String(error);
    logger.error(`Validation failed with error: ${errMsg}`);
@@ -217,10 +128,10 @@ export async function runClaudePrompt(
  sourceDir: string,
  context: string = '',
  description: string = 'Claude analysis',
-  agentName: string | null = null,
+  _agentName: string | null = null,
  auditSession: AuditSession | null = null,
  logger: ActivityLogger,
-  modelTier: ModelTier = 'medium'
+  modelTier: ModelTier = 'medium',
 ): Promise<ClaudePromptResult> {
  // 1. Initialize timing and prompt
  const timer = new Timer(`agent-${description.toLowerCase().replace(/\s+/g, '-')}`);
@@ -230,16 +141,13 @@ export async function runClaudePrompt(
  const execContext = detectExecutionContext(description);
  const progress = createProgressManager(
    { description, useCleanOutput: execContext.useCleanOutput },
-    global.SHANNON_DISABLE_LOADER ?? false
+    global.SHANNON_DISABLE_LOADER ?? false,
  );
  const auditLogger = createAuditLogger(auditSession);

  logger.info(`Running Claude Code: ${description}...`);

-  // 3. Configure MCP servers
-  const mcpServers = buildMcpServers(sourceDir, agentName, logger);
-
-  // 4. Build env vars to pass to SDK subprocesses
+  // 3. Build env vars to pass to SDK subprocesses
  const sdkEnv: Record<string, string> = {
    CLAUDE_CODE_MAX_OUTPUT_TOKENS: process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS || '64000',
  };
@@ -258,21 +166,25 @@ export async function runClaudePrompt(
    'ANTHROPIC_SMALL_MODEL',
    'ANTHROPIC_MEDIUM_MODEL',
    'ANTHROPIC_LARGE_MODEL',
+    'HOME',
+    'PATH',
+    'PLAYWRIGHT_MCP_EXECUTABLE_PATH',
  ];
  for (const name of passthroughVars) {
-    if (process.env[name]) {
-      sdkEnv[name] = process.env[name]!;
+    const val = process.env[name];
+    if (val) {
+      sdkEnv[name] = val;
    }
  }

-  // 5. Configure SDK options
+  // 4. Configure SDK options
  const options = {
    model: resolveModel(modelTier),
    maxTurns: 10_000,
    cwd: sourceDir,
    permissionMode: 'bypassPermissions' as const,
    allowDangerouslySkipPermissions: true,
-    mcpServers,
+    settingSources: ['user'] as ('user' | 'project' | 'local')[],
    env: sdkEnv,
  };

@@ -293,7 +205,7 @@ export async function runClaudePrompt(
      fullPrompt,
      options,
      { execContext, description, progress, auditLogger, logger },
-      timer
+      timer,
    );

    turnCount = messageLoopResult.turnCount;
@@ -309,7 +221,7 @@ export async function runClaudePrompt(
      throw new PentestError(
        `Spending cap likely reached (turns=${turnCount}, cost=$0): ${result?.slice(0, 100)}`,
        'billing',
-        true // Retryable - Temporal will use 5-30 min backoff
+        true, // Retryable - Temporal will use 5-30 min backoff
      );
    }

@@ -330,9 +242,8 @@ export async function runClaudePrompt(
      cost: totalCost,
      model,
      partialCost: totalCost,
-      apiErrorDetected
+      apiErrorDetected,
    };
-
  } catch (error) {
    // 9. Handle errors — log, write error file, return failure
    const duration = timer.stop();
@@ -347,16 +258,15 @@ export async function runClaudePrompt(
    return {
      error: err.message,
      errorType: err.constructor.name,
-      prompt: fullPrompt.slice(0, 100) + '...',
+      prompt: `${fullPrompt.slice(0, 100)}...`,
      success: false,
      duration,
      cost: totalCost,
-      retryable: isRetryableError(err)
+      retryable: isRetryableError(err),
    };
  }
 }

-
 interface MessageLoopResult {
  turnCount: number;
  result: string | null;
@@ -377,7 +287,7 @@ async function processMessageStream(
  fullPrompt: string,
  options: NonNullable<Parameters<typeof query>[0]['options']>,
  deps: MessageLoopDeps,
-  timer: Timer
+  timer: Timer,
 ): Promise<MessageLoopResult> {
  const { execContext, description, progress, auditLogger, logger } = deps;
  const HEARTBEAT_INTERVAL = 30000;
@@ -402,11 +312,13 @@ async function processMessageStream(
      turnCount++;
    }

-    const dispatchResult = await dispatchMessage(
-      message as { type: string; subtype?: string },
-      turnCount,
-      { execContext, description, progress, auditLogger, logger }
-    );
+    const dispatchResult = await dispatchMessage(message as { type: string; subtype?: string }, turnCount, {
+      execContext,
+      description,
+      progress,
+      auditLogger,
+      logger,
+    });

    if (dispatchResult.type === 'throw') {
      throw dispatchResult.error;
@@ -4,35 +4,35 @@
 // it under the terms of the GNU Affero General Public License version 3
 // as published by the Free Software Foundation.

+import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
 import { PentestError } from '../services/error-handling.js';
+import type { ActivityLogger } from '../types/activity-logger.js';
 import { ErrorCode } from '../types/errors.js';
 import { matchesBillingTextPattern } from '../utils/billing-detection.js';
-import { filterJsonToolCalls } from './output-formatters.js';
 import { formatTimestamp } from '../utils/formatting.js';
-import { getActualModelName } from './router-utils.js';
-import type { ActivityLogger } from '../types/activity-logger.js';
+import type { AuditLogger } from './audit-logger.js';
 import {
+  filterJsonToolCalls,
  formatAssistantOutput,
  formatResultOutput,
-  formatToolUseOutput,
  formatToolResultOutput,
+  formatToolUseOutput,
 } from './output-formatters.js';
-import type { AuditLogger } from './audit-logger.js';
 import type { ProgressManager } from './progress-manager.js';
-import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
+import { getActualModelName } from './router-utils.js';
 import type {
-  AssistantMessage,
-  ResultMessage,
-  ToolUseMessage,
-  ToolResultMessage,
-  AssistantResult,
-  ResultData,
-  ToolUseData,
-  ToolResultData,
  ApiErrorDetection,
+  AssistantMessage,
+  AssistantResult,
  ContentBlock,
-  SystemInitMessage,
  ExecutionContext,
+  ResultData,
+  ResultMessage,
+  SystemInitMessage,
+  ToolResultData,
+  ToolResultMessage,
+  ToolUseData,
+  ToolUseMessage,
 } from './types.js';

 // Handles both array and string content formats from SDK
@@ -40,9 +40,7 @@ function extractMessageContent(message: AssistantMessage): string {
  const messageContent = message.message;

  if (Array.isArray(messageContent.content)) {
-    return messageContent.content
-      .map((c: ContentBlock) => c.text || JSON.stringify(c))
-      .join('\n');
+    return messageContent.content.map((c: ContentBlock) => c.text || JSON.stringify(c)).join('\n');
  }

  return String(messageContent.content);
@@ -81,7 +79,7 @@ function detectApiError(content: string): ApiErrorDetection {
        'billing',
        true, // RETRYABLE - Temporal will use 5-30 min backoff
        {},
-        ErrorCode.SPENDING_CAP_REACHED
+        ErrorCode.SPENDING_CAP_REACHED,
      ),
    };
  }
@@ -104,10 +102,7 @@ function detectApiError(content: string): ApiErrorDetection {
 }

 // Maps SDK structured error types to our error handling.
-function handleStructuredError(
-  errorType: SDKAssistantMessageError,
-  content: string
-): ApiErrorDetection {
+function handleStructuredError(errorType: SDKAssistantMessageError, content: string): ApiErrorDetection {
  switch (errorType) {
    case 'billing_error':
      return {
@@ -117,7 +112,7 @@ function handleStructuredError(
          'billing',
          true, // Retryable with backoff
          {},
-          ErrorCode.INSUFFICIENT_CREDITS
+          ErrorCode.INSUFFICIENT_CREDITS,
        ),
      };
    case 'rate_limit':
@@ -128,7 +123,7 @@ function handleStructuredError(
          'network',
          true, // Retryable with backoff
          {},
-          ErrorCode.API_RATE_LIMITED
+          ErrorCode.API_RATE_LIMITED,
        ),
      };
    case 'authentication_failed':
@@ -137,7 +132,7 @@ function handleStructuredError(
        shouldThrow: new PentestError(
          `Authentication failed: ${content.slice(0, 100)}`,
          'config',
-          false // Not retryable - needs API key fix
+          false, // Not retryable - needs API key fix
        ),
      };
    case 'server_error':
@@ -146,7 +141,7 @@ function handleStructuredError(
        shouldThrow: new PentestError(
          `Server error (structured): ${content.slice(0, 100)}`,
          'network',
-          true // Retryable
+          true, // Retryable
        ),
      };
    case 'invalid_request':
@@ -155,7 +150,7 @@ function handleStructuredError(
        shouldThrow: new PentestError(
          `Invalid request: ${content.slice(0, 100)}`,
          'config',
-          false // Not retryable - needs code fix
+          false, // Not retryable - needs code fix
        ),
      };
    case 'max_output_tokens':
@@ -164,19 +159,15 @@ function handleStructuredError(
        shouldThrow: new PentestError(
          `Max output tokens reached: ${content.slice(0, 100)}`,
          'billing',
-          true // Retryable - may succeed with different content
+          true, // Retryable - may succeed with different content
        ),
      };
-    case 'unknown':
    default:
      return { detected: true };
  }
 }

-function handleAssistantMessage(
-  message: AssistantMessage,
-  turnCount: number
-): AssistantResult {
+function handleAssistantMessage(message: AssistantMessage, turnCount: number): AssistantResult {
  const content = extractMessageContent(message);
  const cleanedContent = filterJsonToolCalls(content);

@@ -246,8 +237,7 @@ function handleToolUseMessage(message: ToolUseMessage): ToolUseData {
 // Truncates long results for display (500 char limit), preserves full content for logging
 function handleToolResultMessage(message: ToolResultMessage): ToolResultData {
  const content = message.content;
-  const contentStr =
-    typeof content === 'string' ? content : JSON.stringify(content, null, 2);
+  const contentStr = typeof content === 'string' ? content : JSON.stringify(content, null, 2);

  const displayContent =
    contentStr.length > 500
@@ -284,7 +274,7 @@ export interface MessageDispatchDeps {
 export async function dispatchMessage(
  message: { type: string; subtype?: string },
  turnCount: number,
-  deps: MessageDispatchDeps
+  deps: MessageDispatchDeps,
 ): Promise<MessageDispatchAction> {
  const { execContext, description, progress, auditLogger, logger } = deps;

@@ -298,12 +288,7 @@ export async function dispatchMessage(

      if (assistantResult.cleanedContent.trim()) {
        progress.stop();
-        outputLines(formatAssistantOutput(
-          assistantResult.cleanedContent,
-          execContext,
-          turnCount,
-          description
-        ));
+        outputLines(formatAssistantOutput(assistantResult.cleanedContent, execContext, turnCount, description));
        progress.start();
      }

@@ -323,10 +308,6 @@ export async function dispatchMessage(
        const actualModel = getActualModelName(initMsg.model);
        if (!execContext.useCleanOutput) {
          logger.info(`Model: ${actualModel}, Permission: ${initMsg.permissionMode}`);
-          if (initMsg.mcp_servers && initMsg.mcp_servers.length > 0) {
-            const mcpStatus = initMsg.mcp_servers.map(s => `${s.name}(${s.status})`).join(', ');
-            logger.info(`MCP: ${mcpStatus}`);
-          }
        }
        // Return actual model for tracking in audit logs
        return { type: 'continue', model: actualModel };
@@ -4,8 +4,8 @@
 // it under the terms of the GNU Affero General Public License version 3
 // as published by the Free Software Foundation.

-import { extractAgentType, formatDuration } from '../utils/formatting.js';
 import { AGENTS } from '../session-manager.js';
+import { extractAgentType, formatDuration } from '../utils/formatting.js';
 import type { ExecutionContext, ResultData } from './types.js';

 interface ToolCallInput {
@@ -16,6 +16,7 @@ interface ToolCallInput {
  text?: string;
  action?: string;
  description?: string;
+  command?: string;
  todos?: Array<{
    status: string;
    content: string;
@@ -76,6 +77,80 @@ function extractDomain(url: string): string {
  }
 }

+/**
+ * Format playwright-cli commands into clean progress indicators; returns null when no playwright-cli invocation is found
+ */
+function formatBrowserAction(command: string): string | null {
+  // Extract subcommand after optional session flag (e.g., "playwright-cli -s=session1 goto https://example.com")
+  const match = command.match(/playwright-cli\s+(?:-s=\S+\s+)?(\S+)(?:\s+(.*))?/);
+  if (!match) return null;
+
+  const subcommand = match[1];
+  const args = match[2] || '';
+
+  switch (subcommand) {
+    case 'open':
+    case 'goto': {
+      const domain = args.trim() ? extractDomain(args.trim()) : '';
+      return domain ? `🌐 Navigating to ${domain}` : '🌐 Opening browser';
+    }
+    case 'go-back':
+      return '⬅️ Going back';
+    case 'go-forward':
+      return '➡️ Going forward';
+    case 'reload':
+      return '🔄 Reloading page';
+    case 'click':
+    case 'dblclick':
+      return `🖱️ Clicking ${(args || 'element').slice(0, 25)}`;
+    case 'hover':
+      return `👆 Hovering over ${(args || 'element').slice(0, 20)}`;
+    case 'type':
+      return `⌨️ Typing ${(args || 'text').slice(0, 20)}`;
+    case 'press':
+    case 'keydown':
+    case 'keyup':
+      return `⌨️ Pressing ${args || 'key'}`;
+    case 'fill':
+      return `📝 Filling ${(args || 'field').slice(0, 25)}`;
+    case 'select':
+      return '📋 Selecting dropdown option';
+    case 'check':
+    case 'uncheck':
+      return `☑️ ${subcommand === 'check' ? 'Checking' : 'Unchecking'} ${(args || 'element').slice(0, 20)}`;
+    case 'upload':
+      return '📁 Uploading file';
+    case 'drag':
+      return '🖱️ Dragging element';
+    case 'snapshot':
+      return '📸 Taking page snapshot';
+    case 'screenshot':
+      return '📸 Taking screenshot';
+    case 'eval':
+    case 'run-code':
+      return '🔍 Running JavaScript analysis';
+    case 'console':
+      return '📜 Checking console logs';
+    case 'network':
+      return '🌐 Analyzing network traffic';
+    case 'tab-list':
+    case 'tab-new':
+    case 'tab-close':
+    case 'tab-select':
+      return `🗂️ ${subcommand.replace('tab-', '')} browser tab`;
+    case 'dialog-accept':
+      return '💬 Accepting dialog';
+    case 'dialog-dismiss':
+      return '💬 Dismissing dialog';
+    case 'pdf':
+      return '📄 Saving page as PDF';
+    case 'resize':
+      return `🖥️ Resizing browser ${args || ''}`.trim();
+    default:
+      return `🌐 Browser: ${subcommand}`;
+  }
+}
+
 /**
 * Summarize TodoWrite updates into clean progress indicators
 */
@@ -89,118 +164,20 @@ function summarizeTodoUpdate(input: ToolCallInput | undefined): string | null {
  const inProgress = todos.filter((t) => t.status === 'in_progress');

  // Show recently completed tasks
-  if (completed.length > 0) {
-    const recent = completed[completed.length - 1]!;
+  const recent = completed.at(-1);
+  if (recent) {
    return `✅ ${recent.content}`;
  }

  // Show current in-progress task
-  if (inProgress.length > 0) {
-    const current = inProgress[0]!;
+  const current = inProgress.at(0);
+  if (current) {
    return `🔄 ${current.content}`;
  }

  return null;
 }

-/**
- * Format browser tool calls into clean progress indicators
- */
-function formatBrowserAction(toolCall: ToolCall): string {
-  const toolName = toolCall.name;
-  const input = toolCall.input || {};
-
-  // Core Browser Operations
-  if (toolName === 'mcp__playwright__browser_navigate') {
-    const url = input.url || '';
-    const domain = extractDomain(url);
-    return `🌐 Navigating to ${domain}`;
-  }
-
-  if (toolName === 'mcp__playwright__browser_navigate_back') {
-    return `⬅️ Going back`;
-  }
-
-  // Page Interaction
-  if (toolName === 'mcp__playwright__browser_click') {
-    const element = input.element || 'element';
-    return `🖱️ Clicking ${element.slice(0, 25)}`;
-  }
-
-  if (toolName === 'mcp__playwright__browser_hover') {
-    const element = input.element || 'element';
-    return `👆 Hovering over ${element.slice(0, 20)}`;
-  }
-
-  if (toolName === 'mcp__playwright__browser_type') {
-    const element = input.element || 'field';
-    return `⌨️ Typing in ${element.slice(0, 20)}`;
-  }
-
-  if (toolName === 'mcp__playwright__browser_press_key') {
-    const key = input.key || 'key';
-    return `⌨️ Pressing ${key}`;
-  }
-
-  // Form Handling
-  if (toolName === 'mcp__playwright__browser_fill_form') {
-    const fieldCount = input.fields?.length || 0;
-    return `📝 Filling ${fieldCount} form fields`;
-  }
-
-  if (toolName === 'mcp__playwright__browser_select_option') {
-    return `📋 Selecting dropdown option`;
-  }
-
-  if (toolName === 'mcp__playwright__browser_file_upload') {
-    return `📁 Uploading file`;
-  }
-
-  // Page Analysis
-  if (toolName === 'mcp__playwright__browser_snapshot') {
-    return `📸 Taking page snapshot`;
-  }
-
-  if (toolName === 'mcp__playwright__browser_take_screenshot') {
-    return `📸 Taking screenshot`;
-  }
-
-  if (toolName === 'mcp__playwright__browser_evaluate') {
-    return `🔍 Running JavaScript analysis`;
-  }
-
-  // Waiting & Monitoring
-  if (toolName === 'mcp__playwright__browser_wait_for') {
-    if (input.text) {
-      return `⏳ Waiting for "${input.text.slice(0, 20)}"`;
-    }
-    return `⏳ Waiting for page response`;
-  }
-
-  if (toolName === 'mcp__playwright__browser_console_messages') {
-    return `📜 Checking console logs`;
-  }
-
-  if (toolName === 'mcp__playwright__browser_network_requests') {
-    return `🌐 Analyzing network traffic`;
-  }
-
-  // Tab Management
-  if (toolName === 'mcp__playwright__browser_tabs') {
-    const action = input.action || 'managing';
-    return `🗂️ ${action} browser tab`;
-  }
-
-  // Dialog Handling
-  if (toolName === 'mcp__playwright__browser_handle_dialog') {
-    return `💬 Handling browser dialog`;
-  }
-
-  // Fallback for any missed tools
-  const actionType = toolName.split('_').pop();
-  return `🌐 Browser: ${actionType}`;
-}
-
 /**
 * Filter out JSON tool calls from content, with special handling for Task calls
 */
@@ -241,17 +218,16 @@ export function filterJsonToolCalls(content: string | null | undefined): string
          continue;
        }

-        // Special handling for browser tool calls
-        if (toolCall.name.startsWith('mcp__playwright__browser_')) {
-          const browserAction = formatBrowserAction(toolCall);
-          if (browserAction) {
-            processedLines.push(browserAction);
+        // Special handling for browser tool calls (playwright-cli via Bash)
+        if (toolCall.name === 'Bash') {
+          const command = toolCall.input?.command || '';
+          if (command.includes('playwright-cli')) {
+            const browserAction = formatBrowserAction(command);
+            if (browserAction) {
+              processedLines.push(browserAction);
+            }
          }
-          continue;
        }
-
-        // Hide all other tool calls (Read, Write, Grep, etc.)
-        continue;
      } catch {
        // If JSON parsing fails, treat as regular text
        processedLines.push(line);
@@ -266,8 +242,7 @@ export function filterJsonToolCalls(content: string | null | undefined): string
 }

 export function detectExecutionContext(description: string): ExecutionContext {
-  const isParallelExecution =
-    description.includes('vuln agent') || description.includes('exploit agent');
+  const isParallelExecution = description.includes('vuln agent') || description.includes('exploit agent');

  const useCleanOutput =
    description.includes('Pre-recon agent') ||
@@ -287,7 +262,7 @@ export function formatAssistantOutput(
  cleanedContent: string,
  context: ExecutionContext,
  turnCount: number,
-  description: string
+  description: string,
 ): string[] {
  if (!cleanedContent.trim()) {
    return [];
@@ -341,7 +316,7 @@ export function formatErrorOutput(
  description: string,
  duration: number,
  sourceDir: string,
-  isRetryable: boolean
+  isRetryable: boolean,
 ): string[] {
  const lines: string[] = [];

@@ -374,7 +349,7 @@ export function formatCompletionMessage(
  context: ExecutionContext,
  description: string,
  turnCount: number,
-  duration: number
+  duration: number,
 ): string {
  if (context.isParallelExecution) {
    const prefix = getAgentPrefix(description);
@@ -388,10 +363,7 @@ export function formatCompletionMessage(
  return `  Claude Code completed: ${description} (${turnCount} turns) in ${formatDuration(duration)}`;
 }

-export function formatToolUseOutput(
-  toolName: string,
-  input: Record<string, unknown> | undefined
-): string[] {
+export function formatToolUseOutput(toolName: string, input: Record<string, unknown> | undefined): string[] {
  const lines: string[] = [];

  lines.push(`\n    Using Tool: ${toolName}`);
@@ -63,10 +63,7 @@ class NullProgressManager implements ProgressManager {
 }

 // Returns no-op when disabled
-export function createProgressManager(
-  context: ProgressContext,
-  disableLoader: boolean
-): ProgressManager {
+export function createProgressManager(context: ProgressContext, disableLoader: boolean): ProgressManager {
  if (!context.useCleanOutput || disableLoader) {
    return new NullProgressManager();
  }
@@ -25,4 +25,3 @@ export function getActualModelName(sdkReportedModel?: string): string | undefine
  // Fall back to SDK-reported model
  return sdkReportedModel;
 }
-
@@ -53,7 +53,6 @@ export interface ContentBlock {
  text?: string;
 }

-
 export interface AssistantMessage {
  type: 'assistant';
  error?: SDKAssistantMessageError;
@@ -93,10 +92,8 @@ export interface SystemInitMessage {
  subtype: 'init';
  model?: string;
  permissionMode?: string;
-  mcp_servers?: Array<{ name: string; status: string }>;
 }

 export interface UserMessage {
  type: 'user';
 }
-
@@ -11,15 +11,15 @@
 * crash-safe audit logging.
 */

-import { AgentLogger } from './logger.js';
-import { WorkflowLogger, type AgentLogDetails, type WorkflowSummary } from './workflow-logger.js';
-import { MetricsTracker } from './metrics-tracker.js';
-import { initializeAuditStructure, type SessionMetadata } from './utils.js';
-import { formatTimestamp } from '../utils/formatting.js';
-import { SessionMutex } from '../utils/concurrency.js';
-import type { AgentEndResult } from '../types/index.js';
 import { PentestError } from '../services/error-handling.js';
 import { ErrorCode } from '../types/errors.js';
+import type { AgentEndResult } from '../types/index.js';
+import { SessionMutex } from '../utils/concurrency.js';
+import { formatTimestamp } from '../utils/formatting.js';
+import { AgentLogger } from './logger.js';
+import { MetricsTracker } from './metrics-tracker.js';
+import { initializeAuditStructure, type SessionMetadata } from './utils.js';
+import { type AgentLogDetails, WorkflowLogger, type WorkflowSummary } from './workflow-logger.js';

 // Global mutex instance
 const sessionMutex = new SessionMutex();
@@ -47,7 +47,7 @@ export class AuditSession {
        'config',
        false,
        { field: 'sessionMetadata.id' },
-        ErrorCode.CONFIG_VALIDATION_FAILED
+        ErrorCode.CONFIG_VALIDATION_FAILED,
      );
    }
    if (!this.sessionMetadata.webUrl) {
@@ -56,7 +56,7 @@ export class AuditSession {
        'config',
        false,
        { field: 'sessionMetadata.webUrl' },
-        ErrorCode.CONFIG_VALIDATION_FAILED
+        ErrorCode.CONFIG_VALIDATION_FAILED,
      );
    }

@@ -82,8 +82,8 @@ export class AuditSession {
    // Initialize metrics tracker (loads or creates session.json)
    await this.metricsTracker.initialize(workflowId);

-    // Initialize workflow logger
-    await this.workflowLogger.initialize();
+    // Initialize workflow logger with actual Temporal workflow ID
+    await this.workflowLogger.initialize(workflowId);

    this.initialized = true;
  }
@@ -100,11 +100,7 @@ export class AuditSession {
  /**
   * Start agent execution
   */
-  async startAgent(
-    agentName: string,
-    promptContent: string,
-    attemptNumber: number = 1
-  ): Promise<void> {
+  async startAgent(agentName: string, promptContent: string, attemptNumber: number = 1): Promise<void> {
    await this.ensureInitialized();

    // 1. Save prompt snapshot (only on first attempt)
@@ -140,7 +136,7 @@ export class AuditSession {
        'validation',
        false,
        {},
-        ErrorCode.AGENT_EXECUTION_FAILED
+        ErrorCode.AGENT_EXECUTION_FAILED,
      );
    }

@@ -152,18 +148,10 @@ export class AuditSession {
    const agentName = this.currentAgentName || 'unknown';
    switch (eventType) {
      case 'tool_start':
-        await this.workflowLogger.logToolStart(
-          agentName,
-          String(data.toolName || ''),
-          data.parameters
-        );
+        await this.workflowLogger.logToolStart(agentName, String(data.toolName || ''), data.parameters);
        break;
      case 'llm_response':
-        await this.workflowLogger.logLlmResponse(
-          agentName,
-          Number(data.turn || 0),
-          String(data.content || '')
-        );
+        await this.workflowLogger.logLlmResponse(agentName, Number(data.turn || 0), String(data.content || ''));
        break;
      // tool_end and error events are intentionally not logged to workflow log
      // to reduce noise - the agent completion message captures the outcome
@@ -266,11 +254,7 @@ export class AuditSession {
   * @param terminatedWorkflows - IDs of workflows that were terminated
   * @param checkpointHash - Git checkpoint hash that was restored
   */
-  async addResumeAttempt(
-    workflowId: string,
-    terminatedWorkflows: string[],
-    checkpointHash?: string
-  ): Promise<void> {
+  async addResumeAttempt(workflowId: string, terminatedWorkflows: string[], checkpointHash?: string): Promise<void> {
    await this.ensureInitialized();

    const unlock = await sessionMutex.lock(this.sessionId);
@@ -12,8 +12,8 @@
 * and proper cleanup.
 */

-import fs from 'fs';
-import path from 'path';
+import fs from 'node:fs';
+import path from 'node:path';
 import { ensureDirectory } from '../utils/file-io.js';

 /**
@@ -103,7 +103,7 @@ export class LogStream {
    }

    return new Promise((resolve) => {
-      this.stream!.end(() => {
+      this.stream?.end(() => {
        this._isOpen = false;
        this.stream = null;
        resolve();
@@ -11,14 +11,10 @@
 * Uses LogStream for stream management with backpressure handling.
 */

-import {
-  generateLogPath,
-  generatePromptPath,
-  type SessionMetadata,
-} from './utils.js';
 import { atomicWrite } from '../utils/file-io.js';
 import { formatTimestamp } from '../utils/formatting.js';
 import { LogStream } from './log-stream.js';
+import { generateLogPath, generatePromptPath, type SessionMetadata } from './utils.js';

 interface LogEvent {
  type: string;
@@ -103,11 +99,7 @@ export class AgentLogger {
   * Save prompt snapshot to prompts directory
   * Static method - doesn't require logger instance
   */
-  static async savePrompt(
-    sessionMetadata: SessionMetadata,
-    agentName: string,
-    promptContent: string
-  ): Promise<void> {
+  static async savePrompt(sessionMetadata: SessionMetadata, agentName: string, promptContent: string): Promise<void> {
    const promptPath = generatePromptPath(sessionMetadata, agentName);

    // Create header with metadata
@@ -11,16 +11,13 @@
 * Tracks attempt-level data for complete forensic trail.
 */

-import {
-  generateSessionJsonPath,
-  type SessionMetadata,
-} from './utils.js';
-import { atomicWrite, readJson, fileExists } from '../utils/file-io.js';
-import { formatTimestamp, calculatePercentage } from '../utils/formatting.js';
-import { AGENT_PHASE_MAP, type PhaseName } from '../session-manager.js';
 import { PentestError } from '../services/error-handling.js';
+import { AGENT_PHASE_MAP, type PhaseName } from '../session-manager.js';
 import { ErrorCode } from '../types/errors.js';
-import type { AgentName, AgentEndResult } from '../types/index.js';
+import type { AgentEndResult, AgentName } from '../types/index.js';
+import { atomicWrite, fileExists, readJson } from '../utils/file-io.js';
+import { calculatePercentage, formatTimestamp } from '../utils/formatting.js';
+import { generateSessionJsonPath, type SessionMetadata } from './utils.js';

 interface AttemptData {
  attempt_number: number;
@@ -166,7 +163,7 @@ export class MetricsTracker {
        'validation',
        false,
        {},
-        ErrorCode.AGENT_EXECUTION_FAILED
+        ErrorCode.AGENT_EXECUTION_FAILED,
      );
    }

@@ -254,18 +251,14 @@ export class MetricsTracker {
   * @param terminatedWorkflows - IDs of workflows that were terminated
   * @param checkpointHash - Git checkpoint hash that was restored
   */
-  async addResumeAttempt(
-    workflowId: string,
-    terminatedWorkflows: string[],
-    checkpointHash?: string
-  ): Promise<void> {
+  async addResumeAttempt(workflowId: string, terminatedWorkflows: string[], checkpointHash?: string): Promise<void> {
    if (!this.data) {
      throw new PentestError(
        'MetricsTracker not initialized',
        'validation',
        false,
        {},
-        ErrorCode.AGENT_EXECUTION_FAILED
+        ErrorCode.AGENT_EXECUTION_FAILED,
      );
    }

@@ -307,15 +300,10 @@ export class MetricsTracker {
    const agents = this.data.metrics.agents;

    // Only count successful agents
-    const successfulAgents = Object.entries(agents).filter(
-      ([, data]) => data.status === 'success'
-    );
+    const successfulAgents = Object.entries(agents).filter(([, data]) => data.status === 'success');

    // Calculate total duration and cost
-    const totalDuration = successfulAgents.reduce(
-      (sum, [, data]) => sum + data.final_duration_ms,
-      0
-    );
+    const totalDuration = successfulAgents.reduce((sum, [, data]) => sum + data.final_duration_ms, 0);

    const totalCost = successfulAgents.reduce((sum, [, data]) => sum + data.total_cost_usd, 0);

@@ -329,15 +317,13 @@ export class MetricsTracker {
  /**
   * Calculate phase-level metrics
   */
-  private calculatePhaseMetrics(
-    successfulAgents: Array<[string, AgentAuditMetrics]>
-  ): Record<string, PhaseMetrics> {
+  private calculatePhaseMetrics(successfulAgents: Array<[string, AgentAuditMetrics]>): Record<string, PhaseMetrics> {
    const phases: Record<PhaseName, AgentAuditMetrics[]> = {
      'pre-recon': [],
-      'recon': [],
+      recon: [],
      'vulnerability-analysis': [],
-      'exploitation': [],
-      'reporting': [],
+      exploitation: [],
+      reporting: [],
    };

    // Group agents by phase using imported AGENT_PHASE_MAP
@@ -350,6 +336,7 @@ export class MetricsTracker {

    // Calculate metrics per phase
    const phaseMetrics: Record<string, PhaseMetrics> = {};
+    // biome-ignore lint/style/noNonNullAssertion: called from recalculateAggregations which guards this.data
    const totalDuration = this.data!.metrics.total_duration_ms;

    for (const [phaseName, agentList] of Object.entries(phases)) {
@@ -11,22 +11,15 @@
 * All functions are pure and crash-safe.
 */

-import fs from 'fs/promises';
-import path from 'path';
-import { fileURLToPath } from 'url';
-
+import fs from 'node:fs/promises';
+import path from 'node:path';
+import { WORKSPACES_DIR } from '../paths.js';
 import { ensureDirectory } from '../utils/file-io.js';

 export type { SessionMetadata } from '../types/audit.js';
+
 import type { SessionMetadata } from '../types/audit.js';

-const __filename = fileURLToPath(import.meta.url);
-const __dirname = path.dirname(__filename);
-
-// Get Shannon repository root
-const SHANNON_ROOT = path.resolve(__dirname, '..', '..');
-const AUDIT_LOGS_DIR = path.join(SHANNON_ROOT, 'audit-logs');
-
 /**
 * Extract and sanitize hostname from URL for use in identifiers
 */
@@ -44,11 +37,11 @@ export function generateSessionIdentifier(sessionMetadata: SessionMetadata): str

 /**
 * Generate path to audit log directory for a session
- * Uses custom outputPath if provided, otherwise defaults to AUDIT_LOGS_DIR
+ * Uses custom outputPath if provided, otherwise defaults to WORKSPACES_DIR
 */
 export function generateAuditPath(sessionMetadata: SessionMetadata): string {
  const sessionIdentifier = generateSessionIdentifier(sessionMetadata);
-  const baseDir = sessionMetadata.outputPath || AUDIT_LOGS_DIR;
+  const baseDir = sessionMetadata.outputPath || WORKSPACES_DIR;
  return path.join(baseDir, sessionIdentifier);
 }

@@ -59,7 +52,7 @@ export function generateLogPath(
  sessionMetadata: SessionMetadata,
  agentName: string,
  timestamp: number,
-  attemptNumber: number
+  attemptNumber: number,
 ): string {
  const auditPath = generateAuditPath(sessionMetadata);
  const filename = `${timestamp}_${agentName}_attempt-${attemptNumber}.log`;
@@ -92,7 +85,7 @@ export function generateWorkflowLogPath(sessionMetadata: SessionMetadata): strin

 /**
 * Initialize audit directory structure for a session
- * Creates: audit-logs/{sessionId}/, agents/, prompts/, deliverables/
+ * Creates: workspaces/{sessionId}/, agents/, prompts/, deliverables/
 */
 export async function initializeAuditStructure(sessionMetadata: SessionMetadata): Promise<void> {
  const auditPath = generateAuditPath(sessionMetadata);
@@ -107,13 +100,10 @@ export async function initializeAuditStructure(sessionMetadata: SessionMetadata)
 }

 /**
- * Copy deliverable files from repo to audit-logs for self-contained audit trail.
+ * Copy deliverable files from repo to workspaces for self-contained audit trail.
 * No-ops if source directory doesn't exist. Idempotent and parallel-safe.
 */
-export async function copyDeliverablesToAudit(
-  sessionMetadata: SessionMetadata,
-  repoPath: string
-): Promise<void> {
+export async function copyDeliverablesToAudit(sessionMetadata: SessionMetadata, repoPath: string): Promise<void> {
  const sourceDir = path.join(repoPath, 'deliverables');
  const destDir = path.join(generateAuditPath(sessionMetadata), 'deliverables');

@@ -11,10 +11,10 @@
 * Optimized for `tail -f` viewing during concurrent workflow execution.
 */

-import fs from 'fs/promises';
-import { generateWorkflowLogPath, type SessionMetadata } from './utils.js';
+import fs from 'node:fs/promises';
 import { formatDuration, formatTimestamp } from '../utils/formatting.js';
 import { LogStream } from './log-stream.js';
+import { generateWorkflowLogPath, type SessionMetadata } from './utils.js';

 export interface AgentLogDetails {
  attemptNumber?: number;
@@ -44,6 +44,7 @@ export interface WorkflowSummary {
 export class WorkflowLogger {
  private readonly sessionMetadata: SessionMetadata;
  private readonly logStream: LogStream;
+  private workflowId: string | undefined;

  constructor(sessionMetadata: SessionMetadata) {
    this.sessionMetadata = sessionMetadata;
@@ -54,7 +55,11 @@ export class WorkflowLogger {
  /**
   * Initialize the log stream (creates file and writes header)
   */
-  async initialize(): Promise<void> {
+  async initialize(workflowId?: string): Promise<void> {
+    if (workflowId) {
+      this.workflowId = workflowId;
+    }
+
    if (this.logStream.isOpen) {
      return;
    }
@@ -76,7 +81,7 @@ export class WorkflowLogger {
      `================================================================================`,
      `Shannon Pentest - Workflow Log`,
      `================================================================================`,
-      `Workflow ID: ${this.sessionMetadata.id}`,
+      `Workflow ID: ${this.workflowId ?? this.sessionMetadata.id}`,
      `Target URL:  ${this.sessionMetadata.webUrl}`,
      `Started:     ${formatTimestamp()}`,
      `================================================================================`,
@@ -142,11 +147,7 @@ export class WorkflowLogger {
  /**
   * Log an agent event
   */
-  async logAgent(
-    agentName: string,
-    event: 'start' | 'end',
-    details?: AgentLogDetails
-  ): Promise<void> {
+  async logAgent(agentName: string, event: 'start' | 'end', details?: AgentLogDetails): Promise<void> {
    await this.ensureInitialized();

    let message: string;
@@ -155,7 +156,7 @@ export class WorkflowLogger {
      const attempt = details?.attemptNumber ?? 1;
      message = `${agentName}: Starting (attempt ${attempt})`;
    } else {
-      const parts: string[] = [agentName + ':'];
+      const parts: string[] = [`${agentName}:`];

      if (details?.success === false) {
        parts.push('Failed');
@@ -208,7 +209,7 @@ export class WorkflowLogger {
   */
  private truncate(str: string, maxLen: number): string {
    if (str.length <= maxLen) return str;
-    return str.slice(0, maxLen - 3) + '...';
+    return `${str.slice(0, maxLen - 3)}...`;
  }

  /**
@@ -259,22 +260,6 @@ export class WorkflowLogger {
          return String(p.url);
        }
        break;
-      case 'mcp__playwright__browser_navigate':
-        if (p.url) {
-          return String(p.url);
-        }
-        break;
-      case 'mcp__playwright__browser_click':
-        if (p.selector) {
-          return this.truncate(String(p.selector), 60);
-        }
-        break;
-      case 'mcp__playwright__browser_type':
-        if (p.selector) {
-          const text = p.text ? `: "${this.truncate(String(p.text), 30)}"` : '';
-          return `${this.truncate(String(p.selector), 40)}${text}`;
-        }
-        break;
    }

    // Default: show first string-valued param truncated
@@ -322,11 +307,9 @@ export class WorkflowLogger {
    const label = 'Error:       ';
    const indent = ' '.repeat(label.length);

-    const lines = segments.map((segment, i) =>
-      i === 0 ? `${label}${segment.trim()}` : `${indent}${segment.trim()}`
-    );
+    const lines = segments.map((segment, i) => (i === 0 ? `${label}${segment.trim()}` : `${indent}${segment.trim()}`));

-    return lines.join('\n') + '\n';
+    return `${lines.join('\n')}\n`;
  }

  /**
@@ -337,35 +320,40 @@ export class WorkflowLogger {

    const status = summary.status === 'completed' ? 'COMPLETED' : 'FAILED';

-    await this.logStream.write('\n');
-    await this.logStream.write(`================================================================================\n`);
-    await this.logStream.write(`Workflow ${status}\n`);
-    await this.logStream.write(`────────────────────────────────────────\n`);
-    await this.logStream.write(`Workflow ID: ${this.sessionMetadata.id}\n`);
-    await this.logStream.write(`Status:      ${summary.status}\n`);
-    await this.logStream.write(`Duration:    ${formatDuration(summary.totalDurationMs)}\n`);
-    await this.logStream.write(`Total Cost:  $${summary.totalCostUsd.toFixed(4)}\n`);
-    await this.logStream.write(`Agents:      ${summary.completedAgents.length} completed\n`);
+    const lines: string[] = [
+      '',
+      '================================================================================',
+      `Workflow ${status}`,
+      '────────────────────────────────────────',
+      `Workflow ID: ${this.workflowId ?? this.sessionMetadata.id}`,
+      `Status:      ${summary.status}`,
+      `Duration:    ${formatDuration(summary.totalDurationMs)}`,
+      `Total Cost:  $${summary.totalCostUsd.toFixed(4)}`,
+      `Agents:      ${summary.completedAgents.length} completed`,
+    ];

    if (summary.error) {
-      await this.logStream.write(this.formatErrorBlock(summary.error));
+      lines.push(this.formatErrorBlock(summary.error).trimEnd());
    }

-    await this.logStream.write(`\n`);
-    await this.logStream.write(`Agent Breakdown:\n`);
+    lines.push('');
+    lines.push('Agent Breakdown:');

    for (const agentName of summary.completedAgents) {
      const metrics = summary.agentMetrics[agentName];
      if (metrics) {
        const duration = formatDuration(metrics.durationMs);
        const cost = metrics.costUsd !== null ? `$${metrics.costUsd.toFixed(4)}` : 'N/A';
-        await this.logStream.write(`  - ${agentName} (${duration}, ${cost})\n`);
+        lines.push(`  - ${agentName} (${duration}, ${cost})`);
      } else {
-        await this.logStream.write(`  - ${agentName}\n`);
+        lines.push(`  - ${agentName}`);
      }
    }

-    await this.logStream.write(`================================================================================\n`);
+    lines.push('================================================================================');
+
+    // Write the summary in one call so log tailers do not see interleaved or duplicated lines
+    await this.logStream.write(`${lines.join('\n')}\n`);
  }

  /**
@@ -4,19 +4,14 @@
 // it under the terms of the GNU Affero General Public License version 3
 // as published by the Free Software Foundation.

-import { createRequire } from 'module';
-import { fs } from 'zx';
-import yaml from 'js-yaml';
-import { Ajv, type ValidateFunction, type ErrorObject } from 'ajv';
+import { createRequire } from 'node:module';
+import { Ajv, type ErrorObject, type ValidateFunction } from 'ajv';
 import type { FormatsPlugin } from 'ajv-formats';
+import yaml from 'js-yaml';
+import { fs } from 'zx';
 import { PentestError } from './services/error-handling.js';
+import type { Authentication, Config, DistributedConfig, Rule } from './types/config.js';
 import { ErrorCode } from './types/errors.js';
-import type {
-  Config,
-  Rule,
-  Authentication,
-  DistributedConfig,
-} from './types/config.js';

 // Handle ESM/CJS interop for ajv-formats using require
 const require = createRequire(import.meta.url);
@@ -35,12 +30,10 @@ try {
  validateSchema = ajv.compile(configSchema);
 } catch (error) {
  const errMsg = error instanceof Error ? error.message : String(error);
-  throw new PentestError(
-    `Failed to load configuration schema: ${errMsg}`,
-    'config',
-    false,
-    { schemaPath: '../configs/config-schema.json', originalError: errMsg }
-  );
+  throw new PentestError(`Failed to load configuration schema: ${errMsg}`, 'config', false, {
+    schemaPath: '../configs/config-schema.json',
+    originalError: errMsg,
+  });
 }

 const DANGEROUS_PATTERNS: RegExp[] = [
@@ -185,7 +178,7 @@ export const parseConfig = async (configPath: string): Promise<Config> => {
        'config',
        false,
        { configPath },
-        ErrorCode.CONFIG_NOT_FOUND
+        ErrorCode.CONFIG_NOT_FOUND,
      );
    }

@@ -198,7 +191,7 @@ export const parseConfig = async (configPath: string): Promise<Config> => {
        'config',
        false,
        { configPath, fileSize: stats.size, maxFileSize },
-        ErrorCode.CONFIG_VALIDATION_FAILED
+        ErrorCode.CONFIG_VALIDATION_FAILED,
      );
    }

@@ -211,7 +204,7 @@ export const parseConfig = async (configPath: string): Promise<Config> => {
        'config',
        false,
        { configPath },
-        ErrorCode.CONFIG_VALIDATION_FAILED
+        ErrorCode.CONFIG_VALIDATION_FAILED,
      );
    }

@@ -230,7 +223,7 @@ export const parseConfig = async (configPath: string): Promise<Config> => {
        'config',
        false,
        { configPath, originalError: errMsg },
-        ErrorCode.CONFIG_PARSE_ERROR
+        ErrorCode.CONFIG_PARSE_ERROR,
      );
    }

@@ -241,7 +234,7 @@ export const parseConfig = async (configPath: string): Promise<Config> => {
        'config',
        false,
        { configPath },
-        ErrorCode.CONFIG_PARSE_ERROR
+        ErrorCode.CONFIG_PARSE_ERROR,
      );
    }

@@ -260,7 +253,7 @@ export const parseConfig = async (configPath: string): Promise<Config> => {
      'config',
      false,
      { configPath, originalError: errMsg },
-      ErrorCode.CONFIG_PARSE_ERROR
+      ErrorCode.CONFIG_PARSE_ERROR,
    );
  }
 };
@@ -272,7 +265,7 @@ const validateConfig = (config: Config): void => {
      'config',
      false,
      {},
-      ErrorCode.CONFIG_VALIDATION_FAILED
+      ErrorCode.CONFIG_VALIDATION_FAILED,
    );
  }

@@ -282,7 +275,7 @@ const validateConfig = (config: Config): void => {
      'config',
      false,
      {},
-      ErrorCode.CONFIG_VALIDATION_FAILED
+      ErrorCode.CONFIG_VALIDATION_FAILED,
    );
  }

@@ -295,20 +288,18 @@ const validateConfig = (config: Config): void => {
      'config',
      false,
      { validationErrors: errorMessages },
-      ErrorCode.CONFIG_VALIDATION_FAILED
+      ErrorCode.CONFIG_VALIDATION_FAILED,
    );
  }

  performSecurityValidation(config);

-  if (!config.rules && !config.authentication) {
+  if (!config.rules && !config.authentication && !config.description) {
    console.warn(
-      '⚠️  Configuration file contains no rules or authentication. The pentest will run without any scoping restrictions or login capabilities.'
+      '⚠️  Configuration file contains no rules, authentication, or description. The pentest will run without any scoping restrictions or login capabilities.',
    );
  } else if (config.rules && !config.rules.avoid && !config.rules.focus) {
-    console.warn(
-      '⚠️  Configuration file contains no rules. The pentest will run without any scoping restrictions.'
-    );
+    console.warn('⚠️  Configuration file contains no rules. The pentest will run without any scoping restrictions.');
  }
 };

@@ -325,7 +316,7 @@ const performSecurityValidation = (config: Config): void => {
            'config',
            false,
            { field: 'login_url', pattern: pattern.source },
-            ErrorCode.CONFIG_VALIDATION_FAILED
+            ErrorCode.CONFIG_VALIDATION_FAILED,
          );
        }
      }
@@ -339,7 +330,7 @@ const performSecurityValidation = (config: Config): void => {
            'config',
            false,
            { field: 'credentials.username', pattern: pattern.source },
-            ErrorCode.CONFIG_VALIDATION_FAILED
+            ErrorCode.CONFIG_VALIDATION_FAILED,
          );
        }
        if (pattern.test(auth.credentials.password)) {
@@ -348,7 +339,7 @@ const performSecurityValidation = (config: Config): void => {
            'config',
            false,
            { field: 'credentials.password', pattern: pattern.source },
-            ErrorCode.CONFIG_VALIDATION_FAILED
+            ErrorCode.CONFIG_VALIDATION_FAILED,
          );
        }
      }
@@ -363,7 +354,7 @@ const performSecurityValidation = (config: Config): void => {
              'config',
              false,
              { field: `login_flow[${index}]`, pattern: pattern.source },
-              ErrorCode.CONFIG_VALIDATION_FAILED
+              ErrorCode.CONFIG_VALIDATION_FAILED,
            );
          }
        }
@@ -379,6 +370,20 @@ const performSecurityValidation = (config: Config): void => {
    checkForDuplicates(config.rules.focus || [], 'focus');
    checkForConflicts(config.rules.avoid, config.rules.focus);
  }
+
+  if (config.description) {
+    for (const pattern of DANGEROUS_PATTERNS) {
+      if (pattern.test(config.description)) {
+        throw new PentestError(
+          `description contains potentially dangerous pattern: ${pattern.source}`,
+          'config',
+          false,
+          { field: 'description', pattern: pattern.source },
+          ErrorCode.CONFIG_VALIDATION_FAILED,
+        );
+      }
+    }
+  }
 };

 const validateRulesSecurity = (rules: Rule[] | undefined, ruleType: string): void => {
@@ -392,7 +397,7 @@ const validateRulesSecurity = (rules: Rule[] | undefined, ruleType: string): voi
          'config',
          false,
          { field: `rules.${ruleType}[${index}].url_path`, pattern: pattern.source },
-          ErrorCode.CONFIG_VALIDATION_FAILED
+          ErrorCode.CONFIG_VALIDATION_FAILED,
        );
      }
      if (pattern.test(rule.description)) {
@@ -401,7 +406,7 @@ const validateRulesSecurity = (rules: Rule[] | undefined, ruleType: string): voi
          'config',
          false,
          { field: `rules.${ruleType}[${index}].description`, pattern: pattern.source },
-          ErrorCode.CONFIG_VALIDATION_FAILED
+          ErrorCode.CONFIG_VALIDATION_FAILED,
        );
      }
    }
@@ -421,7 +426,7 @@ const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number):
          'config',
          false,
          { field, ruleType: rule.type },
-          ErrorCode.CONFIG_VALIDATION_FAILED
+          ErrorCode.CONFIG_VALIDATION_FAILED,
        );
      }
      break;
@@ -435,7 +440,7 @@ const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number):
          'config',
          false,
          { field, ruleType: rule.type },
-          ErrorCode.CONFIG_VALIDATION_FAILED
+          ErrorCode.CONFIG_VALIDATION_FAILED,
        );
      }
      // Must contain at least one dot for domains
@@ -445,7 +450,7 @@ const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number):
          'config',
          false,
          { field, ruleType: rule.type },
-          ErrorCode.CONFIG_VALIDATION_FAILED
+          ErrorCode.CONFIG_VALIDATION_FAILED,
        );
      }
      break;
@@ -458,7 +463,7 @@ const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number):
          'config',
          false,
          { field, ruleType: rule.type, allowedMethods },
-          ErrorCode.CONFIG_VALIDATION_FAILED
+          ErrorCode.CONFIG_VALIDATION_FAILED,
        );
      }
      break;
@@ -471,7 +476,7 @@ const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number):
          'config',
          false,
          { field, ruleType: rule.type },
-          ErrorCode.CONFIG_VALIDATION_FAILED
+          ErrorCode.CONFIG_VALIDATION_FAILED,
        );
      }
      break;
@@ -483,7 +488,7 @@ const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number):
          'config',
          false,
          { field, ruleType: rule.type },
-          ErrorCode.CONFIG_VALIDATION_FAILED
+          ErrorCode.CONFIG_VALIDATION_FAILED,
        );
      }
      break;
@@ -500,7 +505,7 @@ const checkForDuplicates = (rules: Rule[], ruleType: string): void => {
        'config',
        false,
        { field: `rules.${ruleType}[${index}]`, ruleType: rule.type, urlPath: rule.url_path },
-        ErrorCode.CONFIG_VALIDATION_FAILED
+        ErrorCode.CONFIG_VALIDATION_FAILED,
      );
    }
    seen.add(key);
@@ -518,7 +523,7 @@ const checkForConflicts = (avoidRules: Rule[] = [], focusRules: Rule[] = []): vo
        'config',
        false,
        { field: `rules.focus[${index}]`, urlPath: rule.url_path },
-        ErrorCode.CONFIG_VALIDATION_FAILED
+        ErrorCode.CONFIG_VALIDATION_FAILED,
      );
    }
  });
@@ -536,11 +541,13 @@ export const distributeConfig = (config: Config | null): DistributedConfig => {
  const avoid = config?.rules?.avoid || [];
  const focus = config?.rules?.focus || [];
  const authentication = config?.authentication || null;
+  const description = config?.description?.trim() || '';

  return {
    avoid: avoid.map(sanitizeRule),
    focus: focus.map(sanitizeRule),
    authentication: authentication ? sanitizeAuthentication(authentication) : null,
+    description,
  };
 };
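
The per-field checks above (login_url, credentials, login_flow, and now description) all repeat the same loop over the pattern list. A minimal sketch of how that loop could be shared, assuming a hypothetical helper `findViolation` and placeholder patterns; neither is part of this change, and the placeholder list is not the project's `DANGEROUS_PATTERNS`:

```typescript
// Placeholder patterns for illustration only; NOT the project's DANGEROUS_PATTERNS list.
const EXAMPLE_PATTERNS: RegExp[] = [/\.\.\//, /<script/i, /javascript:/i];

interface FieldViolation {
  field: string;
  pattern: string;
}

// Hypothetical helper: returns the first violation across all named fields, or undefined.
function findViolation(
  fields: Record<string, string | undefined>,
  patterns: RegExp[] = EXAMPLE_PATTERNS,
): FieldViolation | undefined {
  for (const [field, value] of Object.entries(fields)) {
    if (!value) continue; // skip absent or empty fields
    for (const pattern of patterns) {
      // Reset lastIndex so a pattern carrying the g or y flag always tests from the start.
      pattern.lastIndex = 0;
      if (pattern.test(value)) {
        return { field, pattern: pattern.source };
      }
    }
  }
  return undefined;
}

const clean = findViolation({
  description: 'Internal billing portal',
  login_url: 'https://example.com/login',
});
const flagged = findViolation({ description: 'see <script>alert(1)</script>' });
console.log(clean, flagged);
```

A caller could map the returned `field` and `pattern` straight into the error context object, which is the shape the existing checks already pass along.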

@@ -0,0 +1,30 @@
+/** Centralized path constants for the worker package */
+
+import fs from 'node:fs';
+import path from 'node:path';
+
+/** Worker package root (apps/worker/): parent of compiled dist/; import.meta.dirname needs Node.js 20.11+ */
+const WORKER_ROOT = path.resolve(import.meta.dirname, '..');
+
+export const PROMPTS_DIR = path.join(WORKER_ROOT, 'prompts');
+export const CONFIGS_DIR = path.join(WORKER_ROOT, 'configs');
+
+/**
+ * Repository root — walk up from WORKER_ROOT looking for pnpm-workspace.yaml.
+ * Falls back to two levels up (apps/worker/ → repo root) if not found.
+ */
+function findRepoRoot(): string {
+  let dir = WORKER_ROOT;
+  for (let i = 0; i < 5; i++) {
+    if (fs.existsSync(path.join(dir, 'pnpm-workspace.yaml'))) {
+      return dir;
+    }
+    const parent = path.dirname(dir);
+    if (parent === dir) break;
+    dir = parent;
+  }
+  return path.resolve(WORKER_ROOT, '..', '..');
+}
+
+const REPO_ROOT = findRepoRoot();
+export const WORKSPACES_DIR = path.join(REPO_ROOT, 'workspaces');
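
The walk-up search in `findRepoRoot` can be exercised in isolation against a throwaway directory tree. A minimal sketch, assuming a hypothetical generalized helper `findUp` (not part of this change) that takes the start directory and marker file as parameters and returns `undefined` instead of a fallback path:

```typescript
import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { dirname, join, resolve } from 'node:path';

// Hypothetical helper: returns the first ancestor of startDir (inclusive) that
// contains markerFile, or undefined once maxLevels directories have been checked.
function findUp(startDir: string, markerFile: string, maxLevels = 5): string | undefined {
  let dir = resolve(startDir);
  for (let i = 0; i < maxLevels; i++) {
    if (existsSync(join(dir, markerFile))) return dir;
    const parent = dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return undefined;
}

// Build <tmp>/apps/worker with the marker file at <tmp>, then search from the leaf.
const tmpRoot = mkdtempSync(join(tmpdir(), 'find-up-'));
const nested = join(tmpRoot, 'apps', 'worker');
mkdirSync(nested, { recursive: true });
writeFileSync(join(tmpRoot, 'pnpm-workspace.yaml'), 'packages: []\n');

const found = findUp(nested, 'pnpm-workspace.yaml');
const missing = findUp(nested, 'no-such-marker.example', 2);
rmSync(tmpRoot, { recursive: true, force: true });
```

Returning `undefined` keeps the fallback decision with the caller, so a test can tell a real match apart from the two-levels-up default.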
@@ -37,7 +37,7 @@ export class ProgressIndicator {
    }

    // Clear the spinner line
-    process.stdout.write('\r' + ' '.repeat(this.message.length + 5) + '\r');
+    process.stdout.write(`\r${' '.repeat(this.message.length + 5)}\r`);
    this.isRunning = false;
  }

@@ -0,0 +1,137 @@
+#!/usr/bin/env node
+
+// Copyright (C) 2025 Keygraph, Inc.
+//
+// This program is free software: you can redistribute it and/or modify
+// it under the terms of the GNU Affero General Public License version 3
+// as published by the Free Software Foundation.
+
+/**
+ * generate-totp CLI
+ *
+ * Generates 6-digit TOTP codes for authentication.
+ * Replaces the MCP generate_totp tool.
+ * Based on RFC 6238 (TOTP) and RFC 4226 (HOTP).
+ *
+ * Usage:
+ *   generate-totp --secret JBSWY3DPEHPK3PXP
+ */
+
+import { createHmac } from 'node:crypto';
+
+// === Base32 Decoding ===
+
+function base32Decode(encoded: string): Buffer {
+  const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
+  const cleanInput = encoded.toUpperCase().replace(/[^A-Z2-7]/g, '');
+
+  if (cleanInput.length === 0) {
+    throw new Error('TOTP secret is empty after cleaning');
+  }
+
+  const output: number[] = [];
+  let bits = 0;
+  let value = 0;
+
+  for (const char of cleanInput) {
+    const index = alphabet.indexOf(char);
+    if (index === -1) {
+      throw new Error(`Invalid base32 character: ${char}`);
+    }
+
+    value = (value << 5) | index;
+    bits += 5;
+
+    if (bits >= 8) {
+      output.push((value >>> (bits - 8)) & 255);
+      bits -= 8;
+    }
+  }
+
+  return Buffer.from(output);
+}
+
+// === HOTP / TOTP Generation (RFC 4226, RFC 6238) ===
+
+function generateHOTP(secret: string, counter: number, digits: number = 6): string {
+  const key = base32Decode(secret);
+
+  // Convert counter to 8-byte buffer (big-endian)
+  const counterBuffer = Buffer.alloc(8);
+  counterBuffer.writeBigUInt64BE(BigInt(counter));
+
+  // Generate HMAC-SHA1
+  const hmac = createHmac('sha1', key);
+  hmac.update(counterBuffer);
+  const hash = hmac.digest();
+
+  // Dynamic truncation (SHA-1 always produces 20 bytes)
+  const lastByte = hash[hash.length - 1] ?? 0;
+  const offset = lastByte & 0x0f;
+  const code =
+    (((hash[offset] ?? 0) & 0x7f) << 24) |
+    (((hash[offset + 1] ?? 0) & 0xff) << 16) |
+    (((hash[offset + 2] ?? 0) & 0xff) << 8) |
+    ((hash[offset + 3] ?? 0) & 0xff);
+
+  return (code % 10 ** digits).toString().padStart(digits, '0');
+}
+
+function generateTOTP(secret: string, timeStep: number = 30, digits: number = 6): string {
+  const counter = Math.floor(Date.now() / 1000 / timeStep);
+  return generateHOTP(secret, counter, digits);
+}
+
+// === Argument Parsing ===
+
+function parseSecret(argv: string[]): string {
+  for (let i = 2; i < argv.length; i++) {
+    const next = argv[i + 1];
+    if (argv[i] === '--secret' && next) {
+      return next;
+    }
+  }
+  return '';
+}
+
+// === Main ===
+
+function main(): void {
+  const secret = parseSecret(process.argv);
+
+  if (!secret) {
+    console.log(JSON.stringify({ status: 'error', message: 'Missing required --secret argument', retryable: false }));
+    process.exit(1);
+  }
+
+  const base32Regex = /^[A-Z2-7]+$/i;
+  if (!base32Regex.test(secret)) {
+    console.log(
+      JSON.stringify({
+        status: 'error',
+        message: 'Secret must be base32-encoded (characters A-Z and 2-7)',
+        retryable: false,
+      }),
+    );
+    process.exit(1);
+  }
+
+  try {
+    const totpCode = generateTOTP(secret);
+    const expiresIn = 30 - (Math.floor(Date.now() / 1000) % 30);
+
+    console.log(
+      JSON.stringify({
+        status: 'success',
+        totpCode,
+        expiresIn,
+      }),
+    );
+  } catch (error) {
+    const msg = error instanceof Error ? error.message : String(error);
+    console.log(JSON.stringify({ status: 'error', message: `TOTP generation failed: ${msg}`, retryable: false }));
+    process.exit(1);
+  }
+}
+
+main();
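
The HMAC-SHA1 truncation used by `generateHOTP` can be checked against the published test vectors in RFC 4226 Appendix D and RFC 6238 Appendix B. A minimal sketch, assuming two hypothetical helpers (`hotpFromKey` and `totpAt`, not part of this change) that take a raw key and an explicit timestamp so the result does not depend on `Date.now()`:

```typescript
import { createHmac } from 'node:crypto';

// Hypothetical helper: HOTP over a raw key, using RFC 4226 dynamic truncation.
function hotpFromKey(key: Buffer, counter: number, digits = 6): string {
  // Counter as an 8-byte big-endian buffer
  const counterBuffer = Buffer.alloc(8);
  counterBuffer.writeBigUInt64BE(BigInt(counter));

  const hash = createHmac('sha1', key).update(counterBuffer).digest();

  // Low nibble of the last byte selects the 4-byte window; top bit is masked off
  const offset = (hash[hash.length - 1] ?? 0) & 0x0f;
  const code =
    (((hash[offset] ?? 0) & 0x7f) << 24) |
    (((hash[offset + 1] ?? 0) & 0xff) << 16) |
    (((hash[offset + 2] ?? 0) & 0xff) << 8) |
    ((hash[offset + 3] ?? 0) & 0xff);

  return (code % 10 ** digits).toString().padStart(digits, '0');
}

// Hypothetical helper: TOTP with an injected clock value (seconds since the Unix epoch).
function totpAt(key: Buffer, unixSeconds: number, timeStep = 30, digits = 6): string {
  return hotpFromKey(key, Math.floor(unixSeconds / timeStep), digits);
}

// Shared secret from the RFC test vectors (ASCII "12345678901234567890")
const rfcKey = Buffer.from('12345678901234567890', 'ascii');
const hotp0 = hotpFromKey(rfcKey, 0); // RFC 4226, count 0
const totp59 = totpAt(rfcKey, 59, 30, 8); // RFC 6238, time 59, SHA-1, 8 digits
console.log(hotp0, totp59); // prints "755224 94287082"
```

Passing the timestamp in as a parameter is the only structural difference from the CLI, which reads the clock directly.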
@@ -0,0 +1,191 @@
+#!/usr/bin/env node
+
+// Copyright (C) 2025 Keygraph, Inc.
+//
+// This program is free software: you can redistribute it and/or modify
+// it under the terms of the GNU Affero General Public License version 3
+// as published by the Free Software Foundation.
+
+/**
+ * save-deliverable CLI
+ *
+ * Standalone script to save deliverable files with validation.
+ * Replaces the MCP save_deliverable tool.
+ *
+ * Usage:
+ *   node save-deliverable.js --type INJECTION_QUEUE --content '{"vulnerabilities": [...]}'
+ *   node save-deliverable.js --type INJECTION_ANALYSIS --file-path deliverables/injection_analysis_deliverable.md
+ */
+
+import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
+import { join, resolve } from 'node:path';
+import { DELIVERABLE_FILENAMES, type DeliverableType, isQueueType } from '../types/deliverables.js';
+
+// === Argument Parsing ===
+
+interface ParsedArgs {
+  type: string;
+  content?: string;
+  filePath?: string;
+}
+
+function parseArgs(argv: string[]): ParsedArgs {
+  const args: ParsedArgs = { type: '' };
+
+  for (let i = 2; i < argv.length; i++) {
+    const arg = argv[i];
+    const next = argv[i + 1];
+
+    if (arg === '--type' && next) {
+      args.type = next;
+      i++;
+    } else if (arg === '--content' && next) {
+      args.content = next;
+      i++;
+    } else if (arg === '--file-path' && next) {
+      args.filePath = next;
+      i++;
+    }
+  }
+
+  return args;
+}
+
+// === Queue Validation ===
+
+interface ValidationResult {
+  valid: boolean;
+  message?: string;
+}
+
+function validateQueueJson(content: string): ValidationResult {
+  try {
+    const parsed = JSON.parse(content) as unknown;
+
+    if (typeof parsed !== 'object' || parsed === null) {
+      return {
+        valid: false,
+        message: `Invalid queue structure: Expected an object. Got: ${typeof parsed}`,
+      };
+    }
+
+    const obj = parsed as Record<string, unknown>;
+
+    if (!('vulnerabilities' in obj)) {
+      return {
+        valid: false,
+        message: `Invalid queue structure: Missing 'vulnerabilities' property. Expected: {"vulnerabilities": [...]}`,
+      };
+    }
+
+    if (!Array.isArray(obj.vulnerabilities)) {
+      return {
+        valid: false,
+        message: `Invalid queue structure: 'vulnerabilities' must be an array. Expected: {"vulnerabilities": [...]}`,
+      };
+    }
+
+    return { valid: true };
+  } catch (error) {
+    return {
+      valid: false,
+      message: `Invalid JSON: ${error instanceof Error ? error.message : String(error)}`,
+    };
+  }
+}
+
+// === File Operations ===
+
+function saveDeliverableFile(targetDir: string, filename: string, content: string): string {
+  const deliverablesDir = join(targetDir, 'deliverables');
+  const filepath = join(deliverablesDir, filename);
+
+  try {
+    mkdirSync(deliverablesDir, { recursive: true });
+  } catch {
+    throw new Error(`Cannot create deliverables directory at ${deliverablesDir}`);
+  }
+
+  writeFileSync(filepath, content, 'utf8');
+  return filepath;
+}
+
+// === Main ===
+
+function main(): void {
+  const args = parseArgs(process.argv);
+
+  // 1. Validate --type
+  if (!args.type) {
+    console.log(JSON.stringify({ status: 'error', message: 'Missing required --type argument', retryable: false }));
+    process.exit(1);
+  }
+
+  const deliverableType = args.type as DeliverableType;
+  const filename = DELIVERABLE_FILENAMES[deliverableType];
+
+  if (!filename) {
+    console.log(
+      JSON.stringify({ status: 'error', message: `Unknown deliverable type: ${args.type}`, retryable: false }),
+    );
+    process.exit(1);
+  }
+
+  // 2. Resolve content from --content or --file-path
+  let content: string;
+
+  if (args.content) {
+    content = args.content;
+  } else if (args.filePath) {
+    // Path traversal guard: must resolve inside cwd (lexical check; assumes '/' separators, ignores symlinks)
+    const cwd = process.cwd();
+    const resolved = resolve(cwd, args.filePath);
+    if (!resolved.startsWith(`${cwd}/`) && resolved !== cwd) {
+      console.log(
+        JSON.stringify({ status: 'error', message: `Path traversal detected: ${args.filePath}`, retryable: false }),
+      );
+      process.exit(1);
+    }
+
+    try {
+      content = readFileSync(resolved, 'utf8');
+    } catch (error) {
+      const msg = error instanceof Error ? error.message : String(error);
+      console.log(JSON.stringify({ status: 'error', message: `Failed to read file: ${msg}`, retryable: true }));
+      process.exit(1);
+    }
+  } else {
+    console.log(
+      JSON.stringify({
+        status: 'error',
+        message: 'Either --content or --file-path is required',
+        retryable: false,
+      }),
+    );
+    process.exit(1);
+  }
+
+  // 3. Validate queue types
+  let validated = false;
+  if (isQueueType(args.type)) {
+    const validation = validateQueueJson(content);
+    if (!validation.valid) {
+      console.log(JSON.stringify({ status: 'error', message: validation.message, retryable: true }));
+      process.exit(1);
+    }
+    validated = true;
+  }
+
+  // 4. Save the file
+  try {
+    const targetDir = process.cwd();
+    const filepath = saveDeliverableFile(targetDir, filename, content);
+    console.log(JSON.stringify({ status: 'success', filepath, validated }));
+  } catch (error) {
+    const msg = error instanceof Error ? error.message : String(error);
+    console.log(JSON.stringify({ status: 'error', message: `Failed to save: ${msg}`, retryable: true }));
+    process.exit(1);
+  }
+}
+
+main();
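
The containment check in `main` compares string prefixes with a hard-coded `/`, which would likely reject every `--file-path` on Windows, where `resolve` returns backslash-separated paths. A minimal sketch of a separator-independent variant, assuming a hypothetical helper `isInsideDir` (not part of this change); like the original it is purely lexical and does not follow symlinks:

```typescript
import { isAbsolute, relative, resolve, sep } from 'node:path';

// Hypothetical helper: true when candidate, resolved against baseDir, stays
// inside baseDir (or is baseDir itself). Lexical only; symlinks are not resolved.
function isInsideDir(baseDir: string, candidate: string): boolean {
  const base = resolve(baseDir);
  const rel = relative(base, resolve(base, candidate));
  if (rel === '') return true; // candidate is baseDir itself
  if (isAbsolute(rel)) return false; // e.g. a different drive on Windows
  return rel !== '..' && !rel.startsWith(`..${sep}`);
}

const insideFile = isInsideDir('/srv/work', 'deliverables/report.md');
const parentEscape = isInsideDir('/srv/work', '../etc/passwd');
const absoluteEscape = isInsideDir('/srv/work', '/etc/passwd');
const dotDotPrefixedName = isInsideDir('/srv/work', '..hidden/file');
console.log(insideFile, parentEscape, absoluteEscape, dotDotPrefixedName);
```

Comparing against `..` followed by the separator, rather than a bare `..` prefix, keeps legitimately named entries such as `..hidden` from being rejected.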
@@ -21,29 +21,20 @@
 * No Temporal dependencies - pure domain logic.
 */

-import type { ActivityLogger } from '../types/activity-logger.js';
-import { Result, ok, err, isErr } from '../types/result.js';
-import { ErrorCode, type PentestErrorType } from '../types/errors.js';
-import { PentestError } from './error-handling.js';
-import { isSpendingCapBehavior } from '../utils/billing-detection.js';
+import { type ClaudePromptResult, runClaudePrompt, validateAgentOutput } from '../ai/claude-executor.js';
+import type { AuditSession } from '../audit/index.js';
 import { AGENTS } from '../session-manager.js';
-import { loadPrompt } from './prompt-manager.js';
-import {
-  runClaudePrompt,
-  validateAgentOutput,
-  type ClaudePromptResult,
-} from '../ai/claude-executor.js';
-import {
-  createGitCheckpoint,
-  commitGitSuccess,
-  rollbackGitWorkspace,
-  getGitCommitHash,
-} from './git-manager.js';
-import { AuditSession } from '../audit/index.js';
-import type { AgentEndResult } from '../types/audit.js';
+import type { ActivityLogger } from '../types/activity-logger.js';
 import type { AgentName } from '../types/agents.js';
-import type { ConfigLoaderService } from './config-loader.js';
+import type { AgentEndResult } from '../types/audit.js';
+import { ErrorCode, type PentestErrorType } from '../types/errors.js';
 import type { AgentMetrics } from '../types/metrics.js';
+import { err, isErr, ok, type Result } from '../types/result.js';
+import { isSpendingCapBehavior } from '../utils/billing-detection.js';
+import type { ConfigLoaderService } from './config-loader.js';
+import { PentestError } from './error-handling.js';
+import { commitGitSuccess, createGitCheckpoint, getGitCommitHash, rollbackGitWorkspace } from './git-manager.js';
+import { loadPrompt } from './prompt-manager.js';

 /**
 * Input for agent execution.
@@ -94,7 +85,7 @@ export class AgentExecutionService {
    agentName: AgentName,
    input: AgentExecutionInput,
    auditSession: AuditSession,
-    logger: ActivityLogger
+    logger: ActivityLogger,
  ): Promise<Result<AgentEndResult, PentestError>> {
    const { webUrl, repoPath, configPath, pipelineTestingMode = false, attemptNumber } = input;

@@ -109,13 +100,7 @@ export class AgentExecutionService {
    const promptTemplate = AGENTS[agentName].promptTemplate;
    let prompt: string;
    try {
-      prompt = await loadPrompt(
-        promptTemplate,
-        { webUrl, repoPath },
-        distributedConfig,
-        pipelineTestingMode,
-        logger
-      );
+      prompt = await loadPrompt(promptTemplate, { webUrl, repoPath }, distributedConfig, pipelineTestingMode, logger);
    } catch (error) {
      const errorMessage = error instanceof Error ? error.message : String(error);
      return err(
@@ -124,8 +109,8 @@ export class AgentExecutionService {
          'prompt',
          false,
          { agentName, promptTemplate, originalError: errorMessage },
-          ErrorCode.PROMPT_LOAD_FAILED
-        )
+          ErrorCode.PROMPT_LOAD_FAILED,
+        ),
      );
    }

@@ -140,8 +125,8 @@ export class AgentExecutionService {
          'filesystem',
          false,
          { agentName, repoPath, originalError: errorMessage },
-          ErrorCode.GIT_CHECKPOINT_FAILED
-        )
+          ErrorCode.GIT_CHECKPOINT_FAILED,
+        ),
      );
    }

@@ -157,7 +142,7 @@ export class AgentExecutionService {
      agentName,
      auditSession,
      logger,
-      AGENTS[agentName].modelTier
+      AGENTS[agentName].modelTier,
    );

    // 6. Spending cap check - defense-in-depth
@@ -165,7 +150,8 @@ export class AgentExecutionService {
      const resultText = result.result || '';
      if (isSpendingCapBehavior(result.turns ?? 0, result.cost || 0, resultText)) {
        return this.failAgent(agentName, repoPath, auditSession, logger, {
-          attemptNumber, result,
+          attemptNumber,
+          result,
          rollbackReason: 'spending cap detected',
          errorMessage: `Spending cap likely reached: ${resultText.slice(0, 100)}`,
          errorCode: ErrorCode.SPENDING_CAP_REACHED,
@@ -179,7 +165,8 @@ export class AgentExecutionService {
    // 7. Handle execution failure
    if (!result.success) {
      return this.failAgent(agentName, repoPath, auditSession, logger, {
-        attemptNumber, result,
+        attemptNumber,
+        result,
        rollbackReason: 'execution failure',
        errorMessage: result.error || 'Agent execution failed',
        errorCode: ErrorCode.AGENT_EXECUTION_FAILED,
@@ -193,7 +180,8 @@ export class AgentExecutionService {
    const validationPassed = await validateAgentOutput(result, agentName, repoPath, logger);
    if (!validationPassed) {
      return this.failAgent(agentName, repoPath, auditSession, logger, {
-        attemptNumber, result,
+        attemptNumber,
+        result,
        rollbackReason: 'validation failure',
        errorMessage: `Agent ${agentName} failed output validation`,
        errorCode: ErrorCode.OUTPUT_VALIDATION_FAILED,
@@ -225,7 +213,7 @@ export class AgentExecutionService {
    repoPath: string,
    auditSession: AuditSession,
    logger: ActivityLogger,
-    opts: FailAgentOpts
+    opts: FailAgentOpts,
  ): Promise<Result<AgentEndResult, PentestError>> {
    await rollbackGitWorkspace(repoPath, opts.rollbackReason, logger);

@@ -239,15 +227,7 @@ export class AgentExecutionService {
    };
    await auditSession.endAgent(agentName, endResult);

-    return err(
-      new PentestError(
-        opts.errorMessage,
-        opts.category,
-        opts.retryable,
-        opts.context,
-        opts.errorCode
-      )
-    );
+    return err(new PentestError(opts.errorMessage, opts.category, opts.retryable, opts.context, opts.errorCode));
  }

  /**
@@ -267,7 +247,7 @@ export class AgentExecutionService {
    agentName: AgentName,
    input: AgentExecutionInput,
    auditSession: AuditSession,
-    logger: ActivityLogger
+    logger: ActivityLogger,
  ): Promise<AgentEndResult> {
    const result = await this.execute(agentName, input, auditSession, logger);
    if (isErr(result)) {
@@ -11,11 +11,11 @@
 * Pure service with no Temporal dependencies.
 */

-import { parseConfig, distributeConfig } from '../config-parser.js';
-import { PentestError } from './error-handling.js';
-import { Result, ok, err } from '../types/result.js';
-import { ErrorCode } from '../types/errors.js';
+import { distributeConfig, parseConfig } from '../config-parser.js';
 import type { DistributedConfig } from '../types/config.js';
+import { ErrorCode } from '../types/errors.js';
+import { err, ok, type Result } from '../types/result.js';
+import { PentestError } from './error-handling.js';

 /**
 * Service for loading and distributing configuration files.
@@ -52,8 +52,8 @@ export class ConfigLoaderService {
          'config',
          false,
          { configPath, originalError: errorMessage },
-          errorCode
-        )
+          errorCode,
+        ),
      );
    }
  }
@@ -64,9 +64,7 @@ export class ConfigLoaderService {
   * @param configPath - Optional path to the YAML configuration file
   * @returns Result containing DistributedConfig (or null) on success, PentestError on failure
   */
-  async loadOptional(
-    configPath: string | undefined
-  ): Promise<Result<DistributedConfig | null, PentestError>> {
+  async loadOptional(configPath: string | undefined): Promise<Result<DistributedConfig | null, PentestError>> {
    if (!configPath) {
      return ok(null);
    }
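The `Result` helpers imported above (`ok`, `err`, `isErr`) let `loadOptional` report "no config requested" as a success rather than an error. A minimal sketch of that pattern, assuming a discriminated-union shape with `ok`, `value`, and `error` fields (the real definitions live in `../types/result.js` and may differ; the `loadOptional` below is a hypothetical stand-in with an invented `.yaml` check):

```typescript
// Assumed shape; not copied from '../types/result.js'.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });
const isErr = <T, E>(r: Result<T, E>): r is { ok: false; error: E } => !r.ok;

// Hypothetical stand-in for ConfigLoaderService.loadOptional.
function loadOptional(configPath: string | undefined): Result<string | null, Error> {
  if (!configPath) {
    // A missing path is a valid outcome, so it is returned as ok(null).
    return ok(null);
  }
  if (!configPath.endsWith('.yaml')) {
    // Invented failure condition, only here to show the err() branch.
    return err(new Error(`Unsupported config file: ${configPath}`));
  }
  return ok(configPath);
}

const loaded = loadOptional(undefined);
if (isErr(loaded)) {
  console.error(loaded.error.message);
} else {
  console.log('config:', loaded.value);
}
```

Callers branch on the result instead of wrapping the call in `try`/`catch`, which is how `runPreflightChecks` returns on the first failed check.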
@@ -75,10 +75,7 @@ const containers = new Map<string, Container>();
 * @param sessionMetadata - Session metadata for audit paths
 * @returns Container instance for the workflow
 */
-export function getOrCreateContainer(
-  workflowId: string,
-  sessionMetadata: SessionMetadata
-): Container {
+export function getOrCreateContainer(workflowId: string, sessionMetadata: SessionMetadata): Container {
  let container = containers.get(workflowId);

  if (!container) {
@@ -4,16 +4,8 @@
 // it under the terms of the GNU Affero General Public License version 3
 // as published by the Free Software Foundation.

-import {
-  ErrorCode,
-  type PentestErrorType,
-  type PentestErrorContext,
-  type PromptErrorResult,
-} from '../types/errors.js';
-import {
-  matchesBillingApiPattern,
-  matchesBillingTextPattern,
-} from '../utils/billing-detection.js';
+import { ErrorCode, type PentestErrorContext, type PentestErrorType, type PromptErrorResult } from '../types/errors.js';
+import { matchesBillingApiPattern, matchesBillingTextPattern } from '../utils/billing-detection.js';

 export class PentestError extends Error {
  override name = 'PentestError' as const;
@@ -29,7 +21,7 @@ export class PentestError extends Error {
    type: PentestErrorType,
    retryable: boolean = false,
    context: PentestErrorContext = {},
-    code?: ErrorCode
+    code?: ErrorCode,
  ) {
    super(message);
    this.type = type;
@@ -42,18 +34,13 @@ export class PentestError extends Error {
  }
 }

-export function handlePromptError(
-  promptName: string,
-  error: Error
-): PromptErrorResult {
+export function handlePromptError(promptName: string, error: Error): PromptErrorResult {
  return {
    success: false,
-    error: new PentestError(
-      `Failed to load prompt '${promptName}': ${error.message}`,
-      'prompt',
-      false,
-      { promptName, originalError: error.message }
-    ),
+    error: new PentestError(`Failed to load prompt '${promptName}': ${error.message}`, 'prompt', false, {
+      promptName,
+      originalError: error.message,
+    }),
  };
 }

@@ -76,7 +63,6 @@ const RETRYABLE_PATTERNS = [
  'service unavailable',
  'bad gateway',
  // Claude API errors
-  'mcp server',
  'model unavailable',
  'service temporarily unavailable',
  'api error',
@@ -111,10 +97,7 @@ export function isRetryableError(error: Error): boolean {
 * Classifies errors by ErrorCode for reliable, code-based classification.
 * Used when error is a PentestError with a specific ErrorCode.
 */
-function classifyByErrorCode(
-  code: ErrorCode,
-  retryableFromError: boolean
-): { type: string; retryable: boolean } {
+function classifyByErrorCode(code: ErrorCode, retryableFromError: boolean): { type: string; retryable: boolean } {
  switch (code) {
    // Billing errors - retryable (wait for cap reset or credits added)
    case ErrorCode.SPENDING_CAP_REACHED:
@@ -206,49 +189,30 @@ export function classifyErrorForTemporal(error: unknown): { type: string; retrya
  }

  // Permission (403) - access won't be granted
-  if (
-    message.includes('permission') ||
-    message.includes('forbidden') ||
-    message.includes('403')
-  ) {
+  if (message.includes('permission') || message.includes('forbidden') || message.includes('403')) {
    return { type: 'PermissionError', retryable: false };
  }

  // === OUTPUT VALIDATION ERRORS (Retryable) ===
  // Agent didn't produce expected deliverables - retry may succeed
  // IMPORTANT: Must come BEFORE generic 'validation' check below
-  if (
-    message.includes('failed output validation') ||
-    message.includes('output validation failed')
-  ) {
+  if (message.includes('failed output validation') || message.includes('output validation failed')) {
    return { type: 'OutputValidationError', retryable: true };
  }

  // Invalid Request (400) - malformed request is permanent
  // Note: Checked AFTER billing and AFTER output validation
-  if (
-    message.includes('invalid_request_error') ||
-    message.includes('malformed') ||
-    message.includes('validation')
-  ) {
+  if (message.includes('invalid_request_error') || message.includes('malformed') || message.includes('validation')) {
    return { type: 'InvalidRequestError', retryable: false };
  }

  // Request Too Large (413) - won't fit no matter how many retries
-  if (
-    message.includes('request_too_large') ||
-    message.includes('too large') ||
-    message.includes('413')
-  ) {
+  if (message.includes('request_too_large') || message.includes('too large') || message.includes('413')) {
    return { type: 'RequestTooLargeError', retryable: false };
  }

  // Configuration errors - missing files need manual fix
-  if (
-    message.includes('enoent') ||
-    message.includes('no such file') ||
-    message.includes('cli not installed')
-  ) {
+  if (message.includes('enoent') || message.includes('no such file') || message.includes('cli not installed')) {
    return { type: 'ConfigurationError', retryable: false };
  }
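The message-based checks above are order-sensitive: "failed output validation" also contains the word "validation", so the specific retryable check has to run before the generic permanent one. A reduced sketch of just those two branches, assuming the message is lower-cased first and using a hypothetical `classifyMessage` name and `UnknownError` fallback that are not taken from the PR:

```typescript
type Classification = { type: string; retryable: boolean };

// Hypothetical, reduced version of the message checks in classifyErrorForTemporal.
function classifyMessage(rawMessage: string): Classification {
  // Assumption: matching is case-insensitive.
  const message = rawMessage.toLowerCase();

  // Specific check first: these messages also contain "validation".
  if (message.includes('failed output validation') || message.includes('output validation failed')) {
    return { type: 'OutputValidationError', retryable: true };
  }

  // Generic check second: any other "validation" message is treated as permanent.
  if (message.includes('invalid_request_error') || message.includes('malformed') || message.includes('validation')) {
    return { type: 'InvalidRequestError', retryable: false };
  }

  // Assumed fallback for this sketch only.
  return { type: 'UnknownError', retryable: false };
}

console.log(classifyMessage('Agent recon failed output validation'));
console.log(classifyMessage('Schema validation error'));
```

Swapping the two `if` blocks would classify every output validation failure as a non-retryable `InvalidRequestError`.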

@@ -13,13 +13,9 @@
 * No Temporal dependencies - this is pure business logic.
 */

-import {
-  validateQueueSafe,
-  type VulnType,
-  type ExploitationDecision,
-} from './queue-validation.js';
-import { isOk } from '../types/result.js';
 import type { ActivityLogger } from '../types/activity-logger.js';
+import { isOk } from '../types/result.js';
+import { type ExploitationDecision, type VulnType, validateQueueSafe } from './queue-validation.js';

 /**
 * Service for checking exploitation queue decisions.
@@ -46,7 +42,7 @@ export class ExploitationCheckerService {
    if (isOk(result)) {
      const decision = result.value;
      logger.info(
-        `${vulnType}: ${decision.shouldExploit ? `${decision.vulnerabilityCount} vulnerabilities found` : 'no vulnerabilities, skipping exploitation'}`
+        `${vulnType}: ${decision.shouldExploit ? `${decision.vulnerabilityCount} vulnerabilities found` : 'no vulnerabilities, skipping exploitation'}`,
      );
      return decision;
    }
@@ -5,9 +5,9 @@
 // as published by the Free Software Foundation.

 import { $ } from 'zx';
-import { PentestError } from './error-handling.js';
-import { ErrorCode } from '../types/errors.js';
 import type { ActivityLogger } from '../types/activity-logger.js';
+import { ErrorCode } from '../types/errors.js';
+import { PentestError } from './error-handling.js';

 /**
 * Check if a directory is a git repository.
@@ -31,15 +31,8 @@ interface GitOperationResult {
 /**
 * Get list of changed files from git status --porcelain output
 */
-async function getChangedFiles(
-  sourceDir: string,
-  operationDescription: string
-): Promise<string[]> {
-  const status = await executeGitCommandWithRetry(
-    ['git', 'status', '--porcelain'],
-    sourceDir,
-    operationDescription
-  );
+async function getChangedFiles(sourceDir: string, operationDescription: string): Promise<string[]> {
+  const status = await executeGitCommandWithRetry(['git', 'status', '--porcelain'], sourceDir, operationDescription);
  return status.stdout
    .trim()
    .split('\n')
@@ -55,14 +48,15 @@ function logChangeSummary(
  messageWithoutChanges: string,
  logger: ActivityLogger,
  level: 'info' | 'warn' = 'info',
-  maxToShow: number = 5
+  maxToShow: number = 5,
 ): void {
  if (changes.length > 0) {
    const msg = messageWithChanges.replace('{count}', String(changes.length));
-    const fileList = changes.slice(0, maxToShow).map((c) => `  ${c}`).join(', ');
-    const suffix = changes.length > maxToShow
-      ? ` ... and ${changes.length - maxToShow} more files`
-      : '';
+    const fileList = changes
+      .slice(0, maxToShow)
+      .map((c) => `  ${c}`)
+      .join(', ');
+    const suffix = changes.length > maxToShow ? ` ... and ${changes.length - maxToShow} more files` : '';
    logger[level](`${msg} ${fileList}${suffix}`);
  } else {
    logger[level](messageWithoutChanges);
@@ -101,7 +95,7 @@ class GitSemaphore {
    if (!this.running && this.queue.length > 0) {
      this.running = true;
      const resolve = this.queue.shift();
-      resolve!();
+      resolve?.();
    }
  }
 }
@@ -125,7 +119,7 @@ export async function executeGitCommandWithRetry(
  commandArgs: string[],
  sourceDir: string,
  description: string,
-  maxRetries: number = 5
+  maxRetries: number = 5,
 ): Promise<{ stdout: string; stderr: string }> {
  await gitSemaphore.acquire();

@@ -139,11 +133,11 @@ export async function executeGitCommandWithRetry(
        const errMsg = error instanceof Error ? error.message : String(error);

        if (isGitLockError(errMsg) && attempt < maxRetries) {
-          const delay = Math.pow(2, attempt - 1) * 1000;
+          const delay = 2 ** (attempt - 1) * 1000;
          // executeGitCommandWithRetry is also called outside activity context
          // (e.g., from resume logic), so we use console.warn as a fallback here
          console.warn(
-            `Git lock conflict during ${description} (attempt ${attempt}/${maxRetries}). Retrying in ${delay}ms...`
+            `Git lock conflict during ${description} (attempt ${attempt}/${maxRetries}). Retrying in ${delay}ms...`,
          );
          await new Promise((resolve) => setTimeout(resolve, delay));
          continue;
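The delay expression in this hunk doubles the wait on each git lock conflict. A small sketch of the resulting schedule, assuming 1-based attempts and the default `maxRetries` of 5 (`backoffDelayMs` and `backoffSchedule` are hypothetical helper names, not part of the PR):

```typescript
// Mirrors the delay expression used in executeGitCommandWithRetry.
function backoffDelayMs(attempt: number): number {
  // attempt is 1-based: 1 -> 1000ms, 2 -> 2000ms, 3 -> 4000ms, ...
  return 2 ** (attempt - 1) * 1000;
}

// Delays slept between attempts. The final attempt is not followed by a wait,
// because the retry condition requires attempt < maxRetries.
function backoffSchedule(maxRetries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < maxRetries; attempt++) {
    delays.push(backoffDelayMs(attempt));
  }
  return delays;
}

const schedule = backoffSchedule(5);
const totalMs = schedule.reduce((sum, d) => sum + d, 0);
console.log(schedule, totalMs); // [1000, 2000, 4000, 8000] 15000
```

Under these assumptions a command that keeps hitting the lock waits about 15 seconds in total before the `PentestError` is thrown.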
@@ -157,7 +151,7 @@ export async function executeGitCommandWithRetry(
      'filesystem',
      true, // Retryable - transient git lock issues
      { maxRetries, description },
-      ErrorCode.GIT_CHECKPOINT_FAILED
+      ErrorCode.GIT_CHECKPOINT_FAILED,
    );
  } finally {
    gitSemaphore.release();
@@ -168,7 +162,7 @@ export async function executeGitCommandWithRetry(
 export async function rollbackGitWorkspace(
  sourceDir: string,
  reason: string = 'retry preparation',
-  logger: ActivityLogger
+  logger: ActivityLogger,
 ): Promise<GitOperationResult> {
  // Skip git operations if not a git repository
  if (!(await isGitRepository(sourceDir))) {
@@ -180,16 +174,8 @@ export async function rollbackGitWorkspace(
  try {
    const changes = await getChangedFiles(sourceDir, 'status check for rollback');

-    await executeGitCommandWithRetry(
-      ['git', 'reset', '--hard', 'HEAD'],
-      sourceDir,
-      'hard reset for rollback'
-    );
-    await executeGitCommandWithRetry(
-      ['git', 'clean', '-fd'],
-      sourceDir,
-      'cleaning untracked files for rollback'
-    );
+    await executeGitCommandWithRetry(['git', 'reset', '--hard', 'HEAD'], sourceDir, 'hard reset for rollback');
+    await executeGitCommandWithRetry(['git', 'clean', '-fd'], sourceDir, 'cleaning untracked files for rollback');

    logChangeSummary(
      changes,
@@ -197,7 +183,7 @@ export async function rollbackGitWorkspace(
      'Rollback completed - no changes to remove',
      logger,
      'info',
-      3
+      3,
    );
    return { success: true };
  } catch (error) {
@@ -210,7 +196,7 @@ export async function rollbackGitWorkspace(
        'filesystem',
        false, // Non-retryable - rollback is best-effort cleanup
        { sourceDir, reason },
-        ErrorCode.GIT_ROLLBACK_FAILED
+        ErrorCode.GIT_ROLLBACK_FAILED,
      ),
    };
  }
@@ -221,7 +207,7 @@ export async function createGitCheckpoint(
  sourceDir: string,
  description: string,
  attempt: number,
-  logger: ActivityLogger
+  logger: ActivityLogger,
 ): Promise<GitOperationResult> {
  // Skip git operations if not a git repository
  if (!(await isGitRepository(sourceDir))) {
@@ -248,7 +234,7 @@ export async function createGitCheckpoint(
    await executeGitCommandWithRetry(
      ['git', 'commit', '-m', `📍 Checkpoint: ${description} (attempt ${attempt})`, '--allow-empty'],
      sourceDir,
-      'creating commit'
+      'creating commit',
    );

    // 4. Log result
@@ -268,7 +254,7 @@ export async function createGitCheckpoint(
 export async function commitGitSuccess(
  sourceDir: string,
  description: string,
-  logger: ActivityLogger
+  logger: ActivityLogger,
 ): Promise<GitOperationResult> {
  // Skip git operations if not a git repository
  if (!(await isGitRepository(sourceDir))) {
@@ -280,22 +266,18 @@ export async function commitGitSuccess(
  try {
    const changes = await getChangedFiles(sourceDir, 'status check for success commit');

-    await executeGitCommandWithRetry(
-      ['git', 'add', '-A'],
-      sourceDir,
-      'staging changes for success commit'
-    );
+    await executeGitCommandWithRetry(['git', 'add', '-A'], sourceDir, 'staging changes for success commit');
    await executeGitCommandWithRetry(
      ['git', 'commit', '-m', `✅ ${description}: completed successfully`, '--allow-empty'],
      sourceDir,
-      'creating success commit'
+      'creating success commit',
    );

    logChangeSummary(
      changes,
      'Success commit created with {count} file changes:',
      'Empty success commit created (agent made no file changes)',
-      logger
+      logger,
    );
    return { success: true };
  } catch (error) {
@@ -11,13 +11,12 @@
 * Services are pure domain logic with no Temporal dependencies.
 */

-export { Container, getOrCreateContainer, removeContainer } from './container.js';
-export type { ContainerDependencies } from './container.js';
+export type { AgentExecutionInput } from './agent-execution.js';
+export { AgentExecutionService } from './agent-execution.js';

 export { ConfigLoaderService } from './config-loader.js';
+export type { ContainerDependencies } from './container.js';
+export { Container, getOrCreateContainer, removeContainer } from './container.js';
 export { ExploitationCheckerService } from './exploitation-checker.js';
-export { AgentExecutionService } from './agent-execution.js';
-export type { AgentExecutionInput } from './agent-execution.js';
-
-export { assembleFinalReport, injectModelIntoReport } from './reporting.js';
 export { loadPrompt } from './prompt-manager.js';
+export { assembleFinalReport, injectModelIntoReport } from './reporting.js';
@@ -15,24 +15,31 @@
 * 1. Repository path exists and contains .git
 * 2. Config file parses and validates (if provided)
 * 3. Credentials validate via Claude Agent SDK query (API key, OAuth, Bedrock, Vertex AI, or router mode)
+ * 4. Target URL is reachable from the container (DNS + HTTP)
 */

-import fs from 'fs/promises';
-import { query } from '@anthropic-ai/claude-agent-sdk';
+import { lookup } from 'node:dns/promises';
+import fs from 'node:fs/promises';
+import http from 'node:http';
+import https from 'node:https';
 import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
-import { PentestError, isRetryableError } from './error-handling.js';
-import { ErrorCode } from '../types/errors.js';
-import { type Result, ok, err } from '../types/result.js';
-import { parseConfig } from '../config-parser.js';
+import { query } from '@anthropic-ai/claude-agent-sdk';
 import { resolveModel } from '../ai/models.js';
+import { parseConfig } from '../config-parser.js';
 import type { ActivityLogger } from '../types/activity-logger.js';
+import { ErrorCode } from '../types/errors.js';
+import { err, ok, type Result } from '../types/result.js';
+import { isRetryableError, PentestError } from './error-handling.js';
+
+const TARGET_URL_TIMEOUT_MS = 10_000;
+
+function isLoopbackAddress(address: string): boolean {
+  return address === '127.0.0.1' || address === '::1' || address === '0.0.0.0';
+}

 // === Repository Validation ===

-async function validateRepo(
-  repoPath: string,
-  logger: ActivityLogger
-): Promise<Result<void, PentestError>> {
+async function validateRepo(repoPath: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
  logger.info('Checking repository path...', { repoPath });

  // 1. Check repo directory exists
@@ -45,8 +52,8 @@ async function validateRepo(
          'config',
          false,
          { repoPath },
-          ErrorCode.REPO_NOT_FOUND
-        )
+          ErrorCode.REPO_NOT_FOUND,
+        ),
      );
    }
  } catch {
@@ -56,8 +63,8 @@ async function validateRepo(
        'config',
        false,
        { repoPath },
-        ErrorCode.REPO_NOT_FOUND
-      )
+        ErrorCode.REPO_NOT_FOUND,
+      ),
    );
  }

@@ -71,8 +78,8 @@ async function validateRepo(
          'config',
          false,
          { repoPath },
-          ErrorCode.REPO_NOT_FOUND
-        )
+          ErrorCode.REPO_NOT_FOUND,
+        ),
      );
    }
  } catch {
@@ -82,8 +89,8 @@ async function validateRepo(
        'config',
        false,
        { repoPath },
-        ErrorCode.REPO_NOT_FOUND
-      )
+        ErrorCode.REPO_NOT_FOUND,
+      ),
    );
  }

@@ -93,10 +100,7 @@ async function validateRepo(

 // === Config Validation ===

-async function validateConfig(
-  configPath: string,
-  logger: ActivityLogger
-): Promise<Result<void, PentestError>> {
+async function validateConfig(configPath: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
  logger.info('Validating configuration file...', { configPath });

  try {
@@ -114,8 +118,8 @@ async function validateConfig(
        'config',
        false,
        { configPath },
-        ErrorCode.CONFIG_VALIDATION_FAILED
-      )
+        ErrorCode.CONFIG_VALIDATION_FAILED,
+      ),
    );
  }
 }
@@ -123,43 +127,60 @@ async function validateConfig(
 // === Credential Validation ===

 /** Map SDK error type to a human-readable preflight PentestError. */
-function classifySdkError(
-  sdkError: SDKAssistantMessageError,
-  authType: string
-): Result<void, PentestError> {
+function classifySdkError(sdkError: SDKAssistantMessageError, authType: string): Result<void, PentestError> {
  switch (sdkError) {
    case 'authentication_failed':
-      return err(new PentestError(
-        `Invalid ${authType}. Check your credentials in .env and try again.`,
-        'config', false, { authType, sdkError }, ErrorCode.AUTH_FAILED
-      ));
+      return err(
+        new PentestError(
+          `Invalid ${authType}. Check your credentials in .env and try again.`,
+          'config',
+          false,
+          { authType, sdkError },
+          ErrorCode.AUTH_FAILED,
+        ),
+      );
    case 'billing_error':
-      return err(new PentestError(
-        `Anthropic account has a billing issue. Add credits or check your billing dashboard.`,
-        'billing', true, { authType, sdkError }, ErrorCode.BILLING_ERROR
-      ));
+      return err(
+        new PentestError(
+          `Anthropic account has a billing issue. Add credits or check your billing dashboard.`,
+          'billing',
+          true,
+          { authType, sdkError },
+          ErrorCode.BILLING_ERROR,
+        ),
+      );
    case 'rate_limit':
-      return err(new PentestError(
-        `Anthropic rate limit or spending cap reached. Wait a few minutes and try again.`,
-        'billing', true, { authType, sdkError }, ErrorCode.BILLING_ERROR
-      ));
+      return err(
+        new PentestError(
+          `Anthropic rate limit or spending cap reached. Wait a few minutes and try again.`,
+          'billing',
+          true,
+          { authType, sdkError },
+          ErrorCode.BILLING_ERROR,
+        ),
+      );
    case 'server_error':
-      return err(new PentestError(
-        `Anthropic API is temporarily unavailable. Try again shortly.`,
-        'network', true, { authType, sdkError }
-      ));
+      return err(
+        new PentestError(`Anthropic API is temporarily unavailable. Try again shortly.`, 'network', true, {
+          authType,
+          sdkError,
+        }),
+      );
    default:
-      return err(new PentestError(
-        `${authType} validation failed unexpectedly. Check your credentials in .env.`,
-        'config', false, { authType, sdkError }, ErrorCode.AUTH_FAILED
-      ));
+      return err(
+        new PentestError(
+          `${authType} validation failed unexpectedly. Check your credentials in .env.`,
+          'config',
+          false,
+          { authType, sdkError },
+          ErrorCode.AUTH_FAILED,
+        ),
+      );
  }
 }

 /** Validate credentials via a minimal Claude Agent SDK query. */
-async function validateCredentials(
-  logger: ActivityLogger
-): Promise<Result<void, PentestError>> {
+async function validateCredentials(logger: ActivityLogger): Promise<Result<void, PentestError>> {
  // 1. Custom base URL — validate endpoint is reachable via SDK query
  if (process.env.ANTHROPIC_BASE_URL) {
    const baseUrl = process.env.ANTHROPIC_BASE_URL;
@@ -185,16 +206,22 @@ async function validateCredentials(
          'network',
          false,
          { baseUrl },
-          ErrorCode.AUTH_FAILED
-        )
+          ErrorCode.AUTH_FAILED,
+        ),
      );
    }
  }

  // 2. Bedrock mode — validate required AWS credentials are present
  if (process.env.CLAUDE_CODE_USE_BEDROCK === '1') {
-    const required = ['AWS_REGION', 'AWS_BEARER_TOKEN_BEDROCK', 'ANTHROPIC_SMALL_MODEL', 'ANTHROPIC_MEDIUM_MODEL', 'ANTHROPIC_LARGE_MODEL'];
-    const missing = required.filter(v => !process.env[v]);
+    const required = [
+      'AWS_REGION',
+      'AWS_BEARER_TOKEN_BEDROCK',
+      'ANTHROPIC_SMALL_MODEL',
+      'ANTHROPIC_MEDIUM_MODEL',
+      'ANTHROPIC_LARGE_MODEL',
+    ];
+    const missing = required.filter((v) => !process.env[v]);
    if (missing.length > 0) {
      return err(
        new PentestError(
@@ -202,8 +229,8 @@ async function validateCredentials(
          'config',
          false,
          { missing },
-          ErrorCode.AUTH_FAILED
-        )
+          ErrorCode.AUTH_FAILED,
+        ),
      );
    }
    logger.info('Bedrock credentials OK');
@@ -212,8 +239,14 @@ async function validateCredentials(

  // 3. Vertex AI mode — validate required GCP credentials are present
  if (process.env.CLAUDE_CODE_USE_VERTEX === '1') {
-    const required = ['CLOUD_ML_REGION', 'ANTHROPIC_VERTEX_PROJECT_ID', 'ANTHROPIC_SMALL_MODEL', 'ANTHROPIC_MEDIUM_MODEL', 'ANTHROPIC_LARGE_MODEL'];
-    const missing = required.filter(v => !process.env[v]);
+    const required = [
+      'CLOUD_ML_REGION',
+      'ANTHROPIC_VERTEX_PROJECT_ID',
+      'ANTHROPIC_SMALL_MODEL',
+      'ANTHROPIC_MEDIUM_MODEL',
+      'ANTHROPIC_LARGE_MODEL',
+    ];
+    const missing = required.filter((v) => !process.env[v]);
    if (missing.length > 0) {
      return err(
        new PentestError(
@@ -221,8 +254,8 @@ async function validateCredentials(
          'config',
          false,
          { missing },
-          ErrorCode.AUTH_FAILED
-        )
+          ErrorCode.AUTH_FAILED,
+        ),
      );
    }
    // Validate service account credentials file is accessible
@@ -234,8 +267,8 @@ async function validateCredentials(
          'config',
          false,
          {},
-          ErrorCode.AUTH_FAILED
-        )
+          ErrorCode.AUTH_FAILED,
+        ),
      );
    }
    try {
@@ -247,8 +280,8 @@ async function validateCredentials(
          'config',
          false,
          { credPath },
-          ErrorCode.AUTH_FAILED
-        )
+          ErrorCode.AUTH_FAILED,
+        ),
      );
    }
    logger.info('Vertex AI credentials OK');
@@ -263,8 +296,8 @@ async function validateCredentials(
        'config',
        false,
        {},
-        ErrorCode.AUTH_FAILED
-      )
+        ErrorCode.AUTH_FAILED,
+      ),
    );
  }

@@ -296,8 +329,113 @@ async function validateCredentials(
        retryable ? 'network' : 'config',
        retryable,
        { authType },
-        retryable ? undefined : ErrorCode.AUTH_FAILED
-      )
+        retryable ? undefined : ErrorCode.AUTH_FAILED,
+      ),
+    );
+  }
+}
+
+// === Target URL Validation ===
+
+/** HTTP HEAD with TLS verification disabled — we check reachability, not certificate validity. */
+function httpHead(url: string, timeoutMs: number): Promise<number> {
+  return new Promise((resolve, reject) => {
+    const parsed = new URL(url);
+    const isHttps = parsed.protocol === 'https:';
+    const transport = isHttps ? https : http;
+
+    const req = transport.request(
+      url,
+      {
+        method: 'HEAD',
+        timeout: timeoutMs,
+        ...(isHttps && { rejectUnauthorized: false }),
+      },
+      (res) => {
+        res.resume();
+        resolve(res.statusCode ?? 0);
+      },
+    );
+
+    req.on('timeout', () => {
+      req.destroy();
+      reject(new Error(`Connection timed out after ${timeoutMs}ms`));
+    });
+    req.on('error', reject);
+    req.end();
+  });
+}
+
+/** Check that the target URL is reachable from inside the container. */
+async function validateTargetUrl(targetUrl: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
+  logger.info('Checking target URL reachability...', { targetUrl });
+
+  // 1. Parse URL
+  let parsed: URL;
+  try {
+    parsed = new URL(targetUrl);
+  } catch {
+    return err(
+      new PentestError(
+        `Invalid target URL: ${targetUrl}`,
+        'config',
+        false,
+        { targetUrl },
+        ErrorCode.TARGET_UNREACHABLE,
+      ),
+    );
+  }
+
+  // 2. DNS lookup: keep the resolved address so a loopback target gets a better hint if step 3 fails
+  const hostname = parsed.hostname;
+  let resolvedAddress: string | undefined;
+  try {
+    const result = await lookup(hostname);
+    resolvedAddress = result.address;
+  } catch {
+    return err(
+      new PentestError(
+        `Target URL ${targetUrl} is not reachable. Verify the URL is correct and the site is up.`,
+        'network',
+        false,
+        { targetUrl, hostname },
+        ErrorCode.TARGET_UNREACHABLE,
+      ),
+    );
+  }
+
+  // 3. HTTP reachability check
+  try {
+    await httpHead(targetUrl, TARGET_URL_TIMEOUT_MS);
+
+    logger.info('Target URL OK');
+    return ok(undefined);
+  } catch (error) {
+    const isLoopback = isLoopbackAddress(resolvedAddress);
+    const detail = error instanceof Error ? error.message : String(error);
+
+    if (isLoopback) {
+      const suggestion = targetUrl.replace(hostname, 'host.docker.internal');
+      return err(
+        new PentestError(
+          `Target URL ${targetUrl} resolves to ${resolvedAddress} (loopback) and is not reachable. ` +
+            `For local services, use host.docker.internal instead of ${hostname} (e.g., ${suggestion})`,
+          'network',
+          false,
+          { targetUrl, resolvedAddress, hostname },
+          ErrorCode.TARGET_UNREACHABLE,
+        ),
+      );
+    }
+
+    return err(
+      new PentestError(
+        `Target URL ${targetUrl} is not reachable: ${detail}`,
+        'network',
+        false,
+        { targetUrl, resolvedAddress },
+        ErrorCode.TARGET_UNREACHABLE,
+      ),
    );
  }
 }
@@ -310,13 +448,15 @@ async function validateCredentials(
 * 1. Repository path exists and contains .git
 * 2. Config file parses and validates (if configPath provided)
 * 3. Credentials validate (API key, OAuth, or router mode)
+ * 4. Target URL is reachable from the container
 *
 * Returns on first failure.
 */
 export async function runPreflightChecks(
+  targetUrl: string,
  repoPath: string,
  configPath: string | undefined,
-  logger: ActivityLogger
+  logger: ActivityLogger,
 ): Promise<Result<void, PentestError>> {
  // 1. Repository check (free — filesystem only)
  const repoResult = await validateRepo(repoPath, logger);
@@ -338,6 +478,12 @@ export async function runPreflightChecks(
    return credResult;
  }

+  // 4. Target URL reachability check (cheap — 1 HTTP round-trip)
+  const urlResult = await validateTargetUrl(targetUrl, logger);
+  if (!urlResult.ok) {
+    return urlResult;
+  }
+
  logger.info('All preflight checks passed');
  return ok(undefined);
 }
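When the HTTP check fails and the host resolved to a loopback address, the new preflight step suggests `host.docker.internal`. The sketch below reuses the address set from `isLoopbackAddress` and builds the suggestion with the `URL` API, so that only the host component is rewritten. This is an alternative to the `String.replace` call in the diff, not the PR's implementation, and `suggestDockerHost` is a hypothetical name:

```typescript
// Same address set as isLoopbackAddress in the diff above.
function isLoopbackAddress(address: string): boolean {
  return address === '127.0.0.1' || address === '::1' || address === '0.0.0.0';
}

// Hypothetical helper: swap only the hostname, keeping scheme, port, path and query.
function suggestDockerHost(targetUrl: string): string {
  const parsed = new URL(targetUrl);
  parsed.hostname = 'host.docker.internal';
  return parsed.toString();
}

const target = 'http://localhost:3000/login';
const resolved = '127.0.0.1'; // what a DNS lookup of "localhost" typically returns
if (isLoopbackAddress(resolved)) {
  console.log(`Try ${suggestDockerHost(target)} instead of ${target}`);
}
```

Note that the exact-match check covers `127.0.0.1` but not other addresses in `127.0.0.0/8`; whether that matters depends on how targets are configured.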
--- a/Show More
+++ b/Show More
@@ -0,0 +1 @@
+Run: `save-deliverable --type CODE_ANALYSIS --content 'Pre-recon analysis complete'`. Then say "Done".
@@ -0,0 +1 @@
+Run: `save-deliverable --type RECON --content 'Reconnaissance analysis complete'`. Then say "Done".