Mirror of https://github.com/FuzzingLabs/fuzzforge_ai.git (synced 2026-04-03 05:30:11 +02:00)

Compare commits: 2 commits, master...feature/ll

| Author | SHA1 | Date |
|---|---|---|
| | 22f01562ba | |
| | 092a90df5d | |
.github/ISSUE_TEMPLATE/bug_report.md (vendored, new file, 48 lines)
@@ -0,0 +1,48 @@
---
name: 🐛 Bug Report
about: Create a report to help us improve FuzzForge
title: "[BUG] "
labels: bug
assignees: ''
---

## Description

A clear and concise description of the bug you encountered.

## Environment

Please provide details about your environment:
- **OS**: (e.g., macOS 14.0, Ubuntu 22.04, Windows 11)
- **Python version**: (e.g., 3.9.7)
- **Docker version**: (e.g., 24.0.6)
- **FuzzForge version**: (e.g., 0.6.0)

## Steps to Reproduce

Clear steps to recreate the issue:

1. Go to '...'
2. Run command '...'
3. Click on '...'
4. See error

## Expected Behavior

A clear and concise description of what should happen.

## Actual Behavior

A clear and concise description of what actually happens.

## Logs

Please include relevant error messages and stack traces:

```
Paste logs here
```

## Screenshots

If applicable, add screenshots to help explain your problem.

## Additional Context

Add any other context about the problem here (workflow used, specific target, configuration, etc.).

---

💬 **Need help?** Join our [Discord Community](https://discord.com/invite/acqv9FVG) for real-time support.
.github/ISSUE_TEMPLATE/config.yml (vendored, new file, 8 lines)
@@ -0,0 +1,8 @@
blank_issues_enabled: false
contact_links:
  - name: 💬 Community Discord
    url: https://discord.com/invite/acqv9FVG
    about: Join our Discord to discuss ideas, workflows, and security research with the community.
  - name: 📖 Documentation
    url: https://github.com/FuzzingLabs/fuzzforge_ai/tree/main/docs
    about: Check our documentation for guides, tutorials, and API reference.
.github/ISSUE_TEMPLATE/feature_request.md (vendored, new file, 38 lines)
@@ -0,0 +1,38 @@
---
name: ✨ Feature Request
about: Suggest an idea for FuzzForge
title: "[FEATURE] "
labels: enhancement
assignees: ''
---

## Use Case

Why is this feature needed? Describe the problem you're trying to solve or the improvement you'd like to see.

## Proposed Solution

How should it work? Describe your ideal solution in detail.

## Alternatives

What other approaches have you considered? List any alternative solutions or features you've thought about.

## Implementation

**(Optional)** Do you have any technical considerations or implementation ideas?

## Category

What area of FuzzForge would this feature enhance?

- [ ] 🤖 AI Agents for Security
- [ ] 🛠 Workflow Automation
- [ ] 📈 Vulnerability Research
- [ ] 🔗 Fuzzer Integration
- [ ] 🌐 Community Marketplace
- [ ] 🔒 Enterprise Features
- [ ] 📚 Documentation
- [ ] 🎯 Other

## Additional Context

Add any other context, screenshots, references, or examples about the feature request here.

---

💬 **Want to discuss this idea?** Join our [Discord Community](https://discord.com/invite/acqv9FVG) to collaborate with other contributors!
.github/ISSUE_TEMPLATE/workflow_submission.md (vendored, new file, 67 lines)
@@ -0,0 +1,67 @@
---
name: 🔄 Workflow Submission
about: Contribute a security workflow or module to the FuzzForge community
title: "[WORKFLOW] "
labels: workflow, community
assignees: ''
---

## Workflow Name

Provide a short, descriptive name for your workflow.

## Description

Explain what this workflow does and what security problems it solves.

## Category

What type of security workflow is this?

- [ ] 🛡️ **Security Assessment** - Static analysis, vulnerability scanning
- [ ] 🔍 **Secret Detection** - Credential and secret scanning
- [ ] 🎯 **Fuzzing** - Dynamic testing and fuzz testing
- [ ] 🔄 **Reverse Engineering** - Binary analysis and decompilation
- [ ] 🌐 **Infrastructure Security** - Container, cloud, network security
- [ ] 🔒 **Penetration Testing** - Offensive security testing
- [ ] 📋 **Other** - Please describe

## Files

Please attach or provide links to your workflow files:

- [ ] `workflow.py` - Main Temporal flow implementation
- [ ] `Dockerfile` - Container definition
- [ ] `metadata.yaml` - Workflow metadata
- [ ] Test files or examples
- [ ] Documentation

## Testing

How did you test this workflow? Please describe:

- **Test targets used**: (e.g., vulnerable_app, custom test cases)
- **Expected outputs**: (e.g., SARIF format, specific vulnerabilities detected)
- **Validation results**: (e.g., X vulnerabilities found, Y false positives)

## SARIF Compliance

- [ ] My workflow outputs results in SARIF format
- [ ] Results include severity levels and descriptions
- [ ] Code flow information is provided where applicable

## Security Guidelines

- [ ] This workflow focuses on **defensive security** purposes only
- [ ] I have not included any malicious tools or capabilities
- [ ] All secrets/credentials are parameterized (no hardcoded values)
- [ ] I have followed responsible disclosure practices

## Registry Integration

Have you updated the workflow registry?

- [ ] Added import statement to `backend/toolbox/workflows/registry.py`
- [ ] Added registry entry with proper metadata
- [ ] Tested workflow registration and deployment

## Additional Notes

Anything else the maintainers should know about this workflow?

---

🚀 **Thank you for contributing to FuzzForge!** Your workflow will help the security community automate and scale their testing efforts.

💬 **Questions?** Join our [Discord Community](https://discord.com/invite/acqv9FVG) to discuss your contribution!
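The SARIF compliance checklist above (version, severity level, description, location) can be illustrated with a minimal sketch of a conforming log. The tool name, rule id, and file path below are hypothetical placeholders, not values from the template:

```python
import json

# Minimal SARIF 2.1.0 log with one result carrying a severity ("level"),
# a description ("message"), and a location, as the checklist asks for.
# "example-workflow", "EX0001", and "src/app.py" are made-up placeholders.
sarif_log = {
    "version": "2.1.0",
    "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
    "runs": [
        {
            "tool": {
                "driver": {
                    "name": "example-workflow",
                    "rules": [
                        {
                            "id": "EX0001",
                            "shortDescription": {"text": "Hardcoded credential"},
                        }
                    ],
                }
            },
            "results": [
                {
                    "ruleId": "EX0001",
                    "level": "error",
                    "message": {"text": "Possible hardcoded credential found."},
                    "locations": [
                        {
                            "physicalLocation": {
                                "artifactLocation": {"uri": "src/app.py"},
                                "region": {"startLine": 42},
                            }
                        }
                    ],
                }
            ],
        }
    ],
}

# Serialize as a workflow would before uploading/exporting it.
sarif_text = json.dumps(sarif_log, indent=2)
print(sarif_text.splitlines()[0])
```

A log shaped like this is what `--export-sarif` consumers (e.g. GitHub code scanning upload) expect at minimum.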
.github/workflows/benchmark.yml (vendored, new file, 165 lines)
@@ -0,0 +1,165 @@
name: Benchmarks

on:
  # Disabled automatic runs - benchmarks not ready for CI/CD yet
  # schedule:
  #   - cron: '0 2 * * *'  # 2 AM UTC every day

  # Allow manual trigger for testing
  workflow_dispatch:
    inputs:
      compare_with:
        description: 'Baseline commit to compare against (optional)'
        required: false
        default: ''

  # pull_request:
  #   paths:
  #     - 'backend/benchmarks/**'
  #     - 'backend/toolbox/modules/**'
  #     - '.github/workflows/benchmark.yml'

jobs:
  benchmark:
    name: Run Benchmarks
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Fetch all history for comparison

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install system dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y build-essential

      - name: Install Python dependencies
        working-directory: ./backend
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"
          pip install pytest pytest-asyncio pytest-benchmark pytest-benchmark[histogram]
          pip install -e ../sdk  # Install SDK for benchmarks

      - name: Run benchmarks
        working-directory: ./backend
        run: |
          pytest benchmarks/ \
            -v \
            --benchmark-only \
            --benchmark-json=benchmark-results.json \
            --benchmark-histogram=benchmark-histogram

      - name: Store benchmark results
        uses: actions/upload-artifact@v4
        with:
          name: benchmark-results-${{ github.run_number }}
          path: |
            backend/benchmark-results.json
            backend/benchmark-histogram.svg

      - name: Download baseline benchmarks
        if: github.event_name == 'pull_request'
        uses: dawidd6/action-download-artifact@v3
        continue-on-error: true
        with:
          workflow: benchmark.yml
          branch: ${{ github.base_ref }}
          name: benchmark-results-*
          path: ./baseline
          search_artifacts: true

      - name: Compare with baseline
        if: github.event_name == 'pull_request' && hashFiles('baseline/benchmark-results.json') != ''
        run: |
          python -c "
          import json
          import sys

          with open('backend/benchmark-results.json') as f:
              current = json.load(f)

          with open('baseline/benchmark-results.json') as f:
              baseline = json.load(f)

          print('\n## Benchmark Comparison\n')
          print('| Benchmark | Current | Baseline | Change |')
          print('|-----------|---------|----------|--------|')

          regressions = []

          for bench in current['benchmarks']:
              name = bench['name']
              current_time = bench['stats']['mean']

              # Find matching baseline
              baseline_bench = next((b for b in baseline['benchmarks'] if b['name'] == name), None)
              if baseline_bench:
                  baseline_time = baseline_bench['stats']['mean']
                  change = ((current_time - baseline_time) / baseline_time) * 100

                  print(f'| {name} | {current_time:.4f}s | {baseline_time:.4f}s | {change:+.2f}% |')

                  # Flag regressions > 10%
                  if change > 10:
                      regressions.append((name, change))
              else:
                  print(f'| {name} | {current_time:.4f}s | N/A | NEW |')

          if regressions:
              print('\n⚠️ **Performance Regressions Detected:**')
              for name, change in regressions:
                  print(f'- {name}: +{change:.2f}%')
              sys.exit(1)
          else:
              print('\n✅ No significant performance regressions detected')
          "

      - name: Comment PR with results
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const results = JSON.parse(fs.readFileSync('backend/benchmark-results.json', 'utf8'));

            let body = '## Benchmark Results\n\n';
            body += '| Category | Benchmark | Mean Time | Std Dev |\n';
            body += '|----------|-----------|-----------|---------|\n';

            for (const bench of results.benchmarks) {
              const group = bench.group || 'ungrouped';
              const name = bench.name.split('::').pop();
              const mean = bench.stats.mean.toFixed(4);
              const stddev = bench.stats.stddev.toFixed(4);
              body += `| ${group} | ${name} | ${mean}s | ${stddev}s |\n`;
            }

            body += '\n📊 Full benchmark results available in artifacts.';

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: body
            });

  benchmark-summary:
    name: Benchmark Summary
    runs-on: ubuntu-latest
    needs: benchmark
    if: always()
    steps:
      - name: Check results
        run: |
          if [ "${{ needs.benchmark.result }}" != "success" ]; then
            echo "Benchmarks failed or detected regressions"
            exit 1
          fi
          echo "Benchmarks completed successfully!"
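The inline comparison script in the workflow above boils down to one check: the mean time of each benchmark against its baseline, flagging anything more than 10% slower. A standalone sketch of that logic (field names follow pytest-benchmark's `--benchmark-json` output; the sample data is made up):

```python
def find_regressions(current, baseline, threshold_pct=10.0):
    """Compare mean times of matching benchmarks and return those slower
    than baseline by more than threshold_pct percent, as the workflow's
    inline 'Compare with baseline' step does."""
    baseline_by_name = {b["name"]: b for b in baseline["benchmarks"]}
    regressions = []
    for bench in current["benchmarks"]:
        base = baseline_by_name.get(bench["name"])
        if base is None:
            continue  # new benchmark, nothing to compare against
        current_time = bench["stats"]["mean"]
        baseline_time = base["stats"]["mean"]
        change = (current_time - baseline_time) / baseline_time * 100
        if change > threshold_pct:
            regressions.append((bench["name"], change))
    return regressions


# Hypothetical sample data in pytest-benchmark's JSON shape:
current = {"benchmarks": [{"name": "test_scan", "stats": {"mean": 0.24}}]}
baseline = {"benchmarks": [{"name": "test_scan", "stats": {"mean": 0.20}}]}
print(find_regressions(current, baseline))  # test_scan is ~20% slower -> flagged
```

Returning the offending names (rather than exiting immediately) keeps the threshold logic testable outside CI.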
.github/workflows/ci-python.yml (vendored, new file, 70 lines)
@@ -0,0 +1,70 @@
name: Python CI

# This is a dumb CI to ensure that the Python client and backend build correctly.
# It could be optimized to run faster, building, testing, and linting only changed code,
# but for now it is good enough. It runs on every push and PR to any branch.
# It also runs on demand.

on:
  workflow_dispatch:

  push:
    paths:
      - "ai/**"
      - "backend/**"
      - "cli/**"
      - "sdk/**"
      - "src/**"
  pull_request:
    paths:
      - "ai/**"
      - "backend/**"
      - "cli/**"
      - "sdk/**"
      - "src/**"

jobs:
  ci:
    name: ci
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v5

      - name: Setup uv
        uses: astral-sh/setup-uv@v6
        with:
          enable-cache: true

      - name: Set up Python
        run: uv python install

      # Validate no obvious issues
      # Quick hack because CLI returns non-zero exit code when no args are provided
      - name: Run base command
        run: |
          set +e
          uv run ff
          rc=$?
          if [ $rc -ne 2 ]; then
            echo "Expected exit code 2 from 'uv run ff', got $rc"
            exit 1
          fi

      - name: Build fuzzforge_ai package
        run: uv build

      - name: Build ai package
        working-directory: ai
        run: uv build

      - name: Build cli package
        working-directory: cli
        run: uv build

      - name: Build sdk package
        working-directory: sdk
        run: uv build

      - name: Build backend package
        working-directory: backend
        run: uv build
.github/workflows/ci.yml (vendored, deleted, 86 lines)
@@ -1,86 +0,0 @@
name: CI

on:
  push:
    branches: [main, dev, feature/*]
  pull_request:
    branches: [main, dev]
  workflow_dispatch:

jobs:
  lint-and-typecheck:
    name: Lint & Type Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"

      - name: Set up Python
        run: uv python install 3.14

      - name: Install dependencies
        run: uv sync

      - name: Ruff check (fuzzforge-cli)
        run: |
          cd fuzzforge-cli
          uv run --extra lints ruff check src/

      - name: Ruff check (fuzzforge-mcp)
        run: |
          cd fuzzforge-mcp
          uv run --extra lints ruff check src/

      - name: Ruff check (fuzzforge-common)
        run: |
          cd fuzzforge-common
          uv run --extra lints ruff check src/

      - name: Mypy type check (fuzzforge-cli)
        run: |
          cd fuzzforge-cli
          uv run --extra lints mypy src/

      - name: Mypy type check (fuzzforge-mcp)
        run: |
          cd fuzzforge-mcp
          uv run --extra lints mypy src/

      # NOTE: Mypy check for fuzzforge-common temporarily disabled
      # due to 37 pre-existing type errors in legacy code.
      # TODO: Fix type errors and re-enable strict checking
      #- name: Mypy type check (fuzzforge-common)
      #  run: |
      #    cd fuzzforge-common
      #    uv run --extra lints mypy src/

  test:
    name: Tests
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"

      - name: Set up Python
        run: uv python install 3.14

      - name: Install dependencies
        run: uv sync --all-extras

      - name: Run MCP tests
        run: |
          cd fuzzforge-mcp
          uv run --extra tests pytest -v

      - name: Run common tests
        run: |
          cd fuzzforge-common
          uv run --extra tests pytest -v
.github/workflows/docs-deploy.yml (vendored, new file, 57 lines)
@@ -0,0 +1,57 @@
name: Deploy Docusaurus to GitHub Pages

on:
  workflow_dispatch:

  push:
    branches:
      - master
    paths:
      - "docs/**"

jobs:
  build:
    name: Build Docusaurus
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./docs
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: 24
          cache: npm
          cache-dependency-path: "**/package-lock.json"

      - name: Install dependencies
        run: npm ci
      - name: Build website
        run: npm run build

      - name: Upload Build Artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: ./docs/build

  deploy:
    name: Deploy to GitHub Pages
    needs: build

    # Grant GITHUB_TOKEN the permissions required to make a Pages deployment
    permissions:
      pages: write      # to deploy to Pages
      id-token: write   # to verify the deployment originates from an appropriate source

    # Deploy to the github-pages environment
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}

    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
.github/workflows/docs-test-deploy.yml (vendored, new file, 33 lines)
@@ -0,0 +1,33 @@
name: Docusaurus test deployment

on:
  workflow_dispatch:

  push:
    paths:
      - "docs/**"
  pull_request:
    paths:
      - "docs/**"

jobs:
  test-deploy:
    name: Test deployment
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./docs
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-node@v4
        with:
          node-version: 24
          cache: npm
          cache-dependency-path: "**/package-lock.json"

      - name: Install dependencies
        run: npm ci
      - name: Test build website
        run: npm run build
.github/workflows/examples/security-scan.yml (vendored, new file, 152 lines)
@@ -0,0 +1,152 @@
# FuzzForge CI/CD Example - Security Scanning
#
# This workflow demonstrates how to integrate FuzzForge into your CI/CD pipeline
# for automated security testing on pull requests and pushes.
#
# Features:
# - Runs entirely in GitHub Actions (no external infrastructure needed)
# - Auto-starts FuzzForge services on-demand
# - Fails builds on error-level SARIF findings
# - Uploads SARIF results to GitHub Security tab
# - Exports findings as artifacts
#
# Prerequisites:
# - Ubuntu runner with Docker support
# - At least 4GB RAM available
# - ~90 seconds startup time

name: Security Scan Example

on:
  pull_request:
    branches: [main, develop]
  push:
    branches: [main]

jobs:
  security-scan:
    name: Security Assessment
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Start FuzzForge
        run: |
          bash scripts/ci-start.sh

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install FuzzForge CLI
        run: |
          pip install ./cli

      - name: Initialize FuzzForge
        run: |
          ff init --api-url http://localhost:8000 --name "GitHub Actions Security Scan"

      - name: Run Security Assessment
        run: |
          ff workflow run security_assessment . \
            --wait \
            --fail-on error \
            --export-sarif results.sarif

      - name: Upload SARIF to GitHub Security
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: results.sarif

      - name: Upload findings as artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: security-findings
          path: results.sarif
          retention-days: 30

      - name: Stop FuzzForge
        if: always()
        run: |
          bash scripts/ci-stop.sh

  secret-scan:
    name: Secret Detection
    runs-on: ubuntu-latest
    timeout-minutes: 15

    steps:
      - uses: actions/checkout@v4

      - name: Start FuzzForge
        run: bash scripts/ci-start.sh

      - name: Install CLI
        run: |
          pip install ./cli

      - name: Initialize & Scan
        run: |
          ff init --api-url http://localhost:8000 --name "Secret Detection"
          ff workflow run secret_detection . \
            --wait \
            --fail-on all \
            --export-sarif secrets.sarif

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: secret-scan-results
          path: secrets.sarif
          retention-days: 30

      - name: Cleanup
        if: always()
        run: bash scripts/ci-stop.sh

  # Example: Nightly fuzzing campaign (long-running)
  nightly-fuzzing:
    name: Nightly Fuzzing
    runs-on: ubuntu-latest
    timeout-minutes: 120
    # Only run on schedule
    if: github.event_name == 'schedule'

    steps:
      - uses: actions/checkout@v4

      - name: Start FuzzForge
        run: bash scripts/ci-start.sh

      - name: Install CLI
        run: pip install ./cli

      - name: Run Fuzzing Campaign
        run: |
          ff init --api-url http://localhost:8000
          ff workflow run atheris_fuzzing . \
            max_iterations=100000000 \
            timeout_seconds=7200 \
            --wait \
            --export-sarif fuzzing-results.sarif
        # Don't fail on fuzzing findings, just report
        continue-on-error: true

      - name: Upload fuzzing results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: fuzzing-results
          path: fuzzing-results.sarif
          retention-days: 90

      - name: Cleanup
        if: always()
        run: bash scripts/ci-stop.sh
.github/workflows/mcp-server.yml (vendored, deleted, 49 lines)
@@ -1,49 +0,0 @@
name: MCP Server Smoke Test

on:
  push:
    branches: [main, dev]
  pull_request:
    branches: [main, dev]
  workflow_dispatch:

jobs:
  mcp-server:
    name: MCP Server Test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          version: "latest"

      - name: Set up Python
        run: uv python install 3.14

      - name: Install dependencies
        run: uv sync --all-extras

      - name: Start MCP server in background
        run: |
          cd fuzzforge-mcp
          nohup uv run python -m fuzzforge_mcp.server > server.log 2>&1 &
          echo $! > server.pid
          sleep 3

      - name: Run MCP tool tests
        run: |
          cd fuzzforge-mcp
          uv run --extra tests pytest tests/test_resources.py -v

      - name: Stop MCP server
        if: always()
        run: |
          if [ -f fuzzforge-mcp/server.pid ]; then
            kill $(cat fuzzforge-mcp/server.pid) || true
          fi

      - name: Show server logs
        if: failure()
        run: cat fuzzforge-mcp/server.log || true
.github/workflows/test.yml (vendored, new file, 155 lines)
@@ -0,0 +1,155 @@
name: Tests

on:
  push:
    branches: [ main, master, develop, feature/** ]
  pull_request:
    branches: [ main, master, develop ]

jobs:
  lint:
    name: Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install ruff mypy

      - name: Run ruff
        run: ruff check backend/src backend/toolbox backend/tests backend/benchmarks --output-format=github

      - name: Run mypy (continue on error)
        run: mypy backend/src backend/toolbox || true
        continue-on-error: true

  unit-tests:
    name: Unit Tests
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.11', '3.12']

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install system dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y build-essential

      - name: Install Python dependencies
        working-directory: ./backend
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"
          pip install pytest pytest-asyncio pytest-cov pytest-xdist

      - name: Run unit tests
        working-directory: ./backend
        run: |
          pytest tests/unit/ \
            -v \
            --cov=toolbox/modules \
            --cov=src \
            --cov-report=xml \
            --cov-report=term \
            --cov-report=html \
            -n auto

      - name: Upload coverage to Codecov
        if: matrix.python-version == '3.11'
        uses: codecov/codecov-action@v4
        with:
          file: ./backend/coverage.xml
          flags: unittests
          name: codecov-backend

      - name: Upload coverage HTML
        if: matrix.python-version == '3.11'
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: ./backend/htmlcov/

  # integration-tests:
  #   name: Integration Tests
  #   runs-on: ubuntu-latest
  #   needs: unit-tests
  #
  #   services:
  #     postgres:
  #       image: postgres:15
  #       env:
  #         POSTGRES_USER: postgres
  #         POSTGRES_PASSWORD: postgres
  #         POSTGRES_DB: fuzzforge_test
  #       options: >-
  #         --health-cmd pg_isready
  #         --health-interval 10s
  #         --health-timeout 5s
  #         --health-retries 5
  #       ports:
  #         - 5432:5432
  #
  #   steps:
  #     - uses: actions/checkout@v4
  #
  #     - name: Set up Python
  #       uses: actions/setup-python@v5
  #       with:
  #         python-version: '3.11'
  #
  #     - name: Set up Docker Buildx
  #       uses: docker/setup-buildx-action@v3
  #
  #     - name: Install Python dependencies
  #       working-directory: ./backend
  #       run: |
  #         python -m pip install --upgrade pip
  #         pip install -e ".[dev]"
  #         pip install pytest pytest-asyncio
  #
  #     - name: Start services (Temporal, MinIO)
  #       run: |
  #         docker-compose -f docker-compose.yml up -d temporal minio
  #         sleep 30
  #
  #     - name: Run integration tests
  #       working-directory: ./backend
  #       run: |
  #         pytest tests/integration/ -v --tb=short
  #       env:
  #         DATABASE_URL: postgresql://postgres:postgres@localhost:5432/fuzzforge_test
  #         TEMPORAL_ADDRESS: localhost:7233
  #         MINIO_ENDPOINT: localhost:9000
  #
  #     - name: Shutdown services
  #       if: always()
  #       run: docker-compose down

  test-summary:
    name: Test Summary
    runs-on: ubuntu-latest
    needs: [lint, unit-tests]
    if: always()
    steps:
      - name: Check test results
        run: |
          if [ "${{ needs.unit-tests.result }}" != "success" ]; then
            echo "Unit tests failed"
            exit 1
          fi
          echo "All tests passed!"
.gitignore (vendored, 314 lines changed)
@@ -1,15 +1,307 @@
*.egg-info
*.whl
# ========================================
# FuzzForge Platform .gitignore
# ========================================

# -------------------- Python --------------------
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Environments
.env
.mypy_cache
.pytest_cache
.ruff_cache
.venv
.vscode
__pycache__
env/
venv/
ENV/
env.bak/
venv.bak/
.python-version

# Podman/Docker container storage artifacts
~/.fuzzforge/
# UV package manager
uv.lock
# But allow uv.lock in CLI and SDK for reproducible builds
!cli/uv.lock
!sdk/uv.lock
!backend/uv.lock

# User-specific hub config (generated at runtime)
hub-config.json
# MyPy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# -------------------- IDE / Editor --------------------
# VSCode
.vscode/
*.code-workspace

# PyCharm
.idea/

# Vim
*.swp
*.swo
*~

# Emacs
*~
\#*\#
/.emacs.desktop
/.emacs.desktop.lock
*.elc
auto-save-list
tramp
.\#*

# Sublime Text
*.sublime-project
*.sublime-workspace

# -------------------- Operating System --------------------
# macOS
.DS_Store
.AppleDouble
.LSOverride
Icon
._*
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk

# Windows
Thumbs.db
Thumbs.db:encryptable
ehthumbs.db
ehthumbs_vista.db
*.stackdump
[Dd]esktop.ini
$RECYCLE.BIN/
*.cab
*.msi
*.msix
*.msm
*.msp
*.lnk

# Linux
*~
.fuse_hidden*
.directory
.Trash-*
.nfs*

# -------------------- Docker --------------------
# Docker volumes and data
docker-volumes/
.dockerignore.bak

# Docker Compose override files
docker-compose.override.yml
docker-compose.override.yaml

# -------------------- Database --------------------
# SQLite
*.sqlite
*.sqlite3
*.db
*.db-journal
*.db-shm
*.db-wal

# PostgreSQL
*.sql.backup

# -------------------- Logs --------------------
# General logs
*.log
logs/
*.log.*

# -------------------- FuzzForge Specific --------------------
# FuzzForge project directories (user projects should manage their own .gitignore)
.fuzzforge/

# Docker volume configs (keep .env.example but ignore actual .env)
volumes/env/.env

# Vendored proxy sources (kept locally for reference)
ai/proxy/bifrost/
ai/proxy/litellm/

# Test project databases and configurations
test_projects/*/.fuzzforge/
test_projects/*/findings.db*
test_projects/*/config.yaml
test_projects/*/.gitignore

# Local development configurations
local_config.yaml
dev_config.yaml
.env.local
.env.development

# Generated reports and outputs
reports/
output/
findings/
*.sarif
*.sarif.json
*.html.report
security_report.*

# Temporary files
tmp/
temp/
*.tmp
*.temp

# Backup files
*.bak
*.backup
*~

# -------------------- Node.js (for any JS tooling) --------------------
node_modules/
npm-debug.log*
yarn-debug.log*
yarn-error.log*
.npm

# -------------------- Security --------------------
# Never commit these files
*.pem
*.key
*.p12
*.pfx

# Exception: Secret detection benchmark test files (not real secrets)
|
||||
!test_projects/secret_detection_benchmark/
|
||||
!test_projects/secret_detection_benchmark/**
|
||||
!**/secret_detection_benchmark_GROUND_TRUTH.json
|
||||
!**/secret_detection/results/
|
||||
|
||||
secret*
|
||||
secrets/
|
||||
credentials*
|
||||
api_keys*
|
||||
.env.production
|
||||
.env.staging
|
||||
|
||||
# AWS credentials
|
||||
.aws/
|
||||
|
||||
# -------------------- Build Artifacts --------------------
|
||||
# Python builds
|
||||
build/
|
||||
dist/
|
||||
*.wheel
|
||||
|
||||
# Documentation builds
|
||||
docs/_build/
|
||||
site/
|
||||
|
||||
# -------------------- Miscellaneous --------------------
|
||||
# Jupyter Notebook checkpoints
|
||||
.ipynb_checkpoints
|
||||
|
||||
# IPython history
|
||||
.ipython/
|
||||
|
||||
# Rope project settings
|
||||
.ropeproject
|
||||
|
||||
# spyderproject
|
||||
.spyderproject
|
||||
.spyproject
|
||||
|
||||
# mkdocs documentation
|
||||
/site
|
||||
|
||||
# Local Netlify folder
|
||||
.netlify
|
||||
|
||||
# -------------------- Project Specific Overrides --------------------
|
||||
# Allow specific test project files that should be tracked
|
||||
!test_projects/*/src/
|
||||
!test_projects/*/scripts/
|
||||
!test_projects/*/config/
|
||||
!test_projects/*/data/
|
||||
!test_projects/*/README.md
|
||||
!test_projects/*/*.py
|
||||
!test_projects/*/*.js
|
||||
!test_projects/*/*.php
|
||||
!test_projects/*/*.java
|
||||
|
||||
# But exclude their sensitive content
|
||||
test_projects/*/.env
|
||||
test_projects/*/private_key.pem
|
||||
test_projects/*/wallet.json
|
||||
test_projects/*/.npmrc
|
||||
test_projects/*/.git-credentials
|
||||
test_projects/*/credentials.*
|
||||
test_projects/*/api_keys.*
|
||||
test_projects/*/ci-*.sh
|
||||
|
||||
.gitlab-ci.example.yml (new file, 121 lines)
@@ -0,0 +1,121 @@
# FuzzForge CI/CD Example - GitLab CI
#
# This file demonstrates how to integrate FuzzForge into your GitLab CI/CD pipeline.
# Copy this to `.gitlab-ci.yml` in your project root to enable security scanning.
#
# Features:
# - Runs entirely in GitLab runners (no external infrastructure)
# - Auto-starts FuzzForge services on-demand
# - Fails pipelines on critical/high severity findings
# - Uploads SARIF reports to GitLab Security Dashboard
# - Exports findings as artifacts
#
# Prerequisites:
# - GitLab Runner with Docker support (docker:dind)
# - At least 4GB RAM available
# - ~90 seconds startup time

stages:
  - security

variables:
  FUZZFORGE_API_URL: "http://localhost:8000"
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: ""

# Base template for all FuzzForge jobs
.fuzzforge_template:
  image: docker:24
  services:
    - docker:24-dind
  before_script:
    # Install dependencies
    - apk add --no-cache bash curl python3 py3-pip git
    # Start FuzzForge
    - bash scripts/ci-start.sh
    # Install CLI
    - pip3 install ./cli --break-system-packages
    # Initialize project
    - ff init --api-url $FUZZFORGE_API_URL --name "GitLab CI Security Scan"
  after_script:
    # Cleanup
    - bash scripts/ci-stop.sh || true

# Security Assessment - Comprehensive code analysis
security:scan:
  extends: .fuzzforge_template
  stage: security
  timeout: 30 minutes
  script:
    - ff workflow run security_assessment . --wait --fail-on error --export-sarif results.sarif
  artifacts:
    when: always
    reports:
      sast: results.sarif
    paths:
      - results.sarif
    expire_in: 30 days
  only:
    - merge_requests
    - main
    - develop

# Secret Detection - Scan for exposed credentials
security:secrets:
  extends: .fuzzforge_template
  stage: security
  timeout: 15 minutes
  script:
    - ff workflow run secret_detection . --wait --fail-on all --export-sarif secrets.sarif
  artifacts:
    when: always
    paths:
      - secrets.sarif
    expire_in: 30 days
  only:
    - merge_requests
    - main

# Nightly Fuzzing - Long-running fuzzing campaign (scheduled only)
security:fuzzing:
  extends: .fuzzforge_template
  stage: security
  timeout: 2 hours
  script:
    - |
      ff workflow run atheris_fuzzing . \
        max_iterations=100000000 \
        timeout_seconds=7200 \
        --wait \
        --export-sarif fuzzing-results.sarif
  artifacts:
    when: always
    paths:
      - fuzzing-results.sarif
    expire_in: 90 days
  allow_failure: true  # Don't fail pipeline on fuzzing findings
  only:
    - schedules

# OSS-Fuzz Campaign (for supported projects)
security:ossfuzz:
  extends: .fuzzforge_template
  stage: security
  timeout: 1 hour
  script:
    - |
      ff workflow run ossfuzz_campaign . \
        project_name=your-project-name \
        campaign_duration_hours=0.5 \
        --wait \
        --export-sarif ossfuzz-results.sarif
  artifacts:
    when: always
    paths:
      - ossfuzz-results.sarif
    expire_in: 90 days
  allow_failure: true
  only:
    - schedules
  # Uncomment and set your project name
  # when: manual
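The CI jobs above export SARIF reports and gate the pipeline with `--fail-on`. A minimal sketch of how such a gate could consume the exported SARIF (the document shape follows SARIF 2.1.0, `runs[].results[]` with a `level` per result; the rule IDs here are illustrative, not FuzzForge output):

```python
# Hypothetical SARIF document, shaped like the files produced by
# `--export-sarif results.sarif` (rule IDs are made up for illustration).
sarif = {
    "version": "2.1.0",
    "runs": [
        {
            "results": [
                {"ruleId": "hardcoded-secret", "level": "error"},
                {"ruleId": "weak-hash", "level": "warning"},
                {"ruleId": "todo-comment", "level": "note"},
            ]
        }
    ],
}

def count_by_level(doc: dict) -> dict:
    """Tally SARIF results by severity level across all runs."""
    counts: dict = {}
    for run in doc.get("runs", []):
        for result in run.get("results", []):
            # SARIF defines "warning" as the default level when absent.
            level = result.get("level", "warning")
            counts[level] = counts.get(level, 0) + 1
    return counts

counts = count_by_level(sarif)
print(counts)  # {'error': 1, 'warning': 1, 'note': 1}

# A gate in the spirit of `--fail-on error`: fail when any error-level
# finding is present.
fail = counts.get("error", 0) > 0
print("FAIL" if fail else "PASS")  # FAIL
```

In a real pipeline the dict would come from `json.load(open("results.sarif"))` after the workflow step.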
@@ -1 +0,0 @@
3.14.2
ARCHITECTURE.md (new file, 1020 lines)
File diff suppressed because it is too large
CHANGELOG.md (new file, 85 lines)
@@ -0,0 +1,85 @@
# Changelog

All notable changes to FuzzForge will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.7.0] - 2025-01-16

### 🎯 Major Features

#### Secret Detection Workflows
- **Added three secret detection workflows**:
  - `gitleaks_detection` - Pattern-based secret scanning
  - `trufflehog_detection` - Entropy-based secret detection with verification
  - `llm_secret_detection` - AI-powered semantic secret detection using LLMs
- **Comprehensive benchmarking infrastructure**:
  - 32-secret ground truth dataset for precision/recall testing
  - Difficulty levels: 12 Easy, 10 Medium, 10 Hard secrets
  - SARIF-formatted output for all workflows
  - Achieved 100% recall with LLM-based detection on benchmark dataset

#### AI Module & Agent Integration
- Added A2A (Agent-to-Agent) wrapper for multi-agent orchestration
- Task agent implementation with Google ADK
- LLM analysis workflow for code security analysis
- Reactivated AI agent command (`ff ai agent`)

#### Temporal Migration Complete
- Fully migrated from Prefect to Temporal for workflow orchestration
- MinIO storage for unified file handling (replaces volume mounts)
- Vertical workers with pre-built security toolchains
- Improved worker lifecycle management

#### CI/CD Integration
- Ephemeral deployment model for testing
- Automated workflow validation in CI pipeline

### ✨ Enhancements

#### Documentation
- Updated README for Temporal + MinIO architecture
- Removed obsolete `volume_mode` references across all documentation
- Added `.env` configuration guide for AI agent API keys
- Fixed worker startup instructions with correct service names
- Updated docker compose commands to modern syntax

#### Worker Management
- Added `worker_service` field to API responses for correct service naming
- Improved error messages with actionable manual start commands
- Fixed default parameters for gitleaks (now uses `no_git=True` by default)

### 🐛 Bug Fixes

- Fixed gitleaks workflow failing on uploaded directories without Git history
- Fixed worker startup command suggestions (now uses `docker compose up -d` with service names)
- Fixed missing `cognify_text` method in CogneeProjectIntegration

### 🔧 Technical Changes

- Updated all package versions to 0.7.0
- Improved SARIF output formatting for secret detection workflows
- Enhanced benchmark validation with ground truth JSON
- Better integration between CLI and backend for worker management

### 📝 Test Projects

- Added `secret_detection_benchmark` with 32 documented secrets
- Ground truth JSON for automated precision/recall calculations
- Updated `vulnerable_app` for comprehensive security testing

---

## [0.6.0] - 2024-12-XX

### Features
- Initial Temporal migration
- Fuzzing workflows (Atheris, Cargo, OSS-Fuzz)
- Security assessment workflow
- Basic CLI commands

---

[0.7.0]: https://github.com/FuzzingLabs/fuzzforge_ai/compare/v0.6.0...v0.7.0
[0.6.0]: https://github.com/FuzzingLabs/fuzzforge_ai/releases/tag/v0.6.0
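The changelog's benchmark relies on precision/recall against a ground-truth dataset. A sketch of that computation over two sets of secret identifiers (the identifiers below are illustrative, not entries from the real 32-secret benchmark):

```python
# Hypothetical identifiers standing in for ground-truth vs. detected secrets.
ground_truth = {"aws_key_01", "db_password_02", "jwt_secret_03", "api_token_04"}
detected = {"aws_key_01", "db_password_02", "jwt_secret_03", "random_string_99"}

# Precision: of everything flagged, how much was a real secret.
# Recall: of the real secrets, how many were flagged.
true_positives = len(ground_truth & detected)
precision = true_positives / len(detected) if detected else 0.0
recall = true_positives / len(ground_truth) if ground_truth else 0.0

print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.75
```

With this definition, "100% recall" means every ground-truth secret was detected, regardless of how many false positives accompanied it.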
CONTRIBUTING.md (521 lines)
@@ -1,21 +1,17 @@
# Contributing to FuzzForge AI
# Contributing to FuzzForge 🤝

Thank you for your interest in contributing to FuzzForge AI! We welcome contributions from the community and are excited to collaborate with you.
Thank you for your interest in contributing to FuzzForge! We welcome contributions from the community and are excited to collaborate with you.

**Our Vision**: FuzzForge aims to be a **universal platform for security research** across all cybersecurity domains. Through our modular architecture, any security tool—from fuzzing engines to cloud scanners, from mobile app analyzers to IoT security tools—can be integrated as a containerized module and controlled via AI agents.

## 🌟 Ways to Contribute

## Ways to Contribute
- 🐛 **Bug Reports** - Help us identify and fix issues
- 💡 **Feature Requests** - Suggest new capabilities and improvements
- 🔧 **Code Contributions** - Submit bug fixes, features, and enhancements
- 📚 **Documentation** - Improve guides, tutorials, and API documentation
- 🧪 **Testing** - Help test new features and report issues
- 🛡️ **Security Workflows** - Contribute new security analysis workflows

- **Security Modules** - Create modules for any cybersecurity domain (AppSec, NetSec, Cloud, IoT, etc.)
- **Bug Reports** - Help us identify and fix issues
- **Feature Requests** - Suggest new capabilities and improvements
- **Core Features** - Contribute to the MCP server, runner, or CLI
- **Documentation** - Improve guides, tutorials, and module documentation
- **Testing** - Help test new features and report issues
- **AI Integration** - Improve MCP tools and AI agent interactions
- **Tool Integrations** - Wrap existing security tools as FuzzForge modules

## Contribution Guidelines
## 📋 Contribution Guidelines

### Code Style

@@ -48,10 +44,9 @@ We use conventional commits for clear history:

**Examples:**
```
feat(modules): add cloud security scanner module
fix(mcp): resolve module listing timeout
docs(sdk): update module development guide
test(runner): add container execution tests
feat(workflows): add new static analysis workflow for Go
fix(api): resolve authentication timeout issue
docs(readme): update installation instructions
```

### Pull Request Process

@@ -70,14 +65,9 @@ test(runner): add container execution tests

3. **Test Your Changes**
   ```bash
   # Test modules
   FUZZFORGE_MODULES_PATH=./fuzzforge-modules uv run fuzzforge modules list

   # Run a module
   uv run fuzzforge modules run your-module --assets ./test-assets

   # Test MCP integration (if applicable)
   uv run fuzzforge mcp status
   # Test workflows
   cd test_projects/vulnerable_app/
   ff workflow security_assessment .
   ```

4. **Submit Pull Request**

@@ -86,353 +76,65 @@ test(runner): add container execution tests

   - Link related issues using `Fixes #123` or `Closes #123`
   - Ensure all CI checks pass

## Module Development
## 🛡️ Security Workflow Development

FuzzForge uses a modular architecture where security tools run as isolated containers. The `fuzzforge-modules-sdk` provides everything you need to create new modules.

### Creating New Workflows

**Documentation:**
- [Module SDK Documentation](fuzzforge-modules/fuzzforge-modules-sdk/README.md) - Complete SDK reference
- [Module Template](fuzzforge-modules/fuzzforge-module-template/) - Starting point for new modules
- [USAGE Guide](USAGE.md) - Setup and installation instructions

### Creating a New Module

1. **Use the Module Template**
   ```bash
   # Generate a new module from template
   cd fuzzforge-modules/
   cp -r fuzzforge-module-template my-new-module
   cd my-new-module
   ```

1. **Workflow Structure**
   ```
   backend/toolbox/workflows/your_workflow/
   ├── __init__.py
   ├── workflow.py          # Main Temporal workflow
   ├── activities.py        # Workflow activities (optional)
   ├── metadata.yaml        # Workflow metadata (includes vertical field)
   └── requirements.txt     # Additional dependencies (optional)
   ```

2. **Module Structure**
   ```
   my-new-module/
   ├── Dockerfile           # Container definition
   ├── Makefile             # Build commands
   ├── README.md            # Module documentation
   ├── pyproject.toml       # Python dependencies
   ├── mypy.ini             # Type checking config
   ├── ruff.toml            # Linting config
   └── src/
       └── module/
           ├── __init__.py
           ├── __main__.py  # Entry point
           ├── mod.py       # Main module logic
           ├── models.py    # Pydantic models
           └── settings.py  # Configuration
   ```

3. **Implement Your Module**

   Edit `src/module/mod.py`:

2. **Register Your Workflow**
   Add your workflow to `backend/toolbox/workflows/registry.py`:
   ```python
   from fuzzforge_modules_sdk.api.modules import BaseModule
   from fuzzforge_modules_sdk.api.models import ModuleResult
   from .models import MyModuleConfig, MyModuleOutput

   class MyModule(BaseModule[MyModuleConfig, MyModuleOutput]):
       """Your module description."""

       def execute(self) -> ModuleResult[MyModuleOutput]:
           """Main execution logic."""
           # Access input assets
           assets = self.input_path

           # Your security tool logic here
           results = self.run_analysis(assets)

           # Return structured results
           return ModuleResult(
               success=True,
               output=MyModuleOutput(
                   findings=results,
                   summary="Analysis complete"
               )
           )

   # Import your workflow
   from .your_workflow.workflow import main_flow as your_workflow_flow

   # Add to registry
   WORKFLOW_REGISTRY["your_workflow"] = {
       "flow": your_workflow_flow,
       "module_path": "toolbox.workflows.your_workflow.workflow",
       "function_name": "main_flow",
       "description": "Description of your workflow",
       "version": "1.0.0",
       "author": "Your Name",
       "tags": ["tag1", "tag2"]
   }
   ```

4. **Define Configuration Models**

   Edit `src/module/models.py`:
   ```python
   from pydantic import BaseModel, Field
   from fuzzforge_modules_sdk.api.models import BaseModuleConfig, BaseModuleOutput

   class MyModuleConfig(BaseModuleConfig):
       """Configuration for your module."""
       timeout: int = Field(default=300, description="Timeout in seconds")
       max_iterations: int = Field(default=1000, description="Max iterations")

   class MyModuleOutput(BaseModuleOutput):
       """Output from your module."""
       findings: list[dict] = Field(default_factory=list)
       coverage: float = Field(default=0.0)
   ```

5. **Build Your Module**
   ```bash
   # Build the SDK first (if not already done)
   cd ../fuzzforge-modules-sdk
   uv build
   mkdir -p .wheels
   cp ../../dist/fuzzforge_modules_sdk-*.whl .wheels/
   cd ../..
   docker build -t localhost/fuzzforge-modules-sdk:0.1.0 fuzzforge-modules/fuzzforge-modules-sdk/

   # Build your module
   cd fuzzforge-modules/my-new-module
   docker build -t fuzzforge-my-new-module:0.1.0 .
   ```

6. **Test Your Module**
   ```bash
   # Run with test assets
   uv run fuzzforge modules run my-new-module --assets ./test-assets

   # Check module info
   uv run fuzzforge modules info my-new-module
   ```

### Module Development Guidelines

**Important Conventions:**
- **Input/Output**: Use `/fuzzforge/input` for assets and `/fuzzforge/output` for results
- **Configuration**: Support JSON configuration via stdin or file
- **Logging**: Use structured logging (structlog is pre-configured)
- **Error Handling**: Return proper exit codes and error messages
- **Security**: Run as non-root user when possible
- **Documentation**: Include clear README with usage examples
- **Dependencies**: Minimize container size, use multi-stage builds

**See also:**
- [Module SDK API Reference](fuzzforge-modules/fuzzforge-modules-sdk/src/fuzzforge_modules_sdk/api/)
- [Dockerfile Best Practices](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/)

### Module Types

FuzzForge is designed to support modules across **all cybersecurity domains**. The modular architecture allows any security tool to be containerized and integrated. Here are the main categories:

**Application Security**
- Fuzzing engines (coverage-guided, grammar-based, mutation-based)
- Static analysis (SAST, code quality, dependency scanning)
- Dynamic analysis (DAST, runtime analysis, instrumentation)
- Test validation and coverage analysis
- Crash analysis and exploit detection

**Network & Infrastructure Security**
- Network scanning and service enumeration
- Protocol analysis and fuzzing
- Firewall and configuration testing
- Cloud security (AWS/Azure/GCP misconfiguration detection, IAM analysis)
- Container security (image scanning, Kubernetes security)

**Web & API Security**
- Web vulnerability scanners (XSS, SQL injection, CSRF)
- Authentication and session testing
- API security (REST/GraphQL/gRPC testing, fuzzing)
- SSL/TLS analysis

**Binary & Reverse Engineering**
- Binary analysis and disassembly
- Malware sandboxing and behavior analysis
- Exploit development tools
- Firmware extraction and analysis

**Mobile & IoT Security**
- Mobile app analysis (Android/iOS static/dynamic analysis)
- IoT device security and firmware analysis
- SCADA/ICS and industrial protocol testing
- Automotive security (CAN bus, ECU testing)

**Data & Compliance**
- Database security testing
- Encryption and cryptography analysis
- Secrets and credential detection
- Privacy tools (PII detection, GDPR compliance)
- Compliance checkers (PCI-DSS, HIPAA, SOC2, ISO27001)

**Threat Intelligence & Risk**
- OSINT and reconnaissance tools
- Threat hunting and IOC correlation
- Risk assessment and attack surface mapping
- Security audit and policy validation

**Emerging Technologies**
- AI/ML security (model poisoning, adversarial testing)
- Blockchain and smart contract analysis
- Quantum-safe cryptography testing

**Custom & Integration**
- Domain-specific security tools
- Bridges to existing security tools
- Multi-tool orchestration and result aggregation

### Example: Simple Security Scanner Module

```python
# src/module/mod.py
from pathlib import Path
from fuzzforge_modules_sdk.api.modules import BaseModule
from fuzzforge_modules_sdk.api.models import ModuleResult
from .models import ScannerConfig, ScannerOutput

class SecurityScanner(BaseModule[ScannerConfig, ScannerOutput]):
    """Scans for common security issues in code."""

    def execute(self) -> ModuleResult[ScannerOutput]:
        findings = []

        # Scan all source files
        for file_path in self.input_path.rglob("*"):
            if file_path.is_file():
                findings.extend(self.scan_file(file_path))

        return ModuleResult(
            success=True,
            output=ScannerOutput(
                findings=findings,
                files_scanned=len(list(self.input_path.rglob("*")))
            )
        )

    def scan_file(self, path: Path) -> list[dict]:
        """Scan a single file for security issues."""
        # Your scanning logic here
        return []
```

### Testing Modules

Create tests in `tests/`:
```python
import pytest
from module.mod import MyModule
from module.models import MyModuleConfig

def test_module_execution():
    config = MyModuleConfig(timeout=60)
    module = MyModule(config=config, input_path=Path("test_assets"))
    result = module.execute()

    assert result.success
    assert len(result.output.findings) >= 0
```

Run tests:
```bash
uv run pytest
```

3. **Testing Workflows**
   - Create test cases in `test_projects/vulnerable_app/`
   - Ensure SARIF output format compliance
   - Test with various input scenarios
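The testing guidance above asks workflows to emit SARIF-compliant output. A minimal sketch of a helper that builds a valid SARIF 2.1.0 skeleton from findings (the function name, parameters, and schema URL are illustrative, not part of the FuzzForge API):

```python
import json

def make_sarif(tool_name: str, findings: list) -> dict:
    """Build a minimal SARIF 2.1.0 document from
    (rule_id, message, path, line) tuples."""
    return {
        "version": "2.1.0",
        "$schema": "https://json.schemastore.org/sarif-2.1.0.json",
        "runs": [
            {
                "tool": {"driver": {"name": tool_name}},
                "results": [
                    {
                        "ruleId": rule_id,
                        "message": {"text": message},
                        "locations": [
                            {
                                "physicalLocation": {
                                    "artifactLocation": {"uri": path},
                                    "region": {"startLine": line},
                                }
                            }
                        ],
                    }
                    for rule_id, message, path, line in findings
                ],
            }
        ],
    }

# Example: one hypothetical finding serialized for inspection.
doc = make_sarif(
    "example_workflow",
    [("hardcoded-secret", "API key in source", "src/app.py", 42)],
)
print(json.dumps(doc, indent=2))
```

Validating output like this against the SARIF schema is a quick way to satisfy the compliance check before submitting a workflow.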
### Security Guidelines

**Critical Requirements:**
- Never commit secrets, API keys, or credentials
- Focus on **defensive security** tools and analysis
- Do not create tools for malicious purposes
- Test modules thoroughly before submission
- Follow responsible disclosure for security issues
- Use minimal, secure base images for containers
- Avoid running containers as root when possible
- 🔐 Never commit secrets, API keys, or credentials
- 🛡️ Focus on **defensive security** tools and analysis
- ⚠️ Do not create tools for malicious purposes
- 🧪 Test workflows thoroughly before submission
- 📋 Follow responsible disclosure for security issues

**Security Resources:**
- [OWASP Container Security](https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html)
- [CIS Docker Benchmarks](https://www.cisecurity.org/benchmark/docker)

## Contributing to Core Features

Beyond modules, you can contribute to FuzzForge's core components.

**Useful Resources:**
- [Project Structure](README.md) - Overview of the codebase
- [USAGE Guide](USAGE.md) - Installation and setup
- Python best practices: [PEP 8](https://pep8.org/)

### Core Components

- **fuzzforge-mcp** - MCP server for AI agent integration
- **fuzzforge-runner** - Module execution engine
- **fuzzforge-cli** - Command-line interface
- **fuzzforge-common** - Shared utilities and sandbox engines
- **fuzzforge-types** - Type definitions and schemas

### Development Setup

1. **Clone and Install**
   ```bash
   git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
   cd fuzzforge_ai
   uv sync --all-extras
   ```

2. **Run Tests**
   ```bash
   # Run all tests
   make test

   # Run specific package tests
   cd fuzzforge-mcp
   uv run pytest
   ```

3. **Type Checking**
   ```bash
   # Type check all packages
   make typecheck

   # Type check specific package
   cd fuzzforge-runner
   uv run mypy .
   ```

4. **Linting and Formatting**
   ```bash
   # Format code
   make format

   # Lint code
   make lint
   ```

## Bug Reports
## 🐛 Bug Reports

When reporting bugs, please include:

- **Environment**: OS, Python version, Docker version, uv version
- **FuzzForge Version**: Output of `uv run fuzzforge --version`
- **Module**: Which module or component is affected
- **Environment**: OS, Python version, Docker version
- **Steps to Reproduce**: Clear steps to recreate the issue
- **Expected Behavior**: What should happen
- **Actual Behavior**: What actually happens
- **Logs**: Relevant error messages and stack traces
- **Container Logs**: For module issues, include Docker/Podman logs
- **Screenshots**: If applicable

**Example:**
```markdown
**Environment:**
- OS: Ubuntu 22.04
- Python: 3.14.2
- Docker: 24.0.7
- uv: 0.5.13
```
Use our [Bug Report Template](.github/ISSUE_TEMPLATE/bug_report.md).

**Module:** my-custom-scanner

**Steps to Reproduce:**
1. Run `uv run fuzzforge modules run my-scanner --assets ./test-target`
2. Module fails with timeout error

**Expected:** Module completes analysis
**Actual:** Times out after 30 seconds

**Logs:**
```
ERROR: Module execution timeout
...
```

## Feature Requests
## 💡 Feature Requests

For new features, please provide:

@@ -440,124 +142,33 @@ For new features, please provide:

- **Proposed Solution**: How should it work?
- **Alternatives**: Other approaches considered
- **Implementation**: Technical considerations (optional)
- **Module vs Core**: Should this be a module or core feature?

**Example Feature Requests:**
- New module for cloud security posture management (CSPM)
- Module for analyzing smart contract vulnerabilities
- MCP tool for orchestrating multi-module workflows
- CLI command for batch module execution across multiple targets
- Support for distributed fuzzing campaigns
- Integration with CI/CD pipelines
- Module marketplace/registry features
Use our [Feature Request Template](.github/ISSUE_TEMPLATE/feature_request.md).

## Documentation
## 📚 Documentation

Help improve our documentation:

- **Module Documentation**: Document your modules in their README.md
- **API Documentation**: Update docstrings and type hints
- **User Guides**: Improve USAGE.md and tutorial content
- **Module SDK Guides**: Help document the SDK for module developers
- **MCP Integration**: Document AI agent integration patterns
- **Examples**: Add practical usage examples and workflows
- **User Guides**: Create tutorials and how-to guides
- **Workflow Documentation**: Document new security workflows
- **Examples**: Add practical usage examples

### Documentation Standards

- Use clear, concise language
- Include code examples
- Add command-line examples with expected output
- Document all configuration options
- Explain error messages and troubleshooting

### Module README Template

```markdown
# Module Name

Brief description of what this module does.

## Features

- Feature 1
- Feature 2

## Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| timeout | int | 300 | Timeout in seconds |

## Usage

\`\`\`bash
uv run fuzzforge modules run module-name --assets ./path/to/assets
\`\`\`

## Output

Describes the output structure and format.

## Examples

Practical usage examples.
```

## Recognition
## 🙏 Recognition

Contributors will be:

- Listed in our [Contributors](CONTRIBUTORS.md) file
- Mentioned in release notes for significant contributions
- Credited in module documentation (for module authors)
- Invited to join our [Discord community](https://discord.gg/8XEX33UUwZ)
- Invited to join our Discord community
- Eligible for FuzzingLabs Academy courses and swag

## Module Submission Checklist
## 📜 License

Before submitting a new module:

- [ ] Module follows SDK structure and conventions
- [ ] Dockerfile builds successfully
- [ ] Module executes without errors
- [ ] Configuration options are documented
- [ ] README.md is complete with examples
- [ ] Tests are included (pytest)
- [ ] Type hints are used throughout
- [ ] Linting passes (ruff)
- [ ] Security best practices followed
- [ ] No secrets or credentials in code
- [ ] License headers included

## Review Process

1. **Initial Review** - Maintainers review for completeness
2. **Technical Review** - Code quality and security assessment
3. **Testing** - Module tested in isolated environment
4. **Documentation Review** - Ensure docs are clear and complete
5. **Approval** - Module merged and included in next release

## License

By contributing to FuzzForge AI, you agree that your contributions will be licensed under the same license as the project (see [LICENSE](LICENSE)).

For module contributions:
- Modules you create remain under the project license
- You retain credit as the module author
- Your module may be used by others under the project license terms
By contributing to FuzzForge, you agree that your contributions will be licensed under the same [Business Source License 1.1](LICENSE) as the project.

---

## Getting Help
**Thank you for making FuzzForge better! 🚀**

Need help contributing?

- Join our [Discord](https://discord.gg/8XEX33UUwZ)
- Read the [Module SDK Documentation](fuzzforge-modules/fuzzforge-modules-sdk/README.md)
- Check the module template for examples
- Contact: contact@fuzzinglabs.com

---

**Thank you for making FuzzForge better!**

Every contribution, no matter how small, helps build a stronger security research platform. Whether you're creating a module for web security, cloud scanning, mobile analysis, or any other cybersecurity domain, your work makes FuzzForge more powerful and versatile for the entire security community!
Every contribution, no matter how small, helps build a stronger security community.
78
Makefile
@@ -1,78 +0,0 @@
.PHONY: help install sync format lint typecheck test build-hub-images clean

SHELL := /bin/bash

# Default target
help:
	@echo "FuzzForge AI Development Commands"
	@echo ""
	@echo "  make install          - Install all dependencies"
	@echo "  make sync             - Sync shared packages from upstream"
	@echo "  make format           - Format code with ruff"
	@echo "  make lint             - Lint code with ruff"
	@echo "  make typecheck        - Type check with mypy"
	@echo "  make test             - Run all tests"
	@echo "  make build-hub-images - Build all mcp-security-hub images"
	@echo "  make clean            - Clean build artifacts"
	@echo ""

# Install all dependencies
install:
	uv sync

# Sync shared packages from upstream fuzzforge-core
sync:
	@if [ -z "$(UPSTREAM)" ]; then \
		echo "Usage: make sync UPSTREAM=/path/to/fuzzforge-core"; \
		exit 1; \
	fi
	./scripts/sync-upstream.sh $(UPSTREAM)

# Format all packages
format:
	@for pkg in packages/fuzzforge-*/; do \
		if [ -f "$$pkg/pyproject.toml" ]; then \
			echo "Formatting $$pkg..."; \
			cd "$$pkg" && uv run ruff format . && cd -; \
		fi; \
	done

# Lint all packages
lint:
	@for pkg in packages/fuzzforge-*/; do \
		if [ -f "$$pkg/pyproject.toml" ]; then \
			echo "Linting $$pkg..."; \
			cd "$$pkg" && uv run ruff check . && cd -; \
		fi; \
	done

# Type check all packages
typecheck:
	@for pkg in packages/fuzzforge-*/; do \
		if [ -f "$$pkg/pyproject.toml" ] && [ -f "$$pkg/mypy.ini" ]; then \
			echo "Type checking $$pkg..."; \
			cd "$$pkg" && uv run mypy . && cd -; \
		fi; \
	done

# Run all tests
test:
	@for pkg in packages/fuzzforge-*/; do \
		if [ -f "$$pkg/pytest.ini" ]; then \
			echo "Testing $$pkg..."; \
			cd "$$pkg" && uv run pytest && cd -; \
		fi; \
	done

# Build all mcp-security-hub images for the firmware analysis pipeline
build-hub-images:
	@bash scripts/build-hub-images.sh

# Clean build artifacts
clean:
	find . -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name ".pytest_cache" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name ".mypy_cache" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name ".ruff_cache" -exec rm -rf {} + 2>/dev/null || true
	find . -type d -name "*.egg-info" -exec rm -rf {} + 2>/dev/null || true
	find . -type f -name "*.pyc" -delete 2>/dev/null || true
421
QUICKSTART_TEMPORAL.md
Normal file
@@ -0,0 +1,421 @@
# FuzzForge Temporal Architecture - Quick Start Guide

This guide walks you through starting and testing the new Temporal-based architecture.

## Prerequisites

- Docker and Docker Compose installed
- At least 2GB free RAM (core services only, workers start on-demand)
- Ports available: 7233, 8233, 9000, 9001, 8000

## Step 1: Start Core Services

```bash
# From project root
cd /path/to/fuzzforge_ai

# Start core services (Temporal, MinIO, Backend)
docker-compose up -d

# Workers are pre-built but don't auto-start (saves ~6-7GB RAM)
# They'll start automatically when workflows need them

# Check status
docker-compose ps
```

**Expected output:**
```
NAME                            STATUS    PORTS
fuzzforge-minio                 healthy   0.0.0.0:9000-9001->9000-9001/tcp
fuzzforge-temporal              healthy   0.0.0.0:7233->7233/tcp
fuzzforge-temporal-postgresql   healthy   5432/tcp
fuzzforge-backend               healthy   0.0.0.0:8000->8000/tcp
fuzzforge-minio-setup           exited (0)
# Workers NOT running (will start on-demand)
```

**First startup takes ~30-60 seconds** for health checks to pass.

## Step 2: Verify Worker Discovery

Once a worker is running, check its logs to ensure workflows are discovered:

```bash
docker logs fuzzforge-worker-rust
```

**Expected output:**
```
============================================================
FuzzForge Vertical Worker: rust
============================================================
Temporal Address: temporal:7233
Task Queue: rust-queue
Max Concurrent Activities: 5
============================================================
Discovering workflows for vertical: rust
Importing workflow module: toolbox.workflows.rust_test.workflow
✓ Discovered workflow: RustTestWorkflow from rust_test (vertical: rust)
Discovered 1 workflows for vertical 'rust'
Connecting to Temporal at temporal:7233...
✓ Connected to Temporal successfully
Creating worker on task queue: rust-queue
✓ Worker created successfully
============================================================
🚀 Worker started for vertical 'rust'
📦 Registered 1 workflows
⚙️ Registered 3 activities
📨 Listening on task queue: rust-queue
============================================================
Worker is ready to process tasks...
```

## Step 2.5: Worker Lifecycle Management (New in v0.7.0)

Workers start on-demand when workflows need them:

```bash
# Check worker status (should show Exited or not running)
docker ps -a --filter "name=fuzzforge-worker"

# Run a workflow - worker starts automatically
ff workflow run ossfuzz_campaign . project_name=zlib

# Worker is now running
docker ps --filter "name=fuzzforge-worker-ossfuzz"
```

**Configuration** (`.fuzzforge/config.yaml`):
```yaml
workers:
  auto_start_workers: true      # Default: auto-start
  auto_stop_workers: false      # Default: keep running
  worker_startup_timeout: 60    # Startup timeout in seconds
```
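
The defaults above can be applied in code the same way a client would: user-supplied keys override the documented defaults. A minimal sketch, assuming only the three keys shown in the YAML; the `load_worker_config` helper is hypothetical, not part of the FuzzForge API:

```python
# Hypothetical helper: merge a user's `workers:` section over the
# documented defaults from .fuzzforge/config.yaml.
WORKER_DEFAULTS = {
    "auto_start_workers": True,    # start workers on-demand
    "auto_stop_workers": False,    # keep workers running after completion
    "worker_startup_timeout": 60,  # seconds to wait for a worker to come up
}

def load_worker_config(user_config: dict) -> dict:
    """Return worker settings with unspecified keys filled from defaults."""
    workers = user_config.get("workers") or {}
    return {**WORKER_DEFAULTS, **workers}

if __name__ == "__main__":
    cfg = load_worker_config({"workers": {"auto_stop_workers": True}})
    print(cfg)  # auto_start_workers stays True, auto_stop_workers overridden
```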
**CLI Control**:
```bash
# Disable auto-start
ff workflow run ossfuzz_campaign . --no-auto-start

# Enable auto-stop after completion
ff workflow run ossfuzz_campaign . --wait --auto-stop
```

## Step 3: Access Web UIs

### Temporal Web UI
- URL: http://localhost:8233
- View workflows, executions, and task queues

### MinIO Console
- URL: http://localhost:9001
- Login: `fuzzforge` / `fuzzforge123`
- View uploaded targets and results

## Step 4: Test Workflow Execution

### Option A: Using Temporal CLI (tctl)

```bash
# Install tctl (if not already installed)
brew install temporal  # macOS
# or download from https://github.com/temporalio/tctl/releases

# Execute test workflow
tctl workflow run \
  --address localhost:7233 \
  --taskqueue rust-queue \
  --workflow_type RustTestWorkflow \
  --input '{"target_id": "test-123", "test_message": "Hello Temporal!"}'
```

### Option B: Using Python Client

Create `test_workflow.py`:

```python
import asyncio
from temporalio.client import Client

async def main():
    # Connect to Temporal
    client = await Client.connect("localhost:7233")

    # Start workflow
    result = await client.execute_workflow(
        "RustTestWorkflow",
        {"target_id": "test-123", "test_message": "Hello Temporal!"},
        id="test-workflow-1",
        task_queue="rust-queue"
    )

    print("Workflow result:", result)

if __name__ == "__main__":
    asyncio.run(main())
```

```bash
python test_workflow.py
```

### Option C: Upload Target and Run (Full Flow)

```python
# upload_and_run.py
import asyncio
import boto3
from pathlib import Path
from temporalio.client import Client

async def main():
    # 1. Upload target to MinIO
    s3 = boto3.client(
        's3',
        endpoint_url='http://localhost:9000',
        aws_access_key_id='fuzzforge',
        aws_secret_access_key='fuzzforge123',
        region_name='us-east-1'
    )

    # Create a test file
    test_file = Path('/tmp/test_target.txt')
    test_file.write_text('This is a test target file')

    # Upload to MinIO
    target_id = 'my-test-target-001'
    s3.upload_file(
        str(test_file),
        'targets',
        f'{target_id}/target'
    )
    print(f"✓ Uploaded target: {target_id}")

    # 2. Run workflow
    client = await Client.connect("localhost:7233")

    result = await client.execute_workflow(
        "RustTestWorkflow",
        {"target_id": target_id, "test_message": "Full flow test!"},
        id=f"workflow-{target_id}",
        task_queue="rust-queue"
    )

    print("✓ Workflow completed!")
    print("Results:", result)

if __name__ == "__main__":
    asyncio.run(main())
```

```bash
# Install dependencies
pip install temporalio boto3

# Run test
python upload_and_run.py
```

## Step 5: Monitor Execution

### View in Temporal UI

1. Open http://localhost:8233
2. Click on "Workflows"
3. Find your workflow by ID
4. Click to see:
   - Execution history
   - Activity results
   - Error stack traces (if any)

### View Logs

```bash
# Worker logs (shows activity execution)
docker logs -f fuzzforge-worker-rust

# Temporal server logs
docker logs -f fuzzforge-temporal
```

### Check MinIO Storage

1. Open http://localhost:9001
2. Login: `fuzzforge` / `fuzzforge123`
3. Browse buckets:
   - `targets/` - Uploaded target files
   - `results/` - Workflow results (if uploaded)
   - `cache/` - Worker cache (temporary)
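
The bucket layout above, together with the upload in Option C, implies a simple object-key convention: each target lives at `<target_id>/target` inside the `targets` bucket. A minimal sketch of helpers that build these keys; the function names are hypothetical illustrations of the convention, not part of the FuzzForge API:

```python
# Hypothetical helpers mirroring the bucket layout described above.
# Buckets: targets (uploads), results (workflow output), cache (temporary).

TARGETS_BUCKET = "targets"
RESULTS_BUCKET = "results"

def target_key(target_id: str) -> str:
    """Object key where a target file is stored, e.g. 'my-test-target-001/target'."""
    return f"{target_id}/target"

def result_key(target_id: str, filename: str) -> str:
    """Object key for a result artifact produced for a given target."""
    return f"{target_id}/{filename}"

if __name__ == "__main__":
    print(TARGETS_BUCKET, target_key("my-test-target-001"))
    print(RESULTS_BUCKET, result_key("my-test-target-001", "report.json"))
```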

## Troubleshooting

### Services Not Starting

```bash
# Check logs for all services
docker-compose -f docker-compose.temporal.yaml logs

# Check specific service
docker-compose -f docker-compose.temporal.yaml logs temporal
docker-compose -f docker-compose.temporal.yaml logs minio
docker-compose -f docker-compose.temporal.yaml logs worker-rust
```

### Worker Not Discovering Workflows

**Issue**: Worker logs show "No workflows found for vertical: rust"

**Solution**:
1. Check toolbox mount: `docker exec fuzzforge-worker-rust ls /app/toolbox/workflows`
2. Verify metadata.yaml exists and has `vertical: rust`
3. Check workflow.py has `@workflow.defn` decorator
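
The discovery step the worker performs can be approximated offline when debugging. A minimal sketch, assuming each workflow directory carries a `metadata.yaml` with a top-level `vertical:` key; the key is parsed naively here to avoid a YAML dependency, and this is an illustration, not the worker's actual implementation:

```python
import tempfile
from pathlib import Path

def discover_verticals(workflows_dir: Path) -> dict:
    """Map workflow name -> vertical by scanning each metadata.yaml.

    Naive parse: takes the first line starting with 'vertical:'.
    """
    found = {}
    for wf in sorted(workflows_dir.iterdir()):
        meta = wf / "metadata.yaml"
        if not wf.is_dir() or not meta.exists():
            continue
        for line in meta.read_text().splitlines():
            if line.strip().startswith("vertical:"):
                found[wf.name] = line.split(":", 1)[1].strip()
                break
    return found

if __name__ == "__main__":
    # Self-contained demo with a temporary toolbox layout
    root = Path(tempfile.mkdtemp())
    (root / "rust_test").mkdir()
    (root / "rust_test" / "metadata.yaml").write_text("name: rust_test\nvertical: rust\n")
    print(discover_verticals(root))  # {'rust_test': 'rust'}
```

If this prints an empty dict against the real `/app/toolbox/workflows` mount, the metadata files are missing or the mount is wrong.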
### Cannot Connect to Temporal

**Issue**: `Failed to connect to Temporal`

**Solution**:
```bash
# Wait for Temporal to be healthy
docker-compose -f docker-compose.temporal.yaml ps

# Check Temporal health manually
curl http://localhost:8233

# Restart Temporal if needed
docker-compose -f docker-compose.temporal.yaml restart temporal
```

### MinIO Connection Failed

**Issue**: `Failed to download target`

**Solution**:
```bash
# Check MinIO is running
docker ps | grep minio

# Check buckets exist
docker exec fuzzforge-minio mc ls fuzzforge/

# Verify target was uploaded
docker exec fuzzforge-minio mc ls fuzzforge/targets/
```

### Workflow Hangs

**Issue**: Workflow starts but never completes

**Check**:
1. Worker logs for errors: `docker logs fuzzforge-worker-rust`
2. Activity timeouts in workflow code
3. Target file actually exists in MinIO

## Scaling

### Add More Workers

```bash
# Scale rust workers horizontally
docker-compose -f docker-compose.temporal.yaml up -d --scale worker-rust=3

# Verify all workers are running
docker ps | grep worker-rust
```

### Increase Concurrent Activities

Edit `docker-compose.temporal.yaml`:

```yaml
worker-rust:
  environment:
    MAX_CONCURRENT_ACTIVITIES: 10  # Increase from 5
```

```bash
# Apply changes
docker-compose -f docker-compose.temporal.yaml up -d worker-rust
```

## Cleanup

```bash
# Stop all services
docker-compose -f docker-compose.temporal.yaml down

# Remove volumes (WARNING: deletes all data)
docker-compose -f docker-compose.temporal.yaml down -v

# Remove everything including images
docker-compose -f docker-compose.temporal.yaml down -v --rmi all
```

## Next Steps

1. **Add More Workflows**: Create workflows in `backend/toolbox/workflows/`
2. **Add More Verticals**: Create new worker types (android, web, etc.) - see `workers/README.md`
3. **Integrate with Backend**: Update FastAPI backend to use Temporal client
4. **Update CLI**: Modify `ff` CLI to work with Temporal workflows

## Useful Commands

```bash
# View all logs
docker-compose -f docker-compose.temporal.yaml logs -f

# View specific service logs
docker-compose -f docker-compose.temporal.yaml logs -f worker-rust

# Restart a service
docker-compose -f docker-compose.temporal.yaml restart worker-rust

# Check service status
docker-compose -f docker-compose.temporal.yaml ps

# Execute command in worker
docker exec -it fuzzforge-worker-rust bash

# View worker Python environment
docker exec fuzzforge-worker-rust pip list

# Check workflow discovery manually
docker exec fuzzforge-worker-rust python -c "
from pathlib import Path
import yaml
for w in Path('/app/toolbox/workflows').iterdir():
    if w.is_dir():
        meta = w / 'metadata.yaml'
        if meta.exists():
            print(f'{w.name}: {yaml.safe_load(meta.read_text()).get(\"vertical\")}')"
```

## Architecture Overview

```
┌─────────────┐     ┌──────────────┐     ┌──────────────┐
│  Temporal   │────▶│  Task Queue  │────▶│ Worker-Rust  │
│   Server    │     │  rust-queue  │     │ (Long-lived) │
└─────────────┘     └──────────────┘     └──────┬───────┘
       │                                        │
       ▼                                        ▼
┌─────────────┐                         ┌──────────────┐
│  Postgres   │                         │    MinIO     │
│   (State)   │                         │  (Storage)   │
└─────────────┘                         └──────┬───────┘
                                               │
                                        ┌──────┴──────┐
                                        │             │
                                   ┌────▼────┐  ┌─────▼────┐
                                   │ Targets │  │ Results  │
                                   └─────────┘  └──────────┘
```

## Support

- **Documentation**: See `ARCHITECTURE.md` for detailed design
- **Worker Guide**: See `workers/README.md` for adding verticals
- **Issues**: Open GitHub issue with logs and steps to reproduce
347
README.md
@@ -1,266 +1,231 @@
<h1 align="center"> FuzzForge AI</h1>

<p align="center">
<img src="docs/static/img/fuzzforge_banner_github.png" alt="FuzzForge Banner" width="100%">
</p>

<h1 align="center">🚧 FuzzForge is under active development</h1>

<p align="center"><strong>AI-powered workflow automation and AI Agents for AppSec, Fuzzing & Offensive Security</strong></p>

<p align="center">
<a href="https://discord.gg/8XEX33UUwZ/"><img src="https://img.shields.io/discord/1420767905255133267?logo=discord&label=Discord" alt="Discord"></a>
<a href="LICENSE"><img src="https://img.shields.io/badge/license-BSL%20%2B%20Apache-orange" alt="License: BSL + Apache"></a>
<a href="https://www.python.org/downloads/"><img src="https://img.shields.io/badge/python-3.11%2B-blue" alt="Python 3.11+"/></a>
<a href="https://fuzzforge.ai"><img src="https://img.shields.io/badge/Website-fuzzforge.ai-blue" alt="Website"/></a>
<img src="https://img.shields.io/badge/version-0.7.0-green" alt="Version">
<a href="https://github.com/FuzzingLabs/fuzzforge_ai/stargazers"><img src="https://img.shields.io/github/stars/FuzzingLabs/fuzzforge_ai?style=social" alt="GitHub Stars"></a>
</p>

<p align="center">
<sub>
<a href="#-overview"><b>Overview</b></a>
• <a href="#-key-features"><b>Features</b></a>
• <a href="#-installation"><b>Installation</b></a>
• <a href="#-quickstart"><b>Quickstart</b></a>
• <a href="#ai-powered-workflow-execution"><b>AI Demo</b></a>
• <a href="#-contributing"><b>Contributing</b></a>
• <a href="#%EF%B8%8F-roadmap"><b>Roadmap</b></a>
</sub>
</p>

---

## 🚀 Overview

**FuzzForge** helps security researchers and engineers automate **application security** and **offensive security** workflows with the power of AI and fuzzing frameworks.

- Orchestrate static & dynamic analysis
- Automate vulnerability research
- Scale AppSec testing with AI agents
- Build, share & reuse workflows across teams

FuzzForge is **open source**, built to empower security teams, researchers, and the community.

> 🚧 FuzzForge is under active development. Expect breaking changes.
>
> **Note:** Fuzzing workflows (`atheris_fuzzing`, `cargo_fuzzing`, `ossfuzz_campaign`) are in early development. OSS-Fuzz integration is under heavy active development. For stable workflows, use: `security_assessment`, `gitleaks_detection`, `trufflehog_detection`, or `llm_secret_detection`.

---

## Demo - Manual Workflow Setup

_Setting up and running security workflows through the interface_

👉 More installation options in the [Documentation](https://docs.fuzzforge.ai).

---

## ✨ Key Features

- 🤖 **AI Agents for Security** – Specialized agents for AppSec, reversing, and fuzzing
- 🛠 **Workflow Automation** – Define & execute AppSec workflows as code
- 📈 **Vulnerability Research at Scale** – Rediscover 1-days & find 0-days with automation
- 🔗 **Fuzzer Integration** – Atheris (Python), cargo-fuzz (Rust), OSS-Fuzz campaigns
- 🌐 **Community Marketplace** – Share workflows, corpora, PoCs, and modules
- 🔒 **Enterprise Ready** – Team/Corp cloud tiers for scaling offensive security

---

## ⭐ Support the Project

If you find FuzzForge useful, please star the repo to support development 🚀

---

## 🔍 Secret Detection Benchmarks

FuzzForge includes three secret detection workflows benchmarked on a controlled dataset of **32 documented secrets** (12 Easy, 10 Medium, 10 Hard):

| Tool | Recall | Secrets Found | Speed |
|------|--------|---------------|-------|
| **LLM (gpt-5-mini)** | **84.4%** | 41 | 618s |
| **LLM (gpt-4o-mini)** | 56.2% | 30 | 297s |
| **Gitleaks** | 37.5% | 12 | 5s |
| **TruffleHog** | 0.0% | 1 | 5s |

📊 [Full benchmark results and analysis](backend/benchmarks/by_category/secret_detection/results/comparison_report.md)

The LLM-based detector excels at finding obfuscated and hidden secrets through semantic analysis, while pattern-based tools (Gitleaks) offer speed for standard secret formats.
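
Recall in this table is measured against the 32 documented ground-truth secrets, which is why "Secrets Found" can exceed 32 when a tool also flags duplicates or undocumented strings (an interpretation assumed here). A minimal sketch of the metric under that assumption:

```python
def recall_pct(true_positives: int, total_documented: int = 32) -> float:
    """Recall as a percentage of the documented ground-truth secrets."""
    return round(100.0 * true_positives / total_documented, 1)

if __name__ == "__main__":
    # 27 of the 32 documented secrets matched -> 84.4 (the gpt-5-mini row)
    print(recall_pct(27))  # 84.4
    # 12 matched -> 37.5 (the Gitleaks row)
    print(recall_pct(12))  # 37.5
```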
---
|
||||
|
||||
## 📦 Installation
|
||||
|
||||
### Prerequisites
|
||||
### Requirements
|
||||
|
||||
- **Python 3.12+**
|
||||
- **[uv](https://docs.astral.sh/uv/)** package manager
|
||||
- **Docker** ([Install Docker](https://docs.docker.com/get-docker/)) or Podman
- **Python 3.11+** (Python 3.11 or higher is required)

### Quick Install

**uv Package Manager**

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

**Docker**

For containerized workflows, see the [Docker Installation Guide](https://docs.docker.com/get-docker/).

#### Configure AI Agent API Keys (Optional)

For AI-powered workflows, configure your LLM API keys:

```bash
cp volumes/env/.env.example volumes/env/.env
# Edit volumes/env/.env and add your API keys (OpenAI, Anthropic, Google, etc.)
```

This configuration is required for:

- the `llm_secret_detection` workflow
- AI agent features (`ff ai agent`)

Basic security workflows (gitleaks, trufflehog, security_assessment) work without it.

### CLI Installation

After installing the requirements, install the FuzzForge CLI:

```bash
# Clone the repository
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai

# Install dependencies
uv sync

# Install the CLI with uv (from the repository root)
uv tool install --python python3.12 .
```

### Link the Security Hub

```bash
# Clone the MCP Security Hub
git clone https://github.com/FuzzingLabs/mcp-security-hub.git ~/.fuzzforge/hubs/mcp-security-hub

# Build the Docker images for the hub tools
./scripts/build-hub-images.sh
```

Or use the terminal UI (`uv run fuzzforge ui`) to link hubs interactively.

### Configure MCP for Your AI Agent

```bash
# For GitHub Copilot
uv run fuzzforge mcp install copilot

# For Claude Code (CLI)
uv run fuzzforge mcp install claude-code

# For Claude Desktop (standalone app)
uv run fuzzforge mcp install claude-desktop

# Verify installation
uv run fuzzforge mcp status
```

**Restart your editor** and your AI agent will have access to FuzzForge tools!

---
## 🧑💻 Usage

Once installed, just talk to your AI agent:

```
"What security tools are available?"
"Scan this firmware image for vulnerabilities"
"Analyze this binary with radare2"
"Run nuclei against https://example.com"
```

The agent will use FuzzForge to discover the right hub tools, chain them into a pipeline, and return results — all without you touching a terminal.

## ⚡ Quickstart

Run your first workflow with **Temporal orchestration** and **automatic file upload**:

```bash
# 1. Clone the repo
git clone https://github.com/fuzzinglabs/fuzzforge_ai.git
cd fuzzforge_ai

# 2. Copy the default LLM env config
cp volumes/env/.env.example volumes/env/.env

# 3. Start FuzzForge with Temporal
docker compose up -d

# 4. Start the Python worker (needed for the security_assessment workflow)
docker compose up -d worker-python
```

> The first launch can take 2-3 minutes for services to initialize ☕
>
> Workers don't auto-start by default (this saves RAM). Start the worker you need before running workflows.

See the [Usage Guide](USAGE.md) for detailed setup and advanced workflows.

---
```bash
# 5. Run your first workflow (files are automatically uploaded)
cd test_projects/vulnerable_app/
fuzzforge init                          # Initialize a FuzzForge project
ff workflow run security_assessment .   # Start the workflow - the CLI uploads files automatically!

# The CLI will:
# - Detect the local directory
# - Create a compressed tarball
# - Upload it to the backend (via MinIO)
# - Start the workflow on a vertical worker
```

**What's running:**

- **Temporal**: workflow orchestration (UI at http://localhost:8080)
- **MinIO**: file storage for targets (console at http://localhost:9001)
- **Vertical workers**: pre-built workers with security toolchains
- **Backend API**: FuzzForge REST API (http://localhost:8000)

## 📁 Project Structure

```
fuzzforge_ai/
├── fuzzforge-mcp/        # MCP server — the core of FuzzForge
├── fuzzforge-cli/        # Command-line interface & terminal UI
├── fuzzforge-common/     # Shared abstractions (containers, storage)
├── fuzzforge-runner/     # Container execution engine (Docker/Podman)
├── fuzzforge-tests/      # Integration tests
├── mcp-security-hub/     # Default hub: 36 offensive security MCP servers
└── scripts/              # Hub image build scripts
```
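The packaging step the CLI performs (local directory → compressed tarball, ready for upload) can be sketched in plain Python. `make_target_tarball` is an illustrative helper under assumed behavior, not the actual CLI code:

```python
import tarfile
import tempfile
from pathlib import Path


def make_target_tarball(target_dir: str, out_path: str) -> Path:
    """Pack a local target directory into a gzip-compressed tarball.

    Illustrative sketch of the CLI's upload-packaging step; the real
    implementation lives inside the FuzzForge CLI.
    """
    target = Path(target_dir).resolve()
    out = Path(out_path)
    with tarfile.open(out, "w:gz") as tar:
        # Store entries relative to the target directory name
        tar.add(target, arcname=target.name)
    return out


if __name__ == "__main__":
    # Demo on a throwaway directory
    tmp = Path(tempfile.mkdtemp())
    (tmp / "app.py").write_text("print('hello')\n")
    archive = make_target_tarball(str(tmp), str(tmp / "target.tar.gz"))
    with tarfile.open(archive) as tar:
        print(sorted(m.name for m in tar.getmembers()))
```

The resulting archive is what gets uploaded to the backend's object store before the workflow starts.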
## AI-Powered Workflow Execution

![AI Agents Demo](fuzzforge_ai_agents.gif)

_AI agents automatically analyzing code and providing security insights_

## 📚 Resources

- 🌐 [Website](https://fuzzforge.ai)
- 📖 [Documentation](https://docs.fuzzforge.ai)
- 💬 [Community Discord](https://discord.gg/8XEX33UUwZ)
- 🎓 [FuzzingLabs Academy](https://academy.fuzzinglabs.com/?coupon=GITHUB_FUZZFORGE)

---

## 🤝 Contributing

We welcome contributions from the community! There are many ways to help:

- 🐛 Report bugs via [GitHub Issues](../../issues)
- 💡 Suggest features or improvements
- 🔧 Submit pull requests with fixes or enhancements
- 🔌 Add new MCP servers to the [Security Hub](https://github.com/FuzzingLabs/mcp-security-hub)
- Share workflows, corpora, or modules with the community

See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

---

## 🗺️ Roadmap

Planned features and improvements:

- 📦 Public workflow & module marketplace
- 🤖 New specialized AI agents (Rust, Go, Android, Automotive)
- 🔗 Expanded fuzzer integrations (LibFuzzer, Jazzer, more network fuzzers)
- ☁️ Multi-tenant SaaS platform with team collaboration
- 📊 Advanced reporting & analytics

👉 Follow updates in the [GitHub issues](../../issues) and on [Discord](https://discord.gg/8XEX33UUwZ).

---

## 📜 License

FuzzForge is released under the **Business Source License (BSL) 1.1**, with an automatic fallback to **Apache 2.0** after 4 years.
See [LICENSE](LICENSE) and [LICENSE-APACHE](LICENSE-APACHE) for details.

<p align="center">
  <strong>Maintained by <a href="https://fuzzinglabs.com">FuzzingLabs</a></strong>
  <br>
</p>
---

### ROADMAP.md (deleted, 125 lines)
# FuzzForge AI Roadmap

This document outlines the planned features and development direction for FuzzForge AI.

---

## 🎯 Upcoming Features

### 1. MCP Security Hub Integration

**Status:** 🔄 Planned

Integrate [mcp-security-hub](https://github.com/FuzzingLabs/mcp-security-hub) tools into FuzzForge, giving AI agents access to 28 MCP servers and 163+ security tools through a unified interface.

#### How It Works

Unlike native FuzzForge modules (built with the SDK), mcp-security-hub tools are **standalone MCP servers**. The integration will bridge these tools so they can be:

- Discovered via `list_modules` alongside native modules
- Executed through FuzzForge's orchestration layer
- Chained with native modules in workflows

| Aspect | Native Modules | MCP Hub Tools |
|--------|----------------|---------------|
| **Runtime** | FuzzForge SDK container | Standalone MCP server container |
| **Protocol** | Direct execution | MCP-to-MCP bridge |
| **Configuration** | Module config | Tool-specific args |
| **Output** | FuzzForge results format | Tool-native format (normalized) |

#### Goals

- Unified discovery of all available tools (native + hub)
- Orchestrate hub tools through FuzzForge's workflow engine
- Normalize outputs for consistent result handling
- No modification required to mcp-security-hub tools
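The output-normalization goal can be illustrated with a small sketch. The finding shape and field names below are hypothetical, since the normalized schema is not yet specified:

```python
from typing import Any

# Hypothetical normalized finding shape (the real schema is not yet specified)
NORMALIZED_KEYS = ("tool", "severity", "title", "location")


def normalize_finding(tool: str, raw: dict[str, Any]) -> dict[str, Any]:
    """Map a tool-native finding dict onto a common shape.

    Each tool reports severity/title under different keys, so we probe a few
    likely candidates and fall back to defaults.
    """
    severity = raw.get("severity") or raw.get("level") or "unknown"
    title = raw.get("title") or raw.get("name") or raw.get("check") or "untitled"
    location = raw.get("file") or raw.get("url") or raw.get("target") or ""
    return {
        "tool": tool,
        "severity": str(severity).lower(),
        "title": title,
        "location": location,
    }


# The same pipeline can then ingest nuclei-style and semgrep-style output
nuclei_raw = {"name": "CVE-2021-44228", "severity": "CRITICAL", "url": "https://example.com"}
semgrep_raw = {"check": "sql-injection", "level": "high", "file": "app/db.py"}
print(normalize_finding("nuclei", nuclei_raw))
print(normalize_finding("semgrep", semgrep_raw))
```

With every tool funneled through one shape, downstream result handling and reporting can stay tool-agnostic.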
#### Planned Tool Categories

| Category | Tools | Example Use Cases |
|----------|-------|-------------------|
| **Reconnaissance** | nmap, masscan, whatweb, shodan | Network scanning, service discovery |
| **Web Security** | nuclei, sqlmap, ffuf, nikto | Vulnerability scanning, fuzzing |
| **Binary Analysis** | radare2, binwalk, yara, capa, ghidra | Reverse engineering, malware analysis |
| **Cloud Security** | trivy, prowler | Container scanning, cloud auditing |
| **Secrets Detection** | gitleaks | Credential scanning |
| **OSINT** | maigret, dnstwist | Username tracking, typosquatting |
| **Threat Intel** | virustotal, otx | Malware analysis, IOC lookup |

#### Example Workflow

```
You: "Scan example.com for vulnerabilities and analyze any suspicious binaries"

AI Agent:
1. Uses nmap module for port discovery
2. Uses nuclei module for vulnerability scanning
3. Uses binwalk module to extract firmware
4. Uses yara module for malware detection
5. Generates consolidated report
```

---

### 2. User Interface

**Status:** 🔄 Planned

A graphical interface to manage FuzzForge without the command line.

#### Goals

- Provide an alternative to the CLI for users who prefer visual tools
- Make configuration and monitoring more accessible
- Complement (not replace) the CLI experience

#### Planned Capabilities

| Capability | Description |
|------------|-------------|
| **Configuration** | Change MCP server settings, engine options, paths |
| **Module Management** | Browse, configure, and launch modules |
| **Execution Monitoring** | View running tasks, logs, progress, metrics |
| **Project Overview** | Manage projects and browse execution results |
| **Workflow Management** | Create and run multi-module workflows |

---

## 📋 Backlog

Features under consideration for future releases:

| Feature | Description |
|---------|-------------|
| **Module Marketplace** | Browse and install community modules |
| **Scheduled Executions** | Run modules on a schedule (cron-style) |
| **Team Collaboration** | Share projects, results, and workflows |
| **Reporting Engine** | Generate PDF/HTML security reports |
| **Notifications** | Slack, Discord, email alerts for findings |

---

## ✅ Completed

| Feature | Version | Date |
|---------|---------|------|
| Docker as default engine | 0.1.0 | Jan 2026 |
| MCP server for AI agents | 0.1.0 | Jan 2026 |
| CLI for project management | 0.1.0 | Jan 2026 |
| Continuous execution mode | 0.1.0 | Jan 2026 |
| Workflow orchestration | 0.1.0 | Jan 2026 |

---

## 💬 Feedback

Have suggestions for the roadmap?

- Open an issue on [GitHub](https://github.com/FuzzingLabs/fuzzforge_ai/issues)
- Join our [Discord](https://discord.gg/8XEX33UUwZ)

---

<p align="center">
  <strong>Built with ❤️ by <a href="https://fuzzinglabs.com">FuzzingLabs</a></strong>
</p>
---

### USAGE.md (deleted, 517 lines)
# FuzzForge AI Usage Guide

This guide covers everything you need to know to get started with FuzzForge AI — from installation to linking your first MCP hub and running security research workflows with AI.

> **FuzzForge is designed to be used with AI agents** (GitHub Copilot, Claude, etc.) via MCP.
> A terminal UI (`fuzzforge ui`) is provided for managing agents and hubs.
> The CLI is available for advanced users, but the primary experience is natural language interaction with your AI assistant.

---

## Table of Contents

- [Quick Start](#quick-start)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Terminal UI](#terminal-ui)
  - [Launching the UI](#launching-the-ui)
  - [Dashboard](#dashboard)
  - [Agent Setup](#agent-setup)
  - [Hub Manager](#hub-manager)
- [MCP Hub System](#mcp-hub-system)
  - [What is an MCP Hub?](#what-is-an-mcp-hub)
  - [FuzzingLabs Security Hub](#fuzzinglabs-security-hub)
  - [Linking a Custom Hub](#linking-a-custom-hub)
  - [Building Hub Images](#building-hub-images)
- [MCP Server Configuration (CLI)](#mcp-server-configuration-cli)
  - [GitHub Copilot](#github-copilot)
  - [Claude Code (CLI)](#claude-code-cli)
  - [Claude Desktop](#claude-desktop)
- [Using FuzzForge with AI](#using-fuzzforge-with-ai)
- [CLI Reference](#cli-reference)
- [Environment Variables](#environment-variables)
- [Troubleshooting](#troubleshooting)

---

## Quick Start

> **Prerequisites:** You need [uv](https://docs.astral.sh/uv/) and [Docker](https://docs.docker.com/get-docker/) installed.
> See the [Prerequisites](#prerequisites) section for details.

```bash
# 1. Clone and install
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai
uv sync

# 2. Launch the terminal UI
uv run fuzzforge ui

# 3. Press 'h' → "FuzzingLabs Hub" to clone & link the default security hub
# 4. Select an agent row and press Enter to install the MCP server for your agent
# 5. Build the Docker images for the hub tools (required before tools can run)
./scripts/build-hub-images.sh

# 6. Restart your AI agent and start talking:
#    "What security tools are available?"
#    "Scan this binary with binwalk and yara"
#    "Analyze this Rust crate for fuzzable functions"
```

Or do it entirely from the command line:

```bash
# Install MCP for your AI agent
uv run fuzzforge mcp install copilot     # For VS Code + GitHub Copilot
# OR
uv run fuzzforge mcp install claude-code # For Claude Code CLI

# Clone and link the default security hub
git clone git@github.com:FuzzingLabs/mcp-security-hub.git ~/.fuzzforge/hubs/mcp-security-hub

# Build hub tool images (required — tools only run once their image is built)
./scripts/build-hub-images.sh

# Restart your AI agent — done!
```

> **Note:** FuzzForge uses Docker by default. Podman is also supported via `--engine podman`.

---

## Prerequisites

Before installing FuzzForge AI, ensure you have:

- **Python 3.12+** — [Download Python](https://www.python.org/downloads/)
- **uv** package manager — [Install uv](https://docs.astral.sh/uv/)
- **Docker** — Container runtime ([Install Docker](https://docs.docker.com/get-docker/))
- **Git** — For cloning hub repositories

### Installing uv

```bash
# Linux/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or with pip
pip install uv
```

### Installing Docker

```bash
# Linux (Ubuntu/Debian)
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in for group changes to take effect

# macOS/Windows
# Install Docker Desktop from https://docs.docker.com/get-docker/
```

> **Note:** Podman is also supported. Use `--engine podman` with CLI commands
> or set the `FUZZFORGE_ENGINE=podman` environment variable.

---

## Installation

### 1. Clone the Repository

```bash
git clone https://github.com/FuzzingLabs/fuzzforge_ai.git
cd fuzzforge_ai
```

### 2. Install Dependencies

```bash
uv sync
```

This installs all FuzzForge components in a virtual environment.

### 3. Verify Installation

```bash
uv run fuzzforge --help
```

---

## Terminal UI

FuzzForge ships with a terminal user interface (TUI) built on [Textual](https://textual.textualize.io/) for managing AI agents and MCP hub servers from a single dashboard.

### Launching the UI

```bash
uv run fuzzforge ui
```

### Dashboard

The main screen is split into two panels:

| Panel | Content |
|-------|---------|
| **AI Agents** (left) | Shows GitHub Copilot, Claude Desktop, and Claude Code with live link status and config file path |
| **Hub Servers** (right) | Shows all configured MCP hub tools with Docker image name, source hub, and build status (✓ Ready / ✗ Not built) |

### Keyboard Shortcuts

| Key | Action |
|-----|--------|
| `Enter` | **Select** — Act on the selected row (setup/unlink an agent) |
| `h` | **Hub Manager** — Open the hub management screen |
| `r` | **Refresh** — Re-check all agent and hub statuses |
| `q` | **Quit** |

### Agent Setup

Select an agent row in the AI Agents table and press `Enter`:

- **If the agent is not linked** → a setup dialog opens asking for your container engine (Docker or Podman), then installs the FuzzForge MCP configuration
- **If the agent is already linked** → a confirmation dialog offers to unlink it (removes the `fuzzforge` entry without touching other MCP servers)

The setup auto-detects:

- FuzzForge installation root
- Docker/Podman socket path
- Hub configuration from `hub-config.json`

### Hub Manager

Press `h` to open the hub manager. This is where you manage your MCP hub repositories:

| Button | Action |
|--------|--------|
| **FuzzingLabs Hub** | One-click clone of the official [mcp-security-hub](https://github.com/FuzzingLabs/mcp-security-hub) repository — clones to `~/.fuzzforge/hubs/mcp-security-hub`, scans for tools, and registers them in `hub-config.json` |
| **Link Path** | Link any local directory as a hub — enter a name and path, and FuzzForge scans it for `category/tool-name/Dockerfile` patterns |
| **Clone URL** | Clone any git repository and link it as a hub |
| **Remove** | Unlink the selected hub and remove its servers from the configuration |

The hub table shows:

- **Name** — Hub name (★ prefix for the default hub)
- **Path** — Local directory path
- **Servers** — Number of MCP tools discovered
- **Source** — Git URL or "local"

---

## MCP Hub System

### What is an MCP Hub?

An MCP hub is a directory containing one or more containerized MCP tools, organized by category:

```
my-hub/
├── category-a/
│   ├── tool-1/
│   │   └── Dockerfile
│   └── tool-2/
│       └── Dockerfile
├── category-b/
│   └── tool-3/
│       └── Dockerfile
└── ...
```

FuzzForge scans for the pattern `category/tool-name/Dockerfile` and auto-generates server configuration entries for each discovered tool.
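The scan described above can be sketched with `pathlib`. `scan_hub` is an illustrative helper, not FuzzForge's actual discovery code (which also writes `hub-config.json` entries):

```python
from pathlib import Path
import tempfile


def scan_hub(hub_root: str) -> dict[str, list[str]]:
    """Discover hub tools laid out as category/tool-name/Dockerfile.

    Returns {category: [tool names]}. Illustrative sketch only.
    """
    tools: dict[str, list[str]] = {}
    for dockerfile in sorted(Path(hub_root).glob("*/*/Dockerfile")):
        category = dockerfile.parent.parent.name
        tools.setdefault(category, []).append(dockerfile.parent.name)
    return tools


if __name__ == "__main__":
    # Build a tiny throwaway hub and scan it
    hub = Path(tempfile.mkdtemp())
    for cat, tool in [("recon", "nmap-mcp"), ("binary", "yara-mcp")]:
        (hub / cat / tool).mkdir(parents=True)
        (hub / cat / tool / "Dockerfile").write_text("FROM alpine\n")
    print(scan_hub(str(hub)))  # → {'binary': ['yara-mcp'], 'recon': ['nmap-mcp']}
```

Any directory matching this two-level layout can therefore be linked as a hub without extra metadata.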
### FuzzingLabs Security Hub

The default MCP hub is [mcp-security-hub](https://github.com/FuzzingLabs/mcp-security-hub), maintained by FuzzingLabs. It includes **40+ security tools** across categories:

| Category | Tools |
|----------|-------|
| **Reconnaissance** | nmap, masscan, shodan, zoomeye, whatweb, pd-tools, externalattacker, networksdb |
| **Binary Analysis** | binwalk, yara, capa, radare2, ghidra, ida |
| **Code Security** | semgrep, rust-analyzer, harness-tester, cargo-fuzzer, crash-analyzer |
| **Web Security** | nuclei, nikto, sqlmap, ffuf, burp, waybackurls |
| **Fuzzing** | boofuzz, dharma |
| **Exploitation** | searchsploit |
| **Secrets** | gitleaks |
| **Cloud Security** | trivy, prowler, roadrecon |
| **OSINT** | maigret, dnstwist |
| **Threat Intel** | virustotal, otx |
| **Password Cracking** | hashcat |
| **Blockchain** | medusa, solazy, daml-viewer |

**Clone it via the UI:**

1. `uv run fuzzforge ui`
2. Press `h` → click **FuzzingLabs Hub**
3. Wait for the clone to finish — servers are auto-registered

**Or clone manually:**

```bash
git clone git@github.com:FuzzingLabs/mcp-security-hub.git ~/.fuzzforge/hubs/mcp-security-hub
```

### Linking a Custom Hub

You can link any directory that follows the `category/tool-name/Dockerfile` layout:

**Via the UI:**

1. Press `h` → **Link Path**
2. Enter a name and the directory path

**Via the CLI (planned):** Not yet available — use the UI.

### Building Hub Images

After linking a hub, you need to build the Docker images before the tools can be used:

```bash
# Build all images from the default security hub
./scripts/build-hub-images.sh

# Or build a single tool image
docker build -t semgrep-mcp:latest mcp-security-hub/code-security/semgrep-mcp/
```

The dashboard hub table shows ✓ Ready for built images and ✗ Not built for missing ones.

---

## MCP Server Configuration (CLI)

If you prefer the command line over the TUI, you can configure agents directly:

### GitHub Copilot

```bash
uv run fuzzforge mcp install copilot
```

The command auto-detects:

- **FuzzForge root** — Where FuzzForge is installed
- **Docker socket** — Auto-detects `/var/run/docker.sock`

**Optional overrides:**

```bash
uv run fuzzforge mcp install copilot --engine podman
```

**After installation:** Restart VS Code. FuzzForge tools appear in GitHub Copilot Chat.

### Claude Code (CLI)

```bash
uv run fuzzforge mcp install claude-code
```

Installs to `~/.claude.json`. FuzzForge tools are available from any directory after restarting Claude.

### Claude Desktop

```bash
uv run fuzzforge mcp install claude-desktop
```

**After installation:** Restart Claude Desktop.

### Check Status

```bash
uv run fuzzforge mcp status
```

### Remove Configuration

```bash
uv run fuzzforge mcp uninstall copilot
uv run fuzzforge mcp uninstall claude-code
uv run fuzzforge mcp uninstall claude-desktop
```

---

## Using FuzzForge with AI

Once MCP is configured and hub images are built, interact with FuzzForge through natural language with your AI assistant.

### Example Conversations

**Discover available tools:**

```
You: "What security tools are available in FuzzForge?"
AI: Queries hub tools → "I found 15 tools across categories: nmap for
    port scanning, binwalk for firmware analysis, semgrep for code
    scanning, cargo-fuzzer for Rust fuzzing..."
```

**Analyze a binary:**

```
You: "Extract and analyze this firmware image"
AI: Uses binwalk to extract → yara for pattern matching → capa for
    capability detection → "Found 3 embedded filesystems, 2 YARA
    matches for known vulnerabilities..."
```

**Fuzz Rust code:**

```
You: "Analyze this Rust crate for functions I should fuzz"
AI: Uses rust-analyzer → "Found 3 fuzzable entry points..."

You: "Start fuzzing parse_input for 10 minutes"
AI: Uses cargo-fuzzer → "Fuzzing session started. 2 crashes found..."
```

**Scan for vulnerabilities:**

```
You: "Scan this codebase with semgrep for security issues"
AI: Uses semgrep-mcp → "Found 5 findings: 2 high severity SQL injection
    patterns, 3 medium severity hardcoded secrets..."
```

---

## CLI Reference

### UI Command

```bash
uv run fuzzforge ui   # Launch the terminal dashboard
```

### MCP Commands

```bash
uv run fuzzforge mcp status              # Check agent configuration status
uv run fuzzforge mcp install <agent>     # Install MCP config (copilot|claude-code|claude-desktop)
uv run fuzzforge mcp uninstall <agent>   # Remove MCP config
uv run fuzzforge mcp generate <agent>    # Preview config without installing
```

### Project Commands

```bash
uv run fuzzforge project init            # Initialize a project
uv run fuzzforge project info            # Show project info
uv run fuzzforge project executions      # List executions
uv run fuzzforge project results <id>    # Get execution results
```

---

## Environment Variables

Configure FuzzForge using environment variables:

```bash
# Override the FuzzForge installation root (auto-detected from cwd by default)
export FUZZFORGE_ROOT=/path/to/fuzzforge_ai

# Override the user-global data directory (default: ~/.fuzzforge)
# Useful for isolated testing without touching your real installation
export FUZZFORGE_USER_DIR=/tmp/my-fuzzforge-test

# Storage path for projects and execution results (default: <workspace>/.fuzzforge/storage)
export FUZZFORGE_STORAGE__PATH=/path/to/storage

# Container engine (Docker is default)
export FUZZFORGE_ENGINE__TYPE=docker   # or podman

# Podman-specific container storage paths
export FUZZFORGE_ENGINE__GRAPHROOT=~/.fuzzforge/containers/storage
export FUZZFORGE_ENGINE__RUNROOT=~/.fuzzforge/containers/run
```
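The double-underscore names above (e.g. `FUZZFORGE_ENGINE__TYPE`) follow the common nested-settings convention where `__` separates config sections from keys. A minimal sketch of that mapping, under assumed behavior (not FuzzForge's actual settings loader):

```python
def load_settings(prefix: str, environ: dict[str, str]) -> dict:
    """Collect PREFIX_SECTION__KEY variables into a nested dict.

    Sketch of the '__' delimiter convention; assumed behavior only.
    """
    settings: dict = {}
    for name, value in environ.items():
        if not name.startswith(prefix + "_"):
            continue  # skip unrelated variables
        # Strip the prefix, then split sections on the '__' delimiter
        path = name[len(prefix) + 1:].lower().split("__")
        node = settings
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return settings


env = {
    "FUZZFORGE_ENGINE__TYPE": "podman",
    "FUZZFORGE_STORAGE__PATH": "/tmp/storage",
    "FUZZFORGE_ROOT": "/opt/fuzzforge_ai",
    "UNRELATED": "ignored",
}
print(load_settings("FUZZFORGE", env))
# → {'engine': {'type': 'podman'}, 'storage': {'path': '/tmp/storage'}, 'root': '/opt/fuzzforge_ai'}
```

This is why `FUZZFORGE_STORAGE__PATH` addresses a `path` key inside a `storage` section rather than a flat variable.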
---

## Troubleshooting

### Docker Not Running

```
Error: Cannot connect to Docker daemon
```

**Solution:**

```bash
# Linux: Start Docker service
sudo systemctl start docker

# macOS/Windows: Start the Docker Desktop application

# Verify Docker is running
docker run --rm hello-world
```

### Permission Denied on Docker Socket

```
Error: Permission denied connecting to Docker socket
```

**Solution:**

```bash
sudo usermod -aG docker $USER
# Log out and back in, then verify:
docker run --rm hello-world
```

### Hub Images Not Built

The dashboard shows ✗ Not built for tools:

```bash
# Build all hub images
./scripts/build-hub-images.sh

# Or build a single tool
docker build -t <tool-name>:latest mcp-security-hub/<category>/<tool-name>/
```

### MCP Server Not Starting

```bash
# Check agent configuration
uv run fuzzforge mcp status

# Verify the config file path exists and contains valid JSON
cat ~/.config/Code/User/mcp.json   # Copilot
cat ~/.claude.json                 # Claude Code
```
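A few lines of Python can check that a config file is valid JSON and registers a `fuzzforge` server. The `mcpServers` key shown here is the Claude-style layout and is an assumption, so adapt it to your agent's config format:

```python
import json
import tempfile
from pathlib import Path


def check_mcp_config(path: str) -> bool:
    """Return True if `path` is valid JSON and registers a 'fuzzforge' server.

    Assumes the Claude-style layout {"mcpServers": {"fuzzforge": {...}}}.
    """
    try:
        config = json.loads(Path(path).read_text())
    except (OSError, json.JSONDecodeError) as exc:
        print(f"invalid config: {exc}")
        return False
    return "fuzzforge" in config.get("mcpServers", {})


if __name__ == "__main__":
    # Demo against a throwaway config file
    demo = Path(tempfile.mkdtemp()) / "mcp.json"
    demo.write_text(json.dumps({"mcpServers": {"fuzzforge": {"command": "docker"}}}))
    print(check_mcp_config(str(demo)))  # → True
```

If this prints `False` for your real config, re-run `uv run fuzzforge mcp install <agent>`.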
### Using Podman Instead of Docker

```bash
# Install with the Podman engine
uv run fuzzforge mcp install copilot --engine podman

# Or set the environment variable
export FUZZFORGE_ENGINE=podman
```

### Hub Registry

FuzzForge stores linked hub information in `~/.fuzzforge/hubs.json`. If something goes wrong:

```bash
# View the registry
cat ~/.fuzzforge/hubs.json

# Reset the registry
rm ~/.fuzzforge/hubs.json
```
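For a quick programmatic look at the registry, you can parse `hubs.json` directly. The field names below (`path`, `servers`) are hypothetical, so adapt them to whatever the real registry file contains:

```python
import json
import tempfile
from pathlib import Path


def summarize_hubs(registry_path: str) -> list[str]:
    """Produce one summary line per linked hub.

    The 'path'/'servers' fields are a hypothetical schema for illustration.
    """
    registry = json.loads(Path(registry_path).read_text())
    return [
        f"{name}: {hub.get('path', '?')} ({len(hub.get('servers', []))} servers)"
        for name, hub in registry.items()
    ]


if __name__ == "__main__":
    # Demo with a constructed registry file
    demo = Path(tempfile.mkdtemp()) / "hubs.json"
    demo.write_text(json.dumps({
        "mcp-security-hub": {
            "path": "~/.fuzzforge/hubs/mcp-security-hub",
            "servers": ["nmap-mcp", "yara-mcp"],
        }
    }))
    print(summarize_hubs(str(demo)))
```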
---

## Next Steps

- 🖥️ Launch `uv run fuzzforge ui` and explore the dashboard
- 🔒 Clone the [mcp-security-hub](https://github.com/FuzzingLabs/mcp-security-hub) for 40+ security tools
- 💬 Join our [Discord](https://discord.gg/8XEX33UUwZ) for support

---

<p align="center">
  <strong>Built with ❤️ by <a href="https://fuzzinglabs.com">FuzzingLabs</a></strong>
</p>
---

### ai/.gitignore (new file, 6 lines)
```
.env
__pycache__/
*.pyc
fuzzforge_sessions.db
agentops.log
*.log
```
---

### ai/README.md (new file, 110 lines)
# FuzzForge AI Module

FuzzForge AI is the multi-agent layer that lets you operate the FuzzForge security platform through natural language. It orchestrates local tooling, registered Agent-to-Agent (A2A) peers, and the Temporal-powered backend while keeping long-running context in memory and project knowledge graphs.

## Quick Start

1. **Initialise a project**
   ```bash
   cd /path/to/project
   fuzzforge init
   ```
2. **Review environment settings** – copy `.fuzzforge/.env.template` to `.fuzzforge/.env`, then edit the values to match your provider. The template ships with commented defaults for OpenAI-style usage and placeholders for Cognee keys.
   ```env
   LLM_PROVIDER=openai
   LITELLM_MODEL=gpt-5-mini
   OPENAI_API_KEY=sk-your-key
   FUZZFORGE_MCP_URL=http://localhost:8010/mcp
   SESSION_PERSISTENCE=sqlite
   ```
   Optional flags you may want to enable early:
   ```env
   MEMORY_SERVICE=inmemory
   AGENTOPS_API_KEY=sk-your-agentops-key   # Enable hosted tracing
   LOG_LEVEL=INFO                          # CLI / server log level
   ```
3. **Populate the knowledge graph**
   ```bash
   fuzzforge ingest --path . --recursive
   # alias: fuzzforge rag ingest --path . --recursive
   ```
4. **Launch the agent shell**
   ```bash
   fuzzforge ai agent
   ```
   Keep the backend running (Temporal API at `FUZZFORGE_MCP_URL`) so workflow commands succeed.

## Everyday Workflow

- Run `fuzzforge ai agent` and start with `list available fuzzforge workflows` or `/memory status` to confirm everything is wired.
- Use natural prompts for automation (`run fuzzforge workflow …`, `search project knowledge for …`) and fall back to slash commands for precision (`/recall`, `/sendfile`).
- Keep `/memory datasets` handy to see which Cognee datasets are available after each ingest.
- Start the HTTP surface with `python -m fuzzforge_ai` when external agents need access to artifacts or graph queries. The CLI stays usable at the same time.
- Refresh the knowledge graph regularly: `fuzzforge ingest --path . --recursive --force` keeps responses aligned with recent code changes.

## What the Agent Can Do

- **Route requests** – automatically selects the right local tool or remote agent using the A2A capability registry.
- **Run security workflows** – list, submit, and monitor FuzzForge workflows via MCP wrappers.
- **Manage artifacts** – create downloadable files for reports, code edits, and shared attachments.
- **Maintain context** – stores session history, semantic recall, and Cognee project graphs.
- **Serve over HTTP** – expose the same agent as an A2A server using `python -m fuzzforge_ai`.

## Essential Commands

Inside `fuzzforge ai agent` you can mix slash commands and free-form prompts:

```text
/list                        # Show registered A2A agents
/register http://:10201      # Add a remote agent
/artifacts                   # List generated files
/sendfile SecurityAgent src/report.md "Please review"
You> route_to SecurityAnalyzer: scan ./backend for secrets
You> run fuzzforge workflow static_analysis_scan on ./test_projects/demo
You> search project knowledge for "temporal status" using INSIGHTS
```

Artifacts created during the conversation are served from `.fuzzforge/artifacts/` and exposed through the A2A HTTP API.
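Slash commands of the form shown above split naturally with `shlex`, which keeps quoted arguments like `"Please review"` intact. This parser is a hypothetical illustration, not the CLI's actual implementation:

```python
import shlex


def parse_slash_command(line: str):
    """Split '/cmd arg "quoted arg"' into (cmd, args); None for free-form prompts."""
    if not line.startswith("/"):
        return None  # free-form prompt, routed to the LLM instead
    tokens = shlex.split(line)
    return tokens[0][1:], tokens[1:]


print(parse_slash_command('/sendfile SecurityAgent src/report.md "Please review"'))
# → ('sendfile', ['SecurityAgent', 'src/report.md', 'Please review'])
print(parse_slash_command("run fuzzforge workflow static_analysis_scan"))
# → None
```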
## Memory & Knowledge

The module layers three storage systems:

- **Session persistence** (SQLite or in-memory) for chat transcripts.
- **Semantic recall** via the ADK memory service for fuzzy search.
- **Cognee graphs** for project-wide knowledge built from ingestion runs.

Re-run ingestion after major code changes to keep graph answers relevant. If Cognee variables are not set, graph-specific tools automatically respond with a polite "not configured" message.

## Sample Prompts

Use these to validate the setup once the agent shell is running:

- `list available fuzzforge workflows`
- `run fuzzforge workflow static_analysis_scan on ./backend with target_branch=main`
- `show findings for that run once it finishes`
- `refresh the project knowledge graph for ./backend`
- `search project knowledge for "temporal readiness" using INSIGHTS`
- `/recall terraform secrets`
- `/memory status`
- `ROUTE_TO SecurityAnalyzer: audit infrastructure_vulnerable`

## Need More Detail?

Dive into the dedicated guides under `ai/docs/advanced/`:

- [Architecture](https://docs.fuzzforge.ai/docs/ai/intro) – High-level architecture with diagrams and component breakdowns.
- [Ingestion](https://docs.fuzzforge.ai/docs/ai/ingestion.md) – Command options, Cognee persistence, and prompt examples.
- [Configuration](https://docs.fuzzforge.ai/docs/ai/configuration.md) – LLM provider matrix, local model setup, and tracing options.
- [Prompts](https://docs.fuzzforge.ai/docs/ai/prompts.md) – Slash commands, workflow prompts, and routing tips.
- [A2A Services](https://docs.fuzzforge.ai/docs/ai/a2a-services.md) – HTTP endpoints, agent card, and collaboration flow.
- [Memory Persistence](https://docs.fuzzforge.ai/docs/ai/architecture.md#memory--persistence) – Deep dive on memory storage, datasets, and how `/memory status` inspects them.

## Development Notes

- Entry point for the CLI: `ai/src/fuzzforge_ai/cli.py`
- A2A HTTP server: `ai/src/fuzzforge_ai/a2a_server.py`
- Tool routing & workflow glue: `ai/src/fuzzforge_ai/agent_executor.py`
- Ingestion helpers: `ai/src/fuzzforge_ai/ingest_utils.py`

Install the module in editable mode (`pip install -e ai`) while iterating so CLI changes are picked up immediately.
9
ai/agents/task_agent/.dockerignore
Normal file
@@ -0,0 +1,9 @@
__pycache__
*.pyc
*.pyo
*.pytest_cache
*.coverage
coverage.xml
build/
dist/
.env
17
ai/agents/task_agent/.env.example
Normal file
@@ -0,0 +1,17 @@
# Default LiteLLM configuration routed through the proxy
LITELLM_MODEL=openai/gpt-4o-mini
LITELLM_PROVIDER=openai

# Proxy connection (override when running locally without Docker networking)
# Use http://localhost:10999 when accessing from the host
FF_LLM_PROXY_BASE_URL=http://llm-proxy:8080

# Virtual key issued by Bifrost or LiteLLM proxy for the task agent (bootstrap replaces the placeholder)
OPENAI_API_KEY=sk-proxy-default

# Upstream provider keys live inside the proxy container
# BIFROST_OPENAI_KEY=
# BIFROST_ANTHROPIC_KEY=
# BIFROST_GEMINI_KEY=
# BIFROST_MISTRAL_KEY=
# BIFROST_OPENROUTER_KEY=
82
ai/agents/task_agent/ARCHITECTURE.md
Normal file
@@ -0,0 +1,82 @@
# Architecture Overview

This package is a minimal ADK agent that keeps runtime behaviour and A2A access in separate layers so it can double as boilerplate.

## Directory Layout

```text
agent_with_adk_format/
├── __init__.py               # Exposes root_agent for ADK runners
├── a2a_hot_swap.py           # JSON-RPC helper for model/prompt swaps
├── README.md, QUICKSTART.md  # Operational docs
├── ARCHITECTURE.md           # This document
├── .env                      # Active environment (gitignored)
├── .env.example              # Environment template
└── litellm_agent/
    ├── agent.py              # Root Agent definition (LiteLLM shell)
    ├── callbacks.py          # before_agent / before_model hooks
    ├── config.py             # Defaults, state keys, control prefix
    ├── control.py            # HOTSWAP command parsing/serialization
    ├── state.py              # Session state wrapper + LiteLLM factory
    ├── tools.py              # set_model / set_prompt / get_config
    ├── prompts.py            # Base instruction text
    └── agent.json            # A2A agent card (served under /.well-known)
```

```mermaid
flowchart TD
    subgraph ADK Runner
        A["adk api_server / adk web / adk run"]
        B["agent_with_adk_format/__init__.py"]
        C["litellm_agent/agent.py (root_agent)"]
        D["HotSwapState (state.py)"]
        E["LiteLlm(model, provider)"]
    end

    subgraph Session State
        S1[app:litellm_agent/model]
        S2[app:litellm_agent/provider]
        S3[app:litellm_agent/prompt]
    end

    A --> B --> C
    C --> D
    D -->|instantiate| E
    D --> S1
    D --> S2
    D --> S3
    E --> C
```

## Runtime Flow (ADK Runners)

1. **Startup**: `adk api_server`/`adk web` imports `agent_with_adk_format`, which exposes `root_agent` from `litellm_agent/agent.py`. `.env` at the package root is loaded before the runner constructs the agent.
2. **Session State**: `callbacks.py` and `tools.py` read/write through `state.py`. We store `model`, `provider`, and `prompt` keys (prefixed `app:litellm_agent/…`) which persist across turns.
3. **Instruction Generation**: `provide_instruction` composes the base persona from `prompts.py` plus any stored prompt override. The current model/provider is appended for observability.
4. **Model Hot-Swap**: When a control message is detected (`[HOTSWAP:MODEL:…]`), the callback parses it via `control.py`, updates the session state, and calls `state.apply_state_to_agent` to instantiate a new `LiteLlm(model=…, custom_llm_provider=…)`. ADK runners reuse that instance for subsequent turns.
5. **Prompt Hot-Swap**: A similar path (the `set_prompt` tool/callback) updates state; the dynamic instruction immediately reflects the change.
6. **Config Reporting**: Both the callback and the tool surface the summary string produced by `HotSwapState.describe()`, ensuring CLI, A2A, and UI all show the same data.
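The namespaced state keys from step 2 can be illustrated with a small wrapper. This is a sketch only: the real logic lives in `litellm_agent/state.py`, and the class and method names below are hypothetical stand-ins.

```python
# Illustrative sketch of namespaced session-state keys; the real
# implementation lives in litellm_agent/state.py. Names are hypothetical.
STATE_PREFIX = "app:litellm_agent/"


class HotSwapStateSketch:
    def __init__(self, session_state: dict):
        self._state = session_state  # shared dict, persists across turns

    def _key(self, name: str) -> str:
        return f"{STATE_PREFIX}{name}"  # e.g. app:litellm_agent/model

    def set(self, name: str, value: str) -> None:
        self._state[self._key(name)] = value

    def get(self, name: str, default=None):
        return self._state.get(self._key(name), default)

    def describe(self) -> str:
        model = self.get("model", "openai/gpt-4o-mini")
        provider = self.get("provider", "openai")
        prompt = self.get("prompt") or "(default)"
        return f"model={model} provider={provider} prompt={prompt}"


state: dict = {}
s = HotSwapStateSketch(state)
s.set("model", "openai/gpt-4o")
print(s.describe())
```

The `app:` prefix keeps the agent's keys from colliding with other state stored in the same session.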
## A2A Integration

- `agent.json` defines the agent card and enables ADK to register `/a2a/litellm_agent` routes when launched with `--a2a`.
- `a2a_hot_swap.py` uses `a2a.client.A2AClient` to programmatically send control messages and user text via JSON-RPC. It supports streaming when available and falls back to blocking requests otherwise.

```mermaid
sequenceDiagram
    participant Client as a2a_hot_swap.py
    participant Server as ADK API Server
    participant Agent as root_agent

    Client->>Server: POST /a2a/litellm_agent (message/stream or message/send)
    Server->>Agent: Invoke callbacks/tools
    Agent->>Server: Status / artifacts / final message
    Server->>Client: Streamed Task events
    Client->>Client: Extract text & print summary
```

## Extending the Boilerplate

- Add tools under `litellm_agent/tools.py` and register them in `agent.py` to expose new capabilities.
- Use `state.py` to track additional configuration or session data (store under your own prefix to avoid collisions).
- When layering business logic, prefer expanding callbacks or adding higher-level agents while leaving the hot-swap mechanism untouched for reuse.
71
ai/agents/task_agent/DEPLOY.md
Normal file
@@ -0,0 +1,71 @@
# Docker & Kubernetes Deployment

## Local Docker

Build from the repository root:

```bash
docker build -t litellm-hot-swap:latest agent_with_adk_format
```

Run the container (port 8000, inject provider keys via env file or flags):

```bash
docker run \
  -p 8000:8000 \
  --env-file agent_with_adk_format/.env \
  litellm-hot-swap:latest
```

The container serves Uvicorn on `http://localhost:8000`. Update `.env` (or pass `-e KEY=value`) before launching if you plan to hot-swap providers.

## Kubernetes (example manifest)

Use the same image, optionally pushed to a registry (`docker tag` + `docker push`). A simple Deployment/Service pair:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litellm-hot-swap
spec:
  replicas: 1
  selector:
    matchLabels:
      app: litellm-hot-swap
  template:
    metadata:
      labels:
        app: litellm-hot-swap
    spec:
      containers:
        - name: server
          image: <REGISTRY_URI>/litellm-hot-swap:latest
          ports:
            - containerPort: 8000
          env:
            - name: PORT
              value: "8000"
            - name: LITELLM_MODEL
              value: gemini/gemini-2.0-flash-001
            # Add provider keys as needed
            # - name: OPENAI_API_KEY
            #   valueFrom:
            #     secretKeyRef:
            #       name: litellm-secrets
            #       key: OPENAI_API_KEY
---
apiVersion: v1
kind: Service
metadata:
  name: litellm-hot-swap
spec:
  type: LoadBalancer
  selector:
    app: litellm-hot-swap
  ports:
    - port: 80
      targetPort: 8000
```

Apply with `kubectl apply -f deployment.yaml`. Provide secrets via `env` or Kubernetes Secrets.
19
ai/agents/task_agent/Dockerfile
Normal file
@@ -0,0 +1,19 @@
# syntax=docker/dockerfile:1

FROM python:3.11-slim AS base

ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PIP_NO_CACHE_DIR=1 \
    PORT=8000

WORKDIR /app

COPY requirements.txt ./requirements.txt
RUN pip install --upgrade pip && pip install -r requirements.txt

COPY . /app/agent_with_adk_format
WORKDIR /app/agent_with_adk_format
ENV PYTHONPATH=/app

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
61
ai/agents/task_agent/QUICKSTART.md
Normal file
@@ -0,0 +1,61 @@
# Quick Start Guide

## Launch the Agent

From the repository root you can expose the agent through any ADK entry point:

```bash
# A2A / HTTP server
adk api_server --a2a --port 8000 agent_with_adk_format

# Browser UI
adk web agent_with_adk_format

# Interactive terminal
adk run agent_with_adk_format
```

The A2A server exposes the JSON-RPC endpoint at `http://localhost:8000/a2a/litellm_agent`.

## Hot-Swap from the Command Line

Use the bundled helper to change the model and prompt via A2A without touching the UI:

```bash
python agent_with_adk_format/a2a_hot_swap.py \
  --model openai gpt-4o \
  --prompt "You are concise." \
  --config \
  --context demo-session
```

The script sends the control messages for you and prints the server’s responses. The `--context` flag lets you reuse the same conversation across multiple invocations.

### Follow-up Messages

Once the swaps are applied you can send a user message on the same session:

```bash
python agent_with_adk_format/a2a_hot_swap.py \
  --context demo-session \
  --message "Summarise the current configuration in five words."
```

### Clearing the Prompt

```bash
python agent_with_adk_format/a2a_hot_swap.py \
  --context demo-session \
  --prompt "" \
  --config
```

## Control Messages (for reference)

Behind the scenes the helper sends plain text messages understood by the callbacks:

- `[HOTSWAP:MODEL:provider/model]`
- `[HOTSWAP:PROMPT:text]`
- `[HOTSWAP:GET_CONFIG]`

You can craft the same messages from any A2A client if you prefer.
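Building and parsing these control strings takes only a few lines. This is a minimal sketch of the format listed above; the real helpers are `build_control_message` and friends in `litellm_agent/control.py`, and the function names below are illustrative stand-ins:

```python
# Minimal sketch of the [HOTSWAP:...] control-message format documented above.
# The real helpers live in litellm_agent/control.py; these names are illustrative.
CONTROL_PREFIX = "[HOTSWAP:"


def build_hotswap(command: str, payload: str = "") -> str:
    """Build e.g. [HOTSWAP:MODEL:openai/gpt-4o] or [HOTSWAP:GET_CONFIG]."""
    body = f"{command}:{payload}" if payload else command
    return f"{CONTROL_PREFIX}{body}]"


def parse_hotswap(text: str):
    """Return (command, payload) for a control message, else None."""
    if not (text.startswith(CONTROL_PREFIX) and text.endswith("]")):
        return None
    body = text[len(CONTROL_PREFIX):-1]
    command, sep, payload = body.partition(":")
    return command, payload if sep else ""


print(build_hotswap("MODEL", "openai/gpt-4o"))  # [HOTSWAP:MODEL:openai/gpt-4o]
print(parse_hotswap("[HOTSWAP:GET_CONFIG]"))    # ('GET_CONFIG', '')
```

Any A2A client that can send plain text messages can drive the agent this way.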
361
ai/agents/task_agent/README.md
Normal file
@@ -0,0 +1,361 @@
# LiteLLM Agent with Hot-Swap Support

A flexible AI agent powered by LiteLLM that supports runtime hot-swapping of models and system prompts. Compatible with ADK and A2A protocols.

## Features

- 🔄 **Hot-Swap Models**: Change LLM models on-the-fly without restarting
- 📝 **Dynamic Prompts**: Update system prompts during conversation
- 🌐 **Multi-Provider Support**: Works with OpenAI, Anthropic, Google, OpenRouter, and more
- 🔌 **A2A Compatible**: Can be served as an A2A agent
- 🛠️ **ADK Integration**: Run with `adk web`, `adk run`, or `adk api_server`

## Architecture

```
task_agent/
├── __init__.py          # Exposes root_agent for ADK
├── a2a_hot_swap.py      # JSON-RPC helper for hot-swapping
├── README.md            # This guide
├── QUICKSTART.md        # Quick-start walkthrough
├── .env                 # Active environment (gitignored)
├── .env.example         # Environment template
└── litellm_agent/
    ├── __init__.py
    ├── agent.py         # Main agent implementation
    ├── agent.json       # A2A agent card
    ├── callbacks.py     # ADK callbacks
    ├── config.py        # Defaults and state keys
    ├── control.py       # HOTSWAP message helpers
    ├── prompts.py       # Base instruction
    ├── state.py         # Session state utilities
    └── tools.py         # set_model / set_prompt / get_config
```

## Setup

### 1. Environment Configuration

Copying the example file is optional; the repository already ships with a root-level `.env` seeded with defaults. Adjust the values at the package root:

```bash
cd task_agent
# Optionally refresh from the template
# cp .env.example .env
```

Edit `.env` (or `.env.example`) and add your proxy + API keys. The agent must be restarted after changes so the values are picked up:

```bash
# Route every request through the proxy container (use http://localhost:10999 from the host)
FF_LLM_PROXY_BASE_URL=http://llm-proxy:8080

# Default model + provider the agent boots with
LITELLM_MODEL=openai/gpt-4o-mini
LITELLM_PROVIDER=openai

# Virtual key issued by the proxy to the task agent (bootstrap replaces the placeholder)
OPENAI_API_KEY=sk-proxy-default

# Upstream keys stay inside the proxy (Bifrost config references env.BIFROST_* names)
BIFROST_OPENAI_KEY=your_real_openai_api_key
BIFROST_ANTHROPIC_KEY=your_real_anthropic_key
BIFROST_GEMINI_KEY=your_real_gemini_key
BIFROST_MISTRAL_KEY=your_real_mistral_key
BIFROST_OPENROUTER_KEY=your_real_openrouter_key
```

> When running the agent outside of Docker, swap `FF_LLM_PROXY_BASE_URL` to the host port (default `http://localhost:10999`).

The compose bootstrap container provisions the Bifrost gateway, creates a virtual key for `fuzzforge-task-agent`, and rewrites `volumes/env/.env`. Fill in the `BIFROST_*` upstream secrets before the first launch so the proxy can reach your providers when the bootstrap script runs.

### 2. Install Dependencies

```bash
pip install "google-adk" "a2a-sdk[all]" "python-dotenv" "litellm"
```

### 3. Run in Docker

Build the container (this image can be pushed to any registry or run locally):

```bash
docker build -t litellm-hot-swap:latest task_agent
```

Provide environment configuration at runtime (either pass variables individually or mount a file):

```bash
docker run \
  -p 8000:8000 \
  --env-file task_agent/.env \
  litellm-hot-swap:latest
```

The container starts Uvicorn with the ADK app (`main.py`) listening on port 8000.

## Running the Agent

### Option 1: ADK Web UI (Recommended for Testing)

Start the web interface:

```bash
adk web task_agent
```

> **Tip:** before launching `adk web`/`adk run`/`adk api_server`, ensure the root-level `.env` contains valid API keys for any provider you plan to hot-swap to (e.g. set `OPENAI_API_KEY` before switching to `openai/gpt-4o`).

Open http://localhost:8000 in your browser and interact with the agent.

### Option 2: ADK Terminal

Run in terminal mode:

```bash
adk run task_agent
```

### Option 3: A2A API Server

Start as an A2A-compatible API server:

```bash
adk api_server --a2a --port 8000 task_agent
```

The agent will be available at: `http://localhost:8000/a2a/litellm_agent`

### Command-line helper

Use the bundled script to drive hot-swaps and user messages over A2A:

```bash
python task_agent/a2a_hot_swap.py \
  --url http://127.0.0.1:8000/a2a/litellm_agent \
  --model openai gpt-4o \
  --prompt "You are concise." \
  --config \
  --context demo-session
```

To send a follow-up prompt in the same session (with a larger timeout for long answers):

```bash
python task_agent/a2a_hot_swap.py \
  --url http://127.0.0.1:8000/a2a/litellm_agent \
  --model openai gpt-4o \
  --prompt "You are concise." \
  --message "Give me a fuzzing harness." \
  --context demo-session \
  --timeout 120
```

> Ensure the corresponding provider keys are present in `.env` (or passed via environment variables) before issuing model swaps.

## Hot-Swap Tools

The agent provides three special tools:

### 1. `set_model` - Change the LLM Model

Change the model during conversation:

```
User: Use the set_model tool to change to gpt-4o with openai provider
Agent: ✅ Model configured to: openai/gpt-4o
This change is active now!
```

**Parameters:**
- `model`: Model name (e.g., "gpt-4o", "claude-3-sonnet-20240229")
- `custom_llm_provider`: Optional provider prefix (e.g., "openai", "anthropic", "openrouter")

**Examples:**
- OpenAI: `set_model(model="gpt-4o", custom_llm_provider="openai")`
- Anthropic: `set_model(model="claude-3-sonnet-20240229", custom_llm_provider="anthropic")`
- Google: `set_model(model="gemini-2.0-flash-001", custom_llm_provider="gemini")`

### 2. `set_prompt` - Change System Prompt

Update the system instructions:

```
User: Use set_prompt to change my behavior to "You are a helpful coding assistant"
Agent: ✅ System prompt updated:
You are a helpful coding assistant

This change is active now!
```

### 3. `get_config` - View Configuration

Check current model and prompt:

```
User: Use get_config to show me your configuration
Agent: 📊 Current Configuration:
━━━━━━━━━━━━━━━━━━━━━━
Model: openai/gpt-4o
System Prompt: You are a helpful coding assistant
━━━━━━━━━━━━━━━━━━━━━━
```

## Testing

### Basic A2A Client Test

```bash
python agent/test_a2a_client.py
```

### Hot-Swap Functionality Test

```bash
python agent/test_hotswap.py
```

This will:
1. Check initial configuration
2. Query with default model
3. Hot-swap to GPT-4o
4. Verify model changed
5. Change system prompt
6. Test new prompt behavior
7. Hot-swap to Claude
8. Verify final configuration

### Command-Line Hot-Swap Helper

You can trigger model and prompt changes directly against the A2A endpoint without the interactive CLI:

```bash
# Start the agent first (in another terminal):
adk api_server --a2a --port 8000 task_agent

# Apply swaps via pure A2A calls
python task_agent/a2a_hot_swap.py --model openai gpt-4o --prompt "You are concise." --config
python task_agent/a2a_hot_swap.py --model anthropic claude-3-sonnet-20240229 --context shared-session --config
python task_agent/a2a_hot_swap.py --prompt "" --context shared-session --config  # Clear the prompt and show current state
```

`--model` accepts either a combined `provider/model` string or a separate `<provider> <model>` pair. Add `--context` if you want to reuse the same conversation across invocations. Use `--config` to dump the agent's configuration after the changes are applied.

## Supported Models

### OpenAI
- `openai/gpt-4o`
- `openai/gpt-4-turbo`
- `openai/gpt-3.5-turbo`

### Anthropic
- `anthropic/claude-3-opus-20240229`
- `anthropic/claude-3-sonnet-20240229`
- `anthropic/claude-3-haiku-20240307`

### Google
- `gemini/gemini-2.0-flash-001`
- `gemini/gemini-2.5-pro-exp-03-25`
- `vertex_ai/gemini-2.0-flash-001`

### OpenRouter
- `openrouter/anthropic/claude-3-opus`
- `openrouter/openai/gpt-4`
- Any model from the OpenRouter catalog

## How It Works

### Session State
- Model and prompt settings are stored in session state
- Each session maintains its own configuration
- Settings persist across messages in the same session

### Hot-Swap Mechanism
1. Tools update session state with the new model/prompt
2. `before_agent_callback` checks for changes
3. If the model changed, it directly updates: `agent.model = LiteLlm(model=new_model)`
4. The dynamic instruction function reads the custom prompt from session state
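The four steps above can be sketched framework-free. This is illustrative only: the real callback lives in `litellm_agent/callbacks.py`, and `FakeLiteLlm`/`maybe_hot_swap` below are hypothetical stand-ins for ADK's `LiteLlm` wrapper and the actual `before_agent_callback`:

```python
# Framework-free sketch of the hot-swap mechanism described above; the real
# version lives in litellm_agent/callbacks.py. All names here are hypothetical.
MODEL_KEY = "app:litellm_agent/model"


class FakeLiteLlm:
    """Stand-in for ADK's LiteLlm model wrapper."""
    def __init__(self, model: str):
        self.model = model


class FakeAgent:
    def __init__(self, model: str):
        self.model = FakeLiteLlm(model)


def maybe_hot_swap(agent: FakeAgent, session_state: dict) -> bool:
    """Steps 2-3: before each turn, rebuild the model wrapper if state changed."""
    requested = session_state.get(MODEL_KEY)
    if requested and requested != agent.model.model:
        agent.model = FakeLiteLlm(requested)  # swap in place, no restart
        return True
    return False


agent = FakeAgent("openai/gpt-4o-mini")
state = {MODEL_KEY: "openai/gpt-4o"}  # step 1: a tool wrote session state
swapped = maybe_hot_swap(agent, state)  # steps 2-3: callback applies it
print(swapped, agent.model.model)
```

Because the swap only replaces the model wrapper on the existing agent object, later turns in the same session pick up the new model transparently.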
### A2A Compatibility
- Agent card at `agent.json` defines A2A metadata
- Served at the `/a2a/litellm_agent` endpoint
- Compatible with the A2A client protocol

## Example Usage

### Interactive Session

```python
import asyncio
from uuid import uuid4

import httpx
from a2a.client import A2AClient
from a2a.types import (
    Message,
    MessageSendParams,
    Part,
    Role,
    SendMessageRequest,
    TextPart,
)


async def chat() -> None:
    async with httpx.AsyncClient(timeout=60.0) as http_client:
        client = A2AClient(
            url="http://localhost:8000/a2a/litellm_agent",
            httpx_client=http_client,
        )
        context_id = "my-session-123"

        async def send(text: str) -> None:
            params = MessageSendParams(
                message=Message(
                    context_id=context_id,
                    message_id=str(uuid4()),
                    role=Role.user,
                    parts=[Part(root=TextPart(text=text))],
                )
            )
            request = SendMessageRequest(id=str(uuid4()), params=params)
            response = await client.send_message(request)
            print(response.root.result)

        # Start with the default model
        await send("Hello!")

        # Switch to GPT-4o
        await send("Use set_model with model gpt-4o and provider openai")

        # Continue with the new model
        await send("Help me write a function")


asyncio.run(chat())
```

## Troubleshooting

### Model Not Found
- Ensure the API key for the provider is set in `.env`
- Check the model name is correct for the provider
- Verify LiteLLM supports the model (https://docs.litellm.ai/docs/providers)

### Connection Refused
- Ensure the agent is running (`adk api_server --a2a task_agent`)
- Check the port matches (default: 8000)
- Verify no firewall is blocking localhost

### Hot-Swap Not Working
- Check that you're using the same `context_id` across messages
- Ensure the tool is being called (not just asked to switch)
- Look for `🔄 Hot-swapped model to:` in server logs

## Development

### Adding New Tools

```python
async def my_tool(tool_ctx: ToolContext, param: str) -> str:
    """Your tool description."""
    # Access session state
    tool_ctx.state["my_key"] = "my_value"
    return "Tool result"


# Add to agent
root_agent = LlmAgent(
    # ...
    tools=[set_model, set_prompt, get_config, my_tool],
)
```

### Modifying Callbacks

```python
async def after_model_callback(
    callback_context: CallbackContext,
    llm_response: LlmResponse,
) -> Optional[LlmResponse]:
    """Modify the response after the model generates it."""
    # Your logic here
    return llm_response
```

## License

Apache 2.0
5
ai/agents/task_agent/__init__.py
Normal file
@@ -0,0 +1,5 @@
"""Package entry point for the ADK-formatted hot swap agent."""

from .litellm_agent.agent import root_agent

__all__ = ["root_agent"]
224
ai/agents/task_agent/a2a_hot_swap.py
Normal file
@@ -0,0 +1,224 @@
#!/usr/bin/env python3
"""Minimal A2A client utility for hot-swapping LiteLLM model/prompt."""

from __future__ import annotations

import argparse
import asyncio
from typing import Optional
from uuid import uuid4

import httpx
from a2a.client import A2AClient
from a2a.client.errors import A2AClientHTTPError
from a2a.types import (
    JSONRPCErrorResponse,
    Message,
    MessageSendConfiguration,
    MessageSendParams,
    Part,
    Role,
    SendMessageRequest,
    SendStreamingMessageRequest,
    Task,
    TaskArtifactUpdateEvent,
    TaskStatusUpdateEvent,
    TextPart,
)

from litellm_agent.control import (
    HotSwapCommand,
    build_control_message,
    parse_model_spec,
    serialize_model_spec,
)

DEFAULT_URL = "http://localhost:8000/a2a/litellm_agent"


async def _collect_text(client: A2AClient, message: str, context_id: str) -> str:
    """Send a message and collect streamed agent text into a single string."""

    params = MessageSendParams(
        configuration=MessageSendConfiguration(blocking=True),
        message=Message(
            context_id=context_id,
            message_id=str(uuid4()),
            role=Role.user,
            parts=[Part(root=TextPart(text=message))],
        ),
    )

    stream_request = SendStreamingMessageRequest(id=str(uuid4()), params=params)
    buffer: list[str] = []
    try:
        async for response in client.send_message_streaming(stream_request):
            root = response.root
            if isinstance(root, JSONRPCErrorResponse):
                raise RuntimeError(f"A2A error: {root.error}")

            payload = root.result
            buffer.extend(_extract_text(payload))
    except A2AClientHTTPError as exc:
        if "text/event-stream" not in str(exc):
            raise

        send_request = SendMessageRequest(id=str(uuid4()), params=params)
        response = await client.send_message(send_request)
        root = response.root
        if isinstance(root, JSONRPCErrorResponse):
            raise RuntimeError(f"A2A error: {root.error}")
        payload = root.result
        buffer.extend(_extract_text(payload))

    if buffer:
        buffer = list(dict.fromkeys(buffer))
    return "\n".join(buffer).strip()


def _extract_text(
    result: Message | Task | TaskStatusUpdateEvent | TaskArtifactUpdateEvent,
) -> list[str]:
    texts: list[str] = []
    if isinstance(result, Message):
        if result.role is Role.agent:
            for part in result.parts:
                root_part = part.root
                text = getattr(root_part, "text", None)
                if text:
                    texts.append(text)
    elif isinstance(result, Task) and result.history:
        for msg in result.history:
            if msg.role is Role.agent:
                for part in msg.parts:
                    root_part = part.root
                    text = getattr(root_part, "text", None)
                    if text:
                        texts.append(text)
    elif isinstance(result, TaskStatusUpdateEvent):
        message = result.status.message
        if message:
            texts.extend(_extract_text(message))
    elif isinstance(result, TaskArtifactUpdateEvent):
        artifact = result.artifact
        if artifact and artifact.parts:
            for part in artifact.parts:
                root_part = part.root
                text = getattr(root_part, "text", None)
                if text:
                    texts.append(text)
    return texts


def _split_model_args(model_args: Optional[list[str]]) -> tuple[Optional[str], Optional[str]]:
    if not model_args:
        return None, None

    if len(model_args) == 1:
        return model_args[0], None

    provider = model_args[0]
    model = " ".join(model_args[1:])
    return model, provider


async def hot_swap(
    url: str,
    *,
    model_args: Optional[list[str]],
    provider: Optional[str],
    prompt: Optional[str],
    message: Optional[str],
    show_config: bool,
    context_id: Optional[str],
    timeout: float,
) -> None:
    """Execute the requested hot-swap operations against the A2A endpoint."""

    timeout_config = httpx.Timeout(timeout)
    async with httpx.AsyncClient(timeout=timeout_config) as http_client:
        client = A2AClient(url=url, httpx_client=http_client)
        session_id = context_id or str(uuid4())

        model, derived_provider = _split_model_args(model_args)

        if model:
            spec = parse_model_spec(model, provider=provider or derived_provider)
            payload = serialize_model_spec(spec)
            control_msg = build_control_message(HotSwapCommand.MODEL, payload)
            result = await _collect_text(client, control_msg, session_id)
            print(f"Model response: {result or '(no response)'}")

        if prompt is not None:
            control_msg = build_control_message(HotSwapCommand.PROMPT, prompt)
            result = await _collect_text(client, control_msg, session_id)
            print(f"Prompt response: {result or '(no response)'}")

        if show_config:
            control_msg = build_control_message(HotSwapCommand.GET_CONFIG)
            result = await _collect_text(client, control_msg, session_id)
            print(f"Config:\n{result or '(no response)'}")

        if message:
            result = await _collect_text(client, message, session_id)
            print(f"Message response: {result or '(no response)'}")

        print(f"Context ID: {session_id}")


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--url",
        default=DEFAULT_URL,
        help=f"A2A endpoint for the agent (default: {DEFAULT_URL})",
    )
    parser.add_argument(
        "--model",
        nargs="+",
        help="LiteLLM model spec: either 'provider/model' or '<provider> <model>'.",
    )
    parser.add_argument(
        "--provider",
        help="Optional LiteLLM provider when --model lacks a prefix.",
    )
    parser.add_argument(
        "--prompt",
        help="Set the system prompt (omit to leave unchanged; empty string clears it).",
    )
    parser.add_argument(
        "--message",
        help="Send an additional user message after the swaps complete.",
    )
    parser.add_argument(
        "--config",
        action="store_true",
        help="Print the agent configuration after performing swaps.",
    )
    parser.add_argument(
        "--context",
        help="Optional context/session identifier to reuse across calls.",
    )
    parser.add_argument(
        "--timeout",
        type=float,
        default=60.0,
        help="Request timeout (seconds) for A2A calls (default: 60).",
    )
    return parser.parse_args()


def main() -> None:
    args = parse_args()
    asyncio.run(
        hot_swap(
            args.url,
            model_args=args.model,
            provider=args.provider,
            prompt=args.prompt,
            message=args.message,
            show_config=args.config,
            context_id=args.context,
            timeout=args.timeout,
        )
    )


if __name__ == "__main__":
    main()
ai/agents/task_agent/docker-compose.yml (Normal file, 24 lines)
@@ -0,0 +1,24 @@
version: '3.8'

services:
  task-agent:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: fuzzforge-task-agent
    ports:
      - "10900:8000"
    env_file:
      - ../../../volumes/env/.env
    environment:
      - PORT=8000
      - PYTHONUNBUFFERED=1
    volumes:
      # Mount volumes/env for runtime config access
      - ../../../volumes/env:/app/config:ro
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
ai/agents/task_agent/litellm_agent/__init__.py (Normal file, 55 lines)
@@ -0,0 +1,55 @@
"""LiteLLM hot-swap agent package exports."""

from .agent import root_agent
from .callbacks import (
    before_agent_callback,
    before_model_callback,
    provide_instruction,
)
from .config import (
    AGENT_DESCRIPTION,
    AGENT_NAME,
    CONTROL_PREFIX,
    DEFAULT_MODEL,
    DEFAULT_PROVIDER,
    STATE_MODEL_KEY,
    STATE_PROVIDER_KEY,
    STATE_PROMPT_KEY,
)
from .control import (
    HotSwapCommand,
    ModelSpec,
    build_control_message,
    parse_control_message,
    parse_model_spec,
    serialize_model_spec,
)
from .state import HotSwapState, apply_state_to_agent
from .tools import HOTSWAP_TOOLS, get_config, set_model, set_prompt

__all__ = [
    "root_agent",
    "before_agent_callback",
    "before_model_callback",
    "provide_instruction",
    "AGENT_DESCRIPTION",
    "AGENT_NAME",
    "CONTROL_PREFIX",
    "DEFAULT_MODEL",
    "DEFAULT_PROVIDER",
    "STATE_MODEL_KEY",
    "STATE_PROVIDER_KEY",
    "STATE_PROMPT_KEY",
    "HotSwapCommand",
    "ModelSpec",
    "HotSwapState",
    "apply_state_to_agent",
    "build_control_message",
    "parse_control_message",
    "parse_model_spec",
    "serialize_model_spec",
    "HOTSWAP_TOOLS",
    "get_config",
    "set_model",
    "set_prompt",
]
ai/agents/task_agent/litellm_agent/agent.json (Normal file, 24 lines)
@@ -0,0 +1,24 @@
{
  "name": "litellm_agent",
  "description": "A flexible AI agent powered by LiteLLM with hot-swappable models from OpenRouter and other providers",
  "url": "http://localhost:8000",
  "version": "1.0.0",
  "defaultInputModes": ["text/plain"],
  "defaultOutputModes": ["text/plain"],
  "capabilities": {
    "streaming": true
  },
  "skills": [
    {
      "id": "litellm-general-purpose",
      "name": "General Purpose AI Assistant",
      "description": "A flexible AI assistant that can help with various tasks using any LiteLLM-supported model. Supports runtime model and prompt hot-swapping.",
      "tags": ["ai", "assistant", "litellm", "flexible", "hot-swap"],
      "examples": [
        "Help me write a Python function",
        "Explain quantum computing",
        "Switch to Claude model and help me code"
      ]
    }
  ]
}
ai/agents/task_agent/litellm_agent/agent.py (Normal file, 29 lines)
@@ -0,0 +1,29 @@
"""Root agent definition for the LiteLLM hot-swap shell."""

from __future__ import annotations

from google.adk.agents import Agent

from .callbacks import (
    before_agent_callback,
    before_model_callback,
    provide_instruction,
)
from .config import AGENT_DESCRIPTION, AGENT_NAME, DEFAULT_MODEL, DEFAULT_PROVIDER
from .state import HotSwapState
from .tools import HOTSWAP_TOOLS

_initial_state = HotSwapState(model=DEFAULT_MODEL, provider=DEFAULT_PROVIDER)

root_agent = Agent(
    name=AGENT_NAME,
    model=_initial_state.instantiate_llm(),
    description=AGENT_DESCRIPTION,
    instruction=provide_instruction,
    tools=HOTSWAP_TOOLS,
    before_agent_callback=before_agent_callback,
    before_model_callback=before_model_callback,
)


__all__ = ["root_agent"]
ai/agents/task_agent/litellm_agent/callbacks.py (Normal file, 137 lines)
@@ -0,0 +1,137 @@
"""Callbacks and instruction providers for the LiteLLM hot-swap agent."""

from __future__ import annotations

import logging
from typing import Optional

from google.adk.agents.callback_context import CallbackContext
from google.adk.agents.readonly_context import ReadonlyContext
from google.adk.models.llm_request import LlmRequest
from google.genai import types

from .config import CONTROL_PREFIX, DEFAULT_MODEL
from .control import HotSwapCommand, parse_control_message, parse_model_spec
from .prompts import BASE_INSTRUCTION
from .state import HotSwapState, apply_state_to_agent

_LOGGER = logging.getLogger(__name__)


def provide_instruction(ctx: ReadonlyContext | None = None) -> str:
    """Compose the system instruction using the stored state."""

    state_mapping = getattr(ctx, "state", None)
    state = HotSwapState.from_mapping(state_mapping)
    prompt = state.prompt or BASE_INSTRUCTION
    return f"{prompt}\n\nActive model: {state.display_model}"


def _ensure_state(callback_context: CallbackContext) -> HotSwapState:
    state = HotSwapState.from_mapping(callback_context.state)
    state.persist(callback_context.state)
    return state


def _session_id(callback_context: CallbackContext) -> str:
    session = getattr(callback_context, "session", None)
    if session is None:
        session = getattr(callback_context._invocation_context, "session", None)
    return getattr(session, "id", "unknown-session")


async def before_model_callback(
    callback_context: CallbackContext,
    llm_request: LlmRequest,
) -> Optional[types.Content]:
    """Ensure outgoing requests use the active model from session state."""

    state = _ensure_state(callback_context)
    try:
        apply_state_to_agent(callback_context._invocation_context, state)
    except Exception:  # pragma: no cover - defensive logging
        _LOGGER.exception(
            "Failed to apply LiteLLM model '%s' (provider=%s) for session %s",
            state.model,
            state.provider,
            _session_id(callback_context),
        )
    llm_request.model = state.model or DEFAULT_MODEL
    return None


async def before_agent_callback(
    callback_context: CallbackContext,
) -> Optional[types.Content]:
    """Intercept hot-swap control messages and update session state."""

    user_content = callback_context.user_content
    if not user_content or not user_content.parts:
        return None

    first_part = user_content.parts[0]
    message_text = (first_part.text or "").strip()
    if not message_text.startswith(CONTROL_PREFIX):
        return None

    parsed = parse_control_message(message_text)
    if not parsed:
        return None

    command, payload = parsed
    state = _ensure_state(callback_context)

    if command is HotSwapCommand.MODEL:
        if not payload:
            return _render("❌ Missing model specification for hot-swap.")
        try:
            spec = parse_model_spec(payload)
        except ValueError as exc:
            return _render(f"❌ Invalid model specification: {exc}")

        state.model = spec.model
        state.provider = spec.provider
        state.persist(callback_context.state)
        try:
            apply_state_to_agent(callback_context._invocation_context, state)
        except Exception:  # pragma: no cover - defensive logging
            _LOGGER.exception(
                "Failed to apply LiteLLM model '%s' (provider=%s) for session %s",
                state.model,
                state.provider,
                _session_id(callback_context),
            )
        _LOGGER.info(
            "Hot-swapped model to %s (provider=%s, session=%s)",
            state.model,
            state.provider,
            _session_id(callback_context),
        )
        label = state.display_model
        return _render(f"✅ Model switched to: {label}")

    if command is HotSwapCommand.PROMPT:
        prompt_value = (payload or "").strip()
        state.prompt = prompt_value or None
        state.persist(callback_context.state)
        if state.prompt:
            _LOGGER.info(
                "Updated prompt for session %s", _session_id(callback_context)
            )
            return _render(
                "✅ System prompt updated. This change takes effect immediately."
            )
        return _render("✅ System prompt cleared. Reverting to default instruction.")

    if command is HotSwapCommand.GET_CONFIG:
        return _render(state.describe())

    expected = ", ".join(HotSwapCommand.choices())
    return _render(
        "⚠️ Unsupported hot-swap command. Available verbs: "
        f"{expected}."
    )


def _render(message: str) -> types.ModelContent:
    return types.ModelContent(parts=[types.Part(text=message)])
ai/agents/task_agent/litellm_agent/config.py (Normal file, 35 lines)
@@ -0,0 +1,35 @@
"""Configuration constants for the LiteLLM hot-swap agent."""

from __future__ import annotations

import os


def _normalize_proxy_base_url(raw_value: str | None) -> str | None:
    if not raw_value:
        return None
    cleaned = raw_value.strip()
    if not cleaned:
        return None
    # Avoid double slashes in downstream requests
    return cleaned.rstrip("/")


AGENT_NAME = "litellm_agent"
AGENT_DESCRIPTION = (
    "A LiteLLM-backed shell that exposes hot-swappable model and prompt controls."
)

DEFAULT_MODEL = os.getenv("LITELLM_MODEL", "openai/gpt-4o-mini")
DEFAULT_PROVIDER = os.getenv("LITELLM_PROVIDER") or None
PROXY_BASE_URL = _normalize_proxy_base_url(
    os.getenv("FF_LLM_PROXY_BASE_URL")
    or os.getenv("LITELLM_API_BASE")
    or os.getenv("LITELLM_BASE_URL")
)

STATE_PREFIX = "app:litellm_agent/"
STATE_MODEL_KEY = f"{STATE_PREFIX}model"
STATE_PROVIDER_KEY = f"{STATE_PREFIX}provider"
STATE_PROMPT_KEY = f"{STATE_PREFIX}prompt"

CONTROL_PREFIX = "[HOTSWAP"
ai/agents/task_agent/litellm_agent/control.py (Normal file, 99 lines)
@@ -0,0 +1,99 @@
"""Control message helpers for hot-swapping model and prompt."""

from __future__ import annotations

import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

from .config import DEFAULT_PROVIDER


class HotSwapCommand(str, Enum):
    """Supported control verbs embedded in user messages."""

    MODEL = "MODEL"
    PROMPT = "PROMPT"
    GET_CONFIG = "GET_CONFIG"

    @classmethod
    def choices(cls) -> tuple[str, ...]:
        return tuple(item.value for item in cls)


@dataclass(frozen=True)
class ModelSpec:
    """Represents a LiteLLM model and optional provider."""

    model: str
    provider: Optional[str] = None


_COMMAND_PATTERN = re.compile(
    r"^\[HOTSWAP:(?P<verb>[A-Z_]+)(?::(?P<payload>.*))?\]$",
)


def parse_control_message(text: str) -> Optional[Tuple[HotSwapCommand, Optional[str]]]:
    """Return hot-swap command tuple when the string matches the control format."""

    match = _COMMAND_PATTERN.match(text.strip())
    if not match:
        return None

    verb = match.group("verb")
    if verb not in HotSwapCommand.choices():
        return None

    payload = match.group("payload")
    return HotSwapCommand(verb), payload if payload else None


def build_control_message(command: HotSwapCommand, payload: Optional[str] = None) -> str:
    """Serialise a control command for downstream clients."""

    if command not in HotSwapCommand:
        raise ValueError(f"Unsupported hot-swap command: {command}")
    if payload is None or payload == "":
        return f"[HOTSWAP:{command.value}]"
    return f"[HOTSWAP:{command.value}:{payload}]"


def parse_model_spec(model: str, provider: Optional[str] = None) -> ModelSpec:
    """Parse model/provider inputs into a structured ModelSpec."""

    candidate = (model or "").strip()
    if not candidate:
        raise ValueError("Model name cannot be empty")

    if provider:
        provider_clean = provider.strip()
        if not provider_clean:
            raise ValueError("Provider cannot be empty when supplied")
        if "/" in candidate:
            raise ValueError(
                "Provide either provider/model or use provider argument, not both",
            )
        return ModelSpec(model=candidate, provider=provider_clean)

    if "/" in candidate:
        provider_part, model_part = candidate.split("/", 1)
        provider_part = provider_part.strip()
        model_part = model_part.strip()
        if not provider_part or not model_part:
            raise ValueError("Model spec must include provider and model when using '/' format")
        return ModelSpec(model=model_part, provider=provider_part)

    if DEFAULT_PROVIDER:
        return ModelSpec(model=candidate, provider=DEFAULT_PROVIDER.strip())

    return ModelSpec(model=candidate, provider=None)


def serialize_model_spec(spec: ModelSpec) -> str:
    """Render a ModelSpec to provider/model string for control messages."""

    if spec.provider:
        return f"{spec.provider}/{spec.model}"
    return spec.model
ai/agents/task_agent/litellm_agent/prompts.py (Normal file, 9 lines)
@@ -0,0 +1,9 @@
"""System prompt templates for the LiteLLM agent."""

BASE_INSTRUCTION = (
    "You are a focused orchestration layer that relays between the user and a"
    " LiteLLM managed model."
    "\n- Keep answers concise and actionable."
    "\n- Prefer plain language; reveal intermediate reasoning only when helpful."
    "\n- Surface any tool results clearly with short explanations."
)
ai/agents/task_agent/litellm_agent/state.py (Normal file, 254 lines)
@@ -0,0 +1,254 @@
"""Session state utilities for the LiteLLM hot-swap agent."""

from __future__ import annotations

from dataclasses import dataclass
import os
from typing import Any, Mapping, MutableMapping, Optional

import httpx

from .config import (
    DEFAULT_MODEL,
    DEFAULT_PROVIDER,
    PROXY_BASE_URL,
    STATE_MODEL_KEY,
    STATE_PROMPT_KEY,
    STATE_PROVIDER_KEY,
)


@dataclass(slots=True)
class HotSwapState:
    """Lightweight view of the hot-swap session state."""

    model: str = DEFAULT_MODEL
    provider: Optional[str] = None
    prompt: Optional[str] = None

    @classmethod
    def from_mapping(cls, mapping: Optional[Mapping[str, Any]]) -> "HotSwapState":
        if not mapping:
            return cls()

        raw_model = mapping.get(STATE_MODEL_KEY, DEFAULT_MODEL)
        raw_provider = mapping.get(STATE_PROVIDER_KEY)
        raw_prompt = mapping.get(STATE_PROMPT_KEY)

        model = raw_model.strip() if isinstance(raw_model, str) else DEFAULT_MODEL
        provider = raw_provider.strip() if isinstance(raw_provider, str) else None
        if not provider and DEFAULT_PROVIDER:
            provider = DEFAULT_PROVIDER.strip() or None
        prompt = raw_prompt.strip() if isinstance(raw_prompt, str) else None
        return cls(
            model=model or DEFAULT_MODEL,
            provider=provider or None,
            prompt=prompt or None,
        )

    def persist(self, store: MutableMapping[str, object]) -> None:
        store[STATE_MODEL_KEY] = self.model
        if self.provider:
            store[STATE_PROVIDER_KEY] = self.provider
        else:
            store[STATE_PROVIDER_KEY] = None
        store[STATE_PROMPT_KEY] = self.prompt

    def describe(self) -> str:
        prompt_value = self.prompt if self.prompt else "(default prompt)"
        provider_value = self.provider if self.provider else "(default provider)"
        return (
            "📊 Current Configuration\n"
            "━━━━━━━━━━━━━━━━━━━━━━\n"
            f"Model: {self.model}\n"
            f"Provider: {provider_value}\n"
            f"System Prompt: {prompt_value}\n"
            "━━━━━━━━━━━━━━━━━━━━━━"
        )

    def instantiate_llm(self):
        """Create a LiteLlm instance for the current state."""

        from google.adk.models.lite_llm import LiteLlm  # Lazy import to avoid cycle
        from google.adk.models.lite_llm import LiteLLMClient
        from litellm.types.utils import Choices, Message, ModelResponse, Usage

        kwargs = {"model": self.model}
        if self.provider:
            kwargs["custom_llm_provider"] = self.provider
        if PROXY_BASE_URL:
            provider = (self.provider or DEFAULT_PROVIDER or "").lower()
            if provider and provider != "openai":
                kwargs["api_base"] = f"{PROXY_BASE_URL.rstrip('/')}/{provider}"
            else:
                kwargs["api_base"] = PROXY_BASE_URL
            kwargs.setdefault("api_key", os.environ.get("OPENAI_API_KEY"))

        provider = (self.provider or DEFAULT_PROVIDER or "").lower()
        model_suffix = self.model.split("/", 1)[-1]
        use_responses = provider == "openai" and (
            model_suffix.startswith("gpt-5") or model_suffix.startswith("o1")
        )
        if use_responses:
            kwargs.setdefault("use_responses_api", True)

        llm = LiteLlm(**kwargs)

        if use_responses and PROXY_BASE_URL:

            class _ResponsesAwareClient(LiteLLMClient):
                def __init__(self, base_client: LiteLLMClient, api_base: str, api_key: str):
                    self._base_client = base_client
                    self._api_base = api_base.rstrip("/")
                    self._api_key = api_key

                async def acompletion(self, model, messages, tools, **kwargs):  # type: ignore[override]
                    use_responses_api = kwargs.pop("use_responses_api", False)
                    if not use_responses_api:
                        return await self._base_client.acompletion(
                            model=model,
                            messages=messages,
                            tools=tools,
                            **kwargs,
                        )

                    resolved_model = model
                    if "/" not in resolved_model:
                        resolved_model = f"openai/{resolved_model}"

                    payload = {
                        "model": resolved_model,
                        "input": _messages_to_responses_input(messages),
                    }

                    timeout = kwargs.get("timeout", 60)
                    headers = {
                        "Authorization": f"Bearer {self._api_key}",
                        "Content-Type": "application/json",
                    }

                    async with httpx.AsyncClient(timeout=timeout) as client:
                        response = await client.post(
                            f"{self._api_base}/v1/responses",
                            json=payload,
                            headers=headers,
                        )
                        try:
                            response.raise_for_status()
                        except httpx.HTTPStatusError as exc:
                            text = exc.response.text
                            raise RuntimeError(
                                f"Bifrost responses request failed: {text}"
                            ) from exc
                        data = response.json()

                    text_output = _extract_output_text(data)
                    usage = data.get("usage", {})

                    return ModelResponse(
                        id=data.get("id"),
                        model=model,
                        choices=[
                            Choices(
                                finish_reason="stop",
                                index=0,
                                message=Message(role="assistant", content=text_output),
                                provider_specific_fields={"bifrost_response": data},
                            )
                        ],
                        usage=Usage(
                            prompt_tokens=usage.get("input_tokens"),
                            completion_tokens=usage.get("output_tokens"),
                            reasoning_tokens=usage.get("output_tokens_details", {}).get(
                                "reasoning_tokens"
                            ),
                            total_tokens=usage.get("total_tokens"),
                        ),
                    )

            llm.llm_client = _ResponsesAwareClient(
                llm.llm_client,
                PROXY_BASE_URL,
                os.environ.get("OPENAI_API_KEY", ""),
            )

        return llm

    @property
    def display_model(self) -> str:
        if self.provider:
            return f"{self.provider}/{self.model}"
        return self.model


def apply_state_to_agent(invocation_context, state: HotSwapState) -> None:
    """Update the provided agent with a LiteLLM instance matching state."""

    agent = invocation_context.agent
    agent.model = state.instantiate_llm()


def _messages_to_responses_input(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    inputs: list[dict[str, Any]] = []
    for message in messages:
        role = message.get("role", "user")
        content = message.get("content", "")
        text_segments: list[str] = []

        if isinstance(content, list):
            for item in content:
                if isinstance(item, dict):
                    text = item.get("text") or item.get("content")
                    if text:
                        text_segments.append(str(text))
                elif isinstance(item, str):
                    text_segments.append(item)
        elif isinstance(content, str):
            text_segments.append(content)

        text = "\n".join(segment.strip() for segment in text_segments if segment)
        if not text:
            continue

        entry_type = "input_text"
        if role == "assistant":
            entry_type = "output_text"

        inputs.append(
            {
                "role": role,
                "content": [
                    {
                        "type": entry_type,
                        "text": text,
                    }
                ],
            }
        )

    if not inputs:
        inputs.append(
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_text",
                        "text": "",
                    }
                ],
            }
        )
    return inputs


def _extract_output_text(response_json: dict[str, Any]) -> str:
    outputs = response_json.get("output", [])
    collected: list[str] = []
    for item in outputs:
        if isinstance(item, dict) and item.get("type") == "message":
            for part in item.get("content", []):
                if isinstance(part, dict) and part.get("type") == "output_text":
                    text = part.get("text", "")
                    if text:
                        collected.append(str(text))
    return "\n\n".join(collected).strip()
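To make the Responses-API plumbing concrete, here is a standalone sketch of the extraction step. The function body mirrors `_extract_output_text` above; the sample payload is hand-built from the fields that function reads (`output`, `type`, `content`, `text`), not a captured API response:

```python
from typing import Any


def extract_output_text(response_json: dict[str, Any]) -> str:
    # Mirrors _extract_output_text: collect output_text parts of message items
    collected: list[str] = []
    for item in response_json.get("output", []):
        if isinstance(item, dict) and item.get("type") == "message":
            for part in item.get("content", []):
                if isinstance(part, dict) and part.get("type") == "output_text":
                    text = part.get("text", "")
                    if text:
                        collected.append(str(text))
    return "\n\n".join(collected).strip()


sample = {
    "id": "resp_123",  # hypothetical response id
    "output": [
        {"type": "reasoning", "summary": []},  # non-message items are skipped
        {
            "type": "message",
            "content": [{"type": "output_text", "text": "Hello from the model."}],
        },
    ],
}
print(extract_output_text(sample))  # Hello from the model.
```

Reasoning items and any non-text parts fall through silently, so the wrapper always hands LiteLLM a plain string for the assistant message.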
ai/agents/task_agent/litellm_agent/tools.py (Normal file, 64 lines)
@@ -0,0 +1,64 @@
"""Tool definitions exposed to the LiteLLM agent."""

from __future__ import annotations

from typing import Optional

from google.adk.tools import FunctionTool, ToolContext

from .control import parse_model_spec
from .state import HotSwapState, apply_state_to_agent


async def set_model(
    model: str,
    *,
    provider: Optional[str] = None,
    tool_context: ToolContext,
) -> str:
    """Hot-swap the active LiteLLM model for this session."""

    spec = parse_model_spec(model, provider=provider)
    state = HotSwapState.from_mapping(tool_context.state)
    state.model = spec.model
    state.provider = spec.provider
    state.persist(tool_context.state)
    try:
        apply_state_to_agent(tool_context._invocation_context, state)
    except Exception as exc:  # pragma: no cover - defensive reporting
        return f"❌ Failed to apply model '{state.display_model}': {exc}"
    return f"✅ Model switched to: {state.display_model}"


async def set_prompt(prompt: str, *, tool_context: ToolContext) -> str:
    """Update or clear the system prompt used for this session."""

    state = HotSwapState.from_mapping(tool_context.state)
    prompt_value = prompt.strip()
    state.prompt = prompt_value or None
    state.persist(tool_context.state)
    if state.prompt:
        return "✅ System prompt updated. This change takes effect immediately."
    return "✅ System prompt cleared. Reverting to default instruction."


async def get_config(*, tool_context: ToolContext) -> str:
    """Return a summary of the current model and prompt configuration."""

    state = HotSwapState.from_mapping(tool_context.state)
    return state.describe()


HOTSWAP_TOOLS = [
    FunctionTool(set_model),
    FunctionTool(set_prompt),
    FunctionTool(get_config),
]


__all__ = [
    "set_model",
    "set_prompt",
    "get_config",
    "HOTSWAP_TOOLS",
]
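All three tools funnel through the same `HotSwapState.from_mapping(...)` / `state.persist(...)` round-trip against the session-state mapping. A minimal sketch of that round-trip using a plain dict (the key strings are copied from `config.py`; the dataclass is re-declared here so the snippet runs standalone):

```python
from dataclasses import dataclass
from typing import Optional

# Key strings copied from litellm_agent/config.py
STATE_MODEL_KEY = "app:litellm_agent/model"
STATE_PROVIDER_KEY = "app:litellm_agent/provider"
STATE_PROMPT_KEY = "app:litellm_agent/prompt"


@dataclass
class MiniState:
    """Trimmed-down stand-in for HotSwapState (model/provider/prompt only)."""

    model: str = "openai/gpt-4o-mini"
    provider: Optional[str] = None
    prompt: Optional[str] = None

    def persist(self, store: dict) -> None:
        # Same write pattern as HotSwapState.persist
        store[STATE_MODEL_KEY] = self.model
        store[STATE_PROVIDER_KEY] = self.provider or None
        store[STATE_PROMPT_KEY] = self.prompt


session_state: dict = {}
state = MiniState(model="claude-3-haiku", provider="anthropic")
state.persist(session_state)
print(session_state[STATE_MODEL_KEY])     # claude-3-haiku
print(session_state[STATE_PROVIDER_KEY])  # anthropic
print(session_state[STATE_PROMPT_KEY])    # None
```

Because every write goes through namespaced `app:litellm_agent/` keys, the hot-swap state coexists with whatever else the ADK session stores without collisions.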
ai/agents/task_agent/main.py (Normal file, 13 lines)
@@ -0,0 +1,13 @@
"""ASGI entrypoint for containerized deployments."""

from pathlib import Path

from google.adk.cli.fast_api import get_fast_api_app

AGENT_DIR = Path(__file__).resolve().parent

app = get_fast_api_app(
    agents_dir=str(AGENT_DIR),
    web=False,
    a2a=True,
)
ai/agents/task_agent/requirements.txt (Normal file, 4 lines)
@@ -0,0 +1,4 @@
google-adk
a2a-sdk[all]
litellm
python-dotenv
ai/llm.txt (Normal file, 93 lines)
@@ -0,0 +1,93 @@
FuzzForge AI LLM Configuration Guide
===================================

This note summarises the environment variables and libraries that drive LiteLLM (via the Google ADK runtime) inside the FuzzForge AI module. For complete matrices and advanced examples, read `docs/advanced/configuration.md`.

Core Libraries
--------------
- `google-adk` – hosts the agent runtime, memory services, and LiteLLM bridge.
- `litellm` – provider-agnostic LLM client used by ADK and the executor.
- Provider SDKs – install the SDK that matches your target backend (`openai`, `anthropic`, `google-cloud-aiplatform`, `groq`, etc.).
- Optional extras: `agentops` for tracing, `cognee[all]` for knowledge-graph ingestion, `ollama` CLI for running local models.

Quick install of the foundation:

```
pip install google-adk litellm openai
```

Add any provider-specific SDKs (for example `pip install anthropic groq`) on top of that base.

Baseline Setup
--------------
Copy `.fuzzforge/.env.template` to `.fuzzforge/.env` and set the core fields:

```
LLM_PROVIDER=openai
LITELLM_MODEL=gpt-5-mini
OPENAI_API_KEY=sk-your-key
FUZZFORGE_MCP_URL=http://localhost:8010/mcp
SESSION_PERSISTENCE=sqlite
MEMORY_SERVICE=inmemory
```

LiteLLM Provider Examples
-------------------------

OpenAI-compatible (Azure, etc.):

```
LLM_PROVIDER=azure_openai
LITELLM_MODEL=gpt-4o-mini
LLM_API_KEY=sk-your-azure-key
LLM_ENDPOINT=https://your-resource.openai.azure.com
```

Anthropic:

```
LLM_PROVIDER=anthropic
LITELLM_MODEL=claude-3-haiku-20240307
ANTHROPIC_API_KEY=sk-your-key
```

Ollama (local):

```
LLM_PROVIDER=ollama_chat
LITELLM_MODEL=codellama:latest
OLLAMA_API_BASE=http://localhost:11434
```
Run `ollama pull codellama:latest` first so the adapter can respond immediately.

Vertex AI:

```
LLM_PROVIDER=vertex_ai
LITELLM_MODEL=gemini-1.5-pro
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
```

Provider Checklist
------------------
- **OpenAI / Azure OpenAI**: `LLM_PROVIDER`, `LITELLM_MODEL`, API key, optional endpoint + API version (Azure).
- **Anthropic**: `LLM_PROVIDER=anthropic`, `LITELLM_MODEL`, `ANTHROPIC_API_KEY`.
- **Google Vertex AI**: `LLM_PROVIDER=vertex_ai`, `LITELLM_MODEL`, `GOOGLE_APPLICATION_CREDENTIALS`, `GOOGLE_CLOUD_PROJECT`.
- **Groq**: `LLM_PROVIDER=groq`, `LITELLM_MODEL`, `GROQ_API_KEY`.
- **Ollama / Local**: `LLM_PROVIDER=ollama_chat`, `LITELLM_MODEL`, `OLLAMA_API_BASE`, and the model pulled locally (`ollama pull <model>`).

Knowledge Graph Add-ons
-----------------------
Set these only if you plan to use Cognee project graphs:

```
LLM_COGNEE_PROVIDER=openai
LLM_COGNEE_MODEL=gpt-5-mini
LLM_COGNEE_API_KEY=sk-your-key
```

Tracing & Debugging
-------------------
- Provide `AGENTOPS_API_KEY` to enable hosted traces for every conversation.
- Set `FUZZFORGE_DEBUG=1` (and optionally `LOG_LEVEL=DEBUG`) for verbose executor output.
- Restart the agent after changing environment variables; LiteLLM loads configuration on boot.

Further Reading
---------------
`docs/advanced/configuration.md` – provider comparison, debugging flags, and referenced modules.
ai/proxy/README.md (Normal file, 5 lines)
@@ -0,0 +1,5 @@
# LLM Proxy Integrations

This directory contains vendored source trees kept only for reference while integrating LLM gateways. The actual FuzzForge deployment uses the official Docker images for each project.

See `docs/docs/how-to/llm-proxy.md` for up-to-date instructions on running the proxy services and issuing keys for the agents.
ai/pyproject.toml (Normal file, 44 lines)
@@ -0,0 +1,44 @@
[project]
name = "fuzzforge-ai"
version = "0.7.0"
description = "FuzzForge AI orchestration module"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "google-adk",
    "a2a-sdk",
    "litellm",
    "python-dotenv",
    "httpx",
    "uvicorn",
    "rich",
    "agentops",
    "fastmcp",
    "mcp",
    "typing-extensions",
    "cognee>=0.3.0",
]

[project.optional-dependencies]
dev = [
    "pytest",
    "pytest-asyncio",
    "black",
    "ruff",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/fuzzforge_ai"]

[tool.hatch.metadata]
allow-direct-references = true

[tool.uv]
dev-dependencies = [
    "pytest",
    "pytest-asyncio",
]
ai/src/fuzzforge_ai/__init__.py (Normal file, 24 lines)
@@ -0,0 +1,24 @@
|
||||
"""
|
||||
FuzzForge AI Module - Agent-to-Agent orchestration system
|
||||
|
||||
This module integrates the fuzzforge_ai components into FuzzForge,
|
||||
providing intelligent AI agent capabilities for security analysis.
|
||||
|
||||
Usage:
|
||||
from fuzzforge_ai.a2a_wrapper import send_agent_task
|
||||
from fuzzforge_ai.agent import FuzzForgeAgent
|
||||
from fuzzforge_ai.config_manager import ConfigManager
|
||||
"""
|
||||
# Copyright (c) 2025 FuzzingLabs
|
||||
#
|
||||
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
|
||||
# at the root of this repository for details.
|
||||
#
|
||||
# After the Change Date (four years from publication), this version of the
|
||||
# Licensed Work will be made available under the Apache License, Version 2.0.
|
||||
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Additional attribution and requirements are provided in the NOTICE file.
|
||||
|
||||
|
||||
__version__ = "0.6.0"
|
||||
110
ai/src/fuzzforge_ai/__main__.py
Normal file
@@ -0,0 +1,110 @@
# ruff: noqa: E402  # Imports delayed for environment/logging setup
"""
FuzzForge A2A Server
Run this to expose FuzzForge as an A2A-compatible agent
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import os
import warnings
import logging
from dotenv import load_dotenv

from fuzzforge_ai.config_bridge import ProjectConfigManager

# Suppress warnings
warnings.filterwarnings("ignore")
logging.getLogger("google.adk").setLevel(logging.ERROR)
logging.getLogger("google.adk.tools.base_authenticated_tool").setLevel(logging.ERROR)

# Load .env from .fuzzforge directory first, then fallback
from pathlib import Path

# Ensure Cognee logs stay inside the project workspace
project_root = Path.cwd()
default_log_dir = project_root / ".fuzzforge" / "logs"
default_log_dir.mkdir(parents=True, exist_ok=True)
log_path = default_log_dir / "cognee.log"
os.environ.setdefault("COGNEE_LOG_PATH", str(log_path))
fuzzforge_env = Path.cwd() / ".fuzzforge" / ".env"
if fuzzforge_env.exists():
    load_dotenv(fuzzforge_env, override=True)
else:
    load_dotenv(override=True)

# Ensure Cognee uses the project-specific storage paths when available
try:
    project_config = ProjectConfigManager()
    project_config.setup_cognee_environment()
except Exception:
    # Project may not be initialized; fall through with default settings
    pass

# Check configuration
if not os.getenv('LITELLM_MODEL'):
    print("[ERROR] LITELLM_MODEL not set in .env file")
    print("Please set LITELLM_MODEL to your desired model (e.g., gpt-4o-mini)")
    exit(1)

from .agent import get_fuzzforge_agent
from .a2a_server import create_a2a_app as create_custom_a2a_app


def create_a2a_app():
    """Create the A2A application"""
    # Get configuration
    port = int(os.getenv('FUZZFORGE_PORT', 10100))

    # Get the FuzzForge agent
    fuzzforge = get_fuzzforge_agent()

    # Print ASCII banner
    print("\033[95m")  # Purple color
    print(" ███████╗██╗ ██╗███████╗███████╗███████╗ ██████╗ ██████╗ ██████╗ ███████╗ █████╗ ██╗")
    print(" ██╔════╝██║ ██║╚══███╔╝╚══███╔╝██╔════╝██╔═══██╗██╔══██╗██╔════╝ ██╔════╝ ██╔══██╗██║")
    print(" █████╗ ██║ ██║ ███╔╝ ███╔╝ █████╗ ██║ ██║██████╔╝██║ ███╗█████╗ ███████║██║")
    print(" ██╔══╝ ██║ ██║ ███╔╝ ███╔╝ ██╔══╝ ██║ ██║██╔══██╗██║ ██║██╔══╝ ██╔══██║██║")
    print(" ██║ ╚██████╔╝███████╗███████╗██║ ╚██████╔╝██║ ██║╚██████╔╝███████╗ ██║ ██║██║")
    print(" ╚═╝ ╚═════╝ ╚══════╝╚══════╝╚═╝ ╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝ ╚═╝ ╚═╝╚═╝")
    print("\033[0m")  # Reset color

    # Create A2A app
    print("🚀 Starting FuzzForge A2A Server")
    print(f" Model: {fuzzforge.model}")
    if fuzzforge.cognee_url:
        print(f" Memory: Cognee at {fuzzforge.cognee_url}")
    print(f" Port: {port}")

    app = create_custom_a2a_app(fuzzforge.adk_agent, port=port, executor=fuzzforge.executor)

    print("\n✅ FuzzForge A2A Server ready!")
    print(f" Agent card: http://localhost:{port}/.well-known/agent-card.json")
    print(f" A2A endpoint: http://localhost:{port}/")
    print(f"\n📡 Other agents can register FuzzForge at: http://localhost:{port}")

    return app


def main():
    """Start the A2A server using uvicorn."""
    import uvicorn

    app = create_a2a_app()
    port = int(os.getenv('FUZZFORGE_PORT', 10100))

    print("\n🎯 Starting server with uvicorn...")
    uvicorn.run(app, host="127.0.0.1", port=port)


if __name__ == "__main__":
    main()
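The startup code above uses `os.environ.setdefault` so that `COGNEE_LOG_PATH` is only written when the variable is not already set, letting user-provided values win. The behavior in isolation:

```python
import os

# Start from a clean slate for the demo
os.environ.pop("COGNEE_LOG_PATH", None)

# First setdefault wins when the variable is unset...
os.environ.setdefault("COGNEE_LOG_PATH", "/tmp/default/cognee.log")
print(os.environ["COGNEE_LOG_PATH"])  # /tmp/default/cognee.log

# ...and a later setdefault does not override an existing value
os.environ.setdefault("COGNEE_LOG_PATH", "/tmp/other/cognee.log")
print(os.environ["COGNEE_LOG_PATH"])  # /tmp/default/cognee.log
```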
229
ai/src/fuzzforge_ai/a2a_server.py
Normal file
@@ -0,0 +1,229 @@
"""Custom A2A wiring so we can access task store and queue manager."""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


from __future__ import annotations

import logging
from typing import Optional, Union

from starlette.applications import Starlette
from starlette.responses import Response, FileResponse

from google.adk.a2a.executor.a2a_agent_executor import A2aAgentExecutor
from google.adk.a2a.utils.agent_card_builder import AgentCardBuilder
from google.adk.a2a.experimental import a2a_experimental
from google.adk.agents.base_agent import BaseAgent
from google.adk.artifacts.in_memory_artifact_service import InMemoryArtifactService
from google.adk.auth.credential_service.in_memory_credential_service import InMemoryCredentialService
from google.adk.cli.utils.logs import setup_adk_logger
from google.adk.memory.in_memory_memory_service import InMemoryMemoryService
from google.adk.runners import Runner
from google.adk.sessions.in_memory_session_service import InMemorySessionService

from a2a.server.apps import A2AStarletteApplication
from a2a.server.request_handlers.default_request_handler import DefaultRequestHandler
from a2a.server.tasks.inmemory_task_store import InMemoryTaskStore
from a2a.server.events.in_memory_queue_manager import InMemoryQueueManager
from a2a.types import AgentCard

from .agent_executor import FuzzForgeExecutor


import json


async def serve_artifact(request):
    """Serve artifact files via HTTP for A2A agents"""
    artifact_id = request.path_params["artifact_id"]

    # Try to get the executor instance to access artifact cache
    # We'll store a reference to it during app creation
    executor = getattr(serve_artifact, '_executor', None)
    if not executor:
        return Response("Artifact service not available", status_code=503)

    try:
        # Look in the artifact cache directory
        artifact_cache_dir = executor._artifact_cache_dir
        artifact_dir = artifact_cache_dir / artifact_id

        if not artifact_dir.exists():
            return Response("Artifact not found", status_code=404)

        # Find the artifact file (should be only one file in the directory)
        artifact_files = list(artifact_dir.glob("*"))
        if not artifact_files:
            return Response("Artifact file not found", status_code=404)

        artifact_file = artifact_files[0]  # Take the first (and should be only) file

        # Determine mime type from file extension or default to octet-stream
        import mimetypes
        mime_type, _ = mimetypes.guess_type(str(artifact_file))
        if not mime_type:
            mime_type = 'application/octet-stream'

        return FileResponse(
            path=str(artifact_file),
            media_type=mime_type,
            filename=artifact_file.name
        )

    except Exception as e:
        return Response(f"Error serving artifact: {str(e)}", status_code=500)


async def knowledge_query(request):
    """Expose knowledge graph search over HTTP for external agents."""
    executor = getattr(knowledge_query, '_executor', None)
    if not executor:
        return Response("Knowledge service not available", status_code=503)

    try:
        payload = await request.json()
    except Exception:
        return Response("Invalid JSON body", status_code=400)

    query = payload.get("query")
    if not query:
        return Response("'query' is required", status_code=400)

    search_type = payload.get("search_type", "INSIGHTS")
    dataset = payload.get("dataset")

    result = await executor.query_project_knowledge_api(
        query=query,
        search_type=search_type,
        dataset=dataset,
    )

    status = 200 if not isinstance(result, dict) or "error" not in result else 400
    return Response(
        json.dumps(result, default=str),
        status_code=status,
        media_type="application/json",
    )


async def create_file_artifact(request):
    """Create an artifact from a project file via HTTP."""
    executor = getattr(create_file_artifact, '_executor', None)
    if not executor:
        return Response("File service not available", status_code=503)

    try:
        payload = await request.json()
    except Exception:
        return Response("Invalid JSON body", status_code=400)

    path = payload.get("path")
    if not path:
        return Response("'path' is required", status_code=400)

    result = await executor.create_project_file_artifact_api(path)
    status = 200 if not isinstance(result, dict) or "error" not in result else 400
    return Response(
        json.dumps(result, default=str),
        status_code=status,
        media_type="application/json",
    )


def _load_agent_card(agent_card: Optional[Union[AgentCard, str]]) -> Optional[AgentCard]:
    if agent_card is None:
        return None
    if isinstance(agent_card, AgentCard):
        return agent_card

    import json
    from pathlib import Path

    path = Path(agent_card)
    with path.open('r', encoding='utf-8') as handle:
        data = json.load(handle)
    return AgentCard(**data)


@a2a_experimental
def create_a2a_app(
    agent: BaseAgent,
    *,
    host: str = "localhost",
    port: int = 8000,
    protocol: str = "http",
    agent_card: Optional[Union[AgentCard, str]] = None,
    executor=None,  # Accept executor reference
) -> Starlette:
    """Variant of google.adk.a2a.utils.to_a2a that exposes task-store handles."""

    setup_adk_logger(logging.INFO)

    async def create_runner() -> Runner:
        return Runner(
            agent=agent,
            app_name=agent.name or "fuzzforge",
            artifact_service=InMemoryArtifactService(),
            session_service=InMemorySessionService(),
            memory_service=InMemoryMemoryService(),
            credential_service=InMemoryCredentialService(),
        )

    task_store = InMemoryTaskStore()
    queue_manager = InMemoryQueueManager()

    agent_executor = A2aAgentExecutor(runner=create_runner)
    request_handler = DefaultRequestHandler(
        agent_executor=agent_executor,
        task_store=task_store,
        queue_manager=queue_manager,
    )

    rpc_url = f"{protocol}://{host}:{port}/"
    provided_card = _load_agent_card(agent_card)

    card_builder = AgentCardBuilder(agent=agent, rpc_url=rpc_url)

    app = Starlette()

    async def setup() -> None:
        if provided_card is not None:
            final_card = provided_card
        else:
            final_card = await card_builder.build()

        a2a_app = A2AStarletteApplication(
            agent_card=final_card,
            http_handler=request_handler,
        )
        a2a_app.add_routes_to_app(app)

    # Add artifact serving route
    app.router.add_route("/artifacts/{artifact_id}", serve_artifact, methods=["GET"])
    app.router.add_route("/graph/query", knowledge_query, methods=["POST"])
    app.router.add_route("/project/files", create_file_artifact, methods=["POST"])

    app.add_event_handler("startup", setup)

    # Expose handles so the executor can emit task updates later
    FuzzForgeExecutor.task_store = task_store
    FuzzForgeExecutor.queue_manager = queue_manager

    # Store reference to executor for artifact serving
    serve_artifact._executor = executor
    knowledge_query._executor = executor
    create_file_artifact._executor = executor

    return app


__all__ = ["create_a2a_app"]
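The artifact route above falls back to `application/octet-stream` when the standard-library `mimetypes` lookup fails; the same lookup logic in isolation:

```python
import mimetypes

def resolve_media_type(filename: str) -> str:
    # Same fallback as serve_artifact: guess from the extension,
    # default to a generic binary type when nothing matches
    mime_type, _ = mimetypes.guess_type(filename)
    return mime_type or 'application/octet-stream'

print(resolve_media_type("findings.json"))  # application/json
print(resolve_media_type("crash-input"))    # application/octet-stream
```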
288
ai/src/fuzzforge_ai/a2a_wrapper.py
Normal file
@@ -0,0 +1,288 @@
"""
A2A Wrapper Module for FuzzForge
Programmatic interface to send tasks to A2A agents with custom model/prompt/context
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

from __future__ import annotations

from typing import Optional, Any
from uuid import uuid4

import httpx
from a2a.client import A2AClient
from a2a.client.errors import A2AClientHTTPError
from a2a.types import (
    JSONRPCErrorResponse,
    Message,
    MessageSendConfiguration,
    MessageSendParams,
    Part,
    Role,
    SendMessageRequest,
    SendStreamingMessageRequest,
    Task,
    TaskArtifactUpdateEvent,
    TaskStatusUpdateEvent,
    TextPart,
)


class A2ATaskResult:
    """Result from an A2A agent task"""

    def __init__(self, text: str, context_id: str, raw_response: Any = None):
        self.text = text
        self.context_id = context_id
        self.raw_response = raw_response

    def __str__(self) -> str:
        return self.text

    def __repr__(self) -> str:
        return f"A2ATaskResult(text={self.text[:50]}..., context_id={self.context_id})"


def _build_control_message(command: str, payload: Optional[str] = None) -> str:
    """Build a control message for hot-swapping agent configuration"""
    if payload is None or payload == "":
        return f"[HOTSWAP:{command}]"
    return f"[HOTSWAP:{command}:{payload}]"


def _extract_text(
    result: Message | Task | TaskStatusUpdateEvent | TaskArtifactUpdateEvent,
) -> list[str]:
    """Extract text content from A2A response objects"""
    texts: list[str] = []
    if isinstance(result, Message):
        if result.role is Role.agent:
            for part in result.parts:
                root_part = part.root
                text = getattr(root_part, "text", None)
                if text:
                    texts.append(text)
    elif isinstance(result, Task) and result.history:
        for msg in result.history:
            if msg.role is Role.agent:
                for part in msg.parts:
                    root_part = part.root
                    text = getattr(root_part, "text", None)
                    if text:
                        texts.append(text)
    elif isinstance(result, TaskStatusUpdateEvent):
        message = result.status.message
        if message:
            texts.extend(_extract_text(message))
    elif isinstance(result, TaskArtifactUpdateEvent):
        artifact = result.artifact
        if artifact and artifact.parts:
            for part in artifact.parts:
                root_part = part.root
                text = getattr(root_part, "text", None)
                if text:
                    texts.append(text)
    return texts


async def _send_message(
    client: A2AClient,
    message: str,
    context_id: str,
) -> str:
    """Send a message to the A2A agent and collect the response"""

    params = MessageSendParams(
        configuration=MessageSendConfiguration(blocking=True),
        message=Message(
            context_id=context_id,
            message_id=str(uuid4()),
            role=Role.user,
            parts=[Part(root=TextPart(text=message))],
        ),
    )

    stream_request = SendStreamingMessageRequest(id=str(uuid4()), params=params)
    buffer: list[str] = []

    try:
        async for response in client.send_message_streaming(stream_request):
            root = response.root
            if isinstance(root, JSONRPCErrorResponse):
                raise RuntimeError(f"A2A error: {root.error}")

            payload = root.result
            buffer.extend(_extract_text(payload))
    except A2AClientHTTPError as exc:
        if "text/event-stream" not in str(exc):
            raise

        # Fallback to non-streaming
        send_request = SendMessageRequest(id=str(uuid4()), params=params)
        response = await client.send_message(send_request)
        root = response.root
        if isinstance(root, JSONRPCErrorResponse):
            raise RuntimeError(f"A2A error: {root.error}")
        payload = root.result
        buffer.extend(_extract_text(payload))

    if buffer:
        buffer = list(dict.fromkeys(buffer))  # Remove duplicates
    return "\n".join(buffer).strip()


async def send_agent_task(
    url: str,
    message: str,
    *,
    model: Optional[str] = None,
    provider: Optional[str] = None,
    prompt: Optional[str] = None,
    context: Optional[str] = None,
    timeout: float = 120.0,
) -> A2ATaskResult:
    """
    Send a task to an A2A agent with optional model/prompt configuration.

    Args:
        url: A2A endpoint URL (e.g., "http://127.0.0.1:8000/a2a/litellm_agent")
        message: The task message to send to the agent
        model: Optional model name (e.g., "gpt-4o", "gemini-2.0-flash")
        provider: Optional provider name (e.g., "openai", "gemini")
        prompt: Optional system prompt to set before sending the message
        context: Optional context/session ID (generated if not provided)
        timeout: Request timeout in seconds (default: 120)

    Returns:
        A2ATaskResult with the agent's response text and context ID

    Example:
        >>> result = await send_agent_task(
        ...     url="http://127.0.0.1:8000/a2a/litellm_agent",
        ...     model="gpt-4o",
        ...     provider="openai",
        ...     prompt="You are concise.",
        ...     message="Give me a fuzzing harness.",
        ...     context="fuzzing",
        ...     timeout=120
        ... )
        >>> print(result.text)
    """
    timeout_config = httpx.Timeout(timeout)
    context_id = context or str(uuid4())

    async with httpx.AsyncClient(timeout=timeout_config) as http_client:
        client = A2AClient(url=url, httpx_client=http_client)

        # Set model if provided
        if model:
            model_spec = f"{provider}/{model}" if provider else model
            control_msg = _build_control_message("MODEL", model_spec)
            await _send_message(client, control_msg, context_id)

        # Set prompt if provided
        if prompt is not None:
            control_msg = _build_control_message("PROMPT", prompt)
            await _send_message(client, control_msg, context_id)

        # Send the actual task message
        response_text = await _send_message(client, message, context_id)

    return A2ATaskResult(
        text=response_text,
        context_id=context_id,
    )


async def get_agent_config(
    url: str,
    context: Optional[str] = None,
    timeout: float = 60.0,
) -> str:
    """
    Get the current configuration of an A2A agent.

    Args:
        url: A2A endpoint URL
        context: Optional context/session ID
        timeout: Request timeout in seconds

    Returns:
        Configuration string from the agent
    """
    timeout_config = httpx.Timeout(timeout)
    context_id = context or str(uuid4())

    async with httpx.AsyncClient(timeout=timeout_config) as http_client:
        client = A2AClient(url=url, httpx_client=http_client)
        control_msg = _build_control_message("GET_CONFIG")
        config_text = await _send_message(client, control_msg, context_id)
    return config_text


async def hot_swap_model(
    url: str,
    model: str,
    provider: Optional[str] = None,
    context: Optional[str] = None,
    timeout: float = 60.0,
) -> str:
    """
    Hot-swap the model of an A2A agent without sending a task.

    Args:
        url: A2A endpoint URL
        model: Model name to switch to
        provider: Optional provider name
        context: Optional context/session ID
        timeout: Request timeout in seconds

    Returns:
        Response from the agent
    """
    timeout_config = httpx.Timeout(timeout)
    context_id = context or str(uuid4())

    async with httpx.AsyncClient(timeout=timeout_config) as http_client:
        client = A2AClient(url=url, httpx_client=http_client)
        model_spec = f"{provider}/{model}" if provider else model
        control_msg = _build_control_message("MODEL", model_spec)
        response = await _send_message(client, control_msg, context_id)
    return response


async def hot_swap_prompt(
    url: str,
    prompt: str,
    context: Optional[str] = None,
    timeout: float = 60.0,
) -> str:
    """
    Hot-swap the system prompt of an A2A agent.

    Args:
        url: A2A endpoint URL
        prompt: System prompt to set
        context: Optional context/session ID
        timeout: Request timeout in seconds

    Returns:
        Response from the agent
    """
    timeout_config = httpx.Timeout(timeout)
    context_id = context or str(uuid4())

    async with httpx.AsyncClient(timeout=timeout_config) as http_client:
        client = A2AClient(url=url, httpx_client=http_client)
        control_msg = _build_control_message("PROMPT", prompt)
        response = await _send_message(client, control_msg, context_id)
    return response
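The `[HOTSWAP:...]` control strings used throughout this module are plain bracketed text, so any client can build them without the SDK; a standalone sketch mirroring `_build_control_message`:

```python
from typing import Optional

def build_control_message(command: str, payload: Optional[str] = None) -> str:
    # Mirrors _build_control_message above: the payload segment is
    # omitted entirely when it is None or empty
    if payload is None or payload == "":
        return f"[HOTSWAP:{command}]"
    return f"[HOTSWAP:{command}:{payload}]"

print(build_control_message("GET_CONFIG"))              # [HOTSWAP:GET_CONFIG]
print(build_control_message("MODEL", "openai/gpt-4o"))  # [HOTSWAP:MODEL:openai/gpt-4o]
```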
133
ai/src/fuzzforge_ai/agent.py
Normal file
@@ -0,0 +1,133 @@
"""
FuzzForge Agent Definition
The core agent that combines all components
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import os
from pathlib import Path
from typing import Dict, Any, List
from google.adk import Agent
from google.adk.models.lite_llm import LiteLlm
from .agent_card import get_fuzzforge_agent_card
from .agent_executor import FuzzForgeExecutor
from .memory_service import FuzzForgeMemoryService, HybridMemoryManager

# Load environment variables from the AI module's .env file
try:
    from dotenv import load_dotenv
    _ai_dir = Path(__file__).parent
    _env_file = _ai_dir / ".env"
    if _env_file.exists():
        load_dotenv(_env_file, override=False)  # Don't override existing env vars
except ImportError:
    # dotenv not available, skip loading
    pass


class FuzzForgeAgent:
    """The main FuzzForge agent that combines card, executor, and ADK agent"""

    def __init__(
        self,
        model: str = None,
        cognee_url: str = None,
        port: int = 10100,
    ):
        """Initialize FuzzForge agent with configuration"""
        self.model = model or os.getenv('LITELLM_MODEL', 'gpt-4o-mini')
        self.cognee_url = cognee_url or os.getenv('COGNEE_MCP_URL')
        self.port = port

        # Initialize ADK Memory Service for conversational memory
        memory_type = os.getenv('MEMORY_SERVICE', 'inmemory')
        self.memory_service = FuzzForgeMemoryService(memory_type=memory_type)

        # Create the executor (the brain) with memory and session services
        self.executor = FuzzForgeExecutor(
            model=self.model,
            cognee_url=self.cognee_url,
            debug=os.getenv('FUZZFORGE_DEBUG', '0') == '1',
            memory_service=self.memory_service,
            session_persistence=os.getenv('SESSION_PERSISTENCE', 'inmemory'),
            fuzzforge_mcp_url=None,  # Disabled
        )

        # Create Hybrid Memory Manager (ADK + Cognee direct integration)
        # MCP tools removed - using direct Cognee integration only
        self.memory_manager = HybridMemoryManager(
            memory_service=self.memory_service,
            cognee_tools=None  # No MCP tools, direct integration used instead
        )

        # Get the agent card (the identity)
        self.agent_card = get_fuzzforge_agent_card(f"http://localhost:{self.port}")

        # Create the ADK agent (for A2A server mode)
        self.adk_agent = self._create_adk_agent()

    def _create_adk_agent(self) -> Agent:
        """Create the ADK agent for A2A server mode"""
        # Build instruction
        instruction = f"""You are {self.agent_card.name}, {self.agent_card.description}

Your capabilities include:
"""
        for skill in self.agent_card.skills:
            instruction += f"\n- {skill.name}: {skill.description}"

        instruction += """

When responding to requests:
1. Use your registered agents when appropriate
2. Use Cognee memory tools when available
3. Provide helpful, concise responses
4. Maintain context across conversations
"""

        # Create ADK agent
        return Agent(
            model=LiteLlm(model=self.model),
            name=self.agent_card.name,
            description=self.agent_card.description,
            instruction=instruction,
            tools=self.executor.agent.tools if hasattr(self.executor.agent, 'tools') else []
        )

    async def process_message(self, message: str, context_id: str = None) -> str:
        """Process a message using the executor"""
        result = await self.executor.execute(message, context_id or "default")
        return result.get("response", "No response generated")

    async def register_agent(self, url: str) -> Dict[str, Any]:
        """Register a new agent"""
        return await self.executor.register_agent(url)

    def list_agents(self) -> List[Dict[str, Any]]:
        """List registered agents"""
        return self.executor.list_agents()

    async def cleanup(self):
        """Clean up resources"""
        await self.executor.cleanup()


# Create a singleton instance for import
_instance = None

def get_fuzzforge_agent() -> FuzzForgeAgent:
    """Get the singleton FuzzForge agent instance"""
    global _instance
    if _instance is None:
        _instance = FuzzForgeAgent()
    return _instance
182
ai/src/fuzzforge_ai/agent_card.py
Normal file
182
ai/src/fuzzforge_ai/agent_card.py
Normal file
@@ -0,0 +1,182 @@
|
||||
"""
|
||||
FuzzForge Agent Card and Skills Definition
|
||||
Defines what FuzzForge can do and how others can discover it
|
||||
"""
|
||||
# Copyright (c) 2025 FuzzingLabs
|
||||
#
|
||||
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
|
||||
# at the root of this repository for details.
|
||||
#
|
||||
# After the Change Date (four years from publication), this version of the
|
||||
# Licensed Work will be made available under the Apache License, Version 2.0.
|
||||
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Additional attribution and requirements are provided in the NOTICE file.
|
||||
|
||||
|
||||
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AgentSkill:
    """Represents a specific capability of the agent"""
    id: str
    name: str
    description: str
    tags: List[str]
    examples: List[str]
    input_modes: Optional[List[str]] = None
    output_modes: Optional[List[str]] = None

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary for JSON serialization"""
        return {
            "id": self.id,
            "name": self.name,
            "description": self.description,
            "tags": self.tags,
            "examples": self.examples,
            "inputModes": self.input_modes or ["text/plain"],
            "outputModes": self.output_modes or ["text/plain"]
        }


@dataclass
class AgentCapabilities:
    """Defines agent capabilities for A2A protocol"""
    streaming: bool = False
    push_notifications: bool = False
    multi_turn: bool = True
    context_retention: bool = True

    def to_dict(self) -> Dict[str, Any]:
        return {
            "streaming": self.streaming,
            "pushNotifications": self.push_notifications,
            "multiTurn": self.multi_turn,
            "contextRetention": self.context_retention
        }


@dataclass
class AgentCard:
    """The agent's business card - tells others what this agent can do"""
    name: str
    description: str
    version: str
    url: str
    skills: List[AgentSkill]
    capabilities: AgentCapabilities
    default_input_modes: Optional[List[str]] = None
    default_output_modes: Optional[List[str]] = None
    preferred_transport: str = "JSONRPC"
    protocol_version: str = "0.3.0"

    def to_dict(self) -> Dict[str, Any]:
        """Convert to A2A-compliant agent card JSON"""
        return {
            "name": self.name,
            "description": self.description,
            "version": self.version,
            "url": self.url,
            "protocolVersion": self.protocol_version,
            "preferredTransport": self.preferred_transport,
            "defaultInputModes": self.default_input_modes or ["text/plain"],
            "defaultOutputModes": self.default_output_modes or ["text/plain"],
            "capabilities": self.capabilities.to_dict(),
            "skills": [skill.to_dict() for skill in self.skills]
        }
# Define FuzzForge's skills
orchestration_skill = AgentSkill(
    id="orchestration",
    name="Agent Orchestration",
    description="Route requests to appropriate registered agents based on their capabilities",
    tags=["orchestration", "routing", "coordination"],
    examples=[
        "Route this to the calculator",
        "Send this to the appropriate agent",
        "Which agent should handle this?"
    ]
)

memory_skill = AgentSkill(
    id="memory",
    name="Memory Management",
    description="Store and retrieve information using Cognee knowledge graph",
    tags=["memory", "knowledge", "storage", "cognee"],
    examples=[
        "Remember that my favorite color is blue",
        "What do you remember about me?",
        "Search your memory for project details"
    ]
)

conversation_skill = AgentSkill(
    id="conversation",
    name="General Conversation",
    description="Engage in general conversation and answer questions using LLM",
    tags=["chat", "conversation", "qa", "llm"],
    examples=[
        "What is the meaning of life?",
        "Explain quantum computing",
        "Help me understand this concept"
    ]
)

workflow_automation_skill = AgentSkill(
    id="workflow_automation",
    name="Workflow Automation",
    description="Operate project workflows via MCP, monitor runs, and share results",
    tags=["workflow", "automation", "mcp", "orchestration"],
    examples=[
        "Submit the security assessment workflow",
        "Kick off the infrastructure scan and monitor it",
        "Summarise findings for run abc123"
    ]
)

agent_management_skill = AgentSkill(
    id="agent_management",
    name="Agent Registry Management",
    description="Register, list, and manage connections to other A2A agents",
    tags=["registry", "management", "discovery"],
    examples=[
        "Register agent at http://localhost:10201",
        "List all registered agents",
        "Show agent capabilities"
    ]
)

# Define FuzzForge's capabilities
fuzzforge_capabilities = AgentCapabilities(
    streaming=False,
    push_notifications=True,
    multi_turn=True,  # We support multi-turn conversations
    context_retention=True  # We maintain context across turns
)
# Create the public agent card
def get_fuzzforge_agent_card(url: str = "http://localhost:10100") -> AgentCard:
    """Get FuzzForge's agent card with current configuration"""
    return AgentCard(
        name="ProjectOrchestrator",
        description=(
            "An A2A-capable project agent that can launch and monitor FuzzForge workflows, "
            "consult the project knowledge graph, and coordinate with speciality agents."
        ),
        version="project-agent",
        url=url,
        skills=[
            orchestration_skill,
            memory_skill,
            conversation_skill,
            agent_management_skill
        ],
        capabilities=fuzzforge_capabilities,
        default_input_modes=["text/plain", "application/json"],
        default_output_modes=["text/plain", "application/json"],
        preferred_transport="JSONRPC",
        protocol_version="0.3.0"
    )
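As a quick check of the serialization contract, here is a condensed, self-contained sketch (re-declaring only `AgentSkill`, mirroring the definition above) that shows the camelCase JSON keys `to_dict()` emits and how the `None` defaults fall back to plain text:

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


# Condensed copy of the AgentSkill dataclass above, for illustration only.
@dataclass
class AgentSkill:
    id: str
    name: str
    description: str
    tags: List[str]
    examples: List[str]
    input_modes: Optional[List[str]] = None
    output_modes: Optional[List[str]] = None

    def to_dict(self) -> Dict[str, Any]:
        return {
            "id": self.id,
            "name": self.name,
            "description": self.description,
            "tags": self.tags,
            "examples": self.examples,
            # None falls back to plain text, matching the defaults above
            "inputModes": self.input_modes or ["text/plain"],
            "outputModes": self.output_modes or ["text/plain"],
        }


skill = AgentSkill(
    id="orchestration",
    name="Agent Orchestration",
    description="Route requests to registered agents",
    tags=["orchestration"],
    examples=["Route this to the calculator"],
)
card_fragment = skill.to_dict()
print(card_fragment["inputModes"])  # → ['text/plain']
```

Note the snake_case Python fields become camelCase keys (`inputModes`, `outputModes`) in the emitted card, as the A2A-style JSON above expects.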
2316 ai/src/fuzzforge_ai/agent_executor.py (Normal file)
File diff suppressed because it is too large
971 ai/src/fuzzforge_ai/cli.py (Executable file)
@@ -0,0 +1,971 @@
#!/usr/bin/env python3
# ruff: noqa: E402  # Imports delayed for environment/logging setup
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

"""
FuzzForge CLI - Clean modular version

Uses the separated agent components
"""
import asyncio
import shlex
import os
import sys
import signal
import warnings
import logging
import random
from datetime import datetime
from contextlib import contextmanager
from pathlib import Path

from dotenv import load_dotenv

# Ensure Cognee writes logs inside the project workspace
project_root = Path.cwd()
default_log_dir = project_root / ".fuzzforge" / "logs"
default_log_dir.mkdir(parents=True, exist_ok=True)
log_path = default_log_dir / "cognee.log"
os.environ.setdefault("COGNEE_LOG_PATH", str(log_path))

# Suppress warnings
warnings.filterwarnings("ignore")
logging.basicConfig(level=logging.ERROR)

# Load .env file with explicit path handling
# 1. First check the current working directory for .fuzzforge/.env
fuzzforge_env = Path.cwd() / ".fuzzforge" / ".env"
if fuzzforge_env.exists():
    load_dotenv(fuzzforge_env, override=True)
else:
    # 2. Then check parent directories for .fuzzforge projects
    current_path = Path.cwd()
    for parent in [current_path] + list(current_path.parents):
        fuzzforge_dir = parent / ".fuzzforge"
        if fuzzforge_dir.exists():
            project_env = fuzzforge_dir / ".env"
            if project_env.exists():
                load_dotenv(project_env, override=True)
            break
    else:
        # 3. Fall back to generic load_dotenv
        load_dotenv(override=True)

# Enhanced readline configuration for Rich Console input compatibility
try:
    import readline
    # Enable Rich-compatible input features
    readline.parse_and_bind("tab: complete")
    readline.parse_and_bind("set editing-mode emacs")
    readline.parse_and_bind("set show-all-if-ambiguous on")
    readline.parse_and_bind("set completion-ignore-case on")
    readline.parse_and_bind("set colored-completion-prefix on")
    readline.parse_and_bind("set enable-bracketed-paste on")  # Better paste support
    # Navigation bindings for better editing
    readline.parse_and_bind("Control-a: beginning-of-line")
    readline.parse_and_bind("Control-e: end-of-line")
    readline.parse_and_bind("Control-u: unix-line-discard")
    readline.parse_and_bind("Control-k: kill-line")
    readline.parse_and_bind("Control-w: unix-word-rubout")
    readline.parse_and_bind("Meta-Backspace: backward-kill-word")
    # History and completion
    readline.set_history_length(2000)
    readline.set_startup_hook(None)
    # Enable multiline editing hints
    readline.parse_and_bind("set horizontal-scroll-mode off")
    readline.parse_and_bind("set mark-symlinked-directories on")
    READLINE_AVAILABLE = True
except ImportError:
    READLINE_AVAILABLE = False
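The three-step lookup above can be summarized as a pure function. This is a hypothetical sketch (the name `resolve_env_file` is not in the codebase), assuming the search stops at the first `.fuzzforge` directory encountered; returning `None` signals the generic `load_dotenv()` fallback:

```python
from pathlib import Path
from typing import Optional


def resolve_env_file(start: Path) -> Optional[Path]:
    """Hypothetical mirror of the .env lookup above: the nearest
    .fuzzforge project wins; None means 'fall back to load_dotenv()'."""
    for candidate in [start] + list(start.parents):
        fuzzforge_dir = candidate / ".fuzzforge"
        if fuzzforge_dir.exists():
            env_file = fuzzforge_dir / ".env"
            # Stop at the first .fuzzforge project, with or without a .env
            return env_file if env_file.exists() else None
    return None
```

This keeps nested projects well-behaved: a CLI launched from a subdirectory picks up its own project's `.env` rather than one from an unrelated ancestor.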
from rich.console import Console
from rich.table import Table
from rich.panel import Panel
from rich import box


from .agent import FuzzForgeAgent
from .config_manager import ConfigManager
from .config_bridge import ProjectConfigManager

console = Console()

# Global shutdown flag
shutdown_requested = False

# Dynamic status messages for better UX
THINKING_MESSAGES = [
    "Thinking", "Processing", "Computing", "Analyzing", "Working",
    "Pondering", "Deliberating", "Calculating", "Reasoning", "Evaluating"
]

WORKING_MESSAGES = [
    "Working", "Processing", "Handling", "Executing", "Running",
    "Operating", "Performing", "Conducting", "Managing", "Coordinating"
]

SEARCH_MESSAGES = [
    "Searching", "Scanning", "Exploring", "Investigating", "Hunting",
    "Seeking", "Probing", "Examining", "Inspecting", "Browsing"
]

# Cool prompt symbols
PROMPT_STYLES = [
    "▶", "❯", "➤", "→", "»", "⟩", "▷", "⇨", "⟶", "◆"
]


def get_dynamic_status(action_type="thinking"):
    """Get a random status message based on action type"""
    if action_type == "thinking":
        return f"{random.choice(THINKING_MESSAGES)}..."
    elif action_type == "working":
        return f"{random.choice(WORKING_MESSAGES)}..."
    elif action_type == "searching":
        return f"{random.choice(SEARCH_MESSAGES)}..."
    else:
        return f"{random.choice(THINKING_MESSAGES)}..."


def get_prompt_symbol():
    """Get prompt symbol indicating where to write"""
    return ">>"


def signal_handler(signum, frame):
    """Handle Ctrl+C gracefully"""
    global shutdown_requested
    shutdown_requested = True
    console.print("\n\n[yellow]Shutting down gracefully...[/yellow]")
    sys.exit(0)


signal.signal(signal.SIGINT, signal_handler)


@contextmanager
def safe_status(message: str):
    """Safe status context manager"""
    status = console.status(message, spinner="dots")
    try:
        status.start()
        yield
    finally:
        status.stop()
class FuzzForgeCLI:
    """Command-line interface for FuzzForge"""

    def __init__(self):
        """Initialize the CLI"""
        # Ensure .env is loaded from the .fuzzforge directory
        fuzzforge_env = Path.cwd() / ".fuzzforge" / ".env"
        if fuzzforge_env.exists():
            load_dotenv(fuzzforge_env, override=True)

        # Load configuration for agent registry
        self.config_manager = ConfigManager()

        # Check environment configuration
        if not os.getenv('LITELLM_MODEL'):
            console.print("[red]ERROR: LITELLM_MODEL not set in .env file[/red]")
            console.print("Please set LITELLM_MODEL to your desired model")
            sys.exit(1)

        # Create the agent (uses env vars directly)
        self.agent = FuzzForgeAgent()

        # Create a consistent context ID for this CLI session
        self.context_id = f"cli_{datetime.now().strftime('%Y%m%d_%H%M%S')}"

        # Track registered agents for config persistence
        self.agents_modified = False

        # Command handlers
        self.commands = {
            "/help": self.cmd_help,
            "/register": self.cmd_register,
            "/unregister": self.cmd_unregister,
            "/list": self.cmd_list,
            "/memory": self.cmd_memory,
            "/recall": self.cmd_recall,
            "/artifacts": self.cmd_artifacts,
            "/tasks": self.cmd_tasks,
            "/skills": self.cmd_skills,
            "/sessions": self.cmd_sessions,
            "/clear": self.cmd_clear,
            "/sendfile": self.cmd_sendfile,
            "/quit": self.cmd_quit,
            "/exit": self.cmd_quit,
        }

        self.background_tasks: set[asyncio.Task] = set()
    def print_banner(self):
        """Print welcome banner"""
        card = self.agent.agent_card

        # Print ASCII banner
        console.print("[medium_purple3] ███████╗██╗ ██╗███████╗███████╗███████╗ ██████╗ ██████╗ ██████╗ ███████╗ █████╗ ██╗[/medium_purple3]")
        console.print("[medium_purple3] ██╔════╝██║ ██║╚══███╔╝╚══███╔╝██╔════╝██╔═══██╗██╔══██╗██╔════╝ ██╔════╝ ██╔══██╗██║[/medium_purple3]")
        console.print("[medium_purple3] █████╗ ██║ ██║ ███╔╝ ███╔╝ █████╗ ██║ ██║██████╔╝██║ ███╗█████╗ ███████║██║[/medium_purple3]")
        console.print("[medium_purple3] ██╔══╝ ██║ ██║ ███╔╝ ███╔╝ ██╔══╝ ██║ ██║██╔══██╗██║ ██║██╔══╝ ██╔══██║██║[/medium_purple3]")
        console.print("[medium_purple3] ██║ ╚██████╔╝███████╗███████╗██║ ╚██████╔╝██║ ██║╚██████╔╝███████╗ ██║ ██║██║[/medium_purple3]")
        console.print("[medium_purple3] ╚═╝ ╚═════╝ ╚══════╝╚══════╝╚═╝ ╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝ ╚═╝ ╚═╝╚═╝[/medium_purple3]")
        console.print(f"\n[dim]{card.description}[/dim]\n")

        provider = (
            os.getenv("LLM_PROVIDER")
            or os.getenv("LLM_COGNEE_PROVIDER")
            or os.getenv("COGNEE_LLM_PROVIDER")
            or "unknown"
        )

        console.print(f"LLM Provider: [medium_purple1]{provider}[/medium_purple1]")
        console.print(f"LLM Model: [medium_purple1]{self.agent.model}[/medium_purple1]")
        if self.agent.executor.agentops_trace:
            console.print("Tracking: [medium_purple1]AgentOps active[/medium_purple1]")

        # Show skills
        console.print("\nSkills:")
        for skill in card.skills:
            console.print(
                f" • [deep_sky_blue1]{skill.name}[/deep_sky_blue1] – {skill.description}"
            )
        console.print("\nType /help for commands or just chat\n")
    async def cmd_help(self, args: str = "") -> None:
        """Show help"""
        help_text = """
[bold]Commands:[/bold]
  /register <url>     - Register an A2A agent (saves to config)
  /unregister <name>  - Remove agent from registry and config
  /list               - List registered agents

[bold]Memory Systems:[/bold]
  /recall <query>     - Search past conversations (ADK Memory)
  /memory             - Show knowledge graph (Cognee)
  /memory save        - Save to knowledge graph
  /memory search      - Search knowledge graph

[bold]Other:[/bold]
  /artifacts          - List created artifacts
  /artifacts <id>     - Show artifact content
  /tasks [id]         - Show task list or details
  /skills             - Show FuzzForge skills
  /sessions           - List active sessions
  /sendfile <agent> <path> [message] - Attach file as artifact and route to agent
  /clear              - Clear screen
  /help               - Show this help
  /quit               - Exit

[bold]Sample prompts:[/bold]
  run fuzzforge workflow security_assessment on /absolute/path --volume-mode ro
  list fuzzforge runs limit=5
  get fuzzforge summary <run_id>
  query project knowledge about "unsafe Rust" using GRAPH_COMPLETION
  export project file src/lib.rs as artifact
  /memory search "recent findings"

[bold]Input Editing:[/bold]
  Arrow keys          - Move cursor
  Ctrl+A/E            - Start/end of line
  Up/Down             - Command history
"""
        console.print(help_text)
    async def cmd_register(self, args: str) -> None:
        """Register an agent"""
        if not args:
            console.print("Usage: /register <url>")
            return

        with safe_status(f"{get_dynamic_status('working')} Registering {args}"):
            result = await self.agent.register_agent(args.strip())

        if result["success"]:
            console.print(f"✅ Registered: [bold]{result['name']}[/bold]")
            console.print(f"   Capabilities: {result['capabilities']} skills")

            # Get the description from the agent's card
            agents = self.agent.list_agents()
            description = ""
            for agent in agents:
                if agent['name'] == result['name']:
                    description = agent.get('description', '')
                    break

            # Add to config for persistence
            self.config_manager.add_registered_agent(
                name=result['name'],
                url=args.strip(),
                description=description
            )
            console.print("   [dim]Saved to config for auto-registration[/dim]")
        else:
            console.print(f"[red]Failed: {result['error']}[/red]")
    async def cmd_unregister(self, args: str) -> None:
        """Unregister an agent and remove it from config"""
        if not args:
            console.print("Usage: /unregister <name or url>")
            return

        # Try to find the agent
        agents = self.agent.list_agents()
        agent_to_remove = None

        for agent in agents:
            if agent['name'].lower() == args.lower() or agent['url'] == args:
                agent_to_remove = agent
                break

        if not agent_to_remove:
            console.print(f"[yellow]Agent '{args}' not found[/yellow]")
            return

        # Remove from config
        if self.config_manager.remove_registered_agent(name=agent_to_remove['name'], url=agent_to_remove['url']):
            console.print(f"✅ Unregistered: [bold]{agent_to_remove['name']}[/bold]")
            console.print("   [dim]Removed from config (won't auto-register next time)[/dim]")
        else:
            console.print("[yellow]Agent unregistered from session but not found in config[/yellow]")
    async def cmd_list(self, args: str = "") -> None:
        """List registered agents"""
        agents = self.agent.list_agents()

        if not agents:
            console.print("No agents registered. Use /register <url>")
            return

        table = Table(title="Registered Agents", box=box.ROUNDED)
        table.add_column("Name", style="medium_purple3")
        table.add_column("URL", style="deep_sky_blue3")
        table.add_column("Skills", style="plum3")
        table.add_column("Description", style="dim")

        for agent in agents:
            desc = agent['description']
            if len(desc) > 40:
                desc = desc[:37] + "..."
            table.add_row(
                agent['name'],
                agent['url'],
                str(agent['skills']),
                desc
            )

        console.print(table)
    async def cmd_recall(self, args: str = "") -> None:
        """Search conversational memory (past conversations)"""
        if not args:
            console.print("Usage: /recall <query>")
            return

        await self._sync_conversational_memory()

        # First try MemoryService (for ingested memories)
        with safe_status(get_dynamic_status('searching')):
            results = await self.agent.memory_manager.search_conversational_memory(args)

        if results and results.memories:
            console.print(f"[bold]Found {len(results.memories)} memories:[/bold]\n")
            for i, memory in enumerate(results.memories, 1):
                # MemoryEntry has 'text' field, not 'content'
                text = getattr(memory, 'text', str(memory))
                if len(text) > 200:
                    text = text[:200] + "..."
                console.print(f"{i}. {text}")
        else:
            # If MemoryService is empty, search SQLite directly
            console.print("[yellow]No memories in MemoryService, searching SQLite sessions...[/yellow]")

            # Check if using DatabaseSessionService
            if hasattr(self.agent.executor, 'session_service'):
                service_type = type(self.agent.executor.session_service).__name__
                if service_type == 'DatabaseSessionService':
                    # Search the SQLite database directly
                    import sqlite3
                    import os
                    db_path = os.getenv('SESSION_DB_PATH', './fuzzforge_sessions.db')

                    if os.path.exists(db_path):
                        conn = sqlite3.connect(db_path)
                        cursor = conn.cursor()

                        # Search in the events table
                        query = f"%{args}%"
                        cursor.execute(
                            "SELECT content FROM events WHERE content LIKE ? LIMIT 10",
                            (query,)
                        )

                        rows = cursor.fetchall()
                        conn.close()

                        if rows:
                            console.print(f"[green]Found {len(rows)} matches in SQLite sessions:[/green]\n")
                            for i, (content,) in enumerate(rows, 1):
                                # Parse JSON content
                                import json
                                try:
                                    data = json.loads(content)
                                    if 'parts' in data and data['parts']:
                                        text = data['parts'][0].get('text', '')[:150]
                                        role = data.get('role', 'unknown')
                                        console.print(f"{i}. [{role}]: {text}...")
                                except Exception:
                                    console.print(f"{i}. {content[:150]}...")
                        else:
                            console.print("[yellow]No matches found in SQLite either[/yellow]")
                    else:
                        console.print("[yellow]SQLite database not found[/yellow]")
                else:
                    console.print(f"[dim]Using {service_type} (not searchable)[/dim]")
            else:
                console.print("[yellow]No session history available[/yellow]")
    async def cmd_memory(self, args: str = "") -> None:
        """Inspect conversational memory and knowledge graph state."""
        raw_args = (args or "").strip()
        lower_args = raw_args.lower()

        if not raw_args or lower_args in {"status", "info"}:
            await self._show_memory_status()
            return

        if lower_args == "datasets":
            await self._show_dataset_summary()
            return

        if lower_args.startswith("search ") or lower_args.startswith("recall "):
            query = raw_args.split(" ", 1)[1].strip() if " " in raw_args else ""
            if not query:
                console.print("Usage: /memory search <query>")
                return
            await self.cmd_recall(query)
            return

        console.print("Usage: /memory [status|datasets|search <query>]")
        console.print("[dim]/memory search <query> is an alias for /recall <query>[/dim]")
    async def _sync_conversational_memory(self) -> None:
        """Ensure the ADK memory service ingests any completed sessions."""
        memory_service = getattr(self.agent.memory_manager, "memory_service", None)
        executor_sessions = getattr(self.agent.executor, "sessions", {})
        metadata_map = getattr(self.agent.executor, "session_metadata", {})

        if not memory_service or not executor_sessions:
            return

        for context_id, session in list(executor_sessions.items()):
            meta = metadata_map.get(context_id, {})
            if meta.get('memory_synced'):
                continue

            add_session = getattr(memory_service, "add_session_to_memory", None)
            if not callable(add_session):
                return

            try:
                await add_session(session)
                meta['memory_synced'] = True
                metadata_map[context_id] = meta
            except Exception as exc:  # pragma: no cover - defensive logging
                if os.getenv('FUZZFORGE_DEBUG', '0') == '1':
                    console.print(f"[yellow]Memory sync failed:[/yellow] {exc}")
    async def _show_memory_status(self) -> None:
        """Render conversational memory, session store, and knowledge graph status."""
        await self._sync_conversational_memory()

        status = self.agent.memory_manager.get_status()

        conversational = status.get("conversational_memory", {})
        conv_type = conversational.get("type", "unknown")
        conv_active = "yes" if conversational.get("active") else "no"
        conv_details = conversational.get("details", "")

        session_service = getattr(self.agent.executor, "session_service", None)
        session_service_name = type(session_service).__name__ if session_service else "Unavailable"

        session_lines = [
            f"[bold]Service:[/bold] {session_service_name}"
        ]

        session_count = None
        event_count = None
        db_path_display = None

        if session_service_name == "DatabaseSessionService":
            import sqlite3

            db_path = os.getenv('SESSION_DB_PATH', './fuzzforge_sessions.db')
            session_path = Path(db_path).expanduser().resolve()
            db_path_display = str(session_path)

            if session_path.exists():
                try:
                    with sqlite3.connect(session_path) as conn:
                        cursor = conn.cursor()
                        cursor.execute("SELECT COUNT(*) FROM sessions")
                        session_count = cursor.fetchone()[0]
                        cursor.execute("SELECT COUNT(*) FROM events")
                        event_count = cursor.fetchone()[0]
                except Exception as exc:
                    session_lines.append(f"[yellow]Warning:[/yellow] Unable to read session database ({exc})")
            else:
                session_lines.append("[yellow]SQLite session database not found yet[/yellow]")

        elif session_service_name == "InMemorySessionService":
            session_lines.append("[dim]Session data persists for the current process only[/dim]")

        if db_path_display:
            session_lines.append(f"[bold]Database:[/bold] {db_path_display}")
        if session_count is not None:
            session_lines.append(f"[bold]Sessions Recorded:[/bold] {session_count}")
        if event_count is not None:
            session_lines.append(f"[bold]Events Logged:[/bold] {event_count}")

        conv_lines = [
            f"[bold]Type:[/bold] {conv_type}",
            f"[bold]Active:[/bold] {conv_active}"
        ]
        if conv_details:
            conv_lines.append(f"[bold]Details:[/bold] {conv_details}")

        console.print(Panel("\n".join(conv_lines), title="Conversation Memory", border_style="medium_purple3"))
        console.print(Panel("\n".join(session_lines), title="Session Store", border_style="deep_sky_blue3"))

        # Knowledge graph section
        knowledge = status.get("knowledge_graph", {})
        kg_active = knowledge.get("active", False)
        kg_lines = [
            f"[bold]Active:[/bold] {'yes' if kg_active else 'no'}",
            f"[bold]Purpose:[/bold] {knowledge.get('purpose', 'N/A')}"
        ]

        cognee_data = None
        cognee_error = None
        try:
            project_config = ProjectConfigManager()
            cognee_data = project_config.get_cognee_config()
        except Exception as exc:  # pragma: no cover - defensive
            cognee_error = str(exc)

        if cognee_data:
            data_dir = cognee_data.get('data_directory')
            system_dir = cognee_data.get('system_directory')
            if data_dir:
                kg_lines.append(f"[bold]Data dir:[/bold] {data_dir}")
            if system_dir:
                kg_lines.append(f"[bold]System dir:[/bold] {system_dir}")
        elif cognee_error:
            kg_lines.append(f"[yellow]Config unavailable:[/yellow] {cognee_error}")

        dataset_summary = None
        if kg_active:
            try:
                integration = await self.agent.executor._get_knowledge_integration()
                if integration:
                    dataset_summary = await integration.list_datasets()
            except Exception as exc:  # pragma: no cover - defensive
                kg_lines.append(f"[yellow]Dataset listing failed:[/yellow] {exc}")

        if dataset_summary:
            if dataset_summary.get("error"):
                kg_lines.append(f"[yellow]Dataset listing failed:[/yellow] {dataset_summary['error']}")
            else:
                datasets = dataset_summary.get("datasets", [])
                total = dataset_summary.get("total_datasets")
                if total is not None:
                    kg_lines.append(f"[bold]Datasets:[/bold] {total}")
                if datasets:
                    preview = ", ".join(sorted(datasets)[:5])
                    if len(datasets) > 5:
                        preview += ", …"
                    kg_lines.append(f"[bold]Samples:[/bold] {preview}")
                else:
                    kg_lines.append("[dim]Run `fuzzforge ingest` to populate the knowledge graph[/dim]")

        console.print(Panel("\n".join(kg_lines), title="Knowledge Graph", border_style="spring_green4"))
        console.print("\n[dim]Subcommands: /memory datasets | /memory search <query>[/dim]")
    async def _show_dataset_summary(self) -> None:
        """List datasets available in the Cognee knowledge graph."""
        try:
            integration = await self.agent.executor._get_knowledge_integration()
        except Exception as exc:
            console.print(f"[yellow]Knowledge graph unavailable:[/yellow] {exc}")
            return

        if not integration:
            console.print("[yellow]Knowledge graph is not initialised yet.[/yellow]")
            console.print("[dim]Run `fuzzforge ingest --path . --recursive` to create the project dataset.[/dim]")
            return

        with safe_status(get_dynamic_status('searching')):
            dataset_info = await integration.list_datasets()

        if dataset_info.get("error"):
            console.print(f"[red]{dataset_info['error']}[/red]")
            return

        datasets = dataset_info.get("datasets", [])
        if not datasets:
            console.print("[yellow]No datasets found.[/yellow]")
            console.print("[dim]Run `fuzzforge ingest` to populate the knowledge graph.[/dim]")
            return

        table = Table(title="Cognee Datasets", box=box.ROUNDED)
        table.add_column("Dataset", style="medium_purple3")
        table.add_column("Notes", style="dim")

        for name in sorted(datasets):
            note = ""
            if name.endswith("_codebase"):
                note = "primary project dataset"
            table.add_row(name, note)

        console.print(table)
        console.print(
            "[dim]Use knowledge graph prompts (e.g. `search project knowledge for \"topic\" using INSIGHTS`) to query these datasets.[/dim]"
        )
    async def cmd_artifacts(self, args: str = "") -> None:
        """List or show artifacts"""
        if args:
            # Show a specific artifact
            artifacts = await self.agent.executor.get_artifacts(self.context_id)
            for artifact in artifacts:
                if artifact['id'] == args or args in artifact['id']:
                    console.print(Panel(
                        f"[bold]{artifact['title']}[/bold]\n"
                        f"Type: {artifact['type']} | Created: {artifact['created_at'][:19]}\n\n"
                        f"[code]{artifact['content']}[/code]",
                        title=f"Artifact: {artifact['id']}",
                        border_style="medium_purple3"
                    ))
                    return
            console.print(f"[yellow]Artifact {args} not found[/yellow]")
            return

        # List all artifacts
        artifacts = await self.agent.executor.get_artifacts(self.context_id)

        if not artifacts:
            console.print("No artifacts created yet")
            console.print("[dim]Artifacts are created when generating code, configs, or documents[/dim]")
            return

        table = Table(title="Artifacts", box=box.ROUNDED)
        table.add_column("ID", style="medium_purple3")
        table.add_column("Type", style="deep_sky_blue3")
        table.add_column("Title", style="plum3")
        table.add_column("Size", style="dim")
        table.add_column("Created", style="dim")

        for artifact in artifacts:
            size = f"{len(artifact['content'])} chars"
            created = artifact['created_at'][:19]  # Just date and time

            table.add_row(
                artifact['id'],
                artifact['type'],
                artifact['title'][:40] + "..." if len(artifact['title']) > 40 else artifact['title'],
                size,
                created
            )

        console.print(table)
        console.print("\n[dim]Use /artifacts <id> to view artifact content[/dim]")
    async def cmd_tasks(self, args: str = "") -> None:
        """List tasks or show details for a specific task."""
        store = getattr(self.agent.executor, "task_store", None)
        if not store or not hasattr(store, "tasks"):
            console.print("Task store not available")
            return

        task_id = args.strip()

        async with store.lock:
            tasks = dict(store.tasks)

        if not tasks:
            console.print("No tasks recorded yet")
            return

        if task_id:
            task = tasks.get(task_id)
            if not task:
                console.print(f"Task '{task_id}' not found")
                return

            state_str = task.status.state.value if hasattr(task.status.state, "value") else str(task.status.state)
            console.print(f"\n[bold]Task {task.id}[/bold]")
            console.print(f"Context: {task.context_id}")
            console.print(f"State: {state_str}")
            console.print(f"Timestamp: {task.status.timestamp}")
            if task.metadata:
                console.print("Metadata:")
                for key, value in task.metadata.items():
                    console.print(f"  • {key}: {value}")
            if task.history:
                console.print("History:")
                for entry in task.history[-5:]:
                    text = getattr(entry, "text", None)
                    if not text and hasattr(entry, "parts"):
                        text = " ".join(
                            getattr(part, "text", "") for part in getattr(entry, "parts", [])
                        )
                    console.print(f"  - {text}")
            return

        table = Table(title="FuzzForge Tasks", box=box.ROUNDED)
        table.add_column("ID", style="medium_purple3")
        table.add_column("State", style="white")
        table.add_column("Workflow", style="deep_sky_blue3")
        table.add_column("Updated", style="green")

        for task in tasks.values():
            state_value = task.status.state.value if hasattr(task.status.state, "value") else str(task.status.state)
            workflow = ""
            if task.metadata:
                workflow = task.metadata.get("workflow") or task.metadata.get("workflow_name") or ""
            timestamp = task.status.timestamp if task.status else ""
            table.add_row(task.id, state_value, workflow, timestamp)

        console.print(table)
        console.print("\n[dim]Use /tasks <id> to view task details[/dim]")
async def cmd_sessions(self, args: str = "") -> None:
|
||||
"""List active sessions"""
|
||||
sessions = self.agent.executor.sessions
|
||||
|
||||
if not sessions:
|
||||
console.print("No active sessions")
|
||||
return
|
||||
|
||||
table = Table(title="Active Sessions", box=box.ROUNDED)
|
||||
table.add_column("Context ID", style="medium_purple3")
|
||||
table.add_column("Session ID", style="deep_sky_blue3")
|
||||
table.add_column("User ID", style="plum3")
|
||||
table.add_column("State", style="dim")
|
||||
|
||||
for context_id, session in sessions.items():
|
||||
# Get session info
|
||||
session_id = getattr(session, 'id', 'N/A')
|
||||
user_id = getattr(session, 'user_id', 'N/A')
|
||||
state = getattr(session, 'state', {})
|
||||
|
||||
# Format state info
|
||||
agents_count = len(state.get('registered_agents', []))
|
||||
state_info = f"{agents_count} agents registered"
|
||||
|
||||
table.add_row(
|
||||
context_id[:20] + "..." if len(context_id) > 20 else context_id,
|
||||
session_id[:20] + "..." if len(str(session_id)) > 20 else str(session_id),
|
||||
user_id,
|
||||
state_info
|
||||
)
|
||||
|
||||
console.print(table)
|
||||
console.print(f"\n[dim]Current session: {self.context_id}[/dim]")
|
||||
|
||||
async def cmd_skills(self, args: str = "") -> None:
|
||||
"""Show FuzzForge skills"""
|
||||
card = self.agent.agent_card
|
||||
|
||||
table = Table(title=f"{card.name} Skills", box=box.ROUNDED)
|
||||
table.add_column("Skill", style="medium_purple3")
|
||||
table.add_column("Description", style="white")
|
||||
table.add_column("Tags", style="deep_sky_blue3")
|
||||
|
||||
for skill in card.skills:
|
||||
table.add_row(
|
||||
skill.name,
|
||||
skill.description,
|
||||
", ".join(skill.tags[:3])
|
||||
)
|
||||
|
||||
console.print(table)
|
||||
|
||||
async def cmd_clear(self, args: str = "") -> None:
|
||||
"""Clear screen"""
|
||||
console.clear()
|
||||
self.print_banner()
|
||||
|
||||
async def cmd_sendfile(self, args: str) -> None:
|
||||
"""Encode a local file as an artifact and route it to a registered agent."""
|
||||
tokens = shlex.split(args)
|
||||
if len(tokens) < 2:
|
||||
console.print("Usage: /sendfile <agent_name> <path> [message]")
|
||||
return
|
||||
|
||||
agent_name = tokens[0]
|
||||
file_arg = tokens[1]
|
||||
note = " ".join(tokens[2:]).strip()
|
||||
|
||||
file_path = Path(file_arg).expanduser()
|
||||
if not file_path.exists():
|
||||
console.print(f"[red]File not found:[/red] {file_path}")
|
||||
return
|
||||
|
||||
session = self.agent.executor.sessions.get(self.context_id)
|
||||
if not session:
|
||||
console.print("[red]No active session available. Try sending a prompt first.[/red]")
|
||||
return
|
||||
|
||||
console.print(f"[dim]Delegating {file_path.name} to {agent_name}...[/dim]")
|
||||
|
||||
async def _delegate() -> None:
|
||||
try:
|
||||
response = await self.agent.executor.delegate_file_to_agent(
|
||||
agent_name,
|
||||
str(file_path),
|
||||
note,
|
||||
session=session,
|
||||
context_id=self.context_id,
|
||||
)
|
||||
console.print(f"[{agent_name}]: {response}")
|
||||
except Exception as exc:
|
||||
console.print(f"[red]Failed to delegate file:[/red] {exc}")
|
||||
finally:
|
||||
self.background_tasks.discard(asyncio.current_task())
|
||||
|
||||
task = asyncio.create_task(_delegate())
|
||||
self.background_tasks.add(task)
|
||||
console.print("[dim]Delegation in progress… you can continue working.[/dim]")
|
||||
|
||||
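The `_delegate` closure above uses a common asyncio idiom: keep a strong reference to each fire-and-forget task in a set (so the event loop cannot garbage-collect it mid-flight) and discard that reference in a `finally` block when the task completes. A minimal standalone sketch of the pattern, with illustrative names that are not part of the FuzzForge API:

```python
import asyncio

background_tasks: set = set()

async def delegate(name: str) -> str:
    # Stand-in for a long-running delegation call.
    await asyncio.sleep(0.01)
    return f"done: {name}"

async def main() -> list:
    results: list = []

    async def _runner(name: str) -> None:
        try:
            results.append(await delegate(name))
        finally:
            # Drop the strong reference once this task finishes.
            background_tasks.discard(asyncio.current_task())

    for name in ("alpha", "beta"):
        task = asyncio.create_task(_runner(name))
        # Keep a reference so the task is not garbage-collected early.
        background_tasks.add(task)

    await asyncio.gather(*background_tasks)
    return results

print(sorted(asyncio.run(main())))  # → ['done: alpha', 'done: beta']
```

On shutdown, `cmd_quit` below cancels whatever is still in the set and gathers with `return_exceptions=True`, which is the matching cleanup half of this pattern.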
    async def cmd_quit(self, args: str = "") -> None:
        """Exit the CLI"""
        console.print("\n[green]Shutting down...[/green]")
        await self.agent.cleanup()
        if self.background_tasks:
            for task in list(self.background_tasks):
                task.cancel()
            await asyncio.gather(*self.background_tasks, return_exceptions=True)
        console.print("Goodbye!\n")
        sys.exit(0)

    async def process_command(self, text: str) -> bool:
        """Process slash commands"""
        if not text.startswith('/'):
            return False

        parts = text.split(maxsplit=1)
        cmd = parts[0].lower()
        args = parts[1] if len(parts) > 1 else ""

        if cmd in self.commands:
            await self.commands[cmd](args)
            return True

        console.print(f"Unknown command: {cmd}")
        return True

    async def auto_register_agents(self):
        """Auto-register agents from config on startup"""
        agents_to_register = self.config_manager.get_registered_agents()

        if agents_to_register:
            console.print(f"\n[dim]Auto-registering {len(agents_to_register)} agents from config...[/dim]")

            for agent_config in agents_to_register:
                url = agent_config.get('url')
                name = agent_config.get('name', 'Unknown')

                if url:
                    try:
                        with safe_status(f"Registering {name}..."):
                            result = await self.agent.register_agent(url)

                        if result["success"]:
                            console.print(f"  ✅ {name}: [green]Connected[/green]")
                        else:
                            console.print(f"  ⚠️ {name}: [yellow]Failed - {result.get('error', 'Unknown error')}[/yellow]")
                    except Exception as e:
                        console.print(f"  ⚠️ {name}: [yellow]Failed - {e}[/yellow]")

            console.print("")  # Empty line for spacing

    async def run(self):
        """Main CLI loop"""
        self.print_banner()

        # Auto-register agents from config
        await self.auto_register_agents()

        while not shutdown_requested:
            try:
                # Use standard input with a non-deletable colored prompt
                prompt_symbol = get_prompt_symbol()
                try:
                    # Print the colored prompt, then use input() for non-deletable behavior
                    console.print(f"[medium_purple3]{prompt_symbol}[/medium_purple3] ", end="")
                    user_input = input().strip()
                except (EOFError, KeyboardInterrupt):
                    raise

                if not user_input:
                    continue

                # Check for commands
                if await self.process_command(user_input):
                    continue

                # Process message
                with safe_status(get_dynamic_status('thinking')):
                    response = await self.agent.process_message(user_input, self.context_id)

                # Display response
                console.print(f"\n{response}\n")

            except KeyboardInterrupt:
                await self.cmd_quit()

            except EOFError:
                await self.cmd_quit()

            except Exception as e:
                console.print(f"[red]Error: {e}[/red]")
                if os.getenv('FUZZFORGE_DEBUG') == '1':
                    console.print_exception()
                console.print("")

        await self.agent.cleanup()


def main():
    """Main entry point"""
    try:
        cli = FuzzForgeCLI()
        asyncio.run(cli.run())
    except KeyboardInterrupt:
        console.print("\n[yellow]Interrupted[/yellow]")
        sys.exit(0)
    except Exception as e:
        console.print(f"[red]Fatal error: {e}[/red]")
        if os.getenv('FUZZFORGE_DEBUG') == '1':
            console.print_exception()
        sys.exit(1)


if __name__ == "__main__":
    main()
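The `process_command` method implements a small dispatch table: the first whitespace-delimited token selects an async handler from a dict, the remainder is passed as the argument string, and the return value tells the main loop whether the input was consumed as a command. A self-contained toy version of that dispatch (the handlers and command names here are illustrative, not the FuzzForge command set):

```python
import asyncio

log: list = []

async def cmd_help(args: str) -> None:
    log.append("help")

async def cmd_echo(args: str) -> None:
    log.append(f"echo:{args}")

# Registry mapping "/command" tokens to async handlers.
COMMANDS = {"/help": cmd_help, "/echo": cmd_echo}

async def process_command(text: str) -> bool:
    """Return True if the text was consumed as a slash command."""
    if not text.startswith("/"):
        return False  # plain chat input, not a command
    parts = text.split(maxsplit=1)
    cmd = parts[0].lower()
    args = parts[1] if len(parts) > 1 else ""
    handler = COMMANDS.get(cmd)
    if handler:
        await handler(args)
    else:
        log.append(f"unknown:{cmd}")
    return True  # commands are consumed even when unknown

async def demo() -> None:
    assert await process_command("hello") is False
    assert await process_command("/echo hi there") is True
    assert await process_command("/nope") is True

asyncio.run(demo())
print(log)  # → ['echo:hi there', 'unknown:/nope']
```

Returning `True` even for unknown commands keeps mistyped slash input from being forwarded to the LLM as a chat message, matching the behavior in the CLI loop.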
469 ai/src/fuzzforge_ai/cognee_integration.py (new file)
@@ -0,0 +1,469 @@
"""
Cognee Integration Module for FuzzForge
Provides standardized access to project-specific knowledge graphs
Can be reused by external agents and other components
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import os
from typing import Dict, Any, Optional
from pathlib import Path


class CogneeProjectIntegration:
    """
    Standardized Cognee integration that can be reused across agents.
    Automatically detects project context and provides knowledge graph access.
    """

    def __init__(self, project_dir: Optional[str] = None):
        """
        Initialize with a project directory (defaults to the current working directory)

        Args:
            project_dir: Path to project directory (optional, defaults to cwd)
        """
        self.project_dir = Path(project_dir) if project_dir else Path.cwd()
        self.config_file = self.project_dir / ".fuzzforge" / "config.yaml"
        self.project_context = None
        self._cognee = None
        self._initialized = False

    async def initialize(self) -> bool:
        """
        Initialize Cognee with project context

        Returns:
            bool: True if initialization was successful
        """
        try:
            # Import Cognee
            import cognee
            self._cognee = cognee

            # Load project context
            if not self._load_project_context():
                return False

            # Configure Cognee for this project
            await self._setup_cognee_config()

            self._initialized = True
            return True

        except ImportError:
            print("Cognee not installed. Install with: pip install cognee")
            return False
        except Exception as e:
            print(f"Failed to initialize Cognee: {e}")
            return False

    def _load_project_context(self) -> bool:
        """Load project context from the FuzzForge config"""
        try:
            if not self.config_file.exists():
                print(f"No FuzzForge config found at {self.config_file}")
                return False

            import yaml
            with open(self.config_file, 'r') as f:
                config = yaml.safe_load(f)

            self.project_context = {
                "project_name": config.get("project", {}).get("name", "default"),
                "project_id": config.get("project", {}).get("id", "default"),
                "tenant_id": config.get("cognee", {}).get("tenant", "default")
            }
            return True

        except Exception as e:
            print(f"Error loading project context: {e}")
            return False

    async def _setup_cognee_config(self):
        """Configure Cognee for project-specific access"""
        # Set API key and model
        api_key = os.getenv('OPENAI_API_KEY')
        model = os.getenv('LITELLM_MODEL', 'gpt-4o-mini')

        if not api_key:
            raise ValueError("OPENAI_API_KEY required for Cognee operations")

        # Configure Cognee
        self._cognee.config.set_llm_api_key(api_key)
        self._cognee.config.set_llm_model(model)
        self._cognee.config.set_llm_provider("openai")

        # Set project-specific directories
        project_cognee_dir = self.project_dir / ".fuzzforge" / "cognee" / f"project_{self.project_context['project_id']}"

        self._cognee.config.data_root_directory(str(project_cognee_dir / "data"))
        self._cognee.config.system_root_directory(str(project_cognee_dir / "system"))

        # Ensure directories exist
        project_cognee_dir.mkdir(parents=True, exist_ok=True)
        (project_cognee_dir / "data").mkdir(exist_ok=True)
        (project_cognee_dir / "system").mkdir(exist_ok=True)

    async def search_knowledge_graph(self, query: str, search_type: str = "GRAPH_COMPLETION", dataset: str = None) -> Dict[str, Any]:
        """
        Search the project's knowledge graph

        Args:
            query: Search query
            search_type: Type of search ("GRAPH_COMPLETION", "INSIGHTS", "CHUNKS", etc.)
            dataset: Specific dataset to search (optional)

        Returns:
            Dict containing search results
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        try:
            from cognee.modules.search.types import SearchType

            # Resolve the search type dynamically; fall back to GRAPH_COMPLETION
            try:
                search_type_enum = getattr(SearchType, search_type.upper())
            except AttributeError:
                search_type_enum = SearchType.GRAPH_COMPLETION
                search_type = "GRAPH_COMPLETION"

            # Prepare search kwargs
            search_kwargs = {
                "query_type": search_type_enum,
                "query_text": query
            }

            # Add dataset filter if specified
            if dataset:
                search_kwargs["datasets"] = [dataset]

            results = await self._cognee.search(**search_kwargs)

            return {
                "query": query,
                "search_type": search_type,
                "dataset": dataset,
                "results": results,
                "project": self.project_context["project_name"]
            }
        except Exception as e:
            return {"error": f"Search failed: {e}"}

    async def list_knowledge_data(self) -> Dict[str, Any]:
        """
        List available data in the knowledge graph

        Returns:
            Dict containing available data
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        try:
            data = await self._cognee.list_data()
            return {
                "project": self.project_context["project_name"],
                "available_data": data
            }
        except Exception as e:
            return {"error": f"Failed to list data: {e}"}

    async def cognify_text(self, text: str, dataset: str = None) -> Dict[str, Any]:
        """
        Cognify text content into the knowledge graph

        Args:
            text: Text to cognify
            dataset: Dataset name (defaults to project_name_codebase)

        Returns:
            Dict containing cognify results
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        if not dataset:
            dataset = f"{self.project_context['project_name']}_codebase"

        try:
            # Add text to the dataset
            await self._cognee.add([text], dataset_name=dataset)

            # Process (cognify) the dataset
            await self._cognee.cognify([dataset])

            return {
                "text_length": len(text),
                "dataset": dataset,
                "project": self.project_context["project_name"],
                "status": "success"
            }
        except Exception as e:
            return {"error": f"Cognify failed: {e}"}

    async def ingest_text_to_dataset(self, text: str, dataset: str = None) -> Dict[str, Any]:
        """
        Ingest text content into a specific dataset

        Args:
            text: Text to ingest
            dataset: Dataset name (defaults to project_name_codebase)

        Returns:
            Dict containing ingest results
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        if not dataset:
            dataset = f"{self.project_context['project_name']}_codebase"

        try:
            # Add text to the dataset
            await self._cognee.add([text], dataset_name=dataset)

            # Process (cognify) the dataset
            await self._cognee.cognify([dataset])

            return {
                "text_length": len(text),
                "dataset": dataset,
                "project": self.project_context["project_name"],
                "status": "success"
            }
        except Exception as e:
            return {"error": f"Ingest failed: {e}"}

    async def ingest_files_to_dataset(self, file_paths: list, dataset: str = None) -> Dict[str, Any]:
        """
        Ingest multiple files into a specific dataset

        Args:
            file_paths: List of file paths to ingest
            dataset: Dataset name (defaults to project_name_codebase)

        Returns:
            Dict containing ingest results
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        if not dataset:
            dataset = f"{self.project_context['project_name']}_codebase"

        try:
            # Validate and keep only readable text files
            valid_files = []
            for file_path in file_paths:
                try:
                    path = Path(file_path)
                    if path.exists() and path.is_file():
                        # Test whether the file is readable as UTF-8 text
                        with open(path, 'r', encoding='utf-8') as f:
                            f.read(1)
                        valid_files.append(str(path))
                except (UnicodeDecodeError, PermissionError, OSError):
                    continue

            if not valid_files:
                return {"error": "No valid files found to ingest"}

            # Add files to the dataset
            await self._cognee.add(valid_files, dataset_name=dataset)

            # Process (cognify) the dataset
            await self._cognee.cognify([dataset])

            return {
                "files_processed": len(valid_files),
                "total_files_requested": len(file_paths),
                "dataset": dataset,
                "project": self.project_context["project_name"],
                "status": "success"
            }
        except Exception as e:
            return {"error": f"Ingest failed: {e}"}

    async def list_datasets(self) -> Dict[str, Any]:
        """
        List all datasets available in the project

        Returns:
            Dict containing available datasets
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        try:
            # Get available datasets by listing data
            data = await self._cognee.list_data()

            # Extract unique dataset names from the data
            datasets = set()
            if isinstance(data, list):
                for item in data:
                    if isinstance(item, dict) and 'dataset_name' in item:
                        datasets.add(item['dataset_name'])

            return {
                "project": self.project_context["project_name"],
                "datasets": list(datasets),
                "total_datasets": len(datasets)
            }
        except Exception as e:
            return {"error": f"Failed to list datasets: {e}"}

    async def create_dataset(self, dataset: str) -> Dict[str, Any]:
        """
        Create a new dataset (datasets are created automatically when data is added)

        Args:
            dataset: Dataset name to create

        Returns:
            Dict containing the creation result
        """
        if not self._initialized:
            await self.initialize()

        if not self._initialized:
            return {"error": "Cognee not initialized"}

        try:
            # In Cognee, datasets are created implicitly when data is added,
            # so we add a seed document to create the dataset.
            await self._cognee.add(
                [f"Dataset {dataset} initialized for project {self.project_context['project_name']}"],
                dataset_name=dataset,
            )

            return {
                "dataset": dataset,
                "project": self.project_context["project_name"],
                "status": "created"
            }
        except Exception as e:
            return {"error": f"Failed to create dataset: {e}"}

    def get_project_context(self) -> Optional[Dict[str, str]]:
        """Get the current project context"""
        return self.project_context

    def is_initialized(self) -> bool:
        """Check whether Cognee is initialized"""
        return self._initialized


# Convenience functions for easy integration
async def search_project_codebase(query: str, project_dir: Optional[str] = None, dataset: str = None, search_type: str = "GRAPH_COMPLETION") -> str:
    """
    Convenience function to search the project codebase

    Args:
        query: Search query
        project_dir: Project directory (optional, defaults to cwd)
        dataset: Specific dataset to search (optional)
        search_type: Type of search ("GRAPH_COMPLETION", "INSIGHTS", "CHUNKS")

    Returns:
        Formatted search results as a string
    """
    cognee_integration = CogneeProjectIntegration(project_dir)
    result = await cognee_integration.search_knowledge_graph(query, search_type, dataset)

    if "error" in result:
        return f"Error searching codebase: {result['error']}"

    project_name = result.get("project", "Unknown")
    results = result.get("results", [])

    if not results:
        return f"No results found for '{query}' in project {project_name}"

    output = f"Search results for '{query}' in project {project_name}:\n\n"

    # Format results
    if isinstance(results, list):
        for i, item in enumerate(results, 1):
            if isinstance(item, dict):
                # Handle structured results
                output += f"{i}. "
                if "search_result" in item:
                    output += f"Dataset: {item.get('dataset_name', 'Unknown')}\n"
                    for result_item in item["search_result"]:
                        if isinstance(result_item, dict):
                            if "name" in result_item:
                                output += f"   - {result_item['name']}: {result_item.get('description', '')}\n"
                            elif "text" in result_item:
                                text = result_item["text"][:200] + "..." if len(result_item["text"]) > 200 else result_item["text"]
                                output += f"   - {text}\n"
                            else:
                                output += f"   - {str(result_item)[:200]}...\n"
                else:
                    output += f"{str(item)[:200]}...\n"
                output += "\n"
            else:
                output += f"{i}. {str(item)[:200]}...\n\n"
    else:
        output += f"{str(results)[:500]}..."

    return output


async def list_project_knowledge(project_dir: Optional[str] = None) -> str:
    """
    Convenience function to list project knowledge

    Args:
        project_dir: Project directory (optional, defaults to cwd)

    Returns:
        Formatted list of available data
    """
    cognee_integration = CogneeProjectIntegration(project_dir)
    result = await cognee_integration.list_knowledge_data()

    if "error" in result:
        return f"Error listing knowledge: {result['error']}"

    project_name = result.get("project", "Unknown")
    data = result.get("available_data", [])

    output = f"Available knowledge in project {project_name}:\n\n"

    if not data:
        output += "No data available in knowledge graph"
    else:
        for i, item in enumerate(data, 1):
            output += f"{i}. {item}\n"

    return output
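`CogneeProjectIntegration` keys everything off two conventions: a per-project storage tree under `.fuzzforge/cognee/project_<id>/` with separate `data` and `system` roots, and a default dataset named `<project_name>_codebase`. A runnable sketch of just those conventions (the project name and id below are made-up example values, and no Cognee is involved):

```python
from pathlib import Path
import tempfile

# Example project context; real values come from .fuzzforge/config.yaml.
project_context = {"project_name": "demo", "project_id": "abc123"}

project_dir = Path(tempfile.mkdtemp())
cognee_dir = project_dir / ".fuzzforge" / "cognee" / f"project_{project_context['project_id']}"

# Mirror _setup_cognee_config: one data root and one system root per project.
for sub in ("data", "system"):
    (cognee_dir / sub).mkdir(parents=True, exist_ok=True)

# Default dataset name used by the ingest/cognify helpers.
default_dataset = f"{project_context['project_name']}_codebase"

print(default_dataset)                 # → demo_codebase
print((cognee_dir / "data").is_dir())  # → True
```

Scoping both the storage roots and the dataset name by project id is what keeps two FuzzForge projects on the same machine from sharing a knowledge graph.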
414 ai/src/fuzzforge_ai/cognee_service.py (new file)
@@ -0,0 +1,414 @@
"""
Cognee Service for FuzzForge
Provides integrated Cognee functionality for codebase analysis and knowledge graphs
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import os
import logging
from pathlib import Path
from typing import Dict, List, Any

logger = logging.getLogger(__name__)


class CogneeService:
    """
    Service for managing the Cognee integration with FuzzForge.
    Handles multi-tenant isolation and project-specific knowledge graphs.
    """

    def __init__(self, config):
        """Initialize with the FuzzForge config"""
        self.config = config
        self.cognee_config = config.get_cognee_config()
        self.project_context = config.get_project_context()
        self._cognee = None
        self._user = None
        self._initialized = False

    async def initialize(self):
        """Initialize Cognee with project-specific configuration"""
        try:
            # Ensure environment variables for Cognee are set before import
            self.config.setup_cognee_environment()
            logger.debug(
                "Cognee environment configured",
                extra={
                    "data": self.cognee_config.get("data_directory"),
                    "system": self.cognee_config.get("system_directory"),
                },
            )

            import cognee
            self._cognee = cognee

            # Configure the LLM with an API key BEFORE any other cognee operations
            provider = os.getenv("LLM_PROVIDER", "openai")
            model = os.getenv("LLM_MODEL") or os.getenv("LITELLM_MODEL", "gpt-4o-mini")
            api_key = os.getenv("LLM_API_KEY") or os.getenv("OPENAI_API_KEY")
            endpoint = os.getenv("LLM_ENDPOINT")
            api_version = os.getenv("LLM_API_VERSION")
            max_tokens = os.getenv("LLM_MAX_TOKENS")

            if provider.lower() in {"openai", "azure_openai", "custom"} and not api_key:
                raise ValueError(
                    "OpenAI-compatible API key is required for Cognee LLM operations. "
                    "Set OPENAI_API_KEY, LLM_API_KEY, or COGNEE_LLM_API_KEY in your .env"
                )

            # Expose environment variables for downstream libraries
            os.environ["LLM_PROVIDER"] = provider
            os.environ["LITELLM_MODEL"] = model
            os.environ["LLM_MODEL"] = model
            if api_key:
                os.environ["LLM_API_KEY"] = api_key
                # Maintain compatibility with components still expecting OPENAI_API_KEY
                if provider.lower() in {"openai", "azure_openai", "custom"}:
                    os.environ.setdefault("OPENAI_API_KEY", api_key)
            if endpoint:
                os.environ["LLM_ENDPOINT"] = endpoint
            if api_version:
                os.environ["LLM_API_VERSION"] = api_version
            if max_tokens:
                os.environ["LLM_MAX_TOKENS"] = str(max_tokens)

            # Configure Cognee's runtime using its configuration helpers when available
            if hasattr(cognee.config, "set_llm_provider"):
                cognee.config.set_llm_provider(provider)
            if hasattr(cognee.config, "set_llm_model"):
                cognee.config.set_llm_model(model)
            if api_key and hasattr(cognee.config, "set_llm_api_key"):
                cognee.config.set_llm_api_key(api_key)
            if endpoint and hasattr(cognee.config, "set_llm_endpoint"):
                cognee.config.set_llm_endpoint(endpoint)
            if api_version and hasattr(cognee.config, "set_llm_api_version"):
                cognee.config.set_llm_api_version(api_version)
            if max_tokens and hasattr(cognee.config, "set_llm_max_tokens"):
                cognee.config.set_llm_max_tokens(int(max_tokens))

            # Configure the graph database
            cognee.config.set_graph_db_config({
                "graph_database_provider": self.cognee_config.get("graph_database_provider", "kuzu"),
            })

            # Set data directories
            data_dir = self.cognee_config.get("data_directory")
            system_dir = self.cognee_config.get("system_directory")

            if data_dir:
                logger.debug("Setting cognee data root", extra={"path": data_dir})
                cognee.config.data_root_directory(data_dir)
            if system_dir:
                logger.debug("Setting cognee system root", extra={"path": system_dir})
                cognee.config.system_root_directory(system_dir)

            # Set up the multi-tenant user context
            await self._setup_user_context()

            self._initialized = True
            logger.info(f"Cognee initialized for project {self.project_context['project_name']} "
                        f"with Kuzu at {system_dir}")

        except ImportError:
            logger.error("Cognee not installed. Install with: pip install cognee")
            raise
        except Exception as e:
            logger.error(f"Failed to initialize Cognee: {e}")
            raise

    async def create_dataset(self):
        """Create the dataset for this project if it doesn't exist"""
        if not self._initialized:
            await self.initialize()

        try:
            # Dataset creation is handled automatically by Cognee when adding files;
            # we just ensure the right context is set up.
            dataset_name = f"{self.project_context['project_name']}_codebase"
            logger.info(f"Dataset {dataset_name} ready for project {self.project_context['project_name']}")
            return dataset_name
        except Exception as e:
            logger.error(f"Failed to create dataset: {e}")
            raise

    async def _setup_user_context(self):
        """Set up the user context for multi-tenant isolation"""
        try:
            from cognee.modules.users.methods import create_user, get_user

            # Always try the fallback email first to avoid validation issues
            fallback_email = f"project_{self.project_context['project_id']}@fuzzforge.example"
            user_tenant = self.project_context['tenant_id']

            # Try to get an existing fallback user first
            try:
                self._user = await get_user(fallback_email)
                logger.info(f"Using existing user: {fallback_email}")
                return
            except Exception:
                # User doesn't exist; try to create the fallback
                pass

            # Create the fallback user
            try:
                self._user = await create_user(fallback_email, user_tenant)
                logger.info(f"Created fallback user: {fallback_email} for tenant: {user_tenant}")
                return
            except Exception as fallback_error:
                logger.warning(f"Fallback user creation failed: {fallback_error}")
                self._user = None
                return

        except Exception as e:
            logger.warning(f"Could not set up multi-tenant user context: {e}")
            logger.info("Proceeding with default context")
            self._user = None

    def get_project_dataset_name(self, dataset_suffix: str = "codebase") -> str:
        """Get the project-specific dataset name"""
        return f"{self.project_context['project_name']}_{dataset_suffix}"

    async def ingest_text(self, content: str, dataset: str = "fuzzforge") -> bool:
        """Ingest text content into the knowledge graph"""
        if not self._initialized:
            await self.initialize()

        try:
            await self._cognee.add([content], dataset)
            await self._cognee.cognify([dataset])
            return True
        except Exception as e:
            logger.error(f"Failed to ingest text: {e}")
            return False

    async def ingest_files(self, file_paths: List[Path], dataset: str = "fuzzforge") -> Dict[str, Any]:
        """Ingest multiple files into the knowledge graph"""
        if not self._initialized:
            await self.initialize()

        results = {
            "success": 0,
            "failed": 0,
            "errors": []
        }

        try:
            ingest_paths: List[str] = []
            for file_path in file_paths:
                try:
                    # Opening the file verifies it is readable UTF-8 text
                    with open(file_path, 'r', encoding='utf-8'):
                        ingest_paths.append(str(file_path))
                    results["success"] += 1
                except (UnicodeDecodeError, PermissionError) as exc:
                    results["failed"] += 1
                    results["errors"].append(f"{file_path}: {exc}")
                    logger.warning("Skipping %s: %s", file_path, exc)

            if ingest_paths:
                await self._cognee.add(ingest_paths, dataset_name=dataset)
                await self._cognee.cognify([dataset])

        except Exception as e:
            logger.error(f"Failed to ingest files: {e}")
            results["errors"].append(f"Cognify error: {str(e)}")

        return results

    async def search_insights(self, query: str, dataset: str = None) -> List[str]:
        """Search for insights in the knowledge graph"""
        if not self._initialized:
            await self.initialize()

        try:
            from cognee.modules.search.types import SearchType

            kwargs = {
                "query_type": SearchType.INSIGHTS,
                "query_text": query
            }

            if dataset:
                kwargs["datasets"] = [dataset]

            results = await self._cognee.search(**kwargs)
            return results if isinstance(results, list) else []

        except Exception as e:
            logger.error(f"Failed to search insights: {e}")
            return []

    async def search_chunks(self, query: str, dataset: str = None) -> List[str]:
        """Search for relevant text chunks"""
        if not self._initialized:
            await self.initialize()

        try:
            from cognee.modules.search.types import SearchType

            kwargs = {
                "query_type": SearchType.CHUNKS,
                "query_text": query
            }

            if dataset:
                kwargs["datasets"] = [dataset]

            results = await self._cognee.search(**kwargs)
            return results if isinstance(results, list) else []

        except Exception as e:
            logger.error(f"Failed to search chunks: {e}")
            return []

    async def search_graph_completion(self, query: str) -> List[str]:
|
||||
"""Search for graph completion (relationships)"""
|
||||
if not self._initialized:
|
||||
await self.initialize()
|
||||
|
||||
try:
|
||||
from cognee.modules.search.types import SearchType
|
||||
|
||||
results = await self._cognee.search(
|
||||
query_type=SearchType.GRAPH_COMPLETION,
|
||||
query_text=query
|
||||
)
|
||||
return results if isinstance(results, list) else []
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to search graph completion: {e}")
|
||||
return []
|
||||
|
||||
async def get_status(self) -> Dict[str, Any]:
|
||||
"""Get service status and statistics"""
|
||||
status = {
|
||||
"initialized": self._initialized,
|
||||
"enabled": self.cognee_config.get("enabled", True),
|
||||
"provider": self.cognee_config.get("graph_database_provider", "kuzu"),
|
||||
"data_directory": self.cognee_config.get("data_directory"),
|
||||
"system_directory": self.cognee_config.get("system_directory"),
|
||||
}
|
||||
|
||||
if self._initialized:
|
||||
try:
|
||||
# Check if directories exist and get sizes
|
||||
data_dir = Path(status["data_directory"])
|
||||
system_dir = Path(status["system_directory"])
|
||||
|
||||
status.update({
|
||||
"data_dir_exists": data_dir.exists(),
|
||||
"system_dir_exists": system_dir.exists(),
|
||||
"kuzu_db_exists": (system_dir / "kuzu_db").exists(),
|
||||
"lancedb_exists": (system_dir / "lancedb").exists(),
|
||||
})
|
||||
|
||||
except Exception as e:
|
||||
status["status_error"] = str(e)
|
||||
|
||||
return status
|
||||
|
||||
async def clear_data(self, confirm: bool = False):
|
||||
"""Clear all ingested data (dangerous!)"""
|
||||
if not confirm:
|
||||
raise ValueError("Must confirm data clearing with confirm=True")
|
||||
|
||||
if not self._initialized:
|
||||
await self.initialize()
|
||||
|
||||
try:
|
||||
await self._cognee.prune.prune_data()
|
||||
await self._cognee.prune.prune_system(metadata=True)
|
||||
logger.info("Cognee data cleared")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to clear data: {e}")
|
||||
raise
|
||||
|
||||
|
||||
class FuzzForgeCogneeIntegration:
|
||||
"""
|
||||
Main integration class for FuzzForge + Cognee
|
||||
Provides high-level operations for security analysis
|
||||
"""
|
||||
|
||||
def __init__(self, config):
|
||||
self.service = CogneeService(config)
|
||||
|
||||
async def analyze_codebase(self, path: Path, recursive: bool = True) -> Dict[str, Any]:
|
||||
"""
|
||||
Analyze a codebase and extract security-relevant insights
|
||||
"""
|
||||
# Collect code files
|
||||
from fuzzforge_ai.ingest_utils import collect_ingest_files
|
||||
|
||||
files = collect_ingest_files(path, recursive, None, [])
|
||||
|
||||
if not files:
|
||||
return {"error": "No files found to analyze"}
|
||||
|
||||
# Ingest files
|
||||
results = await self.service.ingest_files(files, "security_analysis")
|
||||
|
||||
if results["success"] == 0:
|
||||
return {"error": "Failed to ingest any files", "details": results}
|
||||
|
||||
# Extract security insights
|
||||
security_queries = [
|
||||
"vulnerabilities security risks",
|
||||
"authentication authorization",
|
||||
"input validation sanitization",
|
||||
"encryption cryptography",
|
||||
"error handling exceptions",
|
||||
"logging sensitive data"
|
||||
]
|
||||
|
||||
insights = {}
|
||||
for query in security_queries:
|
||||
insight_results = await self.service.search_insights(query, "security_analysis")
|
||||
if insight_results:
|
||||
insights[query.replace(" ", "_")] = insight_results
|
||||
|
||||
return {
|
||||
"files_processed": results["success"],
|
||||
"files_failed": results["failed"],
|
||||
"errors": results["errors"],
|
||||
"security_insights": insights
|
||||
}
|
||||
|
||||
async def query_codebase(self, query: str, search_type: str = "insights") -> List[str]:
|
||||
"""Query the ingested codebase"""
|
||||
if search_type == "insights":
|
||||
return await self.service.search_insights(query)
|
||||
elif search_type == "chunks":
|
||||
return await self.service.search_chunks(query)
|
||||
elif search_type == "graph":
|
||||
return await self.service.search_graph_completion(query)
|
||||
else:
|
||||
raise ValueError(f"Unknown search type: {search_type}")
|
||||
|
||||
async def get_project_summary(self) -> Dict[str, Any]:
|
||||
"""Get a summary of the analyzed project"""
|
||||
# Search for general project insights
|
||||
summary_queries = [
|
||||
"project structure components",
|
||||
"main functionality features",
|
||||
"programming languages frameworks",
|
||||
"dependencies libraries"
|
||||
]
|
||||
|
||||
summary = {}
|
||||
for query in summary_queries:
|
||||
results = await self.service.search_insights(query)
|
||||
if results:
|
||||
summary[query.replace(" ", "_")] = results[:3] # Top 3 results
|
||||
|
||||
return summary
|
||||
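The search helpers above all follow the same optional-dataset pattern: the `datasets` filter is added to the keyword arguments only when a dataset name is supplied. A minimal standalone sketch of that pattern (the `"INSIGHTS"` string stands in for the `SearchType` enum, which is not available outside Cognee):

```python
def build_search_kwargs(query: str, dataset: str = None) -> dict:
    # Mirror the helpers above: always pass the query, but only add the
    # "datasets" filter when a dataset name was actually supplied.
    kwargs = {"query_type": "INSIGHTS", "query_text": query}
    if dataset:
        kwargs["datasets"] = [dataset]
    return kwargs

print(build_search_kwargs("authentication flaws"))
print(build_search_kwargs("authentication flaws", "security_analysis"))
```

This keeps a single call path for both "search everywhere" and "search one dataset" without passing `datasets=None` to the underlying API.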
9
ai/src/fuzzforge_ai/config.yaml
Normal file
@@ -0,0 +1,9 @@
# FuzzForge Registered Agents
# These agents will be automatically registered on startup

registered_agents:

# Example entries:
# - name: Calculator
#   url: http://localhost:10201
#   description: Mathematical calculations agent
31
ai/src/fuzzforge_ai/config_bridge.py
Normal file
@@ -0,0 +1,31 @@
"""Bridge module providing access to the host CLI configuration manager."""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


try:
    from fuzzforge_cli.config import ProjectConfigManager as _ProjectConfigManager
except ImportError:  # pragma: no cover - used when CLI not available
    class _ProjectConfigManager:  # type: ignore[no-redef]
        """Fallback implementation that raises a helpful error."""

        def __init__(self, *args, **kwargs):
            raise ImportError(
                "ProjectConfigManager is unavailable. Install the FuzzForge CLI "
                "package or supply a compatible configuration object."
            )

    def __getattr__(name):  # pragma: no cover - defensive
        raise ImportError("ProjectConfigManager unavailable")


ProjectConfigManager = _ProjectConfigManager

__all__ = ["ProjectConfigManager"]
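The bridge uses a common import-fallback idiom: if the real dependency is missing, bind a stub under the same name that fails loudly on first use rather than at import time. A generic, self-contained sketch of the same idiom (`nonexistent_cli_pkg` is a deliberately absent placeholder, not a real package):

```python
try:
    # Assumption: this package is not installed, so the except branch runs.
    from nonexistent_cli_pkg.config import Manager as _Manager
except ImportError:
    class _Manager:
        """Fallback that raises a helpful error on first use."""

        def __init__(self, *args, **kwargs):
            raise ImportError(
                "Manager is unavailable. Install the CLI package "
                "or supply a compatible configuration object."
            )

Manager = _Manager

# Importing this module always succeeds; only instantiation fails.
try:
    Manager()
    status = "constructed"
except ImportError:
    status = "raised"

print(status)
```

The advantage over letting the `ImportError` propagate at import time is that callers which never touch `Manager` keep working, while callers that do get an actionable message.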
134
ai/src/fuzzforge_ai/config_manager.py
Normal file
@@ -0,0 +1,134 @@
"""
Configuration manager for FuzzForge
Handles loading and saving registered agents
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import os
import yaml
from typing import Dict, Any, List


class ConfigManager:
    """Manages FuzzForge agent registry configuration"""

    def __init__(self, config_path: str = None):
        """Initialize config manager"""
        if config_path:
            self.config_path = config_path
        else:
            # Check for local .fuzzforge/agents.yaml first, then fall back to global
            local_config = os.path.join(os.getcwd(), '.fuzzforge', 'agents.yaml')
            global_config = os.path.join(os.path.dirname(__file__), 'config.yaml')

            if os.path.exists(local_config):
                self.config_path = local_config
                if os.getenv("FUZZFORGE_DEBUG", "0") == "1":
                    print(f"[CONFIG] Using local config: {local_config}")
            else:
                self.config_path = global_config
                if os.getenv("FUZZFORGE_DEBUG", "0") == "1":
                    print(f"[CONFIG] Using global config: {global_config}")

        self.config = self.load_config()

    def load_config(self) -> Dict[str, Any]:
        """Load configuration from YAML file"""
        if not os.path.exists(self.config_path):
            # Return default config if the file doesn't exist
            return {'registered_agents': []}

        try:
            with open(self.config_path, 'r') as f:
                config = yaml.safe_load(f) or {}
            # Ensure registered_agents is a list
            if 'registered_agents' not in config or config['registered_agents'] is None:
                config['registered_agents'] = []
            return config
        except Exception as e:
            print(f"[WARNING] Failed to load config: {e}")
            return {'registered_agents': []}

    def save_config(self):
        """Save current configuration to file"""
        try:
            # Create a clean config with comments
            config_content = """# FuzzForge Registered Agents
# These agents will be automatically registered on startup

"""
            # Add the agents list
            if self.config.get('registered_agents'):
                config_content += yaml.dump({'registered_agents': self.config['registered_agents']},
                                            default_flow_style=False, sort_keys=False)
            else:
                config_content += "registered_agents: []\n"

            config_content += """
# Example entries:
# - name: Calculator
#   url: http://localhost:10201
#   description: Mathematical calculations agent
"""

            with open(self.config_path, 'w') as f:
                f.write(config_content)

            return True
        except Exception as e:
            print(f"[ERROR] Failed to save config: {e}")
            return False

    def get_registered_agents(self) -> List[Dict[str, Any]]:
        """Get list of registered agents from config"""
        return self.config.get('registered_agents', [])

    def add_registered_agent(self, name: str, url: str, description: str = "") -> bool:
        """Add a new registered agent to config"""
        if 'registered_agents' not in self.config:
            self.config['registered_agents'] = []

        # Check if agent already exists
        for agent in self.config['registered_agents']:
            if agent.get('url') == url:
                # Update existing agent
                agent['name'] = name
                agent['description'] = description
                return self.save_config()

        # Add new agent
        self.config['registered_agents'].append({
            'name': name,
            'url': url,
            'description': description
        })

        return self.save_config()

    def remove_registered_agent(self, name: str = None, url: str = None) -> bool:
        """Remove a registered agent from config"""
        if 'registered_agents' not in self.config:
            return False

        original_count = len(self.config['registered_agents'])

        # Filter out the agent
        self.config['registered_agents'] = [
            agent for agent in self.config['registered_agents']
            if not ((name and agent.get('name') == name) or
                    (url and agent.get('url') == url))
        ]

        if len(self.config['registered_agents']) < original_count:
            return self.save_config()

        return False
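The registry treats an agent's URL as its identity: re-registering the same URL updates the stored name and description in place instead of appending a duplicate entry. A minimal in-memory sketch of that add-or-update rule, without the YAML persistence:

```python
agents = []  # stands in for config['registered_agents']

def add_agent(name, url, description=""):
    # URL is the key: update in place if it is already registered.
    for agent in agents:
        if agent.get("url") == url:
            agent["name"] = name
            agent["description"] = description
            return
    agents.append({"name": name, "url": url, "description": description})

add_agent("Calculator", "http://localhost:10201", "math agent")
add_agent("Calculator v2", "http://localhost:10201", "updated description")
print(len(agents), agents[0]["name"])
```

Printing `len(agents)` shows a single entry survives the second call, with the refreshed name.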
104
ai/src/fuzzforge_ai/ingest_utils.py
Normal file
@@ -0,0 +1,104 @@
"""Utilities for collecting files to ingest into Cognee."""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


from __future__ import annotations

import fnmatch
from pathlib import Path
from typing import Iterable, List, Optional

_DEFAULT_FILE_TYPES = [
    ".py",
    ".js",
    ".ts",
    ".java",
    ".cpp",
    ".c",
    ".h",
    ".rs",
    ".go",
    ".rb",
    ".php",
    ".cs",
    ".swift",
    ".kt",
    ".scala",
    ".clj",
    ".hs",
    ".md",
    ".txt",
    ".yaml",
    ".yml",
    ".json",
    ".toml",
    ".cfg",
    ".ini",
]

_DEFAULT_EXCLUDE = [
    "*.pyc",
    "__pycache__",
    ".git",
    ".svn",
    ".hg",
    "node_modules",
    ".venv",
    "venv",
    ".env",
    "dist",
    "build",
    ".pytest_cache",
    ".mypy_cache",
    ".tox",
    "coverage",
    "*.log",
    "*.tmp",
]


def collect_ingest_files(
    path: Path,
    recursive: bool = True,
    file_types: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[Path]:
    """Return a list of files eligible for ingestion."""
    path = path.resolve()
    files: List[Path] = []

    extensions = list(file_types) if file_types else list(_DEFAULT_FILE_TYPES)
    exclusions = list(exclude) if exclude else []
    exclusions.extend(_DEFAULT_EXCLUDE)

    def should_exclude(file_path: Path) -> bool:
        file_str = str(file_path)
        for pattern in exclusions:
            if fnmatch.fnmatch(file_str, f"*{pattern}*") or fnmatch.fnmatch(file_path.name, pattern):
                return True
        return False

    if path.is_file():
        if not should_exclude(path) and any(str(path).endswith(ext) for ext in extensions):
            files.append(path)
        return files

    pattern = "**/*" if recursive else "*"
    for file_path in path.glob(pattern):
        if file_path.is_file() and not should_exclude(file_path):
            if any(str(file_path).endswith(ext) for ext in extensions):
                files.append(file_path)

    return files


__all__ = ["collect_ingest_files"]
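The exclusion test above combines two checks per pattern: a loose substring-style match against the full path (`*{pattern}*`) and an exact `fnmatch` of the basename. A standalone sketch of just that predicate, with a trimmed pattern list:

```python
import fnmatch

exclusions = ["__pycache__", "*.pyc", "node_modules"]

def should_exclude(path_str: str, name: str) -> bool:
    # Same rule as the nested helper: a pattern excludes a file if it
    # appears anywhere in the full path, or if the basename matches it.
    for pattern in exclusions:
        if fnmatch.fnmatch(path_str, f"*{pattern}*") or fnmatch.fnmatch(name, pattern):
            return True
    return False

print(should_exclude("src/__pycache__/mod.pyc", "mod.pyc"))
print(should_exclude("src/app.py", "app.py"))
```

The full-path check is what lets directory names like `__pycache__` or `node_modules` exclude everything beneath them, even though only file paths are ever tested.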
244
ai/src/fuzzforge_ai/memory_service.py
Normal file
@@ -0,0 +1,244 @@
"""
FuzzForge Memory Service
Implements the ADK MemoryService pattern for conversational memory.
Separate from Cognee, which will be used for RAG/codebase analysis.
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import os
from typing import Dict, Any
import logging

# ADK Memory imports
from google.adk.memory import InMemoryMemoryService, BaseMemoryService
from google.adk.memory.base_memory_service import SearchMemoryResponse

# Optional VertexAI Memory Bank
try:
    from google.adk.memory import VertexAiMemoryBankService
    VERTEX_AVAILABLE = True
except ImportError:
    VERTEX_AVAILABLE = False

logger = logging.getLogger(__name__)


class FuzzForgeMemoryService:
    """
    Manages conversational memory using ADK patterns.
    This is separate from Cognee, which will handle RAG/codebase analysis.
    """

    def __init__(self, memory_type: str = "inmemory", **kwargs):
        """
        Initialize memory service

        Args:
            memory_type: "inmemory" or "vertexai"
            **kwargs: Additional args for the specific memory service.
                For vertexai: project, location, agent_engine_id
        """
        self.memory_type = memory_type
        self.service = self._create_service(memory_type, **kwargs)

    def _create_service(self, memory_type: str, **kwargs) -> BaseMemoryService:
        """Create the appropriate memory service"""

        if memory_type == "inmemory":
            # Use ADK's InMemoryMemoryService for local development
            logger.info("Using InMemory MemoryService for conversational memory")
            return InMemoryMemoryService()

        elif memory_type == "vertexai" and VERTEX_AVAILABLE:
            # Use VertexAI Memory Bank for production
            project = kwargs.get('project') or os.getenv('GOOGLE_CLOUD_PROJECT')
            location = kwargs.get('location') or os.getenv('GOOGLE_CLOUD_LOCATION', 'us-central1')
            agent_engine_id = kwargs.get('agent_engine_id') or os.getenv('AGENT_ENGINE_ID')

            if not all([project, location, agent_engine_id]):
                logger.warning("VertexAI config missing, falling back to InMemory")
                return InMemoryMemoryService()

            logger.info(f"Using VertexAI MemoryBank: {agent_engine_id}")
            return VertexAiMemoryBankService(
                project=project,
                location=location,
                agent_engine_id=agent_engine_id
            )
        else:
            # Default to in-memory
            logger.info("Defaulting to InMemory MemoryService")
            return InMemoryMemoryService()

    async def add_session_to_memory(self, session: Any) -> None:
        """
        Add a completed session to long-term memory.
        This extracts meaningful information from the conversation.

        Args:
            session: The session object to process
        """
        try:
            # Let the underlying service handle the ingestion.
            # It will extract relevant information based on the implementation.
            await self.service.add_session_to_memory(session)

            logger.debug(f"Added session {session.id} to {self.memory_type} memory")

        except Exception as e:
            logger.error(f"Failed to add session to memory: {e}")

    async def search_memory(self,
                            query: str,
                            app_name: str = "fuzzforge",
                            user_id: str = None,
                            max_results: int = 10) -> SearchMemoryResponse:
        """
        Search long-term memory for relevant information

        Args:
            query: The search query
            app_name: Application name for filtering
            user_id: User ID for filtering (optional)
            max_results: Maximum number of results

        Returns:
            SearchMemoryResponse with relevant memories
        """
        try:
            # Search the memory service
            results = await self.service.search_memory(
                app_name=app_name,
                user_id=user_id,
                query=query
            )

            logger.debug(f"Memory search for '{query}' returned {len(results.memories)} results")
            return results

        except Exception as e:
            logger.error(f"Memory search failed: {e}")
            # Return empty results on error
            return SearchMemoryResponse(memories=[])

    async def ingest_completed_sessions(self, session_service) -> int:
        """
        Batch ingest all completed sessions into memory.
        Useful for initial memory population.

        Args:
            session_service: The session service containing sessions

        Returns:
            Number of sessions ingested
        """
        ingested = 0

        try:
            # Get all sessions from the session service
            sessions = await session_service.list_sessions(app_name="fuzzforge")

            for session_info in sessions:
                # Load full session
                session = await session_service.load_session(
                    app_name="fuzzforge",
                    user_id=session_info.get('user_id'),
                    session_id=session_info.get('id')
                )

                if session and len(session.get_events()) > 0:
                    await self.add_session_to_memory(session)
                    ingested += 1

            logger.info(f"Ingested {ingested} sessions into {self.memory_type} memory")

        except Exception as e:
            logger.error(f"Failed to batch ingest sessions: {e}")

        return ingested

    def get_status(self) -> Dict[str, Any]:
        """Get memory service status"""
        return {
            "type": self.memory_type,
            "active": self.service is not None,
            "vertex_available": VERTEX_AVAILABLE,
            "details": {
                "inmemory": "Non-persistent, keyword search",
                "vertexai": "Persistent, semantic search with LLM extraction"
            }.get(self.memory_type, "Unknown")
        }


class HybridMemoryManager:
    """
    Manages both the ADK MemoryService (conversational) and Cognee (RAG/codebase).
    Provides a unified interface for both memory systems.
    """

    def __init__(self,
                 memory_service: FuzzForgeMemoryService = None,
                 cognee_tools=None):
        """
        Initialize with both memory systems

        Args:
            memory_service: ADK-pattern memory for conversations
            cognee_tools: Cognee MCP tools for RAG/codebase
        """
        # ADK memory for conversations
        self.memory_service = memory_service or FuzzForgeMemoryService()

        # Cognee for knowledge graphs and RAG (future)
        self.cognee_tools = cognee_tools

    async def search_conversational_memory(self, query: str) -> SearchMemoryResponse:
        """Search past conversations using ADK memory"""
        return await self.memory_service.search_memory(query)

    async def search_knowledge_graph(self, query: str, search_type: str = "GRAPH_COMPLETION"):
        """Search Cognee knowledge graph (for RAG/codebase in future)"""
        if not self.cognee_tools:
            return None

        try:
            # Use Cognee's graph search
            return await self.cognee_tools.search(
                query=query,
                search_type=search_type
            )
        except Exception as e:
            logger.debug(f"Cognee search failed: {e}")
            return None

    async def store_in_graph(self, content: str):
        """Store in Cognee knowledge graph (for codebase analysis later)"""
        if not self.cognee_tools:
            return None

        try:
            # Use cognify to create graph structures
            return await self.cognee_tools.cognify(content)
        except Exception as e:
            logger.debug(f"Cognee store failed: {e}")
            return None

    def get_status(self) -> Dict[str, Any]:
        """Get status of both memory systems"""
        return {
            "conversational_memory": self.memory_service.get_status(),
            "knowledge_graph": {
                "active": self.cognee_tools is not None,
                "purpose": "RAG/codebase analysis (future)"
            }
        }
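`_create_service` degrades gracefully: the VertexAI backend is selected only when the optional import succeeded and the full configuration is present; every other combination falls back to the in-memory service. A reduced sketch of that selection rule (plain strings stand in for the real service objects, and `VERTEX_AVAILABLE` is forced to `False` to model a missing optional dependency):

```python
VERTEX_AVAILABLE = False  # model the optional import having failed

def pick_backend(memory_type: str, project=None, agent_engine_id=None) -> str:
    # "vertexai" needs both the optional dependency and complete config;
    # anything else (including partial config) uses the in-memory fallback.
    if memory_type == "vertexai" and VERTEX_AVAILABLE and all([project, agent_engine_id]):
        return "vertexai"
    return "inmemory"

print(pick_backend("vertexai", project="demo", agent_engine_id="engine-1"))
print(pick_backend("inmemory"))
```

With the dependency unavailable, even a fully configured `"vertexai"` request resolves to `"inmemory"`, which matches the fallback logging in the real method.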
148
ai/src/fuzzforge_ai/remote_agent.py
Normal file
@@ -0,0 +1,148 @@
"""
Remote Agent Connection Handler
Handles A2A protocol communication with remote agents
"""
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.


import httpx
import uuid
from typing import Dict, Any, Optional, List


class RemoteAgentConnection:
    """Handles A2A protocol communication with remote agents"""

    def __init__(self, url: str):
        """Initialize connection to a remote agent"""
        self.url = url.rstrip('/')
        self.agent_card = None
        self.client = httpx.AsyncClient(timeout=120.0)
        self.context_id = None

    async def get_agent_card(self) -> Optional[Dict[str, Any]]:
        """Get the agent card from the remote agent"""
        try:
            # Try new path first (A2A 0.3.0+)
            response = await self.client.get(f"{self.url}/.well-known/agent-card.json")
            response.raise_for_status()
            self.agent_card = response.json()
            return self.agent_card
        except Exception:
            # Try old path for compatibility
            try:
                response = await self.client.get(f"{self.url}/.well-known/agent.json")
                response.raise_for_status()
                self.agent_card = response.json()
                return self.agent_card
            except Exception as e:
                print(f"Failed to get agent card from {self.url}: {e}")
                return None

    async def send_message(self, message: str | Dict[str, Any] | List[Dict[str, Any]]) -> str:
        """Send a message to the remote agent using the A2A protocol"""
        try:
            parts: List[Dict[str, Any]]
            metadata: Dict[str, Any] | None = None
            if isinstance(message, dict):
                metadata = message.get("metadata") if isinstance(message.get("metadata"), dict) else None
                raw_parts = message.get("parts", [])
                if not raw_parts:
                    text_value = message.get("text") or message.get("message")
                    if isinstance(text_value, str):
                        raw_parts = [{"type": "text", "text": text_value}]
                parts = [raw_part for raw_part in raw_parts if isinstance(raw_part, dict)]
            elif isinstance(message, list):
                parts = [part for part in message if isinstance(part, dict)]
                metadata = None
            else:
                parts = [{"type": "text", "text": message}]
                metadata = None

            if not parts:
                parts = [{"type": "text", "text": ""}]

            # Build JSON-RPC request per A2A spec
            payload = {
                "jsonrpc": "2.0",
                "method": "message/send",
                "params": {
                    "message": {
                        "messageId": str(uuid.uuid4()),
                        "role": "user",
                        "parts": parts,
                    }
                },
                "id": 1
            }

            if metadata:
                payload["params"]["message"]["metadata"] = metadata

            # Include context if we have one
            if self.context_id:
                payload["params"]["contextId"] = self.context_id

            # Send to root endpoint per A2A protocol
            response = await self.client.post(f"{self.url}/", json=payload)
            response.raise_for_status()
            result = response.json()

            # Extract response based on A2A JSON-RPC format
            if isinstance(result, dict):
                # Update context for continuity
                if "result" in result and isinstance(result["result"], dict):
                    if "contextId" in result["result"]:
                        self.context_id = result["result"]["contextId"]

                    # Extract text from artifacts
                    if "artifacts" in result["result"]:
                        texts = []
                        for artifact in result["result"]["artifacts"]:
                            if isinstance(artifact, dict) and "parts" in artifact:
                                for part in artifact["parts"]:
                                    if isinstance(part, dict) and "text" in part:
                                        texts.append(part["text"])
                        if texts:
                            return " ".join(texts)

                    # Extract from message format
                    if "message" in result["result"]:
                        msg = result["result"]["message"]
                        if isinstance(msg, dict) and "parts" in msg:
                            texts = []
                            for part in msg["parts"]:
                                if isinstance(part, dict) and "text" in part:
                                    texts.append(part["text"])
                            return " ".join(texts) if texts else str(msg)
                        return str(msg)

                    return str(result["result"])

                # Handle error response
                elif "error" in result:
                    error = result["error"]
                    if isinstance(error, dict):
                        return f"Error: {error.get('message', str(error))}"
                    return f"Error: {error}"

                # Fallback
                return result.get("response", result.get("message", str(result)))

            return str(result)

        except Exception as e:
            return f"Error communicating with agent: {e}"

    async def close(self):
        """Close the connection properly"""
        await self.client.aclose()
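The response-parsing branch in `send_message` walks `result.result.artifacts[*].parts[*].text` and joins whatever text parts it finds. A self-contained sketch of that extraction against a hypothetical (hand-built, not from a real agent) A2A-style response:

```python
import uuid

# Hypothetical response shaped like the artifact branch above expects.
result = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "contextId": str(uuid.uuid4()),
        "artifacts": [
            {"parts": [{"type": "text", "text": "scan"},
                       {"type": "text", "text": "complete"}]}
        ],
    },
}

# Walk artifacts -> parts, keeping only dict parts that carry a "text" key.
texts = []
for artifact in result["result"]["artifacts"]:
    if isinstance(artifact, dict) and "parts" in artifact:
        for part in artifact["parts"]:
            if isinstance(part, dict) and "text" in part:
                texts.append(part["text"])

print(" ".join(texts))
```

The defensive `isinstance` checks mirror the handler above: malformed artifacts or non-text parts are silently skipped rather than raising.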
37
backend/Dockerfile
Normal file
@@ -0,0 +1,37 @@
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies including Docker client and rsync
RUN apt-get update && apt-get install -y \
    curl \
    ca-certificates \
    gnupg \
    lsb-release \
    rsync \
    && curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg \
    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null \
    && apt-get update \
    && apt-get install -y docker-ce-cli \
    && rm -rf /var/lib/apt/lists/*

# Docker client configuration removed - localhost:5001 doesn't require insecure registry config

# Copy project files
COPY pyproject.toml ./

# Install dependencies with pip
RUN pip install --no-cache-dir -e .

# Copy source code
COPY . .

# Expose ports (API on 8000, MCP on 8010)
EXPOSE 8000 8010

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Start the application
CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
316	backend/README.md	Normal file
@@ -0,0 +1,316 @@
# FuzzForge Backend

A stateless API server for security testing workflow orchestration using Temporal. This system dynamically discovers workflows, executes them in isolated worker environments, and returns findings in SARIF format.

## Architecture Overview

### Core Components

1. **Workflow Discovery System**: Automatically discovers workflows at startup
2. **Module System**: Reusable components (scanner, analyzer, reporter) with a common interface
3. **Temporal Integration**: Handles workflow orchestration, execution, and monitoring with vertical workers
4. **File Upload & Storage**: HTTP multipart upload to MinIO for target files
5. **SARIF Output**: Standardized security findings format

### Key Features

- **Stateless**: No persistent data, fully scalable
- **Generic**: No hardcoded workflows, automatic discovery
- **Isolated**: Each workflow runs in specialized vertical workers
- **Extensible**: Easy to add new workflows and modules
- **Secure**: File upload with MinIO storage, automatic cleanup via lifecycle policies
- **Observable**: Comprehensive logging and status tracking

## Quick Start

### Prerequisites

- Docker and Docker Compose

### Installation

From the project root, start all services:

```bash
docker-compose -f docker-compose.temporal.yaml up -d
```

This will start:
- Temporal server (Web UI at http://localhost:8233, gRPC at :7233)
- MinIO (S3 storage at http://localhost:9000, Console at http://localhost:9001)
- PostgreSQL database (for Temporal state)
- Vertical workers (worker-rust, worker-android, worker-web, etc.)
- FuzzForge backend API (port 8000)

**Note**: MinIO console login: `fuzzforge` / `fuzzforge123`

## API Endpoints

### Workflows

- `GET /workflows` - List all discovered workflows
- `GET /workflows/{name}/metadata` - Get workflow metadata and parameters
- `GET /workflows/{name}/parameters` - Get workflow parameter schema
- `GET /workflows/metadata/schema` - Get metadata.yaml schema
- `POST /workflows/{name}/submit` - Submit a workflow for execution (path-based, legacy)
- `POST /workflows/{name}/upload-and-submit` - **Upload local files and submit workflow** (recommended)

### Runs

- `GET /runs/{run_id}/status` - Get run status
- `GET /runs/{run_id}/findings` - Get SARIF findings from completed run
- `GET /runs/{workflow_name}/findings/{run_id}` - Alternative findings endpoint with workflow name

## Workflow Structure

Each workflow must have:

```
toolbox/workflows/{workflow_name}/
    workflow.py          # Temporal workflow definition
    metadata.yaml        # Mandatory metadata (parameters, version, vertical, etc.)
    requirements.txt     # Optional Python dependencies (installed in vertical worker)
```

**Note**: With Temporal architecture, workflows run in pre-built vertical workers (e.g., `worker-rust`, `worker-android`), not individual Docker containers. The workflow code is mounted as a volume and discovered at runtime.

### Example metadata.yaml

```yaml
name: security_assessment
version: "1.0.0"
description: "Comprehensive security analysis workflow"
author: "FuzzForge Team"
category: "comprehensive"
vertical: "rust"  # Routes to worker-rust
tags:
  - "security"
  - "analysis"
  - "comprehensive"

requirements:
  tools:
    - "file_scanner"
    - "security_analyzer"
    - "sarif_reporter"
  resources:
    memory: "512Mi"
    cpu: "500m"
    timeout: 1800

has_docker: true

parameters:
  type: object
  properties:
    target_path:
      type: string
      default: "/workspace"
      description: "Path to analyze"
    scanner_config:
      type: object
      description: "Scanner configuration"
      properties:
        max_file_size:
          type: integer
          description: "Maximum file size to scan (bytes)"

output_schema:
  type: object
  properties:
    sarif:
      type: object
      description: "SARIF-formatted security findings"
    summary:
      type: object
      description: "Scan execution summary"
```

### Metadata Field Descriptions

- **name**: Workflow identifier (must match directory name)
- **version**: Semantic version (x.y.z format)
- **description**: Human-readable description of the workflow
- **author**: Workflow author/maintainer
- **category**: Workflow category (comprehensive, specialized, fuzzing, focused)
- **tags**: Array of descriptive tags for categorization
- **requirements.tools**: Required security tools that the workflow uses
- **requirements.resources**: Resource requirements enforced at runtime:
  - `memory`: Memory limit (e.g., "512Mi", "1Gi")
  - `cpu`: CPU limit (e.g., "500m" for 0.5 cores, "1" for 1 core)
  - `timeout`: Maximum execution time in seconds
- **parameters**: JSON Schema object defining workflow parameters
- **output_schema**: Expected output format (typically SARIF)

### Resource Requirements

Resource requirements defined in workflow metadata are automatically enforced. Users can override defaults when submitting workflows:

```bash
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
  -H "Content-Type: application/json" \
  -d '{
    "target_path": "/tmp/project",
    "resource_limits": {
      "memory_limit": "1Gi",
      "cpu_limit": "1"
    }
  }'
```

Resource precedence: User limits > Workflow requirements > System defaults
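That precedence chain amounts to a first-match lookup. A minimal sketch of how it could be resolved (hypothetical helper, not the backend's actual resolver):

```python
def resolve_limit(user_limits: dict, workflow_reqs: dict, defaults: dict, key: str):
    """Return the effective limit: user override, else workflow metadata, else system default."""
    for source in (user_limits, workflow_reqs, defaults):
        if source.get(key) is not None:
            return source[key]
    return None

effective = resolve_limit(
    {"memory_limit": "1Gi"},                         # user-supplied resource_limits
    {"memory_limit": "512Mi", "cpu_limit": "500m"},  # workflow metadata requirements
    {"memory_limit": "256Mi", "cpu_limit": "250m"},  # system defaults
    "memory_limit",
)
# effective == "1Gi": the user override wins
```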

## File Upload and Target Access

### Upload Endpoint

The backend provides an upload endpoint for submitting workflows with local files:

```
POST /workflows/{workflow_name}/upload-and-submit
Content-Type: multipart/form-data

Parameters:
  file: File upload (supports .tar.gz for directories)
  parameters: JSON string of workflow parameters (optional)
  timeout: Execution timeout in seconds (optional)
```

Example using curl:

```bash
# Upload a directory (create tarball first)
tar -czf project.tar.gz /path/to/project
curl -X POST "http://localhost:8000/workflows/security_assessment/upload-and-submit" \
  -F "file=@project.tar.gz" \
  -F "parameters={\"check_secrets\":true}"

# Upload a single file
curl -X POST "http://localhost:8000/workflows/security_assessment/upload-and-submit" \
  -F "file=@binary.elf"
```

### Storage Flow

1. **CLI/API uploads file** via HTTP multipart
2. **Backend receives file** and streams to temporary location (max 10GB)
3. **Backend uploads to MinIO** with generated `target_id`
4. **Workflow is submitted** to Temporal with `target_id`
5. **Worker downloads target** from MinIO to local cache
6. **Workflow processes target** from cache
7. **MinIO lifecycle policy** deletes files after 7 days
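Steps 2-4 hinge on a generated `target_id` that keys the uploaded object in MinIO. A minimal sketch of how an ID and object key could be derived; the naming scheme here is purely illustrative, not FuzzForge's actual layout:

```python
import hashlib
import tempfile
import uuid
from pathlib import Path

def make_target_key(file_path: Path) -> tuple[str, str]:
    """Generate a target_id and the object key it maps to.

    Illustrative only: the real backend's naming scheme may differ.
    The content digest in the key lets repeated uploads of the same
    file be recognized for caching.
    """
    target_id = uuid.uuid4().hex
    digest = hashlib.sha256(file_path.read_bytes()).hexdigest()[:16]
    object_key = f"targets/{target_id}/{digest}/{file_path.name}"
    return target_id, object_key

# Illustrative usage with a throwaway file
demo = Path(tempfile.mkdtemp()) / "project.tar.gz"
demo.write_bytes(b"tarball bytes")
target_id, object_key = make_target_key(demo)
```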

### Advantages

- **No host filesystem access required** - workers can run anywhere
- **Automatic cleanup** - lifecycle policies prevent disk exhaustion
- **Caching** - repeated workflows reuse cached targets
- **Multi-host ready** - targets accessible from any worker
- **Secure** - isolated storage, no arbitrary host path access

## Module Development

Modules implement the `BaseModule` interface:

```python
from src.toolbox.modules.base import BaseModule, ModuleMetadata, ModuleResult

class MyModule(BaseModule):
    def get_metadata(self) -> ModuleMetadata:
        return ModuleMetadata(
            name="my_module",
            version="1.0.0",
            description="Module description",
            category="scanner",
            ...
        )

    async def execute(self, config: Dict, workspace: Path) -> ModuleResult:
        # Module logic here
        findings = [...]
        return self.create_result(findings=findings)

    def validate_config(self, config: Dict) -> bool:
        # Validate configuration
        return True
```

## Submitting a Workflow

### With File Upload (Recommended)

```bash
# Automatic tarball and upload
tar -czf project.tar.gz /home/user/project
curl -X POST "http://localhost:8000/workflows/security_assessment/upload-and-submit" \
  -F "file=@project.tar.gz" \
  -F "parameters={\"scanner_config\":{\"patterns\":[\"*.py\"]},\"analyzer_config\":{\"check_secrets\":true}}"
```

### Legacy Path-Based Submission

```bash
# Only works if backend and target are on same machine
curl -X POST "http://localhost:8000/workflows/security_assessment/submit" \
  -H "Content-Type: application/json" \
  -d '{
    "target_path": "/home/user/project",
    "parameters": {
      "scanner_config": {"patterns": ["*.py"]},
      "analyzer_config": {"check_secrets": true}
    }
  }'
```

## Getting Findings

```bash
curl "http://localhost:8000/runs/{run_id}/findings"
```

Returns SARIF-formatted findings:

```json
{
  "workflow": "security_assessment",
  "run_id": "abc-123",
  "sarif": {
    "version": "2.1.0",
    "runs": [{
      "tool": {...},
      "results": [...]
    }]
  }
}
```
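Client code can flatten that payload into simple (rule, message) pairs. A minimal sketch assuming the response shape above; `SECRET-001` and the message text are made-up sample data:

```python
def summarize_sarif(findings: dict) -> list[tuple[str, str]]:
    """Flatten a findings response into (ruleId, message text) pairs."""
    pairs = []
    for run in findings.get("sarif", {}).get("runs", []):
        for result in run.get("results", []):
            pairs.append((result.get("ruleId", "unknown"),
                          result.get("message", {}).get("text", "")))
    return pairs

# Sample payload mirroring the /runs/{run_id}/findings shape (data is fabricated)
sample = {
    "workflow": "security_assessment",
    "run_id": "abc-123",
    "sarif": {"version": "2.1.0", "runs": [{
        "tool": {"driver": {"name": "scanner"}},
        "results": [{"ruleId": "SECRET-001",
                     "message": {"text": "Hardcoded credential"}}],
    }]},
}
# summarize_sarif(sample) -> [("SECRET-001", "Hardcoded credential")]
```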

## Security Considerations

1. **File Upload Security**: Files uploaded to MinIO with isolated storage
2. **Read-Only Default**: Target files accessed as read-only unless explicitly set
3. **Worker Isolation**: Each workflow runs in isolated vertical workers
4. **Resource Limits**: Can set CPU/memory limits per worker
5. **Automatic Cleanup**: MinIO lifecycle policies delete old files after 7 days

## Development

### Adding a New Workflow

1. Create directory: `toolbox/workflows/my_workflow/`
2. Add `workflow.py` with a Temporal workflow (using `@workflow.defn`)
3. Add mandatory `metadata.yaml` with `vertical` field
4. Restart the appropriate worker: `docker-compose -f docker-compose.temporal.yaml restart worker-rust`
5. Worker will automatically discover and register the new workflow

### Adding a New Module

1. Create module in `toolbox/modules/{category}/`
2. Implement `BaseModule` interface
3. Use in workflows via import

### Adding a New Vertical Worker

1. Create worker directory: `workers/{vertical}/`
2. Create `Dockerfile` with required tools
3. Add worker to `docker-compose.temporal.yaml`
4. Worker will automatically discover workflows with matching `vertical` in metadata
184	backend/benchmarks/README.md	Normal file
@@ -0,0 +1,184 @@
# FuzzForge Benchmark Suite

Performance benchmarking infrastructure organized by module category.

## Directory Structure

```
benchmarks/
├── conftest.py              # Benchmark fixtures
├── category_configs.py      # Category-specific thresholds
├── by_category/             # Benchmarks organized by category
│   ├── fuzzer/
│   │   ├── bench_cargo_fuzz.py
│   │   └── bench_atheris.py
│   ├── scanner/
│   │   └── bench_file_scanner.py
│   ├── secret_detection/
│   │   ├── bench_gitleaks.py
│   │   └── bench_trufflehog.py
│   └── analyzer/
│       └── bench_security_analyzer.py
├── fixtures/                # Benchmark test data
│   ├── small/               # ~1K LOC
│   ├── medium/              # ~10K LOC
│   └── large/               # ~100K LOC
└── results/                 # Benchmark results (JSON)
```

## Module Categories

### Fuzzer
**Expected Metrics**: execs/sec, coverage_rate, time_to_crash, memory_usage

**Performance Thresholds**:
- Min 1000 execs/sec
- Max 10s for small projects
- Max 2GB memory

### Scanner
**Expected Metrics**: files/sec, LOC/sec, findings_count

**Performance Thresholds**:
- Min 100 files/sec
- Min 10K LOC/sec
- Max 512MB memory

### Secret Detection
**Expected Metrics**: patterns/sec, precision, recall, F1

**Performance Thresholds**:
- Min 90% precision
- Min 95% recall
- Max 5 false positives per 100 secrets

### Analyzer
**Expected Metrics**: analysis_depth, files/sec, accuracy

**Performance Thresholds**:
- Min 10 files/sec (deep analysis)
- Min 85% accuracy
- Max 2GB memory

## Running Benchmarks

### All Benchmarks
```bash
cd backend
pytest benchmarks/ --benchmark-only -v
```

### Specific Category
```bash
pytest benchmarks/by_category/fuzzer/ --benchmark-only -v
```

### With Comparison
```bash
# Run and save baseline
pytest benchmarks/ --benchmark-only --benchmark-save=baseline

# Compare against baseline
pytest benchmarks/ --benchmark-only --benchmark-compare=baseline
```

### Generate Histogram
```bash
pytest benchmarks/ --benchmark-only --benchmark-histogram=histogram
```

## Benchmark Results

Results are saved as JSON and include:
- Mean execution time
- Standard deviation
- Min/Max values
- Iterations per second
- Memory usage

Example output:
```
------------------------ benchmark: fuzzer --------------------------
Name                          Mean      StdDev    Ops/Sec
bench_cargo_fuzz[discovery]   0.0012s   0.0001s   833.33
bench_cargo_fuzz[execution]   0.1250s   0.0050s   8.00
bench_cargo_fuzz[memory]      0.0100s   0.0005s   100.00
---------------------------------------------------------------------
```

## CI/CD Integration

Benchmarks run:
- **Nightly**: Full benchmark suite, track trends
- **On PR**: When benchmarks/ or modules/ changed
- **Manual**: Via workflow_dispatch

### Regression Detection

Benchmarks automatically fail if:
- Performance degrades >10%
- Memory usage exceeds thresholds
- Throughput drops below minimum
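The >10% rule reduces to a ratio check between a saved baseline mean and the current mean. A hypothetical helper sketching that check; the actual gate is driven by pytest-benchmark's comparison machinery in CI:

```python
def check_regression(baseline_mean: float, current_mean: float,
                     max_slowdown: float = 0.10) -> bool:
    """Return True if the current run is more than max_slowdown slower than baseline."""
    if baseline_mean <= 0:
        return False  # no usable baseline, nothing to compare against
    return (current_mean - baseline_mean) / baseline_mean > max_slowdown

# 1.0s baseline vs 1.5s current is a 50% slowdown: regression
# 1.0s baseline vs 1.05s current is within the 10% budget: fine
```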

See `.github/workflows/benchmark.yml` for configuration.

## Adding New Benchmarks

### 1. Create benchmark file in category directory
```python
# benchmarks/by_category/fuzzer/bench_new_fuzzer.py

import pytest
from benchmarks.category_configs import ModuleCategory, get_threshold

@pytest.mark.benchmark(group="fuzzer")
def test_execution_performance(benchmark, new_fuzzer, test_workspace):
    """Benchmark execution speed"""
    result = benchmark(new_fuzzer.execute, config, test_workspace)

    # Validate against threshold
    threshold = get_threshold(ModuleCategory.FUZZER, "max_execution_time_small")
    assert result.execution_time < threshold
```

### 2. Update category_configs.py if needed
Add new thresholds or metrics for your module.

### 3. Run locally
```bash
pytest benchmarks/by_category/fuzzer/bench_new_fuzzer.py --benchmark-only -v
```

## Best Practices

1. **Use mocking** for external dependencies (network, disk I/O)
2. **Fixed iterations** for consistent benchmarking
3. **Warm-up runs** for JIT-compiled code
4. **Category-specific metrics** aligned with module purpose
5. **Realistic fixtures** that represent actual use cases
6. **Memory profiling** using tracemalloc
7. **Compare apples to apples** within the same category

## Interpreting Results

### Good Performance
- ✅ Execution time below threshold
- ✅ Memory usage within limits
- ✅ Throughput meets minimum
- ✅ <5% variance across runs

### Performance Issues
- ⚠️ Execution time 10-20% over threshold
- ❌ Execution time >20% over threshold
- ❌ Memory leaks (increasing over iterations)
- ❌ High variance (>10%) indicates instability

## Tracking Performance Over Time

Benchmark results are stored as artifacts with:
- Commit SHA
- Timestamp
- Environment details (Python version, OS)
- Full metrics

Use these to track long-term performance trends and detect gradual degradation.
221	backend/benchmarks/by_category/fuzzer/bench_cargo_fuzz.py	Normal file
@@ -0,0 +1,221 @@
"""
Benchmarks for CargoFuzzer module

Tests performance characteristics of Rust fuzzing:
- Execution throughput (execs/sec)
- Coverage rate
- Memory efficiency
- Time to first crash
"""

import pytest
import asyncio
from pathlib import Path
from unittest.mock import AsyncMock, patch
import sys

sys.path.insert(0, str(Path(__file__).resolve().parents[3] / "toolbox"))

from modules.fuzzer.cargo_fuzzer import CargoFuzzer
from benchmarks.category_configs import ModuleCategory, get_threshold


@pytest.fixture
def cargo_fuzzer():
    """Create CargoFuzzer instance for benchmarking"""
    return CargoFuzzer()


@pytest.fixture
def benchmark_config():
    """Benchmark-optimized configuration"""
    return {
        "target_name": None,
        "max_iterations": 10000,  # Fixed iterations for consistent benchmarking
        "timeout_seconds": 30,
        "sanitizer": "address"
    }


@pytest.fixture
def mock_rust_workspace(tmp_path):
    """Create a minimal Rust workspace for benchmarking"""
    workspace = tmp_path / "rust_project"
    workspace.mkdir()

    # Cargo.toml
    (workspace / "Cargo.toml").write_text("""[package]
name = "bench_project"
version = "0.1.0"
edition = "2021"
""")

    # src/lib.rs
    src = workspace / "src"
    src.mkdir()
    (src / "lib.rs").write_text("""
pub fn benchmark_function(data: &[u8]) -> Vec<u8> {
    data.to_vec()
}
""")

    # fuzz structure
    fuzz = workspace / "fuzz"
    fuzz.mkdir()
    (fuzz / "Cargo.toml").write_text("""[package]
name = "bench_project-fuzz"
version = "0.0.0"
edition = "2021"

[dependencies]
libfuzzer-sys = "0.4"

[dependencies.bench_project]
path = ".."

[[bin]]
name = "fuzz_target_1"
path = "fuzz_targets/fuzz_target_1.rs"
""")

    targets = fuzz / "fuzz_targets"
    targets.mkdir()
    (targets / "fuzz_target_1.rs").write_text("""#![no_main]
use libfuzzer_sys::fuzz_target;
use bench_project::benchmark_function;

fuzz_target!(|data: &[u8]| {
    let _ = benchmark_function(data);
});
""")

    return workspace


class TestCargoFuzzerPerformance:
    """Benchmark CargoFuzzer performance metrics"""

    @pytest.mark.benchmark(group="fuzzer")
    def test_target_discovery_performance(self, benchmark, cargo_fuzzer, mock_rust_workspace):
        """Benchmark fuzz target discovery speed"""
        def discover():
            return asyncio.run(cargo_fuzzer._discover_fuzz_targets(mock_rust_workspace))

        result = benchmark(discover)
        assert len(result) > 0

    @pytest.mark.benchmark(group="fuzzer")
    def test_config_validation_performance(self, benchmark, cargo_fuzzer, benchmark_config):
        """Benchmark configuration validation speed"""
        result = benchmark(cargo_fuzzer.validate_config, benchmark_config)
        assert result is True

    @pytest.mark.benchmark(group="fuzzer")
    def test_module_initialization_performance(self, benchmark):
        """Benchmark module instantiation time"""
        def init_module():
            return CargoFuzzer()

        module = benchmark(init_module)
        assert module is not None


class TestCargoFuzzerThroughput:
    """Benchmark execution throughput"""

    @pytest.mark.benchmark(group="fuzzer")
    def test_execution_throughput(self, benchmark, cargo_fuzzer, mock_rust_workspace, benchmark_config):
        """Benchmark fuzzing execution throughput"""

        # Mock actual fuzzing to focus on orchestration overhead
        async def mock_run(workspace, target, config, callback):
            # Simulate 10K execs at 1000 execs/sec
            if callback:
                await callback({
                    "total_execs": 10000,
                    "execs_per_sec": 1000.0,
                    "crashes": 0,
                    "coverage": 50,
                    "corpus_size": 10,
                    "elapsed_time": 10
                })
            return [], {"total_executions": 10000, "execution_time": 10.0}

        with patch.object(cargo_fuzzer, '_build_fuzz_target', new_callable=AsyncMock, return_value=True):
            with patch.object(cargo_fuzzer, '_run_fuzzing', side_effect=mock_run):
                with patch.object(cargo_fuzzer, '_parse_crash_artifacts', new_callable=AsyncMock, return_value=[]):
                    def run_fuzzer():
                        # Run in new event loop
                        loop = asyncio.new_event_loop()
                        try:
                            return loop.run_until_complete(
                                cargo_fuzzer.execute(benchmark_config, mock_rust_workspace)
                            )
                        finally:
                            loop.close()

                    result = benchmark(run_fuzzer)
                    assert result.status == "success"

                    # Verify performance threshold
                    threshold = get_threshold(ModuleCategory.FUZZER, "max_execution_time_small")
                    assert result.execution_time < threshold, \
                        f"Execution time {result.execution_time}s exceeds threshold {threshold}s"


class TestCargoFuzzerMemory:
    """Benchmark memory efficiency"""

    @pytest.mark.benchmark(group="fuzzer")
    def test_memory_overhead(self, benchmark, cargo_fuzzer, mock_rust_workspace, benchmark_config):
        """Benchmark memory usage during execution"""
        import tracemalloc

        def measure_memory():
            tracemalloc.start()

            # Simulate operations
            cargo_fuzzer.validate_config(benchmark_config)
            asyncio.run(cargo_fuzzer._discover_fuzz_targets(mock_rust_workspace))

            current, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()

            return peak / 1024 / 1024  # Convert to MB

        peak_mb = benchmark(measure_memory)

        # Check against threshold
        max_memory = get_threshold(ModuleCategory.FUZZER, "max_memory_mb")
        assert peak_mb < max_memory, \
            f"Peak memory {peak_mb:.2f}MB exceeds threshold {max_memory}MB"


class TestCargoFuzzerScalability:
    """Benchmark scalability characteristics"""

    @pytest.mark.benchmark(group="fuzzer")
    def test_multiple_target_discovery(self, benchmark, cargo_fuzzer, tmp_path):
        """Benchmark discovery with multiple targets"""
        workspace = tmp_path / "multi_target"
        workspace.mkdir()

        # Create workspace with 10 fuzz targets
        (workspace / "Cargo.toml").write_text("[package]\nname = \"test\"\nversion = \"0.1.0\"\nedition = \"2021\"")
        src = workspace / "src"
        src.mkdir()
        (src / "lib.rs").write_text("pub fn test() {}")

        fuzz = workspace / "fuzz"
        fuzz.mkdir()
        targets = fuzz / "fuzz_targets"
        targets.mkdir()

        for i in range(10):
            (targets / f"fuzz_target_{i}.rs").write_text("// Target")

        def discover():
            return asyncio.run(cargo_fuzzer._discover_fuzz_targets(workspace))

        result = benchmark(discover)
        assert len(result) == 10
240	backend/benchmarks/by_category/secret_detection/README.md	Normal file
@@ -0,0 +1,240 @@
# Secret Detection Benchmarks

Comprehensive benchmarking suite comparing secret detection tools via complete workflow execution:
- **Gitleaks** - Fast pattern-based detection
- **TruffleHog** - Entropy analysis with verification
- **LLM Detector** - AI-powered semantic analysis (gpt-4o-mini, gpt-5-mini)

## Quick Start

### Run All Comparisons

```bash
cd backend
python benchmarks/by_category/secret_detection/compare_tools.py
```

This will run all workflows on `test_projects/secret_detection_benchmark/` and generate comparison reports.

### Run Benchmark Tests

```bash
# All benchmarks (Gitleaks, TruffleHog, LLM with 3 models)
pytest benchmarks/by_category/secret_detection/bench_comparison.py --benchmark-only -v

# Specific tool only
pytest benchmarks/by_category/secret_detection/bench_comparison.py::TestSecretDetectionComparison::test_gitleaks_workflow --benchmark-only -v

# Performance tests only
pytest benchmarks/by_category/secret_detection/bench_comparison.py::TestSecretDetectionPerformance --benchmark-only -v
```

## Ground Truth Dataset

**Controlled Benchmark** (`test_projects/secret_detection_benchmark/`)

**Exactly 32 documented secrets** for accurate precision/recall testing:
- **12 Easy**: Standard patterns (AWS keys, GitHub PATs, Stripe keys, SSH keys)
- **10 Medium**: Obfuscated (Base64, hex, concatenated, in comments, Unicode)
- **10 Hard**: Well hidden (ROT13, binary, XOR, reversed, template strings, regex patterns)

All secrets documented in `secret_detection_benchmark_GROUND_TRUTH.json` with exact file paths and line numbers.

See `test_projects/secret_detection_benchmark/README.md` for details.

## Metrics Measured

### Accuracy Metrics
- **Precision**: TP / (TP + FP) - How many detected secrets are real?
- **Recall**: TP / (TP + FN) - How many real secrets were found?
- **F1 Score**: Harmonic mean of precision and recall
- **False Positive Rate**: FP / Total Detected
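The formulas above can be computed directly from counts against the ground truth (illustrative helper; the suite's own implementation may differ):

```python
def accuracy_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# E.g. a tool that finds 30 of the 32 documented secrets with 2 false positives:
metrics = accuracy_metrics(tp=30, fp=2, fn=2)
# precision = 30/32 = 0.9375, recall = 30/32 = 0.9375, f1 = 0.9375
```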

### Performance Metrics
- **Execution Time**: Total time to scan all files
- **Throughput**: Files/secrets scanned per second
- **Memory Usage**: Peak memory during execution

### Thresholds (from `category_configs.py`)
- Minimum Precision: 90%
- Minimum Recall: 95%
- Max Execution Time (small): 2.0s
- Max False Positives: 5 per 100 secrets
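Based on the `get_threshold(ModuleCategory..., "max_execution_time_small")` calls used in the benchmark code, `category_configs.py` plausibly looks like the sketch below; any key names beyond those quoted in this README are assumptions:

```python
from enum import Enum

class ModuleCategory(Enum):
    SECRET_DETECTION = "secret_detection"

# Assumed threshold table mirroring the values listed above
THRESHOLDS = {
    ModuleCategory.SECRET_DETECTION: {
        "min_precision": 0.90,
        "min_recall": 0.95,
        "max_execution_time_small": 2.0,
        "max_false_positives_per_100": 5,
    },
}

def get_threshold(category: ModuleCategory, name: str):
    """Look up a single threshold value for a module category."""
    return THRESHOLDS[category][name]
```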
|
||||
## Tool Comparison
|
||||
|
||||
### Gitleaks
|
||||
**Strengths:**
|
||||
- Fastest execution
|
||||
- Git-aware (commit history scanning)
|
||||
- Low false positive rate
|
||||
- No API required
|
||||
- Works offline
|
||||
|
||||
**Weaknesses:**
|
||||
- Pattern-based only
|
||||
- May miss obfuscated secrets
|
||||
- Limited to known patterns
|
||||
|
||||
### TruffleHog
|
||||
**Strengths:**
|
||||
- Secret verification (validates if active)
|
||||
- High detection rate with entropy analysis
|
||||
- Multiple detectors (600+ secret types)
|
||||
- Catches high-entropy strings
|
||||
|
||||
**Weaknesses:**
|
||||
- Slower than Gitleaks
|
||||
- Higher false positive rate
|
||||
- Verification requires network calls
|
||||
|
||||
### LLM Detector
|
||||
**Strengths:**
|
||||
- Semantic understanding of context
|
||||
- Catches novel/custom secret patterns
|
||||
- Can reason about what "looks like" a secret
|
||||
- Multiple model options (GPT-4, Claude, etc.)
|
||||
- Understands code context
|
||||
|
||||
**Weaknesses:**
|
||||
- Slowest (API latency + LLM processing)
|
||||
- Most expensive (LLM API costs)
|
||||
- Requires A2A agent infrastructure
|
||||
- Accuracy varies by model
|
||||
- May miss well-disguised secrets
|
||||
|
||||
## Results Directory
|
||||
|
||||
After running comparisons, results are saved to:
|
||||
```
|
||||
benchmarks/by_category/secret_detection/results/
|
||||
├── comparison_report.md # Human-readable comparison with:
|
||||
│ # - Summary table with secrets/files/avg per file/time
|
||||
│ # - Agreement analysis (secrets found by N tools)
|
||||
│ # - Tool agreement matrix (overlap between pairs)
|
||||
│ # - Per-file detailed comparison table
|
||||
│ # - File type breakdown
|
||||
│ # - Files analyzed by each tool
|
||||
│ # - Overlap analysis and performance summary
|
||||
└── comparison_results.json # Machine-readable data with findings_by_file
|
||||
```
|
||||
|
||||
## Latest Benchmark Results
|
||||
|
||||
Run the benchmark to generate results:
|
||||
```bash
|
||||
cd backend
|
||||
python benchmarks/by_category/secret_detection/compare_tools.py
|
||||
```
|
||||
|
||||
Results are saved to `results/comparison_report.md` with:
|
||||
- Summary table (secrets found, files scanned, time)
|
||||
- Agreement analysis (how many tools found each secret)
|
||||
- Tool agreement matrix (overlap between tools)
|
||||
- Per-file detailed comparison
|
||||
- File type breakdown
|
||||
|
||||
## CI/CD Integration
|
||||
|
||||
Add to your CI pipeline:
|
||||
|
||||
```yaml
|
||||
# .github/workflows/benchmark-secrets.yml
|
||||
name: Secret Detection Benchmark
|
||||
|
||||
on:
|
||||
schedule:
|
||||
- cron: '0 0 * * 0' # Weekly
|
||||
workflow_dispatch:
|
||||
|
||||
jobs:
|
||||
benchmark:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
|
||||
- name: Set up Python
|
||||
uses: actions/setup-python@v4
|
||||
with:
|
||||
python-version: '3.11'
|
||||
|
||||
- name: Install dependencies
|
||||
run: |
|
||||
pip install -r backend/requirements.txt
|
||||
pip install pytest-benchmark
|
||||
|
||||
- name: Run benchmarks
|
||||
env:
|
||||
GITGUARDIAN_API_KEY: ${{ secrets.GITGUARDIAN_API_KEY }}
|
||||
run: |
|
||||
cd backend
|
||||
pytest benchmarks/by_category/secret_detection/bench_comparison.py \
|
||||
--benchmark-only \
|
||||
--benchmark-json=results.json \
|
||||
--gitguardian-api-key
|
||||
|
||||
- name: Upload results
|
||||
uses: actions/upload-artifact@v3
|
||||
with:
|
||||
name: benchmark-results
|
||||
path: backend/results.json
|
||||
```
|
||||
|

## Adding New Tools

To benchmark a new secret detection tool:

1. Create module in `toolbox/modules/secret_detection/`
2. Register in `__init__.py`
3. Add to `compare_tools.py` in `run_all_tools()`
4. Add test in `bench_comparison.py`
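The steps above can be sketched roughly as follows. This is an illustrative stand-in only: the real base class and registry live in `toolbox/modules/secret_detection/` and its `__init__.py`, so `Finding`, `MySecretDetector`, and `REGISTRY` below are hypothetical names, not the project's actual API.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass
class Finding:
    """One detected secret location (hypothetical shape)."""
    file: str
    line: int
    rule: str


class MySecretDetector:
    """Hypothetical new tool wrapper: scan files, return findings."""

    name = "my_detector"

    def scan(self, target: Path) -> List[Finding]:
        findings: List[Finding] = []
        # Trivial keyword heuristic, standing in for the real tool invocation
        for path in target.rglob("*.env"):
            for lineno, text in enumerate(path.read_text().splitlines(), start=1):
                if "SECRET" in text.upper():
                    findings.append(Finding(str(path), lineno, "keyword-match"))
        return findings


# Steps 2-3: register the module so run_all_tools() can discover it
REGISTRY: Dict[str, type] = {}
REGISTRY[MySecretDetector.name] = MySecretDetector
```

A matching test in `bench_comparison.py` would then run the wrapper against the ground-truth target and feed its findings into `calculate_metrics`.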

## Interpreting Results

### High Precision, Low Recall
Tool is conservative: few false positives, but it misses secrets.
**Use case**: Production environments where false positives are costly.

### Low Precision, High Recall
Tool is aggressive: finds most secrets, but with many false positives.
**Use case**: Initial scans where manual review is acceptable.

### Balanced (High F1)
Tool has a good balance of precision and recall.
**Use case**: General-purpose scanning.

### Fast Execution
Suitable for CI/CD pipelines and pre-commit hooks.

### Slow but Accurate
Better for comprehensive security audits.
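The trade-offs above follow the standard metric definitions (the same ones `calculate_metrics` in `bench_comparison.py` uses); a minimal standalone helper:

```python
def prf1(true_positives: int, false_positives: int, false_negatives: int):
    """Precision, recall, and F1 from raw counts; 0.0 when undefined."""
    detected = true_positives + false_positives
    expected = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / expected if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```

For example, a conservative tool reporting 8 true positives, 0 false positives, and 22 missed secrets gets perfect precision but recall of only 8/30, which drags F1 down accordingly.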

## Best Practices

1. **Use multiple tools**: Each has strengths/weaknesses
2. **Combine results**: Union of all findings for maximum coverage
3. **Filter intelligently**: Remove known false positives
4. **Verify findings**: Check if secrets are actually valid
5. **Track over time**: Monitor precision/recall trends
6. **Update regularly**: Patterns evolve, tools improve
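Practice 2 ("combine results") is a set union over `(file, line)` locations in the per-tool `findings_by_file` maps; a minimal sketch, assuming the same data shape as `comparison_results.json`:

```python
from typing import Dict, List, Set, Tuple

Location = Tuple[str, int]  # (file path, line number)


def union_findings(tools: Dict[str, Dict[str, List[int]]]) -> Set[Location]:
    """Merge each tool's findings_by_file map into one location set."""
    merged: Set[Location] = set()
    for findings_by_file in tools.values():
        for file_path, lines in findings_by_file.items():
            merged.update((file_path, line) for line in lines)
    return merged
```

Duplicate detections collapse automatically, so the result is the maximum-coverage finding set to hand to manual review.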

## Troubleshooting

### GitGuardian Tests Skipped
- Set `GITGUARDIAN_API_KEY` environment variable
- Use `--gitguardian-api-key` flag

### LLM Tests Skipped
- Ensure A2A agent is running
- Check agent URL in config
- Use `--llm-enabled` flag

### Low Recall
- Check if ground truth is up to date
- Verify tool is configured correctly
- Review missed secrets manually

### High False Positives
- Adjust tool sensitivity
- Add exclusion patterns
- Review false positive list

@@ -0,0 +1,285 @@
"""
Secret Detection Tool Comparison Benchmark

Compares Gitleaks, TruffleHog, and LLM-based detection
on the vulnerable_app ground truth dataset via workflow execution.
"""

import pytest
import json
from pathlib import Path
from typing import Dict, List, Any
import sys

sys.path.insert(0, str(Path(__file__).resolve().parents[3] / "sdk" / "src"))

from fuzzforge_sdk import FuzzForgeClient
from benchmarks.category_configs import ModuleCategory, get_threshold


@pytest.fixture
def target_path():
    """Path to vulnerable_app"""
    path = Path(__file__).parent.parent.parent.parent.parent / "test_projects" / "vulnerable_app"
    assert path.exists(), f"Target not found: {path}"
    return path


@pytest.fixture
def ground_truth(target_path):
    """Load ground truth data"""
    metadata_file = target_path / "SECRETS_GROUND_TRUTH.json"
    assert metadata_file.exists(), f"Ground truth not found: {metadata_file}"

    with open(metadata_file) as f:
        return json.load(f)


@pytest.fixture
def sdk_client():
    """FuzzForge SDK client"""
    client = FuzzForgeClient(base_url="http://localhost:8000")
    yield client
    client.close()


def calculate_metrics(sarif_results: List[Dict], ground_truth: Dict[str, Any]) -> Dict[str, float]:
    """Calculate precision, recall, and F1 score"""

    # Extract expected secrets from ground truth
    expected_secrets = set()
    for file_info in ground_truth["files"]:
        if "secrets" in file_info:
            for secret in file_info["secrets"]:
                expected_secrets.add((file_info["filename"], secret["line"]))

    # Extract detected secrets from SARIF
    detected_secrets = set()
    for result in sarif_results:
        locations = result.get("locations", [])
        for location in locations:
            physical_location = location.get("physicalLocation", {})
            artifact_location = physical_location.get("artifactLocation", {})
            region = physical_location.get("region", {})

            uri = artifact_location.get("uri", "")
            line = region.get("startLine", 0)

            if uri and line:
                file_path = Path(uri)
                filename = file_path.name
                detected_secrets.add((filename, line))
                # Also try with relative path
                if len(file_path.parts) > 1:
                    rel_path = str(Path(*file_path.parts[-2:]))
                    detected_secrets.add((rel_path, line))

    # Calculate metrics
    true_positives = len(expected_secrets & detected_secrets)
    false_positives = len(detected_secrets - expected_secrets)
    false_negatives = len(expected_secrets - detected_secrets)

    precision = true_positives / (true_positives + false_positives) if (true_positives + false_positives) > 0 else 0
    recall = true_positives / (true_positives + false_negatives) if (true_positives + false_negatives) > 0 else 0
    f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "true_positives": true_positives,
        "false_positives": false_positives,
        "false_negatives": false_negatives
    }


class TestSecretDetectionComparison:
    """Compare all secret detection tools"""

    @pytest.mark.benchmark(group="secret_detection")
    def test_gitleaks_workflow(self, benchmark, sdk_client, target_path, ground_truth):
        """Benchmark Gitleaks workflow accuracy and performance"""

        def run_gitleaks():
            run = sdk_client.submit_workflow_with_upload(
                workflow_name="gitleaks_detection",
                target_path=str(target_path),
                parameters={
                    "scan_mode": "detect",
                    "no_git": True,
                    "redact": False
                }
            )

            result = sdk_client.wait_for_completion(run.run_id, timeout=300)
            assert result.status == "completed", f"Workflow failed: {result.status}"

            findings = sdk_client.get_run_findings(run.run_id)
            assert findings and findings.sarif, "No findings returned"

            return findings

        findings = benchmark(run_gitleaks)

        # Extract SARIF results
        sarif_results = []
        for run_data in findings.sarif.get("runs", []):
            sarif_results.extend(run_data.get("results", []))

        # Calculate metrics
        metrics = calculate_metrics(sarif_results, ground_truth)

        # Log results
        print("\n=== Gitleaks Workflow Results ===")
        print(f"Precision: {metrics['precision']:.2%}")
        print(f"Recall: {metrics['recall']:.2%}")
        print(f"F1 Score: {metrics['f1']:.2%}")
        print(f"True Positives: {metrics['true_positives']}")
        print(f"False Positives: {metrics['false_positives']}")
        print(f"False Negatives: {metrics['false_negatives']}")
        print(f"Findings Count: {len(sarif_results)}")

        # Assert meets thresholds
        min_precision = get_threshold(ModuleCategory.SECRET_DETECTION, "min_precision")
        min_recall = get_threshold(ModuleCategory.SECRET_DETECTION, "min_recall")

        assert metrics['precision'] >= min_precision, \
            f"Precision {metrics['precision']:.2%} below threshold {min_precision:.2%}"
        assert metrics['recall'] >= min_recall, \
            f"Recall {metrics['recall']:.2%} below threshold {min_recall:.2%}"

    @pytest.mark.benchmark(group="secret_detection")
    def test_trufflehog_workflow(self, benchmark, sdk_client, target_path, ground_truth):
        """Benchmark TruffleHog workflow accuracy and performance"""

        def run_trufflehog():
            run = sdk_client.submit_workflow_with_upload(
                workflow_name="trufflehog_detection",
                target_path=str(target_path),
                parameters={
                    "verify": False,
                    "max_depth": 10
                }
            )

            result = sdk_client.wait_for_completion(run.run_id, timeout=300)
            assert result.status == "completed", f"Workflow failed: {result.status}"

            findings = sdk_client.get_run_findings(run.run_id)
            assert findings and findings.sarif, "No findings returned"

            return findings

        findings = benchmark(run_trufflehog)

        sarif_results = []
        for run_data in findings.sarif.get("runs", []):
            sarif_results.extend(run_data.get("results", []))

        metrics = calculate_metrics(sarif_results, ground_truth)

        print("\n=== TruffleHog Workflow Results ===")
        print(f"Precision: {metrics['precision']:.2%}")
        print(f"Recall: {metrics['recall']:.2%}")
        print(f"F1 Score: {metrics['f1']:.2%}")
        print(f"True Positives: {metrics['true_positives']}")
        print(f"False Positives: {metrics['false_positives']}")
        print(f"False Negatives: {metrics['false_negatives']}")
        print(f"Findings Count: {len(sarif_results)}")

        min_precision = get_threshold(ModuleCategory.SECRET_DETECTION, "min_precision")
        min_recall = get_threshold(ModuleCategory.SECRET_DETECTION, "min_recall")

        assert metrics['precision'] >= min_precision
        assert metrics['recall'] >= min_recall

    @pytest.mark.benchmark(group="secret_detection")
    @pytest.mark.parametrize("model", [
        "gpt-4o-mini",
        "gpt-4o",
        "claude-3-5-sonnet-20241022"
    ])
    def test_llm_workflow(self, benchmark, sdk_client, target_path, ground_truth, model):
        """Benchmark LLM workflow with different models"""

        def run_llm():
            provider = "openai" if "gpt" in model else "anthropic"

            run = sdk_client.submit_workflow_with_upload(
                workflow_name="llm_secret_detection",
                target_path=str(target_path),
                parameters={
                    "agent_url": "http://fuzzforge-task-agent:8000/a2a/litellm_agent",
                    "llm_model": model,
                    "llm_provider": provider,
                    "max_files": 20,
                    "timeout": 60
                }
            )

            result = sdk_client.wait_for_completion(run.run_id, timeout=300)
            assert result.status == "completed", f"Workflow failed: {result.status}"

            findings = sdk_client.get_run_findings(run.run_id)
            assert findings and findings.sarif, "No findings returned"

            return findings

        findings = benchmark(run_llm)

        sarif_results = []
        for run_data in findings.sarif.get("runs", []):
            sarif_results.extend(run_data.get("results", []))

        metrics = calculate_metrics(sarif_results, ground_truth)

        print(f"\n=== LLM ({model}) Workflow Results ===")
        print(f"Precision: {metrics['precision']:.2%}")
        print(f"Recall: {metrics['recall']:.2%}")
        print(f"F1 Score: {metrics['f1']:.2%}")
        print(f"True Positives: {metrics['true_positives']}")
        print(f"False Positives: {metrics['false_positives']}")
        print(f"False Negatives: {metrics['false_negatives']}")
        print(f"Findings Count: {len(sarif_results)}")


class TestSecretDetectionPerformance:
    """Performance benchmarks for each tool"""

    @pytest.mark.benchmark(group="secret_detection")
    def test_gitleaks_performance(self, benchmark, sdk_client, target_path):
        """Benchmark Gitleaks workflow execution speed"""

        def run():
            run = sdk_client.submit_workflow_with_upload(
                workflow_name="gitleaks_detection",
                target_path=str(target_path),
                parameters={"scan_mode": "detect", "no_git": True}
            )
            result = sdk_client.wait_for_completion(run.run_id, timeout=300)
            return result

        result = benchmark(run)

        max_time = get_threshold(ModuleCategory.SECRET_DETECTION, "max_execution_time_small")
        # Note: Workflow execution time includes orchestration overhead,
        # so we allow 2x the module threshold
        assert result.execution_time < max_time * 2

    @pytest.mark.benchmark(group="secret_detection")
    def test_trufflehog_performance(self, benchmark, sdk_client, target_path):
        """Benchmark TruffleHog workflow execution speed"""

        def run():
            run = sdk_client.submit_workflow_with_upload(
                workflow_name="trufflehog_detection",
                target_path=str(target_path),
                parameters={"verify": False}
            )
            result = sdk_client.wait_for_completion(run.run_id, timeout=300)
            return result

        result = benchmark(run)

        max_time = get_threshold(ModuleCategory.SECRET_DETECTION, "max_execution_time_small")
        assert result.execution_time < max_time * 2
547  backend/benchmarks/by_category/secret_detection/compare_tools.py  Normal file
@@ -0,0 +1,547 @@
"""
Secret Detection Tools Comparison Report Generator

Generates comparison reports showing strengths/weaknesses of each tool.
Uses workflow execution via SDK to test the complete pipeline.
"""

import asyncio
import json
import time
from pathlib import Path
from typing import Dict, List, Any, Optional
from dataclasses import dataclass, asdict
import sys

sys.path.insert(0, str(Path(__file__).resolve().parents[3] / "sdk" / "src"))

from fuzzforge_sdk import FuzzForgeClient


@dataclass
class ToolResult:
    """Results from running a tool"""
    tool_name: str
    execution_time: float
    findings_count: int
    findings_by_file: Dict[str, List[int]]  # file_path -> [line_numbers]
    unique_files: int
    unique_locations: int  # unique (file, line) pairs
    secret_density: float  # average secrets per file
    file_types: Dict[str, int]  # file extension -> count of files with secrets


class SecretDetectionComparison:
    """Compare secret detection tools"""

    def __init__(self, target_path: Path, api_url: str = "http://localhost:8000"):
        self.target_path = target_path
        self.client = FuzzForgeClient(base_url=api_url)

    async def run_workflow(self, workflow_name: str, tool_name: str, config: Dict[str, Any] = None) -> Optional[ToolResult]:
        """Run a workflow and extract findings"""
        print(f"\nRunning {tool_name} workflow...")

        start_time = time.time()

        try:
            # Start workflow
            run = self.client.submit_workflow_with_upload(
                workflow_name=workflow_name,
                target_path=str(self.target_path),
                parameters=config or {}
            )

            print(f"  Started run: {run.run_id}")

            # Wait for completion (up to 30 minutes for slow LLMs)
            print("  Waiting for completion...")
            result = self.client.wait_for_completion(run.run_id, timeout=1800)

            execution_time = time.time() - start_time

            # Status casing differs between call sites; compare case-insensitively
            if result.status.lower() != "completed":
                print(f"❌ {tool_name} workflow failed: {result.status}")
                return None

            # Get findings from SARIF
            findings = self.client.get_run_findings(run.run_id)

            if not findings or not findings.sarif:
                print(f"⚠️ {tool_name} produced no findings")
                return None

            # Extract results from SARIF and group by file
            findings_by_file = {}
            unique_locations = set()

            for run_data in findings.sarif.get("runs", []):
                for result in run_data.get("results", []):
                    locations = result.get("locations", [])
                    for location in locations:
                        physical_location = location.get("physicalLocation", {})
                        artifact_location = physical_location.get("artifactLocation", {})
                        region = physical_location.get("region", {})

                        uri = artifact_location.get("uri", "")
                        line = region.get("startLine", 0)

                        if uri and line:
                            if uri not in findings_by_file:
                                findings_by_file[uri] = []
                            findings_by_file[uri].append(line)
                            unique_locations.add((uri, line))

            # Sort line numbers for each file
            for file_path in findings_by_file:
                findings_by_file[file_path] = sorted(set(findings_by_file[file_path]))

            # Calculate file type distribution
            file_types = {}
            for file_path in findings_by_file:
                ext = Path(file_path).suffix or Path(file_path).name  # Use full name for files like .env
                if ext.startswith('.'):
                    file_types[ext] = file_types.get(ext, 0) + 1
                else:
                    file_types['[no extension]'] = file_types.get('[no extension]', 0) + 1

            # Calculate secret density
            secret_density = len(unique_locations) / len(findings_by_file) if findings_by_file else 0

            print(f"  ✓ Found {len(unique_locations)} secrets in {len(findings_by_file)} files (avg {secret_density:.1f} per file)")

            return ToolResult(
                tool_name=tool_name,
                execution_time=execution_time,
                findings_count=len(unique_locations),
                findings_by_file=findings_by_file,
                unique_files=len(findings_by_file),
                unique_locations=len(unique_locations),
                secret_density=secret_density,
                file_types=file_types
            )

        except Exception as e:
            print(f"❌ {tool_name} error: {e}")
            return None

    async def run_all_tools(self, llm_models: List[str] = None) -> List[ToolResult]:
        """Run all available tools"""
        results = []

        if llm_models is None:
            llm_models = ["gpt-4o-mini"]

        # Gitleaks
        result = await self.run_workflow("gitleaks_detection", "Gitleaks", {
            "scan_mode": "detect",
            "no_git": True,
            "redact": False
        })
        if result:
            results.append(result)

        # TruffleHog
        result = await self.run_workflow("trufflehog_detection", "TruffleHog", {
            "verify": False,
            "max_depth": 10
        })
        if result:
            results.append(result)

        # LLM Detector with multiple models
        for model in llm_models:
            tool_name = f"LLM ({model})"
            result = await self.run_workflow("llm_secret_detection", tool_name, {
                "agent_url": "http://fuzzforge-task-agent:8000/a2a/litellm_agent",
                "llm_model": model,
                "llm_provider": "openai" if "gpt" in model else "anthropic",
                "max_files": 20,
                "timeout": 60,
                "file_patterns": [
                    "*.py", "*.js", "*.ts", "*.java", "*.go", "*.env", "*.yaml", "*.yml",
                    "*.json", "*.xml", "*.ini", "*.sql", "*.properties", "*.sh", "*.bat",
                    "*.config", "*.conf", "*.toml", "*id_rsa*", "*.txt"
                ]
            })
            if result:
                results.append(result)

        return results

    def _calculate_agreement_matrix(self, results: List[ToolResult]) -> Dict[str, Dict[str, int]]:
        """Calculate overlap matrix showing common secrets between tool pairs"""
        matrix = {}

        for result1 in results:
            matrix[result1.tool_name] = {}
            # Convert to set of (file, line) tuples
            secrets1 = set()
            for file_path, lines in result1.findings_by_file.items():
                for line in lines:
                    secrets1.add((file_path, line))

            for result2 in results:
                secrets2 = set()
                for file_path, lines in result2.findings_by_file.items():
                    for line in lines:
                        secrets2.add((file_path, line))

                # Count common secrets
                common = len(secrets1 & secrets2)
                matrix[result1.tool_name][result2.tool_name] = common

        return matrix

    def _get_per_file_comparison(self, results: List[ToolResult]) -> Dict[str, Dict[str, int]]:
        """Get per-file breakdown of findings across all tools"""
        all_files = set()
        for result in results:
            all_files.update(result.findings_by_file.keys())

        comparison = {}
        for file_path in sorted(all_files):
            comparison[file_path] = {}
            for result in results:
                comparison[file_path][result.tool_name] = len(result.findings_by_file.get(file_path, []))

        return comparison

    def _get_agreement_stats(self, results: List[ToolResult]) -> Dict[int, int]:
        """Calculate how many secrets are found by 1, 2, 3, or all tools"""
        # Collect all unique (file, line) pairs across all tools
        all_secrets = {}  # (file, line) -> list of tools that found it

        for result in results:
            for file_path, lines in result.findings_by_file.items():
                for line in lines:
                    key = (file_path, line)
                    if key not in all_secrets:
                        all_secrets[key] = []
                    all_secrets[key].append(result.tool_name)

        # Count by number of tools
        agreement_counts = {}
        for secret, tools in all_secrets.items():
            count = len(set(tools))  # Unique tools
            agreement_counts[count] = agreement_counts.get(count, 0) + 1

        return agreement_counts

    def generate_markdown_report(self, results: List[ToolResult]) -> str:
        """Generate markdown comparison report"""
        report = []
        report.append("# Secret Detection Tools Comparison\n")
        report.append(f"**Target**: {self.target_path.name}")
        report.append(f"**Tools**: {', '.join([r.tool_name for r in results])}\n")

        # Summary table with extended metrics
        report.append("\n## Summary\n")
        report.append("| Tool | Secrets | Files | Avg/File | Time (s) |")
        report.append("|------|---------|-------|----------|----------|")

        for result in results:
            report.append(
                f"| {result.tool_name} | "
                f"{result.findings_count} | "
                f"{result.unique_files} | "
                f"{result.secret_density:.1f} | "
                f"{result.execution_time:.2f} |"
            )

        # Agreement Analysis
        agreement_stats = self._get_agreement_stats(results)
        report.append("\n## Agreement Analysis\n")
        report.append("Secrets found by different numbers of tools:\n")
        for num_tools in sorted(agreement_stats.keys(), reverse=True):
            count = agreement_stats[num_tools]
            if num_tools == len(results):
                report.append(f"- **All {num_tools} tools agree**: {count} secrets")
            elif num_tools == 1:
                report.append(f"- **Only 1 tool found**: {count} secrets")
            else:
                report.append(f"- **{num_tools} tools agree**: {count} secrets")

        # Agreement Matrix
        agreement_matrix = self._calculate_agreement_matrix(results)
        report.append("\n## Tool Agreement Matrix\n")
        report.append("Number of common secrets found by tool pairs:\n")

        # Header row
        header = "| Tool |"
        separator = "|------|"
        for result in results:
            short_name = result.tool_name.replace("LLM (", "").replace(")", "")
            header += f" {short_name} |"
            separator += "------|"
        report.append(header)
        report.append(separator)

        # Data rows
        for result in results:
            short_name = result.tool_name.replace("LLM (", "").replace(")", "")
            row = f"| {short_name} |"
            for result2 in results:
                count = agreement_matrix[result.tool_name][result2.tool_name]
                row += f" {count} |"
            report.append(row)

        # Per-File Comparison
        per_file = self._get_per_file_comparison(results)
        report.append("\n## Per-File Detailed Comparison\n")
        report.append("Secrets found per file by each tool:\n")

        # Header
        header = "| File |"
        separator = "|------|"
        for result in results:
            short_name = result.tool_name.replace("LLM (", "").replace(")", "")
            header += f" {short_name} |"
            separator += "------|"
        header += " Total |"
        separator += "------|"
        report.append(header)
        report.append(separator)

        # Show top 15 files by total findings
        file_totals = [(f, sum(counts.values())) for f, counts in per_file.items()]
        file_totals.sort(key=lambda x: x[1], reverse=True)

        for file_path, total in file_totals[:15]:
            row = f"| `{file_path}` |"
            for result in results:
                count = per_file[file_path].get(result.tool_name, 0)
                row += f" {count} |"
            row += f" **{total}** |"
            report.append(row)

        if len(file_totals) > 15:
            report.append(f"| ... and {len(file_totals) - 15} more files | ... | ... | ... | ... | ... |")

        # File Type Breakdown
        report.append("\n## File Type Breakdown\n")
        all_extensions = set()
        for result in results:
            all_extensions.update(result.file_types.keys())

        if all_extensions:
            header = "| Type |"
            separator = "|------|"
            for result in results:
                short_name = result.tool_name.replace("LLM (", "").replace(")", "")
                header += f" {short_name} |"
                separator += "------|"
            report.append(header)
            report.append(separator)

            for ext in sorted(all_extensions):
                row = f"| `{ext}` |"
                for result in results:
                    count = result.file_types.get(ext, 0)
                    row += f" {count} files |"
                report.append(row)

        # File analysis
        report.append("\n## Files Analyzed\n")

        # Collect all unique files across all tools
        all_files = set()
        for result in results:
            all_files.update(result.findings_by_file.keys())

        report.append(f"**Total unique files with secrets**: {len(all_files)}\n")

        for result in results:
            report.append(f"\n### {result.tool_name}\n")
            report.append(f"Found secrets in **{result.unique_files} files**:\n")

            # Sort files by number of findings (descending)
            sorted_files = sorted(
                result.findings_by_file.items(),
                key=lambda x: len(x[1]),
                reverse=True
            )

            # Show top 10 files
            for file_path, lines in sorted_files[:10]:
                report.append(f"- `{file_path}`: {len(lines)} secrets (lines: {', '.join(map(str, lines[:5]))}{'...' if len(lines) > 5 else ''})")

            if len(sorted_files) > 10:
                report.append(f"- ... and {len(sorted_files) - 10} more files")

        # Overlap analysis
        if len(results) >= 2:
            report.append("\n## Overlap Analysis\n")

            # Find common files
            file_sets = [set(r.findings_by_file.keys()) for r in results]
            common_files = set.intersection(*file_sets) if file_sets else set()

            if common_files:
                report.append(f"\n**Files found by all tools** ({len(common_files)}):\n")
                for file_path in sorted(common_files)[:10]:
                    report.append(f"- `{file_path}`")
            else:
                report.append("\n**No files were found by all tools**\n")

            # Find tool-specific files
            for i, result in enumerate(results):
                unique_to_tool = set(result.findings_by_file.keys())
                for j, other_result in enumerate(results):
                    if i != j:
                        unique_to_tool -= set(other_result.findings_by_file.keys())

                if unique_to_tool:
                    report.append(f"\n**Unique to {result.tool_name}** ({len(unique_to_tool)} files):\n")
                    for file_path in sorted(unique_to_tool)[:5]:
                        report.append(f"- `{file_path}`")
                    if len(unique_to_tool) > 5:
                        report.append(f"- ... and {len(unique_to_tool) - 5} more")

        # Ground Truth Analysis (if available)
        ground_truth_path = Path(__file__).parent / "secret_detection_benchmark_GROUND_TRUTH.json"
        if ground_truth_path.exists():
            report.append("\n## Ground Truth Analysis\n")
            try:
                with open(ground_truth_path) as f:
                    gt_data = json.load(f)

                gt_total = gt_data.get("total_secrets", 30)
                report.append(f"**Expected secrets**: {gt_total} (documented in ground truth)\n")

                # Build ground truth set of (file, line) tuples
                gt_secrets = set()
                for secret in gt_data.get("secrets", []):
                    gt_secrets.add((secret["file"], secret["line"]))

                report.append("### Tool Performance vs Ground Truth\n")
                report.append("| Tool | Found | Expected | Recall | Extra Findings |")
                report.append("|------|-------|----------|--------|----------------|")

                for result in results:
                    # Build tool findings set
                    tool_secrets = set()
                    for file_path, lines in result.findings_by_file.items():
                        for line in lines:
                            tool_secrets.add((file_path, line))

                    # Calculate metrics
                    true_positives = len(gt_secrets & tool_secrets)
                    recall = (true_positives / gt_total * 100) if gt_total > 0 else 0
                    extra = len(tool_secrets - gt_secrets)

                    report.append(
                        f"| {result.tool_name} | "
                        f"{result.findings_count} | "
                        f"{gt_total} | "
                        f"{recall:.1f}% | "
                        f"{extra} |"
                    )

                # Analyze LLM extra findings
                llm_results = [r for r in results if "LLM" in r.tool_name]
                if llm_results:
                    report.append("\n### LLM Extra Findings Explanation\n")
                    report.append("LLMs may find more than 30 secrets because they detect:\n")
                    report.append("- **Split secret components**: Each part of `DB_PASS_PART1 + PART2 + PART3` counted separately")
                    report.append("- **Join operations**: Lines like `''.join(AWS_SECRET_CHARS)` flagged as additional exposure")
                    report.append("- **Decoding functions**: Code that reveals secrets (e.g., `base64.b64decode()`, `codecs.decode()`)")
                    report.append("- **Comment identifiers**: Lines marking secret locations without plaintext values")
                    report.append("\nThese are *technically correct* detections of secret exposure points, not false positives.")
                    report.append("The ground truth documents 30 'primary' secrets, but the codebase has additional derivative exposures.\n")

            except Exception as e:
                report.append(f"*Could not load ground truth: {e}*\n")

        # Performance summary
        if results:
            report.append("\n## Performance Summary\n")
            most_findings = max(results, key=lambda r: r.findings_count)
            most_files = max(results, key=lambda r: r.unique_files)
            fastest = min(results, key=lambda r: r.execution_time)

            report.append(f"- **Most secrets found**: {most_findings.tool_name} ({most_findings.findings_count} secrets)")
            report.append(f"- **Most files covered**: {most_files.tool_name} ({most_files.unique_files} files)")
            report.append(f"- **Fastest**: {fastest.tool_name} ({fastest.execution_time:.2f}s)")

        return "\n".join(report)

    def save_json_report(self, results: List[ToolResult], output_path: Path):
        """Save results as JSON"""
        data = {
            "target_path": str(self.target_path),
            "results": [asdict(r) for r in results]
        }

        with open(output_path, 'w') as f:
            json.dump(data, f, indent=2)

        print(f"\n✅ JSON report saved to: {output_path}")

    def cleanup(self):
        """Cleanup SDK client"""
        self.client.close()


async def main():
    """Run comparison and generate reports"""
    # Get target path (secret_detection_benchmark)
    target_path = Path(__file__).parent.parent.parent.parent.parent / "test_projects" / "secret_detection_benchmark"

    if not target_path.exists():
        print(f"❌ Target not found at: {target_path}")
        return 1

    print("=" * 80)
    print("Secret Detection Tools Comparison")
    print("=" * 80)
    print(f"Target: {target_path}")

    # LLM models to test
    llm_models = [
        "gpt-4o-mini",
        "gpt-5-mini"
    ]
    print(f"LLM models: {', '.join(llm_models)}\n")

    # Run comparison
    comparison = SecretDetectionComparison(target_path)

    try:
        results = await comparison.run_all_tools(llm_models=llm_models)

        if not results:
            print("❌ No tools ran successfully")
            return 1

        # Generate reports
        print("\n" + "=" * 80)
        markdown_report = comparison.generate_markdown_report(results)
        print(markdown_report)

        # Save reports
        output_dir = Path(__file__).parent / "results"
        output_dir.mkdir(exist_ok=True)

        markdown_path = output_dir / "comparison_report.md"
        with open(markdown_path, 'w') as f:
            f.write(markdown_report)
        print(f"\n✅ Markdown report saved to: {markdown_path}")

        json_path = output_dir / "comparison_results.json"
        comparison.save_json_report(results, json_path)

        print("\n" + "=" * 80)
        print("✅ Comparison complete!")
        print("=" * 80)

        return 0

    finally:
        comparison.cleanup()


if __name__ == "__main__":
    exit_code = asyncio.run(main())
    sys.exit(exit_code)
@@ -0,0 +1,169 @@
# Secret Detection Tools Comparison

**Target**: secret_detection_benchmark
**Tools**: Gitleaks, TruffleHog, LLM (gpt-4o-mini), LLM (gpt-5-mini)

## Summary

| Tool | Secrets | Files | Avg/File | Time (s) |
|------|---------|-------|----------|----------|
| Gitleaks | 12 | 10 | 1.2 | 5.18 |
| TruffleHog | 1 | 1 | 1.0 | 5.06 |
| LLM (gpt-4o-mini) | 30 | 15 | 2.0 | 296.85 |
| LLM (gpt-5-mini) | 41 | 16 | 2.6 | 618.55 |

## Agreement Analysis

Secrets found by different numbers of tools:

- **3 tools agree**: 6 secrets
- **2 tools agree**: 22 secrets
- **Only 1 tool found**: 22 secrets

## Tool Agreement Matrix

Number of common secrets found by tool pairs:

| Tool | Gitleaks | TruffleHog | gpt-4o-mini | gpt-5-mini |
|------|------|------|------|------|
| Gitleaks | 12 | 0 | 7 | 11 |
| TruffleHog | 0 | 1 | 0 | 0 |
| gpt-4o-mini | 7 | 0 | 30 | 22 |
| gpt-5-mini | 11 | 0 | 22 | 41 |

## Per-File Detailed Comparison

Secrets found per file by each tool:

| File | Gitleaks | TruffleHog | gpt-4o-mini | gpt-5-mini | Total |
|------|------|------|------|------|------|
| `src/obfuscated.py` | 2 | 0 | 6 | 7 | **15** |
| `src/advanced.js` | 0 | 0 | 5 | 7 | **12** |
| `src/config.py` | 1 | 0 | 0 | 6 | **7** |
| `.env` | 1 | 0 | 2 | 2 | **5** |
| `config/keys.yaml` | 1 | 0 | 2 | 2 | **5** |
| `config/oauth.json` | 1 | 0 | 2 | 2 | **5** |
| `config/settings.py` | 2 | 0 | 0 | 3 | **5** |
| `scripts/deploy.sh` | 1 | 0 | 2 | 2 | **5** |
| `config/legacy.ini` | 0 | 0 | 2 | 2 | **4** |
| `src/Crypto.go` | 0 | 0 | 2 | 2 | **4** |
| `config/app.properties` | 1 | 0 | 1 | 1 | **3** |
| `config/database.yaml` | 0 | 1 | 1 | 1 | **3** |
| `src/Main.java` | 1 | 0 | 1 | 1 | **3** |
| `id_rsa` | 1 | 0 | 1 | 0 | **2** |
| `scripts/webhook.js` | 0 | 0 | 1 | 1 | **2** |
| ... and 2 more files | ... | ... | ... | ... | ... |

## File Type Breakdown

| Type | Gitleaks | TruffleHog | gpt-4o-mini | gpt-5-mini |
|------|------|------|------|------|
| `.env` | 1 files | 0 files | 1 files | 1 files |
| `.go` | 0 files | 0 files | 1 files | 1 files |
| `.ini` | 0 files | 0 files | 1 files | 1 files |
| `.java` | 1 files | 0 files | 1 files | 1 files |
| `.js` | 0 files | 0 files | 2 files | 2 files |
| `.json` | 1 files | 0 files | 1 files | 1 files |
| `.properties` | 1 files | 0 files | 1 files | 1 files |
| `.py` | 3 files | 0 files | 2 files | 4 files |
| `.sh` | 1 files | 0 files | 1 files | 1 files |
| `.sql` | 0 files | 0 files | 1 files | 1 files |
| `.yaml` | 1 files | 1 files | 2 files | 2 files |
| `[no extension]` | 1 files | 0 files | 1 files | 0 files |

## Files Analyzed

**Total unique files with secrets**: 17

### Gitleaks

Found secrets in **10 files**:

- `config/settings.py`: 2 secrets (lines: 6, 9)
- `src/obfuscated.py`: 2 secrets (lines: 7, 17)
- `.env`: 1 secrets (lines: 3)
- `config/app.properties`: 1 secrets (lines: 6)
- `config/keys.yaml`: 1 secrets (lines: 6)
- `id_rsa`: 1 secrets (lines: 1)
- `config/oauth.json`: 1 secrets (lines: 4)
- `scripts/deploy.sh`: 1 secrets (lines: 5)
- `src/Main.java`: 1 secrets (lines: 5)
- `src/config.py`: 1 secrets (lines: 7)

### TruffleHog

Found secrets in **1 files**:

- `config/database.yaml`: 1 secrets (lines: 6)

### LLM (gpt-4o-mini)

Found secrets in **15 files**:

- `src/obfuscated.py`: 6 secrets (lines: 7, 10, 13, 18, 20...)
- `src/advanced.js`: 5 secrets (lines: 4, 7, 10, 12, 17)
- `src/Crypto.go`: 2 secrets (lines: 6, 10)
- `.env`: 2 secrets (lines: 3, 4)
- `config/keys.yaml`: 2 secrets (lines: 6, 12)
- `config/oauth.json`: 2 secrets (lines: 3, 4)
- `config/legacy.ini`: 2 secrets (lines: 4, 7)
- `scripts/deploy.sh`: 2 secrets (lines: 6, 9)
- `src/app.py`: 1 secrets (lines: 7)
- `scripts/webhook.js`: 1 secrets (lines: 4)
- ... and 5 more files

### LLM (gpt-5-mini)

Found secrets in **16 files**:

- `src/obfuscated.py`: 7 secrets (lines: 7, 10, 13, 14, 17...)
- `src/advanced.js`: 7 secrets (lines: 4, 7, 9, 10, 13...)
- `src/config.py`: 6 secrets (lines: 7, 10, 13, 14, 15...)
- `config/settings.py`: 3 secrets (lines: 6, 9, 20)
- `src/Crypto.go`: 2 secrets (lines: 10, 15)
- `.env`: 2 secrets (lines: 3, 4)
- `config/keys.yaml`: 2 secrets (lines: 6, 12)
- `config/oauth.json`: 2 secrets (lines: 3, 4)
- `config/legacy.ini`: 2 secrets (lines: 3, 7)
- `scripts/deploy.sh`: 2 secrets (lines: 5, 10)
- ... and 6 more files

## Overlap Analysis

**No files were found by all tools**

## Ground Truth Analysis

**Expected secrets**: 32 (documented in ground truth)

### Tool Performance vs Ground Truth

| Tool | Found | Expected | Recall | Extra Findings |
|------|-------|----------|--------|----------------|
| Gitleaks | 12 | 32 | 37.5% | 0 |
| TruffleHog | 1 | 32 | 0.0% | 1 |
| LLM (gpt-4o-mini) | 30 | 32 | 56.2% | 12 |
| LLM (gpt-5-mini) | 41 | 32 | 84.4% | 14 |

### LLM Extra Findings Explanation

LLMs may find more than 32 secrets because they detect:

- **Split secret components**: Each part of `DB_PASS_PART1 + PART2 + PART3` counted separately
- **Join operations**: Lines like `''.join(AWS_SECRET_CHARS)` flagged as additional exposure
- **Decoding functions**: Code that reveals secrets (e.g., `base64.b64decode()`, `codecs.decode()`)
- **Comment identifiers**: Lines marking secret locations without plaintext values

These are *technically correct* detections of secret exposure points, not false positives.
The ground truth documents 32 'primary' secrets, but the codebase has additional derivative exposures.

## Performance Summary

- **Most secrets found**: LLM (gpt-5-mini) (41 secrets)
- **Most files covered**: LLM (gpt-5-mini) (16 files)
- **Fastest**: TruffleHog (5.06s)
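The agreement matrix above is derived from exact (file, line) overlap between tools. A minimal sketch of that computation against the saved `comparison_results.json` (field names follow the results file in this commit; the loader itself is illustrative, not the benchmark's actual code):

```python
import json
from itertools import combinations


def load_locations(path):
    """Map each tool name to its set of (file, line) secret locations."""
    with open(path) as f:
        data = json.load(f)
    return {
        r["tool_name"]: {
            (file, line)
            for file, lines in r["findings_by_file"].items()
            for line in lines
        }
        for r in data["results"]
    }


def agreement_matrix(locations):
    """Pairwise counts of shared (file, line) locations between tools."""
    return {
        (a, b): len(locations[a] & locations[b])
        for a, b in combinations(locations, 2)
    }
```

Run against the results file below, this reproduces entries such as Gitleaks/gpt-5-mini sharing 11 locations.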
@@ -0,0 +1,253 @@
{
  "target_path": "/Users/tduhamel/Documents/FuzzingLabs/fuzzforge_ai/test_projects/secret_detection_benchmark",
  "results": [
    {
      "tool_name": "Gitleaks",
      "execution_time": 5.177123069763184,
      "findings_count": 12,
      "findings_by_file": {
        ".env": [3],
        "config/app.properties": [6],
        "config/keys.yaml": [6],
        "id_rsa": [1],
        "config/oauth.json": [4],
        "scripts/deploy.sh": [5],
        "config/settings.py": [6, 9],
        "src/Main.java": [5],
        "src/obfuscated.py": [7, 17],
        "src/config.py": [7]
      },
      "unique_files": 10,
      "unique_locations": 12,
      "secret_density": 1.2,
      "file_types": {
        ".env": 1,
        ".properties": 1,
        ".yaml": 1,
        "[no extension]": 1,
        ".json": 1,
        ".sh": 1,
        ".py": 3,
        ".java": 1
      }
    },
    {
      "tool_name": "TruffleHog",
      "execution_time": 5.061383008956909,
      "findings_count": 1,
      "findings_by_file": {
        "config/database.yaml": [6]
      },
      "unique_files": 1,
      "unique_locations": 1,
      "secret_density": 1.0,
      "file_types": {
        ".yaml": 1
      }
    },
    {
      "tool_name": "LLM (gpt-4o-mini)",
      "execution_time": 296.8492441177368,
      "findings_count": 30,
      "findings_by_file": {
        "src/obfuscated.py": [7, 10, 13, 18, 20, 23],
        "src/app.py": [7],
        "scripts/webhook.js": [4],
        "src/advanced.js": [4, 7, 10, 12, 17],
        "src/Main.java": [5],
        "src/Crypto.go": [6, 10],
        ".env": [3, 4],
        "config/keys.yaml": [6, 12],
        "config/database.yaml": [7],
        "config/oauth.json": [3, 4],
        "config/legacy.ini": [4, 7],
        "src/database.sql": [4],
        "config/app.properties": [6],
        "scripts/deploy.sh": [6, 9],
        "id_rsa": [1]
      },
      "unique_files": 15,
      "unique_locations": 30,
      "secret_density": 2.0,
      "file_types": {
        ".py": 2,
        ".js": 2,
        ".java": 1,
        ".go": 1,
        ".env": 1,
        ".yaml": 2,
        ".json": 1,
        ".ini": 1,
        ".sql": 1,
        ".properties": 1,
        ".sh": 1,
        "[no extension]": 1
      }
    },
    {
      "tool_name": "LLM (gpt-5-mini)",
      "execution_time": 618.5462851524353,
      "findings_count": 41,
      "findings_by_file": {
        "config/settings.py": [6, 9, 20],
        "src/obfuscated.py": [7, 10, 13, 14, 17, 20, 23],
        "src/app.py": [7],
        "src/config.py": [7, 10, 13, 14, 15, 16],
        "scripts/webhook.js": [4],
        "src/advanced.js": [4, 7, 9, 10, 13, 17, 19],
        "src/Main.java": [5],
        "src/Crypto.go": [10, 15],
        ".env": [3, 4],
        "config/keys.yaml": [6, 12],
        "config/database.yaml": [7],
        "config/oauth.json": [3, 4],
        "config/legacy.ini": [3, 7],
        "src/database.sql": [6],
        "config/app.properties": [6],
        "scripts/deploy.sh": [5, 10]
      },
      "unique_files": 16,
      "unique_locations": 41,
      "secret_density": 2.5625,
      "file_types": {
        ".py": 4,
        ".js": 2,
        ".java": 1,
        ".go": 1,
        ".env": 1,
        ".yaml": 2,
        ".json": 1,
        ".ini": 1,
        ".sql": 1,
        ".properties": 1,
        ".sh": 1
      }
    }
  ]
}
@@ -0,0 +1,344 @@
{
  "description": "Ground truth dataset for secret detection benchmarking - Exactly 32 secrets",
  "version": "1.1.0",
  "total_secrets": 32,
  "secrets_by_difficulty": {
    "easy": 12,
    "medium": 10,
    "hard": 10
  },
  "secrets": [
    {"id": 1, "file": ".env", "line": 3, "difficulty": "easy", "type": "aws_access_key", "value": "AKIAIOSFODNN7EXAMPLE", "severity": "critical"},
    {"id": 2, "file": ".env", "line": 4, "difficulty": "easy", "type": "aws_secret_access_key", "value": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY", "severity": "critical"},
    {"id": 3, "file": "config/settings.py", "line": 6, "difficulty": "easy", "type": "github_pat", "value": "ghp_vR8jK2mN4pQ6tX9bC3wY7zA1eF5hI8kL", "severity": "critical"},
    {"id": 4, "file": "config/settings.py", "line": 9, "difficulty": "easy", "type": "stripe_api_key", "value": "sk_live_51MabcdefghijklmnopqrstuvwxyzABCDEF123456789", "severity": "critical"},
    {"id": 5, "file": "config/settings.py", "line": 17, "difficulty": "easy", "type": "database_password", "value": "ProdDB_P@ssw0rd_2024_Secure!", "severity": "critical"},
    {"id": 6, "file": "src/app.py", "line": 6, "difficulty": "easy", "type": "jwt_secret", "value": "my-super-secret-jwt-key-do-not-share-2024", "severity": "critical"},
    {"id": 7, "file": "config/database.yaml", "line": 7, "difficulty": "easy", "type": "azure_storage_key", "value": "DefaultEndpointsProtocol=https;AccountName=prodstore;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;EndpointSuffix=core.windows.net", "severity": "critical"},
    {"id": 8, "file": "scripts/webhook.js", "line": 4, "difficulty": "easy", "type": "slack_webhook", "value": "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXX", "severity": "high"},
    {"id": 9, "file": "config/app.properties", "line": 6, "difficulty": "easy", "type": "api_key", "value": "sk_test_4eC39HqLyjWDarjtT1zdp7dc", "severity": "high"},
    {"id": 10, "file": "id_rsa", "line": 1, "difficulty": "easy", "type": "ssh_private_key", "value": "-----BEGIN OPENSSH PRIVATE KEY-----", "severity": "critical"},
    {"id": 11, "file": "config/oauth.json", "line": 4, "difficulty": "easy", "type": "oauth_client_secret", "value": "GOCSPX-Ab12Cd34Ef56Gh78Ij90Kl12", "severity": "critical"},
    {"id": 12, "file": "src/Main.java", "line": 5, "difficulty": "easy", "type": "google_oauth_secret", "value": "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", "severity": "critical"},
    {"id": 13, "file": "src/config.py", "line": 7, "difficulty": "medium", "type": "aws_access_key_base64", "value": "QUtJQUlPU0ZPRE5ON0VYQU1QTEU=", "decoded": "AKIAIOSFODNN7EXAMPLE", "severity": "critical"},
    {"id": 14, "file": "src/config.py", "line": 10, "difficulty": "medium", "type": "api_token_hex", "value": "6170695f746f6b656e5f616263313233787977373839", "decoded": "api_token_abc123xyz789", "severity": "high"},
    {"id": 15, "file": "src/config.py", "line": 16, "difficulty": "medium", "type": "database_password_concatenated", "value": "MySecurePassword2024!", "note": "Built from DB_PASS_PART1 + DB_PASS_PART2 + DB_PASS_PART3", "severity": "critical"},
    {"id": 16, "file": "scripts/deploy.sh", "line": 5, "difficulty": "medium", "type": "api_key_export", "value": "sk_prod_1234567890abcdefghijklmnopqrstuvwxyz", "severity": "critical"},
    {"id": 17, "file": "scripts/deploy.sh", "line": 11, "difficulty": "medium", "type": "database_password_url_encoded", "value": "mysql://admin:MyP%40ssw0rd%21@db.example.com:3306/prod", "decoded": "mysql://admin:MyP@ssw0rd!@db.example.com:3306/prod", "note": "In comment", "severity": "critical"},
    {"id": 18, "file": "config/keys.yaml", "line": 6, "difficulty": "medium", "type": "rsa_private_key_multiline", "value": "-----BEGIN RSA PRIVATE KEY-----", "note": "Multi-line YAML literal block", "severity": "critical"},
    {"id": 19, "file": "config/keys.yaml", "line": 11, "difficulty": "medium", "type": "api_token_unicode", "value": "tøkęn_śęçrėt_ẃïth_ŭñïçődė_123456", "severity": "high"},
    {"id": 20, "file": "src/database.sql", "line": 6, "difficulty": "medium", "type": "database_connection_string", "value": "postgresql://admin:Pr0dDB_S3cr3t_P@ss@db.prod.example.com:5432/prod_db", "note": "In SQL comment", "severity": "critical"},
    {"id": 21, "file": "config/legacy.ini", "line": 3, "difficulty": "medium", "type": "database_password", "value": "L3g@cy_DB_P@ssw0rd_2023", "severity": "critical"},
    {"id": 22, "file": "config/legacy.ini", "line": 7, "difficulty": "medium", "type": "api_key_commented", "value": "backup_key_xyz789abc123def456ghi", "note": "Commented backup key", "severity": "high"},
    {"id": 23, "file": "src/obfuscated.py", "line": 7, "difficulty": "hard", "type": "stripe_key_rot13", "value": "fx_yvir_frperg_xrl_12345", "decoded": "sk_live_secret_key_12345", "severity": "critical"},
    {"id": 24, "file": "src/obfuscated.py", "line": 10, "difficulty": "hard", "type": "github_token_binary", "value": "b'\\x67\\x68\\x70\\x5f\\x4d\\x79\\x47\\x69\\x74\\x48\\x75\\x62\\x54\\x6f\\x6b\\x65\\x6e\\x31\\x32\\x33\\x34\\x35\\x36'", "decoded": "ghp_MyGitHubToken123456", "severity": "critical"},
    {"id": 25, "file": "src/obfuscated.py", "line": 13, "difficulty": "hard", "type": "aws_secret_char_array", "value": "['A','W','S','_','S','E','C','R','E','T','_','K','E','Y','_','X','Y','Z','7','8','9']", "decoded": "AWS_SECRET_KEY_XYZ789", "severity": "critical"},
    {"id": 26, "file": "src/obfuscated.py", "line": 17, "difficulty": "hard", "type": "api_token_reversed", "value": "321cba_desrever_nekot_ipa", "decoded": "api_token_reversed_abc123", "severity": "high"},
    {"id": 27, "file": "src/advanced.js", "line": 4, "difficulty": "hard", "type": "secret_template_string", "value": "sk_prod_template_key_xyz", "note": "Built from template literals", "severity": "critical"},
    {"id": 28, "file": "src/advanced.js", "line": 7, "difficulty": "hard", "type": "password_in_regex", "value": "password_regex_secret_789", "note": "Inside regex pattern", "severity": "medium"},
    {"id": 29, "file": "src/advanced.js", "line": 10, "difficulty": "hard", "type": "api_key_xor", "value": "[65,82,90,75,94,91,92,75,93,67,65,90,67,92,75,91,67,95]", "decoded": "api_xor_secret_key", "note": "XOR encrypted with key 42", "severity": "critical"},
    {"id": 30, "file": "src/advanced.js", "line": 17, "difficulty": "hard", "type": "api_key_escaped_json", "value": "sk_escaped_json_key_456", "note": "Escaped JSON within string", "severity": "high"},
    {"id": 31, "file": "src/Crypto.go", "line": 10, "difficulty": "hard", "type": "secret_in_heredoc", "value": "golang_heredoc_secret_999", "note": "In heredoc/multi-line string", "severity": "high"},
    {"id": 32, "file": "src/Crypto.go", "line": 15, "difficulty": "hard", "type": "stripe_key_typo", "value": "strippe_sk_live_corrected_key", "decoded": "stripe_sk_live_corrected_key", "note": "Intentional typo corrected programmatically", "severity": "critical"}
  ],
  "file_summary": {
    ".env": 2,
    "config/settings.py": 3,
    "src/app.py": 1,
    "config/database.yaml": 1,
    "scripts/webhook.js": 1,
    "config/app.properties": 1,
    "id_rsa": 1,
    "config/oauth.json": 1,
    "src/Main.java": 1,
    "src/config.py": 3,
    "scripts/deploy.sh": 2,
    "config/keys.yaml": 2,
    "src/database.sql": 1,
    "config/legacy.ini": 2,
    "src/obfuscated.py": 4,
    "src/advanced.js": 4,
    "src/Crypto.go": 2
  },
  "notes": {
    "easy_secrets": "Standard patterns that any decent secret scanner should detect",
    "medium_secrets": "Slightly obfuscated - base64, hex, concatenated, or in comments",
    "hard_secrets": "Well hidden - ROT13, binary, XOR, reversed, split across constructs"
  }
}
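Recall in the comparison report is measured against this ground truth file. A minimal sketch of the matching logic, assuming a detection counts as a hit when it lands on the documented (file, line) pair (the sample data in the usage lines is made up, not the benchmark itself):

```python
def recall(ground_truth: list, findings_by_file: dict) -> float:
    """Fraction of documented secrets whose (file, line) a tool reported."""
    expected = {(s["file"], s["line"]) for s in ground_truth}
    found = {
        (file, line)
        for file, lines in findings_by_file.items()
        for line in lines
    }
    return len(expected & found) / len(expected)


# Tiny worked example with hypothetical data:
gt = [{"file": ".env", "line": 3}, {"file": ".env", "line": 4}]
print(recall(gt, {".env": [3]}))  # 0.5
```

Exact line matching is strict: an LLM that reports the decoding site rather than the documented line scores no credit, which is one source of the "extra findings" column.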
151 backend/benchmarks/category_configs.py Normal file
@@ -0,0 +1,151 @@
"""
Category-specific benchmark configurations

Defines expected metrics and performance thresholds for each module category.
"""

from dataclasses import dataclass
from typing import List, Dict
from enum import Enum


class ModuleCategory(str, Enum):
    """Module categories for benchmarking"""
    FUZZER = "fuzzer"
    SCANNER = "scanner"
    ANALYZER = "analyzer"
    SECRET_DETECTION = "secret_detection"
    REPORTER = "reporter"


@dataclass
class CategoryBenchmarkConfig:
    """Benchmark configuration for a module category"""
    category: ModuleCategory
    expected_metrics: List[str]
    performance_thresholds: Dict[str, float]
    description: str


# Fuzzer category configuration
FUZZER_CONFIG = CategoryBenchmarkConfig(
    category=ModuleCategory.FUZZER,
    expected_metrics=[
        "execs_per_sec",
        "coverage_rate",
        "time_to_first_crash",
        "corpus_efficiency",
        "execution_time",
        "peak_memory_mb"
    ],
    performance_thresholds={
        "min_execs_per_sec": 1000,          # Minimum executions per second
        "max_execution_time_small": 10.0,   # Max time for small project (seconds)
        "max_execution_time_medium": 60.0,  # Max time for medium project
        "max_memory_mb": 2048,              # Maximum memory usage
        "min_coverage_rate": 1.0,           # Minimum new coverage per second
    },
    description="Fuzzing modules: coverage-guided fuzz testing"
)

# Scanner category configuration
SCANNER_CONFIG = CategoryBenchmarkConfig(
    category=ModuleCategory.SCANNER,
    expected_metrics=[
        "files_per_sec",
        "loc_per_sec",
        "execution_time",
        "peak_memory_mb",
        "findings_count"
    ],
    performance_thresholds={
        "min_files_per_sec": 100,   # Minimum files scanned per second
        "min_loc_per_sec": 10000,   # Minimum lines of code per second
        "max_execution_time_small": 1.0,
        "max_execution_time_medium": 10.0,
        "max_memory_mb": 512,
    },
    description="File scanning modules: fast pattern-based scanning"
)

# Secret detection category configuration
SECRET_DETECTION_CONFIG = CategoryBenchmarkConfig(
    category=ModuleCategory.SECRET_DETECTION,
    expected_metrics=[
        "patterns_per_sec",
        "precision",
        "recall",
        "f1_score",
        "false_positive_rate",
        "execution_time",
        "peak_memory_mb"
    ],
    performance_thresholds={
        "min_patterns_per_sec": 1000,
        "min_precision": 0.90,       # 90% precision target
        "min_recall": 0.95,          # 95% recall target
        "max_false_positives": 5,    # Max false positives per 100 secrets
        "max_execution_time_small": 2.0,
        "max_execution_time_medium": 20.0,
        "max_memory_mb": 1024,
    },
    description="Secret detection modules: high precision pattern matching"
)

# Analyzer category configuration
ANALYZER_CONFIG = CategoryBenchmarkConfig(
    category=ModuleCategory.ANALYZER,
    expected_metrics=[
        "analysis_depth",
        "files_analyzed_per_sec",
        "execution_time",
        "peak_memory_mb",
        "findings_count",
        "accuracy"
    ],
    performance_thresholds={
        "min_files_per_sec": 10,   # Slower than scanners due to deep analysis
        "max_execution_time_small": 5.0,
        "max_execution_time_medium": 60.0,
        "max_memory_mb": 2048,
        "min_accuracy": 0.85,      # 85% accuracy target
    },
    description="Code analysis modules: deep semantic analysis"
)

# Reporter category configuration
REPORTER_CONFIG = CategoryBenchmarkConfig(
    category=ModuleCategory.REPORTER,
    expected_metrics=[
        "report_generation_time",
        "findings_per_sec",
        "peak_memory_mb"
    ],
    performance_thresholds={
        "max_report_time_100_findings": 1.0,    # Max 1 second for 100 findings
        "max_report_time_1000_findings": 10.0,  # Max 10 seconds for 1000 findings
        "max_memory_mb": 256,
    },
    description="Reporting modules: fast report generation"
)


# Category configurations map
CATEGORY_CONFIGS = {
    ModuleCategory.FUZZER: FUZZER_CONFIG,
    ModuleCategory.SCANNER: SCANNER_CONFIG,
    ModuleCategory.SECRET_DETECTION: SECRET_DETECTION_CONFIG,
    ModuleCategory.ANALYZER: ANALYZER_CONFIG,
    ModuleCategory.REPORTER: REPORTER_CONFIG,
}


def get_category_config(category: ModuleCategory) -> CategoryBenchmarkConfig:
    """Get benchmark configuration for a category"""
    return CATEGORY_CONFIGS[category]


def get_threshold(category: ModuleCategory, metric: str) -> float:
    """Get performance threshold for a specific metric"""
    config = get_category_config(category)
    return config.performance_thresholds.get(metric, 0.0)
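The `min_*`/`max_*` naming convention above lends itself to a generic gate. A hypothetical sketch of how a harness might flag violations against such a thresholds dict (the `check` helper and the sample thresholds are illustrative, not part of this module):

```python
# Hypothetical gate: compare measured metrics against a category's thresholds.
performance_thresholds = {
    "min_precision": 0.90,
    "min_recall": 0.95,
    "max_memory_mb": 1024,
}


def check(measured: dict) -> list:
    """Return threshold violations; 'min_*' are floors, 'max_*' are ceilings."""
    failures = []
    for name, limit in performance_thresholds.items():
        metric = name.split("_", 1)[1]  # strip the min_/max_ prefix
        if metric not in measured:
            continue  # metric not collected for this run
        if name.startswith("min_") and measured[metric] < limit:
            failures.append(name)
        elif name.startswith("max_") and measured[metric] > limit:
            failures.append(name)
    return failures
```

For example, `check({"precision": 0.92, "recall": 0.80, "memory_mb": 512})` would flag only `min_recall`.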
60 backend/benchmarks/conftest.py Normal file
@@ -0,0 +1,60 @@
"""
Benchmark fixtures and configuration
"""

import sys
from pathlib import Path
import pytest

# Add parent directories to path
BACKEND_ROOT = Path(__file__).resolve().parents[1]
TOOLBOX = BACKEND_ROOT / "toolbox"

if str(BACKEND_ROOT) not in sys.path:
    sys.path.insert(0, str(BACKEND_ROOT))
if str(TOOLBOX) not in sys.path:
    sys.path.insert(0, str(TOOLBOX))


# ============================================================================
# Benchmark Fixtures
# ============================================================================

@pytest.fixture(scope="session")
def benchmark_fixtures_dir():
    """Path to benchmark fixtures directory"""
    return Path(__file__).parent / "fixtures"


@pytest.fixture(scope="session")
def small_project_fixture(benchmark_fixtures_dir):
    """Small project fixture (~1K LOC)"""
    return benchmark_fixtures_dir / "small"


@pytest.fixture(scope="session")
def medium_project_fixture(benchmark_fixtures_dir):
    """Medium project fixture (~10K LOC)"""
    return benchmark_fixtures_dir / "medium"


@pytest.fixture(scope="session")
def large_project_fixture(benchmark_fixtures_dir):
    """Large project fixture (~100K LOC)"""
    return benchmark_fixtures_dir / "large"


# ============================================================================
# pytest-benchmark Configuration
# ============================================================================

def pytest_configure(config):
    """Configure pytest-benchmark"""
    config.addinivalue_line(
        "markers", "benchmark: mark test as a benchmark"
    )


def pytest_benchmark_group_stats(config, benchmarks, group_by):
    """Group benchmark results by category"""
    return group_by
122 backend/mcp-config.json Normal file
@@ -0,0 +1,122 @@
|
||||
{
|
||||
"name": "FuzzForge Security Testing Platform",
|
||||
"description": "MCP server for FuzzForge security testing workflows via Docker Compose",
|
||||
  "version": "0.6.0",
  "connection": {
    "type": "http",
    "host": "localhost",
    "port": 8010,
    "base_url": "http://localhost:8010",
    "mcp_endpoint": "/mcp"
  },
  "docker_compose": {
    "service": "fuzzforge-backend",
    "command": "docker compose up -d",
    "health_check": "http://localhost:8000/health"
  },
  "capabilities": {
    "tools": [
      {
        "name": "submit_security_scan_mcp",
        "description": "Submit a security scanning workflow for execution",
        "parameters": {
          "workflow_name": "string",
          "target_path": "string",
          "volume_mode": "string (ro|rw)",
          "parameters": "object"
        }
      },
      {
        "name": "get_comprehensive_scan_summary",
        "description": "Get a comprehensive summary of scan results with analysis",
        "parameters": {
          "run_id": "string"
        }
      }
    ],
    "fastapi_routes": [
      {
        "method": "GET",
        "path": "/",
        "description": "Get API status and loaded workflows count"
      },
      {
        "method": "GET",
        "path": "/workflows/",
        "description": "List all available security testing workflows"
      },
      {
        "method": "POST",
        "path": "/workflows/{workflow_name}/submit",
        "description": "Submit a security scanning workflow for execution"
      },
      {
        "method": "GET",
        "path": "/runs/{run_id}/status",
        "description": "Get the current status of a security scan run"
      },
      {
        "method": "GET",
        "path": "/runs/{run_id}/findings",
        "description": "Get security findings from a completed scan"
      },
      {
        "method": "GET",
        "path": "/fuzzing/{run_id}/stats",
        "description": "Get fuzzing statistics for a run"
      }
    ]
  },
  "examples": {
    "start_infrastructure_scan": {
      "description": "Run infrastructure security scan on a project",
      "steps": [
        "1. Start Docker Compose: docker compose up -d",
        "2. Submit scan via MCP tool: submit_security_scan_mcp",
        "3. Monitor status and get results"
      ],
      "workflow_name": "infrastructure_scan",
      "target_path": "/Users/tduhamel/Documents/FuzzingLabs/fuzzforge_alpha/test_projects/infrastructure_vulnerable",
      "parameters": {
        "checkov_config": {
          "severity": ["HIGH", "MEDIUM", "LOW"]
        },
        "hadolint_config": {
          "severity": ["error", "warning", "info", "style"]
        }
      }
    },
    "static_analysis_scan": {
      "description": "Run static analysis security scan",
      "workflow_name": "static_analysis_scan",
      "target_path": "/Users/tduhamel/Documents/FuzzingLabs/fuzzforge_alpha/test_projects/static_analysis_vulnerable",
      "parameters": {
        "bandit_config": {
          "severity": ["HIGH", "MEDIUM", "LOW"]
        },
        "opengrep_config": {
          "severity": ["HIGH", "MEDIUM", "LOW"]
        }
      }
    },
    "secret_detection_scan": {
      "description": "Run secret detection scan",
      "workflow_name": "secret_detection_scan",
      "target_path": "/Users/tduhamel/Documents/FuzzingLabs/fuzzforge_alpha/test_projects/secret_detection_vulnerable",
      "parameters": {
        "trufflehog_config": {
          "verified_only": false
        },
        "gitleaks_config": {
          "no_git": true
        }
      }
    }
  },
  "usage": {
    "via_mcp": "Connect MCP client to http://localhost:8010/mcp after starting Docker Compose",
    "via_api": "Use FastAPI endpoints directly at http://localhost:8000",
    "start_system": "docker compose up -d",
    "stop_system": "docker compose down"
  }
}
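The `connection` block above is everything an MCP client needs to reach the server. As a minimal sketch (the `mcp_url` helper and the inline config dict are illustrative, not part of the repository), the full MCP address can be derived from `base_url` and `mcp_endpoint` like this:

```python
# Hypothetical fragment mirroring the "connection" block from the config above.
config = {
    "connection": {
        "type": "http",
        "host": "localhost",
        "port": 8010,
        "base_url": "http://localhost:8010",
        "mcp_endpoint": "/mcp",
    }
}


def mcp_url(cfg: dict) -> str:
    """Join base_url and mcp_endpoint into the address an MCP client connects to."""
    conn = cfg["connection"]
    return conn["base_url"].rstrip("/") + conn["mcp_endpoint"]


print(mcp_url(config))  # → http://localhost:8010/mcp
```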
41
backend/pyproject.toml
Normal file
@@ -0,0 +1,41 @@
[project]
name = "backend"
version = "0.7.0"
description = "FuzzForge OSS backend"
authors = []
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "fastapi>=0.116.1",
    "temporalio>=1.6.0",
    "boto3>=1.34.0",
    "pydantic>=2.0.0",
    "pyyaml>=6.0",
    "docker>=7.0.0",
    "aiofiles>=23.0.0",
    "uvicorn>=0.30.0",
    "aiohttp>=3.12.15",
    "fastmcp",
]

[project.optional-dependencies]
dev = [
    "pytest>=8.0.0",
    "pytest-asyncio>=0.23.0",
    "pytest-benchmark>=4.0.0",
    "pytest-cov>=5.0.0",
    "pytest-xdist>=3.5.0",
    "pytest-mock>=3.12.0",
    "httpx>=0.27.0",
    "ruff>=0.1.0",
]

[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests", "benchmarks"]
python_files = ["test_*.py", "bench_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
markers = [
    "benchmark: mark test as a benchmark",
]
11
backend/src/__init__.py
Normal file
@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

11
backend/src/api/__init__.py
Normal file
@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

325
backend/src/api/fuzzing.py
Normal file
@@ -0,0 +1,325 @@
"""
API endpoints for fuzzing workflow management and real-time monitoring
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import logging
from typing import List, Dict
from fastapi import APIRouter, HTTPException, WebSocket, WebSocketDisconnect
from fastapi.responses import StreamingResponse
import asyncio
import json
from datetime import datetime

from src.models.findings import (
    FuzzingStats,
    CrashReport
)

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/fuzzing", tags=["fuzzing"])

# In-memory storage for real-time stats (in production, use Redis or similar)
fuzzing_stats: Dict[str, FuzzingStats] = {}
crash_reports: Dict[str, List[CrashReport]] = {}
active_connections: Dict[str, List[WebSocket]] = {}


def initialize_fuzzing_tracking(run_id: str, workflow_name: str):
    """
    Initialize fuzzing tracking for a new run.

    This function should be called when a workflow is submitted to enable
    real-time monitoring and stats collection.

    Args:
        run_id: The run identifier
        workflow_name: Name of the workflow
    """
    fuzzing_stats[run_id] = FuzzingStats(
        run_id=run_id,
        workflow=workflow_name
    )
    crash_reports[run_id] = []
    active_connections[run_id] = []


@router.get("/{run_id}/stats", response_model=FuzzingStats)
async def get_fuzzing_stats(run_id: str) -> FuzzingStats:
    """
    Get current fuzzing statistics for a run.

    Args:
        run_id: The fuzzing run ID

    Returns:
        Current fuzzing statistics

    Raises:
        HTTPException: 404 if run not found
    """
    if run_id not in fuzzing_stats:
        raise HTTPException(
            status_code=404,
            detail=f"Fuzzing run not found: {run_id}"
        )

    return fuzzing_stats[run_id]


@router.get("/{run_id}/crashes", response_model=List[CrashReport])
async def get_crash_reports(run_id: str) -> List[CrashReport]:
    """
    Get crash reports for a fuzzing run.

    Args:
        run_id: The fuzzing run ID

    Returns:
        List of crash reports

    Raises:
        HTTPException: 404 if run not found
    """
    if run_id not in crash_reports:
        raise HTTPException(
            status_code=404,
            detail=f"Fuzzing run not found: {run_id}"
        )

    return crash_reports[run_id]


@router.post("/{run_id}/stats")
async def update_fuzzing_stats(run_id: str, stats: FuzzingStats):
    """
    Update fuzzing statistics (called by fuzzing workflows).

    Args:
        run_id: The fuzzing run ID
        stats: Updated statistics

    Raises:
        HTTPException: 404 if run not found
    """
    if run_id not in fuzzing_stats:
        raise HTTPException(
            status_code=404,
            detail=f"Fuzzing run not found: {run_id}"
        )

    # Update stats
    fuzzing_stats[run_id] = stats

    # Debug: log reception for live instrumentation
    try:
        logger.info(
            "Received fuzzing stats update: run_id=%s exec=%s eps=%.2f crashes=%s corpus=%s coverage=%s elapsed=%ss",
            run_id,
            stats.executions,
            stats.executions_per_sec,
            stats.crashes,
            stats.corpus_size,
            stats.coverage,
            stats.elapsed_time,
        )
    except Exception:
        pass

    # Notify connected WebSocket clients
    if run_id in active_connections:
        message = {
            "type": "stats_update",
            "data": stats.model_dump()
        }
        for websocket in active_connections[run_id][:]:  # Copy to avoid modification during iteration
            try:
                await websocket.send_text(json.dumps(message))
            except Exception:
                # Remove disconnected clients
                active_connections[run_id].remove(websocket)


@router.post("/{run_id}/crash")
async def report_crash(run_id: str, crash: CrashReport):
    """
    Report a new crash (called by fuzzing workflows).

    Args:
        run_id: The fuzzing run ID
        crash: Crash report details
    """
    if run_id not in crash_reports:
        crash_reports[run_id] = []

    # Add crash report
    crash_reports[run_id].append(crash)

    # Update stats
    if run_id in fuzzing_stats:
        fuzzing_stats[run_id].crashes += 1
        fuzzing_stats[run_id].last_crash_time = crash.timestamp

    # Notify connected WebSocket clients
    if run_id in active_connections:
        message = {
            "type": "crash_report",
            "data": crash.model_dump()
        }
        for websocket in active_connections[run_id][:]:
            try:
                await websocket.send_text(json.dumps(message))
            except Exception:
                active_connections[run_id].remove(websocket)


@router.websocket("/{run_id}/live")
async def websocket_endpoint(websocket: WebSocket, run_id: str):
    """
    WebSocket endpoint for real-time fuzzing updates.

    Args:
        websocket: WebSocket connection
        run_id: The fuzzing run ID to monitor
    """
    await websocket.accept()

    # Initialize connection tracking
    if run_id not in active_connections:
        active_connections[run_id] = []
    active_connections[run_id].append(websocket)

    try:
        # Send current stats on connection
        if run_id in fuzzing_stats:
            current = fuzzing_stats[run_id]
            if isinstance(current, dict):
                payload = current
            elif hasattr(current, "model_dump"):
                payload = current.model_dump()
            elif hasattr(current, "dict"):
                payload = current.dict()
            else:
                payload = getattr(current, "__dict__", {"run_id": run_id})
            message = {"type": "stats_update", "data": payload}
            await websocket.send_text(json.dumps(message))

        # Keep connection alive
        while True:
            try:
                # Wait for ping or handle disconnect
                data = await asyncio.wait_for(websocket.receive_text(), timeout=30.0)
                # Echo back for ping-pong
                if data == "ping":
                    await websocket.send_text("pong")
            except asyncio.TimeoutError:
                # Send periodic heartbeat
                await websocket.send_text(json.dumps({"type": "heartbeat"}))

    except WebSocketDisconnect:
        # Clean up connection
        if run_id in active_connections and websocket in active_connections[run_id]:
            active_connections[run_id].remove(websocket)
    except Exception as e:
        logger.error(f"WebSocket error for run {run_id}: {e}")
        if run_id in active_connections and websocket in active_connections[run_id]:
            active_connections[run_id].remove(websocket)


@router.get("/{run_id}/stream")
async def stream_fuzzing_updates(run_id: str):
    """
    Server-Sent Events endpoint for real-time fuzzing updates.

    Args:
        run_id: The fuzzing run ID to monitor

    Returns:
        Streaming response with real-time updates
    """
    if run_id not in fuzzing_stats:
        raise HTTPException(
            status_code=404,
            detail=f"Fuzzing run not found: {run_id}"
        )

    async def event_stream():
        """Generate server-sent events for fuzzing updates"""
        last_stats_time = datetime.utcnow()

        while True:
            try:
                # Send current stats
                if run_id in fuzzing_stats:
                    current_stats = fuzzing_stats[run_id]
                    if isinstance(current_stats, dict):
                        stats_payload = current_stats
                    elif hasattr(current_stats, "model_dump"):
                        stats_payload = current_stats.model_dump()
                    elif hasattr(current_stats, "dict"):
                        stats_payload = current_stats.dict()
                    else:
                        stats_payload = getattr(current_stats, "__dict__", {"run_id": run_id})
                    event_data = f"data: {json.dumps({'type': 'stats', 'data': stats_payload})}\n\n"
                    yield event_data

                # Send recent crashes
                if run_id in crash_reports:
                    recent_crashes = [
                        crash for crash in crash_reports[run_id]
                        if crash.timestamp > last_stats_time
                    ]
                    for crash in recent_crashes:
                        event_data = f"data: {json.dumps({'type': 'crash', 'data': crash.model_dump()})}\n\n"
                        yield event_data

                last_stats_time = datetime.utcnow()
                await asyncio.sleep(5)  # Update every 5 seconds

            except Exception as e:
                logger.error(f"Error in event stream for run {run_id}: {e}")
                break

    return StreamingResponse(
        event_stream(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
        }
    )


@router.delete("/{run_id}")
async def cleanup_fuzzing_run(run_id: str):
    """
    Clean up fuzzing run data.

    Args:
        run_id: The fuzzing run ID to clean up
    """
    # Clean up tracking data
    fuzzing_stats.pop(run_id, None)
    crash_reports.pop(run_id, None)

    # Close any active WebSocket connections
    if run_id in active_connections:
        for websocket in active_connections[run_id]:
            try:
                await websocket.close()
            except Exception:
                pass
        del active_connections[run_id]

    return {"message": f"Cleaned up fuzzing run {run_id}"}
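The `/stream` endpoint in fuzzing.py frames each message as a `data: <json>` line followed by a blank line (standard `text/event-stream` framing). A consumer can recover the JSON payloads by splitting on blank lines; this parser is an illustrative sketch, not part of the repository:

```python
import json


def parse_sse_events(chunk: str) -> list:
    """Split a text/event-stream chunk into decoded JSON payloads.

    The stream endpoint emits 'data: <json>\\n\\n' frames, so splitting on
    blank lines and stripping the 'data: ' prefix recovers each event.
    """
    events = []
    for frame in chunk.split("\n\n"):
        frame = frame.strip()
        if frame.startswith("data: "):
            events.append(json.loads(frame[len("data: "):]))
    return events


# Two frames shaped like the endpoint's 'stats' and 'crash' messages.
stream = (
    'data: {"type": "stats", "data": {"executions": 1200}}\n\n'
    'data: {"type": "crash", "data": {"signal": "SIGSEGV"}}\n\n'
)
for event in parse_sse_events(stream):
    print(event["type"])  # → stats, then crash
```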
177
backend/src/api/runs.py
Normal file
@@ -0,0 +1,177 @@
"""
API endpoints for workflow run management and findings retrieval
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import logging
from fastapi import APIRouter, HTTPException, Depends

from src.models.findings import WorkflowFindings, WorkflowStatus

logger = logging.getLogger(__name__)

router = APIRouter(prefix="/runs", tags=["runs"])


def get_temporal_manager():
    """Dependency to get the Temporal manager instance"""
    from src.main import temporal_mgr
    return temporal_mgr


@router.get("/{run_id}/status", response_model=WorkflowStatus)
async def get_run_status(
    run_id: str,
    temporal_mgr=Depends(get_temporal_manager)
) -> WorkflowStatus:
    """
    Get the current status of a workflow run.

    Args:
        run_id: The workflow run ID

    Returns:
        Status information including state, timestamps, and completion flags

    Raises:
        HTTPException: 404 if run not found
    """
    try:
        status = await temporal_mgr.get_workflow_status(run_id)

        # Map Temporal status to response format
        workflow_status = status.get("status", "UNKNOWN")
        is_completed = workflow_status in ["COMPLETED", "FAILED", "CANCELLED"]
        is_failed = workflow_status == "FAILED"
        is_running = workflow_status == "RUNNING"

        return WorkflowStatus(
            run_id=run_id,
            workflow="unknown",  # Temporal doesn't track workflow name in status
            status=workflow_status,
            is_completed=is_completed,
            is_failed=is_failed,
            is_running=is_running,
            created_at=status.get("start_time"),
            updated_at=status.get("close_time") or status.get("execution_time")
        )

    except Exception as e:
        logger.error(f"Failed to get status for run {run_id}: {e}")
        raise HTTPException(
            status_code=404,
            detail=f"Run not found: {run_id}"
        )


@router.get("/{run_id}/findings", response_model=WorkflowFindings)
async def get_run_findings(
    run_id: str,
    temporal_mgr=Depends(get_temporal_manager)
) -> WorkflowFindings:
    """
    Get the findings from a completed workflow run.

    Args:
        run_id: The workflow run ID

    Returns:
        SARIF-formatted findings from the workflow execution

    Raises:
        HTTPException: 404 if run not found, 400 if run not completed
    """
    try:
        # Get run status first
        status = await temporal_mgr.get_workflow_status(run_id)
        workflow_status = status.get("status", "UNKNOWN")

        if workflow_status not in ["COMPLETED", "FAILED", "CANCELLED"]:
            if workflow_status == "RUNNING":
                raise HTTPException(
                    status_code=400,
                    detail=f"Run {run_id} is still running. Current status: {workflow_status}"
                )
            else:
                raise HTTPException(
                    status_code=400,
                    detail=f"Run {run_id} not completed. Status: {workflow_status}"
                )

        if workflow_status == "FAILED":
            raise HTTPException(
                status_code=400,
                detail=f"Run {run_id} failed. Status: {workflow_status}"
            )

        # Get the workflow result
        result = await temporal_mgr.get_workflow_result(run_id)

        # Extract SARIF from result (handle None for backwards compatibility)
        if isinstance(result, dict):
            sarif = result.get("sarif") or {}
        else:
            sarif = {}

        # Metadata
        metadata = {
            "completion_time": status.get("close_time"),
            "workflow_version": "unknown"
        }

        return WorkflowFindings(
            workflow="unknown",
            run_id=run_id,
            sarif=sarif,
            metadata=metadata
        )

    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Failed to get findings for run {run_id}: {e}")
        raise HTTPException(
            status_code=500,
            detail=f"Failed to retrieve findings: {str(e)}"
        )


@router.get("/{workflow_name}/findings/{run_id}", response_model=WorkflowFindings)
async def get_workflow_findings(
    workflow_name: str,
    run_id: str,
    temporal_mgr=Depends(get_temporal_manager)
) -> WorkflowFindings:
    """
    Get findings for a specific workflow run.

    Alternative endpoint that includes workflow name in the path for clarity.

    Args:
        workflow_name: Name of the workflow
        run_id: The workflow run ID

    Returns:
        SARIF-formatted findings from the workflow execution

    Raises:
        HTTPException: 404 if workflow or run not found, 400 if run not completed
    """
    if workflow_name not in temporal_mgr.workflows:
        raise HTTPException(
            status_code=404,
            detail=f"Workflow not found: {workflow_name}"
        )

    # Delegate to the main findings endpoint
    return await get_run_findings(run_id, temporal_mgr)
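The flag derivation in `get_run_status` treats any terminal Temporal state (COMPLETED, FAILED, CANCELLED) as completed, with FAILED additionally setting the failure flag. Pulled out as a standalone sketch (the `map_temporal_status` function name is ours, not the repository's):

```python
def map_temporal_status(workflow_status: str) -> dict:
    """Derive the completion flags the same way get_run_status does."""
    return {
        "is_completed": workflow_status in ["COMPLETED", "FAILED", "CANCELLED"],
        "is_failed": workflow_status == "FAILED",
        "is_running": workflow_status == "RUNNING",
    }


# A FAILED run is both terminal and failed; a RUNNING run is neither.
print(map_temporal_status("FAILED"))
print(map_temporal_status("RUNNING"))
```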
635
backend/src/api/workflows.py
Normal file
@@ -0,0 +1,635 @@
|
||||
"""
|
||||
API endpoints for workflow management with enhanced error handling
|
||||
"""
|
||||
|
||||
# Copyright (c) 2025 FuzzingLabs
|
||||
#
|
||||
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
|
||||
# at the root of this repository for details.
|
||||
#
|
||||
# After the Change Date (four years from publication), this version of the
|
||||
# Licensed Work will be made available under the Apache License, Version 2.0.
|
||||
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Additional attribution and requirements are provided in the NOTICE file.
|
||||
|
||||
import logging
|
||||
import traceback
|
||||
import tempfile
|
||||
from typing import List, Dict, Any, Optional
|
||||
from fastapi import APIRouter, HTTPException, Depends, UploadFile, File, Form
|
||||
from pathlib import Path
|
||||
|
||||
from src.models.findings import (
|
||||
WorkflowSubmission,
|
||||
WorkflowMetadata,
|
||||
WorkflowListItem,
|
||||
RunSubmissionResponse
|
||||
)
|
||||
from src.temporal.discovery import WorkflowDiscovery
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
# Configuration for file uploads
|
||||
MAX_UPLOAD_SIZE = 10 * 1024 * 1024 * 1024 # 10 GB
|
||||
ALLOWED_CONTENT_TYPES = [
|
||||
"application/gzip",
|
||||
"application/x-gzip",
|
||||
"application/x-tar",
|
||||
"application/x-compressed-tar",
|
||||
"application/octet-stream", # Generic binary
|
||||
]
|
||||
|
||||
router = APIRouter(prefix="/workflows", tags=["workflows"])
|
||||
|
||||
|
||||
def create_structured_error_response(
|
||||
error_type: str,
|
||||
message: str,
|
||||
workflow_name: Optional[str] = None,
|
||||
run_id: Optional[str] = None,
|
||||
container_info: Optional[Dict[str, Any]] = None,
|
||||
deployment_info: Optional[Dict[str, Any]] = None,
|
||||
suggestions: Optional[List[str]] = None
|
||||
) -> Dict[str, Any]:
|
||||
"""Create a structured error response with rich context."""
|
||||
error_response = {
|
||||
"error": {
|
||||
"type": error_type,
|
||||
"message": message,
|
||||
"timestamp": __import__("datetime").datetime.utcnow().isoformat() + "Z"
|
||||
}
|
||||
}
|
||||
|
||||
if workflow_name:
|
||||
error_response["error"]["workflow_name"] = workflow_name
|
||||
|
||||
if run_id:
|
||||
error_response["error"]["run_id"] = run_id
|
||||
|
||||
if container_info:
|
||||
error_response["error"]["container"] = container_info
|
||||
|
||||
if deployment_info:
|
||||
error_response["error"]["deployment"] = deployment_info
|
||||
|
||||
if suggestions:
|
||||
error_response["error"]["suggestions"] = suggestions
|
||||
|
||||
return error_response
|
||||
|
||||
|
||||
def get_temporal_manager():
|
||||
"""Dependency to get the Temporal manager instance"""
|
||||
from src.main import temporal_mgr
|
||||
return temporal_mgr
|
||||
|
||||
|
||||
@router.get("/", response_model=List[WorkflowListItem])
|
||||
async def list_workflows(
|
||||
temporal_mgr=Depends(get_temporal_manager)
|
||||
) -> List[WorkflowListItem]:
|
||||
"""
|
||||
List all discovered workflows with their metadata.
|
||||
|
||||
Returns a summary of each workflow including name, version, description,
|
||||
author, and tags.
|
||||
"""
|
||||
workflows = []
|
||||
for name, info in temporal_mgr.workflows.items():
|
||||
workflows.append(WorkflowListItem(
|
||||
name=name,
|
||||
version=info.metadata.get("version", "0.6.0"),
|
||||
description=info.metadata.get("description", ""),
|
||||
author=info.metadata.get("author"),
|
||||
tags=info.metadata.get("tags", [])
|
||||
))
|
||||
|
||||
return workflows
|
||||
|
||||
|
||||
@router.get("/metadata/schema")
|
||||
async def get_metadata_schema() -> Dict[str, Any]:
|
||||
"""
|
||||
Get the JSON schema for workflow metadata files.
|
||||
|
||||
This schema defines the structure and requirements for metadata.yaml files
|
||||
that must accompany each workflow.
|
||||
"""
|
||||
return WorkflowDiscovery.get_metadata_schema()
|
||||
|
||||
|
||||
@router.get("/{workflow_name}/metadata", response_model=WorkflowMetadata)
|
||||
async def get_workflow_metadata(
|
||||
workflow_name: str,
|
||||
temporal_mgr=Depends(get_temporal_manager)
|
||||
) -> WorkflowMetadata:
|
||||
"""
|
||||
Get complete metadata for a specific workflow.
|
||||
|
||||
Args:
|
||||
workflow_name: Name of the workflow
|
||||
|
||||
Returns:
|
||||
Complete metadata including parameters schema, supported volume modes,
|
||||
required modules, and more.
|
||||
|
||||
Raises:
|
||||
HTTPException: 404 if workflow not found
|
||||
"""
|
||||
if workflow_name not in temporal_mgr.workflows:
|
||||
available_workflows = list(temporal_mgr.workflows.keys())
|
||||
error_response = create_structured_error_response(
|
||||
error_type="WorkflowNotFound",
|
||||
message=f"Workflow '{workflow_name}' not found",
|
||||
workflow_name=workflow_name,
|
||||
suggestions=[
|
||||
f"Available workflows: {', '.join(available_workflows)}",
|
||||
"Use GET /workflows/ to see all available workflows",
|
||||
"Check workflow name spelling and case sensitivity"
|
||||
]
|
||||
)
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=error_response
|
||||
)
|
||||
|
||||
info = temporal_mgr.workflows[workflow_name]
|
||||
metadata = info.metadata
|
||||
|
||||
return WorkflowMetadata(
|
||||
name=workflow_name,
|
||||
version=metadata.get("version", "0.6.0"),
|
||||
description=metadata.get("description", ""),
|
||||
author=metadata.get("author"),
|
||||
tags=metadata.get("tags", []),
|
||||
parameters=metadata.get("parameters", {}),
|
||||
default_parameters=metadata.get("default_parameters", {}),
|
||||
required_modules=metadata.get("required_modules", [])
|
||||
)
|
||||
|
||||
|
||||
@router.post("/{workflow_name}/submit", response_model=RunSubmissionResponse)
|
||||
async def submit_workflow(
|
||||
workflow_name: str,
|
||||
submission: WorkflowSubmission,
|
||||
temporal_mgr=Depends(get_temporal_manager)
|
||||
) -> RunSubmissionResponse:
|
||||
"""
|
||||
Submit a workflow for execution.
|
||||
|
||||
Args:
|
||||
workflow_name: Name of the workflow to execute
|
||||
submission: Submission parameters including target path and parameters
|
||||
|
||||
Returns:
|
||||
Run submission response with run_id and initial status
|
||||
|
||||
Raises:
|
||||
HTTPException: 404 if workflow not found, 400 for invalid parameters
|
||||
"""
|
||||
if workflow_name not in temporal_mgr.workflows:
|
||||
available_workflows = list(temporal_mgr.workflows.keys())
|
||||
error_response = create_structured_error_response(
|
||||
error_type="WorkflowNotFound",
|
||||
message=f"Workflow '{workflow_name}' not found",
|
||||
workflow_name=workflow_name,
|
||||
suggestions=[
|
||||
f"Available workflows: {', '.join(available_workflows)}",
|
||||
"Use GET /workflows/ to see all available workflows",
|
||||
"Check workflow name spelling and case sensitivity"
|
||||
]
|
||||
)
|
||||
raise HTTPException(
|
||||
status_code=404,
|
||||
detail=error_response
|
||||
)
|
||||
|
||||
try:
|
||||
# Upload target file to MinIO and get target_id
|
||||
target_path = Path(submission.target_path)
|
||||
if not target_path.exists():
|
||||
raise ValueError(f"Target path does not exist: {submission.target_path}")
|
||||
|
||||
# Upload target (using anonymous user for now)
|
||||
target_id = await temporal_mgr.upload_target(
|
||||
file_path=target_path,
|
||||
user_id="api-user",
|
||||
metadata={"workflow": workflow_name}
|
||||
)
|
||||
|
||||
# Merge default parameters with user parameters
|
||||
workflow_info = temporal_mgr.workflows[workflow_name]
|
||||
metadata = workflow_info.metadata or {}
|
||||
defaults = metadata.get("default_parameters", {})
|
||||
user_params = submission.parameters or {}
|
||||
workflow_params = {**defaults, **user_params}
|
||||
|
||||
# Start workflow execution
|
||||
handle = await temporal_mgr.run_workflow(
|
||||
workflow_name=workflow_name,
|
||||
target_id=target_id,
|
||||
workflow_params=workflow_params
|
||||
)
|
||||
|
||||
run_id = handle.id
|
||||
|
||||
# Initialize fuzzing tracking if this looks like a fuzzing workflow
|
||||
workflow_info = temporal_mgr.workflows.get(workflow_name, {})
|
||||
workflow_tags = workflow_info.metadata.get("tags", []) if hasattr(workflow_info, 'metadata') else []
|
||||
if "fuzzing" in workflow_tags or "fuzz" in workflow_name.lower():
|
||||
from src.api.fuzzing import initialize_fuzzing_tracking
|
||||
initialize_fuzzing_tracking(run_id, workflow_name)
|
||||
|
||||
return RunSubmissionResponse(
|
||||
run_id=run_id,
|
||||
status="RUNNING",
|
||||
workflow=workflow_name,
|
||||
message=f"Workflow '{workflow_name}' submitted successfully"
|
||||
)
|
||||
|
||||
except ValueError as e:
|
||||
# Parameter validation errors
|
||||
error_response = create_structured_error_response(
|
||||
error_type="ValidationError",
|
||||
message=str(e),
|
||||
workflow_name=workflow_name,
|
||||
suggestions=[
|
||||
"Check parameter types and values",
|
||||
"Use GET /workflows/{workflow_name}/parameters for schema",
|
||||
"Ensure all required parameters are provided"
|
||||
]
|
||||
)
|
||||
raise HTTPException(status_code=400, detail=error_response)
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to submit workflow '{workflow_name}': {e}")
|
||||
logger.error(f"Traceback: {traceback.format_exc()}")
|
||||
|
||||
# Try to get more context about the error
|
||||
container_info = None
|
||||
deployment_info = None
|
||||
suggestions = []
|
||||
|
||||
        error_message = str(e)
        error_type = "WorkflowSubmissionError"

        # Detect specific error patterns
        if "workflow" in error_message.lower() and "not found" in error_message.lower():
            error_type = "WorkflowError"
            suggestions.extend([
                "Check if Temporal server is running and accessible",
                "Verify workflow workers are running",
                "Check if workflow is registered with correct vertical",
                "Ensure Docker is running and has sufficient resources"
            ])

        elif "volume" in error_message.lower() or "mount" in error_message.lower():
            error_type = "VolumeError"
            suggestions.extend([
                "Check if the target path exists and is accessible",
                "Verify file permissions (Docker needs read access)",
                "Ensure the path is not in use by another process",
                "Try using an absolute path instead of relative path"
            ])

        elif "memory" in error_message.lower() or "resource" in error_message.lower():
            error_type = "ResourceError"
            suggestions.extend([
                "Check system memory and CPU availability",
                "Consider reducing resource limits or dataset size",
                "Monitor Docker resource usage",
                "Increase Docker memory limits if needed"
            ])

        elif "image" in error_message.lower():
            error_type = "ImageError"
            suggestions.extend([
                "Check if the workflow image exists",
                "Verify Docker registry access",
                "Try rebuilding the workflow image",
                "Check network connectivity to registries"
            ])

        else:
            suggestions.extend([
                "Check FuzzForge backend logs for details",
                "Verify all services are running (docker-compose up -d)",
                "Try restarting the workflow deployment",
                "Contact support if the issue persists"
            ])

        error_response = create_structured_error_response(
            error_type=error_type,
            message=f"Failed to submit workflow: {error_message}",
            workflow_name=workflow_name,
            container_info=container_info,
            deployment_info=deployment_info,
            suggestions=suggestions
        )

        raise HTTPException(
            status_code=500,
            detail=error_response
        )


@router.post("/{workflow_name}/upload-and-submit", response_model=RunSubmissionResponse)
async def upload_and_submit_workflow(
    workflow_name: str,
    file: UploadFile = File(..., description="Target file or tarball to analyze"),
    parameters: Optional[str] = Form(None, description="JSON-encoded workflow parameters"),
    timeout: Optional[int] = Form(None, description="Timeout in seconds"),
    temporal_mgr=Depends(get_temporal_manager)
) -> RunSubmissionResponse:
    """
    Upload a target file/tarball and submit workflow for execution.

    This endpoint accepts multipart/form-data uploads and is the recommended
    way to submit workflows from remote CLI clients.

    Args:
        workflow_name: Name of the workflow to execute
        file: Target file or tarball (compressed directory)
        parameters: JSON string of workflow parameters (optional)
        timeout: Execution timeout in seconds (optional)

    Returns:
        Run submission response with run_id and initial status

    Raises:
        HTTPException: 404 if workflow not found, 400 for invalid parameters,
                       413 if file too large
    """
    if workflow_name not in temporal_mgr.workflows:
        available_workflows = list(temporal_mgr.workflows.keys())
        error_response = create_structured_error_response(
            error_type="WorkflowNotFound",
            message=f"Workflow '{workflow_name}' not found",
            workflow_name=workflow_name,
            suggestions=[
                f"Available workflows: {', '.join(available_workflows)}",
                "Use GET /workflows/ to see all available workflows"
            ]
        )
        raise HTTPException(status_code=404, detail=error_response)

    temp_file_path = None

    try:
        # Validate file size
        file_size = 0
        chunk_size = 1024 * 1024  # 1MB chunks

        # Create temporary file
        temp_fd, temp_file_path = tempfile.mkstemp(suffix=".tar.gz")

        logger.info(f"Receiving file upload for workflow '{workflow_name}': {file.filename}")

        # Stream file to disk
        with open(temp_fd, 'wb') as temp_file:
            while True:
                chunk = await file.read(chunk_size)
                if not chunk:
                    break

                file_size += len(chunk)

                # Check size limit
                if file_size > MAX_UPLOAD_SIZE:
                    raise HTTPException(
                        status_code=413,
                        detail=create_structured_error_response(
                            error_type="FileTooLarge",
                            message=f"File size exceeds maximum allowed size of {MAX_UPLOAD_SIZE / (1024**3):.1f} GB",
                            workflow_name=workflow_name,
                            suggestions=[
                                "Reduce the size of your target directory",
                                "Exclude unnecessary files (build artifacts, dependencies, etc.)",
                                "Consider splitting into smaller analysis targets"
                            ]
                        )
                    )

                temp_file.write(chunk)

        logger.info(f"Received file: {file_size / (1024**2):.2f} MB")

        # Parse parameters
        workflow_params = {}
        if parameters:
            try:
                import json
                workflow_params = json.loads(parameters)
                if not isinstance(workflow_params, dict):
                    raise ValueError("Parameters must be a JSON object")
            except (json.JSONDecodeError, ValueError) as e:
                raise HTTPException(
                    status_code=400,
                    detail=create_structured_error_response(
                        error_type="InvalidParameters",
                        message=f"Invalid parameters JSON: {e}",
                        workflow_name=workflow_name,
                        suggestions=["Ensure parameters is valid JSON object"]
                    )
                )

        # Upload to MinIO
        target_id = await temporal_mgr.upload_target(
            file_path=Path(temp_file_path),
            user_id="api-user",
            metadata={
                "workflow": workflow_name,
                "original_filename": file.filename,
                "upload_method": "multipart"
            }
        )

        logger.info(f"Uploaded to MinIO with target_id: {target_id}")

        # Merge default parameters with user parameters
        workflow_info = temporal_mgr.workflows.get(workflow_name)
        metadata = workflow_info.metadata or {}
        defaults = metadata.get("default_parameters", {})
        workflow_params = {**defaults, **workflow_params}

        # Start workflow execution
        handle = await temporal_mgr.run_workflow(
            workflow_name=workflow_name,
            target_id=target_id,
            workflow_params=workflow_params
        )

        run_id = handle.id

        # Initialize fuzzing tracking if needed
        workflow_info = temporal_mgr.workflows.get(workflow_name, {})
        workflow_tags = workflow_info.metadata.get("tags", []) if hasattr(workflow_info, 'metadata') else []
        if "fuzzing" in workflow_tags or "fuzz" in workflow_name.lower():
            from src.api.fuzzing import initialize_fuzzing_tracking
            initialize_fuzzing_tracking(run_id, workflow_name)

        return RunSubmissionResponse(
            run_id=run_id,
            status="RUNNING",
            workflow=workflow_name,
            message=f"Workflow '{workflow_name}' submitted successfully with uploaded target"
        )

    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Failed to upload and submit workflow '{workflow_name}': {e}")
        logger.error(f"Traceback: {traceback.format_exc()}")

        error_response = create_structured_error_response(
            error_type="WorkflowSubmissionError",
            message=f"Failed to process upload and submit workflow: {str(e)}",
            workflow_name=workflow_name,
            suggestions=[
                "Check if the uploaded file is a valid tarball",
                "Verify MinIO storage is accessible",
                "Check backend logs for detailed error information",
                "Ensure Temporal workers are running"
            ]
        )

        raise HTTPException(status_code=500, detail=error_response)

    finally:
        # Cleanup temporary file
        if temp_file_path and Path(temp_file_path).exists():
            try:
                Path(temp_file_path).unlink()
                logger.debug(f"Cleaned up temp file: {temp_file_path}")
            except Exception as e:
                logger.warning(f"Failed to cleanup temp file {temp_file_path}: {e}")


@router.get("/{workflow_name}/worker-info")
async def get_workflow_worker_info(
    workflow_name: str,
    temporal_mgr=Depends(get_temporal_manager)
) -> Dict[str, Any]:
    """
    Get worker information for a workflow.

    Returns details about which worker is required to execute this workflow,
    including container name, task queue, and vertical.

    Args:
        workflow_name: Name of the workflow

    Returns:
        Worker information including vertical, container name, and task queue

    Raises:
        HTTPException: 404 if workflow not found
    """
    if workflow_name not in temporal_mgr.workflows:
        available_workflows = list(temporal_mgr.workflows.keys())
        error_response = create_structured_error_response(
            error_type="WorkflowNotFound",
            message=f"Workflow '{workflow_name}' not found",
            workflow_name=workflow_name,
            suggestions=[
                f"Available workflows: {', '.join(available_workflows)}",
                "Use GET /workflows/ to see all available workflows"
            ]
        )
        raise HTTPException(
            status_code=404,
            detail=error_response
        )

    info = temporal_mgr.workflows[workflow_name]
    metadata = info.metadata

    # Extract vertical from metadata
    vertical = metadata.get("vertical")

    if not vertical:
        error_response = create_structured_error_response(
            error_type="MissingVertical",
            message=f"Workflow '{workflow_name}' does not specify a vertical in metadata",
            workflow_name=workflow_name,
            suggestions=[
                "Check workflow metadata.yaml for 'vertical' field",
                "Contact workflow author for support"
            ]
        )
        raise HTTPException(
            status_code=500,
            detail=error_response
        )

    return {
        "workflow": workflow_name,
        "vertical": vertical,
        "worker_container": f"fuzzforge-worker-{vertical}",
        "worker_service": f"worker-{vertical}",
        "task_queue": f"{vertical}-queue",
        "required": True
    }


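The worker-info response above is derived purely from the workflow's `vertical` string by formatting three naming-convention fields. A minimal standalone sketch of that convention (the `build_worker_info` helper is hypothetical, not part of the codebase):

```python
# Hypothetical helper mirroring the naming convention used by the
# /worker-info endpoint; "sast" is an illustrative vertical name.
def build_worker_info(workflow: str, vertical: str) -> dict:
    return {
        "workflow": workflow,
        "vertical": vertical,
        "worker_container": f"fuzzforge-worker-{vertical}",
        "worker_service": f"worker-{vertical}",
        "task_queue": f"{vertical}-queue",
        "required": True,
    }

info = build_worker_info("secret_scan", "sast")
print(info["worker_container"], info["task_queue"])  # fuzzforge-worker-sast sast-queue
```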
@router.get("/{workflow_name}/parameters")
async def get_workflow_parameters(
    workflow_name: str,
    temporal_mgr=Depends(get_temporal_manager)
) -> Dict[str, Any]:
    """
    Get the parameters schema for a workflow.

    Args:
        workflow_name: Name of the workflow

    Returns:
        Parameters schema with types, descriptions, and defaults

    Raises:
        HTTPException: 404 if workflow not found
    """
    if workflow_name not in temporal_mgr.workflows:
        available_workflows = list(temporal_mgr.workflows.keys())
        error_response = create_structured_error_response(
            error_type="WorkflowNotFound",
            message=f"Workflow '{workflow_name}' not found",
            workflow_name=workflow_name,
            suggestions=[
                f"Available workflows: {', '.join(available_workflows)}",
                "Use GET /workflows/ to see all available workflows"
            ]
        )
        raise HTTPException(
            status_code=404,
            detail=error_response
        )

    info = temporal_mgr.workflows[workflow_name]
    metadata = info.metadata

    # Return parameters with enhanced schema information
    parameters_schema = metadata.get("parameters", {})

    # Extract the actual parameter definitions from JSON schema structure
    if "properties" in parameters_schema:
        param_definitions = parameters_schema["properties"]
    else:
        param_definitions = parameters_schema

    # Add default values to the schema
    default_params = metadata.get("default_parameters", {})
    for param_name, param_schema in param_definitions.items():
        if isinstance(param_schema, dict) and param_name in default_params:
            param_schema["default"] = default_params[param_name]

    return {
        "workflow": workflow_name,
        "parameters": param_definitions,
        "default_parameters": default_params,
        "required_parameters": [
            name for name, schema in param_definitions.items()
            if isinstance(schema, dict) and schema.get("required", False)
        ]
    }
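The endpoint above overlays `default_parameters` onto the JSON-schema `properties` and collects names flagged `required`. A standalone sketch of that logic, with an illustrative metadata shape (the field names `timeout` and `target_path` are assumptions for the example):

```python
# Standalone sketch of the default-merge and required-parameter logic
# used by the /parameters endpoint; the metadata dict is illustrative.
metadata = {
    "parameters": {
        "properties": {
            "timeout": {"type": "integer", "required": False},
            "target_path": {"type": "string", "required": True},
        }
    },
    "default_parameters": {"timeout": 300},
}

parameters_schema = metadata["parameters"]
# Unwrap JSON-schema structure if present, else use the dict as-is
param_definitions = parameters_schema.get("properties", parameters_schema)

# Copy defaults into each matching parameter definition
default_params = metadata["default_parameters"]
for name, param_schema in param_definitions.items():
    if isinstance(param_schema, dict) and name in default_params:
        param_schema["default"] = default_params[name]

required = [
    name for name, schema in param_definitions.items()
    if isinstance(schema, dict) and schema.get("required", False)
]
print(param_definitions["timeout"]["default"], required)  # 300 ['target_path']
```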
backend/src/core/__init__.py (new file, 11 lines)
@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

backend/src/core/setup.py (new file, 45 lines)
@@ -0,0 +1,45 @@
"""
Setup utilities for FuzzForge infrastructure
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import logging

logger = logging.getLogger(__name__)


async def setup_result_storage():
    """
    Setup result storage (MinIO).

    MinIO is used for both target upload and result storage.
    This is a placeholder for any MinIO-specific setup if needed.
    """
    logger.info("Result storage (MinIO) configured")
    # MinIO is configured via environment variables in docker-compose
    # No additional setup needed here
    return True


async def validate_infrastructure():
    """
    Validate all required infrastructure components.

    This should be called during startup to ensure everything is ready.
    """
    logger.info("Validating infrastructure...")

    # Setup storage (MinIO)
    await setup_result_storage()

    logger.info("Infrastructure validation completed")
backend/src/main.py (new file, 725 lines)
@@ -0,0 +1,725 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import asyncio
import logging
import os
from contextlib import AsyncExitStack, asynccontextmanager, suppress
from typing import Any, Dict, Optional, List

import uvicorn
from fastapi import FastAPI
from starlette.applications import Starlette
from starlette.routing import Mount

from fastmcp.server.http import create_sse_app

from src.temporal.manager import TemporalManager
from src.core.setup import setup_result_storage, validate_infrastructure
from src.api import workflows, runs, fuzzing

from fastmcp import FastMCP

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

temporal_mgr = TemporalManager()


class TemporalBootstrapState:
    """Tracks Temporal initialization progress for API and MCP consumers."""

    def __init__(self) -> None:
        self.ready: bool = False
        self.status: str = "not_started"
        self.last_error: Optional[str] = None
        self.task_running: bool = False

    def as_dict(self) -> Dict[str, Any]:
        return {
            "ready": self.ready,
            "status": self.status,
            "last_error": self.last_error,
            "task_running": self.task_running,
        }


temporal_bootstrap_state = TemporalBootstrapState()

# Configure retry strategy for bootstrapping Temporal + infrastructure
STARTUP_RETRY_SECONDS = max(1, int(os.getenv("FUZZFORGE_STARTUP_RETRY_SECONDS", "5")))
STARTUP_RETRY_MAX_SECONDS = max(
    STARTUP_RETRY_SECONDS,
    int(os.getenv("FUZZFORGE_STARTUP_RETRY_MAX_SECONDS", "60")),
)

temporal_bootstrap_task: Optional[asyncio.Task] = None

# ---------------------------------------------------------------------------
# FastAPI application (REST API)
# ---------------------------------------------------------------------------

app = FastAPI(
    title="FuzzForge API",
    description="Security testing workflow orchestration API with fuzzing support",
    version="0.6.0",
)

app.include_router(workflows.router)
app.include_router(runs.router)
app.include_router(fuzzing.router)


def get_temporal_status() -> Dict[str, Any]:
    """Return a snapshot of Temporal bootstrap state for diagnostics."""
    status = temporal_bootstrap_state.as_dict()
    status["workflows_loaded"] = len(temporal_mgr.workflows)
    status["bootstrap_task_running"] = (
        temporal_bootstrap_task is not None and not temporal_bootstrap_task.done()
    )
    return status


def _temporal_not_ready_status() -> Optional[Dict[str, Any]]:
    """Return status details if Temporal is not ready yet."""
    status = get_temporal_status()
    if status.get("ready"):
        return None
    return status


@app.get("/")
async def root() -> Dict[str, Any]:
    status = get_temporal_status()
    return {
        "name": "FuzzForge API",
        "version": "0.6.0",
        "status": "ready" if status.get("ready") else "initializing",
        "workflows_loaded": status.get("workflows_loaded", 0),
        "temporal": status,
    }


@app.get("/health")
async def health() -> Dict[str, str]:
    status = get_temporal_status()
    health_status = "healthy" if status.get("ready") else "initializing"
    return {"status": health_status}


# Map FastAPI OpenAPI operationIds to readable MCP tool names
FASTAPI_MCP_NAME_OVERRIDES: Dict[str, str] = {
    "list_workflows_workflows__get": "api_list_workflows",
    "get_metadata_schema_workflows_metadata_schema_get": "api_get_metadata_schema",
    "get_workflow_metadata_workflows__workflow_name__metadata_get": "api_get_workflow_metadata",
    "submit_workflow_workflows__workflow_name__submit_post": "api_submit_workflow",
    "get_workflow_parameters_workflows__workflow_name__parameters_get": "api_get_workflow_parameters",
    "get_run_status_runs__run_id__status_get": "api_get_run_status",
    "get_run_findings_runs__run_id__findings_get": "api_get_run_findings",
    "get_workflow_findings_runs__workflow_name__findings__run_id__get": "api_get_workflow_findings",
    "get_fuzzing_stats_fuzzing__run_id__stats_get": "api_get_fuzzing_stats",
    "update_fuzzing_stats_fuzzing__run_id__stats_post": "api_update_fuzzing_stats",
    "get_crash_reports_fuzzing__run_id__crashes_get": "api_get_crash_reports",
    "report_crash_fuzzing__run_id__crash_post": "api_report_crash",
    "stream_fuzzing_updates_fuzzing__run_id__stream_get": "api_stream_fuzzing_updates",
    "cleanup_fuzzing_run_fuzzing__run_id__delete": "api_cleanup_fuzzing_run",
    "root__get": "api_root",
    "health_health_get": "api_health",
}


# Create an MCP adapter exposing all FastAPI endpoints via OpenAPI parsing
FASTAPI_MCP_ADAPTER = FastMCP.from_fastapi(
    app,
    name="FuzzForge FastAPI",
    mcp_names=FASTAPI_MCP_NAME_OVERRIDES,
)
_fastapi_mcp_imported = False


# ---------------------------------------------------------------------------
# FastMCP server (runs on dedicated port outside FastAPI)
# ---------------------------------------------------------------------------

mcp = FastMCP(name="FuzzForge MCP")


async def _bootstrap_temporal_with_retries() -> None:
    """Initialize Temporal infrastructure with exponential backoff retries."""

    attempt = 0

    while True:
        attempt += 1
        temporal_bootstrap_state.task_running = True
        temporal_bootstrap_state.status = "starting"
        temporal_bootstrap_state.ready = False
        temporal_bootstrap_state.last_error = None

        try:
            logger.info("Bootstrapping Temporal infrastructure...")
            await validate_infrastructure()
            await setup_result_storage()
            await temporal_mgr.initialize()

            temporal_bootstrap_state.ready = True
            temporal_bootstrap_state.status = "ready"
            temporal_bootstrap_state.task_running = False
            logger.info("Temporal infrastructure ready")
            return

        except asyncio.CancelledError:
            temporal_bootstrap_state.status = "cancelled"
            temporal_bootstrap_state.task_running = False
            logger.info("Temporal bootstrap task cancelled")
            raise

        except Exception as exc:  # pragma: no cover - defensive logging on infra startup
            logger.exception("Temporal bootstrap failed")
            temporal_bootstrap_state.ready = False
            temporal_bootstrap_state.status = "error"
            temporal_bootstrap_state.last_error = str(exc)

            # Ensure partial initialization does not leave stale state behind
            temporal_mgr.workflows.clear()

            wait_time = min(
                STARTUP_RETRY_SECONDS * (2 ** (attempt - 1)),
                STARTUP_RETRY_MAX_SECONDS,
            )
            logger.info("Retrying Temporal bootstrap in %s second(s)", wait_time)

            try:
                await asyncio.sleep(wait_time)
            except asyncio.CancelledError:
                temporal_bootstrap_state.status = "cancelled"
                temporal_bootstrap_state.task_running = False
                raise


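The bootstrap loop above sleeps `min(base * 2**(attempt - 1), cap)` seconds between retries. With the default environment values (5 s base, 60 s cap), the delay schedule can be reproduced in isolation:

```python
# Reproduces the retry-delay formula from _bootstrap_temporal_with_retries
# using the default values of the FUZZFORGE_STARTUP_RETRY_* env variables.
STARTUP_RETRY_SECONDS = 5
STARTUP_RETRY_MAX_SECONDS = 60

def retry_delay(attempt: int) -> int:
    """Exponential backoff capped at STARTUP_RETRY_MAX_SECONDS."""
    return min(STARTUP_RETRY_SECONDS * (2 ** (attempt - 1)), STARTUP_RETRY_MAX_SECONDS)

schedule = [retry_delay(a) for a in range(1, 7)]
print(schedule)  # [5, 10, 20, 40, 60, 60]
```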
def _lookup_workflow(workflow_name: str):
    info = temporal_mgr.workflows.get(workflow_name)
    if not info:
        return None
    metadata = info.metadata
    defaults = metadata.get("default_parameters", {})
    default_target_path = metadata.get("default_target_path") or defaults.get("target_path")
    supported_modes = metadata.get("supported_volume_modes") or ["ro", "rw"]
    if not isinstance(supported_modes, list) or not supported_modes:
        supported_modes = ["ro", "rw"]
    default_volume_mode = (
        metadata.get("default_volume_mode")
        or defaults.get("volume_mode")
        or supported_modes[0]
    )
    return {
        "name": workflow_name,
        "version": metadata.get("version", "0.6.0"),
        "description": metadata.get("description", ""),
        "author": metadata.get("author"),
        "tags": metadata.get("tags", []),
        "parameters": metadata.get("parameters", {}),
        "default_parameters": metadata.get("default_parameters", {}),
        "required_modules": metadata.get("required_modules", []),
        "supported_volume_modes": supported_modes,
        "default_target_path": default_target_path,
        "default_volume_mode": default_volume_mode
    }


@mcp.tool
async def list_workflows_mcp() -> Dict[str, Any]:
    """List all discovered workflows and their metadata summary."""
    not_ready = _temporal_not_ready_status()
    if not_ready:
        return {
            "workflows": [],
            "temporal": not_ready,
            "message": "Temporal infrastructure is still initializing",
        }

    workflows_summary = []
    for name, info in temporal_mgr.workflows.items():
        metadata = info.metadata
        defaults = metadata.get("default_parameters", {})
        workflows_summary.append({
            "name": name,
            "version": metadata.get("version", "0.6.0"),
            "description": metadata.get("description", ""),
            "author": metadata.get("author"),
            "tags": metadata.get("tags", []),
            "supported_volume_modes": metadata.get("supported_volume_modes", ["ro", "rw"]),
            "default_volume_mode": metadata.get("default_volume_mode")
                or defaults.get("volume_mode")
                or "ro",
            "default_target_path": metadata.get("default_target_path")
                or defaults.get("target_path")
        })
    return {"workflows": workflows_summary, "temporal": get_temporal_status()}


@mcp.tool
async def get_workflow_metadata_mcp(workflow_name: str) -> Dict[str, Any]:
    """Fetch detailed metadata for a workflow."""
    not_ready = _temporal_not_ready_status()
    if not_ready:
        return {
            "error": "Temporal infrastructure not ready",
            "temporal": not_ready,
        }

    data = _lookup_workflow(workflow_name)
    if not data:
        return {"error": f"Workflow not found: {workflow_name}"}
    return data


@mcp.tool
async def get_workflow_parameters_mcp(workflow_name: str) -> Dict[str, Any]:
    """Return the parameter schema and defaults for a workflow."""
    not_ready = _temporal_not_ready_status()
    if not_ready:
        return {
            "error": "Temporal infrastructure not ready",
            "temporal": not_ready,
        }

    data = _lookup_workflow(workflow_name)
    if not data:
        return {"error": f"Workflow not found: {workflow_name}"}
    return {
        "parameters": data.get("parameters", {}),
        "defaults": data.get("default_parameters", {}),
    }


@mcp.tool
async def get_workflow_metadata_schema_mcp() -> Dict[str, Any]:
    """Return the JSON schema describing workflow metadata files."""
    from src.temporal.discovery import WorkflowDiscovery
    return WorkflowDiscovery.get_metadata_schema()


@mcp.tool
async def submit_security_scan_mcp(
    workflow_name: str,
    target_id: str,
    parameters: Dict[str, Any] | None = None,
) -> Dict[str, Any] | Dict[str, str]:
    """Submit a Temporal workflow via MCP."""
    try:
        not_ready = _temporal_not_ready_status()
        if not_ready:
            return {
                "error": "Temporal infrastructure not ready",
                "temporal": not_ready,
            }

        workflow_info = temporal_mgr.workflows.get(workflow_name)
        if not workflow_info:
            return {"error": f"Workflow '{workflow_name}' not found"}

        metadata = workflow_info.metadata or {}
        defaults = metadata.get("default_parameters", {})

        parameters = parameters or {}
        cleaned_parameters: Dict[str, Any] = {**defaults, **parameters}

        # Ensure *_config structures default to dicts
        for key, value in list(cleaned_parameters.items()):
            if isinstance(key, str) and key.endswith("_config") and value is None:
                cleaned_parameters[key] = {}

        # Some workflows expect configuration dictionaries even when omitted
        parameter_definitions = (
            metadata.get("parameters", {}).get("properties", {})
            if isinstance(metadata.get("parameters"), dict)
            else {}
        )
        for key, definition in parameter_definitions.items():
            if not isinstance(key, str) or not key.endswith("_config"):
                continue
            if key not in cleaned_parameters:
                default_value = definition.get("default") if isinstance(definition, dict) else None
                cleaned_parameters[key] = default_value if default_value is not None else {}
            elif cleaned_parameters[key] is None:
                cleaned_parameters[key] = {}

        # Start workflow
        handle = await temporal_mgr.run_workflow(
            workflow_name=workflow_name,
            target_id=target_id,
            workflow_params=cleaned_parameters,
        )

        return {
            "run_id": handle.id,
            "status": "RUNNING",
            "workflow": workflow_name,
            "message": f"Workflow '{workflow_name}' submitted successfully",
            "target_id": target_id,
            "parameters": cleaned_parameters,
            "mcp_enabled": True,
        }
    except Exception as exc:  # pragma: no cover - defensive logging
        logger.exception("MCP submit failed")
        return {"error": f"Failed to submit workflow: {exc}"}


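The `*_config` normalization in `submit_security_scan_mcp` coerces `None` values to empty dicts and backfills omitted config parameters from schema defaults. A standalone sketch of just that step (the `normalize_config_params` helper and parameter names are illustrative, not part of the codebase):

```python
# Standalone sketch of the *_config normalization performed in
# submit_security_scan_mcp; the helper name and keys are illustrative.
def normalize_config_params(params: dict, definitions: dict) -> dict:
    cleaned = dict(params)
    # Coerce explicit None values for *_config keys to empty dicts
    for key, value in list(cleaned.items()):
        if isinstance(key, str) and key.endswith("_config") and value is None:
            cleaned[key] = {}
    # Backfill missing *_config keys from schema defaults (or {})
    for key, definition in definitions.items():
        if not isinstance(key, str) or not key.endswith("_config"):
            continue
        if key not in cleaned:
            default = definition.get("default") if isinstance(definition, dict) else None
            cleaned[key] = default if default is not None else {}
        elif cleaned[key] is None:
            cleaned[key] = {}
    return cleaned

out = normalize_config_params(
    {"scanner_config": None},
    {"scanner_config": {}, "fuzzer_config": {"default": {"iters": 10}}},
)
print(out)  # {'scanner_config': {}, 'fuzzer_config': {'iters': 10}}
```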
@mcp.tool
async def get_comprehensive_scan_summary(run_id: str) -> Dict[str, Any] | Dict[str, str]:
    """Return a summary for the given workflow run via MCP."""
    try:
        not_ready = _temporal_not_ready_status()
        if not_ready:
            return {
                "error": "Temporal infrastructure not ready",
                "temporal": not_ready,
            }

        status = await temporal_mgr.get_workflow_status(run_id)

        # Try to get result if completed
        total_findings = 0
        severity_summary = {"critical": 0, "high": 0, "medium": 0, "low": 0, "info": 0}

        if status.get("status") == "COMPLETED":
            try:
                result = await temporal_mgr.get_workflow_result(run_id)
                if isinstance(result, dict):
                    summary = result.get("summary", {})
                    total_findings = summary.get("total_findings", 0)
            except Exception as e:
                logger.debug(f"Could not retrieve result for {run_id}: {e}")

        return {
            "run_id": run_id,
            "workflow": "unknown",  # Temporal doesn't track workflow name in status
            "status": status.get("status", "unknown"),
            "is_completed": status.get("status") == "COMPLETED",
            "total_findings": total_findings,
            "severity_summary": severity_summary,
            "scan_duration": status.get("close_time", "In progress"),
            "recommendations": (
                [
                    "Review high and critical severity findings first",
                    "Implement security fixes based on finding recommendations",
                    "Re-run scan after applying fixes to verify remediation",
                ]
                if total_findings > 0
                else ["No security issues found"]
            ),
            "mcp_analysis": True,
        }
    except Exception as exc:  # pragma: no cover
        logger.exception("MCP summary failed")
        return {"error": f"Failed to summarize run: {exc}"}


@mcp.tool
async def get_run_status_mcp(run_id: str) -> Dict[str, Any]:
    """Return current status information for a Temporal run."""
    try:
        not_ready = _temporal_not_ready_status()
        if not_ready:
            return {
                "error": "Temporal infrastructure not ready",
                "temporal": not_ready,
            }

        status = await temporal_mgr.get_workflow_status(run_id)

        return {
            "run_id": run_id,
            "workflow": "unknown",
            "status": status["status"],
            "is_completed": status["status"] in ["COMPLETED", "FAILED", "CANCELLED"],
            "is_failed": status["status"] == "FAILED",
            "is_running": status["status"] == "RUNNING",
            "created_at": status.get("start_time"),
            "updated_at": status.get("close_time") or status.get("execution_time"),
        }
    except Exception as exc:
        logger.exception("MCP run status failed")
        return {"error": f"Failed to get run status: {exc}"}


@mcp.tool
async def get_run_findings_mcp(run_id: str) -> Dict[str, Any]:
    """Return SARIF findings for a completed run."""
    try:
        not_ready = _temporal_not_ready_status()
        if not_ready:
            return {
                "error": "Temporal infrastructure not ready",
                "temporal": not_ready,
            }

        status = await temporal_mgr.get_workflow_status(run_id)
        if status.get("status") != "COMPLETED":
            return {"error": f"Run {run_id} not completed. Status: {status.get('status')}"}

        result = await temporal_mgr.get_workflow_result(run_id)

        metadata = {
            "completion_time": status.get("close_time"),
            "workflow_version": "unknown",
        }

        sarif = result.get("sarif", {}) if isinstance(result, dict) else {}

        return {
            "workflow": "unknown",
            "run_id": run_id,
            "sarif": sarif,
            "metadata": metadata,
        }
    except Exception as exc:
        logger.exception("MCP findings failed")
        return {"error": f"Failed to retrieve findings: {exc}"}


@mcp.tool
async def list_recent_runs_mcp(
    limit: int = 10,
    workflow_name: str | None = None,
) -> Dict[str, Any]:
    """List recent Temporal runs with optional workflow filter."""

    not_ready = _temporal_not_ready_status()
    if not_ready:
        return {
            "runs": [],
            "temporal": not_ready,
            "message": "Temporal infrastructure is still initializing",
        }

    try:
        limit_value = int(limit)
    except (TypeError, ValueError):
        limit_value = 10
    limit_value = max(1, min(limit_value, 100))

    try:
        # Build filter query
        filter_query = None
        if workflow_name:
            workflow_info = temporal_mgr.workflows.get(workflow_name)
            if workflow_info:
                filter_query = f'WorkflowType="{workflow_info.workflow_type}"'

        workflows = await temporal_mgr.list_workflows(filter_query, limit_value)

        results: List[Dict[str, Any]] = []
        for wf in workflows:
            results.append({
                "run_id": wf["workflow_id"],
                "workflow": workflow_name or "unknown",
                "state": wf["status"],
                "state_type": wf["status"],
                "is_completed": wf["status"] in ["COMPLETED", "FAILED", "CANCELLED"],
                "is_running": wf["status"] == "RUNNING",
                "is_failed": wf["status"] == "FAILED",
                "created_at": wf.get("start_time"),
                "updated_at": wf.get("close_time"),
            })

        return {"runs": results, "temporal": get_temporal_status()}

    except Exception as exc:
        logger.exception("Failed to list runs")
        return {
            "runs": [],
            "temporal": get_temporal_status(),
            "error": str(exc)
        }


@mcp.tool
async def get_fuzzing_stats_mcp(run_id: str) -> Dict[str, Any]:
    """Return fuzzing statistics for a run if available."""
    not_ready = _temporal_not_ready_status()
    if not_ready:
        return {
            "error": "Temporal infrastructure not ready",
            "temporal": not_ready,
        }

    stats = fuzzing.fuzzing_stats.get(run_id)
    if not stats:
        return {"error": f"Fuzzing run not found: {run_id}"}
    # Be resilient if a plain dict slipped into the cache
    if isinstance(stats, dict):
        return stats
    if hasattr(stats, "model_dump"):
        return stats.model_dump()
    if hasattr(stats, "dict"):
|
||||
return stats.dict()
|
||||
# Last resort
|
||||
return getattr(stats, "__dict__", {"run_id": run_id})
|
||||
|
||||
|
||||
@mcp.tool
|
||||
async def get_fuzzing_crash_reports_mcp(run_id: str) -> Dict[str, Any]:
|
||||
"""Return crash reports collected for a fuzzing run."""
|
||||
not_ready = _temporal_not_ready_status()
|
||||
if not_ready:
|
||||
return {
|
||||
"error": "Temporal infrastructure not ready",
|
||||
"temporal": not_ready,
|
||||
}
|
||||
|
||||
reports = fuzzing.crash_reports.get(run_id)
|
||||
if reports is None:
|
||||
return {"error": f"Fuzzing run not found: {run_id}"}
|
||||
return {"run_id": run_id, "crashes": [report.model_dump() for report in reports]}
|
||||
|
||||
|
||||
@mcp.tool
|
||||
async def get_backend_status_mcp() -> Dict[str, Any]:
|
||||
"""Expose backend readiness, workflows, and registered MCP tools."""
|
||||
|
||||
status = get_temporal_status()
|
||||
response: Dict[str, Any] = {"temporal": status}
|
||||
|
||||
if status.get("ready"):
|
||||
response["workflows"] = list(temporal_mgr.workflows.keys())
|
||||
|
||||
try:
|
||||
tools = await mcp._tool_manager.list_tools()
|
||||
response["mcp_tools"] = sorted(tool.name for tool in tools)
|
||||
except Exception as exc: # pragma: no cover - defensive logging
|
||||
logger.debug("Failed to enumerate MCP tools: %s", exc)
|
||||
|
||||
return response
|
||||
|
||||
|
||||
def create_mcp_transport_app() -> Starlette:
|
||||
"""Build a Starlette app serving HTTP + SSE transports on one port."""
|
||||
|
||||
http_app = mcp.http_app(path="/", transport="streamable-http")
|
||||
sse_app = create_sse_app(
|
||||
server=mcp,
|
||||
message_path="/messages",
|
||||
sse_path="/",
|
||||
auth=mcp.auth,
|
||||
)
|
||||
|
||||
routes = [
|
||||
Mount("/mcp", app=http_app),
|
||||
Mount("/mcp/sse", app=sse_app),
|
||||
]
|
||||
|
||||
@asynccontextmanager
|
||||
async def lifespan(app: Starlette): # pragma: no cover - integration wiring
|
||||
async with AsyncExitStack() as stack:
|
||||
await stack.enter_async_context(
|
||||
http_app.router.lifespan_context(http_app)
|
||||
)
|
||||
await stack.enter_async_context(
|
||||
sse_app.router.lifespan_context(sse_app)
|
||||
)
|
||||
yield
|
||||
|
||||
combined_app = Starlette(routes=routes, lifespan=lifespan)
|
||||
combined_app.state.fastmcp_server = mcp
|
||||
combined_app.state.http_app = http_app
|
||||
combined_app.state.sse_app = sse_app
|
||||
return combined_app
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Combined lifespan: Temporal init + dedicated MCP transports
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
@asynccontextmanager
|
||||
async def combined_lifespan(app: FastAPI):
|
||||
global temporal_bootstrap_task, _fastapi_mcp_imported
|
||||
|
||||
logger.info("Starting FuzzForge backend...")
|
||||
|
||||
# Ensure FastAPI endpoints are exposed via MCP once
|
||||
if not _fastapi_mcp_imported:
|
||||
try:
|
||||
await mcp.import_server(FASTAPI_MCP_ADAPTER)
|
||||
_fastapi_mcp_imported = True
|
||||
logger.info("Mounted FastAPI endpoints as MCP tools")
|
||||
except Exception as exc:
|
||||
logger.exception("Failed to import FastAPI endpoints into MCP", exc_info=exc)
|
||||
|
||||
# Kick off Temporal bootstrap in the background if needed
|
||||
if temporal_bootstrap_task is None or temporal_bootstrap_task.done():
|
||||
temporal_bootstrap_task = asyncio.create_task(_bootstrap_temporal_with_retries())
|
||||
logger.info("Temporal bootstrap task started")
|
||||
else:
|
||||
logger.info("Temporal bootstrap task already running")
|
||||
|
||||
# Start MCP transports on shared port (HTTP + SSE)
|
||||
mcp_app = create_mcp_transport_app()
|
||||
mcp_config = uvicorn.Config(
|
||||
app=mcp_app,
|
||||
host="0.0.0.0",
|
||||
port=8010,
|
||||
log_level="info",
|
||||
lifespan="on",
|
||||
)
|
||||
mcp_server = uvicorn.Server(mcp_config)
|
||||
mcp_server.install_signal_handlers = lambda: None # type: ignore[assignment]
|
||||
mcp_task = asyncio.create_task(mcp_server.serve())
|
||||
|
||||
async def _wait_for_uvicorn_startup() -> None:
|
||||
started_attr = getattr(mcp_server, "started", None)
|
||||
if hasattr(started_attr, "wait"):
|
||||
await asyncio.wait_for(started_attr.wait(), timeout=10)
|
||||
return
|
||||
|
||||
# Fallback for uvicorn versions where "started" is a bool
|
||||
poll_interval = 0.1
|
||||
checks = int(10 / poll_interval)
|
||||
for _ in range(checks):
|
||||
if getattr(mcp_server, "started", False):
|
||||
return
|
||||
await asyncio.sleep(poll_interval)
|
||||
raise asyncio.TimeoutError
|
||||
|
||||
try:
|
||||
await _wait_for_uvicorn_startup()
|
||||
except asyncio.TimeoutError: # pragma: no cover - defensive logging
|
||||
if mcp_task.done():
|
||||
raise RuntimeError("MCP server failed to start") from mcp_task.exception()
|
||||
logger.warning("Timed out waiting for MCP server startup; continuing anyway")
|
||||
|
||||
logger.info("MCP HTTP available at http://0.0.0.0:8010/mcp")
|
||||
logger.info("MCP SSE available at http://0.0.0.0:8010/mcp/sse")
|
||||
|
||||
try:
|
||||
yield
|
||||
finally:
|
||||
logger.info("Shutting down MCP transports...")
|
||||
mcp_server.should_exit = True
|
||||
mcp_server.force_exit = True
|
||||
await asyncio.gather(mcp_task, return_exceptions=True)
|
||||
|
||||
if temporal_bootstrap_task and not temporal_bootstrap_task.done():
|
||||
temporal_bootstrap_task.cancel()
|
||||
with suppress(asyncio.CancelledError):
|
||||
await temporal_bootstrap_task
|
||||
temporal_bootstrap_state.task_running = False
|
||||
if not temporal_bootstrap_state.ready:
|
||||
temporal_bootstrap_state.status = "stopped"
|
||||
temporal_bootstrap_task = None
|
||||
|
||||
# Close Temporal client
|
||||
await temporal_mgr.close()
|
||||
logger.info("Shutting down FuzzForge backend...")
|
||||
|
||||
|
||||
app.router.lifespan_context = combined_lifespan
|
||||
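The `_wait_for_uvicorn_startup` helper above supports two uvicorn generations: newer versions expose `Server.started` as an awaitable event, older ones as a plain bool that must be polled. A minimal standalone sketch of that pattern (all names here are hypothetical, not repository code):

```python
# Sketch of the dual-API startup wait: prefer an awaitable "started" event,
# fall back to polling a boolean attribute, and time out eventually.
import asyncio


async def wait_for_started(server, timeout: float = 10.0, poll: float = 0.1) -> None:
    started = getattr(server, "started", None)
    if hasattr(started, "wait"):  # newer style: asyncio.Event-like object
        await asyncio.wait_for(started.wait(), timeout=timeout)
        return
    for _ in range(int(timeout / poll)):  # older style: plain bool attribute
        if getattr(server, "started", False):
            return
        await asyncio.sleep(poll)
    raise asyncio.TimeoutError


class FakeServer:
    """Stand-in for uvicorn.Server with a bool 'started' flag."""

    def __init__(self) -> None:
        self.started = False


async def demo() -> bool:
    server = FakeServer()
    # Flip the flag shortly after we start waiting.
    asyncio.get_running_loop().call_later(0.05, setattr, server, "started", True)
    await wait_for_started(server, timeout=1.0, poll=0.01)
    return server.started


print(asyncio.run(demo()))  # True
```

The fallback loop is why the real code tolerates uvicorn versions where `started` is not awaitable.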
11
backend/src/models/__init__.py
Normal file
@@ -0,0 +1,11 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.
124
backend/src/models/findings.py
Normal file
@@ -0,0 +1,124 @@
"""
Models for workflow findings and submissions
"""

# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

from pydantic import BaseModel, Field
from typing import Dict, Any, Optional, Literal, List
from datetime import datetime


class WorkflowFindings(BaseModel):
    """Findings from a workflow execution in SARIF format"""
    workflow: str = Field(..., description="Workflow name")
    run_id: str = Field(..., description="Unique run identifier")
    sarif: Dict[str, Any] = Field(..., description="SARIF formatted findings")
    metadata: Dict[str, Any] = Field(default_factory=dict, description="Additional metadata")


class WorkflowSubmission(BaseModel):
    """
    Submit a workflow with configurable settings.

    Note: This model is deprecated in favor of the /upload-and-submit endpoint
    which handles file uploads directly.
    """
    parameters: Dict[str, Any] = Field(
        default_factory=dict,
        description="Workflow-specific parameters"
    )
    timeout: Optional[int] = Field(
        default=None,  # Allow workflow-specific defaults
        description="Timeout in seconds (None for workflow default)",
        ge=1,
        le=604800  # Max 7 days to support fuzzing campaigns
    )


class WorkflowStatus(BaseModel):
    """Status of a workflow run"""
    run_id: str = Field(..., description="Unique run identifier")
    workflow: str = Field(..., description="Workflow name")
    status: str = Field(..., description="Current status")
    is_completed: bool = Field(..., description="Whether the run is completed")
    is_failed: bool = Field(..., description="Whether the run failed")
    is_running: bool = Field(..., description="Whether the run is currently running")
    created_at: datetime = Field(..., description="Run creation time")
    updated_at: datetime = Field(..., description="Last update time")


class WorkflowMetadata(BaseModel):
    """Complete metadata for a workflow"""
    name: str = Field(..., description="Workflow name")
    version: str = Field(..., description="Semantic version")
    description: str = Field(..., description="Workflow description")
    author: Optional[str] = Field(None, description="Workflow author")
    tags: List[str] = Field(default_factory=list, description="Workflow tags")
    parameters: Dict[str, Any] = Field(..., description="Parameters schema")
    default_parameters: Dict[str, Any] = Field(
        default_factory=dict,
        description="Default parameter values"
    )
    required_modules: List[str] = Field(
        default_factory=list,
        description="Required module names"
    )
    supported_volume_modes: List[Literal["ro", "rw"]] = Field(
        default=["ro", "rw"],
        description="Supported volume mount modes"
    )


class WorkflowListItem(BaseModel):
    """Summary information for a workflow in list views"""
    name: str = Field(..., description="Workflow name")
    version: str = Field(..., description="Semantic version")
    description: str = Field(..., description="Workflow description")
    author: Optional[str] = Field(None, description="Workflow author")
    tags: List[str] = Field(default_factory=list, description="Workflow tags")


class RunSubmissionResponse(BaseModel):
    """Response after submitting a workflow"""
    run_id: str = Field(..., description="Unique run identifier")
    status: str = Field(..., description="Initial status")
    workflow: str = Field(..., description="Workflow name")
    message: str = Field(default="Workflow submitted successfully")


class FuzzingStats(BaseModel):
    """Real-time fuzzing statistics"""
    run_id: str = Field(..., description="Unique run identifier")
    workflow: str = Field(..., description="Workflow name")
    executions: int = Field(default=0, description="Total executions")
    executions_per_sec: float = Field(default=0.0, description="Current execution rate")
    crashes: int = Field(default=0, description="Total crashes found")
    unique_crashes: int = Field(default=0, description="Unique crashes")
    coverage: Optional[float] = Field(None, description="Code coverage percentage")
    corpus_size: int = Field(default=0, description="Current corpus size")
    elapsed_time: int = Field(default=0, description="Elapsed time in seconds")
    last_crash_time: Optional[datetime] = Field(None, description="Time of last crash")


class CrashReport(BaseModel):
    """Individual crash report from fuzzing"""
    run_id: str = Field(..., description="Run identifier")
    crash_id: str = Field(..., description="Unique crash identifier")
    timestamp: datetime = Field(default_factory=datetime.utcnow)
    signal: Optional[str] = Field(None, description="Crash signal (SIGSEGV, etc.)")
    crash_type: Optional[str] = Field(None, description="Type of crash")
    stack_trace: Optional[str] = Field(None, description="Stack trace")
    input_file: Optional[str] = Field(None, description="Path to crashing input")
    reproducer: Optional[str] = Field(None, description="Minimized reproducer")
    severity: str = Field(default="medium", description="Crash severity")
    exploitability: Optional[str] = Field(None, description="Exploitability assessment")
10
backend/src/storage/__init__.py
Normal file
@@ -0,0 +1,10 @@
"""
Storage abstraction layer for FuzzForge.

Provides unified interface for storing and retrieving targets and results.
"""

from .base import StorageBackend
from .s3_cached import S3CachedStorage

__all__ = ["StorageBackend", "S3CachedStorage"]
153
backend/src/storage/base.py
Normal file
@@ -0,0 +1,153 @@
"""
Base storage backend interface.

All storage implementations must implement this interface.
"""

from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional, Dict, Any


class StorageBackend(ABC):
    """
    Abstract base class for storage backends.

    Implementations handle storage and retrieval of:
    - Uploaded targets (code, binaries, etc.)
    - Workflow results
    - Temporary files
    """

    @abstractmethod
    async def upload_target(
        self,
        file_path: Path,
        user_id: str,
        metadata: Optional[Dict[str, Any]] = None
    ) -> str:
        """
        Upload a target file to storage.

        Args:
            file_path: Local path to file to upload
            user_id: ID of user uploading the file
            metadata: Optional metadata to store with file

        Returns:
            Target ID (unique identifier for retrieval)

        Raises:
            FileNotFoundError: If file_path doesn't exist
            StorageError: If upload fails
        """
        pass

    @abstractmethod
    async def get_target(self, target_id: str) -> Path:
        """
        Get target file from storage.

        Args:
            target_id: Unique identifier from upload_target()

        Returns:
            Local path to cached file

        Raises:
            FileNotFoundError: If target doesn't exist
            StorageError: If download fails
        """
        pass

    @abstractmethod
    async def delete_target(self, target_id: str) -> None:
        """
        Delete target from storage.

        Args:
            target_id: Unique identifier to delete

        Raises:
            StorageError: If deletion fails (doesn't raise if not found)
        """
        pass

    @abstractmethod
    async def upload_results(
        self,
        workflow_id: str,
        results: Dict[str, Any],
        results_format: str = "json"
    ) -> str:
        """
        Upload workflow results to storage.

        Args:
            workflow_id: Workflow execution ID
            results: Results dictionary
            results_format: Format (json, sarif, etc.)

        Returns:
            URL to uploaded results

        Raises:
            StorageError: If upload fails
        """
        pass

    @abstractmethod
    async def get_results(self, workflow_id: str) -> Dict[str, Any]:
        """
        Get workflow results from storage.

        Args:
            workflow_id: Workflow execution ID

        Returns:
            Results dictionary

        Raises:
            FileNotFoundError: If results don't exist
            StorageError: If download fails
        """
        pass

    @abstractmethod
    async def list_targets(
        self,
        user_id: Optional[str] = None,
        limit: int = 100
    ) -> list[Dict[str, Any]]:
        """
        List uploaded targets.

        Args:
            user_id: Filter by user ID (None = all users)
            limit: Maximum number of results

        Returns:
            List of target metadata dictionaries

        Raises:
            StorageError: If listing fails
        """
        pass

    @abstractmethod
    async def cleanup_cache(self) -> int:
        """
        Clean up local cache (LRU eviction).

        Returns:
            Number of files removed

        Raises:
            StorageError: If cleanup fails
        """
        pass


class StorageError(Exception):
    """Base exception for storage operations."""
    pass
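The `StorageBackend` ABC above defines an async contract that any backend must satisfy. As a usage sketch (not repository code; the trimmed ABC below mirrors only two of the abstract methods), an in-memory implementation useful for tests might look like:

```python
# Minimal in-memory implementation of a trimmed StorageBackend-style
# interface, demonstrating how the async contract is satisfied and consumed.
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict


class StorageBackend(ABC):
    @abstractmethod
    async def upload_results(self, workflow_id: str, results: Dict[str, Any],
                             results_format: str = "json") -> str: ...

    @abstractmethod
    async def get_results(self, workflow_id: str) -> Dict[str, Any]: ...


class InMemoryStorage(StorageBackend):
    """Keeps results in a dict; no external services needed."""

    def __init__(self) -> None:
        self._results: Dict[str, Dict[str, Any]] = {}

    async def upload_results(self, workflow_id, results, results_format="json"):
        self._results[workflow_id] = results
        return f"memory://{workflow_id}/results.{results_format}"

    async def get_results(self, workflow_id):
        if workflow_id not in self._results:
            raise FileNotFoundError(workflow_id)  # mirrors the ABC's contract
        return self._results[workflow_id]


async def demo() -> Dict[str, Any]:
    store = InMemoryStorage()
    url = await store.upload_results("wf-1", {"sarif": {}})
    assert url == "memory://wf-1/results.json"
    return await store.get_results("wf-1")


print(asyncio.run(demo()))  # prints {'sarif': {}}
```

Raising `FileNotFoundError` for missing results matches the exception contract documented in the ABC docstrings.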
423
backend/src/storage/s3_cached.py
Normal file
@@ -0,0 +1,423 @@
|
||||
"""
|
||||
S3-compatible storage backend with local caching.
|
||||
|
||||
Works with MinIO (dev/prod) or AWS S3 (cloud).
|
||||
"""
|
||||
|
||||
import json
|
||||
import logging
|
||||
import os
|
||||
import shutil
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Optional, Dict, Any
|
||||
from uuid import uuid4
|
||||
|
||||
import boto3
|
||||
from botocore.exceptions import ClientError
|
||||
|
||||
from .base import StorageBackend, StorageError
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class S3CachedStorage(StorageBackend):
|
||||
"""
|
||||
S3-compatible storage with local caching.
|
||||
|
||||
Features:
|
||||
- Upload targets to S3/MinIO
|
||||
- Download with local caching (LRU eviction)
|
||||
- Lifecycle management (auto-cleanup old files)
|
||||
- Metadata tracking
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
endpoint_url: Optional[str] = None,
|
||||
access_key: Optional[str] = None,
|
||||
secret_key: Optional[str] = None,
|
||||
bucket: str = "targets",
|
||||
region: str = "us-east-1",
|
||||
use_ssl: bool = False,
|
||||
cache_dir: Optional[Path] = None,
|
||||
cache_max_size_gb: int = 10
|
||||
):
|
||||
"""
|
||||
Initialize S3 storage backend.
|
||||
|
||||
Args:
|
||||
endpoint_url: S3 endpoint (None = AWS S3, or MinIO URL)
|
||||
access_key: S3 access key (None = from env)
|
||||
secret_key: S3 secret key (None = from env)
|
||||
bucket: S3 bucket name
|
||||
region: AWS region
|
||||
use_ssl: Use HTTPS
|
||||
cache_dir: Local cache directory
|
||||
cache_max_size_gb: Maximum cache size in GB
|
||||
"""
|
||||
# Use environment variables as defaults
|
||||
self.endpoint_url = endpoint_url or os.getenv('S3_ENDPOINT', 'http://minio:9000')
|
||||
self.access_key = access_key or os.getenv('S3_ACCESS_KEY', 'fuzzforge')
|
||||
self.secret_key = secret_key or os.getenv('S3_SECRET_KEY', 'fuzzforge123')
|
||||
self.bucket = bucket or os.getenv('S3_BUCKET', 'targets')
|
||||
self.region = region or os.getenv('S3_REGION', 'us-east-1')
|
||||
self.use_ssl = use_ssl or os.getenv('S3_USE_SSL', 'false').lower() == 'true'
|
||||
|
||||
# Cache configuration
|
||||
self.cache_dir = cache_dir or Path(os.getenv('CACHE_DIR', '/tmp/fuzzforge-cache'))
|
||||
self.cache_max_size = cache_max_size_gb * (1024 ** 3) # Convert to bytes
|
||||
|
||||
# Ensure cache directory exists
|
||||
self.cache_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Initialize S3 client
|
||||
try:
|
||||
self.s3_client = boto3.client(
|
||||
's3',
|
||||
endpoint_url=self.endpoint_url,
|
||||
aws_access_key_id=self.access_key,
|
||||
aws_secret_access_key=self.secret_key,
|
||||
region_name=self.region,
|
||||
use_ssl=self.use_ssl
|
||||
)
|
||||
logger.info(f"Initialized S3 storage: {self.endpoint_url}/{self.bucket}")
|
||||
except Exception as e:
|
||||
logger.error(f"Failed to initialize S3 client: {e}")
|
||||
raise StorageError(f"S3 initialization failed: {e}")
|
||||
|
||||
async def upload_target(
|
||||
self,
|
||||
file_path: Path,
|
||||
user_id: str,
|
||||
metadata: Optional[Dict[str, Any]] = None
|
||||
) -> str:
|
||||
"""Upload target file to S3/MinIO."""
|
||||
if not file_path.exists():
|
||||
raise FileNotFoundError(f"File not found: {file_path}")
|
||||
|
||||
# Generate unique target ID
|
||||
target_id = str(uuid4())
|
||||
|
||||
# Prepare metadata
|
||||
upload_metadata = {
|
||||
'user_id': user_id,
|
||||
'uploaded_at': datetime.now().isoformat(),
|
||||
'filename': file_path.name,
|
||||
'size': str(file_path.stat().st_size)
|
||||
}
|
||||
if metadata:
|
||||
upload_metadata.update(metadata)
|
||||
|
||||
# Upload to S3
|
||||
s3_key = f'{target_id}/target'
|
||||
try:
|
||||
logger.info(f"Uploading target to s3://{self.bucket}/{s3_key}")
|
||||
|
||||
self.s3_client.upload_file(
|
||||
str(file_path),
|
||||
self.bucket,
|
||||
s3_key,
|
||||
ExtraArgs={
|
||||
'Metadata': upload_metadata
|
||||
}
|
||||
)
|
||||
|
||||
file_size_mb = file_path.stat().st_size / (1024 * 1024)
|
||||
logger.info(
|
||||
f"✓ Uploaded target {target_id} "
|
||||
f"({file_path.name}, {file_size_mb:.2f} MB)"
|
||||
)
|
||||
|
||||
return target_id
|
||||
|
||||
except ClientError as e:
|
||||
logger.error(f"S3 upload failed: {e}", exc_info=True)
|
||||
raise StorageError(f"Failed to upload target: {e}")
|
||||
except Exception as e:
|
||||
logger.error(f"Upload failed: {e}", exc_info=True)
|
||||
raise StorageError(f"Upload error: {e}")
|
||||
|
||||
async def get_target(self, target_id: str) -> Path:
|
||||
"""Get target from cache or download from S3/MinIO."""
|
||||
# Check cache first
|
||||
cache_path = self.cache_dir / target_id
|
||||
cached_file = cache_path / "target"
|
||||
|
||||
if cached_file.exists():
|
||||
# Update access time for LRU
|
||||
cached_file.touch()
|
||||
logger.info(f"Cache HIT: {target_id}")
|
||||
return cached_file
|
||||
|
||||
# Cache miss - download from S3
|
||||
logger.info(f"Cache MISS: {target_id}, downloading from S3...")
|
||||
|
||||
try:
|
||||
# Create cache directory
|
||||
cache_path.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Download from S3
|
||||
s3_key = f'{target_id}/target'
|
||||
logger.info(f"Downloading s3://{self.bucket}/{s3_key}")
|
||||
|
||||
self.s3_client.download_file(
|
||||
self.bucket,
|
||||
s3_key,
|
||||
str(cached_file)
|
||||
)
|
||||
|
||||
# Verify download
|
||||
if not cached_file.exists():
|
||||
raise StorageError(f"Downloaded file not found: {cached_file}")
|
||||
|
||||
file_size_mb = cached_file.stat().st_size / (1024 * 1024)
|
||||
logger.info(f"✓ Downloaded target {target_id} ({file_size_mb:.2f} MB)")
|
||||
|
||||
return cached_file
|
||||
|
||||
except ClientError as e:
|
||||
error_code = e.response.get('Error', {}).get('Code')
|
||||
if error_code in ['404', 'NoSuchKey']:
|
||||
logger.error(f"Target not found: {target_id}")
|
||||
raise FileNotFoundError(f"Target {target_id} not found in storage")
|
||||
else:
|
||||
logger.error(f"S3 download failed: {e}", exc_info=True)
|
||||
raise StorageError(f"Download failed: {e}")
|
||||
except Exception as e:
|
||||
logger.error(f"Download error: {e}", exc_info=True)
|
||||
# Cleanup partial download
|
||||
if cache_path.exists():
|
||||
shutil.rmtree(cache_path, ignore_errors=True)
|
||||
raise StorageError(f"Download error: {e}")
|
||||
|
||||
async def delete_target(self, target_id: str) -> None:
|
||||
"""Delete target from S3/MinIO."""
|
||||
try:
|
||||
s3_key = f'{target_id}/target'
|
||||
logger.info(f"Deleting s3://{self.bucket}/{s3_key}")
|
||||
|
||||
self.s3_client.delete_object(
|
||||
Bucket=self.bucket,
|
||||
Key=s3_key
|
||||
)
|
||||
|
||||
# Also delete from cache if present
|
||||
cache_path = self.cache_dir / target_id
|
||||
if cache_path.exists():
|
||||
shutil.rmtree(cache_path, ignore_errors=True)
|
||||
logger.info(f"✓ Deleted target {target_id} from S3 and cache")
|
||||
else:
|
||||
logger.info(f"✓ Deleted target {target_id} from S3")
|
||||
|
||||
except ClientError as e:
|
||||
logger.error(f"S3 delete failed: {e}", exc_info=True)
|
||||
# Don't raise error if object doesn't exist
|
||||
if e.response.get('Error', {}).get('Code') not in ['404', 'NoSuchKey']:
|
||||
raise StorageError(f"Delete failed: {e}")
|
||||
except Exception as e:
|
||||
logger.error(f"Delete error: {e}", exc_info=True)
|
||||
raise StorageError(f"Delete error: {e}")
|
||||
|
||||
async def upload_results(
|
||||
self,
|
||||
workflow_id: str,
|
||||
results: Dict[str, Any],
|
||||
results_format: str = "json"
|
||||
) -> str:
|
||||
"""Upload workflow results to S3/MinIO."""
|
||||
try:
|
||||
# Prepare results content
|
||||
if results_format == "json":
|
||||
content = json.dumps(results, indent=2).encode('utf-8')
|
||||
content_type = 'application/json'
|
||||
file_ext = 'json'
|
||||
elif results_format == "sarif":
|
||||
content = json.dumps(results, indent=2).encode('utf-8')
|
||||
content_type = 'application/sarif+json'
|
||||
file_ext = 'sarif'
|
||||
else:
|
||||
content = json.dumps(results, indent=2).encode('utf-8')
|
||||
content_type = 'application/json'
|
||||
file_ext = 'json'
|
||||
|
||||
# Upload to results bucket
|
||||
results_bucket = 'results'
|
||||
s3_key = f'{workflow_id}/results.{file_ext}'
|
||||
|
||||
logger.info(f"Uploading results to s3://{results_bucket}/{s3_key}")
|
||||
|
||||
self.s3_client.put_object(
|
||||
Bucket=results_bucket,
|
||||
Key=s3_key,
|
||||
Body=content,
|
||||
ContentType=content_type,
|
||||
Metadata={
|
||||
'workflow_id': workflow_id,
|
||||
'format': results_format,
|
||||
'uploaded_at': datetime.now().isoformat()
|
||||
}
|
||||
)
|
||||
|
||||
# Construct URL
|
||||
results_url = f"{self.endpoint_url}/{results_bucket}/{s3_key}"
|
||||
logger.info(f"✓ Uploaded results: {results_url}")
|
||||
|
||||
return results_url
|
||||
|
||||
except Exception as e:
|
||||
logger.error(f"Results upload failed: {e}", exc_info=True)
|
||||
raise StorageError(f"Results upload failed: {e}")
|
||||
|
||||
async def get_results(self, workflow_id: str) -> Dict[str, Any]:
|
||||
"""Get workflow results from S3/MinIO."""
|
||||
try:
|
||||
results_bucket = 'results'
|
||||
s3_key = f'{workflow_id}/results.json'
|
||||
|
||||
logger.info(f"Downloading results from s3://{results_bucket}/{s3_key}")
|
||||
|
||||
response = self.s3_client.get_object(
|
||||
Bucket=results_bucket,
|
||||
Key=s3_key
|
||||
)
|
||||
|
||||
content = response['Body'].read().decode('utf-8')
|
||||
results = json.loads(content)
|
||||
|
||||
logger.info(f"✓ Downloaded results for workflow {workflow_id}")
|
||||
return results
|
||||
|
||||
except ClientError as e:
|
||||
error_code = e.response.get('Error', {}).get('Code')
|
||||
if error_code in ['404', 'NoSuchKey']:
|
||||
logger.error(f"Results not found: {workflow_id}")
|
||||
raise FileNotFoundError(f"Results for workflow {workflow_id} not found")
|
||||
else:
|
||||
logger.error(f"Results download failed: {e}", exc_info=True)
|
||||
raise StorageError(f"Results download failed: {e}")
|
||||
except Exception as e:
|
||||
logger.error(f"Results download error: {e}", exc_info=True)
|
||||
raise StorageError(f"Results download error: {e}")
|
||||
|
||||
async def list_targets(
|
||||
self,
|
||||
user_id: Optional[str] = None,
|
||||
limit: int = 100
|
||||
) -> list[Dict[str, Any]]:
|
||||
"""List uploaded targets."""
|
||||
try:
|
||||
targets = []
|
||||
paginator = self.s3_client.get_paginator('list_objects_v2')
|
||||
|
||||
for page in paginator.paginate(Bucket=self.bucket, PaginationConfig={'MaxItems': limit}):
|
||||
for obj in page.get('Contents', []):
|
||||
# Get object metadata
|
||||
try:
|
||||
metadata_response = self.s3_client.head_object(
|
||||
Bucket=self.bucket,
|
||||
Key=obj['Key']
|
||||
)
|
||||
metadata = metadata_response.get('Metadata', {})
|
||||
|
||||
# Filter by user_id if specified
|
                        if user_id and metadata.get('user_id') != user_id:
                            continue

                        targets.append({
                            'target_id': obj['Key'].split('/')[0],
                            'key': obj['Key'],
                            'size': obj['Size'],
                            'last_modified': obj['LastModified'].isoformat(),
                            'metadata': metadata
                        })

                    except Exception as e:
                        logger.warning(f"Failed to get metadata for {obj['Key']}: {e}")
                        continue

            logger.info(f"Listed {len(targets)} targets (user_id={user_id})")
            return targets

        except Exception as e:
            logger.error(f"List targets failed: {e}", exc_info=True)
            raise StorageError(f"List targets failed: {e}")

    async def cleanup_cache(self) -> int:
        """Clean up local cache using LRU eviction."""
        try:
            cache_files = []
            total_size = 0

            # Gather all cached files with metadata
            for cache_file in self.cache_dir.rglob('*'):
                if cache_file.is_file():
                    try:
                        stat = cache_file.stat()
                        cache_files.append({
                            'path': cache_file,
                            'size': stat.st_size,
                            'atime': stat.st_atime  # Last access time
                        })
                        total_size += stat.st_size
                    except Exception as e:
                        logger.warning(f"Failed to stat {cache_file}: {e}")
                        continue

            # Check if cleanup is needed
            if total_size <= self.cache_max_size:
                logger.info(
                    f"Cache size OK: {total_size / (1024**3):.2f} GB / "
                    f"{self.cache_max_size / (1024**3):.2f} GB"
                )
                return 0

            # Sort by access time (oldest first)
            cache_files.sort(key=lambda x: x['atime'])

            # Remove files until under limit
            removed_count = 0
            for file_info in cache_files:
                if total_size <= self.cache_max_size:
                    break

                try:
                    file_info['path'].unlink()
                    total_size -= file_info['size']
                    removed_count += 1
                    logger.debug(f"Evicted from cache: {file_info['path']}")
                except Exception as e:
                    logger.warning(f"Failed to delete {file_info['path']}: {e}")
                    continue

            logger.info(
                f"✓ Cache cleanup: removed {removed_count} files, "
                f"new size: {total_size / (1024**3):.2f} GB"
            )
            return removed_count

        except Exception as e:
            logger.error(f"Cache cleanup failed: {e}", exc_info=True)
            raise StorageError(f"Cache cleanup failed: {e}")

    def get_cache_stats(self) -> Dict[str, Any]:
        """Get cache statistics."""
        try:
            total_size = 0
            file_count = 0

            for cache_file in self.cache_dir.rglob('*'):
                if cache_file.is_file():
                    total_size += cache_file.stat().st_size
                    file_count += 1

            return {
                'total_size_bytes': total_size,
                'total_size_gb': total_size / (1024 ** 3),
                'file_count': file_count,
                'max_size_gb': self.cache_max_size / (1024 ** 3),
                'usage_percent': (total_size / self.cache_max_size) * 100
            }
        except Exception as e:
            logger.error(f"Failed to get cache stats: {e}")
            return {'error': str(e)}
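The LRU eviction in `cleanup_cache` sorts cached files by last access time and deletes oldest-first until the total fits under the budget. A minimal standalone sketch of that loop (file records and the 500-byte budget are invented for illustration):

```python
def evict_lru(files, max_size):
    """files: dicts with 'name', 'size', 'atime'; returns names evicted, oldest first."""
    total = sum(f['size'] for f in files)
    evicted = []
    # Oldest access time first, mirroring cache_files.sort(key=lambda x: x['atime'])
    for f in sorted(files, key=lambda x: x['atime']):
        if total <= max_size:
            break
        total -= f['size']
        evicted.append(f['name'])
    return evicted

files = [
    {'name': 'old.bin', 'size': 600, 'atime': 1},
    {'name': 'mid.bin', 'size': 300, 'atime': 2},
    {'name': 'new.bin', 'size': 200, 'atime': 3},
]
print(evict_lru(files, 500))  # ['old.bin']
```

Evicting `old.bin` brings the total to exactly 500 bytes, so the loop stops without touching newer files.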
backend/src/temporal/__init__.py (Normal file, 10 lines)
@@ -0,0 +1,10 @@
"""
Temporal integration for FuzzForge.

Handles workflow execution, monitoring, and management.
"""

from .manager import TemporalManager
from .discovery import WorkflowDiscovery

__all__ = ["TemporalManager", "WorkflowDiscovery"]
backend/src/temporal/discovery.py (Normal file, 257 lines)
@@ -0,0 +1,257 @@
"""
Workflow Discovery for Temporal

Discovers workflows from the toolbox/workflows directory
and provides metadata about available workflows.
"""

import logging
import yaml
from pathlib import Path
from typing import Dict, Any
from pydantic import BaseModel, Field, ConfigDict

logger = logging.getLogger(__name__)


class WorkflowInfo(BaseModel):
    """Information about a discovered workflow"""
    name: str = Field(..., description="Workflow name")
    path: Path = Field(..., description="Path to workflow directory")
    workflow_file: Path = Field(..., description="Path to workflow.py file")
    metadata: Dict[str, Any] = Field(..., description="Workflow metadata from YAML")
    workflow_type: str = Field(..., description="Workflow class name")
    vertical: str = Field(..., description="Vertical (worker type) for this workflow")

    model_config = ConfigDict(arbitrary_types_allowed=True)


class WorkflowDiscovery:
    """
    Discovers workflows from the filesystem.

    Scans toolbox/workflows/ for directories containing:
    - metadata.yaml (required)
    - workflow.py (required)

    Each workflow declares its vertical (rust, android, web, etc.)
    which determines which worker pool will execute it.
    """

    def __init__(self, workflows_dir: Path):
        """
        Initialize workflow discovery.

        Args:
            workflows_dir: Path to the workflows directory
        """
        self.workflows_dir = workflows_dir
        if not self.workflows_dir.exists():
            self.workflows_dir.mkdir(parents=True, exist_ok=True)
            logger.info(f"Created workflows directory: {self.workflows_dir}")

    async def discover_workflows(self) -> Dict[str, WorkflowInfo]:
        """
        Discover workflows by scanning the workflows directory.

        Returns:
            Dictionary mapping workflow names to their information
        """
        workflows = {}

        logger.info(f"Scanning for workflows in: {self.workflows_dir}")

        for workflow_dir in self.workflows_dir.iterdir():
            if not workflow_dir.is_dir():
                continue

            # Skip special directories
            if workflow_dir.name.startswith('.') or workflow_dir.name == '__pycache__':
                continue

            metadata_file = workflow_dir / "metadata.yaml"
            if not metadata_file.exists():
                logger.debug(f"No metadata.yaml in {workflow_dir.name}, skipping")
                continue

            workflow_file = workflow_dir / "workflow.py"
            if not workflow_file.exists():
                logger.warning(
                    f"Workflow {workflow_dir.name} has metadata but no workflow.py, skipping"
                )
                continue

            try:
                # Parse metadata
                with open(metadata_file) as f:
                    metadata = yaml.safe_load(f)

                # Validate required fields
                if 'name' not in metadata:
                    logger.warning(f"Workflow {workflow_dir.name} metadata missing 'name' field")
                    metadata['name'] = workflow_dir.name

                if 'vertical' not in metadata:
                    logger.warning(
                        f"Workflow {workflow_dir.name} metadata missing 'vertical' field"
                    )
                    continue

                # Infer workflow class name from metadata or use convention
                workflow_type = metadata.get('workflow_class')
                if not workflow_type:
                    # Convention: convert snake_case to PascalCase + Workflow
                    # e.g., rust_test -> RustTestWorkflow
                    parts = workflow_dir.name.split('_')
                    workflow_type = ''.join(part.capitalize() for part in parts) + 'Workflow'

                # Create workflow info
                info = WorkflowInfo(
                    name=metadata['name'],
                    path=workflow_dir,
                    workflow_file=workflow_file,
                    metadata=metadata,
                    workflow_type=workflow_type,
                    vertical=metadata['vertical']
                )

                workflows[info.name] = info
                logger.info(
                    f"✓ Discovered workflow: {info.name} "
                    f"(vertical: {info.vertical}, class: {info.workflow_type})"
                )

            except Exception as e:
                logger.error(
                    f"Error discovering workflow {workflow_dir.name}: {e}",
                    exc_info=True
                )
                continue

        logger.info(f"Discovered {len(workflows)} workflows")
        return workflows

    def get_workflows_by_vertical(
        self,
        workflows: Dict[str, WorkflowInfo],
        vertical: str
    ) -> Dict[str, WorkflowInfo]:
        """
        Filter workflows by vertical.

        Args:
            workflows: All discovered workflows
            vertical: Vertical name to filter by

        Returns:
            Filtered workflows dictionary
        """
        return {
            name: info
            for name, info in workflows.items()
            if info.vertical == vertical
        }

    def get_available_verticals(self, workflows: Dict[str, WorkflowInfo]) -> list[str]:
        """
        Get list of all verticals from discovered workflows.

        Args:
            workflows: All discovered workflows

        Returns:
            List of unique vertical names
        """
        return list(set(info.vertical for info in workflows.values()))

    @staticmethod
    def get_metadata_schema() -> Dict[str, Any]:
        """
        Get the JSON schema for workflow metadata.

        Returns:
            JSON schema dictionary
        """
        return {
            "type": "object",
            "required": ["name", "version", "description", "author", "vertical", "parameters"],
            "properties": {
                "name": {
                    "type": "string",
                    "description": "Workflow name"
                },
                "version": {
                    "type": "string",
                    "pattern": "^\\d+\\.\\d+\\.\\d+$",
                    "description": "Semantic version (x.y.z)"
                },
                "vertical": {
                    "type": "string",
                    "description": "Vertical worker type (rust, android, web, etc.)"
                },
                "description": {
                    "type": "string",
                    "description": "Workflow description"
                },
                "author": {
                    "type": "string",
                    "description": "Workflow author"
                },
                "category": {
                    "type": "string",
                    "enum": ["comprehensive", "specialized", "fuzzing", "focused"],
                    "description": "Workflow category"
                },
                "tags": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Workflow tags for categorization"
                },
                "requirements": {
                    "type": "object",
                    "required": ["tools", "resources"],
                    "properties": {
                        "tools": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": "Required security tools"
                        },
                        "resources": {
                            "type": "object",
                            "required": ["memory", "cpu", "timeout"],
                            "properties": {
                                "memory": {
                                    "type": "string",
                                    "pattern": "^\\d+[GMK]i$",
                                    "description": "Memory limit (e.g., 1Gi, 512Mi)"
                                },
                                "cpu": {
                                    "type": "string",
                                    "pattern": "^\\d+m?$",
                                    "description": "CPU limit (e.g., 1000m, 2)"
                                },
                                "timeout": {
                                    "type": "integer",
                                    "minimum": 60,
                                    "maximum": 7200,
                                    "description": "Workflow timeout in seconds"
                                }
                            }
                        }
                    }
                },
                "parameters": {
                    "type": "object",
                    "description": "Workflow parameters schema"
                },
                "default_parameters": {
                    "type": "object",
                    "description": "Default parameter values"
                },
                "required_modules": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "Required module names"
                }
            }
        }
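A `metadata.yaml` satisfying this schema might look like the following sketch (the workflow name, tools, and limits are illustrative, not taken from the repository):

```yaml
name: rust_test
version: 1.0.0
description: Fuzz a Rust crate with cargo-fuzz
author: FuzzingLabs
vertical: rust            # selects the rust worker pool / task queue
category: fuzzing
tags: [rust, fuzzing]
requirements:
  tools: [cargo-fuzz]
  resources:
    memory: 1Gi           # must match ^\d+[GMK]i$
    cpu: 1000m            # must match ^\d+m?$
    timeout: 600          # seconds, 60..7200
parameters:
  type: object
  properties:
    max_iterations:
      type: integer
default_parameters:
  max_iterations: 1000
```

With no `workflow_class` key, discovery would derive the class name `RustTestWorkflow` from the directory name `rust_test`.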
backend/src/temporal/manager.py (Normal file, 376 lines)
@@ -0,0 +1,376 @@
"""
Temporal Manager - Workflow execution and management

Handles:
- Workflow discovery from toolbox
- Workflow execution (submit to Temporal)
- Status monitoring
- Results retrieval
"""

import logging
import os
from pathlib import Path
from typing import Dict, Optional, Any
from uuid import uuid4

from temporalio.client import Client, WorkflowHandle
from temporalio.common import RetryPolicy
from datetime import timedelta

from .discovery import WorkflowDiscovery, WorkflowInfo
from src.storage import S3CachedStorage

logger = logging.getLogger(__name__)


class TemporalManager:
    """
    Manages Temporal workflow execution for FuzzForge.

    This class:
    - Discovers available workflows from toolbox
    - Submits workflow executions to Temporal
    - Monitors workflow status
    - Retrieves workflow results
    """

    def __init__(
        self,
        workflows_dir: Optional[Path] = None,
        temporal_address: Optional[str] = None,
        temporal_namespace: str = "default",
        storage: Optional[S3CachedStorage] = None
    ):
        """
        Initialize Temporal manager.

        Args:
            workflows_dir: Path to workflows directory (default: toolbox/workflows)
            temporal_address: Temporal server address (default: from env or localhost:7233)
            temporal_namespace: Temporal namespace
            storage: Storage backend for file uploads (default: S3CachedStorage)
        """
        if workflows_dir is None:
            workflows_dir = Path("toolbox/workflows")

        self.temporal_address = temporal_address or os.getenv(
            'TEMPORAL_ADDRESS',
            'localhost:7233'
        )
        self.temporal_namespace = temporal_namespace
        self.discovery = WorkflowDiscovery(workflows_dir)
        self.workflows: Dict[str, WorkflowInfo] = {}
        self.client: Optional[Client] = None

        # Initialize storage backend
        self.storage = storage or S3CachedStorage()

        logger.info(
            f"TemporalManager initialized: {self.temporal_address} "
            f"(namespace: {self.temporal_namespace})"
        )

    async def initialize(self):
        """Initialize the manager by discovering workflows and connecting to Temporal."""
        try:
            # Discover workflows
            self.workflows = await self.discovery.discover_workflows()

            if not self.workflows:
                logger.warning("No workflows discovered")
            else:
                logger.info(
                    f"Discovered {len(self.workflows)} workflows: "
                    f"{list(self.workflows.keys())}"
                )

            # Connect to Temporal
            self.client = await Client.connect(
                self.temporal_address,
                namespace=self.temporal_namespace
            )
            logger.info(f"✓ Connected to Temporal: {self.temporal_address}")

        except Exception as e:
            logger.error(f"Failed to initialize Temporal manager: {e}", exc_info=True)
            raise

    async def close(self):
        """Close Temporal client connection."""
        if self.client:
            # Temporal client doesn't need explicit close in Python SDK
            pass

    async def get_workflows(self) -> Dict[str, WorkflowInfo]:
        """
        Get all discovered workflows.

        Returns:
            Dictionary mapping workflow names to their info
        """
        return self.workflows

    async def get_workflow(self, name: str) -> Optional[WorkflowInfo]:
        """
        Get workflow info by name.

        Args:
            name: Workflow name

        Returns:
            WorkflowInfo or None if not found
        """
        return self.workflows.get(name)

    async def upload_target(
        self,
        file_path: Path,
        user_id: str,
        metadata: Optional[Dict[str, Any]] = None
    ) -> str:
        """
        Upload target file to storage.

        Args:
            file_path: Local path to file
            user_id: User ID
            metadata: Optional metadata

        Returns:
            Target ID for use in workflow execution
        """
        target_id = await self.storage.upload_target(file_path, user_id, metadata)
        logger.info(f"Uploaded target: {target_id}")
        return target_id

    async def run_workflow(
        self,
        workflow_name: str,
        target_id: str,
        workflow_params: Optional[Dict[str, Any]] = None,
        workflow_id: Optional[str] = None
    ) -> WorkflowHandle:
        """
        Execute a workflow.

        Args:
            workflow_name: Name of workflow to execute
            target_id: Target ID (from upload_target)
            workflow_params: Additional workflow parameters
            workflow_id: Optional workflow ID (generated if not provided)

        Returns:
            WorkflowHandle for monitoring/results

        Raises:
            ValueError: If workflow not found or client not initialized
        """
        if not self.client:
            raise ValueError("Temporal client not initialized. Call initialize() first.")

        # Get workflow info
        workflow_info = self.workflows.get(workflow_name)
        if not workflow_info:
            raise ValueError(f"Workflow not found: {workflow_name}")

        # Generate workflow ID if not provided
        if not workflow_id:
            workflow_id = f"{workflow_name}-{str(uuid4())[:8]}"

        # Prepare workflow input arguments
        workflow_params = workflow_params or {}

        # Build args list: [target_id, ...workflow_params in schema order]
        # The workflow parameters are passed as individual positional args
        workflow_args = [target_id]

        # Add parameters in order based on metadata schema
        # This ensures parameters match the workflow signature order
        if workflow_params and 'parameters' in workflow_info.metadata:
            param_schema = workflow_info.metadata['parameters'].get('properties', {})
            # Iterate parameters in schema order and add values
            for param_name in param_schema.keys():
                param_value = workflow_params.get(param_name)
                workflow_args.append(param_value)

        # Determine task queue from workflow vertical
        vertical = workflow_info.metadata.get("vertical", "default")
        task_queue = f"{vertical}-queue"

        logger.info(
            f"Starting workflow: {workflow_name} "
            f"(id={workflow_id}, queue={task_queue}, target={target_id})"
        )
        logger.debug(f"workflow_args = {workflow_args}")
        logger.debug(f"workflow_params received = {workflow_params}")

        try:
            # Start workflow execution with positional arguments
            handle = await self.client.start_workflow(
                workflow=workflow_info.workflow_type,  # Workflow class name
                args=workflow_args,  # Positional arguments
                id=workflow_id,
                task_queue=task_queue,
                retry_policy=RetryPolicy(
                    initial_interval=timedelta(seconds=1),
                    maximum_interval=timedelta(minutes=1),
                    maximum_attempts=3
                )
            )

            logger.info(f"✓ Workflow started: {workflow_id}")
            return handle

        except Exception as e:
            logger.error(f"Failed to start workflow {workflow_name}: {e}", exc_info=True)
            raise

    async def get_workflow_status(self, workflow_id: str) -> Dict[str, Any]:
        """
        Get workflow execution status.

        Args:
            workflow_id: Workflow execution ID

        Returns:
            Status dictionary with workflow state

        Raises:
            ValueError: If client not initialized or workflow not found
        """
        if not self.client:
            raise ValueError("Temporal client not initialized")

        try:
            # Get workflow handle
            handle = self.client.get_workflow_handle(workflow_id)

            # Non-blocking describe of the execution
            description = await handle.describe()

            status = {
                "workflow_id": workflow_id,
                "status": description.status.name,
                "start_time": description.start_time.isoformat() if description.start_time else None,
                "execution_time": description.execution_time.isoformat() if description.execution_time else None,
                "close_time": description.close_time.isoformat() if description.close_time else None,
                "task_queue": description.task_queue,
            }

            logger.info(f"Workflow {workflow_id} status: {status['status']}")
            return status

        except Exception as e:
            logger.error(f"Failed to get workflow status: {e}", exc_info=True)
            raise

    async def get_workflow_result(
        self,
        workflow_id: str,
        timeout: Optional[timedelta] = None
    ) -> Any:
        """
        Get workflow execution result (blocking).

        Args:
            workflow_id: Workflow execution ID
            timeout: Maximum time to wait for result

        Returns:
            Workflow result

        Raises:
            ValueError: If client not initialized
            TimeoutError: If timeout exceeded
        """
        if not self.client:
            raise ValueError("Temporal client not initialized")

        try:
            handle = self.client.get_workflow_handle(workflow_id)

            logger.info(f"Waiting for workflow result: {workflow_id}")

            # Wait for workflow to complete and get result
            if timeout:
                # Use asyncio timeout if provided
                import asyncio
                result = await asyncio.wait_for(handle.result(), timeout=timeout.total_seconds())
            else:
                result = await handle.result()

            logger.info(f"✓ Workflow {workflow_id} completed")
            return result

        except Exception as e:
            logger.error(f"Failed to get workflow result: {e}", exc_info=True)
            raise

    async def cancel_workflow(self, workflow_id: str) -> None:
        """
        Cancel a running workflow.

        Args:
            workflow_id: Workflow execution ID

        Raises:
            ValueError: If client not initialized
        """
        if not self.client:
            raise ValueError("Temporal client not initialized")

        try:
            handle = self.client.get_workflow_handle(workflow_id)
            await handle.cancel()

            logger.info(f"✓ Workflow cancelled: {workflow_id}")

        except Exception as e:
            logger.error(f"Failed to cancel workflow: {e}", exc_info=True)
            raise

    async def list_workflows(
        self,
        filter_query: Optional[str] = None,
        limit: int = 100
    ) -> list[Dict[str, Any]]:
        """
        List workflow executions.

        Args:
            filter_query: Optional Temporal list filter query
            limit: Maximum number of results

        Returns:
            List of workflow execution info

        Raises:
            ValueError: If client not initialized
        """
        if not self.client:
            raise ValueError("Temporal client not initialized")

        try:
            workflows = []

            # Use Temporal's list API
            async for workflow in self.client.list_workflows(filter_query):
                workflows.append({
                    "workflow_id": workflow.id,
                    "workflow_type": workflow.workflow_type,
                    "status": workflow.status.name,
                    "start_time": workflow.start_time.isoformat() if workflow.start_time else None,
                    "close_time": workflow.close_time.isoformat() if workflow.close_time else None,
                    "task_queue": workflow.task_queue,
                })

                if len(workflows) >= limit:
                    break

            logger.info(f"Listed {len(workflows)} workflows")
            return workflows

        except Exception as e:
            logger.error(f"Failed to list workflows: {e}", exc_info=True)
            raise
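`run_workflow` marshals positional arguments as the target ID followed by the parameter values in the order the metadata schema declares them, with omitted parameters passed as `None`. That ordering rule in isolation (the schema keys and values here are made up):

```python
def build_args(target_id, params, schema_properties):
    # Target ID always comes first, then each schema-declared parameter
    # in declaration order; parameters the caller omitted become None.
    args = [target_id]
    for name in schema_properties:
        args.append(params.get(name))
    return args

schema = {'max_iterations': {}, 'timeout_seconds': {}}
print(build_args('tgt-1234', {'timeout_seconds': 60}, schema))
# ['tgt-1234', None, 60]
```

Because Python dicts preserve insertion order, the YAML `properties` order is what determines the positional layout the workflow signature must match.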
backend/tests/README.md (Normal file, 119 lines)
@@ -0,0 +1,119 @@
# FuzzForge Test Suite

Comprehensive test infrastructure for FuzzForge modules and workflows.

## Directory Structure

```
tests/
├── conftest.py              # Shared pytest fixtures
├── unit/                    # Fast, isolated unit tests
│   ├── test_modules/        # Module-specific tests
│   │   ├── test_cargo_fuzzer.py
│   │   └── test_atheris_fuzzer.py
│   ├── test_workflows/      # Workflow tests
│   └── test_api/            # API endpoint tests
├── integration/             # Integration tests (requires Docker)
└── fixtures/                # Test data and projects
    ├── test_projects/       # Vulnerable projects for testing
    └── expected_results/    # Expected output for validation
```

## Running Tests

### All Tests
```bash
cd backend
pytest tests/ -v
```

### Unit Tests Only (Fast)
```bash
pytest tests/unit/ -v
```

### Integration Tests (Requires Docker)
```bash
# Start services
docker-compose up -d

# Run integration tests
pytest tests/integration/ -v

# Cleanup
docker-compose down
```

### With Coverage
```bash
pytest tests/ --cov=toolbox/modules --cov=src --cov-report=html
```

### Parallel Execution
```bash
pytest tests/unit/ -n auto
```

## Available Fixtures

### Workspace Fixtures
- `temp_workspace`: Empty temporary workspace
- `python_test_workspace`: Python project with vulnerabilities
- `rust_test_workspace`: Rust project with fuzz targets

### Module Fixtures
- `atheris_fuzzer`: AtherisFuzzer instance
- `cargo_fuzzer`: CargoFuzzer instance
- `file_scanner`: FileScanner instance

### Configuration Fixtures
- `atheris_config`: Default Atheris configuration
- `cargo_fuzz_config`: Default cargo-fuzz configuration
- `gitleaks_config`: Default Gitleaks configuration

### Mock Fixtures
- `mock_stats_callback`: Mock stats callback for fuzzing
- `mock_temporal_context`: Mock Temporal activity context

## Writing Tests

### Unit Test Example
```python
import pytest

@pytest.mark.asyncio
async def test_module_execution(cargo_fuzzer, rust_test_workspace, cargo_fuzz_config):
    """Test module execution"""
    result = await cargo_fuzzer.execute(cargo_fuzz_config, rust_test_workspace)

    assert result.status == "success"
    assert result.execution_time > 0
```

### Integration Test Example
```python
@pytest.mark.integration
async def test_end_to_end_workflow():
    """Test complete workflow execution"""
    # Test full workflow with real services
    pass
```

## CI/CD Integration

Tests run automatically on:
- **Push to main/develop**: Full test suite
- **Pull requests**: Full test suite + coverage
- **Nightly**: Extended integration tests

See `.github/workflows/test.yml` for configuration.

## Code Coverage

Target coverage: **80%+** for core modules

View coverage report:
```bash
pytest tests/ --cov --cov-report=html
open htmlcov/index.html
```
backend/tests/conftest.py (Normal file, 230 lines)
@@ -0,0 +1,230 @@
# Copyright (c) 2025 FuzzingLabs
#
# Licensed under the Business Source License 1.1 (BSL). See the LICENSE file
# at the root of this repository for details.
#
# After the Change Date (four years from publication), this version of the
# Licensed Work will be made available under the Apache License, Version 2.0.
# See the LICENSE-APACHE file or http://www.apache.org/licenses/LICENSE-2.0
#
# Additional attribution and requirements are provided in the NOTICE file.

import sys
from pathlib import Path
from typing import Dict, Any
import pytest

# Ensure project root is on sys.path so `src` is importable
ROOT = Path(__file__).resolve().parents[1]
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

# Add toolbox to path for module imports
TOOLBOX = ROOT / "toolbox"
if str(TOOLBOX) not in sys.path:
    sys.path.insert(0, str(TOOLBOX))


# ============================================================================
# Workspace Fixtures
# ============================================================================

@pytest.fixture
def temp_workspace(tmp_path):
    """Create a temporary workspace directory for testing"""
    workspace = tmp_path / "workspace"
    workspace.mkdir()
    return workspace


@pytest.fixture
def python_test_workspace(temp_workspace):
    """Create a Python test workspace with sample files"""
    # Create a simple Python project structure
    (temp_workspace / "main.py").write_text("""
def process_data(data):
    # Intentional bug: no bounds checking
    return data[0:100]

def divide(a, b):
    # Division by zero vulnerability
    return a / b
""")

    (temp_workspace / "config.py").write_text("""
# Hardcoded secrets for testing
API_KEY = "sk_test_1234567890abcdef"
DATABASE_URL = "postgresql://admin:password123@localhost/db"
AWS_SECRET = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
""")

    return temp_workspace


@pytest.fixture
def rust_test_workspace(temp_workspace):
    """Create a Rust test workspace with fuzz targets"""
    # Create Cargo.toml
    (temp_workspace / "Cargo.toml").write_text("""[package]
name = "test_project"
version = "0.1.0"
edition = "2021"

[dependencies]
""")

    # Create src/lib.rs
    src_dir = temp_workspace / "src"
    src_dir.mkdir()
    (src_dir / "lib.rs").write_text("""
pub fn process_buffer(data: &[u8]) -> Vec<u8> {
    if data.len() < 4 {
        return Vec::new();
    }

    // Vulnerability: bounds checking issue
    let size = data[0] as usize;
    let mut result = Vec::new();
    for i in 0..size {
        result.push(data[i]);
    }
    result
}
""")

    # Create fuzz directory structure
    fuzz_dir = temp_workspace / "fuzz"
    fuzz_dir.mkdir()

    (fuzz_dir / "Cargo.toml").write_text("""[package]
name = "test_project-fuzz"
version = "0.0.0"
edition = "2021"

[dependencies]
libfuzzer-sys = "0.4"

[dependencies.test_project]
path = ".."

[[bin]]
name = "fuzz_target_1"
path = "fuzz_targets/fuzz_target_1.rs"
""")

    fuzz_targets_dir = fuzz_dir / "fuzz_targets"
    fuzz_targets_dir.mkdir()

    (fuzz_targets_dir / "fuzz_target_1.rs").write_text("""#![no_main]
use libfuzzer_sys::fuzz_target;
use test_project::process_buffer;

fuzz_target!(|data: &[u8]| {
    let _ = process_buffer(data);
});
""")

    return temp_workspace


# ============================================================================
# Module Configuration Fixtures
# ============================================================================

@pytest.fixture
def atheris_config():
    """Default Atheris fuzzer configuration"""
    return {
        "target_file": "auto-discover",
        "max_iterations": 1000,
        "timeout_seconds": 10,
        "corpus_dir": None
    }


@pytest.fixture
def cargo_fuzz_config():
    """Default cargo-fuzz configuration"""
    return {
        "target_name": None,
        "max_iterations": 1000,
        "timeout_seconds": 10,
        "sanitizer": "address"
    }


@pytest.fixture
def gitleaks_config():
    """Default Gitleaks configuration"""
    return {
        "config_path": None,
        "scan_uncommitted": True
    }


@pytest.fixture
def file_scanner_config():
    """Default file scanner configuration"""
    return {
        "scan_patterns": ["*.py", "*.rs", "*.js"],
        "exclude_patterns": ["*.test.*", "*.spec.*"],
        "max_file_size": 1048576  # 1 MB
    }


# ============================================================================
# Module Instance Fixtures
# ============================================================================

@pytest.fixture
def atheris_fuzzer():
    """Create an AtherisFuzzer instance"""
    from modules.fuzzer.atheris_fuzzer import AtherisFuzzer
    return AtherisFuzzer()


@pytest.fixture
def cargo_fuzzer():
    """Create a CargoFuzzer instance"""
    from modules.fuzzer.cargo_fuzzer import CargoFuzzer
    return CargoFuzzer()


@pytest.fixture
def file_scanner():
    """Create a FileScanner instance"""
    from modules.scanner.file_scanner import FileScanner
    return FileScanner()


# ============================================================================
# Mock Fixtures
# ============================================================================

@pytest.fixture
def mock_stats_callback():
    """Mock stats callback for fuzzing"""
    stats_received = []

    async def callback(stats: Dict[str, Any]):
        stats_received.append(stats)

    callback.stats_received = stats_received
    return callback


@pytest.fixture
def mock_temporal_context():
    """Mock Temporal activity context"""
    class MockActivityInfo:
        def __init__(self):
            self.workflow_id = "test-workflow-123"
            self.activity_id = "test-activity-1"
            self.attempt = 1

    class MockContext:
        def __init__(self):
            self.info = MockActivityInfo()

    return MockContext()
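The `mock_stats_callback` fixture is just an async function that records every stats dict it is awaited with. Stripped of the pytest machinery, the pattern looks like this (the stats payload is invented):

```python
import asyncio

# Standalone version of the mock_stats_callback pattern
stats_received = []

async def callback(stats):
    # Record each stats payload for later assertions
    stats_received.append(stats)

asyncio.run(callback({'execs': 1000, 'crashes': 1}))
print(stats_received)  # [{'execs': 1000, 'crashes': 1}]
```

A fuzzer module under test awaits the callback periodically; afterwards the test inspects `callback.stats_received` to verify progress reporting happened.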
Some files were not shown because too many files have changed in this diff.