NeuroSploit v3.3.0 — Autonomous MD-Agent Engine

Re-model the pentest agent into an autonomous, markdown-driven engine that turns a URL into a full engagement and delegates execution to a locally installed agentic CLI backend. Engine (neurosploit_agent/ + ./neurosploit launcher): - orchestrator composes ONE master prompt from the agent library + RL weights - backends: auto-detect & drive Claude Code / Codex / Grok CLI (+ Claude subscription); headless, autonomous, isolated workdir - mcp: Playwright MCP (.mcp.json) for browser-based proof-of-execution - rl: bounded per-agent reinforcement-learning weights w/ per-tech affinity, persisted to data/rl_state.json - models: latest registry incl. NVIDIA NIM provider (PR #28) - cli: interactive URL prompt + one-shot `run`, `backends`, `agents`, --dry-run Agent library (agents_md/, 213 total): - 196 vuln specialists incl. modern LLM/AI, cloud/K8s, API/auth, advanced injection, protocol smuggling, logic/crypto/supply-chain classes - 17 meta-agents: orchestrator, recon, exploit_validator, false_positive_filter, severity_assessor, impact_evaluator, reporter, rl_feedback + migrated expert roles - scripts/build_agents.py data-driven builder; REGISTRY.md index Docs: rewritten README.md, v3.3.0 RELEASE.md, .env.example (NVIDIA NIM, xAI, engine vars). Retire legacy Python orchestration (neurosploit.py + agent classes) to legacy/. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 07:15:30 +02:00 · 2026-06-14 20:57:38 -03:00
parent 59f8f42d80
commit 55af0d4634
248 changed files with 18707 additions and 577 deletions
@@ -1,625 +1,178 @@
-# NeuroSploit v3
+# NeuroSploit v3.3.0

-![NeuroSploit](https://img.shields.io/badge/NeuroSploit-AI--Powered%20Pentesting-blueviolet)
-![Version](https://img.shields.io/badge/Version-3.0.0-blue)
+![NeuroSploit](https://img.shields.io/badge/NeuroSploit-Autonomous%20AI%20Pentest-blueviolet)
+![Version](https://img.shields.io/badge/Version-3.3.0-blue)
 ![License](https://img.shields.io/badge/License-MIT-green)
-![Python](https://img.shields.io/badge/Python-3.10+-yellow)
-![React](https://img.shields.io/badge/React-18-61dafb)
-![Vuln Types](https://img.shields.io/badge/Vuln%20Types-100-red)
-![Docker](https://img.shields.io/badge/Docker-Kali%20Sandbox-informational)
+![Agents](https://img.shields.io/badge/MD%20Agents-213-red)
+![Backends](https://img.shields.io/badge/CLI%20Backends-Claude%20%7C%20Codex%20%7C%20Grok-informational)
+![MCP](https://img.shields.io/badge/MCP-Playwright-orange)

-**AI-Powered Autonomous Penetration Testing Platform**
+**Autonomous, markdown-driven AI penetration testing.**

-NeuroSploit v3 is an advanced security assessment platform that combines AI-driven autonomous agents with 100 vulnerability types, per-scan isolated Kali Linux containers, false-positive hardening, exploit chaining, and a modern React web interface with real-time monitoring.
+NeuroSploit v3.3.0 is a ground-up re-model of the pentest agent. Instead of a
+monolithic Python orchestrator, it is now a **lean engine that turns a URL into
+an autonomous engagement**: it composes a master prompt from a curated library
+of **213 markdown agents** and hands execution to whichever **agentic CLI
+backend** you have installed — **Claude Code, Codex, or Grok CLI** (or a Claude
+subscription) — augmented with **Playwright MCP** for real browser-based proof,
+and a **reinforcement-learning** loop that gets smarter every run.
+
+> The previous Python orchestration now lives in [`legacy/`](legacy/README.md).

 ---

-## Highlights
+## Why this architecture

- **100 Vulnerability Types** across 10 categories with AI-driven testing prompts
- **Autonomous Agent** - 3-stream parallel pentest (recon + junior tester + tool runner)
- **Per-Scan Kali Containers** - Each scan runs in its own isolated Docker container
- **Anti-Hallucination Pipeline** - Negative controls, proof-of-execution, confidence scoring
- **Exploit Chain Engine** - Automatically chains findings (SSRF->internal, SQLi->DB-specific, etc.)
- **WAF Detection & Bypass** - 16 WAF signatures, 12 bypass techniques
- **Smart Strategy Adaptation** - Dead endpoint detection, diminishing returns, priority recomputation
- **Multi-Provider LLM** - Claude, GPT, Gemini, Ollama, LMStudio, OpenRouter
- **Real-Time Dashboard** - WebSocket-powered live scan progress, findings, and reports
- **Sandbox Dashboard** - Monitor running Kali containers, tools, health checks in real-time
+| Old (≤ v3.2.4) | New (v3.3.0) |
+|----------------|-------------|
+| 2,500-line Python orchestrator + hand-coded agent classes | Markdown agents + thin engine |
+| One embedded LLM loop | Pluggable agentic CLI backends (Claude/Codex/Grok) |
+| Provider SDK juggling | Backend owns the agent loop; engine just composes & collects |
+| Static agent list | RL-weighted, recon-aware agent selection |
+| Reflection-based "evidence" | Playwright MCP proof-of-execution + adversarial validation |

 ---

-## Table of Contents
+## How it works

- [Quick Start](#quick-start)
- [Architecture](#architecture)
- [Autonomous Agent](#autonomous-agent)
- [100 Vulnerability Types](#100-vulnerability-types)
- [Kali Sandbox System](#kali-sandbox-system)
- [Anti-Hallucination & Validation](#anti-hallucination--validation)
- [Web GUI](#web-gui)
- [API Reference](#api-reference)
- [Configuration](#configuration)
- [Development](#development)
- [Security Notice](#security-notice)
+```
+          ┌──────────────────────────────────────────────────────────────┐
+   URL ──▶ │  neurosploit (terminal)                                       │
+          │     │                                                          │
+          │     ▼                                                          │
+          │  orchestrator ── loads agents_md/ (213) ── applies RL weights  │
+          │     │                                                          │
+          │     ▼  composes ONE master prompt                              │
+          │  backend (Claude Code | Codex | Grok)  ◀── Playwright MCP      │
+          │     │  autonomously runs the pipeline below                    │
+          │     ▼                                                          │
+          │  recon → select agents → exploit → VALIDATE → filter FPs       │
+          │        → severity → impact → report → RL feedback              │
+          └──────────────────────────────────────────────────────────────┘
+                       │                          │
+                       ▼                          ▼
+              results/findings.json        data/rl_state.json (learns)
+```
+
+The engine never fabricates findings: every candidate is independently
+re-exploited (`meta/exploit_validator`), run through an adversarial skeptic
+(`meta/false_positive_filter`), and only then scored and reported.

 ---

-## Quick Start
+## The agent library (`agents_md/`)

-### Option 1: Docker (Recommended)
+**213 agents** — see [`agents_md/REGISTRY.md`](agents_md/REGISTRY.md).
+
+- **196 vulnerability specialists** (`agents_md/vulns/`) — each a self-contained
+  playbook with a real methodology, payloads, CWE mapping, and a strict
+  anti-false-positive `## System Prompt`. Coverage includes the classic OWASP
+  web set **plus modern classes**:
+  - **LLM/AI security** (OWASP LLM Top 10): prompt injection (direct/indirect),
+    jailbreak, system-prompt leak, insecure output handling, RAG poisoning,
+    tool-invocation/function-calling abuse, excessive agency, PII leakage…
+  - **Cloud/K8s/containers**: IMDS SSRF (AWS/GCP/Azure), kubelet/dashboard
+    exposure, container & docker-socket escape, bucket takeover, IAM privesc…
+  - **Modern API/auth**: JWT alg/kid/jwk confusion, OAuth PKCE downgrade, SAML
+    XSW, OIDC, CSWSH, refresh-token & MFA bypass, account-takeover chains…
+  - **Advanced injection**: SSTI (Jinja2/FreeMarker/Velocity/Thymeleaf), SSPP,
+    XXE OOB, YAML/pickle deserialization, JNDI, XSLT…
+  - **Protocol/cache/smuggling**: HTTP/2 & CL.TE/TE.CL desync, h2c, web cache
+    deception/poisoning, response splitting, path-confusion…
+  - **Logic/crypto/supply-chain**: dependency confusion, padding oracle, weak
+    JWT secret, price/coupon/workflow abuse, exposed `.git`/`.env`/CI secrets…
+
+- **17 meta-agents** (`agents_md/meta/`): `orchestrator`, `recon`,
+  `exploit_validator`, `false_positive_filter`, `severity_assessor`,
+  `impact_evaluator`, `reporter`, `rl_feedback`, plus migrated expert roles.
+
+Add your own by dropping a `.md` into `agents_md/vulns/` (or extend the
+data-driven builder, `scripts/build_agents.py`). It is picked up automatically.
+
+---
+
+## Quickstart

 ```bash
-# Clone repository
-git clone https://github.com/your-org/NeuroSploitv2.git
-cd NeuroSploitv2
+# 1. Have at least one agentic CLI installed: Claude Code, Codex, or Grok CLI
+#    (Playwright MCP needs Node/npx)
+./neurosploit backends          # show what's detected
+./neurosploit agents            # {'vulns': 196, 'meta': 17, 'total': 213}

-# Copy environment file and add your API keys
-cp .env.example .env
-nano .env  # Add ANTHROPIC_API_KEY, OPENAI_API_KEY, or GEMINI_API_KEY
+# 2. Interactive: enter a URL, pick a backend + model, go
+./neurosploit

-# Build the Kali sandbox image (first time only, ~5 min)
-./scripts/build-kali.sh
+# 3. Or one-shot:
+./neurosploit run https://target.example \
+    --backend claude --model claude-opus-4-8 \
+    --collaborator oob.your-collab.net

-# Start backend
-uvicorn backend.main:app --host 0.0.0.0 --port 8000
+# 4. Preview the composed master prompt without executing the backend:
+./neurosploit run https://target.example --dry-run
 ```

-### Option 2: Manual Setup
+Outputs land in `results/<target>/findings.json` and `reports/`, and the RL
+state updates in `data/rl_state.json`.

-```bash
-# Backend
-pip install -r requirements.txt
-uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload
+### Backends

-# Frontend (new terminal)
-cd frontend
-npm install
-npm run dev
-```
+| Backend | Binary | Autonomy flag | Subscription |
+|---------|--------|---------------|--------------|
+| Claude Code | `claude` | `--dangerously-skip-permissions` | ✅ via Claude login |
+| Codex CLI | `codex` | `--dangerously-bypass-approvals-and-sandbox` | — |
+| Grok CLI | `grok` | `--yolo` | — |

-### Build Kali Sandbox Image
+The engine auto-detects installed backends and only offers those. In the
+interactive flow, answering **yes** to "Use Claude subscription" runs Claude Code
+against your logged-in subscription instead of an API key.

-```bash
-# Normal build (uses Docker cache)
-./scripts/build-kali.sh
+### Models

-# Full rebuild (no cache)
-./scripts/build-kali.sh --fresh
-
-# Build + run health check
-./scripts/build-kali.sh --test
-
-# Or via docker-compose
-docker compose -f docker/docker-compose.kali.yml build
-```
-
-Access the web interface at **http://localhost:8000** (production build) or **http://localhost:5173** (dev mode).
+Latest models per provider live in `neurosploit_agent/models.py`, including the
+**NVIDIA NIM** provider (PR #28, OpenAI-compatible at
+`https://integrate.api.nvidia.com/v1`, `nvapi-` keys), Anthropic Claude 4.x,
+OpenAI, xAI Grok, Gemini, OpenRouter, and local Ollama.

 ---

-## Architecture
+## Reinforcement learning

-```
-NeuroSploitv3/
-├── backend/                         # FastAPI Backend
-│   ├── api/v1/                      # REST API (13 routers)
-│   │   ├── scans.py                 # Scan CRUD + pause/resume/stop
-│   │   ├── agent.py                 # AI Agent control
-│   │   ├── agent_tasks.py           # Scan task tracking
-│   │   ├── dashboard.py             # Stats + activity feed
-│   │   ├── reports.py               # Report generation (HTML/PDF/JSON)
-│   │   ├── scheduler.py             # Cron/interval scheduling
-│   │   ├── vuln_lab.py              # Per-type vulnerability lab
-│   │   ├── terminal.py              # Terminal agent (10 endpoints)
-│   │   ├── sandbox.py               # Sandbox container monitoring
-│   │   ├── targets.py               # Target validation
-│   │   ├── prompts.py               # Preset prompts
-│   │   ├── vulnerabilities.py       # Vulnerability management
-│   │   └── settings.py              # Runtime settings
-│   ├── core/
-│   │   ├── autonomous_agent.py      # Main AI agent (~7000 lines)
-│   │   ├── vuln_engine/             # 100-type vulnerability engine
-│   │   │   ├── registry.py          # 100 VULNERABILITY_INFO entries
-│   │   │   ├── payload_generator.py # 526 payloads across 95 libraries
-│   │   │   ├── ai_prompts.py        # Per-vuln AI decision prompts
-│   │   │   ├── system_prompts.py    # 12 anti-hallucination prompts
-│   │   │   └── testers/             # 10 category tester modules
-│   │   ├── validation/              # False-positive hardening
-│   │   │   ├── negative_control.py  # Benign request control engine
-│   │   │   ├── proof_of_execution.py # Per-type proof checks (25+ methods)
-│   │   │   ├── confidence_scorer.py # Numeric 0-100 scoring
-│   │   │   └── validation_judge.py  # Sole authority for finding approval
-│   │   ├── request_engine.py        # Retry, rate limit, circuit breaker
-│   │   ├── waf_detector.py          # 16 WAF signatures + bypass
-│   │   ├── strategy_adapter.py      # Mid-scan strategy adaptation
-│   │   ├── chain_engine.py          # 10 exploit chain rules
-│   │   ├── auth_manager.py          # Multi-user auth management
-│   │   ├── xss_context_analyzer.py  # 8-context XSS analysis
-│   │   ├── poc_generator.py         # 20+ per-type PoC generators
-│   │   ├── execution_history.py     # Cross-scan learning
-│   │   ├── access_control_learner.py # Adaptive BOLA/BFLA/IDOR learning
-│   │   ├── response_verifier.py     # 4-signal response verification
-│   │   ├── agent_memory.py          # Bounded dedup agent memory
-│   │   └── report_engine/           # OHVR report generator
-│   ├── models/                      # SQLAlchemy ORM models
-│   ├── db/                          # Database layer
-│   ├── config.py                    # Pydantic settings
-│   └── main.py                      # FastAPI app entry
-│
-├── core/                            # Shared core modules
-│   ├── llm_manager.py               # Multi-provider LLM routing
-│   ├── sandbox_manager.py           # BaseSandbox ABC + legacy shared sandbox
-│   ├── kali_sandbox.py              # Per-scan Kali container manager
-│   ├── container_pool.py            # Global container pool coordinator
-│   ├── tool_registry.py             # 56 tool install recipes for Kali
-│   ├── mcp_server.py                # MCP server (12 tools, stdio)
-│   ├── scheduler.py                 # APScheduler scan scheduling
-│   └── browser_validator.py         # Playwright browser validation
-│
-├── frontend/                        # React + TypeScript Frontend
-│   ├── src/
-│   │   ├── pages/
-│   │   │   ├── HomePage.tsx             # Dashboard with stats
-│   │   │   ├── AutoPentestPage.tsx      # 3-stream auto pentest
-│   │   │   ├── VulnLabPage.tsx          # Per-type vulnerability lab
-│   │   │   ├── TerminalAgentPage.tsx    # AI terminal chat
-│   │   │   ├── SandboxDashboardPage.tsx # Container monitoring
-│   │   │   ├── ScanDetailsPage.tsx      # Findings + validation
-│   │   │   ├── SchedulerPage.tsx        # Cron/interval scheduling
-│   │   │   ├── SettingsPage.tsx         # Configuration
-│   │   │   └── ReportsPage.tsx          # Report management
-│   │   ├── components/              # Reusable UI components
-│   │   ├── services/api.ts          # API client layer
-│   │   └── types/index.ts           # TypeScript interfaces
-│   └── package.json
-│
-├── docker/
-│   ├── Dockerfile.kali              # Multi-stage Kali sandbox (11 Go tools)
-│   ├── Dockerfile.sandbox           # Legacy Debian sandbox
-│   ├── Dockerfile.backend           # Backend container
-│   ├── Dockerfile.frontend          # Frontend container
-│   ├── docker-compose.kali.yml      # Kali sandbox build
-│   └── docker-compose.sandbox.yml   # Legacy sandbox
-│
-├── config/config.json               # Profiles, tools, sandbox, MCP
-├── data/
-│   ├── vuln_knowledge_base.json     # 100 vuln type definitions
-│   ├── execution_history.json       # Cross-scan learning data
-│   └── access_control_learning.json # BOLA/BFLA adaptive data
-│
-├── scripts/
-│   └── build-kali.sh               # Build/rebuild Kali image
-├── tools/
-│   └── benchmark_runner.py          # 104 CTF challenges
-├── agents/base_agent.py             # BaseAgent class
-├── neurosploit.py                   # CLI entry point
-└── requirements.txt
-```
+Every run produces per-agent reward signals (`meta/rl_feedback` +
+`neurosploit_agent/rl.py`): validated findings reward an agent (weighted by
+severity), rejected false positives penalize it, correct skips stay neutral.
+Weights are bounded `[0.05, 1.0]` and carry per-tech-stack affinity, so the
+engine learns, e.g., to prioritize `ssti_jinja2` on Flask targets. State is
+explainable and persisted to `data/rl_state.json`.

 ---

-## Autonomous Agent
+## Safety & authorization

-The AI agent (`autonomous_agent.py`) orchestrates the entire penetration test autonomously.
-
-### 3-Stream Parallel Architecture
-
-```
-                    ┌─────────────────────┐
-                    │   Auto Pentest      │
-                    │   Target URL(s)     │
-                    └────────┬────────────┘
-                             │
-              ┌──────────────┼──────────────┐
-              ▼              ▼              ▼
-   ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
-   │  Stream 1    │ │  Stream 2    │ │  Stream 3    │
-   │  Recon       │ │  Junior Test │ │  Tool Runner │
-   │  ─────────── │ │  ─────────── │ │  ─────────── │
-   │  Crawl pages │ │  Test target │ │  Nuclei scan │
-   │  Find params │ │  AI-priority │ │  Naabu ports │
-   │  Tech detect │ │  3 payloads  │ │  AI decides  │
-   │  WAF detect  │ │  per endpoint│ │  extra tools │
-   └──────┬───────┘ └──────┬───────┘ └──────┬───────┘
-          │                │                │
-          └────────────────┼────────────────┘
-                           ▼
-              ┌─────────────────────┐
-              │  Deep Analysis      │
-              │  100 vuln types     │
-              │  Full payload sets  │
-              │  Chain exploitation │
-              └─────────┬───────────┘
-                        ▼
-              ┌─────────────────────┐
-              │  Report Generation  │
-              │  AI executive brief │
-              │  PoC code per find  │
-              └─────────────────────┘
-```
-
-### Agent Autonomy Modules
-
-| Module | Description |
-|--------|-------------|
-| **Request Engine** | Retry with backoff, per-host rate limiting, circuit breaker, adaptive timeouts |
-| **WAF Detector** | 16 WAF signatures (Cloudflare, AWS, Akamai, Imperva, etc.), 12 bypass techniques |
-| **Strategy Adapter** | Dead endpoint detection, diminishing returns, 403 bypass, priority recomputation |
-| **Chain Engine** | 10 chain rules (SSRF->internal, SQLi->DB-specific, LFI->config, IDOR pattern transfer) |
-| **Auth Manager** | Multi-user contexts (user_a, user_b, admin), login form detection, session management |
-
-### Scan Features
-
- **Pause / Resume / Stop** with checkpoints
- **Manual Validation** - Confirm or reject AI findings
- **Screenshot Capture** on confirmed findings (Playwright)
- **Cross-Scan Learning** - Historical success rates influence future priorities
- **CVE Testing** - Regex detection + AI-generated payloads
+NeuroSploit is for **authorized** security testing only. Every agent's system
+prompt enforces scope and proof-of-exploitation; DoS-class agents refuse to
+flood and require explicit rules-of-engagement. You are responsible for having
+written permission for any target you point it at.

 ---

-## 100 Vulnerability Types
-
-### Categories
-
-| Category | Types | Examples |
-|----------|-------|---------|
-| **Injection** | 38 | XSS (reflected/stored/DOM), SQLi, NoSQLi, Command Injection, SSTI, LDAP, XPath, CRLF, Header Injection, Log Injection, GraphQL Injection |
-| **Inspection** | 21 | Security Headers, CORS, Clickjacking, Info Disclosure, Debug Endpoints, Error Disclosure, Source Code Exposure |
-| **AI-Driven** | 41 | BOLA, BFLA, IDOR, Race Condition, Business Logic, JWT Manipulation, OAuth Flaws, Prototype Pollution, WebSocket Hijacking, Cache Poisoning, HTTP Request Smuggling |
-| **Authentication** | 8 | Auth Bypass, Session Fixation, Credential Stuffing, Password Reset Flaws, MFA Bypass, Default Credentials |
-| **Authorization** | 6 | BOLA, BFLA, IDOR, Privilege Escalation, Forced Browsing, Function-Level Access Control |
-| **File Access** | 5 | LFI, RFI, Path Traversal, File Upload, XXE |
-| **Request Forgery** | 4 | SSRF, CSRF, Cloud Metadata, DNS Rebinding |
-| **Client-Side** | 8 | CORS, Clickjacking, Open Redirect, DOM Clobbering, Prototype Pollution, PostMessage, CSS Injection |
-| **Infrastructure** | 6 | SSL/TLS, HTTP Methods, Subdomain Takeover, Host Header, CNAME Hijacking |
-| **Cloud/Supply** | 4 | Cloud Metadata, S3 Bucket Misconfiguration, Dependency Confusion, Third-Party Script |
-
-### Payload Engine
-
- **526 payloads** across 95 libraries
- **73 XSS stored payloads** + 5 context-specific sets
- Per-type AI decision prompts with anti-hallucination directives
- WAF-adaptive payload transformation (12 techniques)
-
---
-
-## Kali Sandbox System
-
-Each scan runs in its own **isolated Kali Linux Docker container**, providing:
-
- **Complete Isolation** - No interference between concurrent scans
- **On-Demand Tools** - 56 tools installed only when needed
- **Auto Cleanup** - Containers destroyed when scan completes
- **Resource Limits** - Per-container memory (2GB) and CPU (2 cores) limits
-
-### Pre-Installed Tools (28)
-
-| Category | Tools |
-|----------|-------|
-| **Scanners** | nuclei, naabu, httpx, nmap, nikto, masscan, whatweb |
-| **Discovery** | subfinder, katana, dnsx, uncover, ffuf, gobuster, waybackurls |
-| **Exploitation** | dalfox, sqlmap |
-| **System** | curl, wget, git, python3, pip3, go, jq, dig, whois, openssl, netcat, bash |
-
-### On-Demand Tools (28 more)
-
-Installed automatically inside the container when first requested:
-
- **APT**: wpscan, dirb, hydra, john, hashcat, testssl, sslscan, enum4linux, dnsrecon, amass, medusa, crackmapexec, etc.
- **Go**: gau, gitleaks, anew, httprobe
- **Pip**: dirsearch, wfuzz, arjun, wafw00f, sslyze, commix, trufflehog, retire
-
-### Container Pool
+## Repository layout

 ```
-ContainerPool (global coordinator, max 5 concurrent)
-  ├── KaliSandbox(scan_id="abc") → docker: neurosploit-abc
-  ├── KaliSandbox(scan_id="def") → docker: neurosploit-def
-  └── KaliSandbox(scan_id="ghi") → docker: neurosploit-ghi
+neurosploit                 # launcher (./neurosploit)
+neurosploit_agent/          # the v3.3.0 engine
+  cli.py  orchestrator.py  agent_loader.py  backends.py  rl.py  mcp.py  models.py  config.py
+agents_md/
+  vulns/   (196)            # vulnerability specialist agents
+  meta/    (17)             # orchestrator, recon, validator, scorers, reporter, RL, roles
+  REGISTRY.md               # generated index
+scripts/build_agents.py     # data-driven agent builder
+legacy/                     # retired pre-v3.3.0 Python orchestration
 ```

- **TTL enforcement** - Containers auto-destroyed after 60 min
- **Orphan cleanup** - Stale containers removed on server startup
- **Graceful fallback** - Falls back to shared container if Docker unavailable
-
---
-
-## Anti-Hallucination & Validation
-
-NeuroSploit uses a multi-layered validation pipeline to eliminate false positives:
-
-### Validation Pipeline
-
-```
-Finding Candidate
-    │
-    ▼
-┌─────────────────────┐
-│ Negative Controls    │  Send benign/empty requests as controls
-│ Same behavior = FP   │  -60 confidence if same response
-└─────────┬───────────┘
-          ▼
-┌─────────────────────┐
-│ Proof of Execution   │  25+ per-vuln-type proof methods
-│ XSS: context check   │  SSRF: metadata markers
-│ SQLi: DB errors       │  BOLA: data comparison
-└─────────┬───────────┘
-          ▼
-┌─────────────────────┐
-│ AI Interpretation    │  LLM with anti-hallucination prompts
-│ Per-type system msgs │  12 composable prompt templates
-└─────────┬───────────┘
-          ▼
-┌─────────────────────┐
-│ Confidence Scorer    │  0-100 numeric score
-│ ≥90 = confirmed      │  +proof, +impact, +controls
-│ ≥60 = likely          │  -baseline_only, -same_behavior
-│ <60 = rejected        │  Breakdown visible in UI
-└─────────┬───────────┘
-          ▼
-┌─────────────────────┐
-│ Validation Judge     │  Final verdict authority
-│ approve / reject     │  Records for adaptive learning
-└─────────────────────┘
-```
-
-### Anti-Hallucination System Prompts
-
-12 composable prompts applied across 7 task contexts:
- `anti_hallucination` - Core truthfulness directives
- `proof_of_execution` - Require concrete evidence
- `negative_controls` - Compare with benign requests
- `anti_severity_inflation` - Accurate severity ratings
- `access_control_intelligence` - BOLA/BFLA data comparison methodology
-
-### Access Control Adaptive Learning
-
- Records TP/FP outcomes per domain for BOLA/BFLA/IDOR
- 9 default response patterns, 6 known FP patterns (WSO2, Keycloak, etc.)
- Historical FP rate influences future confidence scoring
-
---
-
-## Web GUI
-
-### Pages
-
-| Page | Route | Description |
-|------|-------|-------------|
-| **Dashboard** | `/` | Stats overview, severity distribution, recent activity feed |
-| **Auto Pentest** | `/auto` | One-click autonomous pentest with 3-stream live display |
-| **Vuln Lab** | `/vuln-lab` | Per-type vulnerability testing (100 types, 11 categories) |
-| **Terminal Agent** | `/terminal` | AI-powered interactive security chat + tool execution |
-| **Sandboxes** | `/sandboxes` | Real-time Docker container monitoring + management |
-| **AI Agent** | `/scan/new` | Manual scan creation with prompt selection |
-| **Scan Details** | `/scan/:id` | Findings with confidence badges, pause/resume/stop |
-| **Scheduler** | `/scheduler` | Cron/interval automated scan scheduling |
-| **Reports** | `/reports` | HTML/PDF/JSON report generation and viewing |
-| **Settings** | `/settings` | LLM providers, model routing, feature toggles |
-
-### Sandbox Dashboard
-
-Real-time monitoring of per-scan Kali containers:
- **Pool stats** - Active/max containers, Docker status, TTL
- **Capacity bar** - Visual utilization indicator
- **Per-container cards** - Name, scan link, uptime, installed tools, status
- **Actions** - Health check, destroy (with confirmation), cleanup expired/orphans
- **5-second auto-polling** for real-time updates
-
---
-
-## API Reference
-
-### Base URL
-
-```
-http://localhost:8000/api/v1
-```
-
-### Endpoints
-
-#### Scans
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| `POST` | `/scans` | Create new scan |
-| `GET` | `/scans` | List all scans |
-| `GET` | `/scans/{id}` | Get scan details |
-| `POST` | `/scans/{id}/start` | Start scan |
-| `POST` | `/scans/{id}/stop` | Stop scan |
-| `POST` | `/scans/{id}/pause` | Pause scan |
-| `POST` | `/scans/{id}/resume` | Resume scan |
-| `DELETE` | `/scans/{id}` | Delete scan |
-
-#### AI Agent
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| `POST` | `/agent/run` | Launch autonomous agent |
-| `GET` | `/agent/status/{id}` | Get agent status + findings |
-| `GET` | `/agent/by-scan/{scan_id}` | Get agent by scan ID |
-| `POST` | `/agent/stop/{id}` | Stop agent |
-| `POST` | `/agent/pause/{id}` | Pause agent |
-| `POST` | `/agent/resume/{id}` | Resume agent |
-| `GET` | `/agent/findings/{id}` | Get findings with details |
-| `GET` | `/agent/logs/{id}` | Get agent logs |
-
-#### Sandbox
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| `GET` | `/sandbox` | List containers + pool status |
-| `GET` | `/sandbox/{scan_id}` | Health check container |
-| `DELETE` | `/sandbox/{scan_id}` | Destroy container |
-| `POST` | `/sandbox/cleanup` | Remove expired containers |
-| `POST` | `/sandbox/cleanup-orphans` | Remove orphan containers |
-
-#### Scheduler
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| `GET` | `/scheduler` | List scheduled jobs |
-| `POST` | `/scheduler` | Create scheduled job |
-| `DELETE` | `/scheduler/{id}` | Delete job |
-| `POST` | `/scheduler/{id}/pause` | Pause job |
-| `POST` | `/scheduler/{id}/resume` | Resume job |
-
-#### Vulnerability Lab
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| `GET` | `/vuln-lab/types` | List 100 vuln types by category |
-| `POST` | `/vuln-lab/run` | Run per-type vulnerability test |
-| `GET` | `/vuln-lab/challenges` | List challenge runs |
-| `GET` | `/vuln-lab/stats` | Detection rate stats |
-
-#### Reports & Dashboard
-
-| Method | Endpoint | Description |
-|--------|----------|-------------|
-| `POST` | `/reports` | Generate report |
-| `POST` | `/reports/ai-generate` | AI-powered report |
-| `GET` | `/reports/{id}/view` | View HTML report |
-| `GET` | `/dashboard/stats` | Dashboard statistics |
-| `GET` | `/dashboard/activity-feed` | Recent activity |
-
-### WebSocket
-
-```
-ws://localhost:8000/ws/scan/{scan_id}
-```
-
-Events: `scan_started`, `progress_update`, `finding_discovered`, `scan_completed`, `scan_error`
-
-### API Docs
-
-Interactive docs available at:
- Swagger UI: `http://localhost:8000/api/docs`
- ReDoc: `http://localhost:8000/api/redoc`
-
---
-
-## Configuration
-
-### Environment Variables
-
-```bash
-# LLM API Keys (at least one required)
-ANTHROPIC_API_KEY=your-key
-OPENAI_API_KEY=your-key
-GEMINI_API_KEY=your-key
-
-# Local LLM (optional)
-OLLAMA_BASE_URL=http://localhost:11434
-LMSTUDIO_BASE_URL=http://localhost:1234
-OPENROUTER_API_KEY=your-key
-
-# Database
-DATABASE_URL=sqlite+aiosqlite:///./data/neurosploit.db
-
-# Server
-HOST=0.0.0.0
-PORT=8000
-DEBUG=false
-```
-
-### config/config.json
-
-```json
-{
-  "llm": {
-    "default_profile": "gemini_pro_default",
-    "profiles": { ... }
-  },
-  "agent_roles": {
-    "pentest_generalist": { "vuln_coverage": 100 },
-    "bug_bounty_hunter": { "vuln_coverage": 100 }
-  },
-  "sandbox": {
-    "mode": "per_scan",
-    "kali": {
-      "enabled": true,
-      "image": "neurosploit-kali:latest",
-      "max_concurrent": 5,
-      "container_ttl_minutes": 60
-    }
-  },
-  "mcp_servers": {
-    "neurosploit_tools": {
-      "transport": "stdio",
-      "command": "python3",
-      "args": ["-m", "core.mcp_server"]
-    }
-  }
-}
-```
-
---
-
-## Development
-
-### Backend
-
-```bash
-pip install -r requirements.txt
-uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
-
-# API docs: http://localhost:8000/api/docs
-```
-
-### Frontend
-
-```bash
-cd frontend
-npm install
-npm run dev        # Dev server at http://localhost:5173
-npm run build      # Production build
-```
-
-### Build Kali Sandbox
-
-```bash
-./scripts/build-kali.sh --test    # Build + health check
-```
-
-### MCP Server
-
-```bash
-python3 -m core.mcp_server        # Starts stdio MCP server (12 tools)
-```
-
---
-
-## Security Notice
-
-**This tool is for authorized security testing only.**
-
- Only test systems you own or have explicit written permission to test
- Follow responsible disclosure practices
- Comply with all applicable laws and regulations
- Unauthorized access to computer systems is illegal
+See [`RELEASE.md`](RELEASE.md) for the full v3.3.0 changelog.

 ---

 ## License

-MIT License - See [LICENSE](LICENSE) for details.
-
---
-
-## Tech Stack
-
-| Layer | Technologies |
-|-------|-------------|
-| **Backend** | Python, FastAPI, SQLAlchemy, Pydantic, aiohttp |
-| **Frontend** | React 18, TypeScript, TailwindCSS, Vite |
-| **AI/LLM** | Anthropic Claude, OpenAI GPT, Google Gemini, Ollama, LMStudio, OpenRouter |
-| **Sandbox** | Docker, Kali Linux, ProjectDiscovery suite, Nmap, SQLMap, Nikto |
-| **Tools** | Nuclei, Naabu, httpx, Subfinder, Katana, FFuf, Gobuster, Dalfox |
-| **Infra** | Docker Compose, MCP Protocol, Playwright, APScheduler |
-
---
-
-**NeuroSploit v3** - *AI-Powered Autonomous Penetration Testing Platform*
+MIT.