Distilled from reviewing real AI-pentest output that kept stopping at "exposed" instead of "exploited". Pure-additive, back-compatible. Behavior (injected into black/grey/chain exploit prompts via DEPTH_DOCTRINE): - Exposed → exploited: any info-disclosure / exposed service/WSDL / leaked credential|token / reachable dev host MUST be used before it's a finding; otherwise it's a lead, not a confirmed High/Critical. - Chain across modules: reuse obtained session/JWT/cookie/credential and pivot to IDOR/privesc/exfil; report the chain, not isolated parts. - Decode & fingerprint → CVE; audit tokens (alg-confusion/none/kid/JWKS, weak HS256 secret cracking, lifecycle). Deterministic post-pass (new crates/harness/src/hygiene.rs, wired into finish()): - calibrate severity to PROVEN impact — unproven High/Critical (hedged, no payload, thin evidence) capped to Medium and re-titled "(potential)"; - depth_audit — flag exposures on a host with no real exploit; - hygiene_summary — advise consolidating hygiene classes repeated across assets. Unit tests cover calibration + depth audit. 5 new doctrine meta-agents (scripts/build_methodology_v352.py → agents_md/meta/): exploit_depth_doctrine, finding_chainer, artifact_decoder, token_auditor, report_calibrator (meta 17→22, total 343→348). Version bumped 3.5.1 → 3.5.2 across crates/app/installers/docs; RELEASE/README updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
🧠 NeuroSploit v3.5.2
Autonomous, multi-model penetration-testing harness — Rust, CLI-only.
by Joas A Santos & Red Team Leaders
⭐ If this is useful, star the repo — it helps a lot.
📖 New here? Read the full Tutorial & User Guide → — every mode, flag, config and example explained.
🆕 New in v3.5.2 — Exploitation Depth & Report Hygiene: a DEPTH doctrine makes the engine use what it finds (exposed → exploited), chain findings across modules, decode/fingerprint artifacts → CVEs, and audit tokens (JWT alg-confusion / weak HS256 secrets). A deterministic post-pass calibrates severity to proven impact and consolidates duplicated hygiene findings. See RELEASE.md.
NeuroSploit turns a URL, a source repository, a running app, or a host/IP into
an autonomous security engagement. A Rust harness (tokio) drives a pool of
LLMs — via API key or local subscription (Claude Code / Codex / Gemini /
Grok) — recons the target, intelligently selects only the agents that match the
discovered surface, runs them in parallel, chains findings into deeper
impact, and validates every claim by cross-model voting + tool-receipt
grounding before reporting. It ships 329 markdown agents and a Mission
Control TUI.
Engagement modes
| Mode | Command | What it does |
|---|---|---|
| Black-box | neurosploit run <url> |
recon → select → exploit → vote → report |
| White-box | neurosploit whitebox <repo> |
source/SAST review (file:line evidence) |
| Grey-box | neurosploit greybox <repo> --url <app> |
code review + live exploitation together |
| Host/Infra | neurosploit host <ip> --creds creds.yaml |
Linux / Windows / Active Directory testing |
| Mission Control | neurosploit tui <url> |
live TUI panels + composer during the run |
| Interactive | neurosploit |
persistent REPL session (resumes per project) |
Highlights
- 🧠 POMDP belief + value-of-information — the target is partially observable,
so findings aren't booleans: a property-graph belief carries probabilities,
and "scan more vs exploit now" falls out of belief entropy. The
may_assertgate is a mathematical anti-hallucination rule (don't claim exploitability while the belief is diffuse). - 🧾 Grounding — hard rule: no claim without a tool receipt (raw tool
output, not paraphrase). Empirical for black-box, symbolic (
file:line) for white-box; ungrounded claims are demoted. - 🔗 Attack chaining — 12 multi-stage chain agents (SQLi→RCE→LPE, SSRF→AWS creds, upload→LFI→RCE→LPE, default-creds→domain, …); each stage proven before advancing.
- 🗺️ Attack graph & kill chain — findings mapped to OWASP / CWE / MITRE ATT&CK / stage; rendered as a Mermaid graph in the report.
- ✅ Cross-model validation — a different model adjudicates each finding; RL-weighted, recon-aware agent selection.
- 🛰️ Mission Control TUI — live header/feed/findings/targets panels + a
composer you can type in while the run streams (
summary,pause, …). - 💾 Per-project memory —
<cwd>/.neurosploit/keeps session, run history and command history; the REPL resumes on reopen. No database required. - 🪙 Token/cost telemetry, per-agent attribution, graceful Ctrl-C → report or discard, Typst/HTML/JSON/MD reports.
This is the slim, Rust-only distribution (
neurosploit-rs/+agents_md/). The earlier Python engine and web GUIs live on the olderv3.4.0branch.
📦 Install (one line)
Linux / macOS (x64 & arm64):
curl -fsSL https://raw.githubusercontent.com/JoasASantos/NeuroSploit/main/setup.sh | bash
Windows (PowerShell, x64 & arm64):
irm https://raw.githubusercontent.com/JoasASantos/NeuroSploit/main/install.ps1 | iex
Supported platforms
| OS | x64 | arm64 |
|---|---|---|
| Linux (Kali recommended) | ✅ | ✅ |
| macOS | ✅ | ✅ (Apple Silicon) |
| Windows | ✅ | ✅ |
Pure Rust + stdlib, so it builds natively everywhere a stable Rust toolchain runs.
The installer auto-detects OS/arch and installs Rust if missing. On native Windows
use install.ps1; under WSL2 / Git Bash the setup.sh one-liner also works.
The installer auto-installs Rust if needed, clones the repo to ~/.neurosploit,
builds the release binary, and links neurosploit into ~/.local/bin. Re-run it
any time to update. Tweak with env vars: NEUROSPLOIT_REF (branch/tag),
NEUROSPLOIT_DIR, PREFIX.
Prefer to build by hand?
git clone https://github.com/JoasASantos/NeuroSploit && cd NeuroSploit/neurosploit-rs
cargo build --release # → target/release/neurosploit
⚡ Quick start (60 seconds)
# easiest path — just run it; the interactive session asks everything:
neurosploit
# or one-liner (subscription login, no API key needed):
neurosploit run http://testphp.vulnweb.com/ --subscription --model anthropic:claude-opus-4-8 -v
# white-box — review a source repository (SAST agents, file:line evidence):
git clone https://github.com/digininja/DVWA /tmp/DVWA
neurosploit whitebox /tmp/DVWA --subscription --model anthropic:claude-opus-4-8 -v
# grey-box — review the code AND exploit the running app together:
neurosploit greybox /tmp/DVWA --url http://localhost:8080/ --creds creds.yaml \
--subscription --model anthropic:claude-opus-4-8 --mcp -v
# host / infra — Linux / Windows / Active Directory (SSH/Win creds in creds.yaml):
neurosploit host 10.0.0.10 --creds creds.yaml --subscription --model anthropic:claude-opus-4-8 -v
# 🛰 Mission Control TUI — live panels (header/feed/findings/targets) + a composer
# you can type in WHILE the run streams (summary · pause · errors · notes):
neurosploit tui http://testphp.vulnweb.com/ --subscription --model anthropic:claude-opus-4-8 --mcp
Full step-by-step for every mode (black/white/grey/host) is in TUTORIAL.md.
No login? Use an API key instead — see Authentication.
Build
cd neurosploit-rs
cargo build --release # → target/release/neurosploit
Requires a Rust toolchain (rustup). Recommended: run on Kali Linux (or the
Kali Docker image) so the offensive tools the agents use are already present:
docker run -it --rm kalilinux/kali-rolling
apt update && apt install -y curl nmap ffuf nodejs npm
# rustscan (faster port scan): cargo install rustscan (or grab a release from GitHub)
The agents degrade gracefully: if rustscan isn't installed they use nmap; if
neither, they probe with curl. If a Playwright MCP browser is available they use
it for JS-heavy pages, otherwise they fall back to curl.
Usage
Run with no arguments for an interactive wizard:
./target/release/neurosploit
Or drive it directly:
# Black-box — subscription (no API key), Opus, browser via Playwright if present, verbose
./target/release/neurosploit run http://testphp.vulnweb.com/ \
--subscription --model anthropic:claude-opus-4-8 --mcp -v
# Black-box — API keys, multi-model voting panel (1st finds, others adjudicate)
./target/release/neurosploit run http://testphp.vulnweb.com/ \
--model anthropic:claude-opus-4-8 --model openai:gpt-5.1 --vote-n 3
# White-box — clone a vulnerable app and review its source
git clone https://github.com/digininja/DVWA /tmp/DVWA
./target/release/neurosploit whitebox /tmp/DVWA \
--subscription --model anthropic:claude-opus-4-8 -v
# Offline pipeline self-test (no keys/login needed)
./target/release/neurosploit run http://testphp.vulnweb.com/ --offline
# Utilities
./target/release/neurosploit agents # library counts
./target/release/neurosploit models # providers & models
./target/release/neurosploit --help # full help with examples
Options (run / whitebox)
| Flag | Meaning |
|---|---|
--model provider:model |
Repeatable. First = primary; the rest fail over and form the voting jury. |
--subscription |
Use the local CLI login (Claude/Codex/Gemini/Grok) instead of an API key. |
--mcp |
Enable Playwright MCP (auto-provisioned via npx; backends without MCP use built-in tools). |
--vote-n N |
How many models must agree a finding is real (default 3 / 2 for whitebox). |
--max-agents N |
Cap agents run (0 = all matching the recon). |
--offline |
Exercise the full pipeline without calling any model. |
-v, --verbose |
Log each agent as it launches, recon, and votes. |
Authentication — run via API key or subscription
You can run NeuroSploit two ways. They're independent: pick per run.
1) Via API (provider API key)
Export the key(s) for the providers in your model panel, then run without
--subscription. Any OpenAI-compatible provider works.
# pick one or more, depending on the models you select
export ANTHROPIC_API_KEY=sk-ant-... # anthropic:claude-*
export OPENAI_API_KEY=sk-... # openai:gpt-*
export GEMINI_API_KEY=AIza... # gemini:gemini-*
export XAI_API_KEY=xai-... # xai:grok-*
export NVIDIA_NIM_API_KEY=nvapi-... # nvidia_nim:*
export DEEPSEEK_API_KEY=... # deepseek:*
export MISTRAL_API_KEY=... # mistral:*
export DASHSCOPE_API_KEY=... # qwen:* (Alibaba DashScope)
export GROQ_API_KEY=... # groq:*
export TOGETHER_API_KEY=... # together:*
export OPENROUTER_API_KEY=... # openrouter:*
# ollama needs no key (local)
# then run via API (note: NO --subscription)
./target/release/neurosploit run http://testphp.vulnweb.com/ \
--model anthropic:claude-opus-4-8 --vote-n 3 -v
# multi-provider voting panel via API (1st finds, the others adjudicate)
./target/release/neurosploit run http://testphp.vulnweb.com/ \
--model anthropic:claude-opus-4-8 --model openai:gpt-5.1 --model gemini:gemini-2.5-pro
Or put the keys in a .env and source it (cp .env.example .env; edit; set -a; . ./.env; set +a).
Provider → env var → endpoint (all OpenAI-compatible):
--model prefix |
Env var | Base URL |
|---|---|---|
anthropic: |
ANTHROPIC_API_KEY |
api.anthropic.com |
openai: |
OPENAI_API_KEY |
api.openai.com |
gemini: |
GEMINI_API_KEY |
generativelanguage.googleapis.com |
xai: |
XAI_API_KEY |
api.x.ai |
nvidia_nim: |
NVIDIA_NIM_API_KEY |
integrate.api.nvidia.com |
deepseek: |
DEEPSEEK_API_KEY |
api.deepseek.com |
mistral: |
MISTRAL_API_KEY |
api.mistral.ai |
qwen: |
DASHSCOPE_API_KEY |
dashscope-intl.aliyuncs.com |
groq: |
GROQ_API_KEY |
api.groq.com |
together: |
TOGETHER_API_KEY |
api.together.xyz |
openrouter: |
OPENROUTER_API_KEY |
openrouter.ai |
ollama: |
(none) | localhost:11434 |
Run ./target/release/neurosploit models for the full provider/model list.
2) Via subscription (no API key)
--subscription drives your local agentic-CLI login instead of an API key —
install and log into one of the CLIs first:
--model prefix |
CLI used | Login |
|---|---|---|
anthropic: |
claude (Claude Code) |
claude then /login |
openai: |
codex |
codex login |
gemini: |
gemini |
gemini login |
xai: |
grok |
grok login |
./target/release/neurosploit run http://testphp.vulnweb.com/ \
--subscription --model anthropic:claude-opus-4-8 --mcp -v
How it works
target ─▶ recon (curl/nmap/…) ─▶ INTELLIGENT agent selection (recon-aware)
─▶ parallel exploitation ─▶ cross-model validation vote
─▶ severity/score ─▶ report (HTML + Typst PDF) ─▶ RL reward update
Every run writes a self-contained folder runs/ns-<ts>-<target>/:
| File | Contents |
|---|---|
status.json |
running → complete with a summary |
recon.json / recon.md |
mapped attack surface |
exploitation.md |
raw per-agent transcript |
findings.json / findings.md |
validated findings (reuse by other tools/AIs) |
report.html, report.typ, report.pdf |
final report (PDF via the Typst engine) |
A reinforcement-learning reward store (data/rl_state_rs.json) biases agent
selection on future runs.
Agent library — agents_md/ (303)
| Category | Count | Purpose |
|---|---|---|
vulns/ |
196 | Exploit a specific vulnerability class |
recon/ |
12 | Information gathering / attack surface |
code/ |
78 | White-box source-code (SAST) review |
meta/ |
17 | Orchestrator, validator, scorers, reporter, RL |
Each agent is a self-contained markdown playbook (## User Prompt methodology +
## System Prompt strict anti-false-positive rules). Drop a new .md into the
matching folder and the harness picks it up.
Safety
For authorized testing only. Agents are instructed to stay in scope, never run destructive/DoS actions, and require proof-of-exploitation. You are responsible for having permission for any target.
Credits
Joas A Santos & Red Team Leaders.
License
MIT.