NeuroSploit — Tutorial & User Guide (v3.5.1)
A complete, hands-on guide to installing, configuring and running NeuroSploit — the autonomous, multi-model penetration-testing harness.
⚠️ Authorized testing only. Every agent is instructed to stay in scope and never run destructive/DoS actions. You are responsible for having written permission for any target you point it at.
Table of contents
- Concepts in 60 seconds
- Install
- Authentication: API key vs subscription
- Choosing models
- Engagement modes
- The interactive REPL
- Mission Control TUI
- Credentials (creds.yaml)
- Steering the tests (focus & instructions)
- Outputs, reports & artifacts
- Per-project memory & resume
- How it decides: POMDP, grounding, chaining
- The agent library
- Playwright MCP & extra tools
- Tips, tuning & troubleshooting
- Command & flag reference
1. Concepts in 60 seconds
You give NeuroSploit a target (URL, repo, app, or host/IP). It:
- Recons the target with real tools (curl/nmap/…).
- Intelligently selects only the agents whose preconditions match the recon (it does not blindly run all 329).
- Exploits in parallel — each agent works in a ReAct loop and must prove its claim with a tool receipt (raw output).
- Validates every candidate by cross-model voting (a different model adjudicates) and a grounding gate (no claim without a receipt).
- Chains confirmed findings into deeper impact (SQLi→RCE→LPE, SSRF→cloud…).
- Reports — HTML + Typst PDF + JSON/MD, with an attack-graph / kill-chain mapped to OWASP / CWE / MITRE ATT&CK.
It runs on a pool of LLMs you choose, authenticated either by API key or your local subscription (Claude Code / Codex / Gemini / Grok CLI).
2. Install
One-liner
Linux / macOS (x64 & arm64):
curl -fsSL https://raw.githubusercontent.com/JoasASantos/NeuroSploit/main/setup.sh | bash
Windows (PowerShell, x64 & arm64):
irm https://raw.githubusercontent.com/JoasASantos/NeuroSploit/main/install.ps1 | iex
The installer detects your OS/arch, installs the Rust toolchain if needed, clones
the repo, builds the release binary and puts neurosploit on your PATH. Re-run it
any time to update. Env knobs: NEUROSPLOIT_REF (branch/tag), NEUROSPLOIT_DIR,
PREFIX.
Manual build
git clone https://github.com/JoasASantos/NeuroSploit
cd NeuroSploit/neurosploit-rs
cargo build --release # → target/release/neurosploit
Recommended runtime
Run inside Kali Linux (or the Docker image) so the offensive tools the agents use are already present:
docker run -it --rm kalilinux/kali-rolling
apt update && apt install -y curl nmap ffuf nodejs npm
# optional: cargo install rustscan ; cargo install typst-cli
Agents degrade gracefully: if rustscan is absent they use nmap; if neither,
curl. With Playwright MCP present they drive a real browser; otherwise curl.
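The fallback order described above (rustscan, then nmap, then curl) can be pictured as a first-match lookup on PATH. This is a minimal sketch, not the harness's actual code: `pick_tool` and `SCANNER_PREFERENCE` are hypothetical names, and the only real API used is Python's `shutil.which`.

```python
import shutil
from typing import Callable, Optional, Sequence

# Preference order taken from the text above: rustscan, then nmap, then curl.
SCANNER_PREFERENCE = ("rustscan", "nmap", "curl")


def pick_tool(
    candidates: Sequence[str] = SCANNER_PREFERENCE,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate found on PATH, or None if none is installed.

    `which` is injectable so the lookup can be exercised without the tools.
    """
    for name in candidates:
        if which(name) is not None:
            return name
    return None
```

Calling `pick_tool()` on a stock Kali image would be expected to return the first of the three tools that is installed there.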
Verify
neurosploit --version # neurosploit 3.5.1
neurosploit agents # {"vulns":196,...,"chains":12,"total":329}
neurosploit models # all providers & models
3. Authentication: API key vs subscription
You pick per run. They're independent.
A) Via API key
Export the key for each provider you'll use, then run without --subscription:
export ANTHROPIC_API_KEY=sk-ant-... # anthropic:claude-*
export OPENAI_API_KEY=sk-... # openai:gpt-*
export GEMINI_API_KEY=AIza... # gemini:gemini-*
export XAI_API_KEY=xai-... # xai:grok-*
export NVIDIA_NIM_API_KEY=nvapi-... # nvidia_nim:*
export DEEPSEEK_API_KEY=... # deepseek:*
export MISTRAL_API_KEY=... # mistral:*
export DASHSCOPE_API_KEY=... # qwen:* (Alibaba DashScope)
export GROQ_API_KEY=... # groq:*
export TOGETHER_API_KEY=... # together:*
export OPENROUTER_API_KEY=... # openrouter:*
# ollama: no key (local)
# LiteLLM proxy: point at your gateway and route any model through it:
export LITELLM_BASE_URL=http://localhost:4000/v1 # your LiteLLM proxy
export LITELLM_API_KEY=sk-... # litellm:<model the proxy routes>
neurosploit run http://testphp.vulnweb.com/ --model anthropic:claude-opus-4-8 --vote-n 3 -v
Or put them in a .env and source it (cp .env.example .env; edit; set -a; . ./.env; set +a).
In the REPL you can also run /key anthropic sk-ant-... (it lists which providers
your selected models need).
B) Via subscription (no API key)
Install and log into a local agentic CLI, then pass --subscription:
| --model prefix | CLI | Login |
|---|---|---|
| anthropic: | Claude Code (claude) | claude → /login |
| openai: | Codex (codex) | codex login |
| gemini: | Gemini (gemini) | gemini login |
| xai: | Grok (grok) | grok login |
neurosploit run http://testphp.vulnweb.com/ --subscription --model anthropic:claude-opus-4-8 --mcp -v
4. Choosing models
--model provider:model is repeatable. The first model is the primary
(does recon & exploitation); the rest fail over if it errors and form the
validator voting jury (a different model adjudicates each finding → fewer false
positives).
# single model
--model anthropic:claude-opus-4-8
# voting panel (Opus finds, GPT-5.5 + Gemini-3 adjudicate)
--model anthropic:claude-opus-4-8 --model openai:gpt-5.5 --model gemini:gemini-3-pro
A built-in router sends fast/cheap models to recon & triage and the strongest
to exploitation, to save tokens. See neurosploit models for the full list
(Claude 4.x, GPT-5.x incl. Codex, Gemini 3/2.5, Grok, NVIDIA NIM, DeepSeek,
Mistral, Qwen, Groq, Together, OpenRouter, Ollama).
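To make the primary/jury split concrete, here is a small sketch of how a repeated `--model provider:model` list could be divided and how jury votes could be tallied. The function names and the strict-majority rule are assumptions for illustration; the source does not specify how votes are counted.

```python
from collections import Counter
from typing import Dict, List, Tuple


def parse_model(spec: str) -> Tuple[str, str]:
    """Split a 'provider:model' spec at the first colon."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected provider:model, got {spec!r}")
    return provider, model


def split_panel(specs: List[str]) -> Tuple[str, List[str]]:
    """First spec is the primary; the rest act as failover and voting jury."""
    if not specs:
        raise ValueError("at least one --model is required")
    for spec in specs:
        parse_model(spec)  # validate every entry up front
    return specs[0], specs[1:]


def adjudicate(votes: Dict[str, bool]) -> bool:
    """Hypothetical rule: keep a finding only on a strict majority of confirms."""
    tally = Counter(votes.values())
    return tally[True] > tally[False]
```

With the three-model panel shown earlier, `split_panel` would return the Opus spec as primary and the other two as the jury.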
5. Engagement modes
5.1 Black-box (URL)
neurosploit run http://testphp.vulnweb.com/ \
--subscription --model anthropic:claude-opus-4-8 \
--focus "injection and broken access control" --mcp -v
5.2 White-box (source repo)
Reviews a local code repository with the 78 source-review (SAST) agents: SQLi, command injection, SSRF, XSS, path traversal, insecure deserialization, hardcoded secrets, weak crypto, auth/IDOR, XXE, SSTI, language-specific sinks (PHP/Java/.NET/Go/Node/Python), and more.
# 1. clone or point at the code you own
git clone https://github.com/digininja/DVWA /tmp/DVWA
# 2. review it (subscription or --model with an API key)
neurosploit whitebox /tmp/DVWA --subscription --model anthropic:claude-opus-4-8 -v
# focus a specific class, cap agents, raise the voting bar:
neurosploit whitebox /tmp/DVWA --focus "injection and access control" \
--max-agents 8 --vote-n 2 --model openai:gpt-5.5
How it works
- Collects source context: walks the repo (skips .git, node_modules, target, vendor) and reads supported source files into a bounded review context.
- Selects code agents for the languages/frameworks it sees.
- Each agent traces source → sink dataflow and must quote the exact vulnerable lines as file:line.
- Grounding is symbolic: a finding is only kept if its file:line / quoted code actually exists in the reviewed source (no hallucinated locations).
- Validated by cross-model voting, then reported with the code reference, CWE/OWASP, PoC and remediation.
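The symbolic grounding step can be sketched as a lookup: does the cited file exist inside the repo, does it have that line, and does that line contain the quoted code? The `Finding` layout and `is_grounded` below are hypothetical and only illustrate the idea. The real check may be stricter or looser.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Finding:
    location: str  # "relative/path.ext:LINE" (hypothetical field layout)
    quoted: str    # code the agent claims sits on that line


def is_grounded(repo: Path, finding: Finding) -> bool:
    """True only if the cited file and line exist and contain the quoted code."""
    rel_path, sep, line_text = finding.location.rpartition(":")
    if not sep or not line_text.isdecimal():
        return False
    root = repo.resolve()
    target = (root / rel_path).resolve()
    # Refuse locations that escape the reviewed repository.
    if root not in target.parents or not target.is_file():
        return False
    lines = target.read_text(errors="replace").splitlines()
    line_no = int(line_text)
    if not 1 <= line_no <= len(lines):
        return False
    quoted = finding.quoted.strip()
    return quoted != "" and quoted in lines[line_no - 1]
```

A finding that cites a line past the end of the file, or a path outside the repo, is rejected rather than reported.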
Tips
- No --mcp is used in white-box (there's no live app to browse).
- For huge repos, narrow with --focus or point at a subdirectory.
- Each finding's endpoint field is the file:line; evidence quotes the code; payload is the PoC / vulnerable snippet. View it all with /finding.
5.3 Grey-box (code + live app)
The strongest mode: review the source and exploit the running app together. Code-review findings become leads that the live agents confirm against the deployed application (so a SQLi spotted in code is proven exploitable on the running endpoint).
# code repo + the URL where that code is actually running
neurosploit greybox /tmp/DVWA --url http://localhost:8080/ \
--creds creds.yaml --focus "auth and IDOR" \
--subscription --model anthropic:claude-opus-4-8 --mcp -v
How it works
- Recon the live app (--url).
- Review the source with the code agents → produces a list of leads (suspected vulns with file:line).
- Live exploitation runs with those leads injected as context, so agents go straight for the weaknesses flagged in code and prove them on the live app (empirical receipt: real request/response).
- Validate (cross-model) → chain → report.
Notes
- Pass --creds creds.yaml so agents test authenticated flows (login / JWT / cookie); this is essential for IDOR/BOLA/auth findings.
- --mcp enables the Playwright browser for client-side proof (e.g. XSS firing).
- In the REPL: set both /repo <path> and /target <url> → grey-box is auto-selected; /show displays mode: greybox (code + live).
5.4 Host / Infra (Linux / Windows / AD)
Target an IP/host with SSH or Windows/AD credentials from creds.yaml:
neurosploit host 10.0.0.10 --creds creds.yaml \
--focus "privilege escalation and AD" --subscription --model anthropic:claude-opus-4-8 -v
Runs infra agents: port/service scan, SMB enum, Linux privesc/sudo/cron/SSH, Windows privesc/SMB-signing/WinRM, and AD kerberoasting / AS-REP / ACL abuse / DCSync / default-creds.
6. The interactive REPL
Run with no arguments for a persistent session:
neurosploit
A context bar shows model auth · cwd · mode▸target. Key commands:
/model [a:b,..] set models (no arg → arrow-key multi-select)
/key [prov key] configure API keys for your models (no arg → guided)
/sub on|off use subscription login instead of API key
/target <url> black-box target /repo <path> add a repo (repo+target = greybox)
/auth <value> send an auth header /creds <file> load creds.yaml
/focus <text> steer the tests (or just type the instruction)
@path @dir @f:1-20 attach a file/folder/line-range to context (Tab → menu)
/mcp on|off /offline on|off /votes <n> /agents <n> /theme color|mono
/run launch the engagement
/runs /results [n] /report [n] /status [n]
/diff what changed vs the previous run
/retest [n] re-verify a past run's findings
/quit
Line editing: ↑/↓ history, Tab completes commands & @paths, Ctrl-A/E/K,
end a line with \ for multiline.
Runs are non-blocking
/run launches the engagement in the background and immediately returns the
prompt — you keep typing while it streams live above the prompt. While it runs:
- /status: live phase, a progress bar (agents done / total), elapsed time, token/cost and the possible findings so far.
- /stop: gracefully stop (a report is still generated from partial results).
- Findings are color-coded by severity (Critical = red … Info = grey), and a confirmed vote shows green ✓.
- When it finishes you get ◀ run #n done — N validated finding(s) · /results n · /report n.
(When stdin is piped/non-interactive, /run falls back to blocking mode.)
7. Mission Control TUI
A live dashboard with concurrent panels and a composer you can type in while the run streams:
neurosploit tui http://testphp.vulnweb.com/ --subscription --model anthropic:claude-opus-4-8 --mcp
# greybox: add --repo /path/to/repo
- Header: target · mode · model · phase · elapsed · 🪙 tokens/cost · findings · ⏸
- Activity feed (color-coded), Findings panel (live), Targets map
- Composer (non-blocking): summary (partial summary), pause (graceful stop), errors (filter), clear, or a free-text note
- Esc / Ctrl-C → graceful stop; the report is generated on exit
8. Credentials (creds.yaml)
One file covers web auth, SSH and Windows/AD. See neurosploit-rs/creds.example.yaml.
# --- web auth (pick one) ---
jwt: eyJhbGciOi... # → Authorization: Bearer <jwt>
# header: "X-Api-Key: abc123"
# cookie: "session=deadbeef"
# --- OR an automated login the harness performs to capture a live session ---
login:
  url: http://localhost:8080/login
  method: POST
  username_field: username
  password_field: password
  username: admin
  password: password
  success: Logout            # text shown on a successful login
# --- Linux host (SSH) ---
ssh:
  host: 10.0.0.5
  port: 22
  user: ubuntu
  password: s3cret           # or:
  key: /home/op/id_ed25519
# --- Windows / Active Directory ---
windows:
  host: 10.0.0.10
  domain: CORP
  user: jdoe
  password: Winter2026!      # or pass-the-hash:
  hash: aad3b435b51404eeaad3b435b51404ee:NThashhere
- jwt / header / cookie are used as-is.
- A login: block is executed (real HTTP) to capture a live session cookie/token; if it fails, agents are told to authenticate themselves.
- ssh: / windows: tell host agents how to authenticate.
Use with --creds creds.yaml on run / greybox / host, or /creds in the REPL.
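As an illustration of "used as-is", the three static web-auth keys map naturally onto HTTP request headers. The sketch below takes the already-parsed YAML as a plain mapping. `auth_headers` is a hypothetical helper, and the precedence (jwt, then header, then cookie) is an assumption, since the file says to pick one.

```python
from typing import Dict, Mapping


def auth_headers(creds: Mapping[str, str]) -> Dict[str, str]:
    """Translate the first static web-auth key found into request headers."""
    if "jwt" in creds:
        # Matches the comment in the example file: Authorization: Bearer <jwt>
        return {"Authorization": f"Bearer {creds['jwt']}"}
    if "header" in creds:
        name, sep, value = creds["header"].partition(":")
        if not sep:
            raise ValueError("header must look like 'Name: value'")
        return {name.strip(): value.strip()}
    if "cookie" in creds:
        return {"Cookie": creds["cookie"]}
    return {}  # no static auth configured (e.g. only a login: block)
```

For example, `auth_headers({"header": "X-Api-Key: abc123"})` yields a single `X-Api-Key` header with the value `abc123`.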
9. Steering the tests
Tell the harness what to prioritise — it biases both agent selection and execution:
--focus "find injection and broken access control"
In the REPL just type the instruction (no slash) or use /focus. Attach scope or a
stack trace with @file, @folder, or @file:10-40.
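The attachment tokens (@file, @folder, @file:10-40) follow a simple shape: a path plus an optional inclusive line range. The parser below is a sketch of that shape only. `parse_attachment` and the `Attachment` tuple are hypothetical names, and the REPL's own parsing rules may differ.

```python
import re
from typing import NamedTuple, Optional

# '@' + path, optionally followed by ':START-END'.
_ATTACH = re.compile(r"^@(?P<path>.+?)(?::(?P<start>\d+)-(?P<end>\d+))?$")


class Attachment(NamedTuple):
    path: str
    start: Optional[int]  # first line (1-based), or None for the whole file/folder
    end: Optional[int]    # last line, inclusive


def parse_attachment(token: str) -> Attachment:
    """Parse '@path' or '@path:START-END' into its parts."""
    match = _ATTACH.match(token)
    if match is None:
        raise ValueError(f"not an attachment token: {token!r}")
    start, end = match["start"], match["end"]
    if start is None:
        return Attachment(match["path"], None, None)
    lo, hi = int(start), int(end)
    if lo < 1 or hi < lo:
        raise ValueError(f"bad line range in {token!r}")
    return Attachment(match["path"], lo, hi)
```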
10. Outputs, reports & artifacts
Every run writes a self-contained folder runs/ns-<ts>-<target>/:
| File | Contents |
|---|---|
| status.json | running → complete/stopped with a summary |
| recon.json / recon.md | mapped attack surface |
| exploitation.md | raw per-agent transcript (the receipts) |
| findings.json / findings.md | validated findings (reuse by other tools/AIs) |
| report.html | HTML report + Mermaid attack-graph / kill-chain |
| report.typ / report.pdf | Typst source + compiled PDF (if typst installed) |
The CLI prints a severity summary, an ASCII kill-chain, and the token/cost total.
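Because findings.json is meant for reuse by other tools, a downstream script could rebuild a severity tally from it. The schema is not documented here, so the sketch assumes the file is a JSON array of objects that each carry a "severity" key. Treat that key and the `severity_summary` helper as hypothetical and check a real run folder first.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict


def severity_summary(run_dir: Path) -> Dict[str, int]:
    """Count findings per severity label in <run_dir>/findings.json."""
    data = json.loads((run_dir / "findings.json").read_text())
    # Assumption: a JSON array of objects with a "severity" key.
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of findings")
    counts = Counter(
        str(item.get("severity", "unknown"))
        for item in data
        if isinstance(item, dict)
    )
    return dict(counts)
```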
11. Per-project memory & resume
When you launch the REPL in a project directory, NeuroSploit creates
<cwd>/.neurosploit/:
.neurosploit/
session.json # your config (models, target, repo, auth, focus)
runs.json # run history (for /runs, /results, /report, /diff, /retest)
history.txt # command history (↑/↓)
Close and reopen in the same folder → it resumes automatically
(↻ resumed project session). No database needed — it's structured state.
12. How it decides
NeuroSploit treats the target as partially observable (a POMDP):
- Belief world model: a property graph whose nodes (host/service/vuln/exploit/credential) carry probabilities, updated by observations.
- Value-of-information: "scan more vs exploit now" falls out of belief entropy: when a node's belief is diffuse, recon is worth more than exploiting.
- Anti-hallucination gate (may_assert): the agent may not claim exploitability while the belief is diffuse; it must observe more first.
- Grounding: no claim without a tool receipt: empirical for black-box (real HTTP/OOB/error output), symbolic (file:line) for white-box. Ungrounded claims are demoted and flagged.
- Chaining: confirmed findings are chained into deeper impact, each stage proven before advancing.
White-box collapses the POMDP toward a near-deterministic MDP (the world model is built from SAST/dataflow), so the remaining uncertainty is whether a path is reachable rather than what state the target is in.
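The entropy argument can be made concrete for a single yes/no belief ("is this node vulnerable?"). The sketch below uses binary Shannon entropy. The 0.5-bit threshold, the `next_action` helper and this particular `may_assert` body are assumptions chosen for illustration, not the harness's actual parameters or code.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy (bits) of a yes/no belief held with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def may_assert(p_vulnerable: float, max_entropy: float = 0.5) -> bool:
    """Allow a claim only when the belief is sharp and leans 'vulnerable'."""
    return p_vulnerable > 0.5 and binary_entropy(p_vulnerable) <= max_entropy


def next_action(p_vulnerable: float, max_entropy: float = 0.5) -> str:
    """Diffuse belief: gather more evidence. Sharp belief: act on it."""
    if binary_entropy(p_vulnerable) > max_entropy:
        return "recon"
    return "exploit" if p_vulnerable > 0.5 else "skip"
```

Under these assumed numbers a belief of 0.7 is still too diffuse to assert (about 0.88 bits), so the next step is more recon, while 0.95 (about 0.29 bits) clears the gate.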
13. The agent library
agents_md/ holds 329 markdown agents in categories:
| Category | Dir | Count | Purpose |
|---|---|---|---|
| Vulnerability specialists | vulns/ | 196 | exploit a specific class |
| Recon | recon/ | 12 | information gathering |
| Code (SAST) | code/ | 78 | white-box source review |
| Infra | infra/ | 14 | Linux / Windows / AD host testing |
| Chains | chains/ | 12 | multi-stage exploitation chains |
| Meta | meta/ | 17 | orchestrator, validator, scorers, reporter, RL |
Each agent is a self-contained playbook (## User Prompt methodology + ## System Prompt strict anti-false-positive rules). Add your own by dropping a .md into
the matching folder — it's picked up automatically.
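Before dropping a custom agent into a folder, it can help to confirm that the file has both sections named above. The following is a minimal sketch that splits a markdown file on its `## ` headings. `split_sections` and `check_agent` are hypothetical helpers, and the loader may apply further rules that are not described here.

```python
from typing import Dict, List, Optional


def split_sections(markdown: str) -> Dict[str, str]:
    """Map each '## Heading' to the text beneath it (up to the next '## ')."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def check_agent(markdown: str) -> bool:
    """True if both sections named in the text are present and non-empty."""
    sections = split_sections(markdown)
    return all(sections.get(name) for name in ("User Prompt", "System Prompt"))
```

This naive splitter would be confused by a `## ` line inside a fenced code block, so it is only suitable as a quick pre-flight check.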
14. Playwright MCP & extra tools
--mcp (subscription path) drives a real Playwright browser for JS-heavy pages
and to prove client-side issues (XSS firing, DOM, screenshots). It's
auto-provisioned via npx when available; backends that don't support MCP fall
back to curl. You can add more MCP servers by placing a mcp.servers.json
({ "mcpServers": { ... } }) in the project root — they're merged into the run.
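"Merged into the run" can be read as a dictionary merge keyed by server name. The sketch below shows one plausible reading in which project entries override built-in ones on a name clash. That precedence, the `merge_mcp_servers` helper and the server names in the usage note are assumptions, not documented behavior.

```python
import json
from typing import Any, Dict


def merge_mcp_servers(base: Dict[str, Any], extra_json: str) -> Dict[str, Any]:
    """Return a new config with servers from extra_json layered over base."""
    extra = json.loads(extra_json)
    servers = extra.get("mcpServers", {})
    if not isinstance(servers, dict):
        raise ValueError('"mcpServers" must be an object')
    merged = dict(base.get("mcpServers", {}))
    merged.update(servers)  # assumption: project entries win on a name clash
    return {**base, "mcpServers": merged}
```

Given a base config with one server named "playwright" and a project file adding one named "notes", the result would list both, leaving the base dictionary untouched.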
15. Tips, tuning & troubleshooting
- No findings on a live target? It may be unreachable from your network, or the app is genuinely static; the harness refuses to fabricate findings. Check recon.md.
- Quick smoke test: neurosploit run http://x --offline exercises the pipeline without calling any model.
- Cost control: start with --max-agents 4 --vote-n 1; scale up later. The router already routes cheap models to recon.
- Rate limits (subscription): the harness retries with backoff and caps parallel CLI processes; if you hit your 5-hour quota, add more models to the panel or switch to an API key.
- Run as root: the harness sets IS_SANDBOX=1 so Claude Code's autonomy works.
- Stuck? Ctrl-C once for a graceful stop (→ keep/discard report); twice aborts.
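For readers wiring their own scripts around rate-limited providers, "retries with backoff" usually means waiting longer after each failure. The sketch below shows generic exponential backoff. The attempt count, the delays and the `retry_with_backoff` name are illustrative assumptions and do not reflect the harness's real settings.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on failure wait base_delay * 2**n seconds and try again."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for n in range(attempts):
        try:
            return call()
        except Exception:
            if n == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** n)
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the schedule can be checked without actually waiting.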
16. Command & flag reference
neurosploit # interactive REPL (resumes per project)
neurosploit run <url> # black-box
neurosploit whitebox <repo> # white-box source review
neurosploit greybox <repo> --url <app> # code + live
neurosploit host <ip> # Linux/Windows/AD (with --creds)
neurosploit tui <url> # Mission Control TUI (--repo for greybox)
neurosploit agents # library counts
neurosploit models # providers & models
neurosploit --help # full help
Common flags (run / greybox / host / tui):
--model provider:model repeatable; 1st = primary, rest = failover + voting jury
--subscription use local CLI login instead of an API key
--mcp enable Playwright MCP browser (subscription path)
--creds <file.yaml> jwt/header/cookie/login + ssh/windows credentials
--focus "<text>" steer agent selection & execution
--vote-n <n> validator votes per finding (default 3)
--max-agents <n> cap agents (0 = all matching)
--offline pipeline self-test, no model calls
-v, --verbose log each agent, recon, votes
NeuroSploit — by Joas A Santos & Red Team Leaders. MIT licensed. Authorized testing only.