Chaining:
- agents_md/chains/ (12 multi-stage exploitation playbooks): SQLi→RCE→LPE,
SSRF→AWS-creds, SSRF→RCE, upload→RCE, upload→LFI→RCE→LPE, XSS→ATO, IDOR→ATO,
SSTI→RCE→cloud, default-creds→domain, deserialization→RCE, exposed-git→RCE,
subdomain-takeover→trusted-abuse. Each stage proven by a tool receipt before
advancing; reports chains_from edges.
- Loaded as a `chains` category (→ 329 agents). chain_round now injects the chain
recipes as a menu so the LLM applies proven multi-stage paths.
Persistence (no DB — structured state):
- Per-project `<cwd>/.neurosploit/` holding session.json (config), runs.json
(history), history.txt (readline). REPL resumes target/repo/auth/focus/models
on reopen; saves on /run and /quit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Live findings feed: each candidate is surfaced (✦ possible finding [sev] title
@ endpoint) the moment an agent returns it, not only at the end.
- 🔔 notifications in the feed: evidence saved, phase complete (with severity
breakdown = automatic partial summary). Renderer styles notify/finding tags.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Partial observability is now first-class:
- belief.rs — property-graph world model; nodes (host/service/vuln/exploit/cred)
carry a probability, not a boolean. Bayesian observation updates; per-node
Shannon entropy; mean-uncertainty + recon-frontier. Black-box = diffuse priors
that sharpen with observation; white-box collapses toward deterministic (MDP).
- pomdp.rs — value_of_information(), decide() (recon vs exploit falls out of
belief entropy), and may_assert() — the mathematical anti-hallucination gate:
no exploitability claim while the belief is diffuse (high entropy) → observe first.
- grounding.rs — verification engine, hard rule "no claim without a tool receipt":
empirical grounding for black-box (raw HTTP/OOB/error markers), symbolic for
white-box (file:line into reviewed source). Ungrounded claims demoted + flagged
receipt_missing (feeds future reward shaping).
- pipeline.finish(): grounding gate before reporting + belief-uncertainty readout.
- bump 3.5.0 → 3.5.1; README documents the v3.5.1 belief/grounding architecture
and the infra/bandit/reward roadmap.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Streamed Claude events now tagged with the agent label (@name) so every
command/tool/file is attributable to the agent that ran it.
- Token/cost telemetry: parse usage from the stream-json result event; feed shows
per-call in/out/cost and a running total in the run summary.
- Ctrl-C during a run no longer hard-kills: it cancels cooperatively (no new
agents launch, in-flight bounded), then asks "generate report from partial
results? [Y/n]" — discard removes the run dir. Second Ctrl-C aborts.
- pool: cancel handle + is_cancelled; one()/complete_routed/chat_cli carry a label.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Finding enriched with owasp / mitre / kill-chain stage / exploitability /
business_impact / chains_from (attack-path edges).
- attack_graph module: derive OWASP Top 10 + MITRE ATT&CK technique + kill-chain
stage from CWE (heuristic, no extra model call); render a Mermaid attack-path
flowchart (findings grouped by stage, explicit + implicit edges) and an ASCII
kill chain for the REPL.
- enrich() runs in finish() for every engagement.
- HTML report gains an "Attack Path & Kill Chain" section (Mermaid via CDN, dark)
plus a stage/sev/OWASP/MITRE/exploitability table.
- REPL print_findings shows the ASCII kill-chain + severity summary after a run.
- models: add GPT-5.5, GPT-5.4, GPT-5.4-mini, GPT-5.3-codex, GPT-5.2.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harness:
- ModelPool gains a progress channel (set_progress); chat_cli forwards it.
- New chat_claude_stream: drives Claude Code with --output-format stream-json and
parses the event stream live — assistant text, and tool_use blocks categorized
into tagged events (exec/danger command, read/edit file, net request/browser,
grep/glob tool). 900s bound; clear error surfacing.
- Wired set_progress into run / whitebox / greybox.
REPL renderer (render_line):
- Tagged events render as the conversation feed: tool/command/network as compact
CARDS (tool-runner visual), files/edits/AI text/states as iconized lines.
- Clear "what the AI is doing" states: reconning, planning, testing, validating,
chaining, report, complete — plus a ⚠ DANGEROUS marker for risky commands.
- Untagged harness lines mapped to the same state vocabulary.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harness:
- Exploit-chaining round: after validation, chain confirmed findings into deeper
impact (SSRF→metadata, SQLi→dump→reuse, IDOR→ATO, file-read→secrets→RCE),
validate the new findings, merge. Wired into black-box and greybox.
- Latest top models surfaced: claude-opus-4-8, gpt-5.1/gpt-5.1-codex, gemini-3-pro.
REPL:
- Real line editing via rustyline: ↑/↓ command-history recall, Ctrl-A/E/K, paste;
Ctrl-C cancels the line, Ctrl-D exits. Command history persists to
data/repl_history.txt. Graceful plain-stdin fallback when not a TTY.
- /model with no arg → arrow-key multi-select (dialoguer); with arg accepts any
provider:model names.
- /key is model-aware: lists the providers your selected models need (set/missing)
and prompts for the missing keys; /key <prov> <key> still works.
- Run history persists to data/repl_runs.json and reloads across sessions
(/runs lists past + current; /results /report /status by run number).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- RunOutput exposes `workdir` so the session can locate reports.
- Session now records every run (RunRecord: id, mode, target, workdir, findings).
- New commands:
/runs list runs done this session (mode, target, severity counts)
/results [n] show findings of run n (default last), severity-sorted
/report [n] open the PDF/HTML report (open/xdg-open)
/status [n] print the run's status.json
/offline on|off pipeline self-test toggle (no model calls)
- Each /run prints "saved as run #n" with the quick commands.
- Verified offline: run → /runs → /results → /status all work.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- harness/creds::login(): performs the real HTTP login (POST/GET form), captures
a session Cookie from Set-Cookie or a Bearer token from the JSON body, with a
soft success check (no hard fail on 302). Redirects not followed so Set-Cookie
is visible.
- apply_creds is now async: direct material (jwt/header/cookie) used as-is; a
`login:` flow is EXECUTED to obtain a live session; on failure, falls back to
instructing the agents to log in themselves.
- --creds + --focus added to `run` (authenticated black-box) too.
- Verified live against a local mock: POST /login → 302 + Set-Cookie captured as
the auth header used on subsequent requests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New GREYBOX mode: review a repo's source AND exploit the running app in one
pipeline — code-review findings become LEADS injected into live exploitation.
CLI: `neurosploit greybox <repo> --url <app> [--creds creds.yaml] [--focus ...]`
REPL: set both /repo and /target → greybox auto-selected.
- Credentials (harness/src/creds.rs, dependency-free YAML subset): jwt / header /
cookie, or an automated `login:` flow. Derives an auth header and/or a
"authenticate first via curl" directive injected into prompts so agents test
authenticated. --creds flag + /creds command + creds.example.yaml.
- RunConfig gains `repo`; run_engagement refactored to a Mode enum (Black/White/Grey).
- Verified offline: greybox loads creds, combines repo+URL, runs pipeline, writes report.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New persistent interactive session (app/src/repl.rs), launched when run with no args:
banner, model selection, API-key config (/key) or subscription (/sub), then a live
session to set /target, /repo, /auth, and free-text /focus instructions (or just type
them) that STEER which agents run and how.
- Slash-commands: /model /providers /key /sub /target /repo /auth /focus /mcp /votes
/agents /show /run /quit (+ bare text = focus).
- RunConfig gains `instructions` and `auth`:
* instructions bias both LLM agent-selection and the heuristic (focus keywords →
injection/access-control/etc. agents get a strong boost)
* operator directives (focus + auth) injected into recon and exploit prompts so agents
test as an authenticated user and prioritise the requested vuln classes
- bump 3.4.1 → 3.5.0 (CLI, harness, reports, credits)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Task-based model ROUTER (recon/select prefer a fast model; exploit prefers primary; validate uses a different model than the finder)
- ReAct doctrine injected into exploit prompts (Thought→Action→Observation, token-efficient)
- Dedup: unique agents per run + findings deduped by CWE/endpoint/title (highest confidence kept)
- Token economy: recon blob capped for selector + per-agent context
- Configurable MCP: merge user mcp.servers.json into the pipeline's .mcp.json
- +54 white-box/code-analysis agents (NoSQLi, LDAP/XPath, JWT-none, Java/.NET/PHP/Go/Node/Python
specifics, SSTI, ReDoS, deserialization, etc.) → 303 agents total (78 code)
- Credits: Joas A Santos & Red Team Leaders (CLI banner, interactive header, HTML+Typst report)
- README: GitHub stars/forks badges, 60-second quick start, full API config steps, intuitive layout
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause of empty results: models emit findings with confidence as a string
('High') or cvss as a number, but the Finding struct typed confidence as f64, so
serde failed the ENTIRE array on any mismatch -> 0 findings every run.
extract_findings now parses into serde_json::Value and coerces each field
(string/number/word), normalizes severity, and accepts qualitative confidence
(High->0.9 etc). Verified live: whitebox on a vulnerable sample now yields
validated findings (IDOR confirmed by vote).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 'recon failed (claude subscription CLI failed: )' was a transient CLI failure
(rate limit / cold start) reported with a blank message and no retry.
- chat_cli: on non-zero exit, surface exit code + stdout (CLI writes the real
reason there, not stderr); treat empty output as an error
- pool.one(): retry up to 3x with backoff for transient failures (both
subscription and API paths)
- with_auth: cap concurrency to 3 on the subscription path — spawning many
parallel CLI processes itself trips provider rate limits
Verified: live subscription run recovers and completes recon → select → exploit
→ vote → artifacts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>