CRITICAL BUG: truncate() and the source-context slices cut strings at BYTE offsets,
which panics when the cut lands inside a multibyte char (e.g. '—'). The panic crashed
agent tasks → task.await returned JoinError → unwrap_or_default() → empty RunOutput.
Result: real confirmed findings (win.ini traversal, HTML injection) were silently lost,
the workdir was empty, and the report was missing. All string truncation is now
char-safe (models.rs, pipeline.rs, repl.rs).
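A minimal sketch of char-safe truncation in the spirit of this fix. The names `truncate_chars` and `truncate_bytes_safe` are hypothetical, not the functions in models.rs/pipeline.rs/repl.rs; the point is that `char_indices` and `str::is_char_boundary` only yield valid cut points, so the final `&s[..n]` cannot panic.

```rust
/// Truncate `s` to at most `max_chars` characters without splitting a
/// multibyte UTF-8 sequence. `char_indices` only yields char boundaries.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // Byte offset where char number `max_chars` starts: cut there.
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than `max_chars` chars: keep the whole string.
        None => s,
    }
}

/// Byte-budget variant: the longest prefix of at most `max_bytes` bytes
/// that still ends on a char boundary.
fn truncate_bytes_safe(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Step back until the cut is on a boundary (index 0 always is).
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Either variant turns the old panic on '—' into a slightly shorter string, so the agent task completes and its findings still reach RunOutput.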
Also:
- Background runs: /run now executes in the BACKGROUND, printing through rustyline's
ExternalPrinter, so the REPL keeps accepting commands while the engagement streams
live. New /status (live phase + progress bar + findings) and /stop (graceful).
Findings persist to history and the report on completion (finalize_run ensures the
workdir is set even on abort, fixing "no report file in ").
- Progress bar: agents-done/total with %, shown in /status.
- Severity colors in the live feed (Critical=red…Info=grey); confirmed vote = green.
- /help reformatted into clear aligned sections.
- TUTORIAL: document non-blocking runs, /status progress, /stop, colors.
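The progress bar and severity colors could be rendered roughly as follows. This is a sketch under assumptions: the `Severity` enum, the bar glyphs, and every color except Critical=red and Info=grey are illustrative choices, not taken from the codebase.

```rust
/// Severity levels shown in the live feed (hypothetical enum for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Severity {
    Critical,
    High,
    Medium,
    Low,
    Info,
}

/// ANSI SGR color prefix per severity. Critical=red and Info=grey follow the
/// changelog; the middle colors are illustrative.
fn severity_color(sev: Severity) -> &'static str {
    match sev {
        Severity::Critical => "\x1b[31m", // red
        Severity::High => "\x1b[91m",     // bright red
        Severity::Medium => "\x1b[33m",   // yellow
        Severity::Low => "\x1b[36m",      // cyan
        Severity::Info => "\x1b[90m",     // grey (bright black)
    }
}

/// Render agents-done/total as "[#####-----] 5/10 (50%)" with `width` cells.
fn progress_bar(done: usize, total: usize, width: usize) -> String {
    // total == 0 shows an empty 0% bar instead of dividing by zero.
    let done = done.min(total);
    let pct = if total == 0 { 0 } else { done * 100 / total };
    let filled = if total == 0 { 0 } else { done * width / total };
    format!(
        "[{}{}] {}/{} ({}%)",
        "#".repeat(filled),
        "-".repeat(width - filled),
        done,
        total,
        pct
    )
}
```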
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- BUG: /auth (and /creds, /focus, /target, /repo) with no argument CLEARED the value
instead of showing it, so typing /auth to view your credential wiped it. Now a bare
command prints the current value; only an explicit `clear` argument clears it.
- /show now also displays API-key status (set/missing) for the selected models'
providers, plus a hint of which commands edit the config.
- REPL /run prints a clear "▶ RUNNING (prompt returns when done; use tui for live)"
banner before and "◀ back to the NeuroSploit REPL" after, so it's obvious the
REPL didn't disappear during a run.
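A sketch of the fixed argument rule for /auth-style commands, assuming the stored value is an `Option<String>`. `ArgAction`, `parse_value_arg` and `apply_value_arg` are hypothetical names, and accepting `clear` case-insensitively is an assumption.

```rust
/// What a value-setting command (/auth, /creds, /focus, /target, /repo) does.
#[derive(Debug, PartialEq)]
enum ArgAction {
    Show,        // no argument: print the current value, change nothing
    Clear,       // explicit `clear`: reset the value
    Set(String), // anything else: store it
}

/// The core of the fix: an empty argument means SHOW, never clear.
fn parse_value_arg(rest: &str) -> ArgAction {
    let arg = rest.trim();
    if arg.is_empty() {
        ArgAction::Show
    } else if arg.eq_ignore_ascii_case("clear") {
        ArgAction::Clear
    } else {
        ArgAction::Set(arg.to_string())
    }
}

/// Apply the action to the stored value and return the line to print.
fn apply_value_arg(slot: &mut Option<String>, name: &str, rest: &str) -> String {
    match parse_value_arg(rest) {
        ArgAction::Show => format!("{name}: {}", slot.as_deref().unwrap_or("(not set)")),
        ArgAction::Clear => {
            *slot = None;
            format!("{name} cleared")
        }
        ArgAction::Set(v) => {
            *slot = Some(v);
            format!("{name} set")
        }
    }
}
```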
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Claude-Code-style @ menu: rustyline CompletionType::List, so Tab on @path shows a
file/folder selection list instead of cycling inline.
- /diff (/changed): shows new (+) / gone (-) findings between the last two runs.
- /retest [n]: loads a past run's target/repo and seeds a re-verify focus on its
findings → /run to check if they're fixed.
- Both added to Tab-complete and /help.
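/diff is essentially a set difference between two runs' findings. A minimal sketch, assuming a finding is identified by (title, endpoint); `FindingKey`, `diff_runs` and `render_diff` are hypothetical.

```rust
use std::collections::BTreeSet;

/// Identity of a finding for diffing: (title, endpoint). This key is an
/// assumption of the sketch.
type FindingKey = (String, String);

/// Return (new, gone): in `latest` but not `previous`, and vice versa.
/// BTreeSet gives a deterministic display order.
fn diff_runs(previous: &[FindingKey], latest: &[FindingKey]) -> (Vec<FindingKey>, Vec<FindingKey>) {
    let prev: BTreeSet<&FindingKey> = previous.iter().collect();
    let last: BTreeSet<&FindingKey> = latest.iter().collect();
    let new: Vec<FindingKey> = last.difference(&prev).map(|k| (*k).clone()).collect();
    let gone: Vec<FindingKey> = prev.difference(&last).map(|k| (*k).clone()).collect();
    (new, gone)
}

/// "+" for new findings, "-" for findings that disappeared.
fn render_diff(new: &[FindingKey], gone: &[FindingKey]) -> Vec<String> {
    let mut out = Vec::new();
    for (title, endpoint) in new {
        out.push(format!("+ {title} @ {endpoint}"));
    }
    for (title, endpoint) in gone {
        out.push(format!("- {title} @ {endpoint}"));
    }
    out
}
```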
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Chaining:
- agents_md/chains/ (12 multi-stage exploitation playbooks): SQLi→RCE→LPE,
SSRF→AWS-creds, SSRF→RCE, upload→RCE, upload→LFI→RCE→LPE, XSS→ATO, IDOR→ATO,
SSTI→RCE→cloud, default-creds→domain, deserialization→RCE, exposed-git→RCE,
subdomain-takeover→trusted-abuse. Each stage must be proven by a tool receipt
before the chain advances; reports record the chains_from edges.
- Loaded as a `chains` category (→ 329 agents). chain_round now injects the chain
recipes as a menu so the LLM applies proven multi-stage paths.
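The receipt-gated advance can be sketched as taking the longest proven prefix of a chain. `Stage`, its `receipt` field and `proven_prefix` are hypothetical; only the rule "no receipt, no advance" and the chains_from edges come from the changelog.

```rust
/// One stage of a chain playbook, e.g. "SQLi" → "RCE" → "LPE".
/// `receipt` holds the tool output that proves the stage (hypothetical).
struct Stage {
    name: &'static str,
    receipt: Option<String>,
}

/// Walk the chain in order and stop at the first stage without a receipt,
/// so a later stage is never reported unless every earlier one is proven.
/// Returns the proven stage names and (from, to) chains_from-style edges.
fn proven_prefix(stages: &[Stage]) -> (Vec<&'static str>, Vec<(&'static str, &'static str)>) {
    let mut proven = Vec::new();
    for stage in stages {
        match &stage.receipt {
            Some(r) if !r.trim().is_empty() => proven.push(stage.name),
            _ => break, // unproven stage: do not advance
        }
    }
    // Consecutive proven stages become attack-path edges.
    let edges: Vec<(&'static str, &'static str)> =
        proven.windows(2).map(|w| (w[0], w[1])).collect();
    (proven, edges)
}
```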
Persistence (no DB — structured state):
- Per-project `<cwd>/.neurosploit/` holding session.json (config), runs.json
(history), history.txt (readline). REPL resumes target/repo/auth/focus/models
on reopen; saves on /run and /quit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Live findings feed: each candidate is surfaced (✦ possible finding [sev] title
@ endpoint) the moment an agent returns it, not only at the end.
- 🔔 notifications in the feed: evidence saved, phase complete (with a severity
breakdown as an automatic partial summary). The renderer styles notify/finding tags.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Partial observability is now first-class:
- belief.rs — property-graph world model; nodes (host/service/vuln/exploit/cred)
carry a probability, not a boolean. Bayesian observation updates; per-node
Shannon entropy; mean-uncertainty + recon-frontier. Black-box = diffuse priors
that sharpen with observation; white-box collapses toward a deterministic MDP.
- pomdp.rs — value_of_information(), decide() (recon vs exploit falls out of
belief entropy), and may_assert() — the mathematical anti-hallucination gate:
no exploitability claim while the belief is diffuse (high entropy) → observe first.
- grounding.rs — verification engine, hard rule "no claim without a tool receipt":
empirical grounding for black-box (raw HTTP/OOB/error markers), symbolic for
white-box (file:line into reviewed source). Ungrounded claims demoted + flagged
receipt_missing (feeds future reward shaping).
- pipeline.finish(): grounding gate before reporting + belief-uncertainty readout.
- bump 3.5.0 → 3.5.1; README documents the v3.5.1 belief/grounding architecture
and the infra/bandit/reward roadmap.
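For a single binary node (say "this endpoint is SQL-injectable"), the Bayesian update, Shannon entropy and assertion gate reduce to a few lines. This is a simplified sketch, not belief.rs/pomdp.rs: it ignores the property graph, and the thresholds in `may_assert` (0.9 probability, 0.5 bits) are hypothetical.

```rust
/// Binary Shannon entropy in bits for a node that is true with probability p:
/// H(p) = -p*log2(p) - (1-p)*log2(1-p). 0 = certain, 1 = maximally diffuse.
fn entropy_bits(p: f64) -> f64 {
    let term = |x: f64| if x <= 0.0 { 0.0 } else { -x * x.log2() };
    term(p) + term(1.0 - p)
}

/// Bayes' rule for one observation:
/// posterior = prior*P(obs|true) / (prior*P(obs|true) + (1-prior)*P(obs|false)).
fn bayes_update(prior: f64, p_obs_if_true: f64, p_obs_if_false: f64) -> f64 {
    let num = prior * p_obs_if_true;
    let den = num + (1.0 - prior) * p_obs_if_false;
    if den == 0.0 { prior } else { num / den }
}

/// Gate in the spirit of may_assert(): allow an exploitability claim only when
/// the belief is both high and sharp. Thresholds are hypothetical.
fn may_assert(p: f64) -> bool {
    const MIN_PROB: f64 = 0.9;
    const MAX_ENTROPY_BITS: f64 = 0.5;
    p >= MIN_PROB && entropy_bits(p) <= MAX_ENTROPY_BITS
}
```

At a diffuse prior of 0.5 (1 bit) the gate is closed; after two observations with likelihoods 0.9/0.1 the posterior is about 0.988 (roughly 0.1 bits) and the gate is open, which is the "observe first" behavior in miniature.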
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Streamed Claude events now tagged with the agent label (@name) so every
command/tool/file is attributable to the agent that ran it.
- Token/cost telemetry: parse usage from the stream-json result event; feed shows
per-call in/out/cost and a running total in the run summary.
- Ctrl-C during a run no longer hard-kills: it cancels cooperatively (no new
agents launch, in-flight work stays bounded), then asks "generate report from partial
results? [Y/n]"; choosing discard removes the run dir. A second Ctrl-C aborts.
- pool: cancel handle + is_cancelled; one()/complete_routed/chat_cli carry a label.
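A sketch of a cooperative cancel handle using only std atomics. `CancelHandle`, `on_ctrl_c` and `launch_agents` are hypothetical; `is_cancelled` mirrors the name in the changelog. The launch loop checks the flag before each agent, and the second press tells the caller to abort.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Shared between the pool and the Ctrl-C handler (clones share state).
#[derive(Clone, Default)]
struct CancelHandle {
    cancelled: Arc<AtomicBool>,
    presses: Arc<AtomicUsize>,
}

impl CancelHandle {
    /// Called on each Ctrl-C. Always requests a cooperative stop; returns
    /// true when this press should hard-abort (second press onward).
    fn on_ctrl_c(&self) -> bool {
        self.cancelled.store(true, Ordering::SeqCst);
        self.presses.fetch_add(1, Ordering::SeqCst) >= 1
    }

    fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::SeqCst)
    }
}

/// Launch agents one by one, stopping before the next launch once cancelled.
/// Agents that already ran are kept, so a partial report is still possible.
fn launch_agents<F: FnMut(&str)>(agents: &[&str], cancel: &CancelHandle, mut run: F) -> Vec<String> {
    let mut launched = Vec::new();
    for &agent in agents {
        if cancel.is_cancelled() {
            break;
        }
        run(agent);
        launched.push(agent.to_string());
    }
    launched
}
```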
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
REPL (rustyline Helper):
- Tab autocomplete for /commands and @filesystem-paths.
- @path attach: @file, @folder, @file:LINE and @file:START-END fold scoped files or
stack traces into the agent context; /attach <path> and /context manage attachments.
- Multiline input: end a line with `\` to continue (validator-driven).
- /theme color|mono, /config (=/show); history (↑/↓) persists as before.
- Attachments are merged into the run's instruction context.
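The @path forms and the trailing-backslash rule can be parsed with plain string methods. A sketch; the `Attach` type, `parse_attach` and `needs_continuation` are hypothetical, and a suffix that does not parse as LINE or START-END is kept as part of the path.

```rust
/// A parsed `@path` token (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Attach {
    Whole(String),               // @file or @folder
    Line(String, usize),         // @file:LINE
    Range(String, usize, usize), // @file:START-END
}

/// Parse "@src/app.rs", "@src/app.rs:42" or "@src/app.rs:10-20".
/// Returns None for tokens that do not start with '@'.
fn parse_attach(token: &str) -> Option<Attach> {
    let body = token.strip_prefix('@')?;
    if body.is_empty() {
        return None;
    }
    // Split at the LAST ':' so only the trailing spec is considered.
    if let Some((path, spec)) = body.rsplit_once(':') {
        if !path.is_empty() {
            if let Some((a, b)) = spec.split_once('-') {
                if let (Ok(start), Ok(end)) = (a.parse::<usize>(), b.parse::<usize>()) {
                    if start <= end {
                        return Some(Attach::Range(path.to_string(), start, end));
                    }
                }
            } else if let Ok(line) = spec.parse::<usize>() {
                return Some(Attach::Line(path.to_string(), line));
            }
        }
    }
    Some(Attach::Whole(body.to_string()))
}

/// Multiline input: a trailing backslash means "continue on the next line".
fn needs_continuation(line: &str) -> bool {
    line.trim_end().ends_with('\\')
}
```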
Install:
- setup.sh: `curl … | bash` — auto-installs Rust, clones to ~/.neurosploit,
builds release, links neurosploit into ~/.local/bin; idempotent; env-tunable.
README: v3.5.0, 🧠 (back to "neuro"), one-line install section, neurosploit-on-PATH usage.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Finding enriched with owasp / mitre / kill-chain stage / exploitability /
business_impact / chains_from (attack-path edges).
- attack_graph module: derive OWASP Top 10 + MITRE ATT&CK technique + kill-chain
stage from CWE (heuristic, no extra model call); render a Mermaid attack-path
flowchart (findings grouped by stage, explicit + implicit edges) and an ASCII
kill chain for the REPL.
- enrich() runs in finish() for every engagement.
- HTML report gains an "Attack Path & Kill Chain" section (Mermaid via CDN, dark)
plus a stage/sev/OWASP/MITRE/exploitability table.
- REPL print_findings shows the ASCII kill-chain + severity summary after a run.
- models: add GPT-5.5, GPT-5.4, GPT-5.4-mini, GPT-5.3-codex, GPT-5.2.
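A sketch of CWE-driven enrichment plus Mermaid rendering. The mapping table is a small illustrative subset using OWASP Top 10 2021 categories and real ATT&CK technique IDs; it is not the project's attack_graph table, and `classify_cwe`/`mermaid` are hypothetical names. Titles containing `"` would need escaping before use in a label.

```rust
/// Heuristic CWE → (OWASP Top 10 2021, MITRE ATT&CK technique, stage).
/// Illustrative subset only.
fn classify_cwe(cwe: u32) -> (&'static str, &'static str, &'static str) {
    match cwe {
        89 => ("A03:2021 Injection", "T1190 Exploit Public-Facing Application", "Exploitation"),
        78 => ("A03:2021 Injection", "T1059 Command and Scripting Interpreter", "Execution"),
        22 => ("A01:2021 Broken Access Control", "T1083 File and Directory Discovery", "Discovery"),
        918 => ("A10:2021 Server-Side Request Forgery", "T1552.005 Cloud Instance Metadata API", "Credential Access"),
        _ => ("Unmapped", "Unmapped", "Unknown"),
    }
}

/// Render (title, cwe) findings as a Mermaid flowchart: one subgraph per
/// stage (first-seen order) plus explicit (from_index, to_index) edges.
fn mermaid(findings: &[(&str, u32)], edges: &[(usize, usize)]) -> String {
    let mut out = String::from("flowchart LR\n");
    let mut stages: Vec<&str> = Vec::new();
    for &(_, cwe) in findings {
        let stage = classify_cwe(cwe).2;
        if !stages.contains(&stage) {
            stages.push(stage);
        }
    }
    for (si, stage) in stages.iter().enumerate() {
        out.push_str(&format!("  subgraph S{si} [{stage}]\n"));
        for (i, &(title, cwe)) in findings.iter().enumerate() {
            if classify_cwe(cwe).2 == *stage {
                // Quoted label so parentheses and slashes are allowed.
                out.push_str(&format!("    F{i}[\"{title} (CWE-{cwe})\"]\n"));
            }
        }
        out.push_str("  end\n");
    }
    for &(from, to) in edges {
        out.push_str(&format!("  F{from} --> F{to}\n"));
    }
    out
}
```

The output starts with `flowchart LR` and can be pasted into any Mermaid renderer, similar to what the HTML report renders via the CDN.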
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harness:
- ModelPool gains a progress channel (set_progress); chat_cli forwards it.
- New chat_claude_stream: drives Claude Code with --output-format stream-json and
parses the event stream live — assistant text, and tool_use blocks categorized
into tagged events (exec/danger command, read/edit file, net request/browser,
grep/glob tool). 900 s timeout; errors are surfaced clearly.
- Wired set_progress into run / whitebox / greybox.
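Once a tool_use block has been parsed out of the stream-json output (the JSON parsing itself is omitted here), categorizing it is a match on the tool name. A sketch: "Bash", "Read", "Edit", "Write", "Grep", "Glob", "WebFetch" and "WebSearch" are Claude Code built-in tool names, but treating `mcp__…playwright…` tools as the browser, the `Tag` enum, and the danger patterns are assumptions.

```rust
/// Event tag for a tool_use block, following the exec/danger, read/edit,
/// net/browser and grep/glob buckets in the changelog.
#[derive(Debug, PartialEq)]
enum Tag {
    Exec,
    Danger,
    Read,
    Edit,
    Net,
    Browser,
    Search,
    Other,
}

/// Map a tool name (plus the Bash command, if any) to a tag.
fn categorize(tool: &str, command: Option<&str>) -> Tag {
    // Hypothetical "risky command" patterns for the ⚠ DANGEROUS marker.
    const DANGEROUS: [&str; 4] = ["rm -rf", "mkfs", "dd if=", "shutdown"];
    match tool {
        "Bash" => {
            let cmd = command.unwrap_or("");
            if DANGEROUS.iter().any(|p| cmd.contains(*p)) {
                Tag::Danger
            } else {
                Tag::Exec
            }
        }
        "Read" => Tag::Read,
        "Edit" | "Write" => Tag::Edit,
        "WebFetch" | "WebSearch" => Tag::Net,
        "Grep" | "Glob" => Tag::Search,
        t if t.starts_with("mcp__") && t.contains("playwright") => Tag::Browser,
        _ => Tag::Other,
    }
}
```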
REPL renderer (render_line):
- Tagged events render as the conversation feed: tool/command/network as compact
CARDS (tool-runner visual), files/edits/AI text/states as iconized lines.
- Clear "what the AI is doing" states: reconning, planning, testing, validating,
chaining, report, complete — plus a ⚠ DANGEROUS marker for risky commands.
- Untagged harness lines mapped to the same state vocabulary.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harness:
- Exploit-chaining round: after validation, chain confirmed findings into deeper
impact (SSRF→metadata, SQLi→dump→reuse, IDOR→ATO, file-read→secrets→RCE),
validate the new findings, merge. Wired into black-box and greybox.
- Latest top models surfaced: claude-opus-4-8, gpt-5.1/gpt-5.1-codex, gemini-3-pro.
REPL:
- Real line editing via rustyline: ↑/↓ command-history recall, Ctrl-A/E/K, paste;
Ctrl-C cancels the line, Ctrl-D exits. Command history persists to
data/repl_history.txt. Graceful plain-stdin fallback when not a TTY.
- /model with no arg → arrow-key multi-select (dialoguer); with an arg it accepts any
provider:model name.
- /key is model-aware: lists the providers your selected models need (set/missing)
and prompts for the missing keys; /key <prov> <key> still works.
- Run history persists to data/repl_runs.json and reloads across sessions
(/runs lists past + current; /results /report /status by run number).
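A sketch of the model-aware key check, assuming the provider is the part of a `provider:model` name before the first colon and that keys are kept in a map; `key_status` and `missing_providers` are hypothetical.

```rust
use std::collections::BTreeMap;

/// For each selected "provider:model" name, record whether that provider's
/// API key is set. Names without a provider prefix are skipped.
fn key_status<'a>(selected: &[&'a str], keys: &BTreeMap<&str, String>) -> BTreeMap<&'a str, bool> {
    let mut status = BTreeMap::new();
    for &name in selected {
        // "anthropic:claude-opus-4-8" → "anthropic"
        if let Some((provider, _model)) = name.split_once(':') {
            let set = keys.get(provider).map_or(false, |k| !k.trim().is_empty());
            status.insert(provider, set);
        }
    }
    status
}

/// Providers whose keys still have to be prompted for (sorted by name).
fn missing_providers<'a>(status: &BTreeMap<&'a str, bool>) -> Vec<&'a str> {
    status.iter().filter(|&(_, &set)| !set).map(|(&p, _)| p).collect()
}
```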
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- RunOutput exposes `workdir` so the session can locate reports.
- Session now records every run (RunRecord: id, mode, target, workdir, findings).
- New commands:
  /runs            list runs done this session (mode, target, severity counts)
  /results [n]     show findings of run n (default last), severity-sorted
  /report [n]      open the PDF/HTML report (open/xdg-open)
  /status [n]      print the run's status.json
  /offline on|off  pipeline self-test toggle (no model calls)
- Each /run prints "saved as run #n" with the quick commands.
- Verified offline: run → /runs → /results → /status all work.
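/results [n] needs two small pieces: resolving n (last run by default) and sorting by severity. A sketch assuming runs are numbered from 1 and severities order Critical to Info; the `Sev`/`Finding` types and the helper names are hypothetical, while the RunRecord fields follow the changelog.

```rust
/// Declaration order is the sort order: Critical sorts first.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Sev {
    Critical,
    High,
    Medium,
    Low,
    Info,
}

struct Finding {
    title: String,
    sev: Sev,
}

/// Fields as listed in the changelog (id, mode, target, workdir, findings).
struct RunRecord {
    id: usize,
    mode: String,
    target: String,
    workdir: String,
    findings: Vec<Finding>,
}

/// Resolve `/results [n]`: 1-based run number, None means the last run.
fn pick_run(runs: &[RunRecord], n: Option<usize>) -> Option<&RunRecord> {
    match n {
        None => runs.last(),
        Some(0) => None,
        Some(k) => runs.get(k - 1),
    }
}

/// Findings sorted by severity, Critical first; titles break ties.
fn sorted_findings(run: &RunRecord) -> Vec<(Sev, &str)> {
    let mut v: Vec<(Sev, &str)> = run.findings.iter().map(|f| (f.sev, f.title.as_str())).collect();
    v.sort();
    v
}
```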
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- harness/creds::login(): performs the real HTTP login (POST/GET form), captures
a session Cookie from Set-Cookie or a Bearer token from the JSON body, with a
soft success check (no hard fail on 302). Redirects are not followed, so the
Set-Cookie header stays visible.
- apply_creds is now async: direct material (jwt/header/cookie) used as-is; a
`login:` flow is EXECUTED to obtain a live session; on failure, falls back to
instructing the agents to log in themselves.
- --creds + --focus added to `run` (authenticated black-box) too.
- Verified live against a local mock: POST /login → 302 + Set-Cookie captured as
the auth header used on subsequent requests.
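The cookie-capture step can be sketched with plain string handling: keep each Set-Cookie's leading `name=value` and drop attributes such as Path or HttpOnly, which (per RFC 6265) are not sent back to the server. `cookie_header` and `login_looks_ok` are hypothetical names; the 2xx/3xx success rule is an assumption matching "no hard fail on 302", and Bearer-token extraction from a JSON body is not shown.

```rust
/// Build a `Cookie:` header value from Set-Cookie response header values.
/// A later cookie with the same name replaces the earlier one.
fn cookie_header(set_cookies: &[&str]) -> Option<String> {
    let mut pairs: Vec<(String, String)> = Vec::new();
    for sc in set_cookies {
        // Only the first `;`-separated segment is the cookie itself.
        let first = sc.split(';').next().unwrap_or("").trim();
        if let Some((name, value)) = first.split_once('=') {
            let (name, value) = (name.trim(), value.trim());
            if name.is_empty() {
                continue;
            }
            if let Some(i) = pairs.iter().position(|(n, _)| n == name) {
                pairs[i].1 = value.to_string();
            } else {
                pairs.push((name.to_string(), value.to_string()));
            }
        }
    }
    if pairs.is_empty() {
        return None;
    }
    let joined: Vec<String> = pairs.iter().map(|(n, v)| format!("{n}={v}")).collect();
    Some(joined.join("; "))
}

/// Soft success check: any 2xx/3xx (including 302) counts, as long as some
/// session material was captured.
fn login_looks_ok(status: u16, cookie: Option<&str>) -> bool {
    (200..400).contains(&status) && cookie.is_some()
}
```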
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New GREYBOX mode: review a repo's source AND exploit the running app in one
pipeline — code-review findings become LEADS injected into live exploitation.
CLI: `neurosploit greybox <repo> --url <app> [--creds creds.yaml] [--focus ...]`
REPL: set both /repo and /target → greybox auto-selected.
- Credentials (harness/src/creds.rs, dependency-free YAML subset): jwt / header /
cookie, or an automated `login:` flow. Derives an auth header and/or an
"authenticate first via curl" directive injected into prompts so agents test
authenticated. --creds flag + /creds command + creds.example.yaml.
- RunConfig gains `repo`; run_engagement refactored to a Mode enum (Black/White/Grey).
- Verified offline: greybox loads creds, combines repo+URL, runs pipeline, writes report.
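Mode auto-selection and a flat creds reader could look like this. A sketch: `select_mode` encodes the rule from the changelog (both /repo and /target → greybox), while `parse_flat_yaml` is a hypothetical and much smaller stand-in for the YAML subset in creds.rs (flat `key: value` lines and `#` comments only).

```rust
/// Engagement mode, mirroring the Mode enum (Black/White/Grey).
#[derive(Debug, PartialEq)]
enum Mode {
    Black,
    White,
    Grey,
}

/// Both target and repo → Grey; only target → Black; only repo → White.
/// Blank strings count as unset. None if nothing is set.
fn select_mode(target: Option<&str>, repo: Option<&str>) -> Option<Mode> {
    let set = |v: Option<&str>| v.map_or(false, |s| !s.trim().is_empty());
    match (set(target), set(repo)) {
        (true, true) => Some(Mode::Grey),
        (true, false) => Some(Mode::Black),
        (false, true) => Some(Mode::White),
        (false, false) => None,
    }
}

/// Read flat `key: value` lines, skipping blanks and `#` comments.
/// Splits at the first ':' so values like "X-Api-Key: 123" survive.
fn parse_flat_yaml(text: &str) -> Vec<(String, String)> {
    text.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| l.split_once(':'))
        .map(|(k, v)| (k.trim().to_string(), v.trim().trim_matches('"').to_string()))
        .collect()
}
```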
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New persistent interactive session (app/src/repl.rs), launched when run with no args:
banner, model selection, API-key config (/key) or subscription (/sub), then a live
session to set /target, /repo, /auth, and free-text /focus instructions (or just type
them) that STEER which agents run and how.
- Slash-commands: /model /providers /key /sub /target /repo /auth /focus /mcp /votes
/agents /show /run /quit (+ bare text = focus).
- RunConfig gains `instructions` and `auth`:
* instructions bias both LLM agent-selection and the heuristic (focus keywords →
injection/access-control/etc. agents get a strong boost)
* operator directives (focus + auth) injected into recon and exploit prompts so agents
test as an authenticated user and prioritise the requested vuln classes
- bump 3.4.1 → 3.5.0 (CLI, harness, reports, credits)
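One way to realize "focus keywords give a strong boost" in the heuristic selector is an additive score. The keyword table, the boost of 10.0, and the `focus_boost`/`rank` helpers are hypothetical.

```rust
/// Boost for an agent category when the focus text mentions one of its
/// keywords. Table and boost value are illustrative.
fn focus_boost(focus: &str, agent_category: &str) -> f64 {
    const BOOST: f64 = 10.0;
    const TABLE: &[(&str, &[&str])] = &[
        ("injection", &["sqli", "sql injection", "injection", "ssti", "command injection"]),
        ("access-control", &["idor", "access control", "authz", "privilege"]),
        ("xss", &["xss", "cross-site scripting"]),
    ];
    let focus = focus.to_lowercase();
    let hit = TABLE
        .iter()
        .any(|(cat, kws)| *cat == agent_category && kws.iter().any(|kw| focus.contains(*kw)));
    if hit { BOOST } else { 0.0 }
}

/// Rank (name, category, base_score) agents by base + boost, highest first.
fn rank<'a>(agents: &[(&'a str, &str, f64)], focus: &str) -> Vec<&'a str> {
    let mut scored: Vec<(f64, &'a str)> = agents
        .iter()
        .map(|&(name, cat, base)| (base + focus_boost(focus, cat), name))
        .collect();
    // Scores are finite, so partial_cmp never returns None here.
    scored.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    scored.into_iter().map(|(_, name)| name).collect()
}
```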
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Task-based model ROUTER (recon/select prefer a fast model; exploit prefers the primary model; validate uses a different model than the finder)
- ReAct doctrine injected into exploit prompts (Thought→Action→Observation, token-efficient)
- Dedup: unique agents per run + findings deduped by CWE/endpoint/title (highest confidence kept)
- Token economy: recon blob capped for selector + per-agent context
- Configurable MCP: merge user mcp.servers.json into the pipeline's .mcp.json
- +54 white-box/code-analysis agents (NoSQLi, LDAP/XPath, JWT-none, Java/.NET/PHP/Go/Node/Python
specifics, SSTI, ReDoS, deserialization, etc.) → 303 agents total (78 code)
- Credits: Joas A Santos & Red Team Leaders (CLI banner, interactive header, HTML+Typst report)
- README: GitHub stars/forks badges, 60-second quick start, full API config steps, intuitive layout
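The dedup rule (unique by CWE/endpoint/title, highest confidence kept) sketched with a HashMap from key to the kept finding's index. The `Finding` shape, the lowercase normalization, and `dedup_findings` are assumptions.

```rust
use std::collections::HashMap;

/// A finding as seen by the dedup step (hypothetical minimal shape).
#[derive(Clone, Debug, PartialEq)]
struct Finding {
    cwe: u32,
    endpoint: String,
    title: String,
    confidence: f64,
}

/// Collapse findings sharing (CWE, endpoint, title), keeping the most
/// confident one at the first-seen position of its key.
fn dedup_findings(findings: Vec<Finding>) -> Vec<Finding> {
    let mut index: HashMap<(u32, String, String), usize> = HashMap::new();
    let mut out: Vec<Finding> = Vec::new();
    for f in findings {
        let key = (f.cwe, f.endpoint.to_lowercase(), f.title.to_lowercase());
        match index.get(&key).copied() {
            // Duplicate: replace only if the new one is more confident.
            Some(i) => {
                if f.confidence > out[i].confidence {
                    out[i] = f;
                }
            }
            // First occurrence: remember its slot.
            None => {
                index.insert(key, out.len());
                out.push(f);
            }
        }
    }
    out
}
```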
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>