`whitebox <arg>`, `greybox --repo <arg>`, `tui --repo`, and the REPL `/repo`
now accept a git URL (https://github.com/owner/repo[.git], git@…, ssh://, *.git)
or an `owner/repo` shorthand. A new resolve_source() shallow-clones it into
<base>/repos/<name> (cached, .gitignored) and reviews it; existing local paths
are used unchanged. Works identically with API-key (--model) and --subscription.
Verified: `neurosploit whitebox https://github.com/digininja/DVWA --offline`
clones DVWA and runs the 78 code agents over 120KB of source.
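A minimal sketch of how the argument might be classified before cloning. This is not the actual resolve_source(): the `Source` enum, `classify`, and `repo_name` are hypothetical names, and expanding `owner/repo` to a github.com URL is an assumption based on the examples above. The clone step itself is left out.

```rust
use std::path::Path;

/// Hypothetical result of classifying a `<arg>` / `--repo` value.
#[derive(Debug, PartialEq)]
enum Source {
    /// An existing local path (or anything unrecognised), used unchanged.
    Local(String),
    /// Something to shallow-clone into <base>/repos/<name>.
    Remote { url: String, name: String },
}

/// Derive the cache directory name from a URL or shorthand.
fn repo_name(arg: &str) -> String {
    let trimmed = arg.trim_end_matches('/');
    let trimmed = trimmed.strip_suffix(".git").unwrap_or(trimmed);
    trimmed
        .rsplit(|c| c == '/' || c == ':')
        .next()
        .unwrap_or(trimmed)
        .to_string()
}

fn classify(arg: &str) -> Source {
    // Existing local paths always win, so old invocations keep working.
    if Path::new(arg).exists() {
        return Source::Local(arg.to_string());
    }
    let is_url = arg.starts_with("https://")
        || arg.starts_with("ssh://")
        || arg.starts_with("git@")
        || arg.ends_with(".git");
    if is_url {
        return Source::Remote { url: arg.to_string(), name: repo_name(arg) };
    }
    // `owner/repo` shorthand: exactly two non-empty segments.
    let parts: Vec<&str> = arg.split('/').collect();
    if parts.len() == 2 && parts.iter().all(|p| !p.is_empty()) {
        return Source::Remote {
            // Assumption: the shorthand expands to GitHub.
            url: format!("https://github.com/{arg}"),
            name: repo_name(arg),
        };
    }
    Source::Local(arg.to_string())
}
```

Checking for an existing path first is what keeps local usage back-compatible: a directory that happens to look like `owner/repo` is still treated as local.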
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Distilled from reviewing real AI-pentest output that kept stopping at "exposed"
instead of "exploited". Pure-additive, back-compatible.
Behavior (injected into black/grey/chain exploit prompts via DEPTH_DOCTRINE):
- Exposed → exploited: any info-disclosure, exposed service/WSDL, leaked
credential or token, or reachable dev host MUST actually be used before it counts
as a finding; otherwise it is a lead, not a confirmed High/Critical.
- Chain across modules: reuse obtained session/JWT/cookie/credential and pivot
to IDOR/privesc/exfil; report the chain, not isolated parts.
- Decode & fingerprint → CVE; audit tokens (alg-confusion/none/kid/JWKS, weak
HS256 secret cracking, lifecycle).
Deterministic post-pass (new crates/harness/src/hygiene.rs, wired into finish()):
- calibrate severity to PROVEN impact — unproven High/Critical (hedged, no
payload, thin evidence) capped to Medium and re-titled "(potential)";
- depth_audit — flag exposures on a host with no real exploit;
- hygiene_summary — advise consolidating hygiene classes repeated across assets.
Unit tests cover calibration + depth audit.
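The severity calibration described above can be sketched as follows. This is an illustration under stated assumptions, not the code in hygiene.rs: the `Finding` fields, the hedge-word list, and the 40-character "thin evidence" threshold are all hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

/// Hypothetical, trimmed-down finding record.
#[derive(Debug, Clone)]
struct Finding {
    title: String,
    severity: Severity,
    payload: Option<String>,
    evidence: String,
}

// Assumed heuristics: the hedge words and the evidence threshold are illustrative.
const HEDGES: [&str; 4] = ["may ", "might ", "could ", "potentially "];
const MIN_EVIDENCE_CHARS: usize = 40;

/// A finding counts as proven if it is not hedged, carries a payload,
/// and has more than thin evidence.
fn is_proven(f: &Finding) -> bool {
    let text = format!("{} {}", f.title, f.evidence).to_lowercase();
    let hedged = HEDGES.iter().any(|h| text.contains(h));
    let has_payload = f.payload.as_deref().map_or(false, |p| !p.trim().is_empty());
    let thin = f.evidence.chars().count() < MIN_EVIDENCE_CHARS;
    !hedged && has_payload && !thin
}

/// Cap unproven High/Critical findings to Medium and re-title them.
fn calibrate(mut f: Finding) -> Finding {
    if f.severity >= Severity::High && !is_proven(&f) {
        f.severity = Severity::Medium;
        if !f.title.ends_with("(potential)") {
            f.title.push_str(" (potential)");
        }
    }
    f
}
```

Because the pass is a pure function over the finding, it stays deterministic and is easy to unit test.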
5 new doctrine meta-agents (scripts/build_methodology_v352.py → agents_md/meta/):
exploit_depth_doctrine, finding_chainer, artifact_decoder, token_auditor,
report_calibrator (meta 17→22, total 343→348).
Version bumped 3.5.1 → 3.5.2 across crates/app/installers/docs; RELEASE/README
updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
These session/runs/history files are runtime state generated during local
testing; .neurosploit/ is already in .gitignore. Untrack them so the repo
doesn't carry test artifacts.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Token/quota exhaustion no longer silently drops agents. When every candidate
model is rate-limited / out of quota, the run PARKS (keeping all state) and
prints "⏸ token/quota exhausted … PAUSED". The user can:
- wait for renewal and /continue (retry same model), or
- /model <provider:model> (or the /model selector) then /continue to switch.
Implemented via ModelPool: is_exhaustion() detection, park_exhausted() that
awaits a resume Notify, and a fallback-model slot tried first on retry. /model
queues the chosen models into a paused run's fallback so a plain /continue
resumes on them.
Findings now survive a crash/quit: each finding is checkpointed live to
.neurosploit/active_run.json; on next launch an interrupted run is recovered
into /runs (a raw report is materialized) so /results, /finding and /report
keep working.
/stop now actually halts immediately on raw/discard: one() races the in-flight
model call against the hard-cancel flag, so the CLI child (kill_on_drop) is
terminated at once instead of finishing its whole command sequence. The
validate path still soft-stops (lets validation run).
Docs: TUTORIAL documents the 3-way /stop, crash recovery and pause/continue;
/help lists /continue and the new behaviors.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New `litellm` provider (kind=api). Use `litellm:<model>` — model names pass
through to your gateway. No hardcoded key required (proxy may be open).
- Env-configurable base URL: LITELLM_BASE_URL (default http://localhost:4000/v1),
LITELLM_API_KEY. OLLAMA_BASE_URL override added too.
- TUTORIAL documents the LiteLLM env config.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CRITICAL BUG: truncate()/source-context slices cut strings by BYTE, panicking on
a multibyte char (e.g. '—'). The panic crashed agent tasks → task.await returned
JoinError → unwrap_or_default() → empty RunOutput. Result: real confirmed findings
(win.ini traversal, HTML injection) were silently lost, workdir was empty, report
missing. Now all string truncation is char-safe (models.rs, pipeline.rs, repl.rs).
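For reference, a minimal char-safe truncation sketch. The name `truncate_chars` is hypothetical and the actual helpers in models.rs, pipeline.rs, and repl.rs may differ. Slicing `&s[..n]` panics whenever byte offset `n` falls inside a multibyte character; cutting at an offset taken from `char_indices()` cannot.

```rust
/// Return at most `max_chars` characters of `s`, never splitting a character.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    // `char_indices` yields the byte offset of each character start, so the
    // offset of character number `max_chars` is always a valid slice boundary.
    match s.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // already short enough
    }
}
```

With the input `"a\u{2014}b"` (the dash from the bug report, three bytes in UTF-8), byte offset 2 is not a character boundary, which is exactly where a byte-based cut would have panicked.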
Also:
- Background runs: /run now runs in the BACKGROUND via rustyline's ExternalPrinter
— the REPL keeps accepting commands while the engagement streams live. New
/status (live phase + progress bar + findings) and /stop (graceful). Findings
persist to history + report on completion (finalize_run ensures workdir is set
even on abort, fixing "no report file in ").
- Progress bar: agents-done/total with %, shown in /status.
- Severity colors in the live feed (Critical=red…Info=grey); confirmed vote = green.
- /help reformatted into clear aligned sections.
- TUTORIAL: document non-blocking runs, /status progress, /stop, colors.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- BUG: /auth (and /creds /focus /target /repo) with no argument CLEARED the value
instead of showing it — so typing /auth to view wiped your credential. Now no-arg
prints the current value; clear only with an explicit `clear`.
- /show now also displays API-key status (set/missing) for the selected models'
providers, and a hint of which commands edit config.
- REPL /run prints a clear "▶ RUNNING (prompt returns when done; use tui for live)"
banner before and "◀ back to the NeuroSploit REPL" after, so it's obvious the
REPL didn't disappear during a run.
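The no-argument fix for /auth and its sibling commands amounts to a three-way parse. A small sketch, with `Action` and `parse_setting_arg` as hypothetical names:

```rust
/// What a setting command such as /auth should do with its argument.
#[derive(Debug, PartialEq)]
enum Action {
    /// No argument: print the current value and leave it untouched.
    Show,
    /// The explicit word `clear`: wipe the value.
    Clear,
    /// Anything else: store it as the new value.
    Set(String),
}

fn parse_setting_arg(arg: &str) -> Action {
    match arg.trim() {
        "" => Action::Show,
        "clear" => Action::Clear,
        value => Action::Set(value.to_string()),
    }
}
```

Making `Show` the empty-argument case means viewing a credential can never destroy it.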
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Claude-Code-style @ menu: rustyline CompletionType::List so @path shows a
file/folder selection list (Tab), not inline cycling.
- /diff (/changed): shows new (+) / gone (-) findings between the last two runs.
- /retest [n]: loads a past run's target/repo and seeds a re-verify focus on its
findings → /run to check if they're fixed.
- Both added to Tab-complete and /help.
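The /diff output is essentially two set differences. A rough sketch, assuming each finding is reduced to a stable string key (how the real command keys findings is not stated here, and `diff_runs` is a hypothetical name):

```rust
use std::collections::BTreeSet;

/// Compare two runs by finding key.
/// Returns (new findings, gone findings), each sorted for stable output.
fn diff_runs<'a>(prev: &[&'a str], curr: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    let prev_set: BTreeSet<&'a str> = prev.iter().copied().collect();
    let curr_set: BTreeSet<&'a str> = curr.iter().copied().collect();
    // In the current run but not the previous one: shown with (+).
    let added: Vec<&'a str> = curr_set.difference(&prev_set).copied().collect();
    // In the previous run but not the current one: shown with (-).
    let gone: Vec<&'a str> = prev_set.difference(&curr_set).copied().collect();
    (added, gone)
}
```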
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Chaining:
- agents_md/chains/ (12 multi-stage exploitation playbooks): SQLi→RCE→LPE,
SSRF→AWS-creds, SSRF→RCE, upload→RCE, upload→LFI→RCE→LPE, XSS→ATO, IDOR→ATO,
SSTI→RCE→cloud, default-creds→domain, deserialization→RCE, exposed-git→RCE,
subdomain-takeover→trusted-abuse. Each stage proven by a tool receipt before
advancing; reports chains_from edges.
- Loaded as a `chains` category (→ 329 agents). chain_round now injects the chain
recipes as a menu so the LLM applies proven multi-stage paths.
Persistence (no DB — structured state):
- Per-project `<cwd>/.neurosploit/` holding session.json (config), runs.json
(history), history.txt (readline). REPL resumes target/repo/auth/focus/models
on reopen; saves on /run and /quit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Live findings feed: each candidate is surfaced (✦ possible finding [sev] title
@ endpoint) the moment an agent returns it, not only at the end.
- 🔔 notifications in the feed: evidence saved, phase complete (with severity
breakdown = automatic partial summary). Renderer styles notify/finding tags.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Partial observability is now first-class:
- belief.rs — property-graph world model; nodes (host/service/vuln/exploit/cred)
carry a probability, not a boolean. Bayesian observation updates; per-node
Shannon entropy; mean-uncertainty + recon-frontier. Black-box = diffuse priors
that sharpen with observation; white-box collapses toward deterministic (MDP).
- pomdp.rs — value_of_information(), decide() (recon vs exploit falls out of
belief entropy), and may_assert() — the mathematical anti-hallucination gate:
no exploitability claim while the belief is diffuse (high entropy) → observe first.
- grounding.rs — verification engine, hard rule "no claim without a tool receipt":
empirical grounding for black-box (raw HTTP/OOB/error markers), symbolic for
white-box (file:line into reviewed source). Ungrounded claims demoted + flagged
receipt_missing (feeds future reward shaping).
- pipeline.finish(): grounding gate before reporting + belief-uncertainty readout.
- bump 3.5.0 → 3.5.1; README documents the v3.5.1 belief/grounding architecture
and the infra/bandit/reward roadmap.
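To make the belief math concrete, here is a sketch of the three pieces for a single node with a binary state. The function names mirror the description, but the signatures, the entropy threshold parameter, and the extra `p > 0.5` condition in `may_assert` are assumptions, not the contents of belief.rs or pomdp.rs.

```rust
/// Shannon entropy (in bits) of a Bernoulli belief with probability `p`.
/// 0.0 means certain either way; 1.0 (at p = 0.5) means maximally diffuse.
fn entropy(p: f64) -> f64 {
    if p <= 0.0 || p >= 1.0 {
        return 0.0;
    }
    -(p * p.log2() + (1.0 - p) * (1.0 - p).log2())
}

/// Bayesian update of a node's probability after one observation.
/// `p_obs_given_true` / `p_obs_given_false` are the observation likelihoods.
fn bayes_update(prior: f64, p_obs_given_true: f64, p_obs_given_false: f64) -> f64 {
    let numerator = p_obs_given_true * prior;
    let evidence = numerator + p_obs_given_false * (1.0 - prior);
    if evidence == 0.0 {
        prior // observation impossible under both hypotheses: keep the prior
    } else {
        numerator / evidence
    }
}

/// Gate: allow an exploitability claim only when the belief is sharp
/// (entropy at or below `max_entropy`) and points toward "exploitable".
fn may_assert(p: f64, max_entropy: f64) -> bool {
    entropy(p) <= max_entropy && p > 0.5
}
```

With a diffuse prior of 0.5, the gate refuses a claim. After one observation that is nine times more likely if the node is vulnerable (likelihoods 0.9 and 0.1), the posterior moves to 0.9, entropy drops to roughly 0.47 bits, and a threshold of 0.6 bits would let the claim through.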
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Streamed Claude events now tagged with the agent label (@name) so every
command/tool/file is attributable to the agent that ran it.
- Token/cost telemetry: parse usage from the stream-json result event; feed shows
per-call in/out/cost and a running total in the run summary.
- Ctrl-C during a run no longer hard-kills: it cancels cooperatively (no new
agents launch, in-flight bounded), then asks "generate report from partial
results? [Y/n]" — discard removes the run dir. Second Ctrl-C aborts.
- pool: cancel handle + is_cancelled; one()/complete_routed/chat_cli carry a label.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
REPL (rustyline Helper):
- Tab autocomplete for /commands and @filesystem-paths.
- @path attach: @file, @folder, @file:LINE and @file:START-END fold scoped files or
stack traces into the agent context; /attach <path> and /context manage them.
- Multiline input: end a line with `\` to continue (validator-driven).
- /theme color|mono, /config (=/show); history (↑/↓) persists as before.
- Attachments are merged into the run's instruction context.
Install:
- setup.sh: `curl … | bash` — auto-installs Rust, clones to ~/.neurosploit,
builds release, links neurosploit into ~/.local/bin; idempotent; env-tunable.
README: v3.5.0, 🧠 (back to "neuro"), one-line install section, neurosploit-on-PATH usage.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Finding enriched with owasp / mitre / kill-chain stage / exploitability /
business_impact / chains_from (attack-path edges).
- attack_graph module: derive OWASP Top 10 + MITRE ATT&CK technique + kill-chain
stage from CWE (heuristic, no extra model call); render a Mermaid attack-path
flowchart (findings grouped by stage, explicit + implicit edges) and an ASCII
kill chain for the REPL.
- enrich() runs in finish() for every engagement.
- HTML report gains an "Attack Path & Kill Chain" section (Mermaid via CDN, dark)
plus a stage/sev/OWASP/MITRE/exploitability table.
- REPL print_findings shows the ASCII kill-chain + severity summary after a run.
- models: add GPT-5.5, GPT-5.4, GPT-5.4-mini, GPT-5.3-codex, GPT-5.2.
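As an illustration of the CWE-derived enrichment in the attack_graph item above, a lookup of this shape needs no model call. The table below is a small hand-picked subset using the OWASP Top 10 2021 categories; which edition and which CWEs the real module covers is an assumption, and both function names are hypothetical.

```rust
/// Parse identifiers such as "CWE-89" or "89" into the numeric CWE id.
fn parse_cwe(id: &str) -> Option<u32> {
    let t = id.trim();
    let digits = t
        .strip_prefix("CWE-")
        .or_else(|| t.strip_prefix("cwe-"))
        .unwrap_or(t);
    digits.parse().ok()
}

/// Map a CWE id to an OWASP Top 10 (2021) category, if this subset knows it.
fn owasp_2021_for_cwe(cwe: u32) -> Option<&'static str> {
    match cwe {
        22 | 284 | 639 => Some("A01:2021 Broken Access Control"),
        327 => Some("A02:2021 Cryptographic Failures"),
        77 | 78 | 79 | 89 => Some("A03:2021 Injection"),
        287 => Some("A07:2021 Identification and Authentication Failures"),
        502 => Some("A08:2021 Software and Data Integrity Failures"),
        918 => Some("A10:2021 Server-Side Request Forgery (SSRF)"),
        _ => None, // unknown CWE: leave the field empty rather than guess
    }
}
```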
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harness:
- ModelPool gains a progress channel (set_progress); chat_cli forwards it.
- New chat_claude_stream: drives Claude Code with --output-format stream-json and
parses the event stream live — assistant text, and tool_use blocks categorized
into tagged events (exec/danger command, read/edit file, net request/browser,
grep/glob tool). 900s bound; clear error surfacing.
- Wired set_progress into run / whitebox / greybox.
REPL renderer (render_line):
- Tagged events render as the conversation feed: tool/command/network as compact
CARDS (tool-runner visual), files/edits/AI text/states as iconized lines.
- Clear "what the AI is doing" states: reconning, planning, testing, validating,
chaining, report, complete — plus a ⚠ DANGEROUS marker for risky commands.
- Untagged harness lines mapped to the same state vocabulary.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harness:
- Exploit-chaining round: after validation, chain confirmed findings into deeper
impact (SSRF→metadata, SQLi→dump→reuse, IDOR→ATO, file-read→secrets→RCE),
validate the new findings, merge. Wired into black-box and greybox.
- Latest top models surfaced: claude-opus-4-8, gpt-5.1/gpt-5.1-codex, gemini-3-pro.
REPL:
- Real line editing via rustyline: ↑/↓ command-history recall, Ctrl-A/E/K, paste;
Ctrl-C cancels the line, Ctrl-D exits. Command history persists to
data/repl_history.txt. Graceful plain-stdin fallback when not a TTY.
- /model with no arg → arrow-key multi-select (dialoguer); with arg accepts any
provider:model names.
- /key is model-aware: lists the providers your selected models need (set/missing)
and prompts for the missing keys; /key <prov> <key> still works.
- Run history persists to data/repl_runs.json and reloads across sessions
(/runs lists past + current; /results /report /status by run number).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- RunOutput exposes `workdir` so the session can locate reports.
- Session now records every run (RunRecord: id, mode, target, workdir, findings).
- New commands:
    /runs            list runs done this session (mode, target, severity counts)
    /results [n]     show findings of run n (default last), severity-sorted
    /report [n]      open the PDF/HTML report (open/xdg-open)
    /status [n]      print the run's status.json
    /offline on|off  pipeline self-test toggle (no model calls)
- Each /run prints "saved as run #n" with the quick commands.
- Verified offline: run → /runs → /results → /status all work.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- harness/creds::login(): performs the real HTTP login (POST/GET form), captures
a session Cookie from Set-Cookie or a Bearer token from the JSON body, with a
soft success check (no hard fail on 302). Redirects not followed so Set-Cookie
is visible.
- apply_creds is now async: direct material (jwt/header/cookie) used as-is; a
`login:` flow is EXECUTED to obtain a live session; on failure, falls back to
instructing the agents to log in themselves.
- --creds + --focus added to `run` (authenticated black-box) too.
- Verified live against a local mock: POST /login → 302 + Set-Cookie captured as
the auth header used on subsequent requests.
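The cookie-capture step can be sketched without any HTTP client: given the raw Set-Cookie values from the login response, keep only each `name=value` pair and join them into a request header. `cookie_header` is a hypothetical helper, not the code in harness/creds.

```rust
/// Build a `Cookie:` request header from the raw `Set-Cookie` values of a
/// login response. Attributes such as Path, HttpOnly, or Secure are dropped.
fn cookie_header(set_cookie_values: &[&str]) -> Option<String> {
    let pairs: Vec<&str> = set_cookie_values
        .iter()
        // The cookie itself is everything before the first ';'.
        .filter_map(|v| v.split(';').next())
        .map(str::trim)
        // Ignore malformed entries that carry no name=value pair.
        .filter(|pair| pair.contains('='))
        .collect();
    if pairs.is_empty() {
        None // nothing captured: caller falls back to another strategy
    } else {
        Some(format!("Cookie: {}", pairs.join("; ")))
    }
}
```

This is also why the login request must not follow redirects: the Set-Cookie header sits on the 302 response, and a client that follows the redirect would hand back only the final page.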
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New GREYBOX mode: review a repo's source AND exploit the running app in one
pipeline — code-review findings become LEADS injected into live exploitation.
CLI: `neurosploit greybox <repo> --url <app> [--creds creds.yaml] [--focus ...]`
REPL: set both /repo and /target → greybox auto-selected.
- Credentials (harness/src/creds.rs, dependency-free YAML subset): jwt / header /
cookie, or an automated `login:` flow. Derives an auth header and/or a
"authenticate first via curl" directive injected into prompts so agents test
authenticated. --creds flag + /creds command + creds.example.yaml.
- RunConfig gains `repo`; run_engagement refactored to a Mode enum (Black/White/Grey).
- Verified offline: greybox loads creds, combines repo+URL, runs pipeline, writes report.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New persistent interactive session (app/src/repl.rs), launched when run with no args:
banner, model selection, API-key config (/key) or subscription (/sub), then a live
session to set /target, /repo, /auth, and free-text /focus instructions (or just type
them) that STEER which agents run and how.
- Slash-commands: /model /providers /key /sub /target /repo /auth /focus /mcp /votes
/agents /show /run /quit (+ bare text = focus).
- RunConfig gains `instructions` and `auth`:
* instructions bias both LLM agent-selection and the heuristic (focus keywords →
injection/access-control/etc. agents get a strong boost)
* operator directives (focus + auth) injected into recon and exploit prompts so agents
test as an authenticated user and prioritise the requested vuln classes
- bump 3.4.1 → 3.5.0 (CLI, harness, reports, credits)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Task-based model ROUTER (recon/select prefer a fast model; exploit prefers primary; validate uses a different model than the finder)
- ReAct doctrine injected into exploit prompts (Thought→Action→Observation, token-efficient)
- Dedup: unique agents per run + findings deduped by CWE/endpoint/title (highest confidence kept)
- Token economy: recon blob capped for selector + per-agent context
- Configurable MCP: merge user mcp.servers.json into the pipeline's .mcp.json
- +54 white-box/code-analysis agents (NoSQLi, LDAP/XPath, JWT-none, Java/.NET/PHP/Go/Node/Python
specifics, SSTI, ReDoS, deserialization, etc.) → 303 agents total (78 code)
- Credits: Joas A Santos & Red Team Leaders (CLI banner, interactive header, HTML+Typst report)
- README: GitHub stars/forks badges, 60-second quick start, full API config steps, intuitive layout
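The findings dedup rule from the list above (same CWE, endpoint, and title; highest confidence kept) can be sketched like this. The `Finding` fields are a hypothetical subset, and lowercasing the title in the key is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical subset of a finding, enough to show the dedup key.
#[derive(Debug, Clone, PartialEq)]
struct Finding {
    cwe: String,
    endpoint: String,
    title: String,
    confidence: f64,
}

/// Keep one finding per (CWE, endpoint, title), preferring the highest
/// confidence, and preserve first-seen order of the surviving keys.
fn dedup(findings: Vec<Finding>) -> Vec<Finding> {
    let mut best: HashMap<(String, String, String), Finding> = HashMap::new();
    let mut order: Vec<(String, String, String)> = Vec::new();
    for f in findings {
        let key = (f.cwe.clone(), f.endpoint.clone(), f.title.to_lowercase());
        // Replace only when the key is new or the newcomer is more confident.
        let keep_new = best.get(&key).map_or(true, |old| f.confidence > old.confidence);
        if keep_new {
            if !best.contains_key(&key) {
                order.push(key.clone());
            }
            best.insert(key, f);
        }
    }
    order.into_iter().filter_map(|k| best.remove(&k)).collect()
}
```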
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause of empty results: models emit findings with confidence as a string
('High') or cvss as a number, but the Finding struct typed confidence as f64, so
serde failed the ENTIRE array on any mismatch -> 0 findings every run.
extract_findings now parses into serde_json::Value and coerces each field
(string/number/word), normalizes severity, and accepts qualitative confidence
(High->0.9 etc). Verified live: whitebox on a vulnerable sample now yields
validated findings (IDOR confirmed by vote).
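A dependency-free sketch of the per-field coercion for confidence. The real extract_findings works on serde_json::Value; here a hypothetical `Raw` enum stands in for the string and number cases so the example compiles with the standard library alone. Only High→0.9 comes from the description above; the Medium and Low values are assumptions.

```rust
/// Stand-in for the two JSON shapes a model may emit for `confidence`.
#[derive(Debug, Clone, PartialEq)]
enum Raw {
    Num(f64),
    Text(String),
}

/// Coerce a raw confidence into 0.0..=1.0, or None if it is unusable.
fn coerce_confidence(raw: &Raw) -> Option<f64> {
    let value = match raw {
        Raw::Num(n) => Some(*n),
        Raw::Text(s) => {
            let t = s.trim().to_lowercase();
            match t.as_str() {
                "high" => Some(0.9),
                "medium" => Some(0.6), // assumed mapping
                "low" => Some(0.3),    // assumed mapping
                // Numeric strings such as "0.75" are accepted too.
                other => other.parse::<f64>().ok(),
            }
        }
    };
    // Reject NaN/inf (which `parse` accepts) and clamp out-of-range numbers.
    value.filter(|v| v.is_finite()).map(|v| v.clamp(0.0, 1.0))
}
```

Returning `None` for one bad field, instead of failing deserialization, is what keeps a single malformed finding from discarding the whole array.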
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 'recon failed (claude subscription CLI failed: )' error was a transient CLI failure
(rate limit / cold start) that was reported with a blank message and never retried.
- chat_cli: on non-zero exit, surface exit code + stdout (CLI writes the real
reason there, not stderr); treat empty output as an error
- pool.one(): retry up to 3x with backoff for transient failures (both
subscription and API paths)
- with_auth: cap concurrency to 3 on the subscription path — spawning many
parallel CLI processes itself trips provider rate limits
Verified: live subscription run recovers and completes recon → select → exploit
→ vote → artifacts.
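A synchronous sketch of the retry-with-backoff idea, using only the standard library. The real pool.one() is async and its delays are not stated here; this version assumes three attempts in total and an exponential backoff, and `retry` is a hypothetical helper.

```rust
use std::thread;
use std::time::Duration;

/// Call `op` until it succeeds or `max_attempts` is reached, sleeping
/// base_delay, 2*base_delay, 4*base_delay, ... between failed attempts.
/// `op` receives the 1-based attempt number.
fn retry<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                thread::sleep(base_delay * 2u32.pow(attempt - 1));
                attempt += 1;
            }
        }
    }
}
```

A caller would wrap the model call, for example `retry(3, Duration::from_millis(500), |_| call_model())`, where `call_model` is a placeholder for whatever performs the request.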
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>