Chaining:
- agents_md/chains/ (12 multi-stage exploitation playbooks): SQLi→RCE→LPE,
SSRF→AWS-creds, SSRF→RCE, upload→RCE, upload→LFI→RCE→LPE, XSS→ATO, IDOR→ATO,
SSTI→RCE→cloud, default-creds→domain, deserialization→RCE, exposed-git→RCE,
subdomain-takeover→trusted-abuse. Each stage proven by a tool receipt before
advancing; reports chains_from edges.
- Loaded as a `chains` category (→ 329 agents). chain_round now injects the chain
recipes as a menu so the LLM applies proven multi-stage paths.
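The "each stage proven by a tool receipt before advancing" rule can be read as a small state machine. The following is a minimal sketch under stated assumptions, not the harness code: `Receipt`, `Chain`, `advance` and `edges` are hypothetical names introduced only for illustration.

```rust
/// Hypothetical evidence record produced by a tool run (name is an assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct Receipt {
    pub stage: String,
    pub raw_output: String,
}

/// A multi-stage playbook such as SQLi→RCE→LPE, tracked by how many stages are proven.
pub struct Chain {
    stages: Vec<String>,
    proven: Vec<Receipt>,
}

impl Chain {
    pub fn new(stages: &[&str]) -> Self {
        Chain {
            stages: stages.iter().map(|s| s.to_string()).collect(),
            proven: Vec::new(),
        }
    }

    /// The stage that must be proven next, or None when the chain is complete.
    pub fn current(&self) -> Option<&str> {
        self.stages.get(self.proven.len()).map(|s| s.as_str())
    }

    /// Advance only if the receipt matches the current stage and carries evidence.
    pub fn advance(&mut self, receipt: Receipt) -> Result<(), String> {
        let expected = match self.current() {
            Some(s) => s.to_string(),
            None => return Err("chain already complete".to_string()),
        };
        if receipt.stage != expected {
            return Err(format!("expected a receipt for stage {}", expected));
        }
        if receipt.raw_output.trim().is_empty() {
            return Err("receipt carries no evidence".to_string());
        }
        self.proven.push(receipt);
        Ok(())
    }

    /// chains_from-style edges between consecutive proven stages.
    pub fn edges(&self) -> Vec<(String, String)> {
        self.proven
            .windows(2)
            .map(|w| (w[0].stage.clone(), w[1].stage.clone()))
            .collect()
    }
}
```

A receipt for a later stage, or one with empty output, is rejected, so a chain cannot skip ahead on an unproven step.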
Persistence (no DB — structured state):
- Per-project `<cwd>/.neurosploit/` holding session.json (config), runs.json
(history), history.txt (readline). REPL resumes target/repo/auth/focus/models
on reopen; saves on /run and /quit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Live findings feed: each candidate is surfaced (✦ possible finding [sev] title
@ endpoint) the moment an agent returns it, not only at the end.
- 🔔 notifications in the feed: evidence saved, phase complete (with severity
breakdown = automatic partial summary). Renderer styles notify/finding tags.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Partial observability is now first-class:
- belief.rs — property-graph world model; nodes (host/service/vuln/exploit/cred)
carry a probability, not a boolean. Bayesian observation updates; per-node
Shannon entropy; mean-uncertainty + recon-frontier. Black-box = diffuse priors
that sharpen with observation; white-box collapses toward deterministic (MDP).
- pomdp.rs — value_of_information(), decide() (recon vs exploit falls out of
belief entropy), and may_assert() — the mathematical anti-hallucination gate:
no exploitability claim while the belief is diffuse (high entropy) → observe first.
- grounding.rs — verification engine, hard rule "no claim without a tool receipt":
empirical grounding for black-box (raw HTTP/OOB/error markers), symbolic for
white-box (file:line into reviewed source). Ungrounded claims demoted + flagged
receipt_missing (feeds future reward shaping).
- pipeline.finish(): grounding gate before reporting + belief-uncertainty readout.
- bump 3.5.0 → 3.5.1; README documents the v3.5.1 belief/grounding architecture
and the infra/bandit/reward roadmap.
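The belief and gating ideas above can be illustrated with a few lines of arithmetic. This is a sketch, assuming a single binary node and caller-supplied thresholds; the real `belief.rs` and `pomdp.rs` signatures are not shown in this log, so the function shapes and parameters below are assumptions.

```rust
/// Binary Shannon entropy in bits for a belief p = P(node is true).
pub fn entropy(p: f64) -> f64 {
    if p <= 0.0 || p >= 1.0 {
        return 0.0; // a certain belief carries no uncertainty
    }
    -(p * p.log2() + (1.0 - p) * (1.0 - p).log2())
}

/// Bayesian update of a prior given one observation.
/// `p_obs_if_true` is P(obs | node true), `p_obs_if_false` is P(obs | node false).
pub fn bayes_update(prior: f64, p_obs_if_true: f64, p_obs_if_false: f64) -> f64 {
    let num = p_obs_if_true * prior;
    let den = num + p_obs_if_false * (1.0 - prior);
    if den == 0.0 { prior } else { num / den }
}

/// Gate for exploitability claims: the belief must be high and its entropy low.
/// Both thresholds are illustrative assumptions, not values from the project.
pub fn may_assert(p: f64, min_belief: f64, max_entropy: f64) -> bool {
    p >= min_belief && entropy(p) <= max_entropy
}
```

With a diffuse prior of 0.5 the entropy is 1 bit and the gate stays closed; one observation that is nine times likelier under "vulnerable" moves the belief to about 0.9 and drops the entropy below 0.5 bits, at which point a claim would be allowed under these example thresholds.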
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Streamed Claude events now tagged with the agent label (@name) so every
command/tool/file is attributable to the agent that ran it.
- Token/cost telemetry: parse usage from the stream-json result event; feed shows
per-call in/out/cost and a running total in the run summary.
- Ctrl-C during a run no longer hard-kills: it cancels cooperatively (no new
agents launch, in-flight bounded), then asks "generate report from partial
results? [Y/n]" — discard removes the run dir. Second Ctrl-C aborts.
- pool: cancel handle + is_cancelled; one()/complete_routed/chat_cli carry a label.
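A cooperative cancel of this kind is commonly built on a shared atomic flag. The sketch below assumes that approach; `CancelHandle`, `Action`, `on_ctrl_c` and `launch_until_cancelled` are hypothetical names, and the real pool API may differ.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Cloneable flag shared between the signal handler and the agent launcher.
#[derive(Clone, Default)]
pub struct CancelHandle(Arc<AtomicBool>);

impl CancelHandle {
    pub fn cancel(&self) {
        self.0.store(true, Ordering::SeqCst);
    }
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

#[derive(Debug, PartialEq)]
pub enum Action {
    Cancel, // first Ctrl-C: stop launching, let in-flight work finish
    Abort,  // second Ctrl-C: give up immediately
}

/// Decide what a Ctrl-C press means based on how many came before it.
pub fn on_ctrl_c(presses: &AtomicUsize, handle: &CancelHandle) -> Action {
    if presses.fetch_add(1, Ordering::SeqCst) == 0 {
        handle.cancel();
        Action::Cancel
    } else {
        Action::Abort
    }
}

/// Launch jobs in order, checking the flag before each one.
/// Returns how many jobs were actually launched.
pub fn launch_until_cancelled<F: FnMut(usize)>(
    jobs: usize,
    handle: &CancelHandle,
    mut run: F,
) -> usize {
    let mut launched = 0;
    for i in 0..jobs {
        if handle.is_cancelled() {
            break; // no new agents after a cancel request
        }
        run(i);
        launched += 1;
    }
    launched
}
```

The launcher only checks the flag between jobs, which is what makes the cancel cooperative: work already started is left to finish within its own bounds.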
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
REPL (rustyline Helper):
- Tab autocomplete for /commands and @filesystem-paths.
- @path attach: @file, @folder, @file:LINE / @file:START-END fold scope files /
stack traces into the agent context; /attach <path> and /context to manage.
- Multiline input: end a line with `\` to continue (validator-driven).
- /theme color|mono, /config (=/show); history (↑/↓) persists as before.
- Attachments are merged into the run's instruction context.
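The `@file:LINE` and `@file:START-END` forms can be parsed with a short helper. This is a sketch under assumptions: `Attachment` and `parse_attachment` are hypothetical names, and treating an unparsable suffix as part of the path is a design choice made here, not confirmed behaviour.

```rust
#[derive(Debug, PartialEq)]
pub struct Attachment {
    pub path: String,
    /// Inclusive 1-based line range, or None for the whole file or folder.
    pub lines: Option<(usize, usize)>,
}

/// Parse "@path", "@path:LINE" or "@path:START-END".
/// Returns None when the token is not an @-attachment at all.
pub fn parse_attachment(token: &str) -> Option<Attachment> {
    let rest = token.strip_prefix('@')?;
    if rest.is_empty() {
        return None;
    }
    if let Some((path, range)) = rest.rsplit_once(':') {
        let parsed: Option<(usize, usize)> = match range.split_once('-') {
            Some((a, b)) => a.parse::<usize>().ok().zip(b.parse::<usize>().ok()),
            None => range.parse::<usize>().ok().map(|n| (n, n)),
        };
        if let Some((start, end)) = parsed {
            if !path.is_empty() && start >= 1 && start <= end {
                return Some(Attachment { path: path.to_string(), lines: Some((start, end)) });
            }
        }
    }
    // No valid line suffix: the whole token after '@' is the path.
    Some(Attachment { path: rest.to_string(), lines: None })
}
```

Splitting on the last colon keeps paths that themselves contain a colon intact when the suffix is not a number or range.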
Install:
- setup.sh: `curl … | bash` — auto-installs Rust, clones to ~/.neurosploit,
builds release, links neurosploit into ~/.local/bin; idempotent; env-tunable.
README: v3.5.0, 🧠 (back to "neuro"), one-line install section, neurosploit-on-PATH usage.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Finding enriched with owasp / mitre / kill-chain stage / exploitability /
business_impact / chains_from (attack-path edges).
- attack_graph module: derive OWASP Top 10 + MITRE ATT&CK technique + kill-chain
stage from CWE (heuristic, no extra model call); render a Mermaid attack-path
flowchart (findings grouped by stage, explicit + implicit edges) and an ASCII
kill chain for the REPL.
- enrich() runs in finish() for every engagement.
- HTML report gains an "Attack Path & Kill Chain" section (Mermaid via CDN, dark)
plus a stage/sev/OWASP/MITRE/exploitability table.
- REPL print_findings shows the ASCII kill-chain + severity summary after a run.
- models: add GPT-5.5, GPT-5.4, GPT-5.4-mini, GPT-5.3-codex, GPT-5.2.
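Deriving the taxonomy from CWE without a model call amounts to a lookup table plus grouping. The sketch below is illustrative only: the OWASP Top 10 (2021) categories are the published ones for these CWEs, but the stage names, the `classify` and `ascii_kill_chain` functions, and the output format are assumptions rather than the attack_graph module's actual behaviour.

```rust
/// Stage order used when rendering; names are illustrative assumptions.
pub const STAGES: [&str; 2] = ["reconnaissance", "exploitation"];

/// Map a CWE id to (OWASP Top 10 2021 category, kill-chain stage).
pub fn classify(cwe: u32) -> (&'static str, &'static str) {
    match cwe {
        78 | 79 | 89 => ("A03:2021 Injection", "exploitation"),
        22 | 639 => ("A01:2021 Broken Access Control", "exploitation"),
        200 => ("A01:2021 Broken Access Control", "reconnaissance"),
        502 => ("A08:2021 Software and Data Integrity Failures", "exploitation"),
        918 => ("A10:2021 Server-Side Request Forgery", "exploitation"),
        _ => ("unmapped", "unknown"),
    }
}

/// Render findings grouped by stage as a one-line ASCII kill chain.
/// Each finding is (title, CWE id); findings with an unknown stage are omitted.
pub fn ascii_kill_chain(findings: &[(&str, u32)]) -> String {
    let mut parts = Vec::new();
    for stage in STAGES.iter() {
        let titles: Vec<&str> = findings
            .iter()
            .filter(|(_, cwe)| classify(*cwe).1 == *stage)
            .map(|(title, _)| *title)
            .collect();
        if !titles.is_empty() {
            parts.push(format!("[{}] {}", stage, titles.join(", ")));
        }
    }
    parts.join(" -> ")
}
```

Because the mapping is a pure function of the CWE id, enrichment stays deterministic and costs no extra tokens.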
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harness:
- ModelPool gains a progress channel (set_progress); chat_cli forwards it.
- New chat_claude_stream: drives Claude Code with --output-format stream-json and
parses the event stream live — assistant text, and tool_use blocks categorized
into tagged events (exec/danger command, read/edit file, net request/browser,
grep/glob tool). 900s bound; clear error surfacing.
- Wired set_progress into run / whitebox / greybox.
REPL renderer (render_line):
- Tagged events render as the conversation feed: tool/command/network as compact
CARDS (tool-runner visual), files/edits/AI text/states as iconized lines.
- Clear "what the AI is doing" states: reconning, planning, testing, validating,
chaining, report, complete — plus a ⚠ DANGEROUS marker for risky commands.
- Untagged harness lines mapped to the same state vocabulary.
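Categorizing `tool_use` blocks into tagged events can be sketched as a match on the tool name. The tool names below are those commonly seen in Claude Code output and are treated as an assumption, as are the `Tag` enum, the `tag_tool_use` function and the list of risky command fragments.

```rust
#[derive(Debug, PartialEq)]
pub enum Tag {
    Exec,   // ordinary shell command
    Danger, // shell command matching a risky pattern
    Read,
    Edit,
    Net,
    Search,
    Other,
}

/// Classify one tool_use block by tool name and its (stringified) input.
pub fn tag_tool_use(tool: &str, input: &str) -> Tag {
    match tool {
        "Bash" => {
            // Illustrative deny-list; a real one would be longer and stricter.
            let risky = ["rm -rf", "mkfs", "dd if=", "shutdown"];
            if risky.iter().any(|p| input.contains(*p)) {
                Tag::Danger
            } else {
                Tag::Exec
            }
        }
        "Read" => Tag::Read,
        "Edit" | "Write" => Tag::Edit,
        "WebFetch" | "WebSearch" => Tag::Net,
        "Grep" | "Glob" => Tag::Search,
        _ => Tag::Other,
    }
}
```

A renderer can then pick a card or an iconized line per `Tag` without inspecting the raw event again.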
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harness:
- Exploit-chaining round: after validation, chain confirmed findings into deeper
impact (SSRF→metadata, SQLi→dump→reuse, IDOR→ATO, file-read→secrets→RCE),
validate the new findings, merge. Wired into black-box and greybox.
- Latest top models surfaced: claude-opus-4-8, gpt-5.1/gpt-5.1-codex, gemini-3-pro.
REPL:
- Real line editing via rustyline: ↑/↓ command-history recall, Ctrl-A/E/K, paste;
Ctrl-C cancels the line, Ctrl-D exits. Command history persists to
data/repl_history.txt. Graceful plain-stdin fallback when not a TTY.
- /model with no arg → arrow-key multi-select (dialoguer); with arg accepts any
provider:model names.
- /key is model-aware: lists the providers your selected models need (set/missing)
and prompts for the missing keys; /key <prov> <key> still works.
- Run history persists to data/repl_runs.json and reloads across sessions
(/runs lists past + current; /results /report /status by run number).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- RunOutput exposes `workdir` so the session can locate reports.
- Session now records every run (RunRecord: id, mode, target, workdir, findings).
- New commands:
    /runs            list runs done this session (mode, target, severity counts)
    /results [n]     show findings of run n (default last), severity-sorted
    /report [n]      open the PDF/HTML report (open/xdg-open)
    /status [n]      print the run's status.json
    /offline on|off  pipeline self-test toggle (no model calls)
- Each /run prints "saved as run #n" with the quick commands.
- Verified offline: run → /runs → /results → /status all work.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- harness/creds::login(): performs the real HTTP login (POST/GET form), captures
a session Cookie from Set-Cookie or a Bearer token from the JSON body, with a
soft success check (no hard fail on 302). Redirects not followed so Set-Cookie
is visible.
- apply_creds is now async: direct material (jwt/header/cookie) used as-is; a
`login:` flow is EXECUTED to obtain a live session; on failure, falls back to
instructing the agents to log in themselves.
- --creds + --focus added to `run` (authenticated black-box) too.
- Verified live against a local mock: POST /login → 302 + Set-Cookie captured as
the auth header used on subsequent requests.
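Turning a login response into an auth header is mostly header handling. The sketch below works on already-received response headers and leaves the HTTP client out; `session_cookie` and `login_looks_ok` are hypothetical names, and the 2xx/3xx success rule is an assumption standing in for the "soft success check".

```rust
/// Build a `Cookie:` request header from the Set-Cookie response headers.
/// Attributes such as Path or HttpOnly are dropped; only name=value pairs are kept.
pub fn session_cookie(headers: &[(&str, &str)]) -> Option<String> {
    let pairs: Vec<&str> = headers
        .iter()
        .filter(|(name, _)| name.eq_ignore_ascii_case("set-cookie"))
        .filter_map(|(_, value)| value.split(';').next())
        .map(str::trim)
        .filter(|kv| kv.contains('='))
        .collect();
    if pairs.is_empty() {
        None
    } else {
        Some(format!("Cookie: {}", pairs.join("; ")))
    }
}

/// Soft success check: a redirect after login is not treated as a failure.
pub fn login_looks_ok(status: u16) -> bool {
    (200..400).contains(&status)
}
```

This only works if the client does not follow the redirect, since the Set-Cookie header lives on the 302 response itself.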
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New GREYBOX mode: review a repo's source AND exploit the running app in one
pipeline — code-review findings become LEADS injected into live exploitation.
CLI: `neurosploit greybox <repo> --url <app> [--creds creds.yaml] [--focus ...]`
REPL: set both /repo and /target → greybox auto-selected.
- Credentials (harness/src/creds.rs, dependency-free YAML subset): jwt / header /
cookie, or an automated `login:` flow. Derives an auth header and/or a
"authenticate first via curl" directive injected into prompts so agents test
authenticated. --creds flag + /creds command + creds.example.yaml.
- RunConfig gains `repo`; run_engagement refactored to a Mode enum (Black/White/Grey).
- Verified offline: greybox loads creds, combines repo+URL, runs pipeline, writes report.
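The auto-selection rule (both /repo and /target set means greybox) can be written as one match. This is a sketch: the variant names follow the commit text, while `select_mode` is a hypothetical function, and mapping "URL only" to black-box and "repo only" to white-box is an assumption.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Mode {
    Black, // live target only
    White, // source repository only
    Grey,  // source review plus live exploitation
}

/// Pick the engagement mode from what the operator has configured.
/// Returns None when neither a target URL nor a repo is set.
pub fn select_mode(target: Option<&str>, repo: Option<&str>) -> Option<Mode> {
    match (target, repo) {
        (Some(_), Some(_)) => Some(Mode::Grey),
        (Some(_), None) => Some(Mode::Black),
        (None, Some(_)) => Some(Mode::White),
        (None, None) => None,
    }
}
```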
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New persistent interactive session (app/src/repl.rs), launched when run with no args:
banner, model selection, API-key config (/key) or subscription (/sub), then a live
session to set /target, /repo, /auth, and free-text /focus instructions (or just type
them) that STEER which agents run and how.
- Slash-commands: /model /providers /key /sub /target /repo /auth /focus /mcp /votes
/agents /show /run /quit (+ bare text = focus).
- RunConfig gains `instructions` and `auth`:
* instructions bias both LLM agent-selection and the heuristic (focus keywords →
injection/access-control/etc. agents get a strong boost)
* operator directives (focus + auth) injected into recon and exploit prompts so agents
test as an authenticated user and prioritise the requested vuln classes
- bump 3.4.1 → 3.5.0 (CLI, harness, reports, credits)
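The heuristic side of the focus boost can be sketched as keyword matching over agent tags. Everything here is illustrative: the `Agent` struct, the `rank` function, the tag vocabulary and the boost value are assumptions, not the selector's real scoring.

```rust
pub struct Agent {
    pub name: &'static str,
    pub tags: &'static [&'static str],
    pub base_score: f64,
}

/// Rank agents by base score, adding `boost` when any tag appears in the focus text.
pub fn rank(agents: &[Agent], focus: &str, boost: f64) -> Vec<&'static str> {
    let focus = focus.to_lowercase();
    let mut scored: Vec<(f64, &'static str)> = agents
        .iter()
        .map(|a| {
            let hit = a
                .tags
                .iter()
                .any(|t| focus.contains(t.to_lowercase().as_str()));
            (a.base_score + if hit { boost } else { 0.0 }, a.name)
        })
        .collect();
    // Highest score first; the sort is stable, so ties keep input order.
    scored.sort_by(|x, y| y.0.partial_cmp(&x.0).unwrap_or(std::cmp::Ordering::Equal));
    scored.into_iter().map(|(_, name)| name).collect()
}
```

With a large enough boost, a focus such as "prioritise SQL injection" lifts the matching agent above others that would otherwise outrank it.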
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Task-based model ROUTER (recon/select prefer a fast model; exploit prefers primary; validate uses a different model than the finder)
- ReAct doctrine injected into exploit prompts (Thought→Action→Observation, token-efficient)
- Dedup: unique agents per run + findings deduped by CWE/endpoint/title (highest confidence kept)
- Token economy: recon blob capped for selector + per-agent context
- Configurable MCP: merge user mcp.servers.json into the pipeline's .mcp.json
- +54 white-box/code-analysis agents (NoSQLi, LDAP/XPath, JWT-none, Java/.NET/PHP/Go/Node/Python
specifics, SSTI, ReDoS, deserialization, etc.) → 303 agents total (78 code)
- Credits: Joas A Santos & Red Team Leaders (CLI banner, interactive header, HTML+Typst report)
- README: GitHub stars/forks badges, 60-second quick start, full API config steps, intuitive layout
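Deduplicating findings by CWE/endpoint/title while keeping the highest-confidence copy can be sketched with a map keyed on those three fields. The `Finding` fields shown and the case-insensitive comparison are assumptions; the real struct carries more data.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Finding {
    pub cwe: String,
    pub endpoint: String,
    pub title: String,
    pub confidence: f64,
}

/// Keep one finding per (cwe, endpoint, title), preferring the highest confidence.
/// First-seen order of the keys is preserved in the output.
pub fn dedup(findings: Vec<Finding>) -> Vec<Finding> {
    let mut best: HashMap<(String, String, String), Finding> = HashMap::new();
    let mut order: Vec<(String, String, String)> = Vec::new();
    for f in findings {
        let key = (f.cwe.clone(), f.endpoint.to_lowercase(), f.title.to_lowercase());
        let keep_old = best
            .get(&key)
            .map_or(false, |prev| prev.confidence >= f.confidence);
        if keep_old {
            continue;
        }
        if !best.contains_key(&key) {
            order.push(key.clone());
        }
        best.insert(key, f);
    }
    order.into_iter().filter_map(|k| best.remove(&k)).collect()
}
```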
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Keep only the Rust harness (neurosploit-rs/) + the agent library (agents_md/) it
loads at runtime, plus docs. Remove the Python engine, web GUIs, legacy stack,
docker, build scripts and scratch test files from THIS branch only (other
branches keep everything). Rust-focused README with Kali/Docker + tool-install
guidance and testphp/DVWA usage examples.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause of empty results: models emit findings with confidence as a string
('High') or cvss as a number, but the Finding struct typed confidence as f64, so
serde failed the ENTIRE array on any mismatch -> 0 findings every run.
extract_findings now parses into serde_json::Value and coerces each field
(string/number/word), normalizes severity, and accepts qualitative confidence
(High->0.9 etc). Verified live: whitebox on a vulnerable sample now yields
validated findings (IDOR confirmed by vote).
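The per-field coercion can be sketched without any JSON library. In this dependency-free illustration a small `Raw` enum stands in for `serde_json::Value`; the High→0.9 mapping comes from the text above, while the other word values, the 0.5 default and the severity vocabulary are assumptions.

```rust
/// Stand-in for a loosely typed JSON field (assumption: replaces serde_json::Value here).
pub enum Raw {
    Num(f64),
    Text(String),
    Missing,
}

/// Coerce a confidence field to a number in [0, 1], whatever shape the model emitted.
pub fn coerce_confidence(raw: &Raw) -> f64 {
    match raw {
        Raw::Num(n) => n.clamp(0.0, 1.0),
        Raw::Text(s) => {
            let t = s.trim().to_lowercase();
            // Numeric strings such as "0.8" are accepted as-is.
            if let Some(n) = t.parse::<f64>().ok().filter(|n| n.is_finite()) {
                return n.clamp(0.0, 1.0);
            }
            match t.as_str() {
                "high" => 0.9,
                "medium" => 0.6, // assumed value
                "low" => 0.3,    // assumed value
                _ => 0.5,        // assumed default for unknown words
            }
        }
        Raw::Missing => 0.5,
    }
}

/// Normalize a severity label to a fixed lowercase vocabulary.
pub fn normalize_severity(s: &str) -> &'static str {
    match s.trim().to_lowercase().as_str() {
        "critical" => "critical",
        "high" => "high",
        "medium" | "moderate" => "medium",
        "low" => "low",
        _ => "info",
    }
}
```

Coercing field by field means one odd value degrades a single finding instead of discarding the whole array.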
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 'recon failed (claude subscription CLI failed: )' error was a transient CLI
failure (rate limit / cold start) reported with a blank message and no retry.
- chat_cli: on non-zero exit, surface exit code + stdout (CLI writes the real
reason there, not stderr); treat empty output as an error
- pool.one(): retry up to 3x with backoff for transient failures (both
subscription and API paths)
- with_auth: cap concurrency to 3 on the subscription path — spawning many
parallel CLI processes itself trips provider rate limits
Verified: live subscription run recovers and completes recon → select → exploit
→ vote → artifacts.
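The retry and error-surfacing behaviour described above can be sketched with a blocking helper. This is an illustration under assumptions: `retry` and `check_output` are hypothetical names, the backoff is a plain doubling schedule, and the real `pool.one()` is presumably async.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op` up to `max_attempts` times, doubling the delay after each failure.
/// `op` receives the 1-based attempt number.
pub fn retry<T, E, F>(max_attempts: u32, base_delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                sleep(base_delay * 2u32.pow(attempt - 1));
                attempt += 1;
            }
        }
    }
}

/// Turn a CLI exit code and stdout into a result with a non-blank error message.
/// Empty output is treated as a failure even when the exit code is zero.
pub fn check_output(exit_code: i32, stdout: &str) -> Result<String, String> {
    let out = stdout.trim();
    if exit_code != 0 {
        return Err(format!("exit code {}: {}", exit_code, out));
    }
    if out.is_empty() {
        return Err("empty output".to_string());
    }
    Ok(out.to_string())
}
```

Combining the two, a caller would wrap the CLI invocation in `retry` and pass its result through `check_output`, so a blank failure is both reported with context and retried.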
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Zero-dependency (stdlib http.server) front-end exposing only the essential
options — URL, backend, model, collaborator, RL + Playwright-MCP toggles — with
a live progress console. Calls neurosploit_agent directly; no npm/build.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Overhauled frontend with 2026 hacking HUD aesthetic (neon colors, glassmorphism)
- Added native support for NVIDIA NIM as a Tier 2 provider
- Fixed critical backend crashes in autonomous_agent.py and knowledge_processor.py
- Updated Kali sandbox build to Go 1.26 and fixed health check reliability
- Integrated Space Grotesk and JetBrains Mono fonts
- MD Agent system restructured: real HTTP exploitation, retry with exponential backoff, reduced concurrency (2 parallel, 2s stagger)
- Claude 4.6 model support (Opus/Sonnet) with corrected API version headers
- SmartRouter true failover with provider preference cascade
- WAFResult attribute error fix in autonomous_agent.py
- CVSS data sanitization for all vulnerability database saves
- AI recon JSON parsing robustness improvements
- rebuild.sh simplified from 714 to 196 lines
- Frontend: removed unused routes, simplified Auto Pentest page
- Agent grid: reduced max tests per agent (8→5), condensed recon prompts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New feature: Full LLM Pentest mode where the AI drives the entire
penetration test cycle autonomously. The LLM plans HTTP requests,
the system executes them, and the LLM analyzes real responses to
identify vulnerabilities — like a human pentester using Burp Suite.
- New OperationMode.FULL_LLM_PENTEST + AgentMode enum
- _run_full_llm_pentest(): 30-round ReAct loop (plan→execute→analyze→adapt)
- 3 new prompt functions in ai_prompts.py (system, round, report)
- Anti-hallucination: findings without real evidence are rejected
- All findings routed through ValidationJudge pipeline
- FullIATestingPage updated: 4-phase UI (Recon→Testing→PostExploit→Report)
- No Kali sandbox required — uses system HTTP client directly
- Methodology injection from pentestcompleto_en.md (118KB)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace a missed gpt-4-turbo-preview occurrence in the OpenAI
chat.completions.create() call inside generate(). OpenAI calls now use gpt-4o
consistently.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Issues fixed:
- OpenRouter API key not recognized: _set_no_provider_error() now checks all 7
provider keys (was only checking Anthropic/OpenAI/Google), so users with only
OPENROUTER_API_KEY set no longer get "No API keys configured" error
- Error message now lists all 8 providers (added OpenRouter, Together, Fireworks)
instead of only 5 (Anthropic, OpenAI, Google, Ollama, LM Studio)
- gpt-4-turbo-preview (deprecated by OpenAI, 404 error) replaced with gpt-4o
as default OpenAI model in LLMClient init and generate() fallback
- Settings API model list updated: removed gpt-4-turbo-preview and o1-preview/mini,
added gpt-4.1, gpt-4.1-mini, o3-mini
- .env.example comment updated to reference gpt-4o instead of gpt-4-turbo
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>