From e4efa9bbb0e7ec1ed30578bd8361a9a6f5e8a5df Mon Sep 17 00:00:00 2001
From: CyberSecurityUP
Date: Fri, 26 Jun 2026 11:31:11 -0300
Subject: [PATCH] =?UTF-8?q?v3.5.2=20=E2=80=94=20Exploitation=20Depth=20&?=
 =?UTF-8?q?=20Report=20Hygiene?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Distilled from reviewing real AI-pentest output that kept stopping at
"exposed" instead of "exploited". Pure-additive, back-compatible.

Behavior (injected into black/grey/chain exploit prompts via DEPTH_DOCTRINE):
- Exposed → exploited: any info-disclosure / exposed service/WSDL / leaked
  credential|token / reachable dev host MUST be used before it's a finding;
  otherwise it's a lead, not a confirmed High/Critical.
- Chain across modules: reuse obtained session/JWT/cookie/credential and pivot
  to IDOR/privesc/exfil; report the chain, not isolated parts.
- Decode & fingerprint → CVE; audit tokens (alg-confusion/none/kid/JWKS, weak
  HS256 secret cracking, lifecycle).

Deterministic post-pass (new crates/harness/src/hygiene.rs, wired into finish()):
- calibrate severity to PROVEN impact — unproven High/Critical (hedged, no
  payload, thin evidence) capped to Medium and re-titled "(potential)";
- depth_audit — flag exposures on a host with no real exploit;
- hygiene_summary — advise consolidating hygiene classes repeated across assets.

Unit tests cover calibration + depth audit.

5 new doctrine meta-agents (scripts/build_methodology_v352.py → agents_md/meta/):
exploit_depth_doctrine, finding_chainer, artifact_decoder, token_auditor,
report_calibrator (meta 17→22, total 343→348).

Version bumped 3.5.1 → 3.5.2 across crates/app/installers/docs; RELEASE/README
updated.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 README.md                                     |  11 +-
 RELEASE.md                                    |  60 ++++++
 TUTORIAL.md                                   |   4 +-
 agents_md/meta/artifact_decoder.md            |  27 +++
 agents_md/meta/exploit_depth_doctrine.md      |  30 +++
 agents_md/meta/finding_chainer.md             |  25 +++
 agents_md/meta/report_calibrator.md           |  30 +++
 agents_md/meta/token_auditor.md               |  26 +++
 install.ps1                                   |   2 +-
 neurosploit-rs/Cargo.lock                     |   4 +-
 neurosploit-rs/Cargo.toml                     |   2 +-
 neurosploit-rs/app/src/main.rs                |   8 +-
 neurosploit-rs/app/src/repl.rs                |   4 +-
 neurosploit-rs/app/src/tui.rs                 |   2 +-
 neurosploit-rs/crates/harness/src/belief.rs   |   2 +-
 .../crates/harness/src/grounding.rs           |   2 +-
 neurosploit-rs/crates/harness/src/hygiene.rs  | 186 ++++++++++++++++++
 neurosploit-rs/crates/harness/src/lib.rs      |   3 +-
 neurosploit-rs/crates/harness/src/pipeline.rs |  35 +++-
 neurosploit-rs/crates/harness/src/pomdp.rs    |   2 +-
 neurosploit-rs/crates/harness/src/report.rs   |   6 +-
 scripts/build_methodology_v352.py             | 183 +++++++++++++++++
 setup.sh                                      |   2 +-
 23 files changed, 628 insertions(+), 28 deletions(-)
 create mode 100644 agents_md/meta/artifact_decoder.md
 create mode 100644 agents_md/meta/exploit_depth_doctrine.md
 create mode 100644 agents_md/meta/finding_chainer.md
 create mode 100644 agents_md/meta/report_calibrator.md
 create mode 100644 agents_md/meta/token_auditor.md
 create mode 100644 neurosploit-rs/crates/harness/src/hygiene.rs
 create mode 100644 scripts/build_methodology_v352.py

diff --git a/README.md b/README.md
index 81c8f0a..b409f43 100755
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-

🧠 NeuroSploit v3.5.1

+

🧠 NeuroSploit v3.5.2

Stars @@ -8,7 +8,7 @@

- + @@ -24,6 +24,13 @@ > > 📖 **New here? Read the [full Tutorial & User Guide →](TUTORIAL.md)** — every mode, flag, config and example explained. +> 🆕 **New in v3.5.2 — Exploitation Depth & Report Hygiene:** a **DEPTH doctrine** +> makes the engine *use* what it finds (exposed → exploited), **chain** findings +> across modules, decode/fingerprint artifacts → CVEs, and **audit tokens** (JWT +> alg-confusion / weak HS256 secrets). A deterministic post-pass **calibrates +> severity to proven impact** and **consolidates duplicated hygiene** findings. +> See [RELEASE.md](RELEASE.md). + --- **NeuroSploit** turns a URL, a source repository, a running app, or a host/IP into diff --git a/RELEASE.md b/RELEASE.md index a1037d9..ba5600f 100644 --- a/RELEASE.md +++ b/RELEASE.md @@ -1,3 +1,63 @@ +# NeuroSploit v3.5.2 — Release Notes + +**Release Date:** June 2026 +**Codename:** Exploitation Depth & Report Hygiene +**License:** MIT +**Credits:** Joas A Santos & Red Team Leaders + +--- + +## TL;DR + +v3.5.2 hard-codes the discipline that separates a great pentest from a noisy +one — distilled from reviewing real AI-pentest output that kept stopping at +*"exposed"* instead of *"exploited"*. The engine now pushes every exposure to +demonstrated impact, **chains** findings, decodes/fingerprints artifacts and +correlates CVEs, audits tokens, and keeps the final report honest (deduplicated +and severity-calibrated). + +## Highlights + +- **DEPTH doctrine (exploit, don't just expose).** A new doctrine is injected + into every exploitation prompt (black/grey/chain): any info-disclosure, + exposed service/catalog/WSDL, leaked credential/token, or reachable dev host + **must be USED** before it can be a finding — call it, decode it, log in, hit + the dev host. If it was only observed, it's reported as a **lead**, not a + confirmed High/Critical. 
+- **Finding chaining.** Reuse any session/JWT/cookie/credential obtained in one + step across all other modules; pivot access into IDOR/privesc/exfil and report + the **chain**, not isolated parts (e.g. captcha-bypass→admin JWT→authenticated + surface; enum + no-rate-limit→password spraying). +- **Decode & fingerprint → CVE.** Decode opaque tokens/paths (base64/JSON/marshal) + and pin exact library/gem/plugin/CMS versions, then correlate to known CVEs and + attempt a safe PoC. +- **Token auditor.** JWT alg-confusion (RS→HS), `alg:none`, kid/jku injection, + real signature verification, **weak HS256 secret cracking**, and token + lifecycle (logout/expiry/refresh). +- **Report-hygiene & depth pass (deterministic, in the harness).** After + validation the run now: + - **calibrates severity to proven impact** — an unproven High/Critical + (hedged language, no payload, thin evidence) is capped to Medium and + re-titled "(potential)"; + - flags **"exposed → exploited" gaps** — exposures on a host with no actual + exploit get an advisory to go use them; + - advises **consolidating hygiene** classes (headers/cookies/TLS/HSTS/ + clickjacking/disclosure) repeated across many assets into ONE finding with + an affected-asset table, instead of inflating the count one-per-host. +- **5 new doctrine meta-agents** (`agents_md/meta/`): `exploit_depth_doctrine`, + `finding_chainer`, `artifact_decoder`, `token_auditor`, `report_calibrator` + (meta agents 17 → 22; total library 343 → 348). + +## Notes + +- Pure-additive and back-compatible: existing modes, REPL, TUI, pause/continue, + crash-recovery and reports are unchanged. The hygiene pass only annotates and + down-calibrates unproven severities — it never invents or drops findings. +- New unit tests cover the calibration and depth-audit logic + (`harness::hygiene`). 
+ +--- + # NeuroSploit v3.5.1 — Release Notes **Release Date:** June 2026 diff --git a/TUTORIAL.md b/TUTORIAL.md index 4989c60..1e9e758 100644 --- a/TUTORIAL.md +++ b/TUTORIAL.md @@ -1,4 +1,4 @@ -# NeuroSploit — Tutorial & User Guide (v3.5.1) +# NeuroSploit — Tutorial & User Guide (v3.5.2) A complete, hands-on guide to installing, configuring and running NeuroSploit — the autonomous, multi-model penetration-testing harness. @@ -98,7 +98,7 @@ Agents **degrade gracefully**: if `rustscan` is absent they use `nmap`; if neith ### Verify ```bash -neurosploit --version # neurosploit 3.5.1 +neurosploit --version # neurosploit 3.5.2 neurosploit agents # {"vulns":196,...,"chains":12,"total":329} neurosploit models # all providers & models ``` diff --git a/agents_md/meta/artifact_decoder.md b/agents_md/meta/artifact_decoder.md new file mode 100644 index 0000000..5064978 --- /dev/null +++ b/agents_md/meta/artifact_decoder.md @@ -0,0 +1,27 @@ +# Artifact Decoder & CVE Correlator Agent + +> Meta-agent (v3.5.2 doctrine). Decodes opaque tokens/paths, fingerprints the stack, and maps versions to CVEs. + +## User Prompt +For **{target}**, inspect every opaque or technology-revealing artifact seen in +recon and responses: + +1. **Decode** opaque tokens, IDs and URL paths (base64 / base64url / JSON / + marshal / JWT segments). A decoded value often reveals the framework or an + internal file path (e.g. a Dragonfly job `[["f","...file"]]`, a signed-URL + structure, a serialized object). +2. **Fingerprint** the stack: server, framework, language, and exact library / + gem / plugin / CMS versions (headers, asset paths, readme/changelog, error + pages, manifests). +3. **Correlate to CVEs**: map each exact version to known CVEs; prioritize + unauth RCE / SQLi / auth-bypass with a reliable, non-destructive PoC, and + attempt a safe confirmation (version/echo/OOB), never a destructive payload. 
+ +Output JSON: {decoded:[{artifact, decoded_value, implication}], +stack:[{component, version}], cves:[{component, version, cve, cvss, exploitable, poc}]}. + +## System Prompt +You decode the opaque and correlate the obvious. Base64/JSON/marshal blobs and +version banners are leads, not noise — you decode them, fingerprint exact +versions, and check them against known CVEs, confirming only with a safe PoC and +a real receipt. Authorized engagement; no destructive or DoS actions. Credits: Joas A Santos and Red Team Leaders. diff --git a/agents_md/meta/exploit_depth_doctrine.md b/agents_md/meta/exploit_depth_doctrine.md new file mode 100644 index 0000000..dede0dd --- /dev/null +++ b/agents_md/meta/exploit_depth_doctrine.md @@ -0,0 +1,30 @@ +# Exploitation Depth Doctrine Agent + +> Meta-agent (v3.5.2 doctrine). Turns every exposure into an exploitation attempt before it becomes a finding. + +## User Prompt +You are reviewing the candidate findings and live transcript for **{target}**. + +For EACH candidate that merely *exposes* something (information disclosure, +exposed service/catalog/WSDL, leaked credential or token, reachable dev/staging +host, permissive CORS, open .git), drive it one step further BEFORE it is +reported: + +1. **Use what was exposed.** Call the exposed endpoint, decode the leaked + artifact, log in with the leaked credential, hit the dev host, send the + cross-origin request. Capture the real request/response. +2. **Decide honestly.** If using it proved impact → keep/raise severity with the + new evidence. If it could not be used → down-rate to a LEAD (low confidence), + never a confirmed High/Critical. +3. **Report the gap.** List any exposure you could not yet exploit, with the + exact next command to try, so the next round (or the human) can finish it. + +Output JSON: {"escalations":[{id, action_taken, new_evidence, new_severity}], +"leads":[{id, why_not_proven, next_command}]}. + +## System Prompt +You are a senior exploitation lead. 
Detection is not a finding — impact is. You +never let an info-disclosure, exposed service, leaked secret or reachable +non-prod host be reported as confirmed without an attempt to actually use it, +backed by a real tool receipt. Unproven impact is a lead, not a High. Authorized +engagement; no destructive or DoS actions. Credits: Joas A Santos and Red Team Leaders. diff --git a/agents_md/meta/finding_chainer.md b/agents_md/meta/finding_chainer.md new file mode 100644 index 0000000..10330ee --- /dev/null +++ b/agents_md/meta/finding_chainer.md @@ -0,0 +1,25 @@ +# Finding Chainer Agent + +> Meta-agent (v3.5.2 doctrine). Reuses obtained access across modules and reports the chain, not the parts. + +## User Prompt +Given the confirmed findings and any sessions/tokens/credentials obtained during +the engagement on **{target}**, build exploitation CHAINS: + +- Reuse every session/JWT/cookie/credential from one step against ALL other + modules and hosts in scope (a captcha/login bypass that yields a token unlocks + the entire authenticated surface — use it). +- Pivot access into higher impact: IDOR/BOLA, horizontal/vertical privesc, mass + assignment, data exfiltration, account takeover. +- Combine separate weaknesses (e.g. user-enumeration + missing rate-limit = + password spraying; token-in-URL + no throttle = mass exfil). + +For each chain output: {chain_id, steps:[{finding_id, action}], combined_impact, +combined_severity, evidence}. Prefer ONE well-evidenced chain over several +isolated low-severity items. + +## System Prompt +You are an exploit-chaining specialist. Isolated findings understate risk; the +real story is the chain. You always try to reuse obtained access across the +whole scope and escalate to business impact, reporting the combined chain with +concrete evidence. Authorized engagement; no destructive or DoS actions. Credits: Joas A Santos and Red Team Leaders. 
diff --git a/agents_md/meta/report_calibrator.md b/agents_md/meta/report_calibrator.md new file mode 100644 index 0000000..a6c96f5 --- /dev/null +++ b/agents_md/meta/report_calibrator.md @@ -0,0 +1,30 @@ +# Report Calibrator Agent + +> Meta-agent (v3.5.2 doctrine). Dedups by class, calibrates severity to proven impact, demands evidence per claim. + +## User Prompt +Before the final report for **{target}**, clean and calibrate the findings: + +1. **Consolidate hygiene by class.** Merge repeated hygiene findings (missing + security headers, clickjacking, cookie flags, weak TLS, HSTS, version/banner + disclosure) into ONE finding per class with an affected-asset TABLE — do not + inflate the count one-per-host. +2. **Calibrate severity to PROVEN impact.** High/Critical requires demonstrated + impact with evidence. Unproven DoS/abuse, "could/may/potential" language, or a + finding with no concrete payload/PoC → cap to Low/Medium or mark + "(potential)". Recompute the CVSS vector to match the proven impact. +3. **Evidence per claim.** Every finding — and every item in the "tests + performed" log — must carry a concrete request/response receipt; flag any + claim that has none, and any contradiction between the test log and the + findings. + +Output JSON: {merged:[{class, severity, assets:[...]}], +recalibrated:[{id, old_severity, new_severity, reason}], +unevidenced:[{id_or_test, missing}]}. + +## System Prompt +You are a meticulous report editor. You group hygiene by class with an +asset table, calibrate every severity to demonstrated impact (no inflated +High/Critical, no padding the count with duplicates), and require a real +receipt behind every claim — including each line of the tests-performed log. +Honest, deduplicated, evidence-backed reporting only. Credits: Joas A Santos and Red Team Leaders. 
diff --git a/agents_md/meta/token_auditor.md b/agents_md/meta/token_auditor.md new file mode 100644 index 0000000..22bb718 --- /dev/null +++ b/agents_md/meta/token_auditor.md @@ -0,0 +1,26 @@ +# Token & JWT Auditor Agent + +> Meta-agent (v3.5.2 doctrine). Attacks tokens: alg-confusion, none, kid/jku, signature checks, weak HS256 secrets. + +## User Prompt +For any session token or JWT issued by **{target}**, run a full auth-token audit: + +1. **Decode** the header/payload; note alg (HS*/RS*/none), kid, jku, exp, claims. +2. **Algorithm attacks**: try `alg:none`, RS→HS confusion (sign with the public + key as HMAC secret), and kid/jku injection. Confirm whether the server + actually verifies the signature (tamper a claim and replay). +3. **Weak secret**: for HS256, attempt to crack the signing secret offline + (wordlist/rules); a static or guessable shared secret (e.g. an `x-auth-*` + header value) is a strong lead — if cracked, forge a token for any user. +4. **Lifecycle**: test reuse after logout, expiry enforcement, and refresh-token + revocation. + +Output JSON: {token_type, alg, verified:true|false, +attacks:[{name, result, evidence}], forged_token_possible:true|false}. + +## System Prompt +You are a token-security specialist. Every JWT/session token gets audited for +algorithm confusion, none, kid/jku injection, real signature verification, weak +HS256 secrets, and lifecycle (logout/expiry/refresh). A forged or replayable +token is account takeover — you prove it with a real receipt. Authorized +engagement; no destructive or DoS actions. Credits: Joas A Santos and Red Team Leaders. diff --git a/install.ps1 b/install.ps1 index d3b1757..df20fb4 100644 --- a/install.ps1 +++ b/install.ps1 @@ -11,7 +11,7 @@ function Ok ($m) { Write-Host " + $m" -ForegroundColor Green } function Warn($m){ Write-Host " ! 
$m" -ForegroundColor Yellow } Write-Host "" -Write-Host " NeuroSploit installer (Windows) — v3.5.1" -ForegroundColor Cyan +Write-Host " NeuroSploit installer (Windows) — v3.5.2" -ForegroundColor Cyan $arch = $env:PROCESSOR_ARCHITECTURE Say "Platform: Windows / $arch" diff --git a/neurosploit-rs/Cargo.lock b/neurosploit-rs/Cargo.lock index fcb0566..540372c 100644 --- a/neurosploit-rs/Cargo.lock +++ b/neurosploit-rs/Cargo.lock @@ -871,7 +871,7 @@ dependencies = [ [[package]] name = "neurosploit" -version = "3.5.1" +version = "3.5.2" dependencies = [ "anyhow", "clap", @@ -888,7 +888,7 @@ dependencies = [ [[package]] name = "neurosploit-harness" -version = "3.5.1" +version = "3.5.2" dependencies = [ "anyhow", "futures", diff --git a/neurosploit-rs/Cargo.toml b/neurosploit-rs/Cargo.toml index 691b392..26e6873 100644 --- a/neurosploit-rs/Cargo.toml +++ b/neurosploit-rs/Cargo.toml @@ -3,7 +3,7 @@ members = ["crates/harness", "app"] resolver = "2" [workspace.package] -version = "3.5.1" +version = "3.5.2" edition = "2021" license = "MIT" repository = "https://github.com/JoasASantos/NeuroSploit" diff --git a/neurosploit-rs/app/src/main.rs b/neurosploit-rs/app/src/main.rs index 9156de4..b030412 100644 --- a/neurosploit-rs/app/src/main.rs +++ b/neurosploit-rs/app/src/main.rs @@ -1,4 +1,4 @@ -//! NeuroSploit v3.5.1 — interactive harness + CLI (`run` / `whitebox` / `agents` / `models`). +//! NeuroSploit v3.5.2 — interactive harness + CLI (`run` / `whitebox` / `agents` / `models`). 
mod repl; mod tui; @@ -11,8 +11,8 @@ use std::path::{Path, PathBuf}; #[command( name = "neurosploit", version, - about = "NeuroSploit v3.5.1 — multi-model autonomous pentest harness", - long_about = "NeuroSploit v3.5.1 — a Rust multi-model harness that drives a pool of LLMs \ + about = "NeuroSploit v3.5.2 — multi-model autonomous pentest harness", + long_about = "NeuroSploit v3.5.2 — a Rust multi-model harness that drives a pool of LLMs \ (API key or local subscription: Claude/Codex/Gemini/Grok) to autonomously test a target. \ After recon it INTELLIGENTLY selects only the agents matching the discovered surface, runs \ them in parallel, then validates every finding by cross-model voting before reporting.\n\n\ @@ -379,7 +379,7 @@ pub(crate) fn spawn_engagement(base: &Path, mut cfg: RunConfig, mcp: bool, mode: cfg.rl_path = Some(base.join("data").join("rl_state_rs.json").display().to_string()); write_status(&workdir, "running", &format!("\"target\":{:?}", cfg.target)); - println!(" ┌─ NeuroSploit v3.5.1 · by Joas A Santos & Red Team Leaders"); + println!(" ┌─ NeuroSploit v3.5.2 · by Joas A Santos & Red Team Leaders"); println!(" │ run id : {run_id}"); println!(" │ target : {}", cfg.target); println!(" │ models : {}", cfg.models.join(", ")); diff --git a/neurosploit-rs/app/src/repl.rs b/neurosploit-rs/app/src/repl.rs index 6ecc8ef..070a614 100644 --- a/neurosploit-rs/app/src/repl.rs +++ b/neurosploit-rs/app/src/repl.rs @@ -1,4 +1,4 @@ -//! NeuroSploit v3.5.1 — interactive session (Claude-Code / Codex / Cursor-CLI style). +//! NeuroSploit v3.5.2 — interactive session (Claude-Code / Codex / Cursor-CLI style). //! //! Launched when `neurosploit` runs with no subcommand. A persistent REPL with //! 
real line editing (arrow-key history recall, Ctrl-A/E/K, paste), model @@ -299,7 +299,7 @@ pub async fn repl(base: &Path) -> anyhow::Result<()> { let backends = harness::installed_cli_backends(); println!("\x1b[1m"); println!(" ███╗ ██╗███████╗██╗ ██╗██████╗ ██████╗"); - println!(" ████╗ ██║██╔════╝██║ ██║██╔══██╗██╔═══██╗ NeuroSploit v3.5.1"); + println!(" ████╗ ██║██╔════╝██║ ██║██╔══██╗██╔═══██╗ NeuroSploit v3.5.2"); println!(" ██╔██╗ ██║█████╗ ██║ ██║██████╔╝██║ ██║ interactive harness"); println!(" ██║╚██╗██║██╔══╝ ██║ ██║██╔══██╗██║ ██║ by Joas A Santos"); println!(" ██║ ╚████║███████╗╚██████╔╝██║ ██║╚██████╔╝ & Red Team Leaders"); diff --git a/neurosploit-rs/app/src/tui.rs b/neurosploit-rs/app/src/tui.rs index 97b1315..e297d58 100644 --- a/neurosploit-rs/app/src/tui.rs +++ b/neurosploit-rs/app/src/tui.rs @@ -1,4 +1,4 @@ -//! NeuroSploit v3.5.1 — TUI "Mission Control" mode. +//! NeuroSploit v3.5.2 — TUI "Mission Control" mode. //! //! Concurrent panels that update live while the engagement runs in the //! background, with a composer input that stays active during execution: diff --git a/neurosploit-rs/crates/harness/src/belief.rs b/neurosploit-rs/crates/harness/src/belief.rs index 5772350..af12d31 100644 --- a/neurosploit-rs/crates/harness/src/belief.rs +++ b/neurosploit-rs/crates/harness/src/belief.rs @@ -1,4 +1,4 @@ -//! POMDP belief-state world model (v3.5.1). +//! POMDP belief-state world model (v3.5.2). //! //! The target is only partially observable, so we don't track booleans — we //! track a **belief**: a property graph whose nodes (host / service / vuln / diff --git a/neurosploit-rs/crates/harness/src/grounding.rs b/neurosploit-rs/crates/harness/src/grounding.rs index a550c3e..c607f55 100644 --- a/neurosploit-rs/crates/harness/src/grounding.rs +++ b/neurosploit-rs/crates/harness/src/grounding.rs @@ -1,4 +1,4 @@ -//! Verification / grounding engine (v3.5.1). +//! Verification / grounding engine (v3.5.2). //! //! 
Hard rule: **no claim enters the world model without a tool receipt** — raw //! tool output, not the LLM's paraphrase. This is the empirical anti-hallucination diff --git a/neurosploit-rs/crates/harness/src/hygiene.rs b/neurosploit-rs/crates/harness/src/hygiene.rs new file mode 100644 index 0000000..3dde5fd --- /dev/null +++ b/neurosploit-rs/crates/harness/src/hygiene.rs @@ -0,0 +1,186 @@ +//! Report-hygiene & exploitation-depth pass (v3.5.2). +//! +//! Encodes the post-engagement discipline learned from reviewing real +//! AI-pentest output, applied deterministically after validation: +//! 1. **Calibrate severity to PROVEN impact** — an unproven High/Critical +//! (hedged language, no payload, thin evidence) is capped to Medium and +//! re-titled "(potential)". No inflated severities. +//! 2. **Exposed → exploited** — flag info-disclosure / exposed-service / +//! leaked-credential findings on a host that has no actual exploit, so the +//! operator knows to *use* what was exposed (or down-rate it to a lead). +//! 3. **Consolidate hygiene** — when the same hygiene class (missing headers, +//! clickjacking, cookie flags, TLS, info-disclosure…) repeats across many +//! assets, advise merging into ONE finding with an affected-asset table, +//! instead of inflating the count one-per-host. +//! +//! All functions are pure/deterministic; only `calibrate` mutates findings +//! (severity/title/confidence). The rest return advisory strings streamed to +//! the operator and recorded with the run. 
+use crate::types::Finding;
+
+fn host_of(endpoint: &str) -> String {
+    let s = endpoint.trim();
+    let s = s.split("://").last().unwrap_or(s);
+    let s = s.split('/').next().unwrap_or(s);
+    s.split('?').next().unwrap_or(s).to_lowercase()
+}
+
+fn sev_rank(s: &str) -> u8 {
+    match s.to_lowercase().as_str() {
+        x if x.starts_with("crit") => 4,
+        x if x.starts_with("high") => 3,
+        x if x.starts_with("med") => 2,
+        x if x.starts_with("low") => 1,
+        _ => 0,
+    }
+}
+
+fn short(s: &str) -> String {
+    s.chars().take(64).collect()
+}
+
+/// Hedging words that signal an impact was described but not demonstrated
+/// (English + Portuguese, since engagements are bilingual).
+const WEASEL: &[&str] = &[
+    "could ", "may ", "might ", "potential", "possible", "possibly", "teóric", "theoret",
+    "poderia", "possív", "potencial", "if the ", "caso o", "caso a", "would allow", "permitiria",
+];
+
+/// A finding that *exposes* something (recon/disclosure) rather than being an
+/// exploit with demonstrated impact.
+fn is_exposure(f: &Finding) -> bool {
+    let cwe = f.cwe.to_lowercase();
+    let t = f.title.to_lowercase();
+    ["200", "527", "538", "942", "497", "209", "548", "16"].iter().any(|c| cwe.contains(c))
+        || [
+            "disclosure", "exposed", "exposi", "exposure", "catalog", "catálogo", "cors",
+            "banner", "version", "versão", "header", "cabeçalho", ".git", "enumerat",
+            "fingerprint", "wsdl", "swagger", "missing security", "outdated", "eol",
+        ]
+        .iter()
+        .any(|k| t.contains(k))
+}
+
+/// Reads as unproven: hedged or thin evidence AND no concrete payload.
+fn looks_unproven(f: &Finding) -> bool {
+    let blob = format!("{} {} {}", f.title, f.impact, f.evidence).to_lowercase();
+    let hedged = WEASEL.iter().any(|w| blob.contains(w));
+    let weak_ev = f.evidence.trim().chars().count() < 40;
+    let no_payload = f.payload.trim().is_empty();
+    (hedged || weak_ev) && no_payload
+}
+
+/// Normalized hygiene class, for consolidation advice.
+fn class_of(f: &Finding) -> &'static str {
+    let t = f.title.to_lowercase();
+    if t.contains("header") || t.contains("cabeçalho") { "missing-security-headers" }
+    else if t.contains("clickjack") || t.contains("frame") { "clickjacking" }
+    else if t.contains("hsts") || t.contains("strict-transport") { "missing-hsts" }
+    else if t.contains("cookie") { "cookie-flags" }
+    else if t.contains("tls") || t.contains("ssl") { "weak-tls" }
+    else if t.contains("cors") { "cors-misconfig" }
+    else if t.contains("version") || t.contains("versão") || t.contains("banner") || t.contains("eol") || t.contains("outdated") { "version-disclosure" }
+    else { "information-disclosure" }
+}
+
+/// Cap inflated, unproven High/Critical findings to Medium. Returns advisories.
+pub fn calibrate(findings: &mut [Finding]) -> Vec<String> {
+    let mut notes = Vec::new();
+    for f in findings.iter_mut() {
+        if sev_rank(&f.severity) >= 3 && looks_unproven(f) {
+            let old = f.severity.clone();
+            f.severity = "Medium".into();
+            f.confidence = f.confidence.min(0.5);
+            let low = f.title.to_lowercase();
+            if !low.contains("potential") && !low.contains("potencial") {
+                f.title = format!("{} (potential — impact not demonstrated)", f.title);
+            }
+            notes.push(format!(
+                "severity calibrated: \"{}\" {old} → Medium (impact not demonstrated)",
+                short(&f.title)
+            ));
+        }
+    }
+    notes
+}
+
+/// "Exposed → exploited": exposures on a host with no real exploit get flagged.
+pub fn depth_audit(findings: &[Finding]) -> Vec<String> {
+    let exploited: std::collections::HashSet<String> = findings
+        .iter()
+        .filter(|f| !is_exposure(f) && sev_rank(&f.severity) >= 2)
+        .map(|f| host_of(&f.endpoint))
+        .collect();
+    let mut notes = Vec::new();
+    for f in findings.iter().filter(|f| is_exposure(f)) {
+        if !exploited.contains(&host_of(&f.endpoint)) {
+            notes.push(format!(
+                "depth gap: \"{}\" exposed but not exploited — USE it (call the endpoint / decode the artifact / log in / hit the dev host) to prove impact, or down-rate to a lead",
+                short(&f.title)
+            ));
+        }
+    }
+    notes.truncate(8);
+    notes
+}
+
+/// Advise consolidating hygiene classes that repeat across multiple assets.
+pub fn hygiene_summary(findings: &[Finding]) -> Vec<String> {
+    use std::collections::{BTreeMap, BTreeSet};
+    let mut groups: BTreeMap<&'static str, BTreeSet<String>> = BTreeMap::new();
+    for f in findings.iter().filter(|f| is_exposure(f)) {
+        groups.entry(class_of(f)).or_default().insert(host_of(&f.endpoint));
+    }
+    let mut notes = Vec::new();
+    for (class, hosts) in groups {
+        if hosts.len() > 1 {
+            notes.push(format!(
+                "hygiene: '{class}' affects {} assets — consolidate into ONE finding with an affected-asset table (don't inflate the count one-per-host)",
+                hosts.len()
+            ));
+        }
+    }
+    notes
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    fn f(title: &str, sev: &str, cwe: &str, ep: &str, ev: &str, payload: &str) -> Finding {
+        let mut x = Finding::default();
+        x.title = title.into(); x.severity = sev.into(); x.cwe = cwe.into();
+        x.endpoint = ep.into(); x.evidence = ev.into(); x.payload = payload.into();
+        x
+    }
+
+    #[test]
+    fn unproven_high_is_capped() {
+        let mut v = vec![f("Flooding DoS", "High", "CWE-770", "https://a/x", "could overload", "")];
+        let notes = calibrate(&mut v);
+        assert_eq!(v[0].severity, "Medium");
+        assert_eq!(notes.len(), 1);
+    }
+
+    #[test]
+    fn proven_high_is_kept() {
+        let mut v = vec![f("SQLi", "High", "CWE-89", "https://a/x",
+            "id=1' UNION SELECT version()-- returned 8.0.32 in the response body, proving injection", "1' OR '1'='1")];
+        calibrate(&mut v);
+        assert_eq!(v[0].severity, "High");
+    }
+
+    #[test]
+    fn exposure_without_exploit_flagged() {
+        let v = vec![f("Information Disclosure - .git exposed", "Low", "CWE-527", "https://a/.git", "leaked", "")];
+        assert_eq!(depth_audit(&v).len(), 1);
+    }
+
+    #[test]
+    fn exposure_with_exploit_on_same_host_not_flagged() {
+        let v = vec![
+            f("Information Disclosure - banner", "Low", "CWE-200", "https://a/x", "Server: IIS", ""),
+            f("SQL Injection", "High", "CWE-89", "https://a/login", "dumped users", "1'--"),
+        ];
+        assert!(depth_audit(&v).is_empty());
+    }
+}
diff --git a/neurosploit-rs/crates/harness/src/lib.rs b/neurosploit-rs/crates/harness/src/lib.rs
index 42e0f15..9debcba 100644
--- a/neurosploit-rs/crates/harness/src/lib.rs
+++ b/neurosploit-rs/crates/harness/src/lib.rs
@@ -1,4 +1,4 @@
-//! NeuroSploit v3.5.1 harness — a robust multi-model runtime for the
+//! NeuroSploit v3.5.2 harness — a robust multi-model runtime for the
 //! markdown-driven autonomous pentest engine.
 //!
 //! The harness loads the `agents_md/` library, drives a *pool* of LLM models
@@ -11,6 +11,7 @@ pub mod attack_graph;
 pub mod belief;
 pub mod creds;
 pub mod grounding;
+pub mod hygiene;
 pub mod pomdp;
 pub mod models;
 pub mod pipeline;
diff --git a/neurosploit-rs/crates/harness/src/pipeline.rs b/neurosploit-rs/crates/harness/src/pipeline.rs
index 95244c6..32898ae 100644
--- a/neurosploit-rs/crates/harness/src/pipeline.rs
+++ b/neurosploit-rs/crates/harness/src/pipeline.rs
@@ -69,6 +69,16 @@ const REACT_DOCTRINE: &str = "METHOD (ReAct): work in explicit Thought → Actio
 Each Action runs ONE concrete tool command (e.g. a curl request); read its real Observation before the next Thought. \
 Base every claim on an actual observed response — never assume. Stop when you've either proven an issue or exhausted reasonable checks. Be token-efficient: no filler, no repetition.\n\n";
 
+/// DEPTH doctrine (v3.5.2): push past detection to demonstrated impact, and
+/// chain. Distilled from reviewing real AI-pentest output that kept stopping at
+/// "exposed" instead of "exploited".
+const DEPTH_DOCTRINE: &str = "DEPTH (exploit, don't just expose):\n\
+- Exposed → exploited: any info-disclosure, exposed service/catalog/WSDL, leaked credential/token, or non-prod (dev/staging) host you find MUST be USED before you report it — call the exposed endpoint, decode the leaked artifact, log in with the leaked credential, hit the dev host. If you only observed it but never used it, report it as a LEAD (low confidence), not a confirmed finding.\n\
+- Chain across steps: reuse any session/JWT/cookie/credential you obtain in one step against every other module; if one bug yields access, pivot it into IDOR/privesc/data-exfil and report the CHAIN, not isolated parts.\n\
+- Decode & fingerprint → CVE: decode opaque tokens/paths (base64/JSON/marshal) and fingerprint the stack (server, framework, library/gem/plugin versions); map exact versions to known CVEs and attempt a safe, non-destructive PoC.\n\
+- Audit tokens: for any JWT, check alg-confusion (RS→HS), alg:none, kid/jku injection, whether the signature is actually verified, and weak/guessable HS256 secrets.\n\
+- Calibrate honestly: claim High/Critical ONLY when impact is DEMONSTRATED; unproven DoS/abuse is Low/Info or a lead, never inflated.\n\n";
+
 /// Black-box web engagement: recon → parallel exploit → N-model vote → report.
 pub async fn run(cfg: RunConfig, lib: &Library, pool: &ModelPool, tx: Sender) -> RunOutput {
     pool.set_progress(tx.clone());
@@ -168,12 +178,13 @@ pub async fn run(cfg: RunConfig, lib: &Library, pool: &ModelPool, tx: Sender {
@@ -623,6 +634,20 @@ async fn finish(cfg: RunConfig, _lib: &Library, recon: String, transcript: Strin
         let _ = tx.send(format!("grounding gate: demoted {demoted}/{before} ungrounded claim(s) (no tool receipt)")).await;
     }
 
+    // --- v3.5.2 report-hygiene & exploitation-depth pass ---
+    // Calibrate inflated/unproven High-Critical to Medium, flag exposures that
+    // were never exploited ("exposed → exploited"), and advise consolidating
+    // hygiene findings duplicated across many assets.
+    for n in crate::hygiene::calibrate(&mut findings) {
+        let _ = tx.send(format!("calibrate: {n}")).await;
+    }
+    for n in crate::hygiene::depth_audit(&findings) {
+        let _ = tx.send(format!("notify: {n}")).await;
+    }
+    for n in crate::hygiene::hygiene_summary(&findings) {
+        let _ = tx.send(format!("notify: {n}")).await;
+    }
+
     // --- POMDP belief: build from grounded findings, report residual uncertainty ---
     let mut wm = crate::belief::WorldModel::new();
     wm.deterministic = whitebox;
diff --git a/neurosploit-rs/crates/harness/src/pomdp.rs b/neurosploit-rs/crates/harness/src/pomdp.rs
index a95768a..6e77d3b 100644
--- a/neurosploit-rs/crates/harness/src/pomdp.rs
+++ b/neurosploit-rs/crates/harness/src/pomdp.rs
@@ -1,4 +1,4 @@
-//! POMDP decision layer (v3.5.1): value-of-information planning + the
+//! POMDP decision layer (v3.5.2): value-of-information planning + the
 //! anti-hallucination gate.
 //!
 //! The choice "scan more vs exploit now" is **not** a heuristic here — it falls
diff --git a/neurosploit-rs/crates/harness/src/report.rs b/neurosploit-rs/crates/harness/src/report.rs
index 49d396e..145f2ae 100644
--- a/neurosploit-rs/crates/harness/src/report.rs
+++ b/neurosploit-rs/crates/harness/src/report.rs
@@ -97,9 +97,9 @@ pub fn html(target: &str, findings: &[Finding]) -> String {
 h4{{margin:12px 0 3px;font-size:12px;text-transform:uppercase;letter-spacing:.5px;color:#8b5cf6}}\
 .b{{color:#8b5cf6;font-weight:800}}\

NeuroSploit Penetration Test Report

\ -
Target: {t} · v3.5.1 Rust harness · multi-model validated
\ +
Target: {t} · v3.5.2 Rust harness · multi-model validated
\
{chips}
{graph_block}

Findings ({n})

{body}\ -

Authorized testing only. Findings confirmed by multi-model adversarial voting.
NeuroSploit v3.5.1 · by Joas A Santos & Red Team Leaders

", +

Authorized testing only. Findings confirmed by multi-model adversarial voting.
NeuroSploit v3.5.2 · by Joas A Santos & Red Team Leaders
",
         t = esc(target), chips = chips, n = sorted.len(), body = body, graph_block = graph_block,
     )
 }
@@ -135,7 +135,7 @@ pub fn typst_report(target: &str, findings: &[Finding], dir: &Path) -> std::io::
     let mut data = String::new();
     data.push_str(&format!(
         "#let meta = (target: {}, run_id: {}, generated: {}, model: {})\n",
-        tq(target), tq(&run_id), tq("NeuroSploit v3.5.1"), tq("multi-model")
+        tq(target), tq(&run_id), tq("NeuroSploit v3.5.2"), tq("multi-model")
     ));
     data.push_str("#let findings = (\n");
     for f in sorted_findings(findings) {
diff --git a/scripts/build_methodology_v352.py b/scripts/build_methodology_v352.py
new file mode 100644
index 0000000..fbde09d
--- /dev/null
+++ b/scripts/build_methodology_v352.py
@@ -0,0 +1,183 @@
+#!/usr/bin/env python3
+"""
+NeuroSploit v3.5.2 — exploitation-depth & report-hygiene doctrine agents.
+
+Distilled from reviewing real AI-pentest output that kept stopping at
+"exposed" instead of "exploited". Emits meta-agents to agents_md/meta/ that
+push the engine past detection to demonstrated impact, chain findings, decode
+artifacts/correlate CVEs, audit tokens, and keep the report honest (dedup +
+severity calibration). Credits: Joas A Santos & Red Team Leaders.
+"""
+import os
+
+ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+OUT = os.path.join(ROOT, "agents_md", "meta")
+
+CREDITS = "Credits: Joas A Santos and Red Team Leaders."
+
+
+def render(a):
+    L = [f"# {a['title']}\n",
+         f"> Meta-agent (v3.5.2 doctrine). {a['tagline']}\n",
+         "## User Prompt",
+         a["user"].strip(), "",
+         "## System Prompt",
+         a["system"].strip() + " " + CREDITS]
+    return "\n".join(L) + "\n"
+
+
+AGENTS = [
+    {"name": "exploit_depth_doctrine",
+     "title": "Exploitation Depth Doctrine Agent",
+     "tagline": "Turns every exposure into an exploitation attempt before it becomes a finding.",
+     "user": """
+You are reviewing the candidate findings and live transcript for **{target}**.
+
+For EACH candidate that merely *exposes* something (information disclosure,
+exposed service/catalog/WSDL, leaked credential or token, reachable dev/staging
+host, permissive CORS, open .git), drive it one step further BEFORE it is
+reported:
+
+1. **Use what was exposed.** Call the exposed endpoint, decode the leaked
+   artifact, log in with the leaked credential, hit the dev host, send the
+   cross-origin request. Capture the real request/response.
+2. **Decide honestly.** If using it proved impact → keep/raise severity with the
+   new evidence. If it could not be used → down-rate to a LEAD (low confidence),
+   never a confirmed High/Critical.
+3. **Report the gap.** List any exposure you could not yet exploit, with the
+   exact next command to try, so the next round (or the human) can finish it.
+
+Output JSON: {"escalations":[{id, action_taken, new_evidence, new_severity}],
+"leads":[{id, why_not_proven, next_command}]}.
+""",
+     "system": """
+You are a senior exploitation lead. Detection is not a finding — impact is. You
+never let an info-disclosure, exposed service, leaked secret or reachable
+non-prod host be reported as confirmed without an attempt to actually use it,
+backed by a real tool receipt. Unproven impact is a lead, not a High. Authorized
+engagement; no destructive or DoS actions.
+"""},
+
+    {"name": "finding_chainer",
+     "title": "Finding Chainer Agent",
+     "tagline": "Reuses obtained access across modules and reports the chain, not the parts.",
+     "user": """
+Given the confirmed findings and any sessions/tokens/credentials obtained during
+the engagement on **{target}**, build exploitation CHAINS:
+
+- Reuse every session/JWT/cookie/credential from one step against ALL other
+  modules and hosts in scope (a captcha/login bypass that yields a token unlocks
+  the entire authenticated surface — use it).
+- Pivot access into higher impact: IDOR/BOLA, horizontal/vertical privesc, mass
+  assignment, data exfiltration, account takeover.
+- Combine separate weaknesses (e.g. user-enumeration + missing rate-limit =
+  password spraying; token-in-URL + no throttle = mass exfil).
+
+For each chain output: {chain_id, steps:[{finding_id, action}], combined_impact,
+combined_severity, evidence}. Prefer ONE well-evidenced chain over several
+isolated low-severity items.
+""",
+     "system": """
+You are an exploit-chaining specialist. Isolated findings understate risk; the
+real story is the chain. You always try to reuse obtained access across the
+whole scope and escalate to business impact, reporting the combined chain with
+concrete evidence. Authorized engagement; no destructive or DoS actions.
+"""},
+
+    {"name": "artifact_decoder",
+     "title": "Artifact Decoder & CVE Correlator Agent",
+     "tagline": "Decodes opaque tokens/paths, fingerprints the stack, and maps versions to CVEs.",
+     "user": """
+For **{target}**, inspect every opaque or technology-revealing artifact seen in
+recon and responses:
+
+1. **Decode** opaque tokens, IDs and URL paths (base64 / base64url / JSON /
+   marshal / JWT segments). A decoded value often reveals the framework or an
+   internal file path (e.g. a Dragonfly job `[["f","...file"]]`, a signed-URL
+   structure, a serialized object).
+2. **Fingerprint** the stack: server, framework, language, and exact library /
+   gem / plugin / CMS versions (headers, asset paths, readme/changelog, error
+   pages, manifests).
+3. **Correlate to CVEs**: map each exact version to known CVEs; prioritize
+   unauth RCE / SQLi / auth-bypass with a reliable, non-destructive PoC, and
+   attempt a safe confirmation (version/echo/OOB), never a destructive payload.
+
+Output JSON: {decoded:[{artifact, decoded_value, implication}],
+stack:[{component, version}], cves:[{component, version, cve, cvss, exploitable, poc}]}.
+""",
+     "system": """
+You decode the opaque and correlate the obvious. Base64/JSON/marshal blobs and
+version banners are leads, not noise — you decode them, fingerprint exact
+versions, and check them against known CVEs, confirming only with a safe PoC and
+a real receipt. Authorized engagement; no destructive or DoS actions.
+"""},
+
+    {"name": "token_auditor",
+     "title": "Token & JWT Auditor Agent",
+     "tagline": "Attacks tokens: alg-confusion, none, kid/jku, signature checks, weak HS256 secrets.",
+     "user": """
+For any session token or JWT issued by **{target}**, run a full auth-token audit:
+
+1. **Decode** the header/payload; note alg (HS*/RS*/none), kid, jku, exp, claims.
+2. **Algorithm attacks**: try `alg:none`, RS→HS confusion (sign with the public
+   key as HMAC secret), and kid/jku injection. Confirm whether the server
+   actually verifies the signature (tamper a claim and replay).
+3. **Weak secret**: for HS256, attempt to crack the signing secret offline
+   (wordlist/rules); a static or guessable shared secret (e.g. an `x-auth-*`
+   header value) is a strong lead — if cracked, forge a token for any user.
+4. **Lifecycle**: test reuse after logout, expiry enforcement, and refresh-token
+   revocation.
+
+Output JSON: {token_type, alg, verified:true|false,
+attacks:[{name, result, evidence}], forged_token_possible:true|false}.
+""",
+     "system": """
+You are a token-security specialist. Every JWT/session token gets audited for
+algorithm confusion, none, kid/jku injection, real signature verification, weak
+HS256 secrets, and lifecycle (logout/expiry/refresh). A forged or replayable
+token is account takeover — you prove it with a real receipt. Authorized
+engagement; no destructive or DoS actions.
+"""},
+
+    {"name": "report_calibrator",
+     "title": "Report Calibrator Agent",
+     "tagline": "Dedups by class, calibrates severity to proven impact, demands evidence per claim.",
+     "user": """
+Before the final report for **{target}**, clean and calibrate the findings:
+
+1. **Consolidate hygiene by class.** Merge repeated hygiene findings (missing
+   security headers, clickjacking, cookie flags, weak TLS, HSTS, version/banner
+   disclosure) into ONE finding per class with an affected-asset TABLE — do not
+   inflate the count one-per-host.
+2. **Calibrate severity to PROVEN impact.** High/Critical requires demonstrated
+   impact with evidence. Unproven DoS/abuse, "could/may/potential" language, or a
+   finding with no concrete payload/PoC → cap to Low/Medium or mark
+   "(potential)". Recompute the CVSS vector to match the proven impact.
+3. **Evidence per claim.** Every finding — and every item in the "tests
+   performed" log — must carry a concrete request/response receipt; flag any
+   claim that has none, and any contradiction between the test log and the
+   findings.
+
+Output JSON: {merged:[{class, severity, assets:[...]}],
+recalibrated:[{id, old_severity, new_severity, reason}],
+unevidenced:[{id_or_test, missing}]}.
+""",
+     "system": """
+You are a meticulous report editor. You group hygiene by class with an
+asset table, calibrate every severity to demonstrated impact (no inflated
+High/Critical, no padding the count with duplicates), and require a real
+receipt behind every claim — including each line of the tests-performed log.
+Honest, deduplicated, evidence-backed reporting only.
+"""},
+]
+
+
+def main():
+    os.makedirs(OUT, exist_ok=True)
+    for a in AGENTS:
+        open(os.path.join(OUT, a["name"] + ".md"), "w", encoding="utf-8").write(render(a))
+    print(f"wrote {len(AGENTS)} v3.5.2 doctrine meta-agents to {OUT}")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/setup.sh b/setup.sh
index 6bb08e9..2857376 100755
--- a/setup.sh
+++ b/setup.sh
@@ -25,7 +25,7 @@ cat <<'BANNER'
 ███╗ ██╗███████╗██╗ ██╗██████╗ ██████╗
 ████╗ ██║██╔════╝██║ ██║██╔══██╗██╔═══██╗ NeuroSploit installer
- ██╔██╗ ██║█████╗ ██║ ██║██████╔╝██║ ██║ v3.5.1 — Rust harness
+ ██╔██╗ ██║█████╗ ██║ ██║██████╔╝██║ ██║ v3.5.2 — Rust harness
 ██║╚██╗██║██╔══╝ ██║ ██║██╔══██╗██║ ██║ by Joas A Santos
 ██║ ╚████║███████╗╚██████╔╝██║ ██║╚██████╔╝ & Red Team Leaders
 ╚═╝ ╚═══╝╚══════╝ ╚═════╝ ╚═╝ ╚═╝ ╚═════╝
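
Reviewer note on the `token_auditor` prompt: its first step (decode the header/payload; note alg, kid, jku, exp) can be illustrated with a small standalone sketch. This is not part of the patch. The helper names `b64url_decode` and `inspect_jwt`, the demo token, and the wording of the leads are assumptions made for illustration, and the sketch deliberately does not verify any signature. Everything it returns is a lead in the doctrine's sense, not a confirmed finding.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def inspect_jwt(token: str) -> dict:
    """Decode header and payload and list audit leads (hypothetical helper).

    The signature segment is ignored on purpose: this only decodes.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    header = json.loads(b64url_decode(parts[0]))
    payload = json.loads(b64url_decode(parts[1]))
    alg = str(header.get("alg", ""))

    leads = []
    if alg.lower() == "none":
        leads.append("alg is none: check whether unsigned tokens are accepted")
    elif alg.upper().startswith("HS"):
        leads.append("HMAC-signed: weak or guessable secret is worth testing")
    elif alg.upper().startswith("RS"):
        leads.append("RSA-signed: test RS->HS confusion with the public key")
    for field in ("kid", "jku"):
        if field in header:
            leads.append(f"{field} present: test injection")
    if "exp" not in payload:
        leads.append("no exp claim: check expiry enforcement")
    return {"header": header, "payload": payload, "alg": alg, "leads": leads}


def _encode(obj: dict) -> str:
    """Build one unpadded base64url segment for the demo token."""
    raw = json.dumps(obj, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Demo token assembled locally; "sig" is a placeholder, not a real signature.
demo = ".".join([
    _encode({"alg": "HS256", "typ": "JWT", "kid": "k1"}),
    _encode({"sub": "alice"}),
    "sig",
])
report = inspect_jwt(demo)
print(report["alg"], report["leads"])
```

Keeping the output as a list of leads mirrors the doctrine: nothing here proves impact until a forged or tampered token is actually replayed against the target and the response is captured.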