v3.5.2 — Exploitation Depth & Report Hygiene

Distilled from reviewing real AI-pentest output that kept stopping at "exposed" instead of "exploited". Pure-additive, back-compatible.

Behavior (injected into black/grey/chain exploit prompts via DEPTH_DOCTRINE):
- Exposed → exploited: any info-disclosure / exposed service/WSDL / leaked credential|token / reachable dev host MUST be used before it's a finding; otherwise it's a lead, not a confirmed High/Critical.
- Chain across modules: reuse obtained session/JWT/cookie/credential and pivot to IDOR/privesc/exfil; report the chain, not isolated parts.
- Decode & fingerprint → CVE; audit tokens (alg-confusion/none/kid/JWKS, weak HS256 secret cracking, lifecycle).

Deterministic post-pass (new crates/harness/src/hygiene.rs, wired into finish()):
- calibrate severity to PROVEN impact: unproven High/Critical (hedged, no payload, thin evidence) capped to Medium and re-titled "(potential)";
- depth_audit: flag exposures on a host with no real exploit;
- hygiene_summary: advise consolidating hygiene classes repeated across assets.

Unit tests cover calibration + depth audit.

5 new doctrine meta-agents (scripts/build_methodology_v352.py → agents_md/meta/): exploit_depth_doctrine, finding_chainer, artifact_decoder, token_auditor, report_calibrator (meta 17→22, total 343→348).

Version bumped 3.5.1 → 3.5.2 across crates/app/installers/docs; RELEASE/README updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-02 01:35:32 +02:00 · 2026-06-26 11:31:11 -03:00
parent ac84db024c
commit e4efa9bbb0
23 changed files with 628 additions and 28 deletions
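
Reviewer sketch of the calibration rule named in the commit message: a finding reads as unproven when it is hedged or has thin evidence, and it also carries no payload. The block below is a minimal stand-alone approximation, not the harness code. `MiniFinding` and `mini` are hypothetical stand-ins for the real `Finding` type, and `WEASEL` here is only an English subset of the list in the commit.

```rust
/// Stand-in for the harness `Finding`; only the fields the predicate reads.
/// (Hypothetical: the real type has more fields.)
struct MiniFinding {
    title: String,
    impact: String,
    evidence: String,
    payload: String,
}

/// English subset of the commit's hedging-word list.
const WEASEL: &[&str] = &["could ", "may ", "might ", "potential", "possible", "would allow"];

/// Same shape as the commit's rule: (hedged OR thin evidence) AND no payload.
fn looks_unproven(f: &MiniFinding) -> bool {
    let blob = format!("{} {} {}", f.title, f.impact, f.evidence).to_lowercase();
    let hedged = WEASEL.iter().any(|w| blob.contains(w));
    // "Thin" evidence: fewer than 40 characters after trimming.
    let weak_ev = f.evidence.trim().chars().count() < 40;
    let no_payload = f.payload.trim().is_empty();
    (hedged || weak_ev) && no_payload
}

/// Hypothetical constructor that keeps the examples short.
fn mini(title: &str, impact: &str, evidence: &str, payload: &str) -> MiniFinding {
    MiniFinding {
        title: title.into(),
        impact: impact.into(),
        evidence: evidence.into(),
        payload: payload.into(),
    }
}
```

With this rule, `mini("Flooding DoS", "", "could overload", "")` reads as unproven, while any finding with a non-empty payload does not, even when its text is hedged. That second property is worth knowing when reviewing calibrated reports: the payload field alone is enough to escape the cap.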
@@ -871,7 +871,7 @@ dependencies = [

 [[package]]
 name = "neurosploit"
-version = "3.5.1"
+version = "3.5.2"
 dependencies = [
 "anyhow",
 "clap",
@@ -888,7 +888,7 @@ dependencies = [

 [[package]]
 name = "neurosploit-harness"
-version = "3.5.1"
+version = "3.5.2"
 dependencies = [
 "anyhow",
 "futures",
@@ -3,7 +3,7 @@ members = ["crates/harness", "app"]
 resolver = "2"

 [workspace.package]
-version = "3.5.1"
+version = "3.5.2"
 edition = "2021"
 license = "MIT"
 repository = "https://github.com/JoasASantos/NeuroSploit"
@@ -1,4 +1,4 @@
-//! NeuroSploit v3.5.1 — interactive harness + CLI (`run` / `whitebox` / `agents` / `models`).
+//! NeuroSploit v3.5.2 — interactive harness + CLI (`run` / `whitebox` / `agents` / `models`).

 mod repl;
 mod tui;
@@ -11,8 +11,8 @@ use std::path::{Path, PathBuf};
 #[command(
    name = "neurosploit",
    version,
-    about = "NeuroSploit v3.5.1 — multi-model autonomous pentest harness",
-    long_about = "NeuroSploit v3.5.1 — a Rust multi-model harness that drives a pool of LLMs \
+    about = "NeuroSploit v3.5.2 — multi-model autonomous pentest harness",
+    long_about = "NeuroSploit v3.5.2 — a Rust multi-model harness that drives a pool of LLMs \
 (API key or local subscription: Claude/Codex/Gemini/Grok) to autonomously test a target. \
 After recon it INTELLIGENTLY selects only the agents matching the discovered surface, runs \
 them in parallel, then validates every finding by cross-model voting before reporting.\n\n\
@@ -379,7 +379,7 @@ pub(crate) fn spawn_engagement(base: &Path, mut cfg: RunConfig, mcp: bool, mode:
    cfg.rl_path = Some(base.join("data").join("rl_state_rs.json").display().to_string());
    write_status(&workdir, "running", &format!("\"target\":{:?}", cfg.target));

-    println!("  ┌─ NeuroSploit v3.5.1  ·  by Joas A Santos & Red Team Leaders");
+    println!("  ┌─ NeuroSploit v3.5.2  ·  by Joas A Santos & Red Team Leaders");
    println!("  │  run id : {run_id}");
    println!("  │  target : {}", cfg.target);
    println!("  │  models : {}", cfg.models.join(", "));
@@ -1,4 +1,4 @@
-//! NeuroSploit v3.5.1 — interactive session (Claude-Code / Codex / Cursor-CLI style).
+//! NeuroSploit v3.5.2 — interactive session (Claude-Code / Codex / Cursor-CLI style).
 //!
 //! Launched when `neurosploit` runs with no subcommand. A persistent REPL with
 //! real line editing (arrow-key history recall, Ctrl-A/E/K, paste), model
@@ -299,7 +299,7 @@ pub async fn repl(base: &Path) -> anyhow::Result<()> {
    let backends = harness::installed_cli_backends();
    println!("\x1b[1m");
    println!("  ███╗   ██╗███████╗██╗   ██╗██████╗  ██████╗");
-    println!("  ████╗  ██║██╔════╝██║   ██║██╔══██╗██╔═══██╗   NeuroSploit v3.5.1");
+    println!("  ████╗  ██║██╔════╝██║   ██║██╔══██╗██╔═══██╗   NeuroSploit v3.5.2");
    println!("  ██╔██╗ ██║█████╗  ██║   ██║██████╔╝██║   ██║   interactive harness");
    println!("  ██║╚██╗██║██╔══╝  ██║   ██║██╔══██╗██║   ██║   by Joas A Santos");
    println!("  ██║ ╚████║███████╗╚██████╔╝██║  ██║╚██████╔╝   & Red Team Leaders");
@@ -1,4 +1,4 @@
-//! NeuroSploit v3.5.1 — TUI "Mission Control" mode.
+//! NeuroSploit v3.5.2 — TUI "Mission Control" mode.
 //!
 //! Concurrent panels that update live while the engagement runs in the
 //! background, with a composer input that stays active during execution:
@@ -1,4 +1,4 @@
-//! POMDP belief-state world model (v3.5.1).
+//! POMDP belief-state world model (v3.5.2).
 //!
 //! The target is only partially observable, so we don't track booleans — we
 //! track a **belief**: a property graph whose nodes (host / service / vuln /
@@ -1,4 +1,4 @@
-//! Verification / grounding engine (v3.5.1).
+//! Verification / grounding engine (v3.5.2).
 //!
 //! Hard rule: **no claim enters the world model without a tool receipt** — raw
 //! tool output, not the LLM's paraphrase. This is the empirical anti-hallucination
@@ -0,0 +1,186 @@
+//! Report-hygiene & exploitation-depth pass (v3.5.2).
+//!
+//! Encodes the post-engagement discipline learned from reviewing real
+//! AI-pentest output, applied deterministically after validation:
+//!  1. **Calibrate severity to PROVEN impact** — an unproven High/Critical
+//!     (hedged language, no payload, thin evidence) is capped to Medium and
+//!     re-titled "(potential)". No inflated severities.
+//!  2. **Exposed → exploited** — flag info-disclosure / exposed-service /
+//!     leaked-credential findings on a host that has no actual exploit, so the
+//!     operator knows to *use* what was exposed (or down-rate it to a lead).
+//!  3. **Consolidate hygiene** — when the same hygiene class (missing headers,
+//!     clickjacking, cookie flags, TLS, info-disclosure…) repeats across many
+//!     assets, advise merging into ONE finding with an affected-asset table,
+//!     instead of inflating the count one-per-host.
+//!
+//! All functions are deterministic and do no I/O; only `calibrate` mutates
+//! findings (severity/title/confidence). Every function returns advisory
+//! strings, which are streamed to the operator and recorded with the run.
+use crate::types::Finding;
+
+fn host_of(endpoint: &str) -> String {
+    let s = endpoint.trim();
+    let s = s.split_once("://").map_or(s, |(_, rest)| rest); // strip only the leading scheme
+    let s = s.split('/').next().unwrap_or(s);
+    s.split('?').next().unwrap_or(s).to_lowercase()
+}
+
+fn sev_rank(s: &str) -> u8 {
+    match s.to_lowercase().as_str() {
+        x if x.starts_with("crit") => 4,
+        x if x.starts_with("high") => 3,
+        x if x.starts_with("med") => 2,
+        x if x.starts_with("low") => 1,
+        _ => 0,
+    }
+}
+
+fn short(s: &str) -> String {
+    s.chars().take(64).collect()
+}
+
+/// Hedging words that signal an impact was described but not demonstrated
+/// (English + Portuguese, since engagements are bilingual).
+const WEASEL: &[&str] = &[
+    "could ", "may ", "might ", "potential", "possible", "possibly", "teóric", "theoret",
+    "poderia", "possív", "potencial", "if the ", "caso o", "caso a", "would allow", "permitiria",
+];
+
+/// A finding that *exposes* something (recon/disclosure) rather than being an
+/// exploit with demonstrated impact.
+fn is_exposure(f: &Finding) -> bool {
+    let cwe = f.cwe.to_lowercase();
+    let t = f.title.to_lowercase();
+    ["200", "527", "538", "942", "497", "209", "548", "16"].iter().any(|c| cwe.contains(c))
+        || [
+            "disclosure", "exposed", "exposi", "exposure", "catalog", "catálogo", "cors",
+            "banner", "version", "versão", "header", "cabeçalho", ".git", "enumerat",
+            "fingerprint", "wsdl", "swagger", "missing security", "outdated", "eol",
+        ]
+        .iter()
+        .any(|k| t.contains(k))
+}
+
+/// Reads as unproven: hedged or thin evidence AND no concrete payload.
+fn looks_unproven(f: &Finding) -> bool {
+    let blob = format!("{} {} {}", f.title, f.impact, f.evidence).to_lowercase();
+    let hedged = WEASEL.iter().any(|w| blob.contains(w));
+    let weak_ev = f.evidence.trim().chars().count() < 40;
+    let no_payload = f.payload.trim().is_empty();
+    (hedged || weak_ev) && no_payload
+}
+
+/// Normalized hygiene class, for consolidation advice.
+fn class_of(f: &Finding) -> &'static str {
+    let t = f.title.to_lowercase();
+    if t.contains("header") || t.contains("cabeçalho") { "missing-security-headers" }
+    else if t.contains("clickjack") || (t.contains("frame") && !t.contains("framework")) { "clickjacking" }
+    else if t.contains("hsts") || t.contains("strict-transport") { "missing-hsts" }
+    else if t.contains("cookie") { "cookie-flags" }
+    else if t.contains("tls") || t.contains("ssl") { "weak-tls" }
+    else if t.contains("cors") { "cors-misconfig" }
+    else if t.contains("version") || t.contains("versão") || t.contains("banner") || t.contains("eol") || t.contains("outdated") { "version-disclosure" }
+    else { "information-disclosure" }
+}
+
+/// Cap inflated, unproven High/Critical findings to Medium. Returns advisories.
+pub fn calibrate(findings: &mut [Finding]) -> Vec<String> {
+    let mut notes = Vec::new();
+    for f in findings.iter_mut() {
+        if sev_rank(&f.severity) >= 3 && looks_unproven(f) {
+            let old = f.severity.clone();
+            f.severity = "Medium".into();
+            f.confidence = f.confidence.min(0.5);
+            let low = f.title.to_lowercase();
+            if !low.contains("potential") && !low.contains("potencial") {
+                f.title = format!("{} (potential — impact not demonstrated)", f.title);
+            }
+            notes.push(format!(
+                "severity calibrated: \"{}\" {old} → Medium (impact not demonstrated)",
+                short(&f.title)
+            ));
+        }
+    }
+    notes
+}
+
+/// "Exposed → exploited": exposures on a host with no real exploit get flagged.
+pub fn depth_audit(findings: &[Finding]) -> Vec<String> {
+    let exploited: std::collections::HashSet<String> = findings
+        .iter()
+        .filter(|f| !is_exposure(f) && sev_rank(&f.severity) >= 2)
+        .map(|f| host_of(&f.endpoint))
+        .collect();
+    let mut notes = Vec::new();
+    for f in findings.iter().filter(|f| is_exposure(f)) {
+        if !exploited.contains(&host_of(&f.endpoint)) {
+            notes.push(format!(
+                "depth gap: \"{}\" exposed but not exploited — USE it (call the endpoint / decode the artifact / log in / hit the dev host) to prove impact, or down-rate to a lead",
+                short(&f.title)
+            ));
+        }
+    }
+    notes.truncate(8);
+    notes
+}
+
+/// Advise consolidating hygiene classes that repeat across multiple assets.
+pub fn hygiene_summary(findings: &[Finding]) -> Vec<String> {
+    use std::collections::{BTreeMap, BTreeSet};
+    let mut groups: BTreeMap<&'static str, BTreeSet<String>> = BTreeMap::new();
+    for f in findings.iter().filter(|f| is_exposure(f)) {
+        groups.entry(class_of(f)).or_default().insert(host_of(&f.endpoint));
+    }
+    let mut notes = Vec::new();
+    for (class, hosts) in groups {
+        if hosts.len() > 1 {
+            notes.push(format!(
+                "hygiene: '{class}' affects {} assets — consolidate into ONE finding with an affected-asset table (don't inflate the count one-per-host)",
+                hosts.len()
+            ));
+        }
+    }
+    notes
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    fn f(title: &str, sev: &str, cwe: &str, ep: &str, ev: &str, payload: &str) -> Finding {
+        let mut x = Finding::default();
+        x.title = title.into(); x.severity = sev.into(); x.cwe = cwe.into();
+        x.endpoint = ep.into(); x.evidence = ev.into(); x.payload = payload.into();
+        x
+    }
+
+    #[test]
+    fn unproven_high_is_capped() {
+        let mut v = vec![f("Flooding DoS", "High", "CWE-770", "https://a/x", "could overload", "")];
+        let notes = calibrate(&mut v);
+        assert_eq!(v[0].severity, "Medium");
+        assert_eq!(notes.len(), 1);
+    }
+
+    #[test]
+    fn proven_high_is_kept() {
+        let mut v = vec![f("SQLi", "High", "CWE-89", "https://a/x",
+            "id=1' UNION SELECT version()-- returned 8.0.32 in the response body, proving injection", "1' OR '1'='1")];
+        calibrate(&mut v);
+        assert_eq!(v[0].severity, "High");
+    }
+
+    #[test]
+    fn exposure_without_exploit_flagged() {
+        let v = vec![f("Information Disclosure - .git exposed", "Low", "CWE-527", "https://a/.git", "leaked", "")];
+        assert_eq!(depth_audit(&v).len(), 1);
+    }
+
+    #[test]
+    fn exposure_with_exploit_on_same_host_not_flagged() {
+        let v = vec![
+            f("Information Disclosure - banner", "Low", "CWE-200", "https://a/x", "Server: IIS", ""),
+            f("SQL Injection", "High", "CWE-89", "https://a/login", "dumped users", "1'--"),
+        ];
+        assert!(depth_audit(&v).is_empty());
+    }
+}
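
Review note on `is_exposure`: the CWE test is a substring check, so a `cwe` value of `CWE-916` (password hash with insufficient computational effort) also matches the `"16"` entry. A possible stricter variant is sketched below, assuming the `cwe` field always carries a `CWE-<number>` token. The names `cwe_id`, `EXPOSURE_CWES` and `is_exposure_cwe` are hypothetical and not part of this commit.

```rust
/// CWE ids treated as exposure/disclosure classes (same numbers as in `is_exposure`).
const EXPOSURE_CWES: &[u32] = &[200, 527, 538, 942, 497, 209, 548, 16];

/// Extract the numeric id from strings such as "CWE-200" or "cwe-16: configuration".
/// Returns None when there is no "CWE-" prefix followed by digits.
fn cwe_id(cwe: &str) -> Option<u32> {
    let lower = cwe.to_lowercase();
    let rest = lower.split("cwe-").nth(1)?;
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Exact-id membership test instead of substring matching.
fn is_exposure_cwe(cwe: &str) -> bool {
    cwe_id(cwe).map_or(false, |id| EXPOSURE_CWES.contains(&id))
}
```

The trade-off is that a bare `"200"` with no `CWE-` prefix no longer matches, which the substring version would accept. Whether that matters depends on how agents actually fill the field.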
@@ -1,4 +1,4 @@
-//! NeuroSploit v3.5.1 harness — a robust multi-model runtime for the
+//! NeuroSploit v3.5.2 harness — a robust multi-model runtime for the
 //! markdown-driven autonomous pentest engine.
 //!
 //! The harness loads the `agents_md/` library, drives a *pool* of LLM models
@@ -11,6 +11,7 @@ pub mod attack_graph;
 pub mod belief;
 pub mod creds;
 pub mod grounding;
+pub mod hygiene;
 pub mod pomdp;
 pub mod models;
 pub mod pipeline;
@@ -69,6 +69,16 @@ const REACT_DOCTRINE: &str = "METHOD (ReAct): work in explicit Thought → Actio
 Each Action runs ONE concrete tool command (e.g. a curl request); read its real Observation before the next Thought. \
 Base every claim on an actual observed response — never assume. Stop when you've either proven an issue or exhausted reasonable checks. Be token-efficient: no filler, no repetition.\n\n";

+/// DEPTH doctrine (v3.5.2): push past detection to demonstrated impact, and
+/// chain. Distilled from reviewing real AI-pentest output that kept stopping at
+/// "exposed" instead of "exploited".
+const DEPTH_DOCTRINE: &str = "DEPTH (exploit, don't just expose):\n\
+- Exposed → exploited: any info-disclosure, exposed service/catalog/WSDL, leaked credential/token, or non-prod (dev/staging) host you find MUST be USED before you report it — call the exposed endpoint, decode the leaked artifact, log in with the leaked credential, hit the dev host. If you only observed it but never used it, report it as a LEAD (low confidence), not a confirmed finding.\n\
+- Chain across steps: reuse any session/JWT/cookie/credential you obtain in one step against every other module; if one bug yields access, pivot it into IDOR/privesc/data-exfil and report the CHAIN, not isolated parts.\n\
+- Decode & fingerprint → CVE: decode opaque tokens/paths (base64/JSON/marshal) and fingerprint the stack (server, framework, library/gem/plugin versions); map exact versions to known CVEs and attempt a safe, non-destructive PoC.\n\
+- Audit tokens: for any JWT, check alg-confusion (RS→HS), alg:none, kid/jku injection, whether the signature is actually verified, and weak/guessable HS256 secrets.\n\
+- Calibrate honestly: claim High/Critical ONLY when impact is DEMONSTRATED; unproven DoS/abuse is Low/Info or a lead, never inflated.\n\n";
+
 /// Black-box web engagement: recon → parallel exploit → N-model vote → report.
 pub async fn run(cfg: RunConfig, lib: &Library, pool: &ModelPool, tx: Sender<String>) -> RunOutput {
    pool.set_progress(tx.clone());
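
Review note on the "Audit tokens" bullet of `DEPTH_DOCTRINE`: the offline half of an `alg:none` check is just decoding the JWT header and reading its `alg` value. The block below is a rough, std-only sketch of that step. `b64url_decode`, `alg_of` and `declares_alg_none` are hypothetical helpers, and the JSON handling is deliberately naive (a real audit would use a JSON parser). It only reports what the token declares; whether the server accepts an unsigned token still has to be proven with a live request.

```rust
/// Decode base64url (padded or unpadded) into bytes. Returns None on invalid characters.
fn b64url_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf: u32 = 0;
    let mut bits: u8 = 0;
    for c in input.trim_end_matches('=').bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'-' => 62,
            b'_' => 63,
            _ => return None,
        };
        buf = (buf << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the bits not yet emitted
        }
    }
    Some(out)
}

/// Naively pull the string value of the "alg" key out of a JSON header.
fn alg_of(header_json: &str) -> Option<String> {
    let after_key = header_json.split("\"alg\"").nth(1)?;
    let after_colon = after_key.trim_start().strip_prefix(':')?;
    let value = after_colon.trim_start().strip_prefix('"')?;
    let end = value.find('"')?;
    Some(value[..end].to_string())
}

/// Some(true) if the header declares alg "none" (any case), Some(false) for any
/// other algorithm, None if the header cannot be decoded.
fn declares_alg_none(jwt: &str) -> Option<bool> {
    let header_b64 = jwt.split('.').next()?;
    let json = String::from_utf8(b64url_decode(header_b64)?).ok()?;
    Some(alg_of(&json)?.eq_ignore_ascii_case("none"))
}
```

For example, a token whose first segment is `eyJhbGciOiJub25lIiwidHlwIjoiSldUIn0` decodes to `{"alg":"none","typ":"JWT"}` and yields `Some(true)`, while the familiar `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9` header yields `Some(false)`.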
@@ -168,12 +178,13 @@ pub async fn run(cfg: RunConfig, lib: &Library, pool: &ModelPool, tx: Sender<Str
                let user = format!(
                    "AUTHORIZED engagement — you have explicit permission to test {target}. \
                     Do not ask for confirmation — proceed and PROVE each issue.\n\n\
-                     {directives}{react}{doctrine}{body}\n\nWhen done, reply with ONLY a JSON array of confirmed findings (may be empty []). \
+                     {directives}{react}{depth}{doctrine}{body}\n\nWhen done, reply with ONLY a JSON array of confirmed findings (may be empty []). \
                     Each item: {{id,title,severity,cwe,endpoint,payload,evidence,impact,remediation,confidence}}. \
                     `evidence` must contain the concrete proof (request/response excerpt).",
                    target = target,
                    directives = directives,
                    react = REACT_DOCTRINE,
+                    depth = DEPTH_DOCTRINE,
                    doctrine = tool_doctrine(mcp_on),
                    body = ag.user.replace("{target}", &target).replace("{recon_json}", &recon),
                );
@@ -387,11 +398,11 @@ pub async fn run_greybox(cfg: RunConfig, lib: &Library, pool: &ModelPool, tx: Se
                }
                let user = format!(
                    "AUTHORIZED greybox engagement on {target} — you also have the source review below. \
-                     Proceed and PROVE each issue against the LIVE app.\n\n{directives}{leads}{react}{doctrine}{body}\n\n\
+                     Proceed and PROVE each issue against the LIVE app.\n\n{directives}{leads}{react}{depth}{doctrine}{body}\n\n\
                     Reply ONLY a JSON array of confirmed findings (may be []): \
                     {{id,title,severity,cwe,endpoint,payload,evidence,impact,remediation,confidence}}.",
                    target = target, directives = directives, leads = leads,
-                    react = REACT_DOCTRINE, doctrine = tool_doctrine(mcp_on),
+                    react = REACT_DOCTRINE, depth = DEPTH_DOCTRINE, doctrine = tool_doctrine(mcp_on),
                    body = ag.user.replace("{target}", &target).replace("{recon_json}", &recon),
                );
                match pool.complete_routed(Task::Exploit, &ag.name, &ag.system, &user).await {
@@ -439,12 +450,12 @@ async fn chain_round(pool: &ModelPool, target: &str, recon: &str, directives: &s
    let _ = tx.send(format!("chaining {} confirmed finding(s) for deeper impact…", confirmed.len())).await;
    let recon_ctx: String = recon.chars().take(2500).collect();
    let user = format!(
-        "AUTHORIZED engagement on {target}.\n\n{directives}{react}{doctrine}{recipe_block}\
+        "AUTHORIZED engagement on {target}.\n\n{directives}{react}{depth}{doctrine}{recipe_block}\
         CONFIRMED FINDINGS TO CHAIN:\n{summary}\n\nRecon:\n{recon_ctx}\n\n\
         Chain these into deeper impact (e.g. SQLi→RCE→LPE, SSRF→cloud creds, upload→LFI→RCE) and PROVE each stage. \
         Reply ONLY a JSON array of NEW findings \
         (may be []): {{id,title,severity,cwe,endpoint,payload,evidence,impact,remediation,confidence}}.",
-        react = REACT_DOCTRINE, doctrine = tool_doctrine(pool.mcp_config.is_some()),
+        react = REACT_DOCTRINE, depth = DEPTH_DOCTRINE, doctrine = tool_doctrine(pool.mcp_config.is_some()),
    );
    match pool.complete_routed(Task::Exploit, "chain", CHAIN_SYS, &user).await {
        Ok((m, text)) => {
@@ -623,6 +634,20 @@ async fn finish(cfg: RunConfig, _lib: &Library, recon: String, transcript: Strin
        let _ = tx.send(format!("grounding gate: demoted {demoted}/{before} ungrounded claim(s) (no tool receipt)")).await;
    }

+    // --- v3.5.2 report-hygiene & exploitation-depth pass ---
+    // Calibrate inflated/unproven High-Critical to Medium, flag exposures that
+    // were never exploited ("exposed → exploited"), and advise consolidating
+    // hygiene findings duplicated across many assets.
+    for n in crate::hygiene::calibrate(&mut findings) {
+        let _ = tx.send(format!("calibrate: {n}")).await;
+    }
+    for n in crate::hygiene::depth_audit(&findings) {
+        let _ = tx.send(format!("notify: {n}")).await;
+    }
+    for n in crate::hygiene::hygiene_summary(&findings) {
+        let _ = tx.send(format!("notify: {n}")).await;
+    }
+
    // --- POMDP belief: build from grounded findings, report residual uncertainty ---
    let mut wm = crate::belief::WorldModel::new();
    wm.deterministic = whitebox;
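
Review note on the hygiene pass above: `depth_audit` and `hygiene_summary` group findings by a host key derived from each endpoint string, so it helps to know what counts as "the same asset". The block below is a minimal sketch of such a key. `host_key` and `same_asset` are hypothetical names, and stripping the scheme with `split_once` is an assumption made here so that a URL embedded in a query string does not displace the real host.

```rust
/// Hypothetical host key: the lowercase authority (host plus any port) of an endpoint.
fn host_key(endpoint: &str) -> String {
    let s = endpoint.trim();
    // Strip only the leading scheme, so "://" inside a query string is left alone.
    let s = s.split_once("://").map_or(s, |(_, rest)| rest);
    // Drop the path, then any query string attached directly to the host.
    let s = s.split('/').next().unwrap_or(s);
    s.split('?').next().unwrap_or(s).to_lowercase()
}

/// True when two endpoints would be grouped as the same asset.
fn same_asset(a: &str, b: &str) -> bool {
    host_key(a) == host_key(b)
}
```

Two consequences follow from this kind of key. The port is kept, so `https://a.example/` and `https://a.example:8443/` are different assets, and an exploit on one does not clear a depth gap on the other. An endpoint with no scheme but an embedded URL in its query would still be mis-split, so this is not a substitute for a real URL parser.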
@@ -1,4 +1,4 @@
-//! POMDP decision layer (v3.5.1): value-of-information planning + the
+//! POMDP decision layer (v3.5.2): value-of-information planning + the
 //! anti-hallucination gate.
 //!
 //! The choice "scan more vs exploit now" is **not** a heuristic here — it falls
@@ -97,9 +97,9 @@ pub fn html(target: &str, findings: &[Finding]) -> String {
         h4{{margin:12px 0 3px;font-size:12px;text-transform:uppercase;letter-spacing:.5px;color:#8b5cf6}}\
         .b{{color:#8b5cf6;font-weight:800}}</style></head><body>\
         <h1><span class=b>NeuroSploit</span> Penetration Test Report</h1>\
-         <div class=meta>Target: <b>{t}</b> · v3.5.1 Rust harness · multi-model validated</div>\
+         <div class=meta>Target: <b>{t}</b> · v3.5.2 Rust harness · multi-model validated</div>\
         <div>{chips}</div>{graph_block}<h2>Findings ({n})</h2>{body}\
-         <p class=meta>Authorized testing only. Findings confirmed by multi-model adversarial voting.<br>NeuroSploit v3.5.1 · by <b>Joas A Santos</b> &amp; <b>Red Team Leaders</b></p></body></html>",
+         <p class=meta>Authorized testing only. Findings confirmed by multi-model adversarial voting.<br>NeuroSploit v3.5.2 · by <b>Joas A Santos</b> &amp; <b>Red Team Leaders</b></p></body></html>",
        t = esc(target), chips = chips, n = sorted.len(), body = body, graph_block = graph_block,
    )
 }
@@ -135,7 +135,7 @@ pub fn typst_report(target: &str, findings: &[Finding], dir: &Path) -> std::io::
    let mut data = String::new();
    data.push_str(&format!(
        "#let meta = (target: {}, run_id: {}, generated: {}, model: {})\n",
-        tq(target), tq(&run_id), tq("NeuroSploit v3.5.1"), tq("multi-model")
+        tq(target), tq(&run_id), tq("NeuroSploit v3.5.2"), tq("multi-model")
    ));
    data.push_str("#let findings = (\n");
    for f in sorted_findings(findings) {