v3.5.2 — Exploitation Depth & Report Hygiene

Distilled from reviewing real AI-pentest output that kept stopping at "exposed"
instead of "exploited". Purely additive and backward-compatible.

Behavior (injected into black/grey/chain exploit prompts via DEPTH_DOCTRINE):
- Exposed → exploited: any info disclosure, exposed service/catalog/WSDL, leaked
  credential/token, or reachable dev/staging host MUST be used before it counts
  as a finding; otherwise it is reported as a low-confidence lead, not a
  confirmed High/Critical.
- Chain across modules: reuse obtained session/JWT/cookie/credential and pivot
  to IDOR/privesc/exfil; report the chain, not isolated parts.
- Decode & fingerprint → CVE; audit tokens (alg-confusion/none/kid/JWKS, weak
  HS256 secret cracking, lifecycle).
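The token-audit bullet can be illustrated with a small, dependency-free Rust sketch. It is not code from this commit: `b64url_decode` and `audit_jwt_header` are hypothetical helpers that decode the header segment of a compact JWT (base64url, three dot-separated segments) and flag the header properties the doctrine asks the agent to probe. It matches substrings on a whitespace-stripped, lowercased header instead of parsing JSON, so treat it as a triage aid, not a verifier.

```rust
// Hypothetical triage helpers; not part of the NeuroSploit codebase.

/// Decode base64url (padding optional), as used by JWT segments.
/// Returns None on any character outside the base64url alphabet.
fn b64url_decode(s: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf: u32 = 0; // pending bits; fewer than 8 remain between iterations
    let mut bits: u32 = 0;
    for c in s.bytes() {
        let v: u32 = match c {
            b'A'..=b'Z' => (c - b'A') as u32,
            b'a'..=b'z' => (c - b'a' + 26) as u32,
            b'0'..=b'9' => (c - b'0' + 52) as u32,
            b'-' => 62,
            b'_' => 63,
            b'=' => continue, // tolerate padding
            _ => return None,
        };
        buf = (buf << 6) | v;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the not-yet-emitted bits
        }
    }
    Some(out)
}

/// Flag JWT header properties worth probing (alg:none, HMAC algs, kid/jku/x5u)
/// and an empty signature segment. Heuristic substring checks only.
fn audit_jwt_header(token: &str) -> Vec<&'static str> {
    let mut flags = Vec::new();
    let parts: Vec<&str> = token.trim().split('.').collect();
    if parts.len() != 3 {
        flags.push("not a compact JWS (expected 3 segments)");
        return flags;
    }
    let header = match b64url_decode(parts[0]).and_then(|b| String::from_utf8(b).ok()) {
        Some(h) => h,
        None => {
            flags.push("header is not valid base64url/UTF-8");
            return flags;
        }
    };
    // Strip whitespace and lowercase so `"alg" : "None"` still matches.
    let h: String = header
        .chars()
        .filter(|c| !c.is_whitespace())
        .collect::<String>()
        .to_lowercase();
    if h.contains("\"alg\":\"none\"") {
        flags.push("alg:none: check whether the server accepts unsigned tokens");
    }
    if h.contains("\"alg\":\"hs") {
        flags.push("HMAC alg: try weak-secret guessing and RS->HS key confusion");
    }
    if h.contains("\"kid\"") {
        flags.push("kid present: test kid injection");
    }
    if h.contains("\"jku\"") || h.contains("\"x5u\"") {
        flags.push("jku/x5u present: test key-URL injection");
    }
    if parts[2].is_empty() {
        flags.push("empty signature segment");
    }
    flags
}

fn main() {
    // Header {"alg":"none"}, payload {}, no signature.
    for f in audit_jwt_header("eyJhbGciOiJub25lIn0.e30.") {
        println!("{f}");
    }
}
```

A flag only says what to try next; per the doctrine, the finding still has to be proven by sending the tampered token and observing the server accept it.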

Deterministic post-pass (new crates/harness/src/hygiene.rs, wired into finish()):
- calibrate: cap unproven High/Critical findings (hedged language or thin
  evidence, and no payload) to Medium, lower their confidence to at most 0.5,
  and re-title them "(potential)";
- depth_audit: flag exposures on a host that has no non-exposure finding of
  Medium or higher;
- hygiene_summary: advise consolidating hygiene classes repeated across assets.
Unit tests cover calibration + depth audit.
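The consolidation that `hygiene_summary` advises (one finding with an affected-asset table instead of one finding per host) can be sketched as follows. This is a hypothetical helper, not part of the commit: `HygieneHit` and `consolidate` are illustrative names, the hygiene class and host are assumed to be already normalized (in the commit, `class_of` and `host_of` in `hygiene.rs` play that role), and the table is rendered as plain Markdown.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One per-host hygiene observation (hypothetical shape).
struct HygieneHit {
    class: &'static str, // e.g. "missing-security-headers"
    host: String,        // already normalized, lowercase
}

/// Merge hits into one entry per class that spans more than one host, each
/// with a Markdown affected-asset table. Single-host classes are not emitted.
fn consolidate(hits: &[HygieneHit]) -> Vec<String> {
    // BTree* keeps classes and hosts sorted, so output is deterministic.
    let mut groups: BTreeMap<&'static str, BTreeSet<&str>> = BTreeMap::new();
    for h in hits {
        groups.entry(h.class).or_default().insert(h.host.as_str());
    }
    groups
        .into_iter()
        .filter(|(_, hosts)| hosts.len() > 1)
        .map(|(class, hosts)| {
            let mut s = format!(
                "{class} ({} assets)\n\n| # | Affected asset |\n|---|---|\n",
                hosts.len()
            );
            for (i, host) in hosts.iter().enumerate() {
                s.push_str(&format!("| {} | {host} |\n", i + 1));
            }
            s
        })
        .collect()
}

fn main() {
    let hits = vec![
        HygieneHit { class: "missing-security-headers", host: "a.example".into() },
        HygieneHit { class: "missing-security-headers", host: "b.example".into() },
        HygieneHit { class: "missing-security-headers", host: "a.example".into() }, // same host again
        HygieneHit { class: "weak-tls", host: "c.example".into() }, // single host: not merged
    ];
    for finding in consolidate(&hits) {
        println!("{finding}");
    }
}
```

Using sets per class mirrors the commit's choice of counting distinct hosts, so a class reported twice on the same asset does not inflate the asset count.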

5 new doctrine meta-agents (scripts/build_methodology_v352.py → agents_md/meta/):
exploit_depth_doctrine, finding_chainer, artifact_decoder, token_auditor,
report_calibrator (meta 17→22, total 343→348).

Version bumped 3.5.1 → 3.5.2 across crates/app/installers/docs; RELEASE/README
updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CyberSecurityUP
2026-06-26 11:31:11 -03:00
parent ac84db024c
commit e4efa9bbb0
23 changed files with 628 additions and 28 deletions
+2 -2
@@ -871,7 +871,7 @@ dependencies = [
[[package]]
name = "neurosploit"
version = "3.5.1"
version = "3.5.2"
dependencies = [
"anyhow",
"clap",
@@ -888,7 +888,7 @@ dependencies = [
[[package]]
name = "neurosploit-harness"
version = "3.5.1"
version = "3.5.2"
dependencies = [
"anyhow",
"futures",
+1 -1
@@ -3,7 +3,7 @@ members = ["crates/harness", "app"]
resolver = "2"
[workspace.package]
version = "3.5.1"
version = "3.5.2"
edition = "2021"
license = "MIT"
repository = "https://github.com/JoasASantos/NeuroSploit"
+4 -4
@@ -1,4 +1,4 @@
//! NeuroSploit v3.5.1 — interactive harness + CLI (`run` / `whitebox` / `agents` / `models`).
//! NeuroSploit v3.5.2 — interactive harness + CLI (`run` / `whitebox` / `agents` / `models`).
mod repl;
mod tui;
@@ -11,8 +11,8 @@ use std::path::{Path, PathBuf};
#[command(
name = "neurosploit",
version,
about = "NeuroSploit v3.5.1 — multi-model autonomous pentest harness",
long_about = "NeuroSploit v3.5.1 — a Rust multi-model harness that drives a pool of LLMs \
about = "NeuroSploit v3.5.2 — multi-model autonomous pentest harness",
long_about = "NeuroSploit v3.5.2 — a Rust multi-model harness that drives a pool of LLMs \
(API key or local subscription: Claude/Codex/Gemini/Grok) to autonomously test a target. \
After recon it INTELLIGENTLY selects only the agents matching the discovered surface, runs \
them in parallel, then validates every finding by cross-model voting before reporting.\n\n\
@@ -379,7 +379,7 @@ pub(crate) fn spawn_engagement(base: &Path, mut cfg: RunConfig, mcp: bool, mode:
cfg.rl_path = Some(base.join("data").join("rl_state_rs.json").display().to_string());
write_status(&workdir, "running", &format!("\"target\":{:?}", cfg.target));
println!(" ┌─ NeuroSploit v3.5.1 · by Joas A Santos & Red Team Leaders");
println!(" ┌─ NeuroSploit v3.5.2 · by Joas A Santos & Red Team Leaders");
println!(" │ run id : {run_id}");
println!(" │ target : {}", cfg.target);
println!(" │ models : {}", cfg.models.join(", "));
+2 -2
@@ -1,4 +1,4 @@
//! NeuroSploit v3.5.1 — interactive session (Claude-Code / Codex / Cursor-CLI style).
//! NeuroSploit v3.5.2 — interactive session (Claude-Code / Codex / Cursor-CLI style).
//!
//! Launched when `neurosploit` runs with no subcommand. A persistent REPL with
//! real line editing (arrow-key history recall, Ctrl-A/E/K, paste), model
@@ -299,7 +299,7 @@ pub async fn repl(base: &Path) -> anyhow::Result<()> {
let backends = harness::installed_cli_backends();
println!("\x1b[1m");
println!(" ███╗ ██╗███████╗██╗ ██╗██████╗ ██████╗");
println!(" ████╗ ██║██╔════╝██║ ██║██╔══██╗██╔═══██╗ NeuroSploit v3.5.1");
println!(" ████╗ ██║██╔════╝██║ ██║██╔══██╗██╔═══██╗ NeuroSploit v3.5.2");
println!(" ██╔██╗ ██║█████╗ ██║ ██║██████╔╝██║ ██║ interactive harness");
println!(" ██║╚██╗██║██╔══╝ ██║ ██║██╔══██╗██║ ██║ by Joas A Santos");
println!(" ██║ ╚████║███████╗╚██████╔╝██║ ██║╚██████╔╝ & Red Team Leaders");
+1 -1
@@ -1,4 +1,4 @@
//! NeuroSploit v3.5.1 — TUI "Mission Control" mode.
//! NeuroSploit v3.5.2 — TUI "Mission Control" mode.
//!
//! Concurrent panels that update live while the engagement runs in the
//! background, with a composer input that stays active during execution:
+1 -1
@@ -1,4 +1,4 @@
//! POMDP belief-state world model (v3.5.1).
//! POMDP belief-state world model (v3.5.2).
//!
//! The target is only partially observable, so we don't track booleans — we
//! track a **belief**: a property graph whose nodes (host / service / vuln /
@@ -1,4 +1,4 @@
//! Verification / grounding engine (v3.5.1).
//! Verification / grounding engine (v3.5.2).
//!
//! Hard rule: **no claim enters the world model without a tool receipt** — raw
//! tool output, not the LLM's paraphrase. This is the empirical anti-hallucination
@@ -0,0 +1,186 @@
//! Report-hygiene & exploitation-depth pass (v3.5.2).
//!
//! Encodes the post-engagement discipline learned from reviewing real
//! AI-pentest output, applied deterministically after validation:
//! 1. **Calibrate severity to PROVEN impact** — an unproven High/Critical
//! (hedged language, no payload, thin evidence) is capped to Medium and
//! re-titled "(potential)". No inflated severities.
//! 2. **Exposed → exploited** — flag info-disclosure / exposed-service /
//! leaked-credential findings on a host that has no actual exploit, so the
//! operator knows to *use* what was exposed (or down-rate it to a lead).
//! 3. **Consolidate hygiene** — when the same hygiene class (missing headers,
//! clickjacking, cookie flags, TLS, info-disclosure…) repeats across many
//! assets, advise merging into ONE finding with an affected-asset table,
//! instead of inflating the count one-per-host.
//!
//! All functions are pure/deterministic; only `calibrate` mutates findings
//! (severity/title/confidence). The rest return advisory strings streamed to
//! the operator and recorded with the run.
use crate::types::Finding;
fn host_of(endpoint: &str) -> String {
let s = endpoint.trim();
let s = s.split("://").last().unwrap_or(s);
let s = s.split('/').next().unwrap_or(s);
s.split('?').next().unwrap_or(s).to_lowercase()
}
fn sev_rank(s: &str) -> u8 {
match s.to_lowercase().as_str() {
x if x.starts_with("crit") => 4,
x if x.starts_with("high") => 3,
x if x.starts_with("med") => 2,
x if x.starts_with("low") => 1,
_ => 0,
}
}
fn short(s: &str) -> String {
s.chars().take(64).collect()
}
/// Hedging words that signal an impact was described but not demonstrated
/// (English + Portuguese, since engagements are bilingual).
const WEASEL: &[&str] = &[
"could ", "may ", "might ", "potential", "possible", "possibly", "teóric", "theoret",
"poderia", "possív", "potencial", "if the ", "caso o", "caso a", "would allow", "permitiria",
];
/// A finding that *exposes* something (recon/disclosure) rather than being an
/// exploit with demonstrated impact.
fn is_exposure(f: &Finding) -> bool {
let cwe = f.cwe.to_lowercase();
let t = f.title.to_lowercase();
["200", "527", "538", "942", "497", "209", "548", "16"].iter().any(|c| cwe.contains(c))
|| [
"disclosure", "exposed", "exposi", "exposure", "catalog", "catálogo", "cors",
"banner", "version", "versão", "header", "cabeçalho", ".git", "enumerat",
"fingerprint", "wsdl", "swagger", "missing security", "outdated", "eol",
]
.iter()
.any(|k| t.contains(k))
}
/// Reads as unproven: hedged or thin evidence AND no concrete payload.
fn looks_unproven(f: &Finding) -> bool {
let blob = format!("{} {} {}", f.title, f.impact, f.evidence).to_lowercase();
let hedged = WEASEL.iter().any(|w| blob.contains(w));
let weak_ev = f.evidence.trim().chars().count() < 40;
let no_payload = f.payload.trim().is_empty();
(hedged || weak_ev) && no_payload
}
/// Normalized hygiene class, for consolidation advice.
fn class_of(f: &Finding) -> &'static str {
let t = f.title.to_lowercase();
if t.contains("header") || t.contains("cabeçalho") { "missing-security-headers" }
else if t.contains("clickjack") || t.contains("frame") { "clickjacking" }
else if t.contains("hsts") || t.contains("strict-transport") { "missing-hsts" }
else if t.contains("cookie") { "cookie-flags" }
else if t.contains("tls") || t.contains("ssl") { "weak-tls" }
else if t.contains("cors") { "cors-misconfig" }
else if t.contains("version") || t.contains("versão") || t.contains("banner") || t.contains("eol") || t.contains("outdated") { "version-disclosure" }
else { "information-disclosure" }
}
/// Cap inflated, unproven High/Critical findings to Medium. Returns advisories.
pub fn calibrate(findings: &mut [Finding]) -> Vec<String> {
let mut notes = Vec::new();
for f in findings.iter_mut() {
if sev_rank(&f.severity) >= 3 && looks_unproven(f) {
let old = f.severity.clone();
f.severity = "Medium".into();
f.confidence = f.confidence.min(0.5);
let low = f.title.to_lowercase();
if !low.contains("potential") && !low.contains("potencial") {
f.title = format!("{} (potential — impact not demonstrated)", f.title);
}
notes.push(format!(
"severity calibrated: \"{}\" {old} → Medium (impact not demonstrated)",
short(&f.title)
));
}
}
notes
}
/// "Exposed → exploited": exposures on a host with no real exploit get flagged.
pub fn depth_audit(findings: &[Finding]) -> Vec<String> {
let exploited: std::collections::HashSet<String> = findings
.iter()
.filter(|f| !is_exposure(f) && sev_rank(&f.severity) >= 2)
.map(|f| host_of(&f.endpoint))
.collect();
let mut notes = Vec::new();
for f in findings.iter().filter(|f| is_exposure(f)) {
if !exploited.contains(&host_of(&f.endpoint)) {
notes.push(format!(
"depth gap: \"{}\" exposed but not exploited — USE it (call the endpoint / decode the artifact / log in / hit the dev host) to prove impact, or down-rate to a lead",
short(&f.title)
));
}
}
notes.truncate(8);
notes
}
/// Advise consolidating hygiene classes that repeat across multiple assets.
pub fn hygiene_summary(findings: &[Finding]) -> Vec<String> {
use std::collections::{BTreeMap, BTreeSet};
let mut groups: BTreeMap<&'static str, BTreeSet<String>> = BTreeMap::new();
for f in findings.iter().filter(|f| is_exposure(f)) {
groups.entry(class_of(f)).or_default().insert(host_of(&f.endpoint));
}
let mut notes = Vec::new();
for (class, hosts) in groups {
if hosts.len() > 1 {
notes.push(format!(
"hygiene: '{class}' affects {} assets — consolidate into ONE finding with an affected-asset table (don't inflate the count one-per-host)",
hosts.len()
));
}
}
notes
}
#[cfg(test)]
mod tests {
use super::*;
fn f(title: &str, sev: &str, cwe: &str, ep: &str, ev: &str, payload: &str) -> Finding {
let mut x = Finding::default();
x.title = title.into(); x.severity = sev.into(); x.cwe = cwe.into();
x.endpoint = ep.into(); x.evidence = ev.into(); x.payload = payload.into();
x
}
#[test]
fn unproven_high_is_capped() {
let mut v = vec![f("Flooding DoS", "High", "CWE-770", "https://a/x", "could overload", "")];
let notes = calibrate(&mut v);
assert_eq!(v[0].severity, "Medium");
assert_eq!(notes.len(), 1);
}
#[test]
fn proven_high_is_kept() {
let mut v = vec![f("SQLi", "High", "CWE-89", "https://a/x",
"id=1' UNION SELECT version()-- returned 8.0.32 in the response body, proving injection", "1' OR '1'='1")];
calibrate(&mut v);
assert_eq!(v[0].severity, "High");
}
#[test]
fn exposure_without_exploit_flagged() {
let v = vec![f("Information Disclosure - .git exposed", "Low", "CWE-527", "https://a/.git", "leaked", "")];
assert_eq!(depth_audit(&v).len(), 1);
}
#[test]
fn exposure_with_exploit_on_same_host_not_flagged() {
let v = vec![
f("Information Disclosure - banner", "Low", "CWE-200", "https://a/x", "Server: IIS", ""),
f("SQL Injection", "High", "CWE-89", "https://a/login", "dumped users", "1'--"),
];
assert!(depth_audit(&v).is_empty());
}
}
+2 -1
@@ -1,4 +1,4 @@
//! NeuroSploit v3.5.1 harness — a robust multi-model runtime for the
//! NeuroSploit v3.5.2 harness — a robust multi-model runtime for the
//! markdown-driven autonomous pentest engine.
//!
//! The harness loads the `agents_md/` library, drives a *pool* of LLM models
@@ -11,6 +11,7 @@ pub mod attack_graph;
pub mod belief;
pub mod creds;
pub mod grounding;
pub mod hygiene;
pub mod pomdp;
pub mod models;
pub mod pipeline;
+30 -5
@@ -69,6 +69,16 @@ const REACT_DOCTRINE: &str = "METHOD (ReAct): work in explicit Thought → Actio
Each Action runs ONE concrete tool command (e.g. a curl request); read its real Observation before the next Thought. \
Base every claim on an actual observed response — never assume. Stop when you've either proven an issue or exhausted reasonable checks. Be token-efficient: no filler, no repetition.\n\n";
/// DEPTH doctrine (v3.5.2): push past detection to demonstrated impact, and
/// chain. Distilled from reviewing real AI-pentest output that kept stopping at
/// "exposed" instead of "exploited".
const DEPTH_DOCTRINE: &str = "DEPTH (exploit, don't just expose):\n\
- Exposed → exploited: any info-disclosure, exposed service/catalog/WSDL, leaked credential/token, or non-prod (dev/staging) host you find MUST be USED before you report it — call the exposed endpoint, decode the leaked artifact, log in with the leaked credential, hit the dev host. If you only observed it but never used it, report it as a LEAD (low confidence), not a confirmed finding.\n\
- Chain across steps: reuse any session/JWT/cookie/credential you obtain in one step against every other module; if one bug yields access, pivot it into IDOR/privesc/data-exfil and report the CHAIN, not isolated parts.\n\
- Decode & fingerprint → CVE: decode opaque tokens/paths (base64/JSON/marshal) and fingerprint the stack (server, framework, library/gem/plugin versions); map exact versions to known CVEs and attempt a safe, non-destructive PoC.\n\
- Audit tokens: for any JWT, check alg-confusion (RS→HS), alg:none, kid/jku injection, whether the signature is actually verified, and weak/guessable HS256 secrets.\n\
- Calibrate honestly: claim High/Critical ONLY when impact is DEMONSTRATED; unproven DoS/abuse is Low/Info or a lead, never inflated.\n\n";
/// Black-box web engagement: recon → parallel exploit → N-model vote → report.
pub async fn run(cfg: RunConfig, lib: &Library, pool: &ModelPool, tx: Sender<String>) -> RunOutput {
pool.set_progress(tx.clone());
@@ -168,12 +178,13 @@ pub async fn run(cfg: RunConfig, lib: &Library, pool: &ModelPool, tx: Sender<Str
let user = format!(
"AUTHORIZED engagement — you have explicit permission to test {target}. \
Do not ask for confirmation — proceed and PROVE each issue.\n\n\
{directives}{react}{doctrine}{body}\n\nWhen done, reply with ONLY a JSON array of confirmed findings (may be empty []). \
{directives}{react}{depth}{doctrine}{body}\n\nWhen done, reply with ONLY a JSON array of confirmed findings (may be empty []). \
Each item: {{id,title,severity,cwe,endpoint,payload,evidence,impact,remediation,confidence}}. \
`evidence` must contain the concrete proof (request/response excerpt).",
target = target,
directives = directives,
react = REACT_DOCTRINE,
depth = DEPTH_DOCTRINE,
doctrine = tool_doctrine(mcp_on),
body = ag.user.replace("{target}", &target).replace("{recon_json}", &recon),
);
@@ -387,11 +398,11 @@ pub async fn run_greybox(cfg: RunConfig, lib: &Library, pool: &ModelPool, tx: Se
}
let user = format!(
"AUTHORIZED greybox engagement on {target} — you also have the source review below. \
Proceed and PROVE each issue against the LIVE app.\n\n{directives}{leads}{react}{doctrine}{body}\n\n\
Proceed and PROVE each issue against the LIVE app.\n\n{directives}{leads}{react}{depth}{doctrine}{body}\n\n\
Reply ONLY a JSON array of confirmed findings (may be []): \
{{id,title,severity,cwe,endpoint,payload,evidence,impact,remediation,confidence}}.",
target = target, directives = directives, leads = leads,
react = REACT_DOCTRINE, doctrine = tool_doctrine(mcp_on),
react = REACT_DOCTRINE, depth = DEPTH_DOCTRINE, doctrine = tool_doctrine(mcp_on),
body = ag.user.replace("{target}", &target).replace("{recon_json}", &recon),
);
match pool.complete_routed(Task::Exploit, &ag.name, &ag.system, &user).await {
@@ -439,12 +450,12 @@ async fn chain_round(pool: &ModelPool, target: &str, recon: &str, directives: &s
let _ = tx.send(format!("chaining {} confirmed finding(s) for deeper impact…", confirmed.len())).await;
let recon_ctx: String = recon.chars().take(2500).collect();
let user = format!(
"AUTHORIZED engagement on {target}.\n\n{directives}{react}{doctrine}{recipe_block}\
"AUTHORIZED engagement on {target}.\n\n{directives}{react}{depth}{doctrine}{recipe_block}\
CONFIRMED FINDINGS TO CHAIN:\n{summary}\n\nRecon:\n{recon_ctx}\n\n\
Chain these into deeper impact (e.g. SQLi→RCE→LPE, SSRF→cloud creds, upload→LFI→RCE) and PROVE each stage. \
Reply ONLY a JSON array of NEW findings \
(may be []): {{id,title,severity,cwe,endpoint,payload,evidence,impact,remediation,confidence}}.",
react = REACT_DOCTRINE, doctrine = tool_doctrine(pool.mcp_config.is_some()),
react = REACT_DOCTRINE, depth = DEPTH_DOCTRINE, doctrine = tool_doctrine(pool.mcp_config.is_some()),
);
match pool.complete_routed(Task::Exploit, "chain", CHAIN_SYS, &user).await {
Ok((m, text)) => {
@@ -623,6 +634,20 @@ async fn finish(cfg: RunConfig, _lib: &Library, recon: String, transcript: Strin
let _ = tx.send(format!("grounding gate: demoted {demoted}/{before} ungrounded claim(s) (no tool receipt)")).await;
}
// --- v3.5.2 report-hygiene & exploitation-depth pass ---
// Calibrate inflated/unproven High-Critical to Medium, flag exposures that
// were never exploited ("exposed → exploited"), and advise consolidating
// hygiene findings duplicated across many assets.
for n in crate::hygiene::calibrate(&mut findings) {
let _ = tx.send(format!("calibrate: {n}")).await;
}
for n in crate::hygiene::depth_audit(&findings) {
let _ = tx.send(format!("notify: {n}")).await;
}
for n in crate::hygiene::hygiene_summary(&findings) {
let _ = tx.send(format!("notify: {n}")).await;
}
// --- POMDP belief: build from grounded findings, report residual uncertainty ---
let mut wm = crate::belief::WorldModel::new();
wm.deterministic = whitebox;
+1 -1
@@ -1,4 +1,4 @@
//! POMDP decision layer (v3.5.1): value-of-information planning + the
//! POMDP decision layer (v3.5.2): value-of-information planning + the
//! anti-hallucination gate.
//!
//! The choice "scan more vs exploit now" is **not** a heuristic here — it falls
+3 -3
@@ -97,9 +97,9 @@ pub fn html(target: &str, findings: &[Finding]) -> String {
h4{{margin:12px 0 3px;font-size:12px;text-transform:uppercase;letter-spacing:.5px;color:#8b5cf6}}\
.b{{color:#8b5cf6;font-weight:800}}</style></head><body>\
<h1><span class=b>NeuroSploit</span> Penetration Test Report</h1>\
<div class=meta>Target: <b>{t}</b> · v3.5.1 Rust harness · multi-model validated</div>\
<div class=meta>Target: <b>{t}</b> · v3.5.2 Rust harness · multi-model validated</div>\
<div>{chips}</div>{graph_block}<h2>Findings ({n})</h2>{body}\
<p class=meta>Authorized testing only. Findings confirmed by multi-model adversarial voting.<br>NeuroSploit v3.5.1 · by <b>Joas A Santos</b> &amp; <b>Red Team Leaders</b></p></body></html>",
<p class=meta>Authorized testing only. Findings confirmed by multi-model adversarial voting.<br>NeuroSploit v3.5.2 · by <b>Joas A Santos</b> &amp; <b>Red Team Leaders</b></p></body></html>",
t = esc(target), chips = chips, n = sorted.len(), body = body, graph_block = graph_block,
)
}
@@ -135,7 +135,7 @@ pub fn typst_report(target: &str, findings: &[Finding], dir: &Path) -> std::io::
let mut data = String::new();
data.push_str(&format!(
"#let meta = (target: {}, run_id: {}, generated: {}, model: {})\n",
tq(target), tq(&run_id), tq("NeuroSploit v3.5.1"), tq("multi-model")
tq(target), tq(&run_id), tq("NeuroSploit v3.5.2"), tq("multi-model")
));
data.push_str("#let findings = (\n");
for f in sorted_findings(findings) {