v3.5.2 — Exploitation Depth & Report Hygiene

Distilled from reviewing real AI-pentest output that kept stopping at "exposed"
instead of "exploited". Pure-additive, back-compatible.

Behavior (injected into black/grey/chain exploit prompts via DEPTH_DOCTRINE):
- Exposed → exploited: any info-disclosure / exposed service/WSDL / leaked
  credential|token / reachable dev host MUST be used before it's a finding;
  otherwise it's a lead, not a confirmed High/Critical.
- Chain across modules: reuse obtained session/JWT/cookie/credential and pivot
  to IDOR/privesc/exfil; report the chain, not isolated parts.
- Decode & fingerprint → CVE; audit tokens (alg-confusion/none/kid/JWKS, weak
  HS256 secret cracking, lifecycle).

Deterministic post-pass (new crates/harness/src/hygiene.rs, wired into finish()):
- calibrate severity to PROVEN impact: a High/Critical with no payload and
  either hedged wording or thin evidence (under 40 characters) is capped to
  Medium and re-titled "(potential)";
- depth_audit — flag exposures on a host with no real exploit;
- hygiene_summary — advise consolidating hygiene classes repeated across assets.
Unit tests cover calibration + depth audit.

5 new doctrine meta-agents (scripts/build_methodology_v352.py → agents_md/meta/):
exploit_depth_doctrine, finding_chainer, artifact_decoder, token_auditor,
report_calibrator (meta 17→22, total 343→348).

Version bumped 3.5.1 → 3.5.2 across crates/app/installers/docs; RELEASE/README
updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CyberSecurityUP
2026-06-26 11:31:11 -03:00
parent ac84db024c
commit e4efa9bbb0
23 changed files with 628 additions and 28 deletions
+9 -2
@@ -1,4 +1,4 @@
<h1 align="center">🧠 NeuroSploit v3.5.1</h1>
<h1 align="center">🧠 NeuroSploit v3.5.2</h1>
<p align="center">
<a href="https://github.com/JoasASantos/NeuroSploit/stargazers"><img src="https://img.shields.io/github/stars/JoasASantos/NeuroSploit?style=for-the-badge&logo=github&color=8b5cf6" alt="Stars"></a>
@@ -8,7 +8,7 @@
</p>
<p align="center">
<img src="https://img.shields.io/badge/Version-3.5.1-blue?style=flat-square">
<img src="https://img.shields.io/badge/Version-3.5.2-blue?style=flat-square">
<img src="https://img.shields.io/badge/Harness-Rust%20%7C%20tokio-e6b673?style=flat-square">
<img src="https://img.shields.io/badge/License-MIT-green?style=flat-square">
<img src="https://img.shields.io/badge/MD%20Agents-329-red?style=flat-square">
@@ -24,6 +24,13 @@
>
> 📖 **New here? Read the [full Tutorial & User Guide →](TUTORIAL.md)** — every mode, flag, config and example explained.
> 🆕 **New in v3.5.2 — Exploitation Depth & Report Hygiene:** a **DEPTH doctrine**
> makes the engine *use* what it finds (exposed → exploited), **chain** findings
> across modules, decode/fingerprint artifacts → CVEs, and **audit tokens** (JWT
> alg-confusion / weak HS256 secrets). A deterministic post-pass **calibrates
> severity to proven impact** and **consolidates duplicated hygiene** findings.
> See [RELEASE.md](RELEASE.md).
---
**NeuroSploit** turns a URL, a source repository, a running app, or a host/IP into
+60
@@ -1,3 +1,63 @@
# NeuroSploit v3.5.2 — Release Notes
**Release Date:** June 2026
**Codename:** Exploitation Depth & Report Hygiene
**License:** MIT
**Credits:** Joas A Santos & Red Team Leaders
---
## TL;DR
v3.5.2 hard-codes the discipline that separates a great pentest from a noisy
one — distilled from reviewing real AI-pentest output that kept stopping at
*"exposed"* instead of *"exploited"*. The engine now pushes every exposure to
demonstrated impact, **chains** findings, decodes/fingerprints artifacts and
correlates CVEs, audits tokens, and keeps the final report honest (deduplicated
and severity-calibrated).
## Highlights
- **DEPTH doctrine (exploit, don't just expose).** A new doctrine is injected
into every exploitation prompt (black/grey/chain): any info-disclosure,
exposed service/catalog/WSDL, leaked credential/token, or reachable dev host
**must be USED** before it can be a finding — call it, decode it, log in, hit
the dev host. If it was only observed, it's reported as a **lead**, not a
confirmed High/Critical.
- **Finding chaining.** Reuse any session/JWT/cookie/credential obtained in one
step across all other modules; pivot access into IDOR/privesc/exfil and report
the **chain**, not isolated parts (e.g. captcha-bypass→admin JWT→authenticated
surface; enum + no-rate-limit→password spraying).
- **Decode & fingerprint → CVE.** Decode opaque tokens/paths (base64/JSON/marshal)
and pin exact library/gem/plugin/CMS versions, then correlate to known CVEs and
attempt a safe PoC.
- **Token auditor.** JWT alg-confusion (RS→HS), `alg:none`, kid/jku injection,
real signature verification, **weak HS256 secret cracking**, and token
lifecycle (logout/expiry/refresh).
- **Report-hygiene & depth pass (deterministic, in the harness).** After
validation the run now:
- **calibrates severity to proven impact**: a High/Critical finding with no
payload and either hedged language or thin evidence (under 40 characters) is
capped to Medium, its confidence is capped at 0.5, and it is re-titled
"(potential)";
- flags **"exposed → exploited" gaps** — exposures on a host with no actual
exploit get an advisory to go use them;
- advises **consolidating hygiene** classes (headers/cookies/TLS/HSTS/
clickjacking/disclosure) repeated across many assets into ONE finding with
an affected-asset table, instead of inflating the count one-per-host.
- **5 new doctrine meta-agents** (`agents_md/meta/`): `exploit_depth_doctrine`,
`finding_chainer`, `artifact_decoder`, `token_auditor`, `report_calibrator`
(meta agents 17 → 22; total library 343 → 348).
## Notes
- Pure-additive and back-compatible: existing modes, REPL, TUI, pause/continue,
crash-recovery and reports are unchanged. The hygiene pass only annotates and
down-calibrates unproven severities — it never invents or drops findings.
- New unit tests cover the calibration and depth-audit logic
(`harness::hygiene`).
---
# NeuroSploit v3.5.1 — Release Notes
**Release Date:** June 2026
+2 -2
@@ -1,4 +1,4 @@
# NeuroSploit — Tutorial & User Guide (v3.5.1)
# NeuroSploit — Tutorial & User Guide (v3.5.2)
A complete, hands-on guide to installing, configuring and running NeuroSploit —
the autonomous, multi-model penetration-testing harness.
@@ -98,7 +98,7 @@ Agents **degrade gracefully**: if `rustscan` is absent they use `nmap`; if neith
### Verify
```bash
neurosploit --version # neurosploit 3.5.1
neurosploit --version # neurosploit 3.5.2
neurosploit agents # {"vulns":196,...,"chains":12,"total":329}
neurosploit models # all providers & models
```
+27
@@ -0,0 +1,27 @@
# Artifact Decoder & CVE Correlator Agent
> Meta-agent (v3.5.2 doctrine). Decodes opaque tokens/paths, fingerprints the stack, and maps versions to CVEs.
## User Prompt
For **{target}**, inspect every opaque or technology-revealing artifact seen in
recon and responses:
1. **Decode** opaque tokens, IDs and URL paths (base64 / base64url / JSON /
marshal / JWT segments). A decoded value often reveals the framework or an
internal file path (e.g. a Dragonfly job `[["f","...file"]]`, a signed-URL
structure, a serialized object).
2. **Fingerprint** the stack: server, framework, language, and exact library /
gem / plugin / CMS versions (headers, asset paths, readme/changelog, error
pages, manifests).
3. **Correlate to CVEs**: map each exact version to known CVEs; prioritize
unauth RCE / SQLi / auth-bypass with a reliable, non-destructive PoC, and
attempt a safe confirmation (version/echo/OOB), never a destructive payload.
Output JSON: {decoded:[{artifact, decoded_value, implication}],
stack:[{component, version}], cves:[{component, version, cve, cvss, exploitable, poc}]}.
## System Prompt
You decode the opaque and correlate the obvious. Base64/JSON/marshal blobs and
version banners are leads, not noise — you decode them, fingerprint exact
versions, and check them against known CVEs, confirming only with a safe PoC and
a real receipt. Authorized engagement; no destructive or DoS actions. Credits: Joas A Santos and Red Team Leaders.
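The decode step can be sketched in dependency-free Rust. This is a minimal illustration, not part of the harness: `b64_decode` and `jwt_header_and_payload` are hypothetical helper names, and the decoder accepts both the standard and URL-safe alphabets so one function covers base64 and base64url.

```rust
/// Decode base64 or base64url (padded or unpadded) into bytes.
/// Hypothetical helper for illustration; returns None on any
/// character outside the two alphabets.
fn b64_decode(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf: u32 = 0; // bit accumulator
    let mut bits: u8 = 0; // number of valid bits currently in `buf`
    for c in input.bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' | b'-' => 62,
            b'/' | b'_' => 63,
            b'=' => break, // padding marks the end of the data
            _ => return None,
        };
        buf = (buf << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    Some(out)
}

/// Split a JWT-shaped token and decode its first two segments as UTF-8.
/// The signature segment is ignored; nothing is verified here.
fn jwt_header_and_payload(token: &str) -> Option<(String, String)> {
    let mut parts = token.split('.');
    let header = String::from_utf8(b64_decode(parts.next()?)?).ok()?;
    let payload = String::from_utf8(b64_decode(parts.next()?)?).ok()?;
    Some((header, payload))
}
```

A decoded header or payload is only a lead: it tells the agent which framework or claim set to look at next, and the fingerprint and CVE steps still need their own receipts.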
+30
@@ -0,0 +1,30 @@
# Exploitation Depth Doctrine Agent
> Meta-agent (v3.5.2 doctrine). Turns every exposure into an exploitation attempt before it becomes a finding.
## User Prompt
You are reviewing the candidate findings and live transcript for **{target}**.
For EACH candidate that merely *exposes* something (information disclosure,
exposed service/catalog/WSDL, leaked credential or token, reachable dev/staging
host, permissive CORS, open .git), drive it one step further BEFORE it is
reported:
1. **Use what was exposed.** Call the exposed endpoint, decode the leaked
artifact, log in with the leaked credential, hit the dev host, send the
cross-origin request. Capture the real request/response.
2. **Decide honestly.** If using it proved impact → keep/raise severity with the
new evidence. If it could not be used → down-rate to a LEAD (low confidence),
never a confirmed High/Critical.
3. **Report the gap.** List any exposure you could not yet exploit, with the
exact next command to try, so the next round (or the human) can finish it.
Output JSON: {"escalations":[{id, action_taken, new_evidence, new_severity}],
"leads":[{id, why_not_proven, next_command}]}.
## System Prompt
You are a senior exploitation lead. Detection is not a finding — impact is. You
never let an info-disclosure, exposed service, leaked secret or reachable
non-prod host be reported as confirmed without an attempt to actually use it,
backed by a real tool receipt. Unproven impact is a lead, not a High. Authorized
engagement; no destructive or DoS actions. Credits: Joas A Santos and Red Team Leaders.
+25
@@ -0,0 +1,25 @@
# Finding Chainer Agent
> Meta-agent (v3.5.2 doctrine). Reuses obtained access across modules and reports the chain, not the parts.
## User Prompt
Given the confirmed findings and any sessions/tokens/credentials obtained during
the engagement on **{target}**, build exploitation CHAINS:
- Reuse every session/JWT/cookie/credential from one step against ALL other
modules and hosts in scope (a captcha/login bypass that yields a token unlocks
the entire authenticated surface — use it).
- Pivot access into higher impact: IDOR/BOLA, horizontal/vertical privesc, mass
assignment, data exfiltration, account takeover.
- Combine separate weaknesses (e.g. user-enumeration + missing rate-limit =
password spraying; token-in-URL + no throttle = mass exfil).
For each chain output: {chain_id, steps:[{finding_id, action}], combined_impact,
combined_severity, evidence}. Prefer ONE well-evidenced chain over several
isolated low-severity items.
## System Prompt
You are an exploit-chaining specialist. Isolated findings understate risk; the
real story is the chain. You always try to reuse obtained access across the
whole scope and escalate to business impact, reporting the combined chain with
concrete evidence. Authorized engagement; no destructive or DoS actions. Credits: Joas A Santos and Red Team Leaders.
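The chain output shape above can be modelled as a small Rust type. This is a sketch under stated assumptions: the prompt does not say how `combined_severity` is derived, so `severity_floor` (the most severe step) is only a hypothetical lower bound, and `fully_evidenced` encodes the assumption that a chain is proven only when every step carries a receipt.

```rust
/// Declared from least to most severe so that `max()` picks the worst step.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Info,
    Low,
    Medium,
    High,
    Critical,
}

/// One link in a chain; `finding_id` and `action` follow the prompt's
/// output shape, the other fields are illustrative.
struct Step {
    finding_id: String,
    action: String,
    severity: Severity,
    evidence: Option<String>, // request/response receipt, if one was captured
}

struct Chain {
    chain_id: String,
    steps: Vec<Step>,
}

impl Chain {
    /// True only if the chain has steps and each one has non-empty evidence.
    fn fully_evidenced(&self) -> bool {
        !self.steps.is_empty()
            && self.steps.iter().all(|s| {
                s.evidence
                    .as_deref()
                    .map_or(false, |e| !e.trim().is_empty())
            })
    }

    /// Lower bound for the combined severity: the most severe single step.
    /// A real chain may deserve more than this; that judgment stays with
    /// the agent or the operator.
    fn severity_floor(&self) -> Option<Severity> {
        self.steps.iter().map(|s| s.severity).max()
    }
}
```

Keeping evidence per step makes it easy to see which stage of a chain still lacks proof before the combined impact is reported.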
+30
@@ -0,0 +1,30 @@
# Report Calibrator Agent
> Meta-agent (v3.5.2 doctrine). Dedups by class, calibrates severity to proven impact, demands evidence per claim.
## User Prompt
Before the final report for **{target}**, clean and calibrate the findings:
1. **Consolidate hygiene by class.** Merge repeated hygiene findings (missing
security headers, clickjacking, cookie flags, weak TLS, HSTS, version/banner
disclosure) into ONE finding per class with an affected-asset TABLE — do not
inflate the count one-per-host.
2. **Calibrate severity to PROVEN impact.** High/Critical requires demonstrated
impact with evidence. Unproven DoS/abuse, "could/may/potential" language, or a
finding with no concrete payload/PoC → cap to Low/Medium or mark
"(potential)". Recompute the CVSS vector to match the proven impact.
3. **Evidence per claim.** Every finding — and every item in the "tests
performed" log — must carry a concrete request/response receipt; flag any
claim that has none, and any contradiction between the test log and the
findings.
Output JSON: {merged:[{class, severity, assets:[...]}],
recalibrated:[{id, old_severity, new_severity, reason}],
unevidenced:[{id_or_test, missing}]}.
## System Prompt
You are a meticulous report editor. You group hygiene by class with an
asset table, calibrate every severity to demonstrated impact (no inflated
High/Critical, no padding the count with duplicates), and require a real
receipt behind every claim — including each line of the tests-performed log.
Honest, deduplicated, evidence-backed reporting only. Credits: Joas A Santos and Red Team Leaders.
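Step 1 (one finding per class with an affected-asset list) can be sketched as a grouping pass. The types `HygieneFinding` and `Merged` and the function `consolidate` are hypothetical names for illustration; taking the highest severity seen in a class as the merged severity is an assumption, not something the prompt specifies.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Illustrative input record: one hygiene observation on one asset.
struct HygieneFinding {
    class: String,
    severity: String,
    asset: String,
}

/// One merged finding per class, mirroring `merged:[{class, severity, assets}]`.
#[derive(Debug, PartialEq)]
struct Merged {
    class: String,
    severity: String,
    assets: Vec<String>,
}

fn rank(sev: &str) -> u8 {
    match sev.to_lowercase().as_str() {
        "critical" => 4,
        "high" => 3,
        "medium" => 2,
        "low" => 1,
        _ => 0,
    }
}

/// Group by class, deduplicate assets, keep the highest severity per class.
/// BTreeMap/BTreeSet keep the output order deterministic.
fn consolidate(findings: &[HygieneFinding]) -> Vec<Merged> {
    let mut groups: BTreeMap<&str, (&str, BTreeSet<&str>)> = BTreeMap::new();
    for f in findings {
        let entry = groups
            .entry(f.class.as_str())
            .or_insert((f.severity.as_str(), BTreeSet::new()));
        if rank(&f.severity) > rank(entry.0) {
            entry.0 = f.severity.as_str();
        }
        entry.1.insert(f.asset.as_str());
    }
    groups
        .into_iter()
        .map(|(class, (severity, assets))| Merged {
            class: class.to_string(),
            severity: severity.to_string(),
            assets: assets.into_iter().map(String::from).collect(),
        })
        .collect()
}
```

The `assets` vector is what would be rendered as the affected-asset table, so ten hosts missing the same header produce one row group rather than ten findings.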
+26
@@ -0,0 +1,26 @@
# Token & JWT Auditor Agent
> Meta-agent (v3.5.2 doctrine). Attacks tokens: alg-confusion, none, kid/jku, signature checks, weak HS256 secrets.
## User Prompt
For any session token or JWT issued by **{target}**, run a full auth-token audit:
1. **Decode** the header/payload; note alg (HS*/RS*/none), kid, jku, exp, claims.
2. **Algorithm attacks**: try `alg:none`, RS→HS confusion (sign with the public
key as HMAC secret), and kid/jku injection. Confirm whether the server
actually verifies the signature (tamper a claim and replay).
3. **Weak secret**: for HS256, attempt to crack the signing secret offline
(wordlist/rules); a static or guessable shared secret (e.g. an `x-auth-*`
header value) is a strong lead — if cracked, forge a token for any user.
4. **Lifecycle**: test reuse after logout, expiry enforcement, and refresh-token
revocation.
Output JSON: {token_type, alg, verified:true|false,
attacks:[{name, result, evidence}], forged_token_possible:true|false}.
## System Prompt
You are a token-security specialist. Every JWT/session token gets audited for
algorithm confusion, none, kid/jku injection, real signature verification, weak
HS256 secrets, and lifecycle (logout/expiry/refresh). A forged or replayable
token is account takeover — you prove it with a real receipt. Authorized
engagement; no destructive or DoS actions. Credits: Joas A Santos and Red Team Leaders.
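The audit steps above depend on what the decoded header says. A minimal sketch of that selection follows; `audit_plan` and the check labels are hypothetical names, and the function only decides which checks to attempt. It performs no attack and no cryptography.

```rust
/// Pick the checks to attempt from the decoded header fields.
/// Labels are illustrative; each one still needs a real receipt when run.
fn audit_plan(alg: &str, has_kid: bool, has_jku: bool) -> Vec<&'static str> {
    let alg = alg.to_ascii_lowercase();
    // Always ask whether the signature is verified at all, and try alg:none.
    let mut plan = vec!["signature-verification", "alg-none"];
    if alg.starts_with("rs") {
        // RS -> HS confusion only makes sense for an RSA-signed token.
        plan.push("rs-to-hs-confusion");
    }
    if alg.starts_with("hs") {
        // Offline cracking applies to shared-secret (HMAC) tokens.
        plan.push("weak-secret-crack");
    }
    if has_kid {
        plan.push("kid-injection");
    }
    if has_jku {
        plan.push("jku-injection");
    }
    // Logout, expiry and refresh behaviour is checked for every token.
    plan.push("lifecycle");
    plan
}
```

For example, `audit_plan("RS256", true, false)` would schedule the confusion and `kid` checks but skip secret cracking, which keeps the audit focused on attacks the token can actually be subject to.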
+1 -1
@@ -11,7 +11,7 @@ function Ok ($m) { Write-Host " + $m" -ForegroundColor Green }
function Warn($m){ Write-Host " ! $m" -ForegroundColor Yellow }
Write-Host ""
Write-Host " NeuroSploit installer (Windows) — v3.5.1" -ForegroundColor Cyan
Write-Host " NeuroSploit installer (Windows) — v3.5.2" -ForegroundColor Cyan
$arch = $env:PROCESSOR_ARCHITECTURE
Say "Platform: Windows / $arch"
+2 -2
@@ -871,7 +871,7 @@ dependencies = [
[[package]]
name = "neurosploit"
version = "3.5.1"
version = "3.5.2"
dependencies = [
"anyhow",
"clap",
@@ -888,7 +888,7 @@ dependencies = [
[[package]]
name = "neurosploit-harness"
version = "3.5.1"
version = "3.5.2"
dependencies = [
"anyhow",
"futures",
+1 -1
@@ -3,7 +3,7 @@ members = ["crates/harness", "app"]
resolver = "2"
[workspace.package]
version = "3.5.1"
version = "3.5.2"
edition = "2021"
license = "MIT"
repository = "https://github.com/JoasASantos/NeuroSploit"
+4 -4
@@ -1,4 +1,4 @@
//! NeuroSploit v3.5.1 — interactive harness + CLI (`run` / `whitebox` / `agents` / `models`).
//! NeuroSploit v3.5.2 — interactive harness + CLI (`run` / `whitebox` / `agents` / `models`).
mod repl;
mod tui;
@@ -11,8 +11,8 @@ use std::path::{Path, PathBuf};
#[command(
name = "neurosploit",
version,
about = "NeuroSploit v3.5.1 — multi-model autonomous pentest harness",
long_about = "NeuroSploit v3.5.1 — a Rust multi-model harness that drives a pool of LLMs \
about = "NeuroSploit v3.5.2 — multi-model autonomous pentest harness",
long_about = "NeuroSploit v3.5.2 — a Rust multi-model harness that drives a pool of LLMs \
(API key or local subscription: Claude/Codex/Gemini/Grok) to autonomously test a target. \
After recon it INTELLIGENTLY selects only the agents matching the discovered surface, runs \
them in parallel, then validates every finding by cross-model voting before reporting.\n\n\
@@ -379,7 +379,7 @@ pub(crate) fn spawn_engagement(base: &Path, mut cfg: RunConfig, mcp: bool, mode:
cfg.rl_path = Some(base.join("data").join("rl_state_rs.json").display().to_string());
write_status(&workdir, "running", &format!("\"target\":{:?}", cfg.target));
println!(" ┌─ NeuroSploit v3.5.1 · by Joas A Santos & Red Team Leaders");
println!(" ┌─ NeuroSploit v3.5.2 · by Joas A Santos & Red Team Leaders");
println!(" │ run id : {run_id}");
println!(" │ target : {}", cfg.target);
println!(" │ models : {}", cfg.models.join(", "));
+2 -2
@@ -1,4 +1,4 @@
//! NeuroSploit v3.5.1 — interactive session (Claude-Code / Codex / Cursor-CLI style).
//! NeuroSploit v3.5.2 — interactive session (Claude-Code / Codex / Cursor-CLI style).
//!
//! Launched when `neurosploit` runs with no subcommand. A persistent REPL with
//! real line editing (arrow-key history recall, Ctrl-A/E/K, paste), model
@@ -299,7 +299,7 @@ pub async fn repl(base: &Path) -> anyhow::Result<()> {
let backends = harness::installed_cli_backends();
println!("\x1b[1m");
println!(" ███╗ ██╗███████╗██╗ ██╗██████╗ ██████╗");
println!(" ████╗ ██║██╔════╝██║ ██║██╔══██╗██╔═══██╗ NeuroSploit v3.5.1");
println!(" ████╗ ██║██╔════╝██║ ██║██╔══██╗██╔═══██╗ NeuroSploit v3.5.2");
println!(" ██╔██╗ ██║█████╗ ██║ ██║██████╔╝██║ ██║ interactive harness");
println!(" ██║╚██╗██║██╔══╝ ██║ ██║██╔══██╗██║ ██║ by Joas A Santos");
println!(" ██║ ╚████║███████╗╚██████╔╝██║ ██║╚██████╔╝ & Red Team Leaders");
+1 -1
@@ -1,4 +1,4 @@
//! NeuroSploit v3.5.1 — TUI "Mission Control" mode.
//! NeuroSploit v3.5.2 — TUI "Mission Control" mode.
//!
//! Concurrent panels that update live while the engagement runs in the
//! background, with a composer input that stays active during execution:
+1 -1
@@ -1,4 +1,4 @@
//! POMDP belief-state world model (v3.5.1).
//! POMDP belief-state world model (v3.5.2).
//!
//! The target is only partially observable, so we don't track booleans — we
//! track a **belief**: a property graph whose nodes (host / service / vuln /
@@ -1,4 +1,4 @@
//! Verification / grounding engine (v3.5.1).
//! Verification / grounding engine (v3.5.2).
//!
//! Hard rule: **no claim enters the world model without a tool receipt** — raw
//! tool output, not the LLM's paraphrase. This is the empirical anti-hallucination
@@ -0,0 +1,186 @@
//! Report-hygiene & exploitation-depth pass (v3.5.2).
//!
//! Encodes the post-engagement discipline learned from reviewing real
//! AI-pentest output, applied deterministically after validation:
//! 1. **Calibrate severity to PROVEN impact** — an unproven High/Critical
//! (hedged language, no payload, thin evidence) is capped to Medium and
//! re-titled "(potential)". No inflated severities.
//! 2. **Exposed → exploited** — flag info-disclosure / exposed-service /
//! leaked-credential findings on a host that has no actual exploit, so the
//! operator knows to *use* what was exposed (or down-rate it to a lead).
//! 3. **Consolidate hygiene** — when the same hygiene class (missing headers,
//! clickjacking, cookie flags, TLS, info-disclosure…) repeats across many
//! assets, advise merging into ONE finding with an affected-asset table,
//! instead of inflating the count one-per-host.
//!
//! All functions are pure/deterministic; only `calibrate` mutates findings
//! (severity/title/confidence). The rest return advisory strings streamed to
//! the operator and recorded with the run.
use crate::types::Finding;
fn host_of(endpoint: &str) -> String {
let s = endpoint.trim();
let s = s.split("://").last().unwrap_or(s);
let s = s.split('/').next().unwrap_or(s);
s.split('?').next().unwrap_or(s).to_lowercase()
}
fn sev_rank(s: &str) -> u8 {
match s.to_lowercase().as_str() {
x if x.starts_with("crit") => 4,
x if x.starts_with("high") => 3,
x if x.starts_with("med") => 2,
x if x.starts_with("low") => 1,
_ => 0,
}
}
fn short(s: &str) -> String {
s.chars().take(64).collect()
}
/// Hedging words that signal an impact was described but not demonstrated
/// (English + Portuguese, since engagements are bilingual).
const WEASEL: &[&str] = &[
"could ", "may ", "might ", "potential", "possible", "possibly", "teóric", "theoret",
"poderia", "possív", "potencial", "if the ", "caso o", "caso a", "would allow", "permitiria",
];
/// A finding that *exposes* something (recon/disclosure) rather than being an
/// exploit with demonstrated impact.
fn is_exposure(f: &Finding) -> bool {
let cwe = f.cwe.to_lowercase();
let t = f.title.to_lowercase();
["200", "527", "538", "942", "497", "209", "548", "16"].iter().any(|c| cwe.contains(c))
|| [
"disclosure", "exposed", "exposi", "exposure", "catalog", "catálogo", "cors",
"banner", "version", "versão", "header", "cabeçalho", ".git", "enumerat",
"fingerprint", "wsdl", "swagger", "missing security", "outdated", "eol",
]
.iter()
.any(|k| t.contains(k))
}
/// Reads as unproven: hedged or thin evidence AND no concrete payload.
fn looks_unproven(f: &Finding) -> bool {
let blob = format!("{} {} {}", f.title, f.impact, f.evidence).to_lowercase();
let hedged = WEASEL.iter().any(|w| blob.contains(w));
let weak_ev = f.evidence.trim().chars().count() < 40;
let no_payload = f.payload.trim().is_empty();
(hedged || weak_ev) && no_payload
}
/// Normalized hygiene class, for consolidation advice.
fn class_of(f: &Finding) -> &'static str {
let t = f.title.to_lowercase();
if t.contains("header") || t.contains("cabeçalho") { "missing-security-headers" }
else if t.contains("clickjack") || t.contains("frame") { "clickjacking" }
else if t.contains("hsts") || t.contains("strict-transport") { "missing-hsts" }
else if t.contains("cookie") { "cookie-flags" }
else if t.contains("tls") || t.contains("ssl") { "weak-tls" }
else if t.contains("cors") { "cors-misconfig" }
else if t.contains("version") || t.contains("versão") || t.contains("banner") || t.contains("eol") || t.contains("outdated") { "version-disclosure" }
else { "information-disclosure" }
}
/// Cap inflated, unproven High/Critical findings to Medium. Returns advisories.
pub fn calibrate(findings: &mut [Finding]) -> Vec<String> {
let mut notes = Vec::new();
for f in findings.iter_mut() {
if sev_rank(&f.severity) >= 3 && looks_unproven(f) {
let old = f.severity.clone();
f.severity = "Medium".into();
f.confidence = f.confidence.min(0.5);
let low = f.title.to_lowercase();
if !low.contains("potential") && !low.contains("potencial") {
f.title = format!("{} (potential — impact not demonstrated)", f.title);
}
notes.push(format!(
"severity calibrated: \"{}\" {old} → Medium (impact not demonstrated)",
short(&f.title)
));
}
}
notes
}
/// "Exposed → exploited": exposures on a host with no real exploit get flagged.
pub fn depth_audit(findings: &[Finding]) -> Vec<String> {
let exploited: std::collections::HashSet<String> = findings
.iter()
.filter(|f| !is_exposure(f) && sev_rank(&f.severity) >= 2)
.map(|f| host_of(&f.endpoint))
.collect();
let mut notes = Vec::new();
for f in findings.iter().filter(|f| is_exposure(f)) {
if !exploited.contains(&host_of(&f.endpoint)) {
notes.push(format!(
"depth gap: \"{}\" exposed but not exploited — USE it (call the endpoint / decode the artifact / log in / hit the dev host) to prove impact, or down-rate to a lead",
short(&f.title)
));
}
}
notes.truncate(8);
notes
}
/// Advise consolidating hygiene classes that repeat across multiple assets.
pub fn hygiene_summary(findings: &[Finding]) -> Vec<String> {
use std::collections::{BTreeMap, BTreeSet};
let mut groups: BTreeMap<&'static str, BTreeSet<String>> = BTreeMap::new();
for f in findings.iter().filter(|f| is_exposure(f)) {
groups.entry(class_of(f)).or_default().insert(host_of(&f.endpoint));
}
let mut notes = Vec::new();
for (class, hosts) in groups {
if hosts.len() > 1 {
notes.push(format!(
"hygiene: '{class}' affects {} assets — consolidate into ONE finding with an affected-asset table (don't inflate the count one-per-host)",
hosts.len()
));
}
}
notes
}
#[cfg(test)]
mod tests {
use super::*;
fn f(title: &str, sev: &str, cwe: &str, ep: &str, ev: &str, payload: &str) -> Finding {
let mut x = Finding::default();
x.title = title.into(); x.severity = sev.into(); x.cwe = cwe.into();
x.endpoint = ep.into(); x.evidence = ev.into(); x.payload = payload.into();
x
}
#[test]
fn unproven_high_is_capped() {
let mut v = vec![f("Flooding DoS", "High", "CWE-770", "https://a/x", "could overload", "")];
let notes = calibrate(&mut v);
assert_eq!(v[0].severity, "Medium");
assert_eq!(notes.len(), 1);
}
#[test]
fn proven_high_is_kept() {
let mut v = vec![f("SQLi", "High", "CWE-89", "https://a/x",
"id=1' UNION SELECT version()-- returned 8.0.32 in the response body, proving injection", "1' OR '1'='1")];
calibrate(&mut v);
assert_eq!(v[0].severity, "High");
}
#[test]
fn exposure_without_exploit_flagged() {
let v = vec![f("Information Disclosure - .git exposed", "Low", "CWE-527", "https://a/.git", "leaked", "")];
assert_eq!(depth_audit(&v).len(), 1);
}
#[test]
fn exposure_with_exploit_on_same_host_not_flagged() {
let v = vec![
f("Information Disclosure - banner", "Low", "CWE-200", "https://a/x", "Server: IIS", ""),
f("SQL Injection", "High", "CWE-89", "https://a/login", "dumped users", "1'--"),
];
assert!(depth_audit(&v).is_empty());
}
}
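One detail of `is_exposure` worth noting: it matches CWE ids by substring, so `"16"` also matches `CWE-116` and `"200"` also matches `CWE-1200`. A possible tightening, not part of this commit, is to parse the numeric ids and compare them exactly. The sketch below assumes the `cwe` field uses the `CWE-<n>` form seen in the unit tests; `cwe_ids`, `EXPOSURE_CWES` and `is_exposure_cwe` are hypothetical names.

```rust
/// Extract numeric ids from a free-form field such as "CWE-200" or
/// "cwe-16, CWE-548". Values without a "CWE-" prefix are ignored.
fn cwe_ids(field: &str) -> Vec<u32> {
    field
        .to_lowercase()
        .split("cwe-")
        .skip(1) // text before the first "cwe-" carries no id
        .filter_map(|rest| {
            let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
            digits.parse::<u32>().ok()
        })
        .collect()
}

/// Same id list as the substring check in `is_exposure`.
const EXPOSURE_CWES: &[u32] = &[200, 527, 538, 942, 497, 209, 548, 16];

/// Exact-id variant of the CWE half of `is_exposure`.
fn is_exposure_cwe(field: &str) -> bool {
    cwe_ids(field).iter().any(|id| EXPOSURE_CWES.contains(id))
}
```

With exact matching, a `CWE-116` finding no longer counts as an exposure on the strength of its CWE alone; the title keywords would still apply unchanged.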
+2 -1
@@ -1,4 +1,4 @@
//! NeuroSploit v3.5.1 harness — a robust multi-model runtime for the
//! NeuroSploit v3.5.2 harness — a robust multi-model runtime for the
//! markdown-driven autonomous pentest engine.
//!
//! The harness loads the `agents_md/` library, drives a *pool* of LLM models
@@ -11,6 +11,7 @@ pub mod attack_graph;
pub mod belief;
pub mod creds;
pub mod grounding;
pub mod hygiene;
pub mod pomdp;
pub mod models;
pub mod pipeline;
+30 -5
@@ -69,6 +69,16 @@ const REACT_DOCTRINE: &str = "METHOD (ReAct): work in explicit Thought → Actio
Each Action runs ONE concrete tool command (e.g. a curl request); read its real Observation before the next Thought. \
Base every claim on an actual observed response; never assume. Stop when you've either proven an issue or exhausted reasonable checks. Be token-efficient: no filler, no repetition.\n\n";
/// DEPTH doctrine (v3.5.2): push past detection to demonstrated impact, and
/// chain. Distilled from reviewing real AI-pentest output that kept stopping at
/// "exposed" instead of "exploited".
const DEPTH_DOCTRINE: &str = "DEPTH (exploit, don't just expose):\n\
- Exposed → exploited: any info-disclosure, exposed service/catalog/WSDL, leaked credential/token, or non-prod (dev/staging) host you find MUST be USED before you report it: call the exposed endpoint, decode the leaked artifact, log in with the leaked credential, hit the dev host. If you only observed it but never used it, report it as a LEAD (low confidence), not a confirmed finding.\n\
- Chain across steps: reuse any session/JWT/cookie/credential you obtain in one step against every other module; if one bug yields access, pivot it into IDOR/privesc/data-exfil and report the CHAIN, not isolated parts.\n\
- Decode & fingerprint → CVE: decode opaque tokens/paths (base64/JSON/marshal) and fingerprint the stack (server, framework, library/gem/plugin versions); map exact versions to known CVEs and attempt a safe, non-destructive PoC.\n\
- Audit tokens: for any JWT, check alg-confusion (RS→HS), alg:none, kid/jku injection, whether the signature is actually verified, and weak/guessable HS256 secrets.\n\
- Calibrate honestly: claim High/Critical ONLY when impact is DEMONSTRATED; unproven DoS/abuse is Low/Info or a lead, never inflated.\n\n";
/// Black-box web engagement: recon → parallel exploit → N-model vote → report.
pub async fn run(cfg: RunConfig, lib: &Library, pool: &ModelPool, tx: Sender<String>) -> RunOutput {
pool.set_progress(tx.clone());
@@ -168,12 +178,13 @@ pub async fn run(cfg: RunConfig, lib: &Library, pool: &ModelPool, tx: Sender<Str
let user = format!(
"AUTHORIZED engagement — you have explicit permission to test {target}. \
Do not ask for confirmation; proceed and PROVE each issue.\n\n\
{directives}{react}{doctrine}{body}\n\nWhen done, reply with ONLY a JSON array of confirmed findings (may be empty []). \
{directives}{react}{depth}{doctrine}{body}\n\nWhen done, reply with ONLY a JSON array of confirmed findings (may be empty []). \
Each item: {{id,title,severity,cwe,endpoint,payload,evidence,impact,remediation,confidence}}. \
`evidence` must contain the concrete proof (request/response excerpt).",
target = target,
directives = directives,
react = REACT_DOCTRINE,
depth = DEPTH_DOCTRINE,
doctrine = tool_doctrine(mcp_on),
body = ag.user.replace("{target}", &target).replace("{recon_json}", &recon),
);
@@ -387,11 +398,11 @@ pub async fn run_greybox(cfg: RunConfig, lib: &Library, pool: &ModelPool, tx: Se
}
let user = format!(
"AUTHORIZED greybox engagement on {target} — you also have the source review below. \
Proceed and PROVE each issue against the LIVE app.\n\n{directives}{leads}{react}{doctrine}{body}\n\n\
Proceed and PROVE each issue against the LIVE app.\n\n{directives}{leads}{react}{depth}{doctrine}{body}\n\n\
Reply ONLY a JSON array of confirmed findings (may be []): \
{{id,title,severity,cwe,endpoint,payload,evidence,impact,remediation,confidence}}.",
target = target, directives = directives, leads = leads,
react = REACT_DOCTRINE, doctrine = tool_doctrine(mcp_on),
react = REACT_DOCTRINE, depth = DEPTH_DOCTRINE, doctrine = tool_doctrine(mcp_on),
body = ag.user.replace("{target}", &target).replace("{recon_json}", &recon),
);
match pool.complete_routed(Task::Exploit, &ag.name, &ag.system, &user).await {
@@ -439,12 +450,12 @@ async fn chain_round(pool: &ModelPool, target: &str, recon: &str, directives: &s
let _ = tx.send(format!("chaining {} confirmed finding(s) for deeper impact…", confirmed.len())).await;
let recon_ctx: String = recon.chars().take(2500).collect();
let user = format!(
"AUTHORIZED engagement on {target}.\n\n{directives}{react}{doctrine}{recipe_block}\
"AUTHORIZED engagement on {target}.\n\n{directives}{react}{depth}{doctrine}{recipe_block}\
CONFIRMED FINDINGS TO CHAIN:\n{summary}\n\nRecon:\n{recon_ctx}\n\n\
Chain these into deeper impact (e.g. SQLi→RCE→LPE, SSRF→cloud creds, upload→LFI→RCE) and PROVE each stage. \
Reply ONLY a JSON array of NEW findings \
(may be []): {{id,title,severity,cwe,endpoint,payload,evidence,impact,remediation,confidence}}.",
react = REACT_DOCTRINE, doctrine = tool_doctrine(pool.mcp_config.is_some()),
react = REACT_DOCTRINE, depth = DEPTH_DOCTRINE, doctrine = tool_doctrine(pool.mcp_config.is_some()),
);
match pool.complete_routed(Task::Exploit, "chain", CHAIN_SYS, &user).await {
Ok((m, text)) => {
@@ -623,6 +634,20 @@ async fn finish(cfg: RunConfig, _lib: &Library, recon: String, transcript: Strin
let _ = tx.send(format!("grounding gate: demoted {demoted}/{before} ungrounded claim(s) (no tool receipt)")).await;
}
// --- v3.5.2 report-hygiene & exploitation-depth pass ---
// Calibrate inflated/unproven High-Critical to Medium, flag exposures that
// were never exploited ("exposed → exploited"), and advise consolidating
// hygiene findings duplicated across many assets.
for n in crate::hygiene::calibrate(&mut findings) {
let _ = tx.send(format!("calibrate: {n}")).await;
}
for n in crate::hygiene::depth_audit(&findings) {
let _ = tx.send(format!("notify: {n}")).await;
}
for n in crate::hygiene::hygiene_summary(&findings) {
let _ = tx.send(format!("notify: {n}")).await;
}
// --- POMDP belief: build from grounded findings, report residual uncertainty ---
let mut wm = crate::belief::WorldModel::new();
wm.deterministic = whitebox;
+1 -1
@@ -1,4 +1,4 @@
//! POMDP decision layer (v3.5.1): value-of-information planning + the
//! POMDP decision layer (v3.5.2): value-of-information planning + the
//! anti-hallucination gate.
//!
//! The choice "scan more vs exploit now" is **not** a heuristic here — it falls
+3 -3
@@ -97,9 +97,9 @@ pub fn html(target: &str, findings: &[Finding]) -> String {
h4{{margin:12px 0 3px;font-size:12px;text-transform:uppercase;letter-spacing:.5px;color:#8b5cf6}}\
.b{{color:#8b5cf6;font-weight:800}}</style></head><body>\
<h1><span class=b>NeuroSploit</span> Penetration Test Report</h1>\
<div class=meta>Target: <b>{t}</b> · v3.5.1 Rust harness · multi-model validated</div>\
<div class=meta>Target: <b>{t}</b> · v3.5.2 Rust harness · multi-model validated</div>\
<div>{chips}</div>{graph_block}<h2>Findings ({n})</h2>{body}\
<p class=meta>Authorized testing only. Findings confirmed by multi-model adversarial voting.<br>NeuroSploit v3.5.1 · by <b>Joas A Santos</b> &amp; <b>Red Team Leaders</b></p></body></html>",
<p class=meta>Authorized testing only. Findings confirmed by multi-model adversarial voting.<br>NeuroSploit v3.5.2 · by <b>Joas A Santos</b> &amp; <b>Red Team Leaders</b></p></body></html>",
t = esc(target), chips = chips, n = sorted.len(), body = body, graph_block = graph_block,
)
}
@@ -135,7 +135,7 @@ pub fn typst_report(target: &str, findings: &[Finding], dir: &Path) -> std::io::
let mut data = String::new();
data.push_str(&format!(
"#let meta = (target: {}, run_id: {}, generated: {}, model: {})\n",
tq(target), tq(&run_id), tq("NeuroSploit v3.5.1"), tq("multi-model")
tq(target), tq(&run_id), tq("NeuroSploit v3.5.2"), tq("multi-model")
));
data.push_str("#let findings = (\n");
for f in sorted_findings(findings) {
+183
View File
@@ -0,0 +1,183 @@
#!/usr/bin/env python3
"""
NeuroSploit v3.5.2 exploitation-depth & report-hygiene doctrine agents.
Distilled from reviewing real AI-pentest output that kept stopping at
"exposed" instead of "exploited". Emits meta-agents to agents_md/meta/ that
push the engine past detection to demonstrated impact, chain findings, decode
artifacts/correlate CVEs, audit tokens, and keep the report honest (dedup +
severity calibration). Credits: Joas A Santos & Red Team Leaders.
"""
import os
ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
OUT = os.path.join(ROOT, "agents_md", "meta")
CREDITS = "Credits: Joas A Santos and Red Team Leaders."
def render(a):
    L = [f"# {a['title']}\n",
         f"> Meta-agent (v3.5.2 doctrine). {a['tagline']}\n",
         "## User Prompt",
         a["user"].strip(), "",
         "## System Prompt",
         a["system"].strip() + " " + CREDITS]
    return "\n".join(L) + "\n"
AGENTS = [
{"name": "exploit_depth_doctrine",
"title": "Exploitation Depth Doctrine Agent",
"tagline": "Turns every exposure into an exploitation attempt before it becomes a finding.",
"user": """
You are reviewing the candidate findings and live transcript for **{target}**.
For EACH candidate that merely *exposes* something (information disclosure,
exposed service/catalog/WSDL, leaked credential or token, reachable dev/staging
host, permissive CORS, open .git), drive it one step further BEFORE it is
reported:
1. **Use what was exposed.** Call the exposed endpoint, decode the leaked
artifact, log in with the leaked credential, hit the dev host, send the
cross-origin request. Capture the real request/response.
2. **Decide honestly.** If using it proved impact → keep/raise severity with the
new evidence. If it could not be used → down-rate to a LEAD (low confidence),
never a confirmed High/Critical.
3. **Report the gap.** List any exposure you could not yet exploit, with the
exact next command to try, so the next round (or the human) can finish it.
Output JSON: {"escalations":[{id, action_taken, new_evidence, new_severity}],
"leads":[{id, why_not_proven, next_command}]}.
""",
"system": """
You are a senior exploitation lead. Detection is not a finding; impact is. You
never let an info-disclosure, exposed service, leaked secret or reachable
non-prod host be reported as confirmed without an attempt to actually use it,
backed by a real tool receipt. Unproven impact is a lead, not a High. Authorized
engagement; no destructive or DoS actions.
"""},
{"name": "finding_chainer",
"title": "Finding Chainer Agent",
"tagline": "Reuses obtained access across modules and reports the chain, not the parts.",
"user": """
Given the confirmed findings and any sessions/tokens/credentials obtained during
the engagement on **{target}**, build exploitation CHAINS:
- Reuse every session/JWT/cookie/credential from one step against ALL other
modules and hosts in scope (a captcha/login bypass that yields a token unlocks
the entire authenticated surface; use it).
- Pivot access into higher impact: IDOR/BOLA, horizontal/vertical privesc, mass
assignment, data exfiltration, account takeover.
- Combine separate weaknesses (e.g. user-enumeration + missing rate-limit =
password spraying; token-in-URL + no throttle = mass exfil).
For each chain output: {chain_id, steps:[{finding_id, action}], combined_impact,
combined_severity, evidence}. Prefer ONE well-evidenced chain over several
isolated low-severity items.
""",
"system": """
You are an exploit-chaining specialist. Isolated findings understate risk; the
real story is the chain. You always try to reuse obtained access across the
whole scope and escalate to business impact, reporting the combined chain with
concrete evidence. Authorized engagement; no destructive or DoS actions.
"""},
{"name": "artifact_decoder",
"title": "Artifact Decoder & CVE Correlator Agent",
"tagline": "Decodes opaque tokens/paths, fingerprints the stack, and maps versions to CVEs.",
"user": """
For **{target}**, inspect every opaque or technology-revealing artifact seen in
recon and responses:
1. **Decode** opaque tokens, IDs and URL paths (base64 / base64url / JSON /
marshal / JWT segments). A decoded value often reveals the framework or an
internal file path (e.g. a Dragonfly job `[["f","...file"]]`, a signed-URL
structure, a serialized object).
2. **Fingerprint** the stack: server, framework, language, and exact library /
gem / plugin / CMS versions (headers, asset paths, readme/changelog, error
pages, manifests).
3. **Correlate to CVEs**: map each exact version to known CVEs; prioritize
unauth RCE / SQLi / auth-bypass with a reliable, non-destructive PoC, and
attempt a safe confirmation (version/echo/OOB), never a destructive payload.
Output JSON: {decoded:[{artifact, decoded_value, implication}],
stack:[{component, version}], cves:[{component, version, cve, cvss, exploitable, poc}]}.
""",
"system": """
You decode the opaque and correlate the obvious. Base64/JSON/marshal blobs and
version banners are leads, not noise; you decode them, fingerprint exact
versions, and check them against known CVEs, confirming only with a safe PoC and
a real receipt. Authorized engagement; no destructive or DoS actions.
"""},
{"name": "token_auditor",
"title": "Token & JWT Auditor Agent",
"tagline": "Attacks tokens: alg-confusion, none, kid/jku, signature checks, weak HS256 secrets.",
"user": """
For any session token or JWT issued by **{target}**, run a full auth-token audit:
1. **Decode** the header/payload; note alg (HS*/RS*/none), kid, jku, exp, claims.
2. **Algorithm attacks**: try `alg:none`, RS→HS confusion (sign with the public
key as HMAC secret), and kid/jku injection. Confirm whether the server
actually verifies the signature (tamper a claim and replay).
3. **Weak secret**: for HS256, attempt to crack the signing secret offline
(wordlist/rules); a static or guessable shared secret (e.g. an `x-auth-*`
header value) is a strong lead; if cracked, forge a token for any user.
4. **Lifecycle**: test reuse after logout, expiry enforcement, and refresh-token
revocation.
Output JSON: {token_type, alg, verified:true|false,
attacks:[{name, result, evidence}], forged_token_possible:true|false}.
""",
"system": """
You are a token-security specialist. Every JWT/session token gets audited for
algorithm confusion, none, kid/jku injection, real signature verification, weak
HS256 secrets, and lifecycle (logout/expiry/refresh). A forged or replayable
token is account takeover; you prove it with a real receipt. Authorized
engagement; no destructive or DoS actions.
"""},
{"name": "report_calibrator",
"title": "Report Calibrator Agent",
"tagline": "Dedups by class, calibrates severity to proven impact, demands evidence per claim.",
"user": """
Before the final report for **{target}**, clean and calibrate the findings:
1. **Consolidate hygiene by class.** Merge repeated hygiene findings (missing
security headers, clickjacking, cookie flags, weak TLS, HSTS, version/banner
disclosure) into ONE finding per class with an affected-asset TABLE; do not
inflate the count one-per-host.
2. **Calibrate severity to PROVEN impact.** High/Critical requires demonstrated
impact with evidence. Unproven DoS/abuse, "could/may/potential" language, or a
finding with no concrete payload/PoC → cap to Low/Medium or mark
"(potential)". Recompute the CVSS vector to match the proven impact.
3. **Evidence per claim.** Every finding, and every item in the "tests
performed" log, must carry a concrete request/response receipt; flag any
claim that has none, and any contradiction between the test log and the
findings.
Output JSON: {merged:[{class, severity, assets:[...]}],
recalibrated:[{id, old_severity, new_severity, reason}],
unevidenced:[{id_or_test, missing}]}.
""",
"system": """
You are a meticulous report editor. You group hygiene by class with an
asset table, calibrate every severity to demonstrated impact (no inflated
High/Critical, no padding the count with duplicates), and require a real
receipt behind every claim, including each line of the tests-performed log.
Honest, deduplicated, evidence-backed reporting only.
"""},
]
def main():
    os.makedirs(OUT, exist_ok=True)
    for a in AGENTS:
        open(os.path.join(OUT, a["name"] + ".md"), "w").write(render(a))
    print(f"wrote {len(AGENTS)} v3.5.2 doctrine meta-agents to {OUT}")
if __name__ == "__main__":
    main()
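The first step of the `token_auditor` prompt above (decode the header/payload and note alg, kid, jku, exp) needs nothing beyond the standard library. The sketch below is illustrative only and is not part of this commit; `b64url_decode` and `inspect_jwt` are hypothetical helper names, and the code decodes without verifying anything, which is exactly what an auditor wants at this stage.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def inspect_jwt(token: str) -> dict:
    """Decode a JWT WITHOUT verifying it and summarize audit-relevant fields."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    header = json.loads(b64url_decode(parts[0]))
    payload = json.loads(b64url_decode(parts[1]))
    return {
        "alg": header.get("alg"),
        "kid": header.get("kid"),
        "jku": header.get("jku"),
        "exp": payload.get("exp"),
        "claims": payload,
        # An empty third segment means the token carries no signature at all.
        "unsigned": parts[2] == "",
    }
```

The returned `alg` value decides which of the later steps apply: an HS* algorithm makes the offline secret-cracking step relevant, while RS* points at the RS→HS confusion test. Whether the server actually verifies the signature still has to be confirmed by tampering a claim and replaying the token against the target.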
+1 -1
View File
@@ -25,7 +25,7 @@ cat <<'BANNER'
███╗ ██╗███████╗██╗ ██╗██████╗ ██████╗
████╗ ██║██╔════╝██║ ██║██╔══██╗██╔═══██╗ NeuroSploit installer
██╔██╗ ██║█████╗ ██║ ██║██████╔╝██║ ██║ v3.5.1 — Rust harness
██╔██╗ ██║█████╗ ██║ ██║██████╔╝██║ ██║ v3.5.2 — Rust harness
██║╚██╗██║██╔══╝ ██║ ██║██╔══██╗██║ ██║ by Joas A Santos
██║ ╚████║███████╗╚██████╔╝██║ ██║╚██████╔╝ & Red Team Leaders
╚═╝ ╚═══╝╚══════╝ ╚═════╝ ╚═╝ ╚═╝ ╚═════╝