mirror of https://github.com/CyberSecurityUP/NeuroSploit.git synced 2026-06-30 07:15:30 +02:00

Files

T

CyberSecurityUP 56d3f0c723 NeuroSploit v3.4.0 — Rust multi-model harness + Axum dashboard

New cargo workspace `neurosploit-rs/` (single `neurosploit` binary):

harness crate:
- models.rs: 11 OpenAI-compatible providers / 31 models (Claude, GPT, Grok,
  NVIDIA NIM, DeepSeek, Mistral, Qwen, Groq, Together, OpenRouter, Ollama)
- pool.rs: ModelPool with bounded concurrency, provider failover, and N-model
  validator voting (the panel doubles as the jury)
- agents.rs: loads the existing agents_md/ library (213 agents)
- pipeline.rs: recon → parallel exploit (semaphore-bounded) → N-model
  adversarial vote → score; streams live progress over a channel
- report.rs: HTML report
- tokio + reqwest(rustls); offline mode runs the pipeline without API keys

app binary:
- clap CLI: serve | run | agents | models  (run supports --model x N, --vote-n,
  --max-agents, --offline)
- axum web dashboard with multi-model panel, live console, findings, agent
  browser, embedded report; single binary serves the SPA (no npm/build)

Verified: cargo build clean; agents/models/offline-run CLI; server endpoints
(/api/info, /api/run lifecycle, /report); dashboard + live run in Playwright.

Docs: README v3.4.0 callout + RELEASE.md notes. target/ gitignored.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-21 19:58:43 -03:00

9.8 KiB

Executable File

Raw Blame History

NeuroSploit v3.3.0

Autonomous, markdown-driven AI penetration testing.

NeuroSploit v3.3.0 is a ground-up re-model of the pentest agent. Instead of a monolithic Python orchestrator, it is now a lean engine that turns a URL into an autonomous engagement: it composes a master prompt from a curated library of 213 markdown agents and hands execution to whichever agentic CLI backend you have installed — Claude Code, Codex, or Grok CLI (or a Claude subscription) — augmented with Playwright MCP for real browser-based proof, and a reinforcement-learning loop that gets smarter every run.

The previous Python orchestration now lives in legacy/.

🦀 v3.4.0 — Rust multi-model harness. A new high-performance harness lives in neurosploit-rs/: a single Rust binary (tokio + axum) that drives a pool of LLM models with concurrency, provider failover, and N-model validator voting (N models must agree a finding is real before it counts). It serves its own solid web dashboard. Build & run:
cd neurosploit-rs && cargo build --release
./target/release/neurosploit serve            # web dashboard → :8788
./target/release/neurosploit run https://target.example --model anthropic:claude-opus-4-8 --model openai:gpt-5.1
./target/release/neurosploit run https://t.example --offline   # pipeline self-test, no API keys
11 OpenAI-compatible providers / 31 models (Claude, GPT, Grok, NVIDIA NIM, DeepSeek, Mistral, Qwen, Groq, Together, OpenRouter, Ollama). Reads the same agents_md/ library (213 agents).

Why this architecture

Old (≤ v3.2.4)	New (v3.3.0)
2,500-line Python orchestrator + hand-coded agent classes	Markdown agents + thin engine
One embedded LLM loop	Pluggable agentic CLI backends (Claude/Codex/Grok)
Provider SDK juggling	Backend owns the agent loop; engine just composes & collects
Static agent list	RL-weighted, recon-aware agent selection
Reflection-based "evidence"	Playwright MCP proof-of-execution + adversarial validation

How it works

          ┌──────────────────────────────────────────────────────────────┐
   URL ──▶ │  neurosploit (terminal)                                       │
          │     │                                                          │
          │     ▼                                                          │
          │  orchestrator ── loads agents_md/ (213) ── applies RL weights  │
          │     │                                                          │
          │     ▼  composes ONE master prompt                              │
          │  backend (Claude Code | Codex | Grok)  ◀── Playwright MCP      │
          │     │  autonomously runs the pipeline below                    │
          │     ▼                                                          │
          │  recon → select agents → exploit → VALIDATE → filter FPs       │
          │        → severity → impact → report → RL feedback              │
          └──────────────────────────────────────────────────────────────┘
                       │                          │
                       ▼                          ▼
              results/findings.json        data/rl_state.json (learns)

The engine never fabricates findings: every candidate is independently re-exploited (meta/exploit_validator), run through an adversarial skeptic (meta/false_positive_filter), and only then scored and reported.

The agent library (`agents_md/`)

213 agents — see agents_md/REGISTRY.md.

196 vulnerability specialists (agents_md/vulns/) — each a self-contained playbook with a real methodology, payloads, CWE mapping, and a strict anti-false-positive ## System Prompt. Coverage includes the classic OWASP web set plus modern classes:
- LLM/AI security (OWASP LLM Top 10): prompt injection (direct/indirect), jailbreak, system-prompt leak, insecure output handling, RAG poisoning, tool-invocation/function-calling abuse, excessive agency, PII leakage…
- Cloud/K8s/containers: IMDS SSRF (AWS/GCP/Azure), kubelet/dashboard exposure, container & docker-socket escape, bucket takeover, IAM privesc…
- Modern API/auth: JWT alg/kid/jwk confusion, OAuth PKCE downgrade, SAML XSW, OIDC, CSWSH, refresh-token & MFA bypass, account-takeover chains…
- Advanced injection: SSTI (Jinja2/FreeMarker/Velocity/Thymeleaf), SSPP, XXE OOB, YAML/pickle deserialization, JNDI, XSLT…
- Protocol/cache/smuggling: HTTP/2 & CL.TE/TE.CL desync, h2c, web cache deception/poisoning, response splitting, path-confusion…
- Logic/crypto/supply-chain: dependency confusion, padding oracle, weak JWT secret, price/coupon/workflow abuse, exposed .git/.env/CI secrets…
17 meta-agents (agents_md/meta/): orchestrator, recon, exploit_validator, false_positive_filter, severity_assessor, impact_evaluator, reporter, rl_feedback, plus migrated expert roles.

Add your own by dropping a .md into agents_md/vulns/ (or extend the data-driven builder, scripts/build_agents.py). It is picked up automatically.

Quickstart

# 1. Have at least one agentic CLI installed: Claude Code, Codex, or Grok CLI
#    (Playwright MCP needs Node/npx)
./neurosploit backends          # show what's detected
./neurosploit agents            # {'vulns': 196, 'meta': 17, 'total': 213}

# 2. Interactive: enter a URL, pick a backend + model, go
./neurosploit

# 3. Or one-shot:
./neurosploit run https://target.example \
    --backend claude --model claude-opus-4-8 \
    --collaborator oob.your-collab.net

# 4. Preview the composed master prompt without executing the backend:
./neurosploit run https://target.example --dry-run

Outputs land in results/<target>/findings.json and reports/, and the RL state updates in data/rl_state.json.

Web dashboard

A zero-dependency (Python stdlib only) dashboard — no npm, no build step:

python3 webgui/server.py        # → http://127.0.0.1:8787

Tabs:

Run — multi-target input, backend + provider + model pickers (40 models across CLI and API providers), verbosity, RL/MCP toggles, a live execution console (shows the exact backend command and per-task activity), and findings with screenshots.
Agents — browse all 213 agents and add new .md agents from the UI; the main orchestrator picks them up on the next run.
Insights — interactive chart of RL agent weights + findings by severity.
Reports — download/preview the PDF + HTML reports (Typst engine).
Settings · API — execution mode (CLI vs API), per-provider API keys, orchestrator selection, default verbosity.

It calls neurosploit_agent directly. The previous React app and FastAPI backend were retired to legacy/ (frontend_react/, backend_fastapi/).

Backends

Backend	Binary	Autonomy flag	Subscription
Claude Code	`claude`	`--dangerously-skip-permissions`	✅ via Claude login
Codex CLI	`codex`	`--dangerously-bypass-approvals-and-sandbox`	—
Grok CLI	`grok`	`--yolo`	—

The engine auto-detects installed backends and only offers those. In the interactive flow, answering yes to "Use Claude subscription" runs Claude Code against your logged-in subscription instead of an API key.

Models

Latest models per provider live in neurosploit_agent/models.py, including the NVIDIA NIM provider (PR #28, OpenAI-compatible at https://integrate.api.nvidia.com/v1, nvapi- keys), Anthropic Claude 4.x, OpenAI, xAI Grok, Gemini, OpenRouter, and local Ollama.

Reinforcement learning

Every run produces per-agent reward signals (meta/rl_feedback + neurosploit_agent/rl.py): validated findings reward an agent (weighted by severity), rejected false positives penalize it, correct skips stay neutral. Weights are bounded [0.05, 1.0] and carry per-tech-stack affinity, so the engine learns, e.g., to prioritize ssti_jinja2 on Flask targets. State is explainable and persisted to data/rl_state.json.

Safety & authorization

NeuroSploit is for authorized security testing only. Every agent's system prompt enforces scope and proof-of-exploitation; DoS-class agents refuse to flood and require explicit rules-of-engagement. You are responsible for having written permission for any target you point it at.

Repository layout

neurosploit                 # launcher (./neurosploit)
neurosploit_agent/          # the v3.3.0 engine
  cli.py  orchestrator.py  agent_loader.py  backends.py  rl.py  mcp.py  models.py  config.py
agents_md/
  vulns/   (196)            # vulnerability specialist agents
  meta/    (17)             # orchestrator, recon, validator, scorers, reporter, RL, roles
  REGISTRY.md               # generated index
scripts/build_agents.py     # data-driven agent builder
legacy/                     # retired pre-v3.3.0 Python orchestration

See RELEASE.md for the full v3.3.0 changelog.

License

MIT.

9.8 KiB Executable File Raw Blame History