mirror of
https://github.com/CyberSecurityUP/NeuroSploit.git
synced 2026-06-30 07:15:30 +02:00
a5badefc29
Engine:
- Fix: inject IS_SANDBOX=1 so Claude Code's --dangerously-skip-permissions
works under root (real backend runs were exiting rc=1 immediately)
- models: expand to 40 models / 13 providers, tagged CLI vs API
(NVIDIA NIM, DeepSeek, Mistral, Qwen/DashScope, Groq, Together, OpenRouter,
Ollama, Gemini) — Qwen/DeepSeek/Llama usable via API
- backends: on_start callback surfaces the exact argv ("what runs behind it")
- orchestrator: require a Playwright screenshot per confirmed finding; collect
results/activity.json; auto-generate reports after a run
- report.py: HTML always + PDF via Typst engine (.typ source emitted too)
Web dashboard (webgui/, stdlib only — no npm/build):
- Sidebar dashboard (PentAGI-style): Run / Agents / Insights / Reports / Settings
- Multi-target runs; live execution console + per-task activity; finding cards
with screenshots; backend+provider+model pickers (CLI & API)
- Agents tab: browse 213 + add new .md agents from the UI
- Insights: interactive RL-weight + severity charts
- Reports: download/preview PDF + HTML
- Settings/API: execution mode, per-provider API keys, orchestrator, verbosity
- Endpoints: /api/agents (GET/POST), /api/rl, /api/config, /api/reports,
/reports/* + /shots/* static serving
Cleanup: retire replaced web stack (frontend React, FastAPI backend, core
orchestration, old test) to legacy/. Active engine + GUI are fully standalone.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
202 lines
8.9 KiB
Markdown
Executable File
202 lines
8.9 KiB
Markdown
Executable File
# NeuroSploit v3.3.0
|
|
|
|

|
|

|
|

|
|

|
|

|
|

|
|
|
|
**Autonomous, markdown-driven AI penetration testing.**
|
|
|
|
NeuroSploit v3.3.0 is a ground-up re-model of the pentest agent. Instead of a
|
|
monolithic Python orchestrator, it is now a **lean engine that turns a URL into
|
|
an autonomous engagement**: it composes a master prompt from a curated library
|
|
of **213 markdown agents** and hands execution to whichever **agentic CLI
|
|
backend** you have installed — **Claude Code, Codex, or Grok CLI** (or a Claude
|
|
subscription) — augmented with **Playwright MCP** for real browser-based proof,
|
|
and a **reinforcement-learning** loop that gets smarter every run.
|
|
|
|
> The previous Python orchestration now lives in [`legacy/`](legacy/README.md).
|
|
|
|
---
|
|
|
|
## Why this architecture
|
|
|
|
| Old (≤ v3.2.4) | New (v3.3.0) |
|
|
|----------------|-------------|
|
|
| 2,500-line Python orchestrator + hand-coded agent classes | Markdown agents + thin engine |
|
|
| One embedded LLM loop | Pluggable agentic CLI backends (Claude/Codex/Grok) |
|
|
| Provider SDK juggling | Backend owns the agent loop; engine just composes & collects |
|
|
| Static agent list | RL-weighted, recon-aware agent selection |
|
|
| Reflection-based "evidence" | Playwright MCP proof-of-execution + adversarial validation |
|
|
|
|
---
|
|
|
|
## How it works
|
|
|
|
```
|
|
┌──────────────────────────────────────────────────────────────┐
|
|
URL ──▶ │ neurosploit (terminal) │
|
|
│ │ │
|
|
│ ▼ │
|
|
│ orchestrator ── loads agents_md/ (213) ── applies RL weights │
|
|
│ │ │
|
|
│ ▼ composes ONE master prompt │
|
|
│ backend (Claude Code | Codex | Grok) ◀── Playwright MCP │
|
|
│ │ autonomously runs the pipeline below │
|
|
│ ▼ │
|
|
│ recon → select agents → exploit → VALIDATE → filter FPs │
|
|
│ → severity → impact → report → RL feedback │
|
|
└──────────────────────────────────────────────────────────────┘
|
|
│ │
|
|
▼ ▼
|
|
results/findings.json data/rl_state.json (learns)
|
|
```
|
|
|
|
The engine never fabricates findings: every candidate is independently
|
|
re-exploited (`meta/exploit_validator`), run through an adversarial skeptic
|
|
(`meta/false_positive_filter`), and only then scored and reported.
|
|
|
|
---
|
|
|
|
## The agent library (`agents_md/`)
|
|
|
|
**213 agents** — see [`agents_md/REGISTRY.md`](agents_md/REGISTRY.md).
|
|
|
|
- **196 vulnerability specialists** (`agents_md/vulns/`) — each a self-contained
|
|
playbook with a real methodology, payloads, CWE mapping, and a strict
|
|
anti-false-positive `## System Prompt`. Coverage includes the classic OWASP
|
|
web set **plus modern classes**:
|
|
- **LLM/AI security** (OWASP LLM Top 10): prompt injection (direct/indirect),
|
|
jailbreak, system-prompt leak, insecure output handling, RAG poisoning,
|
|
tool-invocation/function-calling abuse, excessive agency, PII leakage…
|
|
- **Cloud/K8s/containers**: IMDS SSRF (AWS/GCP/Azure), kubelet/dashboard
|
|
exposure, container & docker-socket escape, bucket takeover, IAM privesc…
|
|
- **Modern API/auth**: JWT alg/kid/jwk confusion, OAuth PKCE downgrade, SAML
|
|
XSW, OIDC, CSWSH, refresh-token & MFA bypass, account-takeover chains…
|
|
- **Advanced injection**: SSTI (Jinja2/FreeMarker/Velocity/Thymeleaf), SSPP,
|
|
XXE OOB, YAML/pickle deserialization, JNDI, XSLT…
|
|
- **Protocol/cache/smuggling**: HTTP/2 & CL.TE/TE.CL desync, h2c, web cache
|
|
deception/poisoning, response splitting, path-confusion…
|
|
- **Logic/crypto/supply-chain**: dependency confusion, padding oracle, weak
|
|
JWT secret, price/coupon/workflow abuse, exposed `.git`/`.env`/CI secrets…
|
|
|
|
- **17 meta-agents** (`agents_md/meta/`): `orchestrator`, `recon`,
|
|
`exploit_validator`, `false_positive_filter`, `severity_assessor`,
|
|
`impact_evaluator`, `reporter`, `rl_feedback`, plus migrated expert roles.
|
|
|
|
Add your own by dropping a `.md` into `agents_md/vulns/` (or extend the
|
|
data-driven builder, `scripts/build_agents.py`). It is picked up automatically.
|
|
|
|
---
|
|
|
|
## Quickstart
|
|
|
|
```bash
|
|
# 1. Have at least one agentic CLI installed: Claude Code, Codex, or Grok CLI
|
|
# (Playwright MCP needs Node/npx)
|
|
./neurosploit backends # show what's detected
|
|
./neurosploit agents # {'vulns': 196, 'meta': 17, 'total': 213}
|
|
|
|
# 2. Interactive: enter a URL, pick a backend + model, go
|
|
./neurosploit
|
|
|
|
# 3. Or one-shot:
|
|
./neurosploit run https://target.example \
|
|
--backend claude --model claude-opus-4-8 \
|
|
--collaborator oob.your-collab.net
|
|
|
|
# 4. Preview the composed master prompt without executing the backend:
|
|
./neurosploit run https://target.example --dry-run
|
|
```
|
|
|
|
Outputs land in `results/<target>/findings.json` and `reports/`, and the RL
|
|
state updates in `data/rl_state.json`.
|
|
|
|
### Web dashboard
|
|
|
|
A zero-dependency (Python stdlib only) dashboard — no npm, no build step:
|
|
|
|
```bash
|
|
python3 webgui/server.py # → http://127.0.0.1:8787
|
|
```
|
|
|
|
Tabs:
|
|
- **Run** — multi-target input, backend + provider + model pickers (40 models
|
|
across CLI and API providers), verbosity, RL/MCP toggles, a live execution
|
|
console (shows the exact backend command and per-task activity), and findings
|
|
with screenshots.
|
|
- **Agents** — browse all 213 agents and **add new `.md` agents** from the UI;
|
|
the main orchestrator picks them up on the next run.
|
|
- **Insights** — interactive chart of RL agent weights + findings by severity.
|
|
- **Reports** — download/preview the **PDF + HTML** reports (Typst engine).
|
|
- **Settings · API** — execution mode (CLI vs API), per-provider API keys,
|
|
orchestrator selection, default verbosity.
|
|
|
|
It calls `neurosploit_agent` directly. The previous React app and FastAPI backend
|
|
were retired to `legacy/` (`frontend_react/`, `backend_fastapi/`).
|
|
|
|
### Backends
|
|
|
|
| Backend | Binary | Autonomy flag | Subscription |
|
|
|---------|--------|---------------|--------------|
|
|
| Claude Code | `claude` | `--dangerously-skip-permissions` | ✅ via Claude login |
|
|
| Codex CLI | `codex` | `--dangerously-bypass-approvals-and-sandbox` | — |
|
|
| Grok CLI | `grok` | `--yolo` | — |
|
|
|
|
The engine auto-detects installed backends and only offers those. In the
|
|
interactive flow, answering **yes** to "Use Claude subscription" runs Claude Code
|
|
against your logged-in subscription instead of an API key.
|
|
|
|
### Models
|
|
|
|
Latest models per provider live in `neurosploit_agent/models.py`, including the
|
|
**NVIDIA NIM** provider (PR #28, OpenAI-compatible at
|
|
`https://integrate.api.nvidia.com/v1`, `nvapi-` keys), Anthropic Claude 4.x,
|
|
OpenAI, xAI Grok, Gemini, OpenRouter, and local Ollama.
|
|
|
|
---
|
|
|
|
## Reinforcement learning
|
|
|
|
Every run produces per-agent reward signals (`meta/rl_feedback` +
|
|
`neurosploit_agent/rl.py`): validated findings reward an agent (weighted by
|
|
severity), rejected false positives penalize it, correct skips stay neutral.
|
|
Weights are bounded `[0.05, 1.0]` and carry per-tech-stack affinity, so the
|
|
engine learns, e.g., to prioritize `ssti_jinja2` on Flask targets. State is
|
|
explainable and persisted to `data/rl_state.json`.
|
|
|
|
---
|
|
|
|
## Safety & authorization
|
|
|
|
NeuroSploit is for **authorized** security testing only. Every agent's system
|
|
prompt enforces scope and proof-of-exploitation; DoS-class agents refuse to
|
|
flood and require explicit rules-of-engagement. You are responsible for having
|
|
written permission for any target you point it at.
|
|
|
|
---
|
|
|
|
## Repository layout
|
|
|
|
```
|
|
neurosploit # launcher (./neurosploit)
|
|
neurosploit_agent/ # the v3.3.0 engine
|
|
cli.py orchestrator.py agent_loader.py backends.py rl.py mcp.py models.py config.py
|
|
agents_md/
|
|
vulns/ (196) # vulnerability specialist agents
|
|
meta/ (17) # orchestrator, recon, validator, scorers, reporter, RL, roles
|
|
REGISTRY.md # generated index
|
|
scripts/build_agents.py # data-driven agent builder
|
|
legacy/ # retired pre-v3.3.0 Python orchestration
|
|
```
|
|
|
|
See [`RELEASE.md`](RELEASE.md) for the full v3.3.0 changelog.
|
|
|
|
---
|
|
|
|
## License
|
|
|
|
MIT.
|