NeuroSploit/agents_md/meta/rl_feedback.md
CyberSecurityUP 55af0d4634 NeuroSploit v3.3.0 — Autonomous MD-Agent Engine
Re-model the pentest agent into an autonomous, markdown-driven engine that
turns a URL into a full engagement and delegates execution to a locally
installed agentic CLI backend.

Engine (neurosploit_agent/ + ./neurosploit launcher):
- orchestrator composes ONE master prompt from the agent library + RL weights
- backends: auto-detect & drive Claude Code / Codex / Grok CLI (+ Claude
  subscription); headless, autonomous, isolated workdir
- mcp: Playwright MCP (.mcp.json) for browser-based proof-of-execution
- rl: bounded per-agent reinforcement-learning weights w/ per-tech affinity,
  persisted to data/rl_state.json
- models: latest registry incl. NVIDIA NIM provider (PR #28)
- cli: interactive URL prompt + one-shot `run`, `backends`, `agents`, --dry-run

Agent library (agents_md/, 213 total):
- 196 vuln specialists incl. modern LLM/AI, cloud/K8s, API/auth, advanced
  injection, protocol smuggling, logic/crypto/supply-chain classes
- 17 meta-agents: orchestrator, recon, exploit_validator,
  false_positive_filter, severity_assessor, impact_evaluator, reporter,
  rl_feedback + migrated expert roles
- scripts/build_agents.py data-driven builder; REGISTRY.md index

Docs: rewritten README.md, v3.3.0 RELEASE.md, .env.example (NVIDIA NIM, xAI,
engine vars).

Retire legacy Python orchestration (neurosploit.py + agent classes) to legacy/.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 20:57:38 -03:00


RL Feedback Agent

Meta-agent. Closes the reinforcement-learning loop: turns the run's outcomes into per-agent reward signals that bias future agent selection. Runs at the very end.

User Prompt

Emit reinforcement-learning feedback for this run against {target}.

Per-agent run outcomes: {agent_outcomes_json}

Validated findings: {findings_json}

Previous RL state: {rl_state_json}
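The {target}, {agent_outcomes_json}, {findings_json} and {rl_state_json} slots have to be filled before the prompt is sent. This file does not show how the engine does that. A naive `str.format` over the whole file would fail on the literal JSON braces in step 4. The sketch below is one minimal single-pass approach. `render_prompt` is a hypothetical helper, not part of NeuroSploit.

```python
import re

# Matches {name} slots such as {target} or {findings_json}. Literal JSON
# like {"version": 1} does not match because '"' is not a word character.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_prompt(template: str, values: dict) -> str:
    """Fill known {placeholders} in one pass and leave everything else intact."""
    def _fill(match) -> str:
        key = match.group(1)
        # Unknown placeholders are returned unchanged rather than raising.
        return str(values[key]) if key in values else match.group(0)

    return _PLACEHOLDER.sub(_fill, template)
```

Callers would pass the JSON slots through `json.dumps` first. Because substitution happens in a single pass, a value that happens to contain text like {target} is not expanded a second time.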

METHODOLOGY:

1. Compute per-agent reward

For each agent that ran, compute a reward in [-1, 1]:

  • Positive (+) for each VALIDATED finding it produced (weighted by severity: Critical 1.0, High 0.7, Medium 0.4, Low 0.2).
  • Negative (-) for each false positive it generated that was later rejected (penalty 0.3 each).
  • Small negative (-) for token/time cost with zero yield (to encourage skipping irrelevant agents).
  • 0 (neutral) when correctly skipped due to no applicable surface.
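The reward rules in step 1 can be sketched as a small Python function. The severity weights and the 0.3 false-positive penalty come from the text. Several details are assumptions for illustration: the zero-yield penalty size (0.05), the `AgentOutcome` record and its field names, and clipping the summed reward to [-1, 1].

```python
from dataclasses import dataclass, field
from typing import List

# Severity weights and false-positive penalty as listed in step 1.
SEVERITY_WEIGHT = {"Critical": 1.0, "High": 0.7, "Medium": 0.4, "Low": 0.2}
FP_PENALTY = 0.3
# Hypothetical value: the methodology only says this penalty is "small".
ZERO_YIELD_PENALTY = 0.05


@dataclass
class AgentOutcome:
    """Hypothetical per-agent outcome record (field names are illustrative)."""
    skipped: bool = False  # correctly skipped: no applicable surface
    validated_severities: List[str] = field(default_factory=list)
    false_positives: int = 0  # findings later rejected as false positives


def compute_reward(outcome: AgentOutcome) -> float:
    """Return one agent's reward for this run, clipped to [-1, 1]."""
    if outcome.skipped:
        return 0.0  # neutral for a correct skip
    reward = sum(SEVERITY_WEIGHT.get(sev, 0.0) for sev in outcome.validated_severities)
    reward -= FP_PENALTY * outcome.false_positives
    if not outcome.validated_severities:
        reward -= ZERO_YIELD_PENALTY  # spent tokens/time, produced nothing validated
    return max(-1.0, min(1.0, reward))
```

For example, an agent with one validated High finding, one validated Low finding, and one rejected false positive scores roughly 0.6 (0.7 + 0.2 - 0.3).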

2. Update weights (bounded)

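  • new_weight = clamp(old_weight + α · (reward - old_weight), 0.05, 1.0) with learning rate α ≈ 0.3. Each run therefore moves the weight about 30% of the way toward the latest reward, and the result stays within [0.05, 1.0].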
  • Track per-(agent, tech-stack) weights so selection adapts to the target type (e.g. boost ssti_jinja2 on Flask apps).
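Here is a minimal Python sketch of step 2, using the stated α = 0.3 and the [0.05, 1.0] bounds. The function names are hypothetical. Applying the same rule to the per-tech affinities, and starting unseen techs at 0.5, are also assumptions: the text only says weights are tracked per (agent, tech-stack).

```python
ALPHA = 0.3               # learning rate from step 2
W_MIN, W_MAX = 0.05, 1.0  # weight bounds from step 2


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def update_weight(old_weight: float, reward: float, alpha: float = ALPHA) -> float:
    """Move the weight a fraction alpha of the way toward the reward, then clamp."""
    return clamp(old_weight + alpha * (reward - old_weight), W_MIN, W_MAX)


def update_tech_affinity(affinity: dict, techs, reward: float,
                         default: float = 0.5) -> dict:
    """Apply the same bounded rule for each detected tech stack.

    Reusing update_weight here and the 0.5 starting value are assumptions.
    Returns a new dict and leaves the input unchanged.
    """
    updated = dict(affinity)
    for tech in techs:
        updated[tech] = update_weight(updated.get(tech, default), reward)
    return updated
```

The update pulls the weight toward the reward itself, so a reward of 0 (a correct skip) still lowers a positive weight. For example, 0.5 becomes 0.35. If skips are meant to be fully neutral, the caller could simply leave skipped agents' weights untouched.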

3. Update precondition hints

  • Record which recon signals correlated with this agent's success, to refine future selection (agent_loader consumes these).
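Step 3 leaves the hint format open. One hypothetical layout is a per-agent tally of how often each recon signal was present and how often the agent succeeded alongside it. The sketch below assumes that layout. The schema in step 4 has no field for these hints, so where the counts are stored, and how agent_loader reads them, are also assumptions.

```python
def record_signal_hints(hints: dict, agent: str, recon_signals, succeeded: bool) -> dict:
    """Tally co-occurrence of recon signals with this agent's success.

    Hypothetical layout: {agent: {signal: {"seen": int, "hits": int}}}.
    Mutates and returns `hints`.
    """
    per_agent = hints.setdefault(agent, {})
    for signal in set(recon_signals):  # count each signal once per run
        stats = per_agent.setdefault(signal, {"seen": 0, "hits": 0})
        stats["seen"] += 1
        if succeeded:
            stats["hits"] += 1
    return hints


def signal_hit_rate(hints: dict, agent: str, signal: str) -> float:
    """Fraction of runs with `signal` present in which `agent` succeeded."""
    stats = hints.get(agent, {}).get(signal)
    if not stats or stats["seen"] == 0:
        return 0.0
    return stats["hits"] / stats["seen"]
```

A selector could then favor agents whose hit rate is high for the signals seen in the current recon, for example ssti_jinja2 when a Flask signal is present.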

4. Output (merge into data/rl_state.json)

{
  "version": 1,
  "updated_for": "{target}",
  "agents": {
    "<agent_name>": {
      "weight": 0.0,
      "runs": 0,
      "validated_hits": 0,
      "false_positives": 0,
      "reward_last": 0.0,
      "tech_affinity": {"flask": 0.0, "node": 0.0}
    }
  }
}
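Step 4 says the output is merged into data/rl_state.json. The sketch below shows one minimal way to do that merge. It assumes the emitted entries already carry cumulative counters (the prompt supplies the previous state) and that tech affinities are merged key by key. `merge_rl_state` is hypothetical. Writing to a temporary file and swapping it in with `os.replace` avoids leaving a half-written state file if the process dies mid-write.

```python
import json
import os
import tempfile


def merge_rl_state(path: str, update: dict) -> dict:
    """Merge one run's RL output into the state file, writing atomically.

    Assumes both documents follow the schema above and that each emitted
    agent entry already holds cumulative counters. Agents missing from
    `update` keep their previous entries.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            state = json.load(fh)
    except FileNotFoundError:
        state = {"version": 1, "agents": {}}

    state["version"] = update.get("version", state.get("version", 1))
    state["updated_for"] = update.get("updated_for", state.get("updated_for"))
    agents = state.setdefault("agents", {})
    for name, entry in update.get("agents", {}).items():
        previous = agents.get(name, {})
        merged = {**previous, **entry}
        # Merge per-tech affinities key by key instead of replacing the map.
        merged["tech_affinity"] = {
            **previous.get("tech_affinity", {}),
            **entry.get("tech_affinity", {}),
        }
        agents[name] = merged

    # Write to a temp file in the same directory, then swap it into place.
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, indent=2, sort_keys=True)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # don't leave a stray temp file behind
        raise
    return state
```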

System Prompt

You are a reinforcement-learning bookkeeper. Reward agents that produced validated, high-severity findings; penalize noise; stay neutral on correct skips. Keep weights bounded and changes incremental (no wild swings from a single run). Your output deterministically updates data/rl_state.json and directly biases the next run's agent selection. Output strict JSON only.
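The system prompt requires strict JSON and bounded weights, so a caller would likely check both before merging the reply. The sketch below is one minimal check, assuming the schema from step 4. `validate_rl_output` is a hypothetical helper, not part of NeuroSploit.

```python
import json


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_rl_output(raw: str, w_min: float = 0.05, w_max: float = 1.0) -> dict:
    """Parse the agent's reply as strict JSON and check the stated bounds.

    Raises ValueError (json.JSONDecodeError is a subclass) when the reply
    contains anything besides one JSON value, or when a weight or reward
    falls outside the ranges given in the methodology.
    """
    data = json.loads(raw)  # surrounding prose makes this raise
    if not isinstance(data, dict) or not isinstance(data.get("agents"), dict):
        raise ValueError("expected a JSON object with an 'agents' mapping")
    for name, entry in data["agents"].items():
        if not isinstance(entry, dict):
            raise ValueError(f"{name}: entry must be an object")
        weight = entry.get("weight")
        if not _is_number(weight) or not w_min <= weight <= w_max:
            raise ValueError(f"{name}: weight {weight!r} outside [{w_min}, {w_max}]")
        reward = entry.get("reward_last", 0.0)
        if not _is_number(reward) or not -1.0 <= reward <= 1.0:
            raise ValueError(f"{name}: reward_last {reward!r} outside [-1, 1]")
    return data
```

The template in step 4 uses 0.0 as a placeholder weight. This check would reject that value because it sits below the 0.05 floor.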