Partial observability is now first-class:
- belief.rs — property-graph world model; nodes (host/service/vuln/exploit/cred)
carry a probability, not a boolean. Bayesian observation updates; per-node
Shannon entropy; mean-uncertainty + recon-frontier. Black-box = diffuse priors
that sharpen with observation; white-box collapses toward deterministic (MDP).
- pomdp.rs — value_of_information(), decide() (recon vs exploit falls out of
belief entropy), and may_assert() — the mathematical anti-hallucination gate:
no exploitability claim while the belief is diffuse (high entropy) → observe first.
- grounding.rs — verification engine, hard rule "no claim without a tool receipt":
empirical grounding for black-box (raw HTTP/OOB/error markers), symbolic for
white-box (file:line into reviewed source). Ungrounded claims demoted + flagged
receipt_missing (feeds future reward shaping).
- pipeline.finish(): grounding gate before reporting + belief-uncertainty readout.
- bump 3.5.0 → 3.5.1; README documents the v3.5.1 belief/grounding architecture
and the infra/bandit/reward roadmap.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Harness:
- Exploit-chaining round: after validation, chain confirmed findings into deeper
impact (SSRF→metadata, SQLi→dump→reuse, IDOR→ATO, file-read→secrets→RCE),
validate the new findings, merge. Wired into black-box and greybox.
- Latest top models surfaced: claude-opus-4-8, gpt-5.1/gpt-5.1-codex, gemini-3-pro.
REPL:
- Real line editing via rustyline: ↑/↓ command-history recall, Ctrl-A/E/K, paste;
Ctrl-C cancels the line, Ctrl-D exits. Command history persists to
data/repl_history.txt. Graceful plain-stdin fallback when not a TTY.
- /model with no arg → arrow-key multi-select (dialoguer); with arg accepts any
provider:model names.
- /key is model-aware: lists the providers your selected models need (set/missing)
and prompts for the missing keys; /key <prov> <key> still works.
- Run history persists to data/repl_runs.json and reloads across sessions
(/runs lists past + current; /results /report /status by run number).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- New persistent interactive session (app/src/repl.rs), launched when run with no args:
banner, model selection, API-key config (/key) or subscription (/sub), then a live
session to set /target, /repo, /auth, and free-text /focus instructions (or just type
them) that STEER which agents run and how.
- Slash-commands: /model /providers /key /sub /target /repo /auth /focus /mcp /votes
/agents /show /run /quit (+ bare text = focus).
- RunConfig gains `instructions` and `auth`:
* instructions bias both LLM agent-selection and the heuristic (focus keywords →
injection/access-control/etc. agents get a strong boost)
* operator directives (focus + auth) injected into recon and exploit prompts so agents
test as an authenticated user and prioritise the requested vuln classes
- bump 3.4.1 → 3.5.0 (CLI, harness, reports, credits)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>