docs: CHANGELOG hardening section + TODOS mark Read/Glob/Grep shipped

CHANGELOG v1.4.0.0 gains a "Hardening during ship" subsection covering the 4 adversarial-review fixes landed after the initial bump (canary split, snapshot envelope, tool-output single-layer BLOCK, Haiku tool-output context). Test count updated 243 → 280 to reflect the source-contracts + adversarial-fix regression suites.

TODOS: Read/Glob/Grep tool-output scan marked SHIPPED (was P2 open). Cross-references the hardening commits so follow-up readers see the full arc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-21 09:10:11 +02:00 · 2026-04-20 11:09:22 +08:00
parent c51ebdf456
commit 375a317f5f
2 changed files with 24 additions and 11 deletions
@@ -22,7 +22,7 @@ If an attack fires, a centered alert-heavy banner appears, "Session terminated,
 | Model size shipped | 0 | **22MB** (TestSavantAI BERT-small, int8 quantized) |
 | Optional ensemble model | none | **721MB DeBERTa-v3** (opt-in via `GSTACK_SECURITY_ENSEMBLE=deberta`) |
 | BLOCK decision rule | none | **2-of-2 ML agreement** (or 2-of-3 with ensemble), prevents single-classifier false positives from killing sessions |
-| Tests covering security surface | 12 | **243** (25 foundation + 23 adversarial + 10 integration + 9 classifier + 7 Playwright + 3 bench + 6 bun-native + others) |
+| Tests covering security surface | 12 | **280** (25 foundation + 23 adversarial + 10 integration + 9 classifier + 7 Playwright + 3 bench + 6 bun-native + 15 source-contracts + 11 adversarial-fix regressions + others) |
 | Attack telemetry aggregation | local file only | **community-pulse edge function + gstack-security-dashboard CLI** |

 ### What actually ships
@@ -38,6 +38,17 @@ If an attack fires, a centered alert-heavy banner appears, "Session terminated,
 * **Live Playwright integration test** pins the L1 through L6 defense-in-depth contract
 * **Bun-native classifier research skeleton** plus design doc — WordPiece tokenizer matching transformers.js output, benchmark harness, FFI roadmap for future 5ms native inference

+### Hardening during ship
+
+Two independent adversarial reviewers (Claude subagent and Codex/gpt-5.4) converged on four bypass paths. All four were fixed before merge:
+
+* **Canary stream-chunk split** — rolling-buffer detection across consecutive `text_delta` and `input_json_delta` events. Previously `.includes()` ran per-chunk, so an attacker could ask Claude to emit the canary split across two deltas and evade the check.
+* **Snapshot command bypass** — `$B snapshot` emits ARIA-name output from the page, but was missing from `PAGE_CONTENT_COMMANDS`, so malicious aria-labels flowed to Claude without the trust-boundary envelope every other read path gets.
+* **Tool-output single-layer BLOCK** — `combineVerdict` now accepts `{ toolOutput: true }`. On tool-result scans the Stack Overflow false-positive concern doesn't apply (the content wasn't user-authored), so a single ML classifier at BLOCK threshold now blocks directly instead of degrading to WARN.
+* **Transcript classifier tool-output context** — Haiku previously saw only `user_message + tool_calls` (empty input) on tool-result scans, so only testsavant_content got a signal. It now receives the actual tool output text and can vote.
+
+Also: attribute-injection fix in `escapeHtml` (escapes `"` and `'` now), `GSTACK_SECURITY_OFF=1` is now a real gate in `loadTestsavant`/`loadDeberta` (not just a doc promise), device salt cached in-process so FS-unwritable environments don't break hash correlation, tool-use registry entries evicted on `tool_result` (memory leak fix), dashboard uses `jq` for brace-balanced JSON parse when available.
+
 ### Env knobs

 * `GSTACK_SECURITY_OFF=1` — emergency kill switch (canary still injected, ML skipped)
@@ -305,18 +305,20 @@ enabled. Default behavior unchanged (2-of-2 testsavant + transcript).

 #### ~~TestSavantAI + DeBERTa-v3 ensemble~~ — SHIPPED opt-in (see entry above)

-#### Read/Glob/Grep tool-output injection coverage (P2)
+#### ~~Read/Glob/Grep tool-output injection coverage (P2)~~ — SHIPPED

-**What:** Scan content entering Claude's context via Read, Glob, Grep tools in addition to
-browse commands. Codex flagged this in CEO review: "untrusted repo content read via
-Read/Glob/Grep enters Claude's context."
+Commits f2e80dd7 + 0098d574: sidebar-agent.ts now scans tool outputs from
+Read, Glob, Grep, WebFetch, and Bash via the `SCANNED_TOOLS` set. Content of
+32 chars or more runs through the ML ensemble; a BLOCK verdict kills the
+session and emits a security_event. The content-security.ts envelope path
+was already wrapping browse-command output; this extension closes the
+non-browse path Codex flagged.

-**Why:** The sidebar agent has access to Read/Glob/Grep tools. If a project has a file
-with injected instructions, Claude reads it and acts on it — the content-security.ts
-envelope wrapping doesn't fire on non-browse-output paths.
-
-**Effort:** M (human: ~1w / CC: ~2h)
-**Priority:** P2
+During /ship for v1.4.0.0 this path got additional hardening (commits
+407c36b4 + 88b12c2b + c51ebdf4): the transcript classifier now receives
+the tool output text (was empty before), and combineVerdict accepts a
+`toolOutput: true` option that blocks on a single ML classifier at BLOCK
+threshold (user-input default unchanged to limit Stack Overflow false positives).

 #### ~~Adversarial + integration + smoke-bench test suites (P1)~~ — SHIPPED
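
The "Canary stream-chunk split" fix in the CHANGELOG hunk above describes rolling-buffer detection across consecutive deltas. A minimal sketch of that idea follows, under stated assumptions: `RollingCanaryDetector` and its `push` method are hypothetical names for illustration, not identifiers from the gstack source, and the sketch assumes one detector instance per streamed content block. It carries over the last `canary.length - 1` characters of what it has seen, so a canary split across two or more chunks is still matched by a plain `.includes()` on the joined text.

```typescript
// Hypothetical helper for illustration; not taken from the gstack source.
class RollingCanaryDetector {
  // Suffix of previously seen text that could still begin a canary match.
  private tail = "";

  constructor(private readonly canary: string) {
    if (canary.length === 0) {
      throw new Error("canary must be non-empty");
    }
  }

  // Feed one streamed delta. Returns true if the canary appears in the
  // carried-over tail plus this chunk, i.e. even when it spans chunk boundaries.
  push(chunk: string): boolean {
    const joined = this.tail + chunk;
    if (joined.includes(this.canary)) {
      return true;
    }
    // Keep at most canary.length - 1 characters: any longer suffix would
    // already contain a full match, which the check above rules out.
    const keep = this.canary.length - 1;
    this.tail = keep > 0 ? joined.slice(-keep) : "";
    return false;
  }
}

// Usage: the canary arrives split across three deltas.
const detector = new RollingCanaryDetector("CANARY-1234");
const chunks = ["prefix CAN", "ARY-12", "34 suffix"];
const hits = chunks.map((c) => detector.push(c));
// A per-chunk check would miss the canary, since no single chunk contains it.
const perChunkHit = chunks.some((c) => c.includes("CANARY-1234"));
```

The buffer stays bounded by the canary length no matter how long the stream runs, which is the main reason to prefer it over concatenating every delta.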