docs: CHANGELOG hardening section + TODOS mark Read/Glob/Grep shipped

CHANGELOG v1.4.0.0 gains a "Hardening during ship" subsection covering
the 4 adversarial-review fixes landed after the initial bump (canary
split, snapshot envelope, tool-output single-layer BLOCK, Haiku
tool-output context). Test count updated 243 → 280 to reflect the
source-contracts + adversarial-fix regression suites.

TODOS: Read/Glob/Grep tool-output scan marked SHIPPED (was P2 open).
Cross-references the hardening commits so follow-up readers see the
full arc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-04-20 11:09:22 +08:00
parent c51ebdf456
commit 375a317f5f
2 changed files with 24 additions and 11 deletions
+12 -1
@@ -22,7 +22,7 @@ If an attack fires, a centered alert-heavy banner appears, "Session terminated,
| Model size shipped | 0 | **22MB** (TestSavantAI BERT-small, int8 quantized) |
| Optional ensemble model | none | **721MB DeBERTa-v3** (opt-in via `GSTACK_SECURITY_ENSEMBLE=deberta`) |
| BLOCK decision rule | none | **2-of-2 ML agreement** (or 2-of-3 with ensemble), prevents single-classifier false positives from killing sessions |
| Tests covering security surface | 12 | **243** (25 foundation + 23 adversarial + 10 integration + 9 classifier + 7 Playwright + 3 bench + 6 bun-native + others) |
| Tests covering security surface | 12 | **280** (25 foundation + 23 adversarial + 10 integration + 9 classifier + 7 Playwright + 3 bench + 6 bun-native + 15 source-contracts + 11 adversarial-fix regressions + others) |
| Attack telemetry aggregation | local file only | **community-pulse edge function + gstack-security-dashboard CLI** |
### What actually ships
@@ -38,6 +38,17 @@ If an attack fires, a centered alert-heavy banner appears, "Session terminated,
* **Live Playwright integration test** pins the L1 through L6 defense-in-depth contract
* **Bun-native classifier research skeleton** plus design doc — WordPiece tokenizer matching transformers.js output, benchmark harness, FFI roadmap for future 5ms native inference
### Hardening during ship
Two independent adversarial reviewers (Claude subagent and Codex/gpt-5.4) converged on four bypass paths. All four fixed before merge:
* **Canary stream-chunk split** — rolling-buffer detection across consecutive `text_delta` and `input_json_delta` events. Previously `.includes()` ran per-chunk, so an attacker could ask Claude to emit the canary split across two deltas and evade the check.
* **Snapshot command bypass** — `$B snapshot` emits ARIA-name output from the page, but was missing from `PAGE_CONTENT_COMMANDS`, so malicious aria-labels flowed to Claude without the trust-boundary envelope every other read path gets.
* **Tool-output single-layer BLOCK** — `combineVerdict` now accepts `{ toolOutput: true }`. On tool-result scans the Stack Overflow FP concern doesn't apply (content wasn't user-authored), so a single ML classifier at BLOCK threshold now blocks directly instead of degrading to WARN.
* **Transcript classifier tool-output context** — Haiku previously saw only `user_message + tool_calls` (empty input) on tool-result scans, so only testsavant_content got a signal. Now receives the actual tool output text and can vote.
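To make the canary stream-chunk fix concrete, here is a minimal sketch of rolling-buffer detection. It is an illustration under stated assumptions, not the code in the repo: the class name `RollingCanaryDetector` and the example canary value are hypothetical. The idea is to carry the last `canary.length - 1` characters of the stream into the next check, so a token split across two or more deltas is still found.

```typescript
// Hypothetical helper (not the project's actual implementation): finds a
// canary token even when it is split across consecutive stream deltas.
class RollingCanaryDetector {
  private readonly canary: string;
  private tail = "";

  constructor(canary: string) {
    if (canary.length === 0) throw new Error("canary must be non-empty");
    this.canary = canary;
  }

  /** Feed one delta chunk. Returns true if the canary is now visible. */
  push(chunk: string): boolean {
    const window = this.tail + chunk;
    if (window.includes(this.canary)) return true;
    // Keep only the longest suffix that could still be the start of a
    // canary completed by a later chunk.
    const keep = this.canary.length - 1;
    this.tail = keep > 0 ? window.slice(-keep) : "";
    return false;
  }
}

// Usage: a per-chunk .includes() misses this split, the rolling buffer does not.
const detector = new RollingCanaryDetector("CANARY-abc1");
const first = detector.push("prefix CANARY-");
const second = detector.push("abc1 suffix");
console.log(first, second);
```

One design note, also an assumption: keeping a separate detector instance per delta type (`text_delta` vs. `input_json_delta`) avoids stitching unrelated streams together.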
Also: attribute-injection fix in `escapeHtml` (escapes `"` and `'` now), `GSTACK_SECURITY_OFF=1` is now a real gate in `loadTestsavant`/`loadDeberta` (not just a doc promise), device salt cached in-process so FS-unwritable environments don't break hash correlation, tool-use registry entries evicted on `tool_result` (memory leak fix), dashboard uses `jq` for brace-balanced JSON parse when available.
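For the attribute-injection item, a sketch of what an escaper covering both quote characters can look like. This is an assumed shape, not the repo's `escapeHtml`: the function is named `escapeHtmlSketch` to make that clear, and the specific entities chosen (`&quot;`, `&#39;`) are one common option.

```typescript
// Hypothetical sketch: escape the five characters that matter in HTML text
// and in quoted attribute values.
const HTML_ESCAPES: Record<string, string> = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtmlSketch(input: string): string {
  // Single pass, so an "&" produced by an earlier replacement is never
  // escaped a second time.
  return input.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch] ?? ch);
}

// Without quote escaping, this payload would close the attribute and add a
// new event handler.
console.log(escapeHtmlSketch('" onmouseover="alert(1)'));
```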
### Env knobs
* `GSTACK_SECURITY_OFF=1` — emergency kill switch (canary still injected, ML skipped)
+12 -10
@@ -305,18 +305,20 @@ enabled. Default behavior unchanged (2-of-2 testsavant + transcript).
#### ~~TestSavantAI + DeBERTa-v3 ensemble~~ — SHIPPED opt-in (see entry above)
#### Read/Glob/Grep tool-output injection coverage (P2)
#### ~~Read/Glob/Grep tool-output injection coverage (P2)~~ — SHIPPED
**What:** Scan content entering Claude's context via Read, Glob, Grep tools in addition to
browse commands. Codex flagged this in CEO review: "untrusted repo content read via
Read/Glob/Grep enters Claude's context."
Commits f2e80dd7 + 0098d574: sidebar-agent.ts now scans tool outputs from
Read, Glob, Grep, WebFetch, and Bash via `SCANNED_TOOLS` set. Content >= 32
chars runs through the ML ensemble; BLOCK verdict kills the session and
emits security_event. The content-security.ts envelope path was already
wrapping browse-command output; this extension closes the non-browse path
Codex flagged.
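
A rough sketch of the gate described above, for readers who have not opened sidebar-agent.ts. Only the `SCANNED_TOOLS` name, the tool list, and the 32-char threshold come from this entry; `MIN_SCAN_CHARS` and `shouldScanToolOutput` are hypothetical names, and the real code may be structured differently.

```typescript
// Tool names listed in the entry above.
const SCANNED_TOOLS = new Set(["Read", "Glob", "Grep", "WebFetch", "Bash"]);

// Assumed constant name; the entry only states "Content >= 32 chars".
const MIN_SCAN_CHARS = 32;

// Hypothetical helper: decide whether a tool result goes to the ML ensemble.
function shouldScanToolOutput(toolName: string, output: string): boolean {
  return SCANNED_TOOLS.has(toolName) && output.length >= MIN_SCAN_CHARS;
}

console.log(shouldScanToolOutput("Read", "x".repeat(40)));
console.log(shouldScanToolOutput("Read", "short"));
```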
**Why:** The sidebar agent has access to Read/Glob/Grep tools. If a project has a file
with injected instructions, Claude reads it and acts on it — the content-security.ts
envelope wrapping doesn't fire on non-browse-output paths.
**Effort:** M (human: ~1w / CC: ~2h)
**Priority:** P2
During /ship for v1.4.0.0 this path got additional hardening (commits
407c36b4 + 88b12c2b + c51ebdf4): the transcript classifier now receives the
tool output text (was empty before), and `combineVerdict` accepts a
`toolOutput: true` option that blocks on a single ML classifier at BLOCK
threshold (user-input default unchanged for Stack Overflow FP mitigation).
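
The decision rule can be sketched as follows. This is a simplified model under assumptions: the real `combineVerdict` signature is not shown in this entry, so the sketch uses a hypothetical `combineVerdictSketch` that takes one verdict string per classifier. It encodes only the behavior described here: two BLOCK votes always block, while a lone BLOCK vote degrades to WARN unless `toolOutput` is set.

```typescript
type Verdict = "SAFE" | "WARN" | "BLOCK";

interface CombineOptions {
  // True when the scanned content is a tool result rather than user input.
  toolOutput?: boolean;
}

// Hypothetical simplification: one verdict per classifier, no confidence scores.
function combineVerdictSketch(votes: Verdict[], opts: CombineOptions = {}): Verdict {
  const blocks = votes.filter((v) => v === "BLOCK").length;
  if (blocks >= 2) return "BLOCK"; // ML agreement
  if (blocks === 1) return opts.toolOutput ? "BLOCK" : "WARN";
  return votes.includes("WARN") ? "WARN" : "SAFE";
}

// Same votes, different outcome depending on where the content came from.
console.log(combineVerdictSketch(["BLOCK", "SAFE"]));
console.log(combineVerdictSketch(["BLOCK", "SAFE"], { toolOutput: true }));
```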
#### ~~Adversarial + integration + smoke-bench test suites (P1)~~ — SHIPPED