From 375a317f5f75dee5c1d87d1c1c749f350b0f0966 Mon Sep 17 00:00:00 2001
From: Garry Tan
Date: Mon, 20 Apr 2026 11:09:22 +0800
Subject: [PATCH] docs: CHANGELOG hardening section + TODOS mark Read/Glob/Grep shipped
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

CHANGELOG v1.4.0.0 gains a "Hardening during ship" subsection covering
the 4 adversarial-review fixes landed after the initial bump (canary
split, snapshot envelope, tool-output single-layer BLOCK, Haiku
tool-output context). Test count updated 243 → 280 to reflect the
source-contracts + adversarial-fix regression suites.

TODOS: Read/Glob/Grep tool-output scan marked SHIPPED (was P2 open).
Cross-references the hardening commits so follow-up readers see the
full arc.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 CHANGELOG.md | 13 ++++++++++++-
 TODOS.md     | 22 ++++++++++++----------
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d66abf7a..cf066286 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -22,7 +22,7 @@ If an attack fires, a centered alert-heavy banner appears, "Session terminated,
 | Model size shipped | 0 | **22MB** (TestSavantAI BERT-small, int8 quantized) |
 | Optional ensemble model | none | **721MB DeBERTa-v3** (opt-in via `GSTACK_SECURITY_ENSEMBLE=deberta`) |
 | BLOCK decision rule | none | **2-of-2 ML agreement** (or 2-of-3 with ensemble), prevents single-classifier false positives from killing sessions |
-| Tests covering security surface | 12 | **243** (25 foundation + 23 adversarial + 10 integration + 9 classifier + 7 Playwright + 3 bench + 6 bun-native + others) |
+| Tests covering security surface | 12 | **280** (25 foundation + 23 adversarial + 10 integration + 9 classifier + 7 Playwright + 3 bench + 6 bun-native + 15 source-contracts + 11 adversarial-fix regressions + others) |
 | Attack telemetry aggregation | local file only | **community-pulse edge function + gstack-security-dashboard CLI** |
 
 ### What actually ships
@@ -38,6 +38,17 @@ If an attack fires, a centered alert-heavy banner appears, "Session terminated,
 * **Live Playwright integration test** pins the L1 through L6 defense-in-depth contract
 * **Bun-native classifier research skeleton** plus design doc — WordPiece tokenizer matching transformers.js output, benchmark harness, FFI roadmap for future 5ms native inference
 
+### Hardening during ship
+
+Two independent adversarial reviewers (Claude subagent and Codex/gpt-5.4) converged on four bypass paths. All four were fixed before merge:
+
+* **Canary stream-chunk split** — rolling-buffer detection across consecutive `text_delta` and `input_json_delta` events. Previously `.includes()` ran per-chunk, so an attacker could ask Claude to emit the canary split across two deltas and evade the check.
+* **Snapshot command bypass** — `$B snapshot` emits ARIA-name output from the page, but was missing from `PAGE_CONTENT_COMMANDS`, so malicious aria-labels flowed to Claude without the trust-boundary envelope every other read path gets.
+* **Tool-output single-layer BLOCK** — `combineVerdict` now accepts `{ toolOutput: true }`. On tool-result scans the Stack Overflow FP concern doesn't apply (content wasn't user-authored), so a single ML classifier at BLOCK threshold now blocks directly instead of degrading to WARN.
+* **Transcript classifier tool-output context** — Haiku previously saw only `user_message + tool_calls` (empty input) on tool-result scans, so only testsavant_content got a signal. It now receives the actual tool output text and can vote.
+
+Also: attribute-injection fix in `escapeHtml` (now escapes `"` and `'`), `GSTACK_SECURITY_OFF=1` is now a real gate in `loadTestsavant`/`loadDeberta` (not just a doc promise), device salt cached in-process so FS-unwritable environments don't break hash correlation, tool-use registry entries evicted on `tool_result` (memory leak fix), dashboard uses `jq` for brace-balanced JSON parse when available.
+
 ### Env knobs
 
 * `GSTACK_SECURITY_OFF=1` — emergency kill switch (canary still injected, ML skipped)
diff --git a/TODOS.md b/TODOS.md
index 05557d9a..6cd633a1 100644
--- a/TODOS.md
+++ b/TODOS.md
@@ -305,18 +305,20 @@ enabled. Default behavior unchanged (2-of-2 testsavant + transcript).
 
 #### ~~TestSavantAI + DeBERTa-v3 ensemble~~ — SHIPPED opt-in (see entry above)
 
-#### Read/Glob/Grep tool-output injection coverage (P2)
+#### ~~Read/Glob/Grep tool-output injection coverage (P2)~~ — SHIPPED
 
-**What:** Scan content entering Claude's context via Read, Glob, Grep tools in addition to
-browse commands. Codex flagged this in CEO review: "untrusted repo content read via
-Read/Glob/Grep enters Claude's context."
+Commits f2e80dd7 + 0098d574: sidebar-agent.ts now scans tool outputs from
+Read, Glob, Grep, WebFetch, and Bash via `SCANNED_TOOLS` set. Content >= 32
+chars runs through the ML ensemble; BLOCK verdict kills the session and
+emits security_event. The content-security.ts envelope path was already
+wrapping browse-command output; this extension closes the non-browse path
+Codex flagged.
 
-**Why:** The sidebar agent has access to Read/Glob/Grep tools. If a project has a file
-with injected instructions, Claude reads it and acts on it — the content-security.ts
-envelope wrapping doesn't fire on non-browse-output paths.
-
-**Effort:** M (human: ~1w / CC: ~2h)
-**Priority:** P2
+During /ship for v1.4.0.0 this path got additional hardening (commits
+407c36b4 + 88b12c2b + c51ebdf4): transcript classifier now receives the
+tool output text (was empty before), and combineVerdict accepts a
+`toolOutput: true` opt that blocks on a single ML classifier at BLOCK
+threshold (user-input default unchanged for SO-FP mitigation).
 
 #### ~~Adversarial + integration + smoke-bench test suites (P1)~~ — SHIPPED
 
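
The canary stream-chunk split fix in the CHANGELOG hunk above replaces a per-chunk `.includes()` with a check that spans consecutive deltas. A minimal TypeScript sketch of that idea, assuming deltas arrive as plain strings; `CanaryStreamDetector` and `push` are hypothetical names, not the sidebar agent's actual code. Carrying the last `canary.length - 1` characters into the next check is enough, because any occurrence that finishes in the new chunk must start within that many characters before it.

```typescript
// Hypothetical sketch: catch a canary token that a stream splits across
// consecutive deltas. Names are illustrative, not gstack's actual API.
class CanaryStreamDetector {
  private readonly canary: string;
  // Trailing characters carried over from earlier chunks.
  private tail = "";

  constructor(canary: string) {
    if (canary.length === 0) throw new Error("canary must be non-empty");
    this.canary = canary;
  }

  /** Feed one delta; returns true if the canary completes in this window. */
  push(chunk: string): boolean {
    const buf = this.tail + chunk;
    if (buf.includes(this.canary)) return true;
    // Any canary occurrence that ends in the next chunk starts at most
    // canary.length - 1 characters before it, so that is all we keep.
    const keep = this.canary.length - 1;
    this.tail = keep > 0 ? buf.slice(-keep) : "";
    return false;
  }
}
```

Feeding `"here is CAN"`, `"ARY-12"`, `"34 done"` into a detector for `"CANARY-1234"` fires on the third chunk, while checking each chunk on its own never does. Whether gstack keeps one buffer per event type or a shared one across `text_delta` and `input_json_delta` is not stated in the patch; one detector per stream is an assumption of this sketch.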
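
The TODOS entry and the "Tool-output single-layer BLOCK" bullet together describe a gate (which tool outputs get scanned) and a verdict rule (how classifier votes combine). A minimal sketch under stated assumptions: `SCANNED_TOOLS`, the five tool names, the 32-character floor, `combineVerdict`, and the `toolOutput` option come from the patch text, but the `Verdict` type, the `SAFE` level, `shouldScanToolOutput`, and the vote-array signature are hypothetical and need not match sidebar-agent.ts.

```typescript
// Hypothetical sketch of the tool-output gate and verdict rule. Only the
// names SCANNED_TOOLS, combineVerdict, and toolOutput come from the patch;
// the types and signatures here are illustrative.
type Verdict = "SAFE" | "WARN" | "BLOCK";

const SCANNED_TOOLS = new Set(["Read", "Glob", "Grep", "WebFetch", "Bash"]);
const MIN_SCAN_CHARS = 32;

// Only outputs from listed tools, and only non-trivial ones, are classified.
function shouldScanToolOutput(toolName: string, output: string): boolean {
  return SCANNED_TOOLS.has(toolName) && output.length >= MIN_SCAN_CHARS;
}

interface CombineOpts {
  toolOutput?: boolean;
}

// Each vote is one classifier's individual verdict.
function combineVerdict(votes: Verdict[], opts: CombineOpts = {}): Verdict {
  const blocks = votes.filter((v) => v === "BLOCK").length;
  // Default rule: two classifiers must agree (2-of-2, or 2-of-3 with ensemble).
  if (blocks >= 2) return "BLOCK";
  // Tool output is not user-authored, so one classifier at BLOCK suffices.
  if (blocks === 1 && opts.toolOutput) return "BLOCK";
  // A lone BLOCK on user input degrades to WARN, as does any WARN vote.
  if (blocks === 1 || votes.includes("WARN")) return "WARN";
  return "SAFE";
}
```

Counting BLOCK votes rather than hard-coding two classifiers lets the same rule express both 2-of-2 and the 2-of-3 ensemble variant from the CHANGELOG table.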
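
On the `escapeHtml` item in the "Also:" paragraph: escaping only `&`, `<`, and `>` is enough for text nodes, but a value interpolated into a quoted attribute can still close the quote and add its own attributes. A minimal sketch of an escaper that also covers `"` and `'`, assuming a plain string-to-string signature; it is not gstack's actual implementation.

```typescript
// Sketch of an HTML escaper that is also safe inside quoted attributes.
// '&' is replaced first so the entities added afterwards are not re-escaped.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

With this version, `<span title="${escapeHtml(label)}">` stays a single attribute even when `label` is `" onmouseover="alert(1)`.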