mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
docs: CHANGELOG hardening section + TODOS mark Read/Glob/Grep shipped
CHANGELOG v1.4.0.0 gains a "Hardening during ship" subsection covering the 4 adversarial-review fixes landed after the initial bump (canary split, snapshot envelope, tool-output single-layer BLOCK, Haiku tool-output context). Test count updated 243 → 280 to reflect the source-contracts + adversarial-fix regression suites.

TODOS: Read/Glob/Grep tool-output scan marked SHIPPED (was P2 open). Cross-references the hardening commits so follow-up readers see the full arc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
+12
-1
@@ -22,7 +22,7 @@ If an attack fires, a centered alert-heavy banner appears, "Session terminated,
 | Model size shipped | 0 | **22MB** (TestSavantAI BERT-small, int8 quantized) |
 | Optional ensemble model | none | **721MB DeBERTa-v3** (opt-in via `GSTACK_SECURITY_ENSEMBLE=deberta`) |
 | BLOCK decision rule | none | **2-of-2 ML agreement** (or 2-of-3 with ensemble), prevents single-classifier false positives from killing sessions |
-| Tests covering security surface | 12 | **243** (25 foundation + 23 adversarial + 10 integration + 9 classifier + 7 Playwright + 3 bench + 6 bun-native + others) |
+| Tests covering security surface | 12 | **280** (25 foundation + 23 adversarial + 10 integration + 9 classifier + 7 Playwright + 3 bench + 6 bun-native + 15 source-contracts + 11 adversarial-fix regressions + others) |
 | Attack telemetry aggregation | local file only | **community-pulse edge function + gstack-security-dashboard CLI** |

 ### What actually ships
@@ -38,6 +38,17 @@ If an attack fires, a centered alert-heavy banner appears, "Session terminated,
 * **Live Playwright integration test** pins the L1 through L6 defense-in-depth contract
 * **Bun-native classifier research skeleton** plus design doc — WordPiece tokenizer matching transformers.js output, benchmark harness, FFI roadmap for future 5ms native inference

+### Hardening during ship
+
+Two independent adversarial reviewers (Claude subagent and Codex/gpt-5.4) converged on four bypass paths. All four fixed before merge:
+
+* **Canary stream-chunk split** — rolling-buffer detection across consecutive `text_delta` and `input_json_delta` events. Previously `.includes()` ran per-chunk, so an attacker could ask Claude to emit the canary split across two deltas and evade the check.
+* **Snapshot command bypass** — `$B snapshot` emits ARIA-name output from the page, but was missing from `PAGE_CONTENT_COMMANDS`, so malicious aria-labels flowed to Claude without the trust-boundary envelope every other read path gets.
+* **Tool-output single-layer BLOCK** — `combineVerdict` now accepts `{ toolOutput: true }`. On tool-result scans the Stack Overflow FP concern doesn't apply (content wasn't user-authored), so a single ML classifier at BLOCK threshold now blocks directly instead of degrading to WARN.
+* **Transcript classifier tool-output context** — Haiku previously saw only `user_message + tool_calls` (empty input) on tool-result scans, so only testsavant_content got a signal. Now receives the actual tool output text and can vote.
+
+Also: attribute-injection fix in `escapeHtml` (escapes `"` and `'` now), `GSTACK_SECURITY_OFF=1` is now a real gate in `loadTestsavant`/`loadDeberta` (not just a doc promise), device salt cached in-process so FS-unwritable environments don't break hash correlation, tool-use registry entries evicted on `tool_result` (memory leak fix), dashboard uses `jq` for brace-balanced JSON parse when available.
+
 ### Env knobs

 * `GSTACK_SECURITY_OFF=1` — emergency kill switch (canary still injected, ML skipped)
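Reviewer note on the canary stream-chunk split fix: the hunk above describes rolling-buffer detection but does not show code. The following is a minimal sketch of that idea, not the gstack implementation. The class name `CanaryScanner`, its `feed` method, and the sample canary value are hypothetical, and it assumes one scanner instance per streamed response.

```typescript
// Hypothetical sketch of rolling-buffer canary detection.
// CanaryScanner and feed() are illustrative names, not gstack's API.
class CanaryScanner {
  private readonly canary: string;
  private tail = "";

  constructor(canary: string) {
    if (canary.length === 0) throw new Error("canary must be non-empty");
    this.canary = canary;
  }

  // Returns true when the canary appears in the retained tail plus this chunk.
  feed(chunk: string): boolean {
    const window = this.tail + chunk;
    if (window.includes(this.canary)) return true;
    // Retain the last (canary.length - 1) characters: the longest canary
    // prefix that a later chunk could still complete.
    const keep = this.canary.length - 1;
    this.tail = keep > 0 ? window.slice(-keep) : "";
    return false;
  }
}

// A per-chunk check misses a canary split across two deltas.
function perChunkCheck(chunks: string[], canary: string): boolean {
  return chunks.some((c) => c.includes(canary));
}
```

With the chunks `"xx CANA"` and `"RY-1234 yy"`, `perChunkCheck` finds nothing, while feeding the same chunks to one `CanaryScanner("CANARY-1234")` reports a hit on the second chunk. Retaining only `canary.length - 1` characters keeps memory bounded regardless of stream length.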
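Reviewer note on the `escapeHtml` attribute-injection fix: the changelog says quotes are now escaped but the function body is not part of this diff. A conventional version looks roughly like the sketch below. The name `escapeHtmlSketch` is hypothetical, and the exact entities the project emits are an assumption.

```typescript
// Illustrative sketch only; gstack's actual escapeHtml is not shown in this diff.
function escapeHtmlSketch(input: string): string {
  return input
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;") // without this, a value can close a double-quoted attribute
    .replace(/'/g, "&#39;"); // same for single-quoted attributes
}
```

Without the last two replacements, a value such as `" onmouseover="x` interpolated into `title="..."` would terminate the attribute and introduce a new one, which is the attribute-injection path the changelog entry refers to.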
@@ -305,18 +305,20 @@ enabled. Default behavior unchanged (2-of-2 testsavant + transcript).
 #### ~~TestSavantAI + DeBERTa-v3 ensemble~~ — SHIPPED opt-in (see entry above)

-#### Read/Glob/Grep tool-output injection coverage (P2)
+#### ~~Read/Glob/Grep tool-output injection coverage (P2)~~ — SHIPPED

-**What:** Scan content entering Claude's context via Read, Glob, Grep tools in addition to
-browse commands. Codex flagged this in CEO review: "untrusted repo content read via
-Read/Glob/Grep enters Claude's context."
+Commits f2e80dd7 + 0098d574: sidebar-agent.ts now scans tool outputs from
+Read, Glob, Grep, WebFetch, and Bash via `SCANNED_TOOLS` set. Content >= 32
+chars runs through the ML ensemble; BLOCK verdict kills the session and
+emits security_event. The content-security.ts envelope path was already
+wrapping browse-command output; this extension closes the non-browse path
+Codex flagged.

-**Why:** The sidebar agent has access to Read/Glob/Grep tools. If a project has a file
-with injected instructions, Claude reads it and acts on it — the content-security.ts
-envelope wrapping doesn't fire on non-browse-output paths.
-
-**Effort:** M (human: ~1w / CC: ~2h)
-**Priority:** P2
+During /ship for v1.4.0.0 this path got additional hardening (commit
+407c36b4 + 88b12c2b + c51ebdf4): transcript classifier now receives the
+tool output text (was empty before), and combineVerdict accepts a
+`toolOutput: true` opt that blocks on a single ML classifier at BLOCK
+threshold (user-input default unchanged for SO-FP mitigation).

 #### ~~Adversarial + integration + smoke-bench test suites (P1)~~ — SHIPPED
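Reviewer note on the `toolOutput: true` option: the decision rule described in the TODOS entry can be sketched as below. This is an illustration of the rule as worded in the diff, not the project's `combineVerdict`. The function name `combineVerdictSketch`, the `Verdict` labels (in particular `SAFE`), and passing votes as an array are all assumptions.

```typescript
// Hypothetical sketch of the BLOCK rule; real signature and labels may differ.
type Verdict = "SAFE" | "WARN" | "BLOCK";

interface CombineOptions {
  // True when the scanned text came from a tool result, not from the user.
  toolOutput?: boolean;
}

function combineVerdictSketch(votes: Verdict[], opts: CombineOptions = {}): Verdict {
  const blocks = votes.filter((v) => v === "BLOCK").length;
  // Agreement between at least two classifiers always blocks.
  if (blocks >= 2) return "BLOCK";
  // A lone BLOCK vote degrades to WARN on user input (false-positive
  // mitigation), but blocks directly on tool output.
  if (blocks === 1) return opts.toolOutput ? "BLOCK" : "WARN";
  return votes.includes("WARN") ? "WARN" : "SAFE";
}
```

Under these assumptions, `combineVerdictSketch(["BLOCK", "SAFE"])` yields `WARN`, while the same votes with `{ toolOutput: true }` yield `BLOCK`. The user-input default is left unchanged, matching the note in the hunk above.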