diff --git a/CLAUDE.md b/CLAUDE.md
index fb60358e..366b1cd4 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -206,6 +206,42 @@ failure modes. The sidebar spans 5 files across 2 codebases (extension + server)
 with non-obvious ordering dependencies. The doc exists to prevent the kind of
 silent failures that come from not understanding the cross-component flow.
 
+**Sidebar security stack** (layered defense against prompt injection):
+
+| Layer | Module | Lives in |
+|-------|--------|----------|
+| L1-L3 | `content-security.ts` | both server and agent — datamarking, hidden element strip, ARIA regex, URL blocklist, envelope wrapping |
+| L4 | `security-classifier.ts` (TestSavantAI ONNX) | **sidebar-agent only** |
+| L4b | `security-classifier.ts` (Claude Haiku transcript) | **sidebar-agent only** |
+| L5 | `security.ts` (canary) | both — inject in compiled, check in agent |
+| L6 | `security.ts` (combineVerdict ensemble) | both |
+
+**Critical constraint:** `security-classifier.ts` CANNOT be imported from the
+compiled browse binary. `@huggingface/transformers` v4 requires `onnxruntime-node`
+which fails to `dlopen` from Bun compile's temp extract dir. Only `security.ts`
+(pure-string operations — canary, verdict combiner, attack log, status) is safe
+for `server.ts`. See `~/.gstack/projects/garrytan-gstack/ceo-plans/2026-04-19-prompt-injection-guard.md`
+§"Pre-Impl Gate 1 Outcome" for full architectural decision.
+
+**Thresholds** (in `security.ts`):
+- `BLOCK: 0.85` — single-layer score that would cause BLOCK if cross-confirmed
+- `WARN: 0.60` — cross-confirm threshold. When L4 AND L4b both >= 0.60 → BLOCK
+- `LOG_ONLY: 0.40` — gates transcript classifier (skip Haiku when all layers < 0.40)
+
+**Ensemble rule:** BLOCK only when the ML content classifier AND the transcript
+classifier both report >= WARN. Single-layer high confidence degrades to WARN —
+this is the Stack Overflow instruction-writing FP mitigation. Canary leak
+always BLOCKs (deterministic).
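The threshold and ensemble rules above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `security.ts`: the names `combineVerdictSketch`, `shouldRunTranscript`, `LayerSignals`, and the `"PASS"` verdict are hypothetical, and the behavior below `WARN` is an assumption, since the text only says that `LOG_ONLY` gates the transcript classifier.

```typescript
// Illustrative sketch only: names and shapes here are assumptions,
// not the real exports of security.ts.
type Verdict = "BLOCK" | "WARN" | "PASS";

// Threshold values as documented in the Thresholds list.
const THRESHOLDS = { BLOCK: 0.85, WARN: 0.6, LOG_ONLY: 0.4 } as const;

interface LayerSignals {
  contentScore: number;      // L4: ML content classifier score, 0..1
  transcriptScore?: number;  // L4b: transcript classifier score, undefined when skipped
  canaryLeaked: boolean;     // L5: deterministic canary check
}

// Gate for the transcript classifier: skip it when every score seen so far
// is below LOG_ONLY.
function shouldRunTranscript(scoresSoFar: number[]): boolean {
  return scoresSoFar.some((s) => s >= THRESHOLDS.LOG_ONLY);
}

function combineVerdictSketch(s: LayerSignals): Verdict {
  // A canary leak always blocks, regardless of classifier scores.
  if (s.canaryLeaked) return "BLOCK";

  // A skipped transcript classifier is treated as a score of 0 (assumption).
  const transcript = s.transcriptScore ?? 0;

  // Cross-confirmation: both classifiers at or above WARN.
  if (s.contentScore >= THRESHOLDS.WARN && transcript >= THRESHOLDS.WARN) {
    return "BLOCK";
  }

  // One layer at or above WARN (even above BLOCK) degrades to WARN.
  if (Math.max(s.contentScore, transcript) >= THRESHOLDS.WARN) return "WARN";

  // Assumption: anything lower passes; the source does not spell this out.
  return "PASS";
}
```

Under these assumptions, a lone content score of 0.95 yields `WARN` instead of `BLOCK`, which matches the stated goal of not blocking on a single classifier's false positive.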
+
+**Env knobs:**
+- `GSTACK_SECURITY_OFF=1` — emergency kill switch. Classifier stays off even if
+  warmed. Canary is still injected; just the ML scan is skipped.
+- Classifier model cache: `~/.gstack/models/testsavant-small/` (112MB, first run only)
+- Attack log: `~/.gstack/security/attempts.jsonl` (salted sha256 + domain only,
+  rotates at 10MB, 5 generations)
+- Per-device salt: `~/.gstack/security/device-salt` (0600)
+- Session state: `~/.gstack/security/session-state.json` (cross-process, atomic)
+
 ## Dev symlink awareness
 
 When developing gstack, `.claude/skills/gstack` may be a symlink back to this