mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-05 05:05:08 +02:00
docs(security): document the sidebar security stack in CLAUDE.md
Adds a security section to the Browser interaction block. Covers:
* Layered defense table showing which modules live where (content-security.ts
in both contexts vs security-classifier.ts only in sidebar-agent) and why
the split exists (onnxruntime-node incompatibility with compiled Bun)
* Threshold constants (0.85 / 0.60 / 0.40) and the ensemble rule that
prevents single-classifier false-positives (the Stack Overflow FP story)
* Env knobs — GSTACK_SECURITY_OFF kill switch, cache paths, salt file,
attack log rotation, session state file
This is the "before you modify the security stack, read this" doc. It lives
next to the existing Sidebar architecture note that points at
SIDEBAR_MESSAGE_FLOW.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -206,6 +206,42 @@ failure modes. The sidebar spans 5 files across 2 codebases (extension + server)
with non-obvious ordering dependencies. The doc exists to prevent the kind of
silent failures that come from not understanding the cross-component flow.

**Sidebar security stack** (layered defense against prompt injection):

| Layer | Module | Lives in |
|-------|--------|----------|
| L1-L3 | `content-security.ts` | both server and agent (datamarking, hidden element strip, ARIA regex, URL blocklist, envelope wrapping) |
| L4 | `security-classifier.ts` (TestSavantAI ONNX) | **sidebar-agent only** |
| L4b | `security-classifier.ts` (Claude Haiku transcript) | **sidebar-agent only** |
| L5 | `security.ts` (canary) | both (inject in compiled, check in agent) |
| L6 | `security.ts` (combineVerdict ensemble) | both |

**Critical constraint:** `security-classifier.ts` CANNOT be imported from the
compiled browse binary. `@huggingface/transformers` v4 requires `onnxruntime-node`,
which fails to `dlopen` from Bun compile's temp extract dir. Only `security.ts`
(pure-string operations: canary, verdict combiner, attack log, status) is safe
for `server.ts`. See `~/.gstack/projects/garrytan-gstack/ceo-plans/2026-04-19-prompt-injection-guard.md`
§"Pre-Impl Gate 1 Outcome" for the full architectural decision.

**Thresholds** (in `security.ts`):
- `BLOCK: 0.85`: single-layer score that would cause BLOCK if cross-confirmed
- `WARN: 0.60`: cross-confirm threshold. When L4 AND L4b both >= 0.60 → BLOCK
- `LOG_ONLY: 0.40`: gates transcript classifier (skip Haiku when all layers < 0.40)

**Ensemble rule:** BLOCK only when the ML content classifier AND the transcript
classifier both report >= WARN. A single layer reporting high confidence degrades
to WARN; this mitigates the Stack Overflow instruction-writing false positive.
A canary leak always BLOCKs (deterministic).

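The thresholds and ensemble rule above can be sketched as a small pure function. This is a minimal illustration, not the actual `combineVerdict` in `security.ts`: the names `combineVerdictSketch`, `shouldRunTranscript`, `LayerScores`, and the verdict strings are hypothetical, and the handling of a single layer between WARN and BLOCK (treated as a warning here) is an assumption the text does not spell out.

```typescript
// Threshold values come from the text above; all names below are illustrative.
const THRESHOLDS = { BLOCK: 0.85, WARN: 0.6, LOG_ONLY: 0.4 } as const;

type Verdict = "block" | "warn" | "log_only" | "safe"; // hypothetical verdict names

interface LayerScores {
  content: number;           // L4 (ML content classifier) score in [0, 1]
  transcript: number | null; // L4b (transcript classifier) score, null if skipped
  canaryLeaked: boolean;     // L5 canary check result
}

// Gate for L4b: skip the transcript classifier when every earlier
// layer scored below LOG_ONLY.
function shouldRunTranscript(earlierScores: number[]): boolean {
  return earlierScores.some((score) => score >= THRESHOLDS.LOG_ONLY);
}

function combineVerdictSketch(scores: LayerScores): Verdict {
  // A canary leak is deterministic evidence and always blocks.
  if (scores.canaryLeaked) return "block";

  const transcript = scores.transcript ?? 0;

  // Cross-confirmation: both classifiers at or above WARN.
  if (scores.content >= THRESHOLDS.WARN && transcript >= THRESHOLDS.WARN) {
    return "block";
  }

  // One layer alone never blocks, even at or above BLOCK (0.85).
  // Assumption: anything at or above WARN on a single layer becomes a warning.
  const top = Math.max(scores.content, transcript);
  if (top >= THRESHOLDS.WARN) return "warn";
  if (top >= THRESHOLDS.LOG_ONLY) return "log_only";
  return "safe";
}
```

Under these assumptions, a content score of 0.95 with a transcript score of 0.10 yields a warning rather than a block, which is the single-classifier false-positive case the ensemble rule is meant to absorb.
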
**Env knobs:**
- `GSTACK_SECURITY_OFF=1`: emergency kill switch. Classifier stays off even if
  warmed. Canary is still injected; only the ML scan is skipped.
- Classifier model cache: `~/.gstack/models/testsavant-small/` (112MB, first run only)
- Attack log: `~/.gstack/security/attempts.jsonl` (salted sha256 + domain only,
  rotates at 10MB, 5 generations)
- Per-device salt: `~/.gstack/security/device-salt` (0600)
- Session state: `~/.gstack/security/session-state.json` (cross-process, atomic)

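As a rough illustration of two of the knobs above, the sketch below shows a kill-switch check and an attack-log record that stores only a salted SHA-256 digest and the page domain. The helper names (`mlScanDisabled`, `buildAttemptRecord`, `toJsonlLine`), the record fields, and the exact-match test against `"1"` are assumptions for illustration; the real `security.ts` may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: true when the kill switch is set.
// Assumption: only the exact value "1" disables the ML scan.
function mlScanDisabled(env: Record<string, string | undefined>): boolean {
  return env.GSTACK_SECURITY_OFF === "1";
}

// Hypothetical record shape: a digest and a domain, never the raw payload.
interface AttemptRecord {
  ts: string;
  domain: string;
  payloadHash: string;
}

function buildAttemptRecord(
  salt: string,
  payload: string,
  pageUrl: string,
  now: Date = new Date(),
): AttemptRecord {
  // Salting with a per-device value keeps digests from being
  // comparable across machines.
  const payloadHash = createHash("sha256")
    .update(salt)
    .update(payload)
    .digest("hex");
  return {
    ts: now.toISOString(),
    domain: new URL(pageUrl).hostname, // domain only: path and query are dropped
    payloadHash,
  };
}

// One JSON object per line, matching the .jsonl extension of the log file.
function toJsonlLine(record: AttemptRecord): string {
  return JSON.stringify(record) + "\n";
}
```

A caller would read the salt from the per-device salt file and append `toJsonlLine(...)` to the attack log; file I/O and rotation are left out of the sketch.
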
## Dev symlink awareness

When developing gstack, `.claude/skills/gstack` may be a symlink back to this