docs(security): document GSTACK_SECURITY_ENSEMBLE env var

Documents the opt-in DeBERTa-v3 ensemble in the Sidebar security stack section
of CLAUDE.md. Covers:

  * What it does (L4c cross-model classifier, 2-of-3 agreement for BLOCK)
  * How to enable (GSTACK_SECURITY_ENSEMBLE=deberta)
  * The cost (721MB model download on first run)
  * Default behavior (disabled; BLOCK requires 2-of-2 agreement, testsavant + transcript)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Garry Tan
2026-04-20 04:55:23 +08:00
parent 4e0516031b
commit 7a815fa7f6
+6
@@ -236,7 +236,13 @@ always BLOCKs (deterministic).
**Env knobs:**
- `GSTACK_SECURITY_OFF=1` — emergency kill switch. Classifier stays off even if
warmed. The canary is still injected; only the ML scan is skipped.
- `GSTACK_SECURITY_ENSEMBLE=deberta` — opt-in DeBERTa-v3 ensemble. Adds
ProtectAI DeBERTa-v3-base-injection-onnx as the L4c classifier for cross-model
agreement. Costs a 721MB download on first run. With the ensemble enabled,
BLOCK requires at least 2 of the 3 ML classifiers (testsavant, deberta, transcript)
to agree at >= WARN. Without the ensemble (default), BLOCK requires both testsavant
and transcript at >= WARN.
- Classifier model cache: `~/.gstack/models/testsavant-small/` (112MB, downloaded on first run)
plus `~/.gstack/models/deberta-v3-injection/` (721MB, downloaded only when the ensemble is enabled)
- Attack log: `~/.gstack/security/attempts.jsonl` (stores only a salted SHA-256 hash and the domain;
rotates at 10MB, keeping 5 generations)
- Per-device salt: `~/.gstack/security/device-salt` (0600)
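The BLOCK rule for the ML layer reduces to a vote over classifier verdicts. The TypeScript sketch below is hypothetical, not gstack's API: the names `Level`, `Verdicts`, and `shouldBlock`, the level ordering (`safe` < `warn` < `block`), and the fallback to 2-of-2 when no DeBERTa verdict is available are all assumptions. It models only the ML classifiers; the deterministic canary check is separate.

```typescript
// Hypothetical sketch: names and level ordering are illustrative assumptions.
type Level = "safe" | "warn" | "block";
type Env = Record<string, string | undefined>;

const RANK: Record<Level, number> = { safe: 0, warn: 1, block: 2 };

interface Verdicts {
  testsavant: Level;
  transcript: Level;
  deberta?: Level; // only produced when the DeBERTa ensemble is enabled
}

const atLeastWarn = (level: Level): boolean => RANK[level] >= RANK.warn;

function shouldBlock(v: Verdicts, env: Env): boolean {
  // GSTACK_SECURITY_OFF=1: the ML scan is skipped, so there is no ML-driven BLOCK.
  // (The deterministic canary check is separate and not modeled here.)
  if (env.GSTACK_SECURITY_OFF === "1") return false;

  const ensemble = env.GSTACK_SECURITY_ENSEMBLE === "deberta";
  if (ensemble && v.deberta !== undefined) {
    // Ensemble mode: any 2 of the 3 classifiers at >= WARN trigger BLOCK.
    const votes = [v.testsavant, v.deberta, v.transcript].filter(atLeastWarn).length;
    return votes >= 2;
  }
  // Default mode (and, by assumption, ensemble mode with no DeBERTa verdict):
  // testsavant and transcript must both be at >= WARN.
  return atLeastWarn(v.testsavant) && atLeastWarn(v.transcript);
}

const env: Env = { GSTACK_SECURITY_ENSEMBLE: "deberta" };
console.log(shouldBlock({ testsavant: "warn", deberta: "safe", transcript: "warn" }, env)); // true
```

Counting votes keeps the rule readable as "k of n"; in this sketch the default mode is simply the 2-of-2 case over the two always-on classifiers.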
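The attack-log and salt bullets describe three properties: only a salted SHA-256 hash and the domain are stored, the salt is per-device with 0600 permissions, and the log rotates at 10MB keeping 5 generations. The sketch below illustrates those properties with Node's `node:crypto` and `node:fs`. Everything beyond the bullets is an assumption: the helper names (`loadSalt`, `hashPayload`, `rotateIfNeeded`, `logAttempt`), a 32-byte random salt, hex digests, the record fields `ts`/`domain`/`hash`, and reading "5 generations" as five rotated files (`attempts.jsonl.1` through `.5`). gstack's actual record format and rotation code may differ.

```typescript
import { createHash, randomBytes } from "node:crypto";
import { appendFileSync, existsSync, mkdirSync, readFileSync, renameSync, statSync, writeFileSync } from "node:fs";
import { join } from "node:path";

const MAX_BYTES = 10 * 1024 * 1024; // rotate once the live log reaches 10MB
const GENERATIONS = 5; // assumption: keep attempts.jsonl.1 .. attempts.jsonl.5

// Hypothetical: load the per-device salt, creating it with mode 0600 on first use.
function loadSalt(saltPath: string): Buffer {
  if (existsSync(saltPath)) return readFileSync(saltPath);
  const salt = randomBytes(32);
  writeFileSync(saltPath, salt, { mode: 0o600 });
  return salt;
}

// Salted SHA-256 of the payload; only this digest is ever written to disk.
function hashPayload(salt: Buffer, payload: string): string {
  return createHash("sha256").update(salt).update(payload, "utf8").digest("hex");
}

// Shift .4 -> .5, ..., .1 -> .2, then live -> .1 (the oldest .5 is overwritten).
function rotateIfNeeded(logPath: string, maxBytes: number): void {
  if (!existsSync(logPath) || statSync(logPath).size < maxBytes) return;
  for (let i = GENERATIONS - 1; i >= 1; i--) {
    const from = `${logPath}.${i}`;
    if (existsSync(from)) renameSync(from, `${logPath}.${i + 1}`);
  }
  renameSync(logPath, `${logPath}.1`);
}

// Hypothetical entry point: append one JSONL record without the raw payload.
function logAttempt(dir: string, payload: string, domain: string, maxBytes = MAX_BYTES): void {
  mkdirSync(dir, { recursive: true });
  const salt = loadSalt(join(dir, "device-salt"));
  const logPath = join(dir, "attempts.jsonl");
  rotateIfNeeded(logPath, maxBytes);
  const record = { ts: new Date().toISOString(), domain, hash: hashPayload(salt, payload) };
  appendFileSync(logPath, JSON.stringify(record) + "\n");
}
```

With a per-device salt, identical payloads on one machine hash to the same value (so repeats are visible in the log), while hashes from different devices cannot be correlated and the raw payload never reaches disk.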