From 7a815fa7f66904055a8497a7c9158ef878bba60e Mon Sep 17 00:00:00 2001
From: Garry Tan
Date: Mon, 20 Apr 2026 04:55:23 +0800
Subject: [PATCH] docs(security): document GSTACK_SECURITY_ENSEMBLE env var
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds the opt-in DeBERTa-v3 ensemble to the Sidebar security stack section
of CLAUDE.md. Documents:

* What it does (L4c cross-model classifier, 2-of-3 agreement for BLOCK)
* How to enable (GSTACK_SECURITY_ENSEMBLE=deberta)
* The cost (721MB model download on first run)
* Default behavior (disabled — 2-of-2 testsavant + transcript)

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 CLAUDE.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/CLAUDE.md b/CLAUDE.md
index 366b1cd4..31c77e92 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -236,7 +236,13 @@ always BLOCKs (deterministic).
 **Env knobs:**
 - `GSTACK_SECURITY_OFF=1` — emergency kill switch. Classifier stays off even if
   warmed. Canary is still injected; just the ML scan is skipped.
+- `GSTACK_SECURITY_ENSEMBLE=deberta` — opt-in DeBERTa-v3 ensemble. Adds
+  ProtectAI DeBERTa-v3-base-injection-onnx as L4c classifier for cross-model
+  agreement. 721MB first-run download. With ensemble enabled, BLOCK requires
+  2-of-3 ML classifiers agreeing at >= WARN (testsavant, deberta, transcript).
+  Without ensemble (default), BLOCK requires testsavant + transcript at >= WARN.
 - Classifier model cache: `~/.gstack/models/testsavant-small/` (112MB, first run only)
+  plus `~/.gstack/models/deberta-v3-injection/` (721MB, only when ensemble enabled)
 - Attack log: `~/.gstack/security/attempts.jsonl` (salted sha256 + domain only, rotates at 10MB, 5 generations)
 - Per-device salt: `~/.gstack/security/device-salt` (0600)
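
The agreement rule documented in this patch can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Verdict` levels (in particular `SAFE`), the function names `ensemble_enabled` and `should_block`, and the fallback to the 2-of-2 rule when no DeBERTa verdict is available are all assumptions made for the example. Only the env var name, its `deberta` value, the three classifier names, and the ">= WARN" threshold come from the documented text.

```python
import os
from enum import IntEnum


class Verdict(IntEnum):
    """Hypothetical severity scale; only WARN and BLOCK are named in the docs."""
    SAFE = 0
    WARN = 1
    BLOCK = 2


def ensemble_enabled(env=None):
    """True when GSTACK_SECURITY_ENSEMBLE=deberta is set (opt-in, off by default)."""
    env = os.environ if env is None else env
    return env.get("GSTACK_SECURITY_ENSEMBLE") == "deberta"


def should_block(testsavant, transcript, deberta=None, *, ensemble=False):
    """Apply the documented agreement rule to per-classifier verdicts.

    With the ensemble: at least 2 of 3 classifiers must be at >= WARN.
    Without it (default): both testsavant and transcript must be at >= WARN.
    Falling back to 2-of-2 when `deberta` is None is an assumption.
    """
    if ensemble and deberta is not None:
        votes = [testsavant, deberta, transcript]
        return sum(v >= Verdict.WARN for v in votes) >= 2
    return testsavant >= Verdict.WARN and transcript >= Verdict.WARN
```

Under this sketch, enabling the ensemble lets DeBERTa substitute for either of the two default classifiers: a prompt flagged by `transcript` and `deberta` but not by `testsavant` would be blocked, whereas the default 2-of-2 rule would let it through.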