Mirror of https://github.com/garrytan/gstack.git (synced 2026-05-02 11:45:20 +02:00)
Commit 900cc0902b
Establishes the module structure for the L5 canary and L6 verdict aggregation
layers. The module uses pure-string operations only, so it is safe to import
from the compiled browse binary.
Includes:
* THRESHOLDS constants (BLOCK 0.85 / WARN 0.60 / LOG_ONLY 0.40), calibrated
against BrowseSafe-Bench smoke + developer content benign corpus.
* combineVerdict() implementing the ensemble rule: BLOCK only when the ML
content classifier AND the transcript classifier both score >= WARN.
A high-confidence score from a single layer degrades to WARN, so that one
classifier's false positives cannot kill sessions (for example, Stack
Overflow posts written in an instructional style score 0.99 on
TestSavantAI alone).
* generateCanary / injectCanary / checkCanaryInStructure: a session-scoped
secret token. The check recursively scans tool arguments, URLs, file
writes, and nested objects, per the plan's all-channel coverage decision.
* logAttempt with 10 MB log rotation (keeps 5 generations). Uses a salted
SHA-256 hash, with the per-device salt stored at
~/.gstack/security/device-salt (mode 0600).
* Cross-process session state at ~/.gstack/security/session-state.json
(atomic temp+rename). Required because server.ts (compiled) and
sidebar-agent.ts (non-compiled) are separate processes.
* getStatus() for shield icon rendering via /health.
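
The ensemble rule above can be sketched roughly as follows. This is an
illustration only, not the module's actual code: the threshold values come
from this message, but the LayerScores shape, the Verdict names (including
"SAFE"), and the handling of scores below WARN are assumptions. The BLOCK
threshold is declared but unused here, because this message does not say
how it enters the rule.

```typescript
// Threshold values as listed in this commit message.
const THRESHOLDS = { BLOCK: 0.85, WARN: 0.6, LOG_ONLY: 0.4 } as const;

// Hypothetical verdict names; "SAFE" is an assumption for the no-signal case.
type Verdict = "BLOCK" | "WARN" | "LOG_ONLY" | "SAFE";

// Hypothetical input shape: one score in [0, 1] per classifier layer.
interface LayerScores {
  content: number; // ML content classifier
  transcript: number; // transcript classifier
}

function combineVerdict(scores: LayerScores): Verdict {
  const { content, transcript } = scores;
  const top = Math.max(content, transcript);

  // Ensemble rule: BLOCK only when both layers score at or above WARN.
  if (content >= THRESHOLDS.WARN && transcript >= THRESHOLDS.WARN) {
    return "BLOCK";
  }
  // Only one layer is at or above WARN: degrade to WARN, however high it is.
  if (top >= THRESHOLDS.WARN) {
    return "WARN";
  }
  // Assumed behaviour for weaker signals: log without alerting.
  if (top >= THRESHOLDS.LOG_ONLY) {
    return "LOG_ONLY";
  }
  return "SAFE";
}
```

Under these assumptions, a lone 0.99 from the content classifier (the Stack
Overflow case) yields WARN rather than BLOCK, while 0.70 and 0.65 together
yield BLOCK.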
ML classifier code will live in a separate module (security-classifier.ts)
loaded only by sidebar-agent.ts, because the compiled browse binary cannot
load the native ONNX runtime.
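
By contrast, the canary check listed above needs nothing beyond string
operations. A minimal sketch of a recursive scan, assuming a signature that
may differ from the real checkCanaryInStructure (the `seen` parameter and
the key scan are illustrative additions):

```typescript
// Returns true if the canary string appears anywhere in `value`:
// in a string, in an array element, or in an object's keys or values.
function checkCanaryInStructure(
  value: unknown,
  canary: string,
  seen: Set<object> = new Set(), // assumed guard against cyclic structures
): boolean {
  if (typeof value === "string") {
    return value.includes(canary);
  }
  if (value === null || typeof value !== "object") {
    return false; // numbers, booleans, undefined, etc. cannot carry the token
  }
  if (seen.has(value)) {
    return false; // already visited; avoids infinite recursion on cycles
  }
  seen.add(value);

  if (Array.isArray(value)) {
    return value.some((item) => checkCanaryInStructure(item, canary, seen));
  }
  return Object.entries(value).some(
    ([key, item]) =>
      key.includes(canary) || checkCanaryInStructure(item, canary, seen),
  );
}
```

A tool call such as `{ tool: "fetch", args: { url: "https://example.com/?q=" + canary } }`
would be flagged, because the scan descends into `args.url`.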
Plan: ~/.gstack/projects/garrytan-gstack/ceo-plans/2026-04-19-prompt-injection-guard.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>