Files
gstack/browse
Garry Tan b4e49d080d feat(security): 3-way ensemble verdict combiner with deberta_content layer
Updates combineVerdict to support a third ML signal layer (deberta_content)
for opt-in DeBERTa-v3 ensemble. Rule becomes:

  * Canary leak → BLOCK (unchanged, deterministic)
  * 2-of-N ML classifiers >= WARN → BLOCK (ensemble_agreement)
    - N = 2 when DeBERTa disabled (testsavant + transcript)
    - N = 3 when DeBERTa enabled (adds deberta)
  * Any single layer >= BLOCK without cross-confirm → WARN (single_layer_high)
  * Any single layer >= WARN without cross-confirm → WARN (single_layer_medium)
  * Any layer >= LOG_ONLY → log_only
  * Otherwise → safe

Backward compatible: when DeBERTa signal has confidence 0 (meta.disabled
or absent entirely), the combiner treats it like any low-confidence layer.
Existing 2-of-2 ensemble path still fires for testsavant + transcript.

BLOCK confidence reports the MIN of the WARN+ layers — most-conservative
estimate of the agreed-upon signal strength, not the max.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-20 04:55:23 +08:00
..