Mirror of https://github.com/garrytan/gstack.git, synced 2026-05-05 21:25:27 +02:00
merge: origin/main into garrytan/injection-tuning; bump v1.5.1.0 → v1.5.2.0
Main shipped v1.5.1.0 for /make-pdf entity + font fixes while this branch was in flight, creating a version collision. Resolving by bumping this branch's security tuning release to v1.5.2.0 (next PATCH after main's v1.5.1.0) and retaining both CHANGELOG entries: my v1.5.2.0 on top, main's v1.5.1.0 below.

Updated v1.5.1.0 → v1.5.2.0 references in security.ts, security-classifier.ts, adversarial.test.ts, bench-ensemble.test.ts, bench-ensemble-live.test.ts, bench.test.ts, and TODOS.md. Main's CHANGELOG entry left untouched.

All 231 security tests + fixture-replay gate still pass: TP=146 FN=114 FP=55 TN=185 → detection 56.2% / FP 22.9% → GATE PASS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
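The result line reports raw confusion-matrix counts (146 + 114 + 55 + 185 = 500 cases) and the two rates the gate checks. As a minimal sketch, detection is TP / (TP + FN) and the FP rate is FP / (FP + TN), compared against the thresholds stated in the TODOS.md entry (detection ≥ 55%, FP ≤ 25%). The `ConfusionCounts` and `gateCheck` names are hypothetical; this is not the actual code in `bench-ensemble.test.ts`.

```typescript
// Hypothetical sketch: names are illustrative, not the repo's real API.
interface ConfusionCounts {
  tp: number; // injection cases correctly flagged
  fn: number; // injection cases missed
  fp: number; // benign cases wrongly flagged
  tn: number; // benign cases correctly passed
}

interface GateResult {
  detection: number; // TP / (TP + FN)
  fpRate: number; // FP / (FP + TN)
  pass: boolean;
}

// Thresholds as stated in the TODOS.md entry: detection >= 55%, FP <= 25%.
const MIN_DETECTION = 0.55;
const MAX_FP_RATE = 0.25;

function gateCheck(c: ConfusionCounts): GateResult {
  const positives = c.tp + c.fn;
  const negatives = c.fp + c.tn;
  if (positives === 0 || negatives === 0) {
    // Fail closed: with no attack or no benign cases the rates are undefined.
    return { detection: NaN, fpRate: NaN, pass: false };
  }
  const detection = c.tp / positives;
  const fpRate = c.fp / negatives;
  return {
    detection,
    fpRate,
    pass: detection >= MIN_DETECTION && fpRate <= MAX_FP_RATE,
  };
}

const r = gateCheck({ tp: 146, fn: 114, fp: 55, tn: 185 });
console.log(
  `${(r.detection * 100).toFixed(1)}% / ${(r.fpRate * 100).toFixed(1)}% -> ${r.pass ? "GATE PASS" : "GATE FAIL"}`,
);
```

With this merge's counts, 146 / 260 ≈ 0.562 and 55 / 240 ≈ 0.229, so the sketch prints `56.2% / 22.9% -> GATE PASS`. Failing when either class is empty is in the spirit of the fail-closed CI gate the TODOS entry describes, though how the real test handles that case is not shown here.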
```diff
@@ -241,11 +241,11 @@ defend the compiled-side ingress.
 
 ### ML Prompt Injection Classifier — v2 Follow-ups
 
-#### ~~Cut Haiku false-positive rate from 44% toward ~15% (P0)~~ — SHIPPED in v1.5.1.0
+#### ~~Cut Haiku false-positive rate from 44% toward ~15% (P0)~~ — SHIPPED in v1.5.2.0
 
 Measured result (500-case BrowseSafe-Bench smoke): detection 67.3% → **56.2%**, FP 44.1% → **22.9%**. Gate passes (detection ≥ 55%, FP ≤ 25%). Knobs that landed: label-first ensemble voting (verdict label trumps numeric confidence for transcript layer), hallucination guard (`verdict=block` at conf < 0.40 → warn-vote), new `THRESHOLDS.SOLO_CONTENT_BLOCK = 0.92` for label-less content classifiers, label-first extension to toolOutput path, tighter Haiku prompt + 8 few-shot exemplars, pinned Haiku model, `claude -p` spawn from `os.tmpdir()` so CLAUDE.md can't poison the classifier, timeout bumped 15s → 45s. CI gate: `browse/test/security-bench-ensemble.test.ts` replays fixture, fail-closed on missing fixture + security-layer diff. The original plan's stop-loss revert order didn't move the FP needle (FPs came from single-layer-BLOCK paths, not ensemble); the real levers turned out to be architectural (label-first) plus a new decoupled threshold.
 
-See CHANGELOG.md [1.5.1.0] for the full shipped summary.
+See CHANGELOG.md [1.5.2.0] for the full shipped summary.
 
 #### Original spec (pre-ship, retained for archive)
 
```
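The TODOS.md entry names three per-layer rules: label-first voting (a layer's verdict label trumps its numeric confidence), a hallucination guard that turns a `block` label with confidence below 0.40 into a warn vote, and `THRESHOLDS.SOLO_CONTENT_BLOCK = 0.92` for content classifiers that emit no label. A minimal sketch of how one layer's result might map to a vote under those rules is given here. `Verdict`, `LayerResult`, `layerVote`, and this local `THRESHOLDS` object are hypothetical stand-ins, and the 0.5 `CONTENT_WARN` cutoff is an assumption the source does not state. How the real ensemble combines votes across layers is not shown.

```typescript
// Hypothetical sketch; not the actual code in security.ts or security-classifier.ts.
type Verdict = "block" | "warn" | "safe";

interface LayerResult {
  label?: Verdict; // present for label-emitting layers (e.g. the transcript layer)
  confidence: number; // 0..1
}

// Local stand-in for the repo's THRESHOLDS.
const THRESHOLDS = {
  SOLO_CONTENT_BLOCK: 0.92, // value from the TODOS.md entry
  HALLUCINATION_GUARD: 0.4, // "verdict=block at conf < 0.40 -> warn-vote"
  CONTENT_WARN: 0.5, // ASSUMPTION: not stated in the source
} as const;

function layerVote(r: LayerResult): Verdict {
  if (r.label !== undefined) {
    // Label-first: the label decides, not the numeric confidence...
    if (r.label === "block" && r.confidence < THRESHOLDS.HALLUCINATION_GUARD) {
      // ...except a low-confidence block, which is downgraded to a warn vote.
      return "warn";
    }
    return r.label;
  }
  // Label-less content classifier: only a very high score blocks on its own.
  if (r.confidence >= THRESHOLDS.SOLO_CONTENT_BLOCK) return "block";
  if (r.confidence >= THRESHOLDS.CONTENT_WARN) return "warn";
  return "safe";
}
```

For example, `{ label: "safe", confidence: 0.95 }` votes `safe` because the label wins, `{ label: "block", confidence: 0.3 }` is downgraded to `warn`, and a label-less score of 0.9 falls short of the solo-block cutoff and only warns. Keeping that cutoff separate from other thresholds is one way to act on the entry's finding that FPs came from single-layer-BLOCK paths.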
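The entry also says the Haiku classifier runs as `claude -p` spawned from `os.tmpdir()` with the timeout raised from 15s to 45s, so that a project's CLAUDE.md cannot steer it. A minimal Node sketch of that isolation follows, assuming the CLI's `-p` (print) and `--model` flags. `classifierSpawnOptions` and `runClassifier` are hypothetical names, and the pinned model ID is left to the caller because the source does not give it.

```typescript
import { spawnSync, type SpawnSyncOptionsWithStringEncoding } from "node:child_process";
import * as os from "node:os";

// Timeout from the TODOS.md entry (bumped 15s -> 45s).
const CLASSIFIER_TIMEOUT_MS = 45_000;

// Run from the OS temp directory so the CLI does not pick up a CLAUDE.md
// from the caller's project directory.
function classifierSpawnOptions(): SpawnSyncOptionsWithStringEncoding {
  return { cwd: os.tmpdir(), timeout: CLASSIFIER_TIMEOUT_MS, encoding: "utf8" };
}

// Hypothetical runner: `model` is whatever pinned Haiku model ID the caller uses.
function runClassifier(prompt: string, model: string): string | null {
  const res = spawnSync("claude", ["-p", prompt, "--model", model], classifierSpawnOptions());
  // Timeout or spawn failure sets res.error; a non-zero exit sets res.status.
  if (res.error || res.status !== 0) return null;
  return res.stdout;
}
```

Returning `null` on timeout or a non-zero exit leaves the fail-open versus fail-closed decision to the caller; the sketch makes no claim about which choice the real code makes.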