mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 11:45:20 +02:00
b96775191c
Closes the CEO plan E5 regression anchor: load the injection-combined.html
fixture in a real Chromium and verify ALL module layers fire independently.
Previously we had content-security.ts tests (L1-L3) and security.ts tests
(L4-L6) but nothing pinning that both fire on the same attack payload.
5 deterministic tests (always run):
* L2 hidden-element stripper detects the .sneaky div (opacity 0.02 +
off-screen position)
* L2b ARIA regex catches the injected aria-label on the Checkout link
* L3 URL blocklist fires on >= 2 distinct exfil domains (fixture has
webhook.site, pipedream.com, requestbin.com)
* L1 cleaned text excludes the hidden SYSTEM OVERRIDE content while
preserving the visible Premium Widget product copy
* Combined assertion — pins that removing ANY one layer breaks at least
one signal. The E5 regression-guard anchor.
2 ML tests (skipped when model cache is absent):
* L4 TestSavantAI flags the combined fixture's instruction-heavy text
* L4 does NOT flag the benign product-description baseline (no FP on
plain ecommerce copy)
ML tests gracefully skip via test.skipIf when ~/.gstack/models/testsavant-
small/onnx/model.onnx is missing — typical fresh-CI state. Prime by
running the sidebar-agent once to trigger the warmup download.
Runs in 1s total (Playwright reuses the BrowserManager across tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>