mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 11:45:20 +02:00
8f9bb84f3f
3 tests, ~12s hot / ~30s cold (first-run model download). Skips gracefully if ~/.gstack/models/testsavant-small/ isn't populated. Spins up real server + real sidebar-agent + PATH-shimmed mock-claude, HOME re-rooted so neither the chat history nor the attempts log leak from the user's live /open-gstack-browser session. Models dir symlinked through to the real warmed cache so the test doesn't re-download 112MB per run. Covers the half that hermetic tests can't: - real classifier (not a stub) fires on real injection text - sidebar-agent emits a reviewable security_event end-to-end - server writes the on-disk decision file - sidebar-agent's poll loop reads the file and acts - attempts.jsonl gets both block + user_overrode with matching payloadHash (dashboard can aggregate) - the raw payload never appears in attempts.jsonl (privacy contract) Caught a real bug while writing: the server loads pre-existing chat history from ~/.gstack/sidebar-sessions/, so re-rooting HOME for only the agent leaked ghost security_events from the live session into the test. Fix: re-root HOME for both processes. The harness is cleaner for future full-stack tests because of it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>