mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
28ce883ca5
Extends the existing telemetry pipe with 5 new flags needed for prompt
injection attack reporting:
--url-domain hostname only (never path, never query)
--payload-hash salted sha256 hex (opaque — no payload content ever)
--confidence 0-1 (awk-validated + clamped; malformed → null)
--layer testsavant_content | transcript_classifier | aria_regex | canary
--verdict block | warn | log_only
Backward compatibility:
* Existing skill_run events still work — all new fields default to null
* Event schema is a superset of the old one; downstream edge function can
filter by event_type
No new auth, no new SDK, no new Supabase migration. The same tier gating
(community → upload, anonymous → local only, off → no-op) and the same
sync daemon carry the attack events. This is the "E6 RESOLVED" path from
the CEO plan — riding the existing pipe instead of spinning up parallel infra.
Verified end-to-end:
* attack_attempt event with all fields emits correctly to skill-usage.jsonl
* skill_run event with no security flags still works (backward compat)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>