11 tests pinning the four fixes so future refactors don't silently
re-open the bypasses:
- Canary rolling-buffer detection (DeltaBuffer + slice tail)
- Tool-output single-layer BLOCK (new combineVerdict opt)
- escapeHtml quote escaping (both " and ')
- snapshot in PAGE_CONTENT_COMMANDS
- GSTACK_SECURITY_OFF kill switch gates both load paths
- checkTranscript.tool_output plumbing on tool-result scan
Most are source-level string contracts (not behavior) because the
alternative — real browser/subprocess wiring — would push these into
periodic-tier eval cost. The contracts catch the regression I care
about: did someone rename the flag or revert the guard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>