gstack/browse/test at b4e49d080d30d26658ab0ca3713a699a235ea0e5 - gstack - MS-GitHub-Backup (Gitea)

CalvinBackup/gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-08-02 12:28:36 +02:00

Files

T

History

Garry TanandClaude Opus 4.7 afc6661f8c test(security): add BrowseSafe-Bench smoke harness (v1 baseline)

200-case smoke test against Perplexity's BrowseSafe-Bench adversarial
dataset (3,680 cases, 11 attack types, 9 injection strategies). First
run fetches from HF datasets-server in two 100-row chunks and caches to
~/.gstack/cache/browsesafe-bench-smoke/test-rows.json — subsequent runs
are hermetic.

V1 baseline (recorded via console.log for regression tracking):
  * Detection rate: ~15% at WARN=0.6
  * FP rate: ~12%
  * Detection > FP rate (non-zero signal separation)

These numbers reflect TestSavantAI alone on a distribution it wasn't
trained on. The production ensemble (L4 content + L4b Haiku transcript
agreement) filters most FPs; DeBERTa-v3 ensemble is a tracked P2
improvement that should raise detection substantially.

Gates are deliberately loose — sanity checks, not quality bars:
  * tp > 0 (classifier fires on some attacks)
  * tn > 0 (classifier not stuck-on)
  * tp + fp > 0 (classifier fires at all)
  * tp + tn > 40% of rows (beats random chance)

Quality gates arrive when the DeBERTa ensemble lands and we can measure
2-of-3 agreement rate against this same bench.

Model cache gate via test.skipIf(!ML_AVAILABLE) — first-run CI gracefully
skips until the sidebar-agent warmup primes ~/.gstack/models/testsavant-
small/. Documented in the test file head comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-20 04:50:53 +08:00

..

feat: browser data platform for AI agents (v0.16.0.0) (#907 )

2026-04-08 00:41:55 -07:00

activity.test.ts

feat: headed mode + sidebar agent + Chrome extension (v0.12.0) (#517 )

2026-03-26 11:15:24 -06:00

adversarial-security.test.ts

fix: security audit remediation — 12 fixes, 20 tests (v0.13.1.0) (#595 )

2026-03-28 08:35:24 -06:00

batch.test.ts

refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873 )

2026-04-07 00:23:36 -07:00

browser-manager-unit.test.ts

feat: headed mode + sidebar agent + Chrome extension (v0.12.0) (#517 )

2026-03-26 11:15:24 -06:00

build.test.ts

fix: ngrok Windows build + close CI error-swallowing gap (v0.18.0.1) (#1024 )

2026-04-16 13:49:04 -07:00

bun-polyfill.test.ts

fix: Windows support — Node.js server fallback for Playwright (#255 )

2026-03-20 12:22:11 -07:00

commands.test.ts

feat(browse): Puppeteer parity — load-html, screenshot --selector, viewport --scale, file:// (v1.1.0.0) (#1062 )

2026-04-18 23:25:33 +08:00

compare-board.test.ts

refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873 )

2026-04-07 00:23:36 -07:00

config.test.ts

fix: Windows browse — health-check-first ensureServer, detached startServer, Windows process mgmt (v0.11.11.0) (#431 )

2026-03-24 00:38:10 -07:00

content-security.test.ts

feat: content security — 4-layer prompt injection defense for pair-agent (#815 )

2026-04-06 14:41:06 -07:00

cookie-import-browser.test.ts

feat: Wave 3 — community bug fixes & platform support (v0.11.6.0) (#359 )

2026-03-23 22:15:23 -07:00

cookie-picker-routes.test.ts

community wave: 6 PRs + hardening (v0.18.1.0) (#1028 )

2026-04-17 00:45:13 -07:00

data-platform.test.ts

feat: browser data platform for AI agents (v0.16.0.0) (#907 )

2026-04-08 00:41:55 -07:00

dx-polish.test.ts

feat(browse): Puppeteer parity — load-html, screenshot --selector, viewport --scale, file:// (v1.1.0.0) (#1062 )

2026-04-18 23:25:33 +08:00

error-handling.test.ts

refactor: AI slop reduction with cross-model quality review (v0.16.3.0) (#941 )

2026-04-10 17:13:15 -10:00

file-drop.test.ts

feat: headed mode + sidebar agent + Chrome extension (v0.12.0) (#517 )

2026-03-26 11:15:24 -06:00

find-browse.test.ts

feat: multi-agent support — gstack works on Codex, Gemini CLI, and Cursor (v0.9.0) (#226 )

2026-03-19 18:20:50 -07:00

findport.test.ts

feat: community PRs — faster install, skill namespacing, uninstall, Codex fallback, Windows fix, Python patterns (v0.12.9.0) (#561 )

2026-03-27 00:44:37 -06:00

gstack-config.test.ts

feat: composable skills — INVOKE_SKILL resolver + factoring infrastructure (v0.13.7.0) (#644 )

2026-03-29 23:35:17 -06:00

gstack-update-check.test.ts

fix: community PRs + security hardening + E2E stability (v0.12.7.0) (#552 )

2026-03-26 23:21:27 -06:00

handoff.test.ts

refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873 )

2026-04-07 00:23:36 -07:00

learnings-injection.test.ts

fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847 )

2026-04-06 00:47:04 -07:00

path-validation.test.ts

fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847 )

2026-04-06 00:47:04 -07:00

platform.test.ts

fix: Windows support — Node.js server fallback for Playwright (#255 )

2026-03-20 12:22:11 -07:00

security-adversarial.test.ts

test(security): adversarial suite for canary + ensemble combiner

2026-04-20 04:18:48 +08:00

security-audit-r2.test.ts

feat(browse): Puppeteer parity — load-html, screenshot --selector, viewport --scale, file:// (v1.1.0.0) (#1062 )

2026-04-18 23:25:33 +08:00

security-bench.test.ts

test(security): add BrowseSafe-Bench smoke harness (v1 baseline)

2026-04-20 04:50:53 +08:00

security-classifier.test.ts

test(security): classifier gating + status contract (9 tests)

2026-04-20 04:21:17 +08:00

security-integration.test.ts

test(security): integration suite — content-security.ts + security.ts coexistence

2026-04-20 04:20:14 +08:00

security-live-playwright.test.ts

test(security): live Playwright integration — defense-in-depth E5 contract

2026-04-20 04:44:07 +08:00

security.test.ts

test(security): add security.ts unit tests (25 tests, 62 assertions)

2026-04-19 19:06:52 +08:00

server-auth.test.ts

fix: cookie picker auth token leak (v0.15.17.0) (#904 )

2026-04-08 10:10:13 -07:00

sidebar-agent-roundtrip.test.ts

fix: sidebar agent uses real tab URL instead of stale Playwright URL (v0.12.6.0) (#544 )

2026-03-26 22:07:03 -06:00

sidebar-agent.test.ts

test(sidebar-agent): regex-tolerant destructure check

2026-04-20 04:32:23 +08:00

sidebar-integration.test.ts

fix: sidebar agent uses real tab URL instead of stale Playwright URL (v0.12.6.0) (#544 )

2026-03-26 22:07:03 -06:00

sidebar-security.test.ts

test(security): assert tool-result ML scan surface (Read/Glob/Grep ingress)

2026-04-20 04:42:20 +08:00

sidebar-unit.test.ts

fix: sidebar agent uses real tab URL instead of stale Playwright URL (v0.12.6.0) (#544 )

2026-03-26 22:07:03 -06:00

sidebar-ux.test.ts

fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847 )

2026-04-06 00:47:04 -07:00

snapshot.test.ts

refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873 )

2026-04-07 00:23:36 -07:00

state-ttl.test.ts

fix: security audit remediation — 12 fixes, 20 tests (v0.13.1.0) (#595 )

2026-03-28 08:35:24 -06:00

tab-isolation.test.ts

feat: browser data platform for AI agents (v0.16.0.0) (#907 )

2026-04-08 00:41:55 -07:00

test-server.ts

feat: Phase 3.5 — cookie import, QA testing, team retro (v0.3.1) (#29 )

2026-03-13 00:31:41 -07:00

token-registry.test.ts

feat: browser data platform for AI agents (v0.16.0.0) (#907 )

2026-04-08 00:41:55 -07:00

url-validation.test.ts

feat(browse): Puppeteer parity — load-html, screenshot --selector, viewport --scale, file:// (v1.1.0.0) (#1062 )

2026-04-18 23:25:33 +08:00

watch.test.ts

feat: headed mode + sidebar agent + Chrome extension (v0.12.0) (#517 )

2026-03-26 11:15:24 -06:00

watchdog.test.ts

community wave: 6 PRs + hardening (v0.18.1.0) (#1028 )

2026-04-17 00:45:13 -07:00

welcome-page.test.ts

feat: GStack Browser — double-click AI browser with anti-bot stealth (#695 )

2026-04-04 10:17:05 -07:00