refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873)

* plan: batch command endpoint + multi-tab parallel execution for GStack Browser * refactor: extract TabSession from BrowserManager for per-tab state Move per-tab state (refMap, lastSnapshot, frame) into a new TabSession class. BrowserManager delegates to the active TabSession via getActiveSession(). Zero behavior change — all existing tests pass. This is the foundation for the /batch endpoint: both /command and /batch will use the same handler functions with TabSession, eliminating shared state races during parallel tab execution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: update handler signatures to use TabSession Change handleReadCommand and handleSnapshot to take TabSession instead of BrowserManager. Change handleWriteCommand to take both TabSession (per-tab ops) and BrowserManager (global ops like viewport, headers, dialog). handleMetaCommand keeps BrowserManager for tab management. Tests use thin wrapper functions that bridge the old 3-arg call pattern to the new signatures via bm.getActiveSession(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add POST /batch endpoint for parallel multi-tab execution Execute multiple commands across tabs in a single HTTP request. Commands targeting different tabs run concurrently via Promise.allSettled. Commands targeting the same tab run sequentially within that group. Features: - Batch-safe command subset (text, goto, click, snapshot, screenshot, etc.) - newtab/closetab as special commands within batch - SSE streaming mode (stream: true) for partial results - Per-command error isolation (one tab failing doesn't abort the batch) - Max 50 commands per batch, soft batch-level timeout A 143-page crawl drops from ~45 min (serial HTTP) to ~5 min (20 tabs in parallel, batched commands). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add batch endpoint integration tests 10 tests covering: - Multi-tab parallel execution (goto + text on different tabs) - Same-tab sequential ordering - Per-command error isolation (one tab fails, others succeed) - Page-scoped refs (snapshot refs are per-session, not global) - Per-tab lastSnapshot (snapshot -D with independent baselines) - getSession/getActiveSession API - Batch-safe command subset validation - closeTab via page.close preserves at-least-one-page invariant - Parallel goto on 3 tabs simultaneously Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden codex-review E2E — extract SKILL.md section, bump maxTurns to 25 The test was copying the full 55KB/1075-line codex SKILL.md into the fixture, requiring 8 Read calls just to consume it and exhausting the 15-turn budget before reaching the actual codex review command. Now extracts only the review-relevant section (~6KB/148 lines), reducing Read calls from 8 to 1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move batch endpoint plan into BROWSER.md as feature documentation The batch endpoint is implemented — document it as an actual feature in BROWSER.md (architecture, API shape, design decisions, usage pattern) and remove the standalone plan file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.16.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: gstack <ship@gstack.dev> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 19:25:10 +02:00 · 2026-04-07 00:23:36 -07:00
parent 6cc094cd41
commit 1868636f49
17 changed files with 617 additions and 152 deletions
@@ -467,8 +467,18 @@ describeIfSelected('Codex skill E2E', ['codex-review'], () => {
    run('git', ['add', 'user_controller.rb']);
    run('git', ['commit', '-m', 'add vulnerable controller']);

-    // Copy the codex skill file
-    fs.copyFileSync(path.join(ROOT, 'codex', 'SKILL.md'), path.join(codexDir, 'codex-SKILL.md'));
+    // Extract only the review-relevant section from codex SKILL.md (~120 lines vs 1075).
+    // Full SKILL.md is 55KB / ~14K tokens — takes 8 Read calls to consume, exhausting turns.
+    const full = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
+    const startMarker = '# /codex — Multi-AI Second Opinion';
+    const endMarker = '## Plan File Review Report';
+    const start = full.indexOf(startMarker);
+    const end = full.indexOf(endMarker, start);
+    const reviewSection = full.slice(
+      start >= 0 ? start : 0,
+      end > start ? end : undefined,
+    );
+    fs.writeFileSync(path.join(codexDir, 'codex-SKILL.md'), reviewSection);
  });

  afterAll(() => {
@@ -485,11 +495,11 @@ describeIfSelected('Codex skill E2E', ['codex-review'], () => {

    const result = await runSkillTest({
      prompt: `You are in a git repo on branch feature/add-vuln with changes against main.
-Read codex-SKILL.md for the /codex skill instructions.
-Run /codex review to review the current diff against main.
+Read codex-SKILL.md for the /codex review instructions (it's short — ~120 lines).
+Follow those instructions to run codex review against the diff on this branch.
 Write the full output (including the GATE verdict) to ${codexDir}/codex-output.md`,
      workingDirectory: codexDir,
-      maxTurns: 15,
+      maxTurns: 25,
      timeout: 300_000,
      testName: 'codex-review',
      runId,