Commit Graph

257 Commits

Author SHA1 Message Date
Garry Tan 114924c91e fix: improve snapshot flags docs completeness for LLM judge
Adds $B placeholder explanation, explicit syntax line, and detailed
flag behavior (-d depth values, -s CSS selector syntax, -D unified
diff format and baseline persistence, -a screenshot vs text output
relationship). Fixes snapshot flags reference LLM eval scoring
completeness < 4.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 08:36:41 -07:00
Garry Tan 89846594b0 fix: adopt main's headed-mode /health token serving
Our merge kept the old !tunnelActive guard which conflicted with
main's security-audit-r2 tests that require no currentUrl/currentMessage
in /health. Adopts main's approach: serve token conditionally based on
headed mode or chrome-extension origin. Updates server-auth tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:51:43 -07:00
Garry Tan 27d141f357 Merge remote-tracking branch 'origin/main' into garrytan/openclaw-browser-ctrl
Resolved conflicts:
- meta-commands.ts: kept our security pipeline (scope pre-validation +
  executeCommand callback) and integrated main's watch-mode blocking
  for chain write commands
- server.ts: kept our !tunnelActive guard with security documentation
  over main's headed-mode detection approach
- package.json: took main's version

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:48:02 -07:00
Garry Tan 03973c2fab fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847)
* fix(bin): pass search params via env vars (RCE fix) (#819)

Replace shell string interpolation with process.env in gstack-learnings-search
to prevent arbitrary code execution via crafted learnings entries. Also fixes
the CROSS_PROJECT interpolation that the original PR missed.

Adds 3 regression tests verifying no shell interpolation remains in the bun -e block.

Co-authored-by: garagon <garagon@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): add path validation to upload command (#821)

Add isPathWithin() and path traversal checks to the upload command,
blocking file exfiltration via crafted upload paths. Uses existing
SAFE_DIRECTORIES constant instead of a local copy. Adds 3 regression tests.

Co-authored-by: garagon <garagon@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): symlink resolution in meta-commands validateOutputPath (#820)

Add realpathSync to validateOutputPath in meta-commands.ts to catch
symlink-based directory escapes in screenshot, pdf, and responsive
commands. Resolves SAFE_DIRECTORIES through realpathSync to handle
macOS /tmp -> /private/tmp symlinks. Existing path validation tests
pass with the hardened implementation.

Co-authored-by: garagon <garagon@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add uninstall instructions to README (#812)

Community PR #812 by @0531Kim. Adds two uninstall paths: the gstack-uninstall
script (handles everything) and manual removal steps for when the repo isn't
cloned. Includes CLAUDE.md cleanup note and Playwright cache guidance.

Co-Authored-By: 0531Kim <0531Kim@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): Windows launcher extraEnv + headed-mode token (#822)

Community PR #822 by @pieterklue. Three fixes:
1. Windows launcher now merges extraEnv into spawned server env (was
   only passing BROWSE_STATE_FILE, dropping all other env vars)
2. Welcome page fallback serves inline HTML instead of about:blank
   redirect (avoids ERR_UNSAFE_REDIRECT on Windows)
3. /health returns auth token in headed mode even without Origin header
   (fixes Playwright Chromium extensions that don't send it)

Also adds HOME/USERPROFILE fallback for cross-platform compatibility.

Co-Authored-By: pieterklue <pieterklue@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(browse): terminate orphan server when parent process exits (#808)

Community PR #808 by @mmporong. Passes BROWSE_PARENT_PID to the spawned
server process. The server polls every 15s with signal 0 and calls
shutdown() if the parent is gone. Prevents orphaned chrome-headless-shell
processes when Claude Code sessions exit abnormally.

Co-Authored-By: mmporong <mmporong@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(security): IPv6 ULA blocking, cookie redaction, per-tab cancel, targeted token (#664)

Community PR #664 by @mr-k-man (security audit round 1, new parts only).

- IPv6 ULA prefix blocking (fc00::/7) in url-validation.ts with false-positive
  guard for hostnames like fd.example.com
- Cookie value redaction for tokens, API keys, JWTs in browse cookies command
- Per-tab cancel files in killAgent() replacing broken global kill-signal
- design/serve.ts: realpathSync upgrade prevents symlink bypass in /api/reload
- extension: targeted getToken handler replaces token-in-health-broadcast
- Supabase migration 003: column-level GRANT restricts anon UPDATE scope
- Telemetry sync: upsert error logging
- 10 new tests for IPv6, cookie redaction, DNS rebinding, path traversal

Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(security): CSS injection guard, timeout clamping, session validation, tests (#806)

Community PR #806 by @mr-k-man (security audit round 2, new parts only).

- CSS value validation (DANGEROUS_CSS) in cdp-inspector, write-commands, extension inspector
- Queue file permissions (0o700/0o600) in cli, server, sidebar-agent
- escapeRegExp for frame --url ReDoS fix
- Responsive screenshot path validation with validateOutputPath
- State load cookie filtering (reject localhost/.internal/metadata cookies)
- Session ID format validation in loadSession
- /health endpoint: remove currentUrl and currentMessage fields
- QueueEntry interface + isValidQueueEntry validator for sidebar-agent
- SIGTERM->SIGKILL escalation in timeout handler
- Viewport dimension clamping (1-16384), wait timeout clamping (1s-300s)
- Cookie domain validation in cookie-import and cookie-import-browser
- DocumentFragment-based tab switching (XSS fix in sidepanel)
- pollInProgress reentrancy guard for pollChat
- toggleClass/injectCSS input validation in extension inspector
- Snapshot annotated path validation with realpathSync
- 714-line security-audit-r2.test.ts + 33-line learnings-injection.test.ts

Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.15.13.0)

Community security wave: 8 PRs from 4 contributors (@garagon, @mr-k-man,
@mmporong, @0531Kim, @pieterklue). IPv6 ULA blocking, cookie redaction,
per-tab cancel signaling, CSS injection guards, timeout clamping, session
validation, DocumentFragment XSS fix, parent process watchdog, uninstall
docs, Windows fixes, and 750+ lines of security regression tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: garagon <garagon@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: 0531Kim <0531Kim@users.noreply.github.com>
Co-authored-by: pieterklue <pieterklue@users.noreply.github.com>
Co-authored-by: mmporong <mmporong@users.noreply.github.com>
Co-authored-by: mr-k-man <mr-k-man@users.noreply.github.com>
2026-04-06 00:47:04 -07:00
Garry Tan 3acbd4a15d revert: remove batch commands CHANGELOG entry and VERSION bump
The batch endpoint work belongs on the browser-batch-multitab branch
(port-louis), not this branch. Reverting VERSION to 0.15.14.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:41:14 -07:00
Garry Tan 100c406e10 feat: add --domain flag to pair-agent CLI for domain restrictions
Allows passing --domain to pair-agent to restrict the remote agent's
navigation to specific domains (comma-separated).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:34:34 -07:00
Garry Tan 001ba59be0 refactor: checkTabAccess uses options object, add own-only tab policy
Refactors checkTabAccess(tabId, clientId, isWrite) to use an options
object { isWrite?, ownOnly? }. Adds tabPolicy === 'own-only' support
in the server command dispatch — scoped tokens with this policy are
restricted to their own tabs for all commands, not just writes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:34:32 -07:00
Garry Tan 7cf7f6e76e chore: regenerate pair-agent/SKILL.md after main merge
Vendoring deprecation section from main's template wasn't reflected
in the generated file. Fixes check-freshness CI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:27:33 -07:00
Garry Tan 8fd73ec983 Merge remote-tracking branch 'origin/main' into garrytan/openclaw-browser-ctrl
Resolved CHANGELOG.md conflict: main landed 0.15.14.0, bumped our
branch to 0.15.15.0 with the batch endpoint entry on top.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:26:53 -07:00
Garry Tan 170be8dee8 chore: bump VERSION to 0.15.14.0, add CHANGELOG entry for batch endpoint
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:26:25 -07:00
Garry Tan b3d064aabb fix: gstack-team-init detects and removes vendored copies (#848)
* fix: gstack-team-init detects and removes vendored copies in team mode

When running gstack-team-init inside a repo with a vendored
.claude/skills/gstack/, the script now auto-detects and removes it:
git rm --cached, add to .gitignore, rm -rf. Also adds team_mode config
key to setup --team/--no-team, and makes gstack-upgrade Step 4.5
team-mode aware (remove instead of sync).

Includes 5 new integration tests for the vendored copy migration.

* chore: bump version and changelog (v0.15.14.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-06 00:26:20 -07:00
Garry Tan 11d3928463 Merge remote-tracking branch 'origin/main' into garrytan/openclaw-browser-ctrl
Resolved CHANGELOG.md conflict: kept main's 0.15.13.0 Team Mode entry
on top, preserved branch's richer 0.15.12.0 Content Security entry below.
2026-04-06 00:25:52 -07:00
Garry Tan 2e3aeaf3ac refactor: consolidate Hermes into generic HTTP option in pair-agent
Hermes doesn't have a host-specific config — it uses the same generic
curl instructions as any other agent. Removing the dedicated option
simplifies the menu and eliminates a misleading distinction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:25:05 -07:00
Garry Tan 21f2a449eb fix: correct CHANGELOG date from 2026-04-06 to 2026-04-05
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 00:25:01 -07:00
Garry Tan dae251e066 feat: team-friendly gstack install mode (v0.15.7.0) (#809)
* feat: add gstack-settings-hook for atomic Claude Code hook management

DRY helper for adding/removing SessionStart hooks in ~/.claude/settings.json.
Handles missing files, deduplication, malformed JSON, and atomic writes
(.tmp + rename) to prevent corruption on crash or disk-full.

Part of team-install-mode feature (credit: Jared Friedman).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add gstack-session-update for automatic team updates

SessionStart hook target that auto-updates gstack at session start.
Background fork (zero latency), throttled to once/hour, with lockfile
(mkdir + PID), stale lock recovery, GIT_TERMINAL_PROMPT=0, and debug
logging to ~/.gstack/analytics/session-update.log.

Part of team-install-mode feature (credit: Jared Friedman).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add --team, --no-team, -q flags to setup

--team enables auto_upgrade and registers SessionStart hook via
gstack-settings-hook. --no-team reverses it. -q/--quiet suppresses
all informational output (for hook-triggered setup runs). --local
now prints a deprecation warning.

Replaces ~20 echo calls with log() helper for quiet mode support.

Part of team-install-mode feature (credit: Jared Friedman).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add gstack-team-init for repo-level team bootstrapping

Two modes: 'optional' (gentle CLAUDE.md suggestion) and 'required'
(CLAUDE.md enforcement + .claude/hooks/check-gstack.sh PreToolUse hook
that blocks work without gstack installed). Atomic JSON writes,
idempotent, prints git add instructions.

Part of team-install-mode feature (credit: Jared Friedman).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: deprecate vendoring, document team mode, clean up uninstall

- README: replace "Step 2: Add to your repo" vendoring instructions
  with team mode (./setup --team + gstack-team-init)
- CLAUDE.md: rename "Vendored symlink awareness" to "Dev symlink
  awareness", add deprecation note
- CONTRIBUTING.md: remove vendoring language from prefix section
- bin/gstack-uninstall: clean up SessionStart hook on uninstall

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add vendoring deprecation detection to skill preamble

Detects vendored gstack in CWD (.claude/skills/gstack/ that's not a
symlink and has VERSION or .git). Outputs VENDORED_GSTACK: yes/no.
Adds generateVendoringDeprecation() section that offers one-time
migration to team mode via AskUserQuestion.

Part of team-install-mode feature (credit: Jared Friedman).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files with vendoring deprecation preamble

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: team mode (v0.15.7.0) — credit Jared Friedman

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add integration tests for team mode (20 tests)

Covers gstack-settings-hook (add, remove, dedup, preserve existing,
atomic write), gstack-session-update (guards, throttle, non-fatal),
gstack-team-init (optional, required, enforcement hook, idempotent),
and setup flags (-q, --local deprecation).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:49:03 -07:00
Garry Tan 30d4550286 test: add source-level security tests for /batch endpoint
8 tests verifying: auth gate placement, scoped token support, max
command limit, nested batch rejection, rate limiting bypass, batch-level
activity events, command field validation, and tabId passthrough.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:40:47 -07:00
Garry Tan 11c397138d feat: add POST /batch endpoint for multi-command batching
Remote agents controlling GStack Browser through a tunnel pay 2-5s of
latency per HTTP round-trip. A typical "navigate and read" takes 4
sequential commands = 10-20 seconds. The /batch endpoint collapses N
commands into a single HTTP round-trip, cutting a 20-tab crawl from
~60s to ~5s.

Sequential execution through the full security pipeline (scope, domain,
tab ownership, content wrapping). Rate limiting counts the batch as 1
request. Activity events emitted at batch level, not per-command.
Max 50 commands per batch. Nested batches rejected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 23:40:44 -07:00
Garry Tan d384b095b1 Merge remote-tracking branch 'origin/main' into garrytan/openclaw-browser-ctrl
# Conflicts:
#	CHANGELOG.md
2026-04-05 23:19:57 -07:00
Garry Tan 7f25d4786b fix: verify tunnel is alive before returning URL to pair-agent
Root cause: when ngrok dies externally (pkill, crash, timeout), the server
still reports tunnelActive=true with a dead URL. pair-agent prints an
instruction block pointing at a dead tunnel. The remote agent gets
"endpoint offline" and the user has to manually restart everything.

Three-layer fix:
- Server /pair endpoint: probes tunnel URL before returning it. If dead,
  resets tunnelActive/tunnelUrl and returns null (triggers CLI restart).
- Server /tunnel/start: probes cached tunnel before returning already_active.
  If dead, falls through to restart ngrok automatically.
- CLI pair-agent: double-checks tunnel URL from server before printing
  instruction block. Falls through to auto-start on failure.

4 regression tests verify all three probe points + CLI verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 22:58:37 -07:00
Garry Tan a94a64f821 fix: snapshot -i auto-detects dropdown/popover interactive elements (#845)
* fix: snapshot -i auto-detects dropdown/popover interactive elements

- Auto-enable cursor-interactive scan (-C) when -i flag is used
- Add floating container detection (portals, popovers, dropdowns)
  - Detects position:fixed/absolute with high z-index
  - Recognizes data-floating-ui-portal, data-radix-* attributes
  - Recognizes role=listbox, role=menu containers
- Elements inside floating containers bypass the hasRole skip
  - Catches dropdown items missed by the accessibility tree
- Role=option/menuitem elements in floating containers captured
  even without cursor:pointer/onclick
- Tag floating container items with 'popover-child' reason
- Include role name in @c ref reasons when present
- Add dropdown.html test fixture
- Add dropdown/popover detection test suite (6 tests)
- Add test: -i alone includes cursor-interactive elements

Fixes: Bookface autocomplete, Radix UI combobox, React portals,
and similar dynamic dropdown patterns where ariaSnapshot() misses
the floating content.

* chore: bump version and changelog (v0.15.12.0)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: update snapshot -i/-C flag descriptions to mention auto-enable behavior

* test: strengthen clickability test guard assertions

The @c ref clickability test previously used if-guards that would
silently pass when no Alice line was found in the snapshot output.
Both Claude and Codex adversarial review flagged this as a test that
could regress without CI noticing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: regenerate top-level SKILL.md with updated flag descriptions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: root <root@localhost>
Co-authored-by: gstack <ship@gstack.dev>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-05 22:57:45 -07:00
root 237ae2abbe Revert "fix: snapshot -i auto-detects dropdown/popover interactive elements (#844)"
This reverts commit 542e7836d0.
2026-04-06 03:27:13 +00:00
Garry Tan 542e7836d0 fix: snapshot -i auto-detects dropdown/popover interactive elements (#844)
- Auto-enable cursor-interactive scan (-C) when -i flag is used
- Add floating container detection (portals, popovers, dropdowns)
  - Detects position:fixed/absolute with high z-index
  - Recognizes data-floating-ui-portal, data-radix-* attributes
  - Recognizes role=listbox, role=menu containers
- Elements inside floating containers bypass the hasRole skip
  - Catches dropdown items missed by the accessibility tree
- Role=option/menuitem elements in floating containers captured
  even without cursor:pointer/onclick
- Tag floating container items with 'popover-child' reason
- Include role name in @c ref reasons when present
- Add dropdown.html test fixture
- Add dropdown/popover detection test suite (6 tests)
- Add test: -i alone includes cursor-interactive elements

Fixes: Bookface autocomplete, Radix UI combobox, React portals,
and similar dynamic dropdown patterns where ariaSnapshot() misses
the floating content.

Co-authored-by: root <root@localhost>
2026-04-05 20:25:12 -07:00
Garry Tan 35bc7e34b1 docs: add security rationale for token in /health on localhost
Explains why this is an accepted risk (no escalation over file-based
token access), CORS protection, and tunnel guard. Prevents future
CSO scans from stripping it without providing an alternative auth path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 18:05:05 -07:00
Garry Tan 160d83ec1d test: verify /health token is localhost-only, never served through tunnel
Updated tests to match the restored token behavior:
- Test 1: token assignment exists AND is inside the !tunnelActive guard
- Test 1b: tunnel branch (else block) does not contain AUTH_TOKEN

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 17:59:35 -07:00
Garry Tan 52226dafe2 fix: restore token in /health for localhost extension auth
The CSO security fix stripped the token from /health to prevent leaking
when tunneled. But the extension needs it to authenticate on localhost.
Now returns token only when not tunneled (safe: localhost-only path).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 17:58:24 -07:00
Garry Tan 7b60c0bbe6 fix: E2E exit reason precedence + worktree prune race condition
Two fixes for E2E test reliability:

1. session-runner.ts: error_max_turns was misclassified as error_api
   because is_error flag was checked before subtype. Now known subtypes
   like error_max_turns are preserved even when is_error is set. The
   is_error override only applies when subtype=success (API failure).

2. worktree.ts: pruneStale() now skips worktrees < 1 hour old to avoid
   deleting worktrees from concurrent test runs still in progress.
   Previously any second test execution would kill the first's worktrees.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:45:20 -07:00
Garry Tan 8801a62339 chore: bump version and changelog (v0.15.12.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 14:25:20 -07:00
Garry Tan e8ef9a5b73 Merge remote-tracking branch 'origin/main' into garrytan/openclaw-browser-ctrl
# Conflicts:
#	CHANGELOG.md
#	VERSION
2026-04-05 14:19:22 -07:00
Garry Tan 094447d0fc fix: pair-agent skill compliance + fix all 16 pre-existing test failures
Root cause: pair-agent was added without completing the gen-skill-docs
compliance checklist. All 16 failures traced back to this.

Fixes:
- Sync package.json version to VERSION (0.15.9.0)
- Add "(gstack)" to pair-agent description for discoverability
- Add pair-agent to Codex path exception (legitimately documents ~/.codex/)
- Add CLI_COMMANDS (status, pair-agent, tunnel) to skill parser allowlist
- Regenerate SKILL.md for all hosts (claude, codex, factory, kiro, etc.)
- Update golden file baselines for ship skill
- Fix relink tests: pass GSTACK_INSTALL_DIR to auto-relink calls so they
  use the fast mock install instead of scanning real ~/.claude/skills/gstack

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 12:22:43 -07:00
Garry Tan 422f172fbb feat: ship re-run executes all verification checks (v0.15.10.0) (#833)
* feat: review army idempotency + cross-review dedup resolver

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: ship re-run executes all checks, adds review army + dedup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: regression guards for ship specialist dispatch + dedup + idempotency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.15.10.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:43:13 -07:00
Garry Tan 11d74038c3 test: comprehensive content security tests (47 tests)
Covers all 4 defense layers:
- Datamarking: marker format, session consistency, text-only application
- Content envelope: wrapping, ZWSP marker escaping, filter warnings
- Content filter hooks: URL blocklist, custom filters, warn/block modes
- Instruction block: SECURITY section content, ordering, generation
- Centralized wrapping: source-level verification of integration
- Chain security: recursion guard, rate-limit exemption, activity suppression
- Hidden element stripping: 7 CSS techniques, ARIA injection, false positives
- Snapshot split format: scoped vs root output, resume integration

Also fixes: visibility:hidden detection, case-insensitive ARIA pattern matching.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:25:19 -07:00
Garry Tan 5dd2491a2f test: add 4 prompt injection test fixtures
- injection-visible.html: visible injection in product review text
- injection-hidden.html: 7 CSS hiding techniques + ARIA injection + false positive
- injection-social.html: social engineering in legitimate-looking content
- injection-combined.html: all attack types + envelope escape attempt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:24:05 -07:00
Garry Tan fbe630db36 feat: add SECURITY section to pair-agent instruction block
Instructs remote agents to treat content inside untrusted envelopes
as potentially malicious. Lists common injection phrases to watch for.
Directs agents to only use @refs from the trusted INTERACTIVE ELEMENTS
section, not from page content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:23:36 -07:00
Garry Tan 617fe8073c feat: snapshot split output format for scoped tokens
Scoped tokens get a split snapshot: trusted @refs section (for click/fill)
separated from untrusted web content in an envelope. Ref names truncated
to 50 chars in trusted section. Root tokens unchanged (backward compat).
Resume command also uses split format for scoped tokens.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:23:14 -07:00
Garry Tan ec7f281a40 feat: hidden element stripping for scoped token text extraction
Detects CSS-hidden elements (opacity, font-size, off-screen, same-color,
clip-path) and ARIA label injection patterns. Marks elements with
data-gstack-hidden, extracts text from a clean clone (no DOM mutation),
then removes markers. Only active for scoped tokens on text command.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:08:15 -07:00
Garry Tan 5ba1472b5e feat: centralize content wrapping in handleCommandInternal response path
Single wrapping location replaces fragmented per-handler wrapping:
- Scoped tokens: content filters + datamarking + enhanced envelope
- Root tokens: existing basic wrapping (backward compat)
- Chain subcommands exempt from top-level wrapping (wrapped individually)
- Adds 'attrs' to PAGE_CONTENT_COMMANDS (ARIA value exposure defense)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:06:50 -07:00
Garry Tan 5184ea677b feat: add content-security.ts with datamarking, envelope, and filter hooks
Four-layer prompt injection defense for pair-agent browser sharing:
- Datamarking: session-scoped watermark for text exfiltration detection
- Content envelope: trust boundary wrapping with ZWSP marker escaping
- Content filter hooks: extensible filter pipeline with warn/block modes
- Built-in URL blocklist: requestbin, pipedream, webhook.site, etc.

BROWSE_CONTENT_FILTER env var controls mode: off|warn|block (default: warn)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:05:38 -07:00
Garry Tan 905f1ddd38 refactor: split handleCommand into handleCommandInternal + HTTP wrapper
Chain subcommands now route through handleCommandInternal for full security
enforcement (scope, domain, tab ownership, rate limiting, content wrapping).
Adds recursion guard for nested chains, rate-limit exemption for chain
subcommands, and activity event suppression (1 event per chain, not per sub).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 11:05:11 -07:00
Garry Tan b3cd3fd68b feat: native OpenClaw skills + ClaHub publishing (v0.15.10.0) (#832)
* feat: add 4 native OpenClaw skills for ClaHub publishing

Hand-crafted methodology skills for the OpenClaw wintermute workspace:
- gstack-openclaw-office-hours (375 lines) — 6 forcing questions, startup + builder modes
- gstack-openclaw-ceo-review (193 lines) — 4 scope modes, 18 cognitive patterns
- gstack-openclaw-investigate (136 lines) — Iron Law, 4-phase debugging
- gstack-openclaw-retro (301 lines) — git analytics, per-person praise/growth

Pure methodology, no gstack infrastructure. All frontmatter uses single-line
inline JSON for OpenClaw parser compatibility.

* feat: add AGENTS.md dispatch section with behavioral rules

Ready-to-paste section for OpenClaw AGENTS.md with 3 iron-clad rules:
1. Always spawn sessions, never redirect user to Claude Code
2. Resolve repo path or ask, don't punt
3. Autoplan runs end-to-end, reports back in chat

Includes full dispatch routing (Simple/Medium/Heavy/Full/Plan tiers).

* chore: clear OpenClaw includeSkills — native skills replace generated

Native ClaHub skills replace the gen-skill-docs pipeline output for
these 4 skills. Updated test to validate empty includeSkills array.

* docs: ClaHub install instructions + dispatch routing rules

- README: add Native OpenClaw Skills section with clawhub install command
- OPENCLAW.md: update dispatch routing with behavioral rules, update
  native skills section to reference ClaHub

* chore: bump version and changelog (v0.15.10.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add gstack-upgrade to OpenClaw dispatch routing

Ensures "upgrade gstack" routes to a Claude Code session with
/gstack-upgrade instead of Wintermute trying to handle it conversationally.

* fix: stop tracking 58MB compiled binary bin/gstack-global-discover

Already in .gitignore but was tracked due to historical mistake.
Same issue as browse/dist/ and design/dist/. The .ts source is right
next to it and ./setup builds from source for every platform.

* test: detect compiled binaries and large files tracked by git

Two new tests in skill-validation:
- No Mach-O or ELF binaries tracked (catches accidental git add of compiled output)
- No files over 2MB tracked (catches bloated binaries sneaking in)

Both print the exact git rm --cached command to fix the issue.

* fix: ClaHub → ClawHub (correct spelling)

* docs: add ClawHub publishing instructions to CLAUDE.md

Documents the clawhub publish command (not clawhub skill publish),
auth flow, version bumping, and verification.

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 10:07:03 -07:00
Garry Tan bd8d44d641 docs: rewrite README OpenClaw install — one paste, real instructions (#818)
OpenClaw section promoted to peer of Claude Code install. Single copy-paste
prompt that installs gstack for Claude Code AND wires up AGENTS.md dispatch.
Usage table shows what happens when you talk naturally. Other agents collapsed
from repeated git clones into one auto-detect command + table. Voice input
moved after 10-15 parallel sprints section.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 02:34:02 -07:00
Garry Tan e2d005c7f4 feat: OpenClaw integration v2 — prompt is the bridge (v0.15.9.0) (#816)
* feat: add includeSkills to HostConfig + update OpenClaw config

Add includeSkills allowlist field with union logic (include minus skip).
Update OpenClaw to generate only 4 native methodology skills (office-hours,
plan-ceo-review, investigate, retro). Remove staticFiles.SOUL.md reference
(pointed to non-existent file).

* feat: OpenClaw integration — gstack-lite/full generation + spawned session detection

Add includeSkills filter to gen-skill-docs pipeline. Generate gstack-lite
(planning discipline for spawned coding sessions) and gstack-full (complete
feature pipeline) for OpenClaw host. Add OPENCLAW_SESSION env var detection
in preamble for spawned session auto-detect. Update setup --host openclaw
to print redirect message.

* docs: OpenClaw architecture doc + regenerate all SKILL.md with spawned session detection

Add docs/OPENCLAW.md with 4-tier dispatch routing and integration architecture.
Generate gstack-lite and gstack-full prompt templates. Regenerate all SKILL.md
files with OPENCLAW_SESSION env var check in preamble.

* test: update golden baselines + OpenClaw includeSkills tests

Update golden SKILL.md baselines for preamble SPAWNED_SESSION change.
Replace staticFiles SOUL.md test with includeSkills validation.

* chore: bump version and changelog (v0.15.9.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove all Wintermute references from source files

Replace with generic "orchestrator" or "OpenClaw" as appropriate.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Plan dispatch tier — full review gauntlet for Claude Code project planning

New gstack-plan template chains /office-hours → /autoplan (CEO + eng + design + DX
+ codex adversarial), saves the reviewed plan, and reports back to the orchestrator.
The orchestrator persists the plan link to its own memory store. 5 tiers now:
Simple, Medium, Heavy, Full, Plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 02:23:59 -07:00
Garry Tan 2b08cfe71e fix: close redundant PRs + friendly error on all design commands (v0.15.8.1) (#817)
* fix: friendly OpenAI org error on all design commands

Previously only generate.ts showed a user-friendly message when the
OpenAI org wasn't verified. Now evolve, iterate, variants, and check
all detect the 403 + "organization must be verified" pattern and show
a clear message with the correct verification URL.

* test: regression test for >128KB Codex session_meta

Documents the current 128KB buffer limitation. When Codex embeds
session_meta beyond 128KB, this test will fail, signaling the need
for a streaming parse or larger buffer.

* chore: bump version and changelog (v0.15.8.1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-05 02:02:06 -07:00
Garry Tan 49eac4c299 docs: expand /pair-agent, /design-shotgun, /design-html in README
Each skill gets a real narrative paragraph explaining the workflow,
not just a table cell. design-shotgun: visual exploration with taste
memory. design-html: production HTML with Pretext computed layout.
pair-agent: cross-vendor AI agent coordination through shared browser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 02:00:05 -07:00
Garry Tan adbcd2cb5e docs: tout /pair-agent as headline feature in CHANGELOG + README
Lead with what it does for the user: type /pair-agent, paste into
your other agent, done. First time AI agents from different companies
can coordinate through a shared browser with real security boundaries.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 01:58:12 -07:00
Garry Tan 36a20c5d59 fix: chain scope bypass + /health info leak when tunneled
1. Chain command now pre-validates ALL subcommand scopes before
   executing any. A read+meta token can no longer escalate to
   admin via chain (eval, js, cookies were dispatched without
   scope checks). tokenInfo flows through handleMetaCommand into
   the chain handler. Rejects entire chain if any subcommand fails.

2. /health strips sensitive fields (currentUrl, agent.currentMessage,
   session) when tunnel is active. Only operational metadata (status,
   mode, uptime, tabs) exposed to the internet. Previously anyone
   reaching the ngrok URL could surveil browsing activity.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:49:31 -07:00
Garry Tan a5b40045b8 test: comprehensive tests for auth ordering, tunnel, ngrok, headed mode
16 new tests covering:
- /command sits above blanket auth gate (Wintermute bug)
- /command uses getTokenInfo not validateAuth
- /tunnel/start requires root, checks native ngrok config, returns already_active
- /pair creates setup keys not session tokens
- Tab ownership checked before command dispatch
- Activity events include clientId
- Instruction block teaches snapshot→@ref pattern
- pair-agent auto-headed mode, process.execPath, --headless skip
- isNgrokAvailable checks all 3 sources (gstack env, env var, native config)
- handlePairAgent calls /tunnel/start not server restart

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:12:05 -07:00
Garry Tan e06f0a6696 feat: pair-agent auto-launches headed mode before pairing
When pair-agent detects headless mode, it auto-switches to headed
(visible Chromium window) so the user can watch what the remote
agent does. Use --headless to skip this. Fixed compiled binary
path resolution (process.execPath, not process.argv[1] which is
virtual /$bunfs/ in Bun compiled binaries).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:09:44 -07:00
Diego Sens 1652f224c7 fix(discover): parse Codex sessions with large session_meta (>4KB) (#798)
Merged via PR triage plan. Fixes Codex session discovery for v0.117+ with 15KB+ session_meta. Follow-up: add >128KB regression test.
2026-04-05 00:09:35 -07:00
Matt Van Horn f91ad61a15 fix: user-friendly error when OpenAI org is not verified (#776)
Merged via PR triage plan. Friendly error for unverified OpenAI org. Follow-up: expand to evolve.ts, iterate.ts, variants.ts, check.ts.
2026-04-05 00:09:32 -07:00
Garry Tan 87a3e62569 fix: scoped tokens rejected on /command — auth gate ordering bug
The blanket validateAuth() gate (root-only) sat above the /command
endpoint, rejecting all scoped tokens with 401 before they reached
getTokenInfo(). Moved /command above the gate so both root and
scoped tokens are accepted. This was the bug Wintermute hit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 00:06:52 -07:00