gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-07-13 19:17:22 +02:00

Author	SHA1	Message	Date
Garry Tan	11d3928463	Merge remote-tracking branch 'origin/main' into garrytan/openclaw-browser-ctrl Resolved CHANGELOG.md conflict: kept main's 0.15.13.0 Team Mode entry on top, preserved branch's richer 0.15.12.0 Content Security entry below.	2026-04-06 00:25:52 -07:00
Garry Tan	dae251e066	feat: team-friendly gstack install mode (v0.15.7.0) (#809 ) * feat: add gstack-settings-hook for atomic Claude Code hook management DRY helper for adding/removing SessionStart hooks in ~/.claude/settings.json. Handles missing files, deduplication, malformed JSON, and atomic writes (.tmp + rename) to prevent corruption on crash or disk-full. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add gstack-session-update for automatic team updates SessionStart hook target that auto-updates gstack at session start. Background fork (zero latency), throttled to once/hour, with lockfile (mkdir + PID), stale lock recovery, GIT_TERMINAL_PROMPT=0, and debug logging to ~/.gstack/analytics/session-update.log. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add --team, --no-team, -q flags to setup --team enables auto_upgrade and registers SessionStart hook via gstack-settings-hook. --no-team reverses it. -q/--quiet suppresses all informational output (for hook-triggered setup runs). --local now prints a deprecation warning. Replaces ~20 echo calls with log() helper for quiet mode support. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add gstack-team-init for repo-level team bootstrapping Two modes: 'optional' (gentle CLAUDE.md suggestion) and 'required' (CLAUDE.md enforcement + .claude/hooks/check-gstack.sh PreToolUse hook that blocks work without gstack installed). Atomic JSON writes, idempotent, prints git add instructions. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: deprecate vendoring, document team mode, clean up uninstall - README: replace "Step 2: Add to your repo" vendoring instructions with team mode (./setup --team + gstack-team-init) - CLAUDE.md: rename "Vendored symlink awareness" to "Dev symlink awareness", add deprecation note - CONTRIBUTING.md: remove vendoring language from prefix section - bin/gstack-uninstall: clean up SessionStart hook on uninstall Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add vendoring deprecation detection to skill preamble Detects vendored gstack in CWD (.claude/skills/gstack/ that's not a symlink and has VERSION or .git). Outputs VENDORED_GSTACK: yes/no. Adds generateVendoringDeprecation() section that offers one-time migration to team mode via AskUserQuestion. Part of team-install-mode feature (credit: Jared Friedman). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files with vendoring deprecation preamble Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: team mode (v0.15.7.0) — credit Jared Friedman Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add integration tests for team mode (20 tests) Covers gstack-settings-hook (add, remove, dedup, preserve existing, atomic write), gstack-session-update (guards, throttle, non-fatal), gstack-team-init (optional, required, enforcement hook, idempotent), and setup flags (-q, --local deprecation). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 23:49:03 -07:00
Garry Tan	30d4550286	test: add source-level security tests for /batch endpoint 8 tests verifying: auth gate placement, scoped token support, max command limit, nested batch rejection, rate limiting bypass, batch-level activity events, command field validation, and tabId passthrough. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 23:40:47 -07:00
Garry Tan	11c397138d	feat: add POST /batch endpoint for multi-command batching Remote agents controlling GStack Browser through a tunnel pay 2-5s of latency per HTTP round-trip. A typical "navigate and read" takes 4 sequential commands = 10-20 seconds. The /batch endpoint collapses N commands into a single HTTP round-trip, cutting a 20-tab crawl from ~60s to ~5s. Sequential execution through the full security pipeline (scope, domain, tab ownership, content wrapping). Rate limiting counts the batch as 1 request. Activity events emitted at batch level, not per-command. Max 50 commands per batch. Nested batches rejected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 23:40:44 -07:00
Garry Tan	d384b095b1	Merge remote-tracking branch 'origin/main' into garrytan/openclaw-browser-ctrl # Conflicts: # CHANGELOG.md	2026-04-05 23:19:57 -07:00
Garry Tan	7f25d4786b	fix: verify tunnel is alive before returning URL to pair-agent Root cause: when ngrok dies externally (pkill, crash, timeout), the server still reports tunnelActive=true with a dead URL. pair-agent prints an instruction block pointing at a dead tunnel. The remote agent gets "endpoint offline" and the user has to manually restart everything. Three-layer fix: - Server /pair endpoint: probes tunnel URL before returning it. If dead, resets tunnelActive/tunnelUrl and returns null (triggers CLI restart). - Server /tunnel/start: probes cached tunnel before returning already_active. If dead, falls through to restart ngrok automatically. - CLI pair-agent: double-checks tunnel URL from server before printing instruction block. Falls through to auto-start on failure. 4 regression tests verify all three probe points + CLI verification. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 22:58:37 -07:00
Garry Tan	a94a64f821	fix: snapshot -i auto-detects dropdown/popover interactive elements (#845 ) * fix: snapshot -i auto-detects dropdown/popover interactive elements - Auto-enable cursor-interactive scan (-C) when -i flag is used - Add floating container detection (portals, popovers, dropdowns) - Detects position:fixed/absolute with high z-index - Recognizes data-floating-ui-portal, data-radix-* attributes - Recognizes role=listbox, role=menu containers - Elements inside floating containers bypass the hasRole skip - Catches dropdown items missed by the accessibility tree - Role=option/menuitem elements in floating containers captured even without cursor:pointer/onclick - Tag floating container items with 'popover-child' reason - Include role name in @c ref reasons when present - Add dropdown.html test fixture - Add dropdown/popover detection test suite (6 tests) - Add test: -i alone includes cursor-interactive elements Fixes: Bookface autocomplete, Radix UI combobox, React portals, and similar dynamic dropdown patterns where ariaSnapshot() misses the floating content. * chore: bump version and changelog (v0.15.12.0) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * docs: update snapshot -i/-C flag descriptions to mention auto-enable behavior * test: strengthen clickability test guard assertions The @c ref clickability test previously used if-guards that would silently pass when no Alice line was found in the snapshot output. Both Claude and Codex adversarial review flagged this as a test that could regress without CI noticing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: regenerate top-level SKILL.md with updated flag descriptions Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: root <root@localhost> Co-authored-by: gstack <ship@gstack.dev> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 22:57:45 -07:00
root	237ae2abbe	Revert "fix: snapshot -i auto-detects dropdown/popover interactive elements (#844 )" This reverts commit `542e7836d0`.	2026-04-06 03:27:13 +00:00
Garry Tan	542e7836d0	fix: snapshot -i auto-detects dropdown/popover interactive elements (#844 ) - Auto-enable cursor-interactive scan (-C) when -i flag is used - Add floating container detection (portals, popovers, dropdowns) - Detects position:fixed/absolute with high z-index - Recognizes data-floating-ui-portal, data-radix-* attributes - Recognizes role=listbox, role=menu containers - Elements inside floating containers bypass the hasRole skip - Catches dropdown items missed by the accessibility tree - Role=option/menuitem elements in floating containers captured even without cursor:pointer/onclick - Tag floating container items with 'popover-child' reason - Include role name in @c ref reasons when present - Add dropdown.html test fixture - Add dropdown/popover detection test suite (6 tests) - Add test: -i alone includes cursor-interactive elements Fixes: Bookface autocomplete, Radix UI combobox, React portals, and similar dynamic dropdown patterns where ariaSnapshot() misses the floating content. Co-authored-by: root <root@localhost>	2026-04-05 20:25:12 -07:00
Garry Tan	35bc7e34b1	docs: add security rationale for token in /health on localhost Explains why this is an accepted risk (no escalation over file-based token access), CORS protection, and tunnel guard. Prevents future CSO scans from stripping it without providing an alternative auth path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 18:05:05 -07:00
Garry Tan	160d83ec1d	test: verify /health token is localhost-only, never served through tunnel Updated tests to match the restored token behavior: - Test 1: token assignment exists AND is inside the !tunnelActive guard - Test 1b: tunnel branch (else block) does not contain AUTH_TOKEN Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 17:59:35 -07:00
Garry Tan	52226dafe2	fix: restore token in /health for localhost extension auth The CSO security fix stripped the token from /health to prevent leaking when tunneled. But the extension needs it to authenticate on localhost. Now returns token only when not tunneled (safe: localhost-only path). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 17:58:24 -07:00
Garry Tan	e8ef9a5b73	Merge remote-tracking branch 'origin/main' into garrytan/openclaw-browser-ctrl # Conflicts: # CHANGELOG.md # VERSION	2026-04-05 14:19:22 -07:00
Garry Tan	11d74038c3	test: comprehensive content security tests (47 tests) Covers all 4 defense layers: - Datamarking: marker format, session consistency, text-only application - Content envelope: wrapping, ZWSP marker escaping, filter warnings - Content filter hooks: URL blocklist, custom filters, warn/block modes - Instruction block: SECURITY section content, ordering, generation - Centralized wrapping: source-level verification of integration - Chain security: recursion guard, rate-limit exemption, activity suppression - Hidden element stripping: 7 CSS techniques, ARIA injection, false positives - Snapshot split format: scoped vs root output, resume integration Also fixes: visibility:hidden detection, case-insensitive ARIA pattern matching. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 11:25:19 -07:00
Garry Tan	5dd2491a2f	test: add 4 prompt injection test fixtures - injection-visible.html: visible injection in product review text - injection-hidden.html: 7 CSS hiding techniques + ARIA injection + false positive - injection-social.html: social engineering in legitimate-looking content - injection-combined.html: all attack types + envelope escape attempt Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 11:24:05 -07:00
Garry Tan	fbe630db36	feat: add SECURITY section to pair-agent instruction block Instructs remote agents to treat content inside untrusted envelopes as potentially malicious. Lists common injection phrases to watch for. Directs agents to only use @refs from the trusted INTERACTIVE ELEMENTS section, not from page content. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 11:23:36 -07:00
Garry Tan	617fe8073c	feat: snapshot split output format for scoped tokens Scoped tokens get a split snapshot: trusted @refs section (for click/fill) separated from untrusted web content in an envelope. Ref names truncated to 50 chars in trusted section. Root tokens unchanged (backward compat). Resume command also uses split format for scoped tokens. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 11:23:14 -07:00
Garry Tan	ec7f281a40	feat: hidden element stripping for scoped token text extraction Detects CSS-hidden elements (opacity, font-size, off-screen, same-color, clip-path) and ARIA label injection patterns. Marks elements with data-gstack-hidden, extracts text from a clean clone (no DOM mutation), then removes markers. Only active for scoped tokens on text command. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 11:08:15 -07:00
Garry Tan	5ba1472b5e	feat: centralize content wrapping in handleCommandInternal response path Single wrapping location replaces fragmented per-handler wrapping: - Scoped tokens: content filters + datamarking + enhanced envelope - Root tokens: existing basic wrapping (backward compat) - Chain subcommands exempt from top-level wrapping (wrapped individually) - Adds 'attrs' to PAGE_CONTENT_COMMANDS (ARIA value exposure defense) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 11:06:50 -07:00
Garry Tan	5184ea677b	feat: add content-security.ts with datamarking, envelope, and filter hooks Four-layer prompt injection defense for pair-agent browser sharing: - Datamarking: session-scoped watermark for text exfiltration detection - Content envelope: trust boundary wrapping with ZWSP marker escaping - Content filter hooks: extensible filter pipeline with warn/block modes - Built-in URL blocklist: requestbin, pipedream, webhook.site, etc. BROWSE_CONTENT_FILTER env var controls mode: off\|warn\|block (default: warn) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 11:05:38 -07:00
Garry Tan	905f1ddd38	refactor: split handleCommand into handleCommandInternal + HTTP wrapper Chain subcommands now route through handleCommandInternal for full security enforcement (scope, domain, tab ownership, rate limiting, content wrapping). Adds recursion guard for nested chains, rate-limit exemption for chain subcommands, and activity event suppression (1 event per chain, not per sub). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 11:05:11 -07:00
Garry Tan	e2d005c7f4	feat: OpenClaw integration v2 — prompt is the bridge (v0.15.9.0) (#816 ) * feat: add includeSkills to HostConfig + update OpenClaw config Add includeSkills allowlist field with union logic (include minus skip). Update OpenClaw to generate only 4 native methodology skills (office-hours, plan-ceo-review, investigate, retro). Remove staticFiles.SOUL.md reference (pointed to non-existent file). * feat: OpenClaw integration — gstack-lite/full generation + spawned session detection Add includeSkills filter to gen-skill-docs pipeline. Generate gstack-lite (planning discipline for spawned coding sessions) and gstack-full (complete feature pipeline) for OpenClaw host. Add OPENCLAW_SESSION env var detection in preamble for spawned session auto-detect. Update setup --host openclaw to print redirect message. * docs: OpenClaw architecture doc + regenerate all SKILL.md with spawned session detection Add docs/OPENCLAW.md with 4-tier dispatch routing and integration architecture. Generate gstack-lite and gstack-full prompt templates. Regenerate all SKILL.md files with OPENCLAW_SESSION env var check in preamble. * test: update golden baselines + OpenClaw includeSkills tests Update golden SKILL.md baselines for preamble SPAWNED_SESSION change. Replace staticFiles SOUL.md test with includeSkills validation. * chore: bump version and changelog (v0.15.9.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove all Wintermute references from source files Replace with generic "orchestrator" or "OpenClaw" as appropriate. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Plan dispatch tier — full review gauntlet for Claude Code project planning New gstack-plan template chains /office-hours → /autoplan (CEO + eng + design + DX + codex adversarial), saves the reviewed plan, and reports back to the orchestrator. The orchestrator persists the plan link to its own memory store. 5 tiers now: Simple, Medium, Heavy, Full, Plan. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 02:23:59 -07:00
Garry Tan	36a20c5d59	fix: chain scope bypass + /health info leak when tunneled 1. Chain command now pre-validates ALL subcommand scopes before executing any. A read+meta token can no longer escalate to admin via chain (eval, js, cookies were dispatched without scope checks). tokenInfo flows through handleMetaCommand into the chain handler. Rejects entire chain if any subcommand fails. 2. /health strips sensitive fields (currentUrl, agent.currentMessage, session) when tunnel is active. Only operational metadata (status, mode, uptime, tabs) exposed to the internet. Previously anyone reaching the ngrok URL could surveil browsing activity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 00:49:31 -07:00
Garry Tan	a5b40045b8	test: comprehensive tests for auth ordering, tunnel, ngrok, headed mode 16 new tests covering: - /command sits above blanket auth gate (Wintermute bug) - /command uses getTokenInfo not validateAuth - /tunnel/start requires root, checks native ngrok config, returns already_active - /pair creates setup keys not session tokens - Tab ownership checked before command dispatch - Activity events include clientId - Instruction block teaches snapshot→@ref pattern - pair-agent auto-headed mode, process.execPath, --headless skip - isNgrokAvailable checks all 3 sources (gstack env, env var, native config) - handlePairAgent calls /tunnel/start not server restart Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 00:12:05 -07:00
Garry Tan	e06f0a6696	feat: pair-agent auto-launches headed mode before pairing When pair-agent detects headless mode, it auto-switches to headed (visible Chromium window) so the user can watch what the remote agent does. Use --headless to skip this. Fixed compiled binary path resolution (process.execPath, not process.argv[1] which is virtual /$bunfs/ in Bun compiled binaries). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 00:09:44 -07:00
Garry Tan	87a3e62569	fix: scoped tokens rejected on /command — auth gate ordering bug The blanket validateAuth() gate (root-only) sat above the /command endpoint, rejecting all scoped tokens with 401 before they reached getTokenInfo(). Moved /command above the gate so both root and scoped tokens are accepted. This was the bug Wintermute hit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 00:06:52 -07:00
Garry Tan	da624aa554	feat: on-demand tunnel start via POST /tunnel/start pair-agent now auto-starts the ngrok tunnel without restarting the server. New POST /tunnel/start endpoint reads authtoken from env, ~/.gstack/ngrok.env, or ngrok's native config. CLI detects ngrok availability and calls the endpoint automatically. Zero manual steps when ngrok is installed and authed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-05 00:01:41 -07:00
Garry Tan	7ed3b12854	feat: smart ngrok detection + auto-tunnel in pair-agent The pair-agent command now checks ngrok's native config (not just ~/.gstack/ngrok.env) and auto-starts the tunnel when ngrok is available. The skill template walks users through ngrok install and auth if not set up, instead of just printing a dead localhost URL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 23:59:39 -07:00
Garry Tan	376814c3f9	feat: improved instruction block with snapshot→@ref pattern The paste-into-agent instruction block now teaches the snapshot→@ref workflow (the most powerful browsing pattern), shows the server URL prominently, and uses clearer formatting. Tests updated to match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 23:53:52 -07:00
Garry Tan	cd85bdc196	fix: CSO security fixes — token leak, domain bypass, input validation 1. Remove root token from /health endpoint entirely (CSO #1 CRITICAL). Origin header is spoofable. Extension reads from ~/.gstack/.auth.json. 2. Add domain check for newtab URL (CSO #5). Previously only goto was checked, allowing domain-restricted agents to bypass via newtab. 3. Validate scope values, rateLimit, expiresSeconds in createToken() (CSO #4). Rejects invalid scopes and negative values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 23:37:36 -07:00
Garry Tan	28b7301c6f	merge: resolve conflicts with main (adopt chrome-extension origin gating)	2026-04-04 23:20:15 -07:00
Garry Tan	32abe70047	test: tab isolation + instruction block generator tests 14 tests covering tab ownership lifecycle (access checks, unowned tabs, transferTab) and instruction block generator (scopes, URLs, admin flag, troubleshooting section). Fix server-auth test that used fragile sliceBetween boundaries broken by new endpoints. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 23:18:37 -07:00
Garry Tan	fafe68b44c	feat: pair-agent CLI command + instruction block generator One command to pair a remote agent: $B pair-agent. Creates a setup key via POST /pair, prints a copy-pasteable instruction block with curl commands. Smart tunnel fallback (tunnel URL > auto-start > localhost). Flags: --for HOST, --local HOST, --admin, --client NAME. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 23:18:32 -07:00
Garry Tan	eb6f57239b	feat: tab enforcement + POST /pair endpoint + activity attribution Server-side tab ownership check blocks scoped agents from writing to unowned tabs. Special-case newtab records ownership for scoped tokens. POST /pair endpoint creates setup keys for the pairing ceremony. Activity events now include clientId for attribution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 23:18:27 -07:00
Garry Tan	8fa3d7b06d	feat: tab isolation for multi-agent browser access Add per-tab ownership tracking to BrowserManager. Scoped agents must create their own tab via newtab before writing. Unowned tabs (pre-existing, user-opened) are root-only for writes. Read access always allowed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 23:18:23 -07:00
Garry Tan	115d81d792	fix: security wave 1 — 14 fixes for audit #783 (v0.15.7.0) (#810 ) * fix: DNS rebinding protection checks AAAA (IPv6) records too Cherry-pick PR #744 by @Gonzih. Closes the IPv6-only DNS rebinding gap by checking both A and AAAA records independently. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: validateOutputPath symlink bypass — resolve real path before safe-dir check Cherry-pick PR #745 by @Gonzih. Adds a second pass using fs.realpathSync() to resolve symlinks after lexical path validation. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: validate saved URLs before navigation in restoreState Cherry-pick PR #751 by @Gonzih. Prevents navigation to cloud metadata endpoints or file:// URIs embedded in user-writable state files. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: telemetry-ingest uses anon key instead of service role key Cherry-pick PR #750 by @Gonzih. The service role key bypasses RLS and grants unrestricted database access — anon key + RLS is the right model for a public telemetry endpoint. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: killAgent() actually kills the sidebar claude subprocess Cherry-pick PR #743 by @Gonzih. Implements cross-process kill signaling via kill-file + polling pattern, tracks active processes per-tab. Co-Authored-By: Gonzih <gonzih@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(design): bind server to localhost and validate reload paths Cherry-pick PR #803 by @garagon. Adds hostname: '127.0.0.1' to Bun.serve() and validates /api/reload paths are within cwd() or tmpdir(). Closes C1+C2 from security audit #783. Co-Authored-By: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add auth gate to /inspector/events SSE endpoint (C3) The /inspector/events endpoint had no authentication, unlike /activity/stream which validates tokens. Now requires the same Bearer header or ?token= query param check. Closes C3 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sanitize design feedback with trust boundary markers (C4+H5) Wrap user feedback in <user-feedback> XML markers with tag escaping to prevent prompt injection via malicious feedback text. Cap accumulated feedback to last 5 iterations to limit incremental poisoning. Closes C4 and H5 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden file/directory permissions to owner-only (C5+H9+M9+M10) Add mode 0o700 to all mkdirSync calls for state/session directories. Add mode 0o600 to all writeFileSync calls for session.json, chat.jsonl, and log files. Add umask 077 to setup script. Prevents auth tokens, chat history, and browser logs from being world-readable on multi-user systems. Closes C5, H9, M9, M10 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: TOCTOU race in setup symlink creation (C6) Remove the existence check before mkdir -p (it's idempotent) and validate the target isn't already a symlink before creating the link. Prevents a local attacker from racing between the check and mkdir to redirect SKILL.md writes. Closes C6 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove CORS wildcard, restrict to localhost (H1) Replace Access-Control-Allow-Origin: * with http://127.0.0.1 on sidebar tab/chat endpoints. The Chrome extension uses manifest host_permissions to bypass CORS entirely, so this only blocks malicious websites from making cross-origin requests. Closes H1 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: make cookie picker auth mandatory (H2) Remove the conditional if(authToken) guard that skipped auth when authToken was undefined. Now all cookie picker data/action routes reject unauthenticated requests. Closes H2 from security audit #783. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: gate /health token on chrome-extension Origin header Only return the auth token in /health response when the request Origin starts with chrome-extension://. The Chrome extension always sends this origin via manifest host_permissions. Regular HTTP requests (including tunneled ones from ngrok/SSH) won't get the token. The extension also has a fallback path through background.js that reads the token from the state file directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: update server-auth test for chrome-extension Origin gating The test previously checked for 'localhost-only' comment. Now checks for 'chrome-extension://' since the token is gated on Origin header. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.7.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Gonzih <gonzih@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: garagon <garagon@users.noreply.github.com>	2026-04-04 22:12:04 -07:00
Garry Tan	bc3ca4b786	feat: ngrok tunnel integration + @ngrok/ngrok dependency BROWSE_TUNNEL=1 env var starts an ngrok tunnel after Bun.serve(). Reads NGROK_AUTHTOKEN from env or ~/.gstack/ngrok.env. Reads NGROK_DOMAIN for dedicated domain (stable URL). Updates state file with tunnel URL. Feasibility spike confirmed: SDK works in compiled Bun binary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 16:55:01 -07:00
Garry Tan	385748a767	feat: integrate token registry + scoped auth into browse server Server changes for multi-agent browser access: - /connect endpoint: setup key exchange for /pair-agent ceremony - /token endpoint: root-only minting of scoped sub-tokens - /token/:clientId DELETE: revoke agent tokens - /agents endpoint: list connected agents (root-only) - /health: strips root token when tunnel is active (P0 security fix) - /command: scope/rate/domain checks via token registry before dispatch - Idle timer skips shutdown when tunnel is active Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 16:54:12 -07:00
Garry Tan	3a165d5279	feat: token registry for multi-agent browser access Per-agent scoped tokens with read/write/admin/meta command categories, domain glob restrictions, rate limiting, expiry, and revocation. Setup key exchange for the /pair-agent ceremony (5-min one-time key → 24h session token). Idempotent exchange handles tunnel drops. 39 tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 16:47:34 -07:00
Garry Tan	447851452a	feat: interactive /plan-devex-review + plan mode skill fix (v0.15.5.0) (#796 ) * fix: skill invocation during plan mode takes precedence over generic plan mode Adds a "Skill Invocation During Plan Mode" section to the preamble resolver so all generated SKILL.md files include it. Fixes a bug where Claude treats loaded skill content as reference material instead of executable instructions, and keeps trying to ExitPlanMode instead of following the skill workflow step by step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: interactive /plan-devex-review with persona, benchmarks, and forcing questions Complete rewrite of the DX review skill to match CEO/eng review depth. New flow: investigate (persona, empathy, competitors, magical moment, journey tracing) then force decisions, then score with evidence. Three modes: DX EXPANSION, DX POLISH, DX TRIAGE. 20-45 interactive STOP points vs 10-12 before. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: autoplan DX POLISH mode + review log schema for new devex fields Adds mode selection, persona, competitive, and magical moment override rules to autoplan Phase 3.5. Documents new review log fields (mode, persona, competitive_tier) in the plan-file-review-report schema. Syncs package.json version to VERSION. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.15.5.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 14:36:23 -07:00
Garry Tan	3f080de1b7	feat: GStack Browser — double-click AI browser with anti-bot stealth (#695 ) * feat: CDP inspector module — persistent sessions, CSS cascade, style modification New browse/src/cdp-inspector.ts with full CDP inspection engine: - inspectElement() via CSS.getMatchedStylesForNode + DOM.getBoxModel - modifyStyle() via CSS.setStyleTexts with headless page.evaluate fallback - Persistent CDP session lifecycle (create, reuse, detach on nav, re-create) - Specificity sorting, overridden property detection, UA rule filtering - Modification history with undo support - formatInspectorResult() for CLI output Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: browse server inspector endpoints + inspect/style/cleanup/prettyscreenshot CLI Server endpoints: POST /inspector/pick, GET /inspector, POST /inspector/apply, POST /inspector/reset, GET /inspector/history, GET /inspector/events (SSE). CLI commands: inspect (CDP cascade), style (live CSS mod), cleanup (page clutter removal), prettyscreenshot (clean screenshot pipeline). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar CSS inspector — element picker, box model, rule cascade, quick edit Extension changes for the visual CSS inspector: - inspector.js: element picker with hover highlight, CSS selector generation, basic mode fallback (getComputedStyle + CSSOM), page alteration handlers - inspector.css: picker overlay styles (blue highlight + tooltip) - background.js: inspector message routing (picker <-> server <-> sidepanel) - sidepanel: Inspector tab with box model viz (gstack palette), matched rules with specificity badges, computed styles, click-to-edit quick edit, Send to Agent/Code button, empty/loading/error states Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: document inspect, style, cleanup, prettyscreenshot browse commands Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: auto-track user-created tabs and handle tab close browser-manager.ts changes: - context.on('page') listener: automatically tracks tabs opened by the user (Cmd+T, right-click open in new tab, window.open). Previously only programmatic newTab() was tracked, so user tabs were invisible. - page.on('close') handler in wirePageEvents: removes closed tabs from the pages map and switches activeTabId to the last remaining tab. - syncActiveTabByUrl: match Chrome extension's active tab URL to the correct Playwright page for accurate tab identity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: per-tab agent isolation via BROWSE_TAB environment variable Prevents parallel sidebar agents from interfering with each other's tab context. Three-layer fix: - sidebar-agent.ts: passes BROWSE_TAB=<tabId> env var to each claude process, per-tab processing set allows concurrent agents across tabs - cli.ts: reads process.env.BROWSE_TAB and includes tabId in command request body - server.ts: handleCommand() temporarily switches activeTabId when tabId is present, restores after command completes (safe: Bun event loop is single-threaded) Also: per-tab agent state (TabAgentState map), per-tab message queuing, per-tab chat buffers, verbose streaming narration, stop button endpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar per-tab chat context, tab bar sync, stop button, UX polish Extension changes: - sidepanel.js: per-tab chat history (tabChatHistories map), switchChatTab() swaps entire chat view, browserTabActivated handler for instant tab sync, stop button wired to /sidebar-agent/stop, pollTabs renders tab bar - sidepanel.html: updated banner text ("Browser co-pilot"), stop button markup, input placeholder "Ask about this page..." - sidepanel.css: tab bar styles, stop button styles, loading state fixes - background.js: chrome.tabs.onActivated sends browserTabActivated to sidepanel with tab URL for instant tab switch detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: per-tab isolation, BROWSE_TAB pinning, tab tracking, sidebar UX sidebar-agent.test.ts (new tests): - BROWSE_TAB env var passed to claude process - CLI reads BROWSE_TAB and sends tabId in body - handleCommand accepts tabId, saves/restores activeTabId - Tab pinning only activates when tabId provided - Per-tab agent state, queue, concurrency - processingTabs set for parallel agents sidebar-ux.test.ts (new tests): - context.on('page') tracks user-created tabs - page.on('close') removes tabs from pages map - Tab isolation uses BROWSE_TAB not system prompt hack - Per-tab chat context in sidepanel - Tab bar rendering, stop button, banner text Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve merge conflicts — keep security defenses + per-tab isolation Merged main's security improvements (XML escaping, prompt injection defense, allowed commands whitelist, --model opus, Write tool, stderr capture) with our branch's per-tab isolation (BROWSE_TAB env var, processingTabs set, no --resume). Updated test expectations for expanded system prompt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.9.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add inspector message types to background.js allowlist Pre-existing bug found by Codex: ALLOWED_TYPES in background.js was missing all inspector message types (startInspector, stopInspector, elementPicked, pickerCancelled, applyStyle, toggleClass, injectCSS, resetAll, inspectResult). Messages were silently rejected, making the inspector broken on ALL pages. Also: separate executeScript and insertCSS into individual try blocks in injectInspector(), store inspectorMode for routing, and add content.js fallback when script injection fails (CSP, chrome:// pages). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: basic element picker in content.js for CSP-restricted pages When inspector.js can't be injected (CSP, chrome:// pages), content.js provides a basic picker using getComputedStyle + CSSOM: - startBasicPicker/stopBasicPicker message handlers - captureBasicData() with ~30 key CSS properties, box model, matched rules - Hover highlight with outline save/restore (never leaves artifacts) - Click uses e.target directly (no re-querying by selector) - Sends inspectResult with mode:'basic' for sidebar rendering - Escape key cancels picker and restores outlines Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cleanup + screenshot buttons in sidebar inspector toolbar Two action buttons in the inspector toolbar: - Cleanup (🧹): POSTs cleanup --all to server, shows spinner, chat notification on success, resets inspector state (element may be removed) - Screenshot (📸): POSTs screenshot to server, shows spinner, chat notification with saved file path Shared infrastructure: - .inspector-action-btn CSS with loading spinner via ::after pseudo-element - chat-notification type in addChatEntry() for system messages - package.json version bump to 0.13.9.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: inspector allowlist, CSP fallback, cleanup/screenshot buttons 16 new tests in sidebar-ux.test.ts: - Inspector message allowlist includes all inspector types - content.js basic picker (startBasicPicker, captureBasicData, CSSOM, outline save/restore, inspectResult with mode basic, Escape cleanup) - background.js CSP fallback (separate try blocks, inspectorMode, fallback) - Cleanup button (POST /command, inspector reset after success) - Screenshot button (POST /command, notification rendering) - Chat notification type and CSS styles Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.13.9.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cleanup + screenshot buttons in chat toolbar (not just inspector) Quick actions toolbar (🧹 Cleanup, 📸 Screenshot) now appears above the chat input, always visible. Both inspector and chat buttons share runCleanup() and runScreenshot() helper functions. Clicking either set shows loading state on both simultaneously. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: chat toolbar buttons, shared helpers, quick-action-btn styles Tests that chat toolbar exists (chat-cleanup-btn, chat-screenshot-btn, quick-actions container), CSS styles (.quick-action-btn, .quick-action-btn.loading), shared runCleanup/runScreenshot helper functions, and cleanup inspector reset. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: aggressive cleanup heuristics — overlays, scroll unlock, blur removal Massively expanded CLEANUP_SELECTORS with patterns from uBlock Origin and Readability.js research: - ads: 30+ selectors (Google, Amazon, Outbrain, Taboola, Criteo, etc.) - cookies: OneTrust, Cookiebot, TrustArc, Quantcast + generic patterns - overlays (NEW): paywalls, newsletter popups, interstitials, push prompts, app download banners, survey modals - social: follow prompts, share tools - Cleanup now defaults to --all when no args (sidebar button fix) - Uses !important on all display:none (overrides inline styles) - Unlocks body/html scroll (overflow:hidden from modal lockout) - Removes blur/filter effects (paywall content blur) - Removes max-height truncation (article teaser truncation) - Collapses empty ad placeholder whitespace (empty divs after ad removal) - Skips gstack-ctrl indicator in sticky removal Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: disable action buttons when disconnected, no error spam - setActionButtonsEnabled() toggles .disabled class on all cleanup/screenshot buttons (both chat toolbar and inspector toolbar) - Called with false in updateConnection when server URL is null - Called with true when connection established - runCleanup/runScreenshot silently return when disconnected instead of showing 'Not connected' error notifications - CSS .disabled style: pointer-events:none, opacity:0.3, cursor:not-allowed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: cleanup heuristics, button disabled state, overlay selectors 17 new tests: - cleanup defaults to --all on empty args - CLEANUP_SELECTORS overlays category (paywall, newsletter, interstitial) - Major ad networks in selectors (doubleclick, taboola, criteo, etc.) - Major consent frameworks (OneTrust, Cookiebot, TrustArc, Quantcast) - !important override for inline styles - Scroll unlock (body overflow:hidden) - Blur removal (paywall content blur) - Article truncation removal (max-height) - Empty placeholder collapse - gstack-ctrl indicator skip in sticky cleanup - setActionButtonsEnabled function - Buttons disabled when disconnected - No error spam from cleanup/screenshot when disconnected - CSS disabled styles for action buttons Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: LLM-based page cleanup — agent analyzes page semantically Instead of brittle CSS selectors, the cleanup button now sends a prompt to the sidebar agent (which IS an LLM). The agent: 1. Runs deterministic $B cleanup --all as a quick first pass 2. Takes a snapshot to see what's left 3. Analyzes the page semantically to identify remaining clutter 4. Removes elements intelligently, preserving site branding This means cleanup works correctly on any site without site-specific selectors. The LLM understands that "Your Daily Puzzles" is clutter, "ADVERTISEMENT" is junk, but the SF Chronicle masthead should stay. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: aggressive cleanup heuristics + preserve top nav bar Deterministic cleanup improvements (used as first pass before LLM analysis): - New 'clutter' category: audio players, podcast widgets, sidebar puzzles/games, recirculation widgets (taboola, outbrain, nativo), cross-promotion banners - Text-content detection: removes "ADVERTISEMENT", "Article continues below", "Sponsored", "Paid content" labels and their parent wrappers - Sticky fix: preserves the topmost full-width element near viewport top (site nav bar) instead of hiding all sticky/fixed elements. Sorts by vertical position, preserves the first one that spans >80% viewport width. Tests: clutter category, ad label removal, nav bar preservation logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: LLM-based cleanup architecture, deterministic heuristics, sticky nav 22 new tests covering: - Cleanup button uses /sidebar-command (agent) not /command (deterministic) - Cleanup prompt includes deterministic first pass + agent snapshot analysis - Cleanup prompt lists specific clutter categories for agent guidance - Cleanup prompt preserves site identity (masthead, headline, body, byline) - Cleanup prompt instructs scroll unlock and $B eval removal - Loading state management (async agent, setTimeout) - Deterministic clutter: audio/podcast, games/puzzles, recirculation - Ad label text patterns (ADVERTISEMENT, Sponsored, Article continues) - Ad label parent wrapper hiding for small containers - Sticky nav preservation (sort by position, first full-width near top) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: GStack Browser stealth + branding — anti-bot patches, custom UA, rebrand - Add GSTACK_CHROMIUM_PATH env var for custom Chromium binary - Add BROWSE_EXTENSIONS_DIR env var for extension path override - Move auth token to /health endpoint (fixes read-only .app bundles) - Anti-bot stealth: disable navigator.webdriver, fake plugins, languages - Custom user agent: Chrome/<version> GStackBrowser (auto-detects version) - Rebrand Chromium plist to "GStack Browser" at launch time - Update security test to match new token-via-health approach * feat: GStack Browser .app bundle — launcher script + build system - scripts/app/gstack-browser: dual-mode launcher (dev + .app bundle) - scripts/build-app.sh: compiles binary, bundles Chromium + extension, creates DMG - Rebrands Chromium plist during build for "GStack Browser" in menu bar - 389MB .app, 189MB compressed DMG, launches in ~5s * docs: GStack Browser V0 master plan — AI-native development browser vision 5-phase roadmap from .app wrapper through Chromium fork, 9 capability visions, competitive landscape, architecture diagrams, design system. * fix: restore package.json and sync version to 0.14.3.0 * chore: bump version and changelog (v0.14.4.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: gitignore top-level dist/ (GStack Browser build output) * feat: GStack Browser icon — custom .icns replaces Chromium's Dock icon - Generated 1024px icon: dark terminal window with amber prompt cursor - Converted to .icns with all macOS sizes (16-1024px, 1x and 2x) - build-app.sh copies icon into both the outer .app and bundled Chromium's Resources (Chromium's process owns the Dock icon, not the launcher) - browser-manager.ts patches Chromium's icon at runtime for dev mode too - Both the Dock and Cmd+Tab now show the GStack icon * feat: rename /connect-chrome → /open-gstack-browser - Rename skill directory + update frontmatter name and description - Update SKILL.md.tmpl to reference GStack Browser branding/stealth - Create connect-chrome symlink for backwards compatibility - Setup script creates /connect-chrome alias in .claude/skills/ - Fix package.json version sync (0.14.5.0 → 0.14.6.0) * feat: rename /connect-chrome → /open-gstack-browser across all references Update README skill lists, docs/skills.md deep dive, extension sidepanel banner copy button, and reconnect clipboard text. * feat: left-align sidebar UI + extension-ready event for welcome page - Left-align all sidebar text (chat welcome, loading, empty states, notifications, inspector empty, session placeholder) - Dispatch 'gstack-extension-ready' CustomEvent from content.js so the welcome page can detect when the sidebar is active * chore: add GStack Browser TODOs — CDP stealth patches + Chromium fork P1: rebrowser-style postinstall patcher for Playwright 1.58.2 (suppress Runtime.enable, addBinding context discovery, 6 files, ~200 lines). P2: long-term Chromium fork for permanent stealth + native sidebar. * chore: regenerate open-gstack-browser/SKILL.md from template Fix timeline skill name (connect-chrome → open-gstack-browser) and preamble formatting from merge with main's updated template. * feat: welcome page served from browse server on headed launch - Add /welcome endpoint to server.ts, serves welcome.html - Navigate to /welcome after server starts (not during launchHeaded, which runs before the server is listening) - welcome.html bundled in browse/src/ for portability * feat: auto-open sidebar on every browser launch, not just first install - Add top-level setTimeout in background.js that fires on every service worker startup (onInstalled only fires on install/update) - Remove misaligned arrow from welcome page, replace with text fallback that hides when extension content script fires gstack-extension-ready * fix: sidebar auto-open retry with backoff + welcome page tests - Replace single-attempt sidePanel.open() with autoOpenSidePanel() that retries up to 5 times with 500ms-5000ms backoff - Fire on both onInstalled AND every service worker startup - Remove misaligned arrow from welcome page, replace with text fallback - Add 12 tests: welcome page structure, /welcome endpoint, headed launch navigation timing, sidebar auto-open retry logic, extension-ready event * feat: reload button in sidebar footer Adds a "reload" button next to "debug" and "clear" in the sidebar footer. Calls location.reload() to fully refresh the side panel, re-run connection logic, and clear stale state. * feat: right-pointing arrow hint for sidebar on welcome page Replace invisible text fallback with visible amber bubble + animated right arrow (→) pointing toward where the sidebar opens. Always correct regardless of window size (unlike the old up arrow at toolbar chrome). * fix: sidebar auth race — pass token in getPort response The sidebar called tryConnect() → getPort → got {port, connected} but NO token. All subsequent requests (SSE, chat poll) failed with 401. The token only arrived later via the health broadcast, but by then the SSE connection was already broken. Fix: include authToken in the getPort response so the sidebar has the token from its very first connection attempt. * feat: sidebar debug visibility + auth race tests - Show attempt count in loading screen ("Connecting... attempt 3") - After 5 failed attempts, show debug details (port, connected, token) so stuck users can see exactly what's failing - Add 4 tests: getPort includes token, tryConnect uses token, dead state exists with MAX_RECONNECT_ATTEMPTS, reconnectAttempts visible * fix: startup health check retries every 1s instead of 10s Root cause: extension service worker starts before Bun.serve() is listening. First checkHealth() fails, next attempt is 10 seconds later. User stares at "Connecting..." for 10 seconds. Fix: retry every 1s for up to 15 attempts on startup, then switch to 10s polling once connected (or after 15s gives up). Sidebar should connect within 1-2 seconds of server becoming available. 3 new tests verify the fast-retry → slow-poll transition. * feat: detailed step-by-step status in sidebar loading screen Replace useless "Connecting..." with real-time debug info: - "Looking for browse server... (attempt N)" - Shows port, server responding status, token status - Shows chrome.runtime errors if extension messaging fails - Tells user to run /open-gstack-browser if server not found * fix: sidebar connects directly to /health instead of waiting for background Root cause: sidepanel asked background "are you connected?" but background's health check hadn't succeeded yet (1-10s gap). Sidepanel waited forever. Fix: when background says not connected, sidepanel hits /health directly with fetch(). Gets the token from the response. Bypasses background entirely for initial connection. Shows step-by-step debug info: "Checking server directly... port: 34567 / Trying GET /health..." * fix: suppress fake "session ended" and timeout errors in sidebar Two issues making the sidebar look broken when it's actually working: 1. "Timed out after 300s" error displayed after agent_done — this is a cleanup timer, not a real error. Now suppressed when no active session. 2. "(session ended)" text appended on every idle poll — removed entirely. The thinking spinner is cleaned up silently instead. * fix: sidebar agent passes BROWSE_PORT to child claude Ensures the child claude process connects to the existing headed browse server (port 34567) instead of spawning a new headless one. Without this, sidebar chat commands run in an invisible browser. * feat: BROWSE_NO_AUTOSTART prevents sidebar from spawning headless browser When set, the browse CLI refuses to start a new server and exits with a clear error: "Server not available, run /open-gstack-browser to restart." The sidebar agent sets this so users never get an invisible headless browser when the headed one is closed. * test: BROWSE_NO_AUTOSTART guard in CLI + sidebar-agent env vars 5 tests: CLI checks env var before starting server, shows actionable error, sidebar-agent sets the flag + BROWSE_PORT, guard runs before lock acquisition to prevent stale lock files. * fix: stale auth token causes Unauthorized + invisible error text background.js checkHealth() never refreshed authToken from /health responses, so when the browse server restarted with a new token, all sidebar-command requests got 401 Unauthorized forever. Also: error placeholder text was #3f3f46 on #0C0C0C (nearly invisible). Now shows in red to match the error border. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: replace 40+ silent catch blocks with debug logging Every empty catch {} in sidepanel.js, sidebar-agent.ts now logs with [gstack sidebar] or [sidebar-agent] prefix. Chat poll 401s, stop agent, tab poll, clear chat, SSE parse, refs fetch, stream JSON parse, queue read/parse, process kill — all now visible in console. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: noisy debug logging + auto model routing in browse server Server-side silent catch blocks (22 instances) now log with [browse] prefix: chat persistence, session save/load, agent kill, tab pin/restore, welcome page, buffer flush, worktree cleanup, lock files, SSE streams. Also adds pickSidebarModel() — routes sidebar messages to sonnet for navigation/interaction (click, goto, fill, screenshot) and opus for analysis/comprehension (summarize, describe, find bugs). Sonnet is ~4x faster for action commands with zero quality difference. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: update sidebar tests for model router + longer stopAgent slice - stopAgent slice 800→1000 to accommodate added error logging lines - Replace hardcoded opus assertion with model router assertions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sidebar arrow hint stays visible until sidebar actually opens Previously the welcome page arrow hid immediately when the extension's content script loaded — but extension loaded ≠ sidebar open. Now the signal flow is: sidepanel connects → tells background.js → relays to content script → dispatches gstack-extension-ready → arrow hides. Adds welcome-page.test.ts: 14 tests verifying arrow, branding, feature cards, dark theme, and auto-hide behavior via real HTTP server. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: arrow hide signal chain (4-step) + stale session-ended assertion 8 new tests verify the sidebarOpened → background → content → welcome signal chain. Updates stale "(session ended)" test that checked for text removed in a prior commit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: preserve optimistic UI during tab switch on first message When the user sends a message and the server assigns it to a new tab (because Chrome's active tab changed), switchChatTab() was blowing away the optimistic user bubble and thinking dots with a welcome screen. Now preserves the current DOM if we're mid-send with a thinking indicator. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: sidebar message flow architecture doc + CLAUDE.md pointer SIDEBAR_MESSAGE_FLOW.md documents the full init timeline, message flow (user types → claude responds), auth token chain, arrow hint signal chain, model routing, tab concurrency, and known failure modes. CLAUDE.md now tells you to read it before touching sidebar files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sidebar chat resets idle timer + shutdown kills sidebar-agent Two fixes for the "browser died while chatting" problem: 1. /sidebar-command now calls resetIdleTimer(). Previously only CLI commands reset it, so the server would shut down after 30 min even while the user was actively chatting in the sidebar. 2. shutdown() now pkills the sidebar-agent daemon. Previously the agent survived server shutdown, kept polling a dead server, and spawned confused claude processes that auto-started headless browsers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: disable idle timeout in headed mode — browser lives until closed The 30-minute idle timeout only applies to headless mode now. In headed mode the user is looking at the Chrome window, so auto-shutdown is wrong. The browser stays alive until explicit disconnect or window close. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cookies button in sidebar footer opens cookie picker One-click cookie import from the sidebar. Navigates the headed browser to /cookie-picker where you can select which domains to import from your real Chrome profile. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for GStack Browser improvements README.md: updated Real browser mode and sidebar agent sections with model routing, cookie import button, no idle timeout in headed mode. Updated skill table entries for /browse and /open-gstack-browser. docs/skills.md: updated /open-gstack-browser deep dive with model routing and cookie import details. GSTACK_BROWSER_V0.md: added 6 new SHIPPED items to implementation status table (model routing, debug logging, idle timeout, cookie button, arrow hint, architecture doc). TODOS.md: marked "Sidebar agent Write tool + error visibility" as SHIPPED. Added new P2 TODO for direct API calls to eliminate claude -p startup tax. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Claude Code terminal example to welcome page TRY IT NOW Fifth example shows the parent agent workflow: navigate, extract CSS, write to file. The other four are all sidebar-only. This one shows co-presence — the Claude Code session that launched the browser can also control it directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: hide internal tool-result file reads from sidebar activity Claude reads its own ~/.claude/projects/.../tool-results/ files as internal plumbing. These showed up as long unreadable paths in the sidebar. Now: describeToolCall returns empty for tool-result reads, and the sidebar skips rendering tool_use entries with no description. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: collapse tool calls into "See reasoning" disclosure on completion While the agent is working, tool calls stream live so you can watch progress. When the agent finishes, all tool calls collapse into a "See reasoning (N steps)" disclosure. Click to expand and see what the agent did. The final text answer stays visible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: 17 new tests for recent sidebar fixes Covers: tool-result file filtering, empty tool_use skip, reasoning disclosure collapse, idle timeout headed mode bypass, sidebar-command idle reset, shutdown sidebar-agent kill, cookie button, and model routing analysis-before-action priority. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: move cookies button to quick actions toolbar Cookies now sits next to Cleanup and Screenshot as a primary action button (🍪 Cookies) instead of buried in the footer. Same behavior, more discoverable. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add instructional text to cookie picker page "Select the domains of cookies you want to import to GStack Browser. You'll be able to browse those sites with the same login as your other browser." Also fixes stale test that expected hardcoded '--model', 'opus'. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: 6-card welcome page with cookie import + dual-agent cards 3x2 grid layout (was 2x2). New cards: "Import your cookies" (click 🍪 Cookies to import login sessions from Chrome/Arc/Brave) and "Or use your main agent" (your Claude Code terminal also controls this browser). Responsive: 3 cols > 2 cols > 1 col. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: move sidebar arrow hint to top-right instead of vertically centered The arrow was centered vertically which put it behind the feature cards. Now positioned at top: 80px where there's open space and it's more visible. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 10:17:05 -07:00
Garry Tan	be96ff5ce7	feat: /plan-devex-review + /devex-review — DX review skills (v0.15.3.0) (#784 ) * feat: add DX framework resolver for shared principles and scoring rubric New {{DX_FRAMEWORK}} resolver provides compact (~150 lines) shared content for /plan-devex-review and /devex-review: Addy Osmani's 8 DX principles, 7 characteristics table, 10 cognitive patterns, scoring rubric, and TTHW benchmarks. Hall of Fame examples loaded on-demand per pass to avoid bloat. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add DX Review row to review dashboard Adds plan-devex-review and devex-review schema entries to the review dashboard resolver and placeholder table in the preamble. All existing SKILL.md files regenerated to include the new DX Review row. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /plan-devex-review skill — DX plan review with Osmani framework Plan-stage developer experience review. Rates 8 DX dimensions 0-10: getting started, API/CLI/SDK design, error messages, docs, upgrade path, dev environment, community, and DX measurement. Includes developer empathy simulation, auto-detect product type with applicability gate, DX scorecard with trend tracking, and a conditional Claude Code Skill DX checklist. Hall of Fame examples loaded on-demand per pass from dx-hall-of-fame.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /devex-review skill — live DX audit with browse Live-system developer experience audit using browse tool. Tests all 8 dimensions aligned with /plan-devex-review for boomerang comparison (plan said 3 min TTHW, reality says 8). Each dimension marked TESTED, INFERRED, or N/A with evidence. Scope-aware: declares what browse can and cannot test, falls back to file artifacts for untestable dimensions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.3.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-03 16:22:57 -07:00
Garry Tan	562a67503a	feat: Session Intelligence Layer — /checkpoint + /health + context recovery (v0.15.0.0) (#733 ) * feat: session timeline binaries (gstack-timeline-log + gstack-timeline-read) New binaries for the Session Intelligence Layer. gstack-timeline-log appends JSONL events to ~/.gstack/projects/$SLUG/timeline.jsonl. gstack-timeline-read reads, filters, and formats timeline data for /retro consumption. Timeline is local-only project intelligence, never sent anywhere. Always-on regardless of telemetry setting. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: preamble context recovery + timeline events + predictive suggestions Layers 1-3 of the Session Intelligence Layer: - Timeline start/complete events injected into every skill via preamble - Context recovery (tier 2+): lists recent CEO plans, checkpoints, reviews - Cross-session injection: LAST_SESSION and LATEST_CHECKPOINT for branch - Predictive skill suggestion from recent timeline patterns - Welcome back message synthesis - Routing rules for /checkpoint and /health Timeline writes are NOT gated by telemetry (local project intelligence). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /checkpoint + /health skills (Layers 4-5) /checkpoint: save/resume/list working state snapshots. Supports cross-branch listing for Conductor workspace handoff. Session duration tracking. /health: code quality scorekeeper. Wraps project tools (tsc, biome, knip, shellcheck, tests), computes composite 0-10 score, tracks trends over time. Auto-detects tools or reads from CLAUDE.md ## Health Stack. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files + add timeline tests 9 timeline tests (all passing) mirroring learnings.test.ts pattern. All 34 SKILL.md files regenerated with new preamble (context recovery, timeline events, routing rules for /checkpoint and /health). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.0.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update self-learning roadmap post-Session Intelligence R1-R3 marked shipped with actual versions. R4 becomes Adaptive Ceremony (trust as separate policy engine, scope-aware, gradual degradation). R5 becomes /autoship (resumable state machine, not linear chain). R6-R7 unbundled from old R5. Added State Systems reference, Risk Register (Codex-reviewed), and validation metrics for R4. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: E2E tests for Session Intelligence (timeline, recovery, checkpoint) 3 gate-tier E2E tests: - timeline-event-flow: binary data flow round-trip (no LLM) - context-recovery-artifacts: seeded artifacts appear in preamble - checkpoint-save-resume: checkpoint file created with YAML frontmatter Also fixes package.json version sync (0.14.6.0 → 0.15.0.0). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 00:50:42 -06:00
Garry Tan	8115951284	feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647 ) * refactor: remove dead contributor mode, replace with operational self-improvement slot Contributor mode never fired in 18 days of heavy use (required manual opt-in via gstack-config, gated behind _CONTRIB=true, wrote disconnected markdown). Removes: generateContributorMode(), _CONTRIB bash var, 2 E2E tests, touchfile entry, doc references. Cleans up skip-lists in plan-ceo-review, autoplan, review resolver, and document-release templates. The operational self-improvement system (next commit) replaces this slot with automatic learning capture that requires no opt-in. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: operational self-improvement — every skill learns from failures Adds universal operational learning capture to the preamble completion protocol. At the end of every skill session, the agent reflects on CLI failures, wrong approaches, and project quirks, logging them as type "operational" to the learnings JSONL. Future sessions surface these automatically. - generateCompletionStatus(ctx) now includes operational capture section - Preamble bash shows top 3 learnings inline when count > 5 - New "operational" type in generateLearningsLog alongside pattern/pitfall/etc - Updated unit tests + operational seed entry in learnings E2E Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: wire learnings into all insight-producing skills Adds LEARNINGS_SEARCH and/or LEARNINGS_LOG to 10 skill templates that produce reusable insights but were previously disconnected from the learning system: - office-hours, plan-ceo-review, plan-eng-review: add LOG (had SEARCH) - plan-design-review: add both SEARCH + LOG (had neither) - design-review, design-consultation, cso, qa, qa-only: add both - retro: add SEARCH (had LOG) 13 skills now fully participate in the learning loop (read + write). Every review, QA, investigation, and design session both consults prior learnings and contributes new ones. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add operational-learning E2E test (gate-tier) Validates the write path: agent encounters a CLI failure, logs an operational learning to JSONL via gstack-learnings-log. Replaces the removed contributor-mode E2E test. Setup: temp git repo, copy bin scripts, set GSTACK_HOME. Prompt: simulated npm test failure needing --experimental-vm-modules. Assert: learnings.jsonl exists with type=operational entry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: learnings-show E2E slug mismatch — seed at computed slug, not hardcoded The test seeded learnings at projects/test-project/ but gstack-slug computes the slug from basename(workDir) when no git remote exists. The agent's search looked at the wrong path and found nothing. Fix: compute slug the same way gstack-slug does (basename + sanitize) and seed the learnings there. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.8.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-31 23:08:22 -06:00
Garry Tan	a0328be04c	feat: always-on adversarial review + scope drift + plan mode design tools (v0.14.3.0) (#694 ) * feat: always-on adversarial review + scope drift resolver + cross-model tension format Rewrite generateAdversarialStep() to remove LOC-based tier skipping. Every review now runs both Claude adversarial subagent and Codex adversarial challenge. OLD_CFG only gates Codex passes, not Claude. Add generateScopeDrift() shared resolver. Fix cross-model tension AskUserQuestion to include RECOMMENDATION + Completeness. * feat: add scope drift to /ship, extract from /review template /ship gets {{SCOPE_DRIFT}} at Step 3.48 + PR body slot. /review replaces hardcoded scope drift with {{SCOPE_DRIFT}} + {{PLAN_COMPLETION_AUDIT_REVIEW}}. * feat: plan mode safe operations — browse, design, codex allowed in plan mode Add preamble section declaring $B, $D, codex, and ~/.gstack/ writes as plan-mode-safe. Unblocks design skills during planning. * test: update adversarial + add scope drift assertions Rename adversarial tests to reflect always-on behavior. Remove tier threshold assertions. Add scope drift content assertions for both /review and /ship generated SKILL.md files. * chore: bump version and changelog (v0.14.3.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-30 21:45:28 -06:00
Garry Tan	a1a933614c	feat: sidebar CSS inspector + per-tab agents (v0.13.9.0) (#650 ) * feat: CDP inspector module — persistent sessions, CSS cascade, style modification New browse/src/cdp-inspector.ts with full CDP inspection engine: - inspectElement() via CSS.getMatchedStylesForNode + DOM.getBoxModel - modifyStyle() via CSS.setStyleTexts with headless page.evaluate fallback - Persistent CDP session lifecycle (create, reuse, detach on nav, re-create) - Specificity sorting, overridden property detection, UA rule filtering - Modification history with undo support - formatInspectorResult() for CLI output Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: browse server inspector endpoints + inspect/style/cleanup/prettyscreenshot CLI Server endpoints: POST /inspector/pick, GET /inspector, POST /inspector/apply, POST /inspector/reset, GET /inspector/history, GET /inspector/events (SSE). CLI commands: inspect (CDP cascade), style (live CSS mod), cleanup (page clutter removal), prettyscreenshot (clean screenshot pipeline). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar CSS inspector — element picker, box model, rule cascade, quick edit Extension changes for the visual CSS inspector: - inspector.js: element picker with hover highlight, CSS selector generation, basic mode fallback (getComputedStyle + CSSOM), page alteration handlers - inspector.css: picker overlay styles (blue highlight + tooltip) - background.js: inspector message routing (picker <-> server <-> sidepanel) - sidepanel: Inspector tab with box model viz (gstack palette), matched rules with specificity badges, computed styles, click-to-edit quick edit, Send to Agent/Code button, empty/loading/error states Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: document inspect, style, cleanup, prettyscreenshot browse commands Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: auto-track user-created tabs and handle tab close browser-manager.ts changes: - context.on('page') listener: automatically tracks tabs opened by the user (Cmd+T, right-click open in new tab, window.open). Previously only programmatic newTab() was tracked, so user tabs were invisible. - page.on('close') handler in wirePageEvents: removes closed tabs from the pages map and switches activeTabId to the last remaining tab. - syncActiveTabByUrl: match Chrome extension's active tab URL to the correct Playwright page for accurate tab identity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: per-tab agent isolation via BROWSE_TAB environment variable Prevents parallel sidebar agents from interfering with each other's tab context. Three-layer fix: - sidebar-agent.ts: passes BROWSE_TAB=<tabId> env var to each claude process, per-tab processing set allows concurrent agents across tabs - cli.ts: reads process.env.BROWSE_TAB and includes tabId in command request body - server.ts: handleCommand() temporarily switches activeTabId when tabId is present, restores after command completes (safe: Bun event loop is single-threaded) Also: per-tab agent state (TabAgentState map), per-tab message queuing, per-tab chat buffers, verbose streaming narration, stop button endpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar per-tab chat context, tab bar sync, stop button, UX polish Extension changes: - sidepanel.js: per-tab chat history (tabChatHistories map), switchChatTab() swaps entire chat view, browserTabActivated handler for instant tab sync, stop button wired to /sidebar-agent/stop, pollTabs renders tab bar - sidepanel.html: updated banner text ("Browser co-pilot"), stop button markup, input placeholder "Ask about this page..." - sidepanel.css: tab bar styles, stop button styles, loading state fixes - background.js: chrome.tabs.onActivated sends browserTabActivated to sidepanel with tab URL for instant tab switch detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: per-tab isolation, BROWSE_TAB pinning, tab tracking, sidebar UX sidebar-agent.test.ts (new tests): - BROWSE_TAB env var passed to claude process - CLI reads BROWSE_TAB and sends tabId in body - handleCommand accepts tabId, saves/restores activeTabId - Tab pinning only activates when tabId provided - Per-tab agent state, queue, concurrency - processingTabs set for parallel agents sidebar-ux.test.ts (new tests): - context.on('page') tracks user-created tabs - page.on('close') removes tabs from pages map - Tab isolation uses BROWSE_TAB not system prompt hack - Per-tab chat context in sidepanel - Tab bar rendering, stop button, banner text Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve merge conflicts — keep security defenses + per-tab isolation Merged main's security improvements (XML escaping, prompt injection defense, allowed commands whitelist, --model opus, Write tool, stderr capture) with our branch's per-tab isolation (BROWSE_TAB env var, processingTabs set, no --resume). Updated test expectations for expanded system prompt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.9.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add inspector message types to background.js allowlist Pre-existing bug found by Codex: ALLOWED_TYPES in background.js was missing all inspector message types (startInspector, stopInspector, elementPicked, pickerCancelled, applyStyle, toggleClass, injectCSS, resetAll, inspectResult). Messages were silently rejected, making the inspector broken on ALL pages. Also: separate executeScript and insertCSS into individual try blocks in injectInspector(), store inspectorMode for routing, and add content.js fallback when script injection fails (CSP, chrome:// pages). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: basic element picker in content.js for CSP-restricted pages When inspector.js can't be injected (CSP, chrome:// pages), content.js provides a basic picker using getComputedStyle + CSSOM: - startBasicPicker/stopBasicPicker message handlers - captureBasicData() with ~30 key CSS properties, box model, matched rules - Hover highlight with outline save/restore (never leaves artifacts) - Click uses e.target directly (no re-querying by selector) - Sends inspectResult with mode:'basic' for sidebar rendering - Escape key cancels picker and restores outlines Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cleanup + screenshot buttons in sidebar inspector toolbar Two action buttons in the inspector toolbar: - Cleanup (🧹): POSTs cleanup --all to server, shows spinner, chat notification on success, resets inspector state (element may be removed) - Screenshot (📸): POSTs screenshot to server, shows spinner, chat notification with saved file path Shared infrastructure: - .inspector-action-btn CSS with loading spinner via ::after pseudo-element - chat-notification type in addChatEntry() for system messages - package.json version bump to 0.13.9.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: inspector allowlist, CSP fallback, cleanup/screenshot buttons 16 new tests in sidebar-ux.test.ts: - Inspector message allowlist includes all inspector types - content.js basic picker (startBasicPicker, captureBasicData, CSSOM, outline save/restore, inspectResult with mode basic, Escape cleanup) - background.js CSP fallback (separate try blocks, inspectorMode, fallback) - Cleanup button (POST /command, inspector reset after success) - Screenshot button (POST /command, notification rendering) - Chat notification type and CSS styles Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.13.9.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cleanup + screenshot buttons in chat toolbar (not just inspector) Quick actions toolbar (🧹 Cleanup, 📸 Screenshot) now appears above the chat input, always visible. Both inspector and chat buttons share runCleanup() and runScreenshot() helper functions. Clicking either set shows loading state on both simultaneously. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: chat toolbar buttons, shared helpers, quick-action-btn styles Tests that chat toolbar exists (chat-cleanup-btn, chat-screenshot-btn, quick-actions container), CSS styles (.quick-action-btn, .quick-action-btn.loading), shared runCleanup/runScreenshot helper functions, and cleanup inspector reset. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: aggressive cleanup heuristics — overlays, scroll unlock, blur removal Massively expanded CLEANUP_SELECTORS with patterns from uBlock Origin and Readability.js research: - ads: 30+ selectors (Google, Amazon, Outbrain, Taboola, Criteo, etc.) - cookies: OneTrust, Cookiebot, TrustArc, Quantcast + generic patterns - overlays (NEW): paywalls, newsletter popups, interstitials, push prompts, app download banners, survey modals - social: follow prompts, share tools - Cleanup now defaults to --all when no args (sidebar button fix) - Uses !important on all display:none (overrides inline styles) - Unlocks body/html scroll (overflow:hidden from modal lockout) - Removes blur/filter effects (paywall content blur) - Removes max-height truncation (article teaser truncation) - Collapses empty ad placeholder whitespace (empty divs after ad removal) - Skips gstack-ctrl indicator in sticky removal Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: disable action buttons when disconnected, no error spam - setActionButtonsEnabled() toggles .disabled class on all cleanup/screenshot buttons (both chat toolbar and inspector toolbar) - Called with false in updateConnection when server URL is null - Called with true when connection established - runCleanup/runScreenshot silently return when disconnected instead of showing 'Not connected' error notifications - CSS .disabled style: pointer-events:none, opacity:0.3, cursor:not-allowed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: cleanup heuristics, button disabled state, overlay selectors 17 new tests: - cleanup defaults to --all on empty args - CLEANUP_SELECTORS overlays category (paywall, newsletter, interstitial) - Major ad networks in selectors (doubleclick, taboola, criteo, etc.) - Major consent frameworks (OneTrust, Cookiebot, TrustArc, Quantcast) - !important override for inline styles - Scroll unlock (body overflow:hidden) - Blur removal (paywall content blur) - Article truncation removal (max-height) - Empty placeholder collapse - gstack-ctrl indicator skip in sticky cleanup - setActionButtonsEnabled function - Buttons disabled when disconnected - No error spam from cleanup/screenshot when disconnected - CSS disabled styles for action buttons Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: LLM-based page cleanup — agent analyzes page semantically Instead of brittle CSS selectors, the cleanup button now sends a prompt to the sidebar agent (which IS an LLM). The agent: 1. Runs deterministic $B cleanup --all as a quick first pass 2. Takes a snapshot to see what's left 3. Analyzes the page semantically to identify remaining clutter 4. Removes elements intelligently, preserving site branding This means cleanup works correctly on any site without site-specific selectors. The LLM understands that "Your Daily Puzzles" is clutter, "ADVERTISEMENT" is junk, but the SF Chronicle masthead should stay. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: aggressive cleanup heuristics + preserve top nav bar Deterministic cleanup improvements (used as first pass before LLM analysis): - New 'clutter' category: audio players, podcast widgets, sidebar puzzles/games, recirculation widgets (taboola, outbrain, nativo), cross-promotion banners - Text-content detection: removes "ADVERTISEMENT", "Article continues below", "Sponsored", "Paid content" labels and their parent wrappers - Sticky fix: preserves the topmost full-width element near viewport top (site nav bar) instead of hiding all sticky/fixed elements. Sorts by vertical position, preserves the first one that spans >80% viewport width. Tests: clutter category, ad label removal, nav bar preservation logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: LLM-based cleanup architecture, deterministic heuristics, sticky nav 22 new tests covering: - Cleanup button uses /sidebar-command (agent) not /command (deterministic) - Cleanup prompt includes deterministic first pass + agent snapshot analysis - Cleanup prompt lists specific clutter categories for agent guidance - Cleanup prompt preserves site identity (masthead, headline, body, byline) - Cleanup prompt instructs scroll unlock and $B eval removal - Loading state management (async agent, setTimeout) - Deterministic clutter: audio/podcast, games/puzzles, recirculation - Ad label text patterns (ADVERTISEMENT, Sponsored, Article continues) - Ad label parent wrapper hiding for small containers - Sticky nav preservation (sort by position, first full-width near top) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent repeat chat message rendering on reconnect/replay Root cause: server persists chat to disk (chat.jsonl) and replays on restart. Client had no dedup, so every reconnect re-rendered the entire history. Messages from an old HN session would repeat endlessly on the SF Chronicle tab. Fix: renderedEntryIds Set tracks which entry IDs have been rendered. addChatEntry skips entries already in the set. Entries without an id (local notifications) bypass the check. Clear chat resets the set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: agent stops when done, no focus stealing, opus for prompt injection safety Three fixes for sidebar agent UX: - System prompt: "Be CONCISE. STOP as soon as the task is done. Do NOT keep exploring or doing bonus work." Prevents agent from endlessly taking screenshots and highlighting elements after answering the question. - switchTab(id, opts): new bringToFront option. Internal tab pinning (BROWSE_TAB) uses bringToFront: false so agent commands never steal window focus from the user's active app. - Keep opus model (not sonnet) for prompt injection resistance on untrusted web pages. Remove Write from allowedTools (agent only needs Bash for $B). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: agent conciseness, focus stealing, opus model, switchTab opts Tests for the three UX fixes: - System prompt contains STOP/CONCISE/Do NOT keep exploring - sidebar agent uses opus (not sonnet) for prompt injection resistance - switchTab has bringToFront option, defaults to true (opt-out) - handleCommand tab pinning uses bringToFront: false (no focus steal) - Updated stale tests: switchTab signature, allowedTools excludes Write, narration -> conciseness, tab pinning restore calls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: sidebar CSS interaction E2E — HN comment highlight round-trip New E2E test (periodic tier, ~$2/run) that exercises the full sidebar agent pipeline with CSS interaction: 1. Agent navigates to Hacker News 2. Clicks into the top story's comments 3. Reads comments and identifies the most insightful one 4. Highlights it with a 4px solid orange outline via style injection Tests: navigation, snapshot, text reading, LLM judgment, CSS modification. Requires real browser + real Claude (ANTHROPIC_API_KEY). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sidebar CSS E2E test — correct idle timeout (ms not s), pipe stdio Root cause of test failure: BROWSE_IDLE_TIMEOUT is in milliseconds, not seconds. '600' = 0.6 seconds, server died immediately after health check. Fixed to '600000' (10 minutes). Also: use 'pipe' stdio instead of file descriptors (closing fds kills child on macOS/bun), catch ConnectionRefused on poll retry, 4 min poll timeout for the multi-step opus task. Test passes: agent navigates to HN, reads comments, identifies most insightful one, highlights it with orange CSS, stops. 114s, $0.00. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-30 12:51:05 -06:00
Garry Tan	66c09644a7	feat: composable skills — INVOKE_SKILL resolver + factoring infrastructure (v0.13.7.0) (#644 ) * feat: add parameterized resolver support to gen-skill-docs Extend the placeholder regex from {{WORD}} to {{WORD:arg1:arg2}}, enabling parameterized resolvers like {{INVOKE_SKILL:plan-ceo-review}}. - Widen ResolverFn type to accept optional args?: string[] - Update RESOLVERS record to use ResolverFn type - Both replacement and unresolved-check regexes updated - Fully backward compatible: existing {{WORD}} patterns unchanged Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add INVOKE_SKILL resolver for composable skill loading New composition.ts resolver module that emits prose instructing Claude to read another skill's SKILL.md and follow it, skipping preamble sections. Supports optional skip= parameter for additional sections. Usage: {{INVOKE_SKILL:plan-ceo-review}} or {{INVOKE_SKILL:plan-ceo-review:skip=Outside Voice}} Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: use frontmatter name: for skill symlinks and Codex paths Patch all 3 name-derivation paths to read name: from SKILL.md frontmatter instead of relying solely on directory basenames. This enables directory names that differ from invocation names (e.g., run-tests/ directory with name: test). - setup: link_claude_skill_dirs reads name: via grep, falls back to basename - gen-skill-docs.ts: codexSkillName uses frontmatter name for Codex output paths - gen-skill-docs.ts: moved frontmatter extraction before Codex path logic Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: extract CHANGELOG_WORKFLOW resolver from /ship Move changelog generation logic into a reusable resolver. The resolver is changelog-only (no version bump per Codex review recommendation). Adds voice rules inline. /ship Step 5 now uses {{CHANGELOG_WORKFLOW}}. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: use INVOKE_SKILL resolver for plan-ceo-review office-hours fallback Replace inline skill loading prose (read file, skip sections) with {{INVOKE_SKILL:office-hours}} in the mid-session detection path. The BENEFITS_FROM prerequisite offer is unchanged (separate use case). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: BENEFITS_FROM resolver delegates to INVOKE_SKILL Eliminate duplicated skip-list logic by having generateBenefitsFrom call generateInvokeSkill internally. The wrapper (AskUserQuestion, design doc re-check) stays in BENEFITS_FROM. The loading instructions (read file, skip sections, error handling) come from INVOKE_SKILL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add resolver tests for INVOKE_SKILL, CHANGELOG_WORKFLOW, parameterized args 12 new tests covering: - INVOKE_SKILL: template placeholder, default skip list, error handling, BENEFITS_FROM delegation - CHANGELOG_WORKFLOW: content, cross-check, voice guidance, format - Parameterized resolver infra: colon-separated args processing, no unresolved placeholders across all generated SKILL.md files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.7.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: journey routing tests — CLAUDE.md routing rules + stronger descriptions Three journey E2E tests (ideation, ship, debug) were failing because Claude answered directly instead of invoking the Skill tool. Root cause: skill descriptions in system-reminder are too weak to override Claude's default behavior for tasks it can handle natively. Fix has two parts: 1. CLAUDE.md routing rules in test workdir — Claude weighs project-level instructions higher than skill description metadata 2. "Proactively invoke" (not "suggest") in office-hours, investigate, ship descriptions — reinforces the routing signal 10/10 journey tests now pass (was 7/10). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: one-time CLAUDE.md routing injection prompt Add a preamble section that checks if the project's CLAUDE.md has skill routing rules. If not (and user hasn't declined), asks once via AskUserQuestion to inject a "## Skill routing" section. Root cause: skill descriptions in system-reminder metadata are too weak to reliably trigger proactive Skill tool invocation. CLAUDE.md project instructions carry higher weight in Claude's decision making. - Preamble bash checks for "## Skill routing" in CLAUDE.md - Stores decline in gstack-config (routing_declined=true) - Only asks once per project (HAS_ROUTING check + config check) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: annotated config file + routing injection tests gstack-config now writes a documented header on first config creation with every supported key explained (proactive, telemetry, auto_upgrade, skill_prefix, routing_declined, codex_reviews, skip_eng_review, etc.). Users can edit ~/.gstack/config.yaml directly, anytime. Also fixes grep to use ^KEY: anchoring so commented header lines don't shadow real config values. Tests added: - 7 new gstack-config tests (annotated header, no duplication, comment safety, routing_declined get/set/reset) - 6 new gen-skill-docs tests (preamble routing injection: bash checks, config reads, AskUserQuestion, decline persistence, routing rules) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump to v0.13.9.0, separate CHANGELOG from main's releases Split our branch's changes into a new 0.13.9.0 entry instead of jamming them into 0.13.7.0 which already landed on main as "Community Wave." Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: clarify branch-scoped VERSION/CHANGELOG after merging main Add explicit rules: merging main doesn't mean adopting main's version. Branch always gets its own entry on top with a higher version number. Three-point checklist after every merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: put our 0.13.9.0 entry on top of CHANGELOG Newest version goes on top. Our branch lands next, so our entry must be above main's 0.13.8.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: restore missing 0.13.7.0 Community Wave entry Accidentally dropped the 0.13.7.0 entry when reordering. All entries now present: 0.13.9.0 > 0.13.8.0 > 0.13.7.0 > 0.13.6.0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add CHANGELOG integrity check rule After any edit that moves/adds/removes entries, grep for version headers and verify no gaps or duplicates before committing. Prevents accidentally dropping entries during reordering. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 23:35:17 -06:00
Garry Tan	3cda8deec9	fix: security audit round 2 (v0.13.4.0) (#640 ) * fix: chrome-cdp localhost-only binding Restrict Chrome CDP to localhost by adding --remote-debugging-address=127.0.0.1 and --remote-allow-origins to prevent network-accessible debugging sessions. Clears 1 Socket anomaly (Chrome CDP session exposure). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: extension sender validation + message type allowlist Add sender.id check and ALLOWED_TYPES allowlist to the Chrome extension's message handler. Defense-in-depth against message spoofing from external extensions or future externally_connectable changes. Clears 2 Socket anomalies (extension permissions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: checksum-verified bun install Replace unverified curl\|bash bun installation with checksum-verified download-then-execute pattern. The install script is downloaded, sha256 verified against a known hash, then executed. Preserves the Bun-native install path without adding a Node/npm dependency. Clears Snyk W012 + 3 Socket anomalies. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: content trust boundary markers in browse output Wrap page-content commands (text, html, links, forms, accessibility, console, dialog, snapshot) with --- BEGIN/END UNTRUSTED EXTERNAL CONTENT --- markers. Covers direct commands (server.ts), chain sub-commands, and snapshot output (meta-commands.ts). Adds PAGE_CONTENT_COMMANDS set and wrapUntrustedContent() helper in commands.ts (single source of truth, DRY). Expands the SKILL.md trust warning with explicit processing rules for agents. Clears Snyk W011 (third-party content exposure). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden trust boundary markers against escape attacks - Sanitize URLs in markers (remove newlines, cap at 200 chars) to prevent marker injection via history.pushState - Escape marker strings in content (zero-width space) so malicious pages can't forge the END marker to break out of the untrusted block - Wrap resume command snapshot with trust boundary markers - Wrap diff command output with trust boundary markers - Wrap watch stop last snapshot with trust boundary markers Found by cross-model adversarial review (Claude + Codex). * chore: bump version and changelog (v0.13.4.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: gitignore .factory/ and remove from tracking Factory Droid support was removed in this branch. The .factory/ directory was re-added by merging main (which had v0.13.5.0 Factory support). Gitignore it so it stays out. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 22:46:33 -06:00
Garry Tan	cdd6f7865d	feat: community wave — 7 fixes, relink, sidebar Write, discoverability (v0.13.5.0) (#641 ) * test: add 16 failing tests for 6 community fixes Tests-first for all fixes in this PR wave: - #594 discoverability: gstack tag in descriptions, 120-char first line - #573 feature signals: ship/SKILL.md Step 4 detection - #510 context warnings: no preemptive warnings in generated files - #474 Safety Net: no find -delete in generated files - #467 telemetry: JSONL writes gated by _TEL conditional - #584 sidebar: Write in allowedTools, stderr capture - #578 relink: prefixed/flat symlinks, cleanup, error, config hook Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: replace find -delete with find -exec rm for Safety Net (#474) -delete is a non-POSIX extension that fails on Safety Net environments. -exec rm {} + is POSIX-compliant and works everywhere. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: gate local JSONL writes by telemetry setting (#467) When telemetry is off, nothing is written anywhere — not just remote, but local JSONL too. Clean trust contract: off means off everywhere. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove preemptive context warnings from plan-eng-review (#510) The system handles context compaction automatically. Preemptive warnings waste tokens and create false urgency. Skills should not warn about context limits — just describe the compression priority order. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add (gstack) tag to skill descriptions for discoverability (#594) Every SKILL.md.tmpl description now contains "gstack" on the last line, making skills findable in Claude Code's command palette. First-line hooks stay under 120 chars. Split ship description to fix wrapping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: auto-relink skill symlinks on prefix config change (#578) New bin/gstack-relink creates prefixed (gstack-) or flat symlinks based on skill_prefix config. gstack-config auto-triggers relink when skill_prefix changes. Setup guards against recursive calls with GSTACK_SETUP_RUNNING env var. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> feat: add feature signal detection to version bump heuristic (#573) /ship Step 4 now checks for feature signals (new routes, migrations, test+source pairs, feat/ branches) when deciding version bumps. PATCH requires no feature signals. MINOR asks the user if any signal is detected or 500+ lines changed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar Write tool, stderr capture, cross-platform URL opener (#584) Add Write to sidebar allowedTools (both sidebar-agent.ts and server.ts). Write doesn't expand attack surface beyond what Bash already provides. Replace empty stderr handler with buffer capture for better error diagnostics. New bin/gstack-open-url for cross-platform URL opening. Does NOT include Search Before Building intro flow (deferred). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: update sidebar-security test for Write tool addition The fallback allowedTools string now includes Write, matching the sidebar-agent.ts change from commit `68dc957`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.5.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent gstack-relink from double-prefixing gstack-upgrade gstack-relink now checks if a skill directory is already named gstack-* before prepending the prefix. Previously, setting skill_prefix=true would create gstack-gstack-upgrade, breaking the /gstack-upgrade command. Matches setup script behavior (setup:260) which already has this guard. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: add double-prefix fix to changelog Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: remove .factory/ from git tracking and add to .gitignore Generated Factory Droid skills are build output, same as .agents/. They should not be committed to the repo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 21:43:36 -06:00
Garry Tan	ae0a9ad195	feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622 ) * feat: learnings + confidence resolvers — cross-skill memory infrastructure Three new resolvers for the self-learning system: - LEARNINGS_SEARCH: tells skills to load prior learnings before analysis - LEARNINGS_LOG: tells skills to capture discoveries after completing work - CONFIDENCE_CALIBRATION: adds 1-10 confidence scoring to all review findings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: learnings bin scripts — append-only JSONL read/write gstack-learnings-log: validates JSON, auto-injects timestamp, appends to ~/.gstack/projects/$SLUG/learnings.jsonl. Append-only (no mutation). gstack-learnings-search: reads/filters/dedupes learnings with confidence decay (observed/inferred lose 1pt/30d), cross-project discovery, and "latest winner" resolution per key+type. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: learnings count in preamble output Every skill now prints "LEARNINGS: N entries loaded" during preamble, making the compounding loop visible to the user. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: integrate learnings + confidence into 9 skill templates Add {{LEARNINGS_SEARCH}}, {{LEARNINGS_LOG}}, and {{CONFIDENCE_CALIBRATION}} placeholders to review, ship, plan-eng-review, plan-ceo-review, office-hours, investigate, retro, and cso templates. Regenerated all SKILL.md files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /learn skill — manage project learnings New skill for reviewing, searching, pruning, and exporting what gstack has learned across sessions. Commands: /learn, /learn search, /learn prune, /learn export, /learn stats, /learn add. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: self-learning roadmap — 5-release design doc Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony (v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound Engineering, adapted to GStack's architecture. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: learnings bin script unit tests — 13 tests, free Tests gstack-learnings-log (valid/invalid JSON, timestamp injection, append-only) and gstack-learnings-search (dedup, type/query/limit filters, confidence decay, user-stated no-decay, malformed JSONL skip). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.4.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: learnings resolver + bin script edge case tests — 21 new tests, free Adds gen-skill-docs coverage for LEARNINGS_SEARCH, LEARNINGS_LOG, and CONFIDENCE_CALIBRATION resolvers. Adds bin script edge cases: timestamp preservation, special characters, files array, sort order, type grouping, combined filtering, missing fields, confidence floor at 0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sync package.json version with VERSION file (0.13.4.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: gitignore .factory/ — generated output, not source Same pattern as .claude/skills/ and .agents/. These SKILL.md files are generated from .tmpl templates by gen:skill-docs --host factory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: /learn E2E — seed 3 learnings, verify agent surfaces them Seeds N+1 query pattern, stale cache pitfall, and rubocop preference into learnings.jsonl, then runs /learn and checks that at least 2/3 appear in the agent's output. Gate tier, ~$0.25/run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 17:02:01 -06:00

1 2 3

113 Commits