Commit Graph

7 Commits

Author SHA1 Message Date
Garry Tan 0c88517a0f v1.34.0.0 feat: gstack consumable as submodule (factory-export API + AUTH_TOKEN env + import.meta.main gate) (#1472)
* feat(config): add resolveGstackHome, resolveChromiumProfile, cleanSingletonLocks

Three new exported helpers in browse/src/config.ts:

- resolveGstackHome(): honors GSTACK_HOME env, falls back to os.homedir()/.gstack
  Matches the existing convention in browse/src/telemetry.ts:26 and
  browse/src/domain-skills.ts:66.

- resolveChromiumProfile(explicit?): explicit arg wins -> CHROMIUM_PROFILE env
  -> resolveGstackHome()/chromium-profile. Lets gbrowser pass per-workspace
  profile paths through ServerConfig instead of relying on ambient env state.

- cleanSingletonLocks(dir): removes SingletonLock/Socket/Cookie via safeUnlinkQuiet.
  Defensive guard refuses to operate unless dir basename is 'chromium-profile'
  OR matches explicit CHROMIUM_PROFILE env value, preventing accidental
  deletion in unrelated directories.

Extends browse/test/config.test.ts with 12 tests covering env precedence,
guard behavior, ENOENT swallowing, and CHROMIUM_PROFILE override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security-classifier): TDZ when claude CLI is missing from PATH

The checkTranscript Promise executor in browse/src/security-classifier.ts
referenced `finish()` at the !claude early-return guard before declaring
it 5 lines later. JavaScript throws ReferenceError: Cannot access 'finish'
before initialization (TDZ) for that path, but the path is only reachable
when resolveClaudeCommand returns null inside the spawn block (a TOCTOU
window vs. the outer checkHaikuAvailable cache).

Fix: hoist `let stdout = ''`, `let done = false`, and `const finish` block
above `const claude = resolveClaudeCommand()` so finish is in scope before
any reference to it. Behavior is identical when claude is on PATH; the
fix only matters for the dormant missing-CLI degraded path.

Adds browse/test/security-classifier-tdz.test.ts as the regression guard:
clears PATH + override env vars, calls checkTranscript, asserts the result
serializes with degraded:true and a meaningful reason field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(browser-manager): isCustomChromium gate + per-workspace profile + lock cleanup

Three fold-ins so gbrowser can become a thin overlay instead of forking
browse-server:

- Export isCustomChromium(): detects custom Chromium builds that bake the
  extension in as a component extension. Prefers explicit
  GSTACK_CHROMIUM_KIND=custom-extension-baked signal; falls back to
  GSTACK_CHROMIUM_PATH substring containing 'GBrowser' / 'gbrowser'.
  Gates the --load-extension push at launchHeaded so we don't trigger
  ServiceWorkerState::SetWorkerId DCHECK when two copies of the same
  service worker race to register.

- Swap hardcoded path.join(HOME, '.gstack', 'chromium-profile') in
  launchHeaded for resolveChromiumProfile() so phoenix can pass a
  per-workspace profile via CHROMIUM_PROFILE env (one daemon per gbd
  workspace, each with a distinct profile dir).

- Call cleanSingletonLocks(userDataDir) immediately after mkdirSync.
  Chromium's ProcessSingleton refuses to start when stale
  SingletonLock/Socket/Cookie files survive a SIGKILL or hard crash;
  pre-launch cleanup defends against the crash case. Safe under external
  coordination (gbd.lock for gbrowser, single-instance CLI check for
  gstack).

The existing .auth.json write at L291-302 is preserved — extensions
still need it for bootstrap even when component-baked.

Adds browse/test/browser-manager-custom-chromium.test.ts with 8 tests
covering both the env-kind and path-substring signals plus stock /
playwright-bundled Chromium negative cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(server): factory-export API surface + import.meta.main gate

Surfaces the embedder API gbrowser (phoenix) needs to consume gstack as a
submodule, and gates module-load side effects so the file is safe to
import without auto-starting a daemon.

Changes to browse/src/server.ts:

- AUTH_TOKEN now honors process.env.AUTH_TOKEN (trimmed) before falling
  back to crypto.randomUUID(). Whitespace-only values are rejected so the
  security boundary can't be silently weakened.

- New exported types: ServerConfig and ServerHandle. ServerConfig documents
  the full factory contract (authToken, browsePort, idleTimeoutMs, config,
  browserManager, chromiumProfile, xvfb, proxyBridge, startTime, beforeRoute).
  ServerHandle documents the return shape (fetchLocal, fetchTunnel,
  shutdown, stopListeners). Caller-owned lifecycle annotations on xvfb and
  proxyBridge prevent double-close bugs from surprise ownership.

- New exported function: resolveConfigFromEnv() builds a ServerConfig-shaped
  object from process.env for CLI use. Embedders construct their own
  ServerConfig explicitly.

- start() is now exported. Embedders can call it with env vars set as a
  v1 escape hatch until full buildFetchHandler extraction lands.

- Signal handlers (SIGINT, SIGTERM, Windows exit, uncaughtException,
  unhandledRejection) and the auto-kickoff at module bottom are now wrapped
  in `if (import.meta.main)`. CLI path is unchanged. Embedders register
  their own handlers.

- shutdown() and emergencyCleanup() now call cleanSingletonLocks(
  resolveChromiumProfile()) instead of inline path+loop. Single
  implementation, defensive guard, honors per-workspace CHROMIUM_PROFILE.

New tests:
- browse/test/server-no-import-side-effects.test.ts: spawns a fresh Bun
  subprocess that imports server.ts, asserts no signal handlers registered,
  no state-dir populated. Guards the core refactor invariant from
  regression.
- browse/test/server-factory.test.ts: 12 tests covering AUTH_TOKEN env
  behavior (honored, whitespace-rejected, trimmed), preserved exports
  (TUNNEL_COMMANDS, canDispatchOverTunnel), and ServerConfig/ServerHandle
  type compatibility.

Deferred to follow-up PR: full buildFetchHandler extraction that hoists
the 13 module-level mutables + helpers into a factory closure. Phoenix
can ship v0.6.0.0 against the start()+env surface today; the cleaner
factory comes next.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: harden auth-token validation, TDZ try/catch, lockfile path safety

Three security hardening fixes from /ship adversarial review:

1. AUTH_TOKEN unicode-whitespace bypass (server.ts:67-83).
   Old: `process.env.AUTH_TOKEN?.trim() || randomUUID()` only stripped
   ASCII whitespace. A misconfigured embedder shipping AUTH_TOKEN=$''
   (BOM) or $'​' (zero-width space) would silently get a
   one-character bearer secret. New `sanitizeAuthToken()` strips all
   unicode whitespace via regex and requires >= 16 chars after stripping;
   anything shorter falls back to crypto.randomUUID(). Same sanitizer
   used by `resolveConfigFromEnv()` so the embedder path is hardened too.

2. security-classifier.ts checkTranscript safety net.
   `resolveClaudeCommand()` and `spawn()` can throw under transient
   conditions (PATH probe failure, posix_spawn ENOMEM). Old code let the
   throw propagate and rejected the Promise with a raw exception. Now
   wrapped in try/catch that calls finish() with a degraded signal,
   matching the graceful-degradation contract the layer already promises
   for missing-CLI / exit-nonzero / parse-error.

3. cleanSingletonLocks defensive guard tightened (config.ts).
   Old: basename === 'chromium-profile' OR userDataDir === $CHROMIUM_PROFILE.
   The second branch was env-controlled and the first was bypassable by
   passing a relative path that resolved to chromium-profile via CWD
   drift. New guard: refuses relative paths outright, resolves both
   sides via path.resolve(), and only accepts the env-match path when
   $CHROMIUM_PROFILE is itself absolute.

Test updates: replace the old `.trim()` test with three new cases
covering unicode-whitespace stripping, short-token rejection, and
zero-width-only rejection (server-factory.test.ts).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.34.0.0)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:22:30 -04:00
Garry Tan 00f966b3ec v1.30.0.0 fix wave: 21 community PRs + Windows CI extension + codex flag-semantics smoke (#1391)
* fix(codex): use resume-compatible flags

* fix: V-001 security vulnerability

Automated security fix generated by Orbis Security AI

* docs: align prompt-injection thresholds to security.ts (v1.6.4.0 catch-up)

CLAUDE.md:290 and ARCHITECTURE.md:159 were missed when WARN was bumped
0.60 → 0.75 in d75402bb (v1.6.4.0, "cut Haiku classifier FP from 44% to
23%, gate now enforced", #1135). browse/src/security.ts:37 has WARN: 0.75
and BROWSER.md:743 was updated alongside that commit; CLAUDE.md and
ARCHITECTURE.md still read 0.60.

Also adds the SOLO_CONTENT_BLOCK: 0.92 entry to CLAUDE.md (already in
security.ts:50 and BROWSER.md:745, missing from CLAUDE.md's threshold
table).

No code change. No behavior change. Pure doc-vs-code alignment.

Verification:
  $ grep -n "WARN" browse/src/security.ts CLAUDE.md ARCHITECTURE.md BROWSER.md
  browse/src/security.ts:37:  WARN: 0.75,
  CLAUDE.md:290: - \`WARN: 0.75\` ...
  ARCHITECTURE.md:159: ...>= \`WARN\` (0.75)...
  BROWSER.md:743: - \`WARN: 0.75\` ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: Korean/CJK IME input and rendering in Sidebar Terminal

Fixes #1272

This commit addresses three separate Korean/CJK bugs in the Sidebar Terminal:

**Bug 1 - IME Input**: Korean text typed via IME composition was not
reaching the PTY correctly. Added compositionstart/compositionend event
listeners to suppress partial jamo fragments and only send the final
composed string.

**Bug 2a - Font Rendering**: Added CJK monospace font fallbacks
("Noto Sans Mono CJK KR", "Malgun Gothic") to both the xterm.js
fontFamily config and the CSS --font-mono variable. This ensures
consistent cell-width calculations for Korean characters.

**Bug 2b - UTF-8 Boundary Detection**: Added buffering logic to prevent
multi-byte UTF-8 characters (Korean is 3 bytes) from being split across
WebSocket chunks. This follows the same pattern as PR #1007 which fixed
the sidebar-agent path, but extends it to the terminal-agent path.

Special thanks to @ldybob for the excellent root cause analysis and
proposed solutions in issue #1272.

Tested on WSL2 + Windows 11 with Korean IME.

* fix(ship): tighten Plan Completion gate (VAS-449 remediation)

VAS-446 shipped with a PLAN.md acceptance criterion (domain-hq has
/docs/dashboard.md) silently skipped. /ship's Plan Completion subagent
existed at ship time (added in v1.4.1.0) but the gate let the failure
through. Four structural fixes:

1. Path concreteness rule: items naming a concrete filesystem path MUST
   be classified DONE/NOT DONE via [ -f <path> ], never UNVERIFIABLE.
2. Validator detection: CONTENT-SHAPE items scan target repo's
   package.json for validate-* scripts and run them before falling back
   to UNVERIFIABLE.
3. Per-item UNVERIFIABLE confirmation: replaces blanket "I've checked
   each one" with per-item Y/N/D loop. The blanket-confirm path is the
   exact failure VAS-449 surfaced.
4. Subagent fail-closed: if Plan Completion subagent + inline fallback
   both fail, surface explicit AskUserQuestion instead of silent pass.
   Replaces the prior "Never block /ship on subagent failure" fail-open.

Locked in by test/ship-plan-completion-invariants.test.ts (5 assertions,
no LLM dependency, ~60ms).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): bash.exe wrap for telemetry on Windows

reportAttemptTelemetry() in browse/src/security.ts calls spawn(bin, args)
where bin is the gstack-telemetry-log bash script. On Windows this fails
silently with ENOENT — CreateProcess can't dispatch on shebang lines.

Adopts v1.24.0.0's Bun.which + GSTACK_*_BIN override pattern (from
browse/src/claude-bin.ts:resolveClaudeCommand, introduced in #1252) for
resolving bash.exe. resolveBashBinary() honors GSTACK_BASH_BIN absolute-path
or PATH-resolvable override, falling back to Bun.which('bash') which finds
Git Bash on the standard Windows install.

buildTelemetrySpawnCommand() wraps the script invocation on win32 only;
POSIX path is bit-identical. Returns null when bash can't be resolved on
Windows so caller skips spawn — local attempts.jsonl audit trail keeps
working without surfacing a Windows-only failure.

8 new unit tests cover resolveBashBinary (POSIX bash, absolute override,
quote-stripping, BASH_BIN fallback, empty-PATH null) and buildTelemetrySpawnCommand
(POSIX pass-through, win32 bash wrap, win32 null on unresolvable, arg-array
immutability).

POSIX path is bit-identical — Bun.which('bash') on Linux/macOS returns the
same /bin/bash or /usr/bin/bash that the old hardcoded spawn relied on.

* fix(make-pdf): Bun.which-based binary resolution for browse + pdftotext on Windows

Extends v1.24.0.0's Bun.which + GSTACK_*_BIN override pattern (introduced in
browse/src/claude-bin.ts via #1252) to the two other binary resolvers in the
codebase: make-pdf/src/browseClient.ts:resolveBrowseBin and
make-pdf/src/pdftotext.ts:resolvePdftotext.

Same Windows quirks (fs.accessSync(X_OK) degrades to existence-check; `which`
isn't available outside Git Bash; bun --compile --outfile X emits X.exe), same
Bun.which-based fix shape, same env override convention.

Changes:
  - GSTACK_BROWSE_BIN / GSTACK_PDFTOTEXT_BIN as the v1.24-aligned overrides;
    BROWSE_BIN / PDFTOTEXT_BIN remain as back-compat aliases.
  - Bun.which() replaces execFileSync('which', ...) for PATH lookup. Handles
    Windows PATHEXT natively; no more `where`-vs-`which` branch.
  - findExecutable(base) helper exported from each module, probes .exe/.cmd/.bat
    after the bare-path miss on win32. Linux/macOS behavior is bit-identical
    (isExecutable short-circuits before the win32 branch ever runs).
  - macCandidates renamed posixCandidates (always was — /opt/homebrew, /usr/local,
    /usr/bin). No Windows candidates added; Poppler installs scatter across
    Scoop/Chocolatey/portable zips and guessing causes false positives.
  - Error messages get a Windows install hint (scoop install poppler / oschwartz10612)
    and `setx` example for GSTACK_*_BIN.
  - Pre-existing test 'honors BROWSE_BIN when it points at a real executable'
    was hardcoded /bin/sh — made cross-platform via a REAL_EXE constant
    (cmd.exe on win32, /bin/sh on POSIX). Was a Windows-CI blocker on its own.

Coordination: PR #1094 (@BkashJEE) covered browseClient.ts independently with a
narrower scope; this PR's pdftotext + cross-platform tests + GSTACK_*_BIN naming
are additive. Either order of merge works.

Test plan:
  - bun test make-pdf/test/browseClient.test.ts make-pdf/test/pdftotext.test.ts
    on win32 — 29 pass, 0 fail (12 new assertions: findExecutable POSIX/win32/null,
    resolveBrowseBin GSTACK_BROWSE_BIN + BROWSE_BIN + precedence + quote-strip,
    same shape for resolvePdftotext + Windows install hint in error message).
  - POSIX branch unchanged — fs.accessSync(X_OK) on Linux/macOS short-circuits
    before any win32 logic runs, matching the v1.24 claude-bin.ts pattern.

* fix(browse): NTFS ACL hardening for Windows state files via icacls

gstack's ~/.gstack/ state directory holds bearer tokens, canary tokens, agent
queue contents (with prompt history), session state, security-decision logs,
and saved cookie bundles — all written with { mode: 0o600 } / 0o700. On Windows,
those mode bits are a silent no-op: Node's fs module doesn't translate POSIX
modes to NTFS ACLs, and inherited ACLs leave every "restricted" file readable
by other principals on the machine (verified via icacls — six ACEs, the
intended user is the LAST of six).

Threat model is non-trivial on:
  - Self-hosted CI runners (different service account on the same Windows box
    can read developer tokens, canary tokens, prompt history)
  - Shared development machines (agencies, studios, lab environments)
  - Multi-tenant servers with shared home directories

Orthogonal to v1.24.0.0's binary-resolution work — complementary at the write
side. v1.24's bin/gstack-paths resolves ~/.gstack/ correctly across plugin /
global / local installs; this PR ensures files written into those resolved
paths actually get the POSIX 0o600 semantic translated to NTFS.

The fix:
  - New browse/src/file-permissions.ts (158 LOC, 5 public + 1 test-reset).
    restrictFilePermissions / restrictDirectoryPermissions wrap chmod (POSIX)
    or icacls /inheritance:r /grant:r <user>:(F) (Windows). writeSecureFile /
    appendSecureFile / mkdirSecure are drop-in wrappers for the common patterns.
  - 19 call sites converted across 9 source files: browser-manager.ts,
    browser-skill-write.ts, cli.ts, config.ts, meta-commands.ts,
    security-classifier.ts, security.ts (4 sites), server.ts (5 sites),
    terminal-agent.ts (8 sites), tunnel-denial-log.ts.
  - (OI)(CI) inheritance flags on directories mean files created via fs.write*
    *inside* an mkdirSecure-created dir inherit the owner-only ACL automatically
    — important for tunnel-denial-log.ts where appends use async fsp.appendFile.

Error handling: icacls failures (nonexistent path, missing icacls.exe, hardened
environments) log a one-shot warning to stderr and proceed. Once-per-process
gating prevents log spam if the condition persists. Filesystem stays
functional; the file just ends up with inherited ACLs.

Test plan:
  - bun test browse/test/file-permissions.test.ts — 13 pass, 0 fail (POSIX
    mode-bit assertions, Windows no-throw, mkdir idempotence, recursive
    creation, Buffer payloads, append-creates-then-reapplies-once semantics)
  - bun test browse/test/security.test.ts — 38 pass, 0 fail (existing security
    test suite plus the bash-binary resolution tests added in fix #1119; the
    converted writeFileSync/appendFileSync/mkdirSync sites in security.ts
    integrate cleanly)
  - Empirical icacls before/after on a real file — 6 ACEs → 1 ACE
  - bun build typecheck on all modified files — clean (server.ts has a
    pre-existing playwright-core/electron resolution issue unrelated to this PR)

POSIX behavior is bit-identical to old code — fs.chmodSync(path, 0o6XX) on the
helper's POSIX branch matches the inline { mode: 0o6XX } it replaces. Linux
and macOS see no behavior change.

Inviting pushback on three judgment calls (in PR description):
  1. icacls vs npm library
  2. ACL scope — just user, or user + SYSTEM?
  3. Graceful degradation — once-per-process warn, not silent, not hard-fail.

* fix(browse): declare lastConsoleFlushed to restore console-log persistence

flushBuffers() references a `lastConsoleFlushed` cursor at server.ts:337
and assigns it at :344, but the `let lastConsoleFlushed = 0;`
declaration is missing — only the network and dialog siblings are
declared at lines 327-328.

Result: every 1-second flushBuffers tick (line 376) throws
`ReferenceError: lastConsoleFlushed is not defined`, gets swallowed by
the catch at line 369 ("[browse] Buffer flush failed: ..."), and the
console branch's append never runs. browse-console.log is never
written in any production deployment since this regressed.

Discovered by stress-testing the daemon with 15 concurrent CLIs against
cold state — the race surfaced the buffer-flush error spam in one
spawned daemon's stderr. Verified by running the daemon against a real
file:// page with console.log events: in-memory `browse console`
returns the entries, but `.gstack/browse-console.log` is never created
on disk.

Regression introduced by 1a100a2a "fix: eliminate duplicate command
sets in chain, improve flush perf and type safety" — the flush refactor
switched from `Bun.write` to `fs.appendFileSync` and added the
`lastConsoleFlushed` cursor pattern alongside its network/dialog
siblings, but missed the matching `let` declaration. Tests don't
currently exercise flushBuffers, so the regression shipped silently.

Fix:
  - Declare `let lastConsoleFlushed = 0;` next to `lastNetworkFlushed`
    and `lastDialogFlushed` (browse/src/server.ts:327)
  - Add a source-level guard test
    (browse/test/server-flush-trackers.test.ts) that fails any future
    refactor that adds a fourth `last*Flushed` cursor without the
    matching declaration. Same pattern as terminal-agent.test.ts and
    dual-listener.test.ts — read source as text, assert invariant, no
    daemon required.

Test plan:
  - [x] New regression test fails on current main, passes with the fix
  - [x] `bun run build` clean
  - [x] Manual smoke: spawn daemon -> goto file:// page with
        console.log -> wait 4s -> .gstack/browse-console.log now
        exists with the expected entries (163 bytes vs zero before)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(browse): per-process state-file temp path to fix concurrent-write ENOENT

The daemon writes `.gstack/browse.json` via the standard atomic-rename
pattern: `writeFileSync(tmp, …) → renameSync(tmp, stateFile)`. Four
sites in server.ts use this pattern (initial daemon-startup state at
:2002, /tunnel/start handler at :1479, BROWSE_TUNNEL=1 inline tunnel
update at :2083, BROWSE_TUNNEL_LOCAL_ONLY=1 update at :2113), and all
four hard-code the same temp filename `${stateFile}.tmp`.

Under concurrent writers the shared filename races on the rename:

    t0  Writer A: writeFileSync(stateFile + '.tmp', payloadA)
    t1  Writer B: writeFileSync(stateFile + '.tmp', payloadB)   // overwrites A
    t2  Writer A: renameSync(stateFile + '.tmp', stateFile)    // moves B's payload
    t3  Writer B: renameSync(stateFile + '.tmp', stateFile)    // ENOENT — file gone

Reproduced empirically with 15 concurrent CLIs against a fresh `.gstack/`:

    [browse] Failed to start: ENOENT: no such file or directory,
    rename '…/.gstack/browse.json.tmp' -> '…/.gstack/browse.json'

Pre-fix success rate: **0 / 15** under cold-start race.
Post-fix success rate: **15 / 15**, zero ENOENT.

Fix:
  - New `tmpStatePath()` helper (server.ts:333) returns
    `${stateFile}.tmp.${pid}.${randomBytes(4).toString('hex')}`
  - All 4 call sites use `tmpStatePath()` instead of the shared literal
  - Atomic rename still gives last-writer-wins semantics on the final
    state.json content; only behavior change is that concurrent writers
    no longer kill each other on the rename step

Source-level guard test (browse/test/server-tmp-state-path.test.ts)
locks two invariants: (1) no remaining `stateFile + '.tmp'` literals,
(2) every state-write `writeFileSync` call uses `tmpStatePath()`. Same
read-source-as-text pattern as terminal-agent.test.ts and
dual-listener.test.ts — no daemon required, runs in tier-1 free.

Test plan:
  - [x] Targeted source-level guard test passes (3 / 0)
  - [x] `bun run build` clean
  - [x] Live regression: 15 concurrent CLIs against cold state →
        15 / 15 healthy, 0 ENOENT (vs 0 / 15 pre-fix)
  - [x] No `.tmp.*` orphans left behind after rename succeeds
  - [x] Related test cluster (server-auth, dual-listener, cdp-mutex,
        findport) — same pre-existing flakes as `main`, no new
        regressions introduced

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(browse): clear refs when iframe auto-detaches in getActiveFrameOrPage

Asymmetric cleanup between two equivalent staleness conditions:

  onMainFrameNavigated()  →  clearRefs() + activeFrame = null  ✓
  getActiveFrameOrPage()  →  activeFrame = null  (refs NOT cleared)  ✗

Both paths see the same staleness condition — refs were captured
against a frame that no longer exists. The main-frame path correctly
clears both pieces of state. The iframe-detach path nulls the frame
but leaves the refMap intact.

The lazy click-time check in `resolveRef` (tab-session.ts:97) partially
saves us — `entry.locator.count()` on a detached-frame locator throws
or returns 0, so the click errors out as "Ref X is stale". But the
user has no signal that frame context silently changed underfoot: the
next `snapshot` runs against `this.page` (main) while old iframe refs
still litter `refMap` with the same role+name keys. New refs collide
with stale ones, the resolver picks one at random, the user clicks
the wrong element.

TODOS.md line 816-820 documents "Detached frame auto-recovery" as a
shipped iframe-support feature in v0.12.1.0. This restores the
documented intent — the recovery should leave the session in a clean
state, not a half-cleared one.

Fix: 1 line — add `this.clearRefs()` next to `this.activeFrame = null`
inside the if-branch.

Test plan:
  - [x] New regression test: 4/4 pass
        - refs cleared when getActiveFrameOrPage detects detached iframe
        - refs preserved when active frame is still attached (no regression)
        - refs preserved when no frame set (page-level path untouched)
        - matches onMainFrameNavigated symmetry — both paths reach the
          same clean end state
  - [x] `bun run build` clean

🤖 Generated with [Claude Code](https://claude.com/claude-code)

* fix(codex): resolve python for JSON parser

* fix: add fail-fast probe for base branch in ship step 12

* fix(plan-devex-review): remove contradictory plan-mode handshake

* fix(design): honor Retry-After header in variants 429 handler

Closes #1244.

The 429 handler in `generateVariant` discarded the `Retry-After` response
header and fell straight through to a local exponential schedule (2s/4s/8s).
In image-generation batches, that burns retry attempts inside the provider's
cooldown window and the request never recovers.

Now we parse `Retry-After` per RFC 7231 — both delta-seconds (`Retry-After: 5`)
and HTTP-date (`Retry-After: Fri, 31 Dec 1999 23:59:59 GMT`). Honored waits
are capped at 60s to bound stalls from hostile or buggy headers. Delta-seconds
are validated as digits-only (rejects `2abc`). When `Retry-After` is honored
(including 0 / past-date "retry now"), the next iteration's leading exponential
sleep is skipped so we don't double-wait. Invalid or missing headers fall
through to the existing exponential schedule unchanged.

Behavior matrix:

| Header                          | Behavior                                  |
|---------------------------------|-------------------------------------------|
| Retry-After: 5                  | wait 5s, skip leading on next attempt     |
| Retry-After: 999999             | capped to 60s, skip leading               |
| Retry-After: 2abc               | invalid, fall through to exponential      |
| Retry-After: 0                  | wait 0, skip leading (retry immediately)  |
| Retry-After: <past HTTP-date>   | wait 0, skip leading                      |
| Retry-After: <future date>      | wait diff capped at 60s, skip leading     |
| no header                       | fall through to existing exponential      |

`generateVariant` now accepts an optional `fetchFn` parameter (defaults to
`globalThis.fetch`) so tests can inject a stub. Production call sites are
unchanged.

Tests cover the five behavior buckets above, asserting both the 1st-to-2nd
call timing gap and call counts. All five pass in ~8s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docs): correct per-skill symlink removal snippet in README uninstall

Closes #1130.

The manual-uninstall fallback in `## Uninstall` → `### Option 2` used
`find ~/.claude/skills -maxdepth 1 -type l`, which finds nothing on real
installs. Each `~/.claude/skills/<name>/` is a real directory, and only
`<name>/SKILL.md` inside it is a symlink into `gstack/`. The find never
matched, so the snippet silently removed nothing.

Replace with a directory walk that inspects each `<name>/SKILL.md`:

  find ~/.claude/skills -mindepth 1 -maxdepth 1 -type d ! -name gstack
  → check $dir/SKILL.md is a symlink → readlink it
  → if target is gstack/* or */gstack/*: rm -f the link, rmdir the dir
    (only if empty — preserves any user-added files)

Excludes the top-level `gstack/` dir from the walk; that's removed by
step 3 of the same uninstall block.

`bin/gstack-uninstall` (the script-mode path) already handles the layout
correctly via its own walk; only this manual fallback needed updating.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: reject partial browse client env integers

* fix(gemini-adapter): detect new ~/.gemini/oauth_creds.json auth path

gemini-cli >=0.30 stores OAuth credentials at ~/.gemini/oauth_creds.json
instead of the legacy ~/.config/gemini/ directory. The benchmark adapter's
availability check now succeeds for users on recent gemini-cli releases
who have authenticated via interactive login.

Both paths are accepted so users on older versions still work.

* fix(browser): add --no-sandbox for root user on Linux/WSL2

Chromium's sandbox can't initialize when running as root on Linux,
causing an immediate exit. Extend the existing CI/CONTAINER check to
also cover this case, keeping the Windows-safe `typeof getuid` guard.

* security: pass cwd to git via execFileSync, not interpolation through /bin/sh

`bin/gstack-memory-ingest.ts:632-643` ran `execSync(\`git -C ${JSON.stringify(cwd)}
remote get-url origin 2>/dev/null\`, ...)`. JSON.stringify escapes `"` and `\`
but not `$` or backticks, so a `cwd` of `"$(touch /tmp/marker)"` survived JSON
quoting and detonated under /bin/sh's command-substitution-inside-double-quotes.

`cwd` originates from transcript JSONL records under
`~/.claude/projects/<encoded-cwd>/<uuid>.jsonl` and
`~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl`. The walker grabs the first
`.cwd` it sees per session. That's an untrusted surface in the gstack threat
model — the L1-L6 sidebar security stack exists exactly because agent
transcripts can carry attacker-influenced text. Two pivots above the local
same-uid bar: (a) prompt-injection appending `cwd="$(...)"` to the active
session log turns the next /sync-gbrain run into RCE under the user's uid;
(b) cross-machine transcript share (a colleague's `.claude/projects` snippet
untar'd into HOME, a documented gbrain dogfooding shape) → RCE on first sync.

Fix swaps the one execSync for `execFileSync("git", ["-C", cwd, "remote",
"get-url", "origin"], ...)`. No shell, argv passed directly to git. The same
module already uses execFileSync for `gbrainAvailable()` (line 762 pre-patch)
and `gbrainPutPage()` (line 816 pre-patch) — this single execSync was the
outlier.

Test: `gstack-memory-ingest security: untrusted cwd cannot trigger shell
substitution` plants a Claude-Code-shaped JSONL with cwd=`$(touch <marker>)`
and asserts the marker file is not created after `--incremental --quiet`.
Negative control: with the patch reverted, the test fails (marker created);
with the patch applied, it passes (18/18 in test/gstack-memory-ingest.test.ts).

* security: gate domain-skill auto-promote on classifier_score > 0

`browse/src/domain-skill-commands.ts:140` (handleSave) writes
`classifier_score: 0` with the comment "L4 deferred to load-time / sidebar-agent
fills this in on first prompt-injection load." But CLAUDE.md "Sidebar
architecture" documents that sidebar-agent.ts was ripped, and grep for
recordSkillUse + classifierFlagged callers across browse/src/ returns zero hits
outside the module under test.

Net effect: every quarantined skill that survives three benign uses without
flag (`recordSkillUse(... , classifierFlagged: false)` x3) auto-promotes to
`active` and lands in prompt context wrapped as UNTRUSTED on every subsequent
visit to that host. The L4 score that was supposed to gate the promotion was
never written — the production save path puts 0 on disk and nothing later
updates it.

Threat model: a domain-skill body authored by an agent under the influence of
a poisoned page (the new `gstackInjectToTerminal` PTY path runs no L1-L3
either) would lose its auto-promote barrier after three uses. The exploit
isn't single-step but the bar is exactly N=3 prompt-injection-shaped uses on
a hostile page, which is well within reach.

Fix adds a single condition to the auto-promote gate in `recordSkillUse`:

    if (state === 'quarantined' && useCount >= PROMOTE_THRESHOLD &&
        flagCount === 0 && current.classifier_score > 0) {
      state = 'active';
    }

`classifier_score` is set once at writeSkill and never updated. Production
saves it as 0 (handleSave), so the gate stays closed; existing tests that
explicitly pass `classifierScore: 0.1` still auto-promote (the auto-promote
path is preserved for the day L4 is rewired).

Manual promotion via `domain-skill promote-to-global` is unaffected (it goes
through `promoteToGlobal` which has its own state-machine guard at line 337+).

Test: new regression case `does NOT auto-promote when classifier_score is 0
(production handleSave shape)` plants a skill with classifierScore=0 (matches
domain-skill-commands.ts:140), runs three uses without flag, asserts the skill
stays quarantined and readSkill returns null. Negative control: revert the
patch, the test fails with `Received: "active"`. With the patch: 15/15 pass.

* fix(ship): port #1302 SKILL.md edits to .tmpl + resolver source

PR #1302 added Verification Mode + UNVERIFIABLE classification + per-item
confirmation gate to ship/SKILL.md, but only the generated SKILL.md was
edited — not the .tmpl source or scripts/resolvers/review.ts. The next
`bun run gen:skill-docs` run would have wiped the changes.

Port the same content into the resolver and .tmpl so regeneration produces
the intended output.

* ci(windows): extend free-tests lane to cover icacls + Bun.which resolvers from fix-wave PRs

Closes #1306/#1307/#1308 validation gap. The four newly-added test files
already have process.platform guards so they run safely on both POSIX and
Windows lanes — only platform-relevant assertions execute on each.

Tests added to the windows-latest lane:
- browse/test/file-permissions.test.ts (#1308 icacls + writeSecureFile)
- browse/test/security.test.ts (#1306 bash.exe wrap pure-function path)
- make-pdf/test/browseClient.test.ts (#1307 Bun.which browse resolver)
- make-pdf/test/pdftotext.test.ts (#1307 Bun.which pdftotext resolver)

* test(codex): live flag-semantics smoke for codex exec resume

Closes #1270's regex-only test gap. PR #1270 asserted that codex/SKILL.md's
`codex exec resume` invocation drops -C/-s and uses sandbox_mode config.
That regex catches the skill template regressing, but not codex CLI itself
flipping flag semantics again.

This test probes `codex exec resume --help` and asserts the surface gstack
relies on: -c/sandbox_mode is accepted, top-level -C is absent. Skips
silently when codex isn't on PATH, so dev machines without codex installed
never see it fail.

* chore: regen SKILL.md after fix wave

One regen commit at the end of the merge wave per the plan. plan-devex-review
loses the contradictory plan-mode handshake (#1333). review/SKILL.md picks up
the Verification Mode + UNVERIFIABLE classification additions that #1302
authored against ship/SKILL.md (same resolver shared between ship and review
modes).

* fix(server.ts): keep fs.writeFileSync for state-file writes

#1308's writeSecureFile wrapper added Windows icacls hardening for the
4 state-file write sites in server.ts, but #1310's regression test grep's
for fs.writeFileSync(tmpStatePath()) calls. The two changes are technically
compatible only if the test relaxes — keeping the test strict (the safer
choice for catching regressions on the cold-start race) means the 4 state-
file sites stay on fs.writeFileSync(..., { mode: 0o600 }).

POSIX 0o600 hardening is preserved on those 4 sites. Windows icacls
hardening still applies to all the other writeSecureFile call sites
#1308 added (auth.json, mkdirSecure, etc.).

Also refreshes golden baselines after #1302 / port + minor wording tweak
in scripts/resolvers/review.ts to keep gen-skill-docs.test.ts assertion
'Cite the specific file' satisfied.

* v1.30.0.0: fix wave — 21 community PRs + 2 closing fixes for Windows + codex CI gaps

Headline release. Browse stops dropping console logs, cold-start race
fixed, codex resume works without python3, Windows hardening (icacls +
Bun.which + bash.exe wrap), ship gate gets VAS-449 remediation, two
closing fixes that put icacls/Bun.which/codex flag semantics under CI.

* test(domain-skills): cover #1369 classifier_score=0 quarantine + score>0 promote path

The pre-existing T6 test seeded skills via writeSkill (which defaults
classifier_score to 0 until L4 is rewired) and then expected 3 uses to
auto-promote. PR #1369 added `current.classifier_score > 0` to the gate
specifically to block that path — a quarantined skill written under the
influence of a poisoned page would otherwise auto-promote after three
benign uses.

Updated test asserts both halves of the new contract:
- classifier_score=0 + 3 uses → stays quarantined (the security guarantee)
- classifier_score>0 + 3 more uses → promotes to active (unblock path)

Catches both regressions: the gate going away (would re-allow the bypass)
and the unblock path breaking (would silently quarantine all skills
forever once L4 is rewired).

---------

Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com>
Co-authored-by: orbisai0security <mediratta01.pally@gmail.com>
Co-authored-by: Bryce Alan <brycealan.eth@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Terry Carson YM <cym3118288@gmail.com>
Co-authored-by: Vasko Ckorovski <vckorovski@gmail.com>
Co-authored-by: Samuel Carson <samuel.carson@gmail.com>
Co-authored-by: Yashwant Kotipalli <yashwant7kotipalli@gmail.com>
Co-authored-by: Jasper Chen <jasperchen925@gmail.com>
Co-authored-by: Stefan Neamtu <stefan.neamtu@gmail.com>
Co-authored-by: 陈家名 <chenjiaming@kezaihui.com>
Co-authored-by: Abigail Atheryon <abi@atheryon.ai>
Co-authored-by: Furkan Köykıran <furkankoykiran@gmail.com>
Co-authored-by: gus <gustavoraularagon@gmail.com>
2026-05-09 08:06:47 -07:00
Garry Tan 7e96fe299b fix: security wave 3 — 12 fixes, 7 contributors (v0.16.4.0) (#988)
* fix(security): validateOutputPath symlink bypass — check file-level symlinks

validateOutputPath() previously only resolved symlinks on the parent directory.
A symlink at /tmp/evil.png → /etc/crontab passed the parent check (parent is
/tmp, which is safe) but the write followed the symlink outside safe dirs.

Add lstatSync() check: if the target file exists and is a symlink, resolve
through it and verify the real target is within SAFE_DIRECTORIES. ENOENT
(file doesn't exist yet) falls through to the existing parent-dir check.

Closes #921

Co-Authored-By: Yunsu <Hybirdss@users.noreply.github.com>

* fix(security): shell injection in bin/ scripts — use env vars instead of interpolation

gstack-settings-hook interpolated $SETTINGS_FILE directly into bun -e
double-quoted blocks. A path containing quotes or backticks breaks the JS
string context, enabling arbitrary code execution.

Replace direct interpolation with environment variables (process.env).
Same fix applied to gstack-team-init which had the same pattern.

Systematic audit confirmed only these two scripts were vulnerable — all
other bin/ scripts already use stdin piping or env vars.

Closes #858

Co-Authored-By: Gus <garagon@users.noreply.github.com>

* fix(security): cookie-import path validation bypass + hardcoded /tmp

Two fixes:
1. cookie-import relative path bypass (#707): path.isAbsolute() gated the
   entire validation, so relative paths like "sensitive-file.json" bypassed
   the safe-directory check entirely. Now always resolves to absolute path
   with realpathSync for symlink resolution, matching validateOutputPath().

2. Hardcoded /tmp in cookie-import-browser (#708): openDbFromCopy used
   /tmp directly instead of os.tmpdir(), breaking Windows support.

Also adds explicit imports for SAFE_DIRECTORIES and isPathWithin in
write-commands.ts (previously resolved implicitly through bundler).

Closes #852

Co-Authored-By: Toby Morning <urbantech@users.noreply.github.com>

* fix(security): redact form fields with sensitive names, not just type=password

Form redaction only applied to type="password" fields. Hidden and text
fields named csrf_token, api_key, session_id, etc. were exposed unredacted
in LLM context, leaking secrets.

Extend redaction to check field name and id against sensitive patterns:
token, secret, key, password, credential, auth, jwt, session, csrf, sid,
api_key. Uses the same pattern style as SENSITIVE_COOKIE_NAME.

Closes #860

Co-Authored-By: Gus <garagon@users.noreply.github.com>

* fix(security): restrict session file permissions to owner-only

Design session files written to /tmp with default umask (0644) were
world-readable on shared systems. Sessions contain design prompts and
feedback history.

Set mode 0o600 (owner read/write only) on both create and update paths.

Closes #859

Co-Authored-By: Gus <garagon@users.noreply.github.com>

* fix(security): enforce frozen lockfile during setup

bun install without --frozen-lockfile resolves ^semver ranges from npm on
every run. If an attacker publishes a compromised compatible version of any
dependency, the next ./setup pulls it silently.

Add --frozen-lockfile with fallback to plain install (for fresh clones
where bun.lock may not exist yet). Matches the pattern already used in
the .agents/ generation block (line 237).

Closes #614

Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com>

* fix: remove duplicate recursive chmod on /tmp in Dockerfile.ci

chmod -R 1777 /tmp recursively sets sticky bit on files (no defined
behavior), not just the directory. Deduplicate to single chmod 1777 /tmp.

Closes #747

Co-Authored-By: Maksim Soltan <Gonzih@users.noreply.github.com>

* fix(security): learnings input validation + cross-project trust gate

Three fixes to the learnings system:

1. Input validation in gstack-learnings-log: type must be from allowed list,
   key must be alphanumeric, confidence must be 1-10 integer, source must
   be from allowed list. Prevents injection via malformed fields.

2. Prompt injection defense: insight field checked against 10 instruction-like
   patterns (ignore previous, system:, override, etc.). Rejected with clear
   error message.

3. Cross-project trust gate in gstack-learnings-search: AI-generated learnings
   from other projects are filtered out. Only user-stated learnings cross
   project boundaries. Prevents silent prompt injection across codebases.

Also adds trusted field (true for user-stated source, false for AI-generated)
to enable the trust gate at read time.

Closes #841

Co-Authored-By: Ziad Al Sharif <Ziadstr@users.noreply.github.com>

* feat(security): track cookie-imported domains and scope cookie imports

Foundation for origin-pinned JS execution (#616). Tracks which domains
cookies were imported from so the JS/eval commands can verify execution
stays within imported origins.

Changes:
- BrowserManager: new cookieImportedDomains Set with track/get/has methods
- cookie-import: tracks imported cookie domains after addCookies
- cookie-import-browser: tracks domains on --domain direct import
- cookie-import-browser --all: new explicit opt-in for all-domain import
  (previously implicit behavior, now requires deliberate flag)

Closes #615

Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com>

* feat(security): pin JS/eval execution to cookie-imported origins

When cookies have been imported for specific domains, block JS execution
on pages whose origin doesn't match. Prevents the attack chain:
1. Agent imports cookies for github.com
2. Prompt injection navigates to attacker.com
3. Agent runs js document.cookie → exfiltrates github cookies

assertJsOriginAllowed() checks the current page hostname against imported
cookie domains with subdomain matching (.github.com allows api.github.com).
When no cookies are imported, all origins allowed (nothing to protect).
about:blank and data: URIs are allowed (no cookies at risk).

Depends on #615 (cookie domain tracking).

Closes #616

Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com>

* feat(security): add persistent command audit log

Append-only JSONL audit trail for all browse server commands. Unlike
in-memory ring buffers, the audit log persists across restarts and is
never truncated. Each entry records: timestamp, command, args (truncated
to 200 chars), page origin, duration, status, error (truncated to 300
chars), hasCookies flag, connection mode.

All writes are best-effort — audit failures never block command execution.
Log stored at ~/.gstack/.browse/browse-audit.jsonl.

Closes #617

Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com>

* fix(security): block hex-encoded IPv4-mapped IPv6 metadata bypass

URL constructor normalizes ::ffff:169.254.169.254 to ::ffff:a9fe:a9fe
(hex form), which was not in the blocklist. Similarly, ::169.254.169.254
normalizes to ::a9fe:a9fe.

Add both hex-encoded forms to BLOCKED_METADATA_HOSTS so they're caught
by the direct hostname check in validateNavigationUrl.

Closes #739

Co-Authored-By: Osman Mehmood <mehmoodosman@users.noreply.github.com>

* chore: bump version and changelog (v0.16.4.0)

Security wave 3: 12 fixes, 7 contributors.
Cookie origin pinning, command audit log, domain tracking.
Symlink bypass, path validation, shell injection, form redaction,
learnings injection, IPv6 SSRF, session permissions, frozen lockfile.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Yunsu <Hybirdss@users.noreply.github.com>
Co-authored-by: Gus <garagon@users.noreply.github.com>
Co-authored-by: Toby Morning <urbantech@users.noreply.github.com>
Co-authored-by: Alberto Martinez <halbert04@users.noreply.github.com>
Co-authored-by: Maksim Soltan <Gonzih@users.noreply.github.com>
Co-authored-by: Ziad Al Sharif <Ziadstr@users.noreply.github.com>
Co-authored-by: Osman Mehmood <mehmoodosman@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 07:49:37 -10:00
Garry Tan 115d81d792 fix: security wave 1 — 14 fixes for audit #783 (v0.15.7.0) (#810)
* fix: DNS rebinding protection checks AAAA (IPv6) records too

Cherry-pick PR #744 by @Gonzih. Closes the IPv6-only DNS rebinding gap
by checking both A and AAAA records independently.

Co-Authored-By: Gonzih <gonzih@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: validateOutputPath symlink bypass — resolve real path before safe-dir check

Cherry-pick PR #745 by @Gonzih. Adds a second pass using fs.realpathSync()
to resolve symlinks after lexical path validation.

Co-Authored-By: Gonzih <gonzih@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: validate saved URLs before navigation in restoreState

Cherry-pick PR #751 by @Gonzih. Prevents navigation to cloud metadata
endpoints or file:// URIs embedded in user-writable state files.

Co-Authored-By: Gonzih <gonzih@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: telemetry-ingest uses anon key instead of service role key

Cherry-pick PR #750 by @Gonzih. The service role key bypasses RLS and
grants unrestricted database access — anon key + RLS is the right model
for a public telemetry endpoint.

Co-Authored-By: Gonzih <gonzih@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: killAgent() actually kills the sidebar claude subprocess

Cherry-pick PR #743 by @Gonzih. Implements cross-process kill signaling
via kill-file + polling pattern, tracks active processes per-tab.

Co-Authored-By: Gonzih <gonzih@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(design): bind server to localhost and validate reload paths

Cherry-pick PR #803 by @garagon. Adds hostname: '127.0.0.1' to Bun.serve()
and validates /api/reload paths are within cwd() or tmpdir(). Closes C1+C2
from security audit #783.

Co-Authored-By: garagon <garagon@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add auth gate to /inspector/events SSE endpoint (C3)

The /inspector/events endpoint had no authentication, unlike /activity/stream
which validates tokens. Now requires the same Bearer header or ?token= query
param check. Closes C3 from security audit #783.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sanitize design feedback with trust boundary markers (C4+H5)

Wrap user feedback in <user-feedback> XML markers with tag escaping to
prevent prompt injection via malicious feedback text. Cap accumulated
feedback to last 5 iterations to limit incremental poisoning.
Closes C4 and H5 from security audit #783.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden file/directory permissions to owner-only (C5+H9+M9+M10)

Add mode 0o700 to all mkdirSync calls for state/session directories.
Add mode 0o600 to all writeFileSync calls for session.json, chat.jsonl,
and log files. Add umask 077 to setup script. Prevents auth tokens, chat
history, and browser logs from being world-readable on multi-user systems.
Closes C5, H9, M9, M10 from security audit #783.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: TOCTOU race in setup symlink creation (C6)

Remove the existence check before mkdir -p (it's idempotent) and validate
the target isn't already a symlink before creating the link. Prevents a
local attacker from racing between the check and mkdir to redirect
SKILL.md writes. Closes C6 from security audit #783.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove CORS wildcard, restrict to localhost (H1)

Replace Access-Control-Allow-Origin: * with http://127.0.0.1 on sidebar
tab/chat endpoints. The Chrome extension uses manifest host_permissions
to bypass CORS entirely, so this only blocks malicious websites from
making cross-origin requests. Closes H1 from security audit #783.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: make cookie picker auth mandatory (H2)

Remove the conditional if(authToken) guard that skipped auth when
authToken was undefined. Now all cookie picker data/action routes
reject unauthenticated requests. Closes H2 from security audit #783.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gate /health token on chrome-extension Origin header

Only return the auth token in /health response when the request Origin
starts with chrome-extension://. The Chrome extension always sends this
origin via manifest host_permissions. Regular HTTP requests (including
tunneled ones from ngrok/SSH) won't get the token. The extension also
has a fallback path through background.js that reads the token from the
state file directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: update server-auth test for chrome-extension Origin gating

The test previously checked for 'localhost-only' comment. Now checks for
'chrome-extension://' since the token is gated on Origin header.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.15.7.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Gonzih <gonzih@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: garagon <garagon@users.noreply.github.com>
2026-04-04 22:12:04 -07:00
Garry Tan 41141007c1 feat: TODOS-aware skills, 2-tier Greptile replies, gitignore fix (#61)
* fix: log non-ENOENT errors in ensureStateDir() instead of silently swallowing

Replace bare catch {} with ENOENT-only silence. Non-ENOENT errors (EACCES,
ENOSPC) are now logged to .gstack/browse-server.log. Includes test for
permission-denied scenario with chmod 444.

* feat: merge TODO.md + TODOS.md into unified backlog with shared format reference

Merge TODO.md (roadmap) and TODOS.md (near-term) into one file organized by
skill/component with P0-P4 priority ordering and Completed section. Add shared
review/TODOS-format.md for canonical format. Add static validation tests.

* feat: add 2-tier Greptile reply system with escalation detection

Add reply templates (Tier 1 friendly, Tier 2 firm), explicit escalation
detection algorithm, and severity re-ranking guidance to greptile-triage.md.

* feat: cross-skill TODOS awareness + Greptile template refs in all skills

/ship Step 5.5: auto-detect completed TODOs, offer reorganization.
/review Step 5.5: cross-reference PR against open TODOs.
/plan-ceo-review, /plan-eng-review: TODOS context in planning.
/retro: Backlog Health metric. /qa: bug TODO context in diff-aware mode.
All Greptile-aware skills now reference reply templates and escalation detection.

* chore: bump version and changelog (v0.3.8)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update CONTRIBUTING.md for v0.3.8 changes

Clarify test tier cost table (Tier 3 standalone vs combined), add TODOS.md
to "Things to know", mention Greptile triage in ship workflow description.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-14 20:15:11 -07:00
Garry Tan ff5cbbbfef feat: add remote slug helper and auto-gitignore for .gstack/
- getRemoteSlug() in config.ts: parses git remote origin → owner-repo format
- browse/bin/remote-slug: shell helper for SKILL.md use (BSD sed compatible)
- ensureStateDir() now appends .gstack/ to project .gitignore if not present
- setup creates ~/.gstack/projects/ global state directory
- 7 new tests: 4 gitignore behavior + 3 remote slug parsing
2026-03-14 00:10:00 -05:00
Garry Tan 07b4e15b34 feat: v0.3.2 — project-local state, diff-aware QA, Greptile integration (#36)
* fix: cookie import picker returns JSON instead of HTML

jsonResponse() was defined at module scope but referenced `url` which
only existed as a parameter of handleCookiePickerRoute(). Every API call
crashed, the catch block also crashed, and Bun returned a default HTML
page that the frontend couldn't parse as JSON.

Thread port via corsOrigin() helper and options objects. Add route-level
tests to prevent this class of bug from shipping again.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add help command to browse server

Agents that don't have SKILL.md loaded (or misread flags) had no way to
self-discover the CLI. The help command returns a formatted reference of
all commands and snapshot flags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: version-aware find-browse with META signal protocol

Agents in other workspaces found stale browse binaries that were missing
newer flags. find-browse now compares the local binary's git SHA against
origin/main via git ls-remote (4hr cache), and emits META:UPDATE_AVAILABLE
when behind. SKILL.md setup checks parse META signals and prompt the user
to update.

- New compiled binary: browse/dist/find-browse (TypeScript, testable)
- Bash shim at browse/bin/find-browse delegates to compiled binary
- .version file written at build time with git commit SHA
- Build script compiles both browse and find-browse binaries
- Graceful degradation: offline, missing .version, corrupt cache all skip check

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: clean up .bun-build temp files after compile

bun build --compile leaves ~58MB temp files in the working directory.
Add rm -f .*.bun-build to the build script to clean up after each build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: make help command reachable by removing it from META_COMMANDS

help was in META_COMMANDS, so it dispatched to handleMetaCommand() which
threw "Unknown meta command: help". Removing it from the set lets the
dedicated else-if handler in handleCommand() execute correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: bump version and changelog (v0.3.2)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add shared Greptile comment triage reference doc

Shared reference for fetching, filtering, and classifying Greptile
review comments on GitHub PRs. Used by both /review and /ship skills.
Includes parallel API fetching, suppressions check, classification
logic, reply APIs, and history file writes.

* feat: make /review and /ship Greptile-aware

/review: Step 2.5 fetches and classifies Greptile comments, Step 5
resolves them with AskUserQuestion for valid issues and false positives.

/ship: Step 3.75 triages Greptile comments between pre-landing review
and version bump. Adds Greptile Review section to PR body in Step 8.
Re-runs tests if any Greptile fixes are applied.

* feat: add Greptile batting average to /retro

Reads ~/.gstack/greptile-history.md, computes signal ratio
(valid catches vs false positives), includes in metrics table,
JSON snapshot, and Code Quality Signals narrative.

* docs: add Greptile integration section to README

Personal endorsement, two-layer review narrative, full UX walkthrough
transcript, skills table updates. Add Greptile training feedback loop
to TODO.md future ideas.

* feat: add local dev mode for testing skills from within the repo

bin/dev-setup creates .claude/skills/gstack symlink to the working tree
so Claude Code discovers skills locally. bin/dev-teardown cleans up.
DEVELOPING_GSTACK.md documents the workflow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: narrow gitignore to .claude/skills/ instead of all .claude/

Avoids ignoring legitimate Claude Code config like settings.json or CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: rename DEVELOPING_GSTACK.md to CONTRIBUTING.md

Rewritten as a contributor-friendly guide instead of a dry plan doc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: explain why dev-setup is needed in CONTRIBUTING.md quick start

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add browser interaction guidance to CLAUDE.md

Prevents Claude from using mcp__claude-in-chrome__* tools instead of /browse.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add shared config module for project-local browse state

Centralizes path resolution (git root detection, state dir, log paths) into
config.ts. Both cli.ts and server.ts import from it, eliminating duplicated
PORT_OFFSET/BROWSE_PORT/STATE_FILE logic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: rewrite port selection to use random ports

Replace CONDUCTOR_PORT magic offset and 9400-9409 scan with random port
10000-60000. Atomic state file writes, log paths from config module,
binaryVersion field for auto-restart on update.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: move browse state from /tmp to project-local .gstack/

CLI now uses config module for state paths, passes BROWSE_STATE_FILE to
spawned server. Adds version mismatch auto-restart, legacy /tmp cleanup
with PID verification, and removes stale global install fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update crash log path reference to .gstack/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add config tests and update CLI lifecycle test

14 new tests for config resolution, ensureStateDir, readVersionHash,
resolveServerScript, and version mismatch detection. Remove obsolete
CONDUCTOR_PORT/BROWSE_PORT filtering from commands.test.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update BROWSER.md and TODO.md for project-local state

Replace /tmp paths with .gstack/, remove CONDUCTOR_PORT docs, document
random port selection and per-project isolation. Add server bundling TODO.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update README, CHANGELOG, and CONTRIBUTING for v0.3.2

- README: replace Conductor-aware language with project-local isolation,
  add Greptile setup note
- CHANGELOG: comprehensive v0.3.2 entry with all state management changes
- CONTRIBUTING: add instructions for testing branches in other repos

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add diff-aware mode to /qa — auto-tests affected pages from branch diff

When on a feature branch, /qa now reads git diff main, identifies affected
pages/routes from changed files, and tests them automatically. No URL required.
The most natural flow: write code, /ship, /qa.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: update CHANGELOG for complete v0.3.2 coverage

Add missing entries: diff-aware QA mode, Greptile integration,
local dev mode, crash log path fix, README/SKILL.md updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 18:10:56 -07:00