Commit Graph

4 Commits

Author SHA1 Message Date
Garry Tan 66f3a180d3 v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642)
* fix(gbrain-sync): --full produces an empty code index on first run of a new repo

`gbrain reindex-code` only RE-EMBEDS pages that already exist; it never walks
the filesystem. On a freshly-registered source (0 pages), a --full run that
called reindex-code alone found nothing ("No code pages to reindex"), finished
in ~1s, and left the code index permanently empty while still reporting OK.

Fix: --full now runs `sync --strategy code` FIRST to create pages via the file
walk, then runs `reindex-code` to honor the documented "full walk + reindex"
contract for both fresh and populated sources.

Contributed by @jetsetterfl via #1584.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gbrain-local-status): classifier falsely reports broken-db inside repos with their own DATABASE_URL

The freshClassify probe ran `gbrain sources list --json` with the inherited
process env. When the probe ran from inside a repo with its own .env (an app
DATABASE_URL on a different port), Bun autoloaded the project's .env, gbrain
connected to the wrong database, and the classifier reported broken-db on
otherwise-healthy brains.

Fix: route the probe env through `buildGbrainEnv` from lib/gbrain-exec, the
same helper the sync orchestrator uses. DATABASE_URL is seeded from
~/.gbrain/config.json so the result is cwd-independent. The 60s cache can no
longer propagate a poisoned negative to clean directories.

Contributed by @jetsetterfl via #1583.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(retro): stale-base + bad-today-anchor pre-flight guard (#1624)

/retro silently produced confidently-wrong output when "today" drifted (model
session-context error) or when origin/<default> was materially behind the
actual remote — git log --since returned zero or near-zero commits and the
narrative was fabricated from nothing.

Adds Step 0.5 with four ordered pre-check branches before any window analysis:

  A. No 'origin' remote → skip with "base freshness not verified" note
  B. Detached HEAD → skip with "base freshness not verified" note
  C. `git fetch origin <default>` fails (offline) → warn, proceed against
     last-known origin/<default>
  D. Fetch succeeded → compare today vs latest origin/<default> commit; if
     gap > window-days, BLOCK with explicit citation of latest-commit date.

Skip paths still proceed to Step 1, but the disclosure is carried into the
retro narrative ("offline run, window not freshness-verified") so the output
is never silently confidently-wrong.

Atomic .tmpl + gen:skill-docs regen commit (T-Codex-3 pattern).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(retro): regression for #1624 stale-base pre-flight guard

13 static-invariant tests pinning the four ordered pre-check branches in
retro/SKILL.md.tmpl:Step 0.5:

  A. no-remote skip            — must check origin presence + set verdict
  B. detached-HEAD skip        — must gate behind prior verdict (ordering)
  C. fetch-fail warn           — must match `if !` or `||` shape, gate by verdict
  D. stale-base BLOCK          — must read latest-commit ISO date, cite remediation

Plus a disclosure-survives-to-narrative invariant: skip-path verdicts must be
named in prose so the retro output carries the cited reason rather than
silently misreporting.

Failing build if Step 0.5 is removed, branches re-ordered (no-remote no longer
wins), or the BLOCK message stops citing today/latest-commit/remediation
path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gbrain-sync): configurable timeouts + resume from gbrain checkpoint (#1611)

The memory and code stages hardcoded a 35-min spawn timeout. On brains with
~2000+ staged files, /sync-gbrain --full reliably SIGTERM'd the child at
exactly 35 minutes with exit 143. gbrain left ~/.gbrain/import-checkpoint.json
pointing at the staging dir, but gstack-memory-ingest's SIGTERM handler
unconditionally cleaned the dir up — so the next run found a checkpoint
pointing at nothing and restaged from scratch, repeating the SIGTERM forever.

Three changes:

1. Configurable timeouts via env (bounds 60_000ms - 86_400_000ms, default
   2_100_000ms = 35min unchanged):
     GSTACK_SYNC_MEMORY_TIMEOUT_MS
     GSTACK_SYNC_CODE_TIMEOUT_MS
   Out-of-range or non-numeric values warn and fall back to the default.

2. SIGTERM in gstack-memory-ingest no longer always cleans up the staging
   dir. If gbrain has written ~/.gbrain/import-checkpoint.json pointing at
   the active staging dir, the dir is PRESERVED for next-run resume.
   Otherwise (no checkpoint pointing here, crash before gbrain ever
   touched it) it's cleaned up as before.

3. Next /sync-gbrain run detects gbrain's checkpoint via decideResume() in
   gstack-gbrain-sync.ts:
     - no checkpoint               → fresh ingest pass
     - checkpoint + staging ok     → set GSTACK_INGEST_RESUME_DIR; child
                                      reuses staging dir and skips
                                      writeStaged; gbrain import resumes
                                      from processedIndex+1
     - checkpoint + staging gone   → warn "previous checkpoint stale
                                      (staging dir gone), restaging from
                                      scratch" and proceed

Reuses gbrain's own checkpoint as the source of truth (D1 — no double-store
state). Detect-then-fallback semantics per C1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain-sync): regression for #1611 timeouts + resume

19 tests across three surfaces:

  - resolveStageTimeoutMs (10 tests): undefined/empty → default; non-numeric,
    zero, negative, below-floor, above-ceiling → warn + default; at-floor,
    at-ceiling, valid mid-range → accepted as-is.

  - decideResume (6 tests): no checkpoint, corrupt JSON, checkpoint + staging
    ok, checkpoint + staging missing, checkpoint with no dir, checkpoint with
    empty dir.

  - SIGTERM staging preservation (3 static invariants): memory-ingest signal
    handler must check stagingDirIsCheckpointed BEFORE cleanup; preserve
    branch must come before cleanup branch (ordering); orchestrator must
    pass GSTACK_INGEST_RESUME_DIR to the grandchild on resume.

Also threads process.env.HOME through readGbrainCheckpoint and
stagingDirIsCheckpointed so tests can redirect home. os.homedir() caches
at process start and ignores later mutation, so the env override is the
only reliable test injection point.

Failing build if the timeout bounds are removed, the resume detection
short-circuits incorrectly, or the SIGTERM handler regresses to
unconditional cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): pre-emit verification gate kills Django-shape FP class (#1539)

External user filed 4/8 false positives on a /review run against a Django +
DRF + PostgreSQL repo (Sprint 2.5). Every FP class was the same shape:
"resolvable in <5 minutes by viewing the actual code or running a simple
grep" — fields that don't exist on the model, dict.get()-might-be-None on a
form that returns {}-initialized cleaned_data, standard ORM save behavior
called out as data loss.

Extends the Confidence Calibration resolver (consumed by review, cso,
plan-eng-review, ship) with a Pre-emit verification gate:

  Every finding MUST quote the specific code line that motivates it
  (file:line + verbatim text). If the reviewer cannot produce the quote,
  the finding is unverified — its confidence is forced to 4-5 so the
  existing "Suppress from main report" rule fires automatically. The
  finding still goes to the appendix for calibration audit, but the user
  does not see it in the critical-pass output.

Reuses the existing suppression mechanism — no new code path. The FP
classes the gate kills are enumerated in the resolver text so reviewers
see the named patterns.

Framework-meta nudge included for Django Meta, Rails associations,
SQLAlchemy relationships, TypeORM decorators, Sequelize init, Prisma
generated client — the reviewer must quote the meta-construct that
generates the symbol, not just grep for the literal name. Deeper
framework-aware ORM verification (model introspection, migration-history-
aware checks) is deliberately deferred to a future wave per T-Codex-2.

Atomic .tmpl-equivalent (resolver) edit + gen:skill-docs regen commit
per T-Codex-3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(review): regression for #1539 pre-emit verification gate

12 tests pinning the gate behavior:

  - Resolver emits the gate header + #1539 reference
  - Gate requires quoting file:line + verbatim text
  - Unverified findings forced to confidence 4-5 (auto-suppress via
    existing <7-rule, no new mechanism)
  - Framework-meta nudge names Django, Rails, SQLAlchemy, TypeORM,
    Sequelize, Prisma
  - Deferred design doc reference present (1539-framework-aware-review.md)
  - Four named FP classes from #1539 enumerated:
      * field doesn't exist on model
      * dict.get() might be None
      * save() might lose fields
      * update_fields might miss X
  - All four downstream SKILL.md consumers (review, cso, plan-eng-review,
    ship) carry the gate text after gen:skill-docs
  - Existing confidence 9-10 'Show normally' + 3-4 'Suppress' rows
    unchanged (regression on existing behavior)

Failing build if the gate is removed, the suppression mechanism is
re-invented separately, the framework-meta nudge drops a framework, or
gen:skill-docs stops propagating the gate to consumers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(config): expose explain_level default

* fix(benchmark): parse positional prompt after flags

* fix(artifacts): reject malformed remote paths

* fix(learnings): preserve current entries in cross-project search

* fix(setup): register root gstack slash alias

* fix(memory): probe gitleaks without shell builtin

* fix(gbrain-lib): pin LC_ALL=C in varname validator (macOS locale guard)

In many macOS shells the default locale (e.g. en_US.UTF-8) makes bash
glob brackets like `[A-Z]` match lowercase letters too, so the existing
`case "$name" in [A-Z_][A-Z0-9_]*)` branch lets names like `lower-case`
through validation. The function then trips `printf -v "$varname"` and
`export "$varname"` with `not a valid identifier` errors that surface
mid-prompt, which is exactly what the validator was supposed to prevent.

Pinning `LC_ALL=C` inside the function gives ASCII-only bracket semantics
on both macOS and Linux, matching the documented `[A-Z_][A-Z0-9_]*`
contract. Declared `local` so it doesn't leak to the calling shell —
`gstack-gbrain-lib.sh` is documented as a sourced helper, so a bare
assignment would mutate the caller's locale for the rest of the process
(silently affecting downstream `sort`, `tr`, locale-aware globs in the
same shell, etc.).

The existing regression test
`test/gbrain-lib-verify.test.ts:'rejects invalid var names'`
already covers the macOS repro shape (passes `lower-case` and expects
the validator to reject + emit `invalid var name`). On Linux CI the
test silently passed because `LC_ALL=C` is the typical default; on
macOS dev boxes it fails.

Verified:
- `bun test test/gbrain-lib-verify.test.ts`: 22 pass, 0 fail (on macOS).
- `_gstack_gbrain_validate_varname lower-case; echo $?` → 2.
- `_gstack_gbrain_validate_varname FOO_BAR; echo $?` → 0.
- Caller's LC_ALL preserved across calls (confirmed via sourced bash).

* fix(land-and-deploy): detect merged PR after gh failure

After `gh pr merge` exits non-zero, the PR may already be MERGED server-side
(concurrent merge landed, or local cleanup phase failed AFTER the merge
succeeded). Calling `gh pr merge` a second time then errors with a confusing
"already merged" — and worse, the deploy workflow never runs because we
stopped on the first failure.

Adds a Post-failure PR-state check (§4a-postfail) that runs after ANY
non-zero exit from `gh pr merge`:

  - state == MERGED  → record MERGE_PATH=direct, OFFER (don't force)
                       stale-worktree cleanup on the base branch with
                       uncommitted-work guard, proceed to §4a CI watch
  - state == OPEN    → check autoMergeRequest; if non-null treat as
                       merge-queue wait; if null surface both errors and STOP
  - state == CLOSED  → STOP

Hard invariant: never retry `gh pr merge` after a non-zero exit. Server
state is authoritative.

Re-authored from PR #1620 into land-and-deploy/SKILL.md.tmpl (the source of
truth) instead of the generated SKILL.md, so the next gen:skill-docs run
preserves the change. Original diff by @davidfoy via #1620.

Related: cli/cli#3442, cli/cli#13380.

Contributed by @davidfoy via #1620.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: detect PgBouncer transaction-mode pooler and set GBRAIN_PREPARE=true (#1435)

When gbrain connects through a PgBouncer transaction-mode pooler (port
6543), it auto-disables prepared statements. This breaks `gbrain search`
silently — the /sync-gbrain capability check fails and the GBrain Search
Guidance block never gets written to CLAUDE.md.

Three-layer fix:

1. **lib/gbrain-exec.ts** — `buildGbrainEnv()` now detects port 6543 in
   the effective DATABASE_URL and sets `GBRAIN_PREPARE=true` in the env
   passed to every gbrain spawn. This is the single chokepoint — all
   gstack gbrain invocations inherit the fix. Caller can opt out with
   `GBRAIN_PREPARE=false`.

2. **sync-gbrain/SKILL.md{,.tmpl}** — capability check now exports
   `GBRAIN_PREPARE=true` explicitly and retries search up to 3x with 1s
   delay for async index propagation under connection pooling.

3. **bin/gstack-gbrain-detect** — surfaces `gbrain_pooler_mode` field
   ("transaction" | "session" | null) in the preamble probe JSON so
   /setup-gbrain and /sync-gbrain can advise users about pooler state.

Closes #1435

Built with [ClosedLoop.AI](https://closedloop.ai) | [GitHub](https://github.com/closedloop-ai/claude-plugins)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(supabase-provision): rewrite transaction/6543 -> session/5432 for new projects

- Single-object pooler API responses default to transaction-mode at 6543,
  but the shared pooler tenant on new projects only listens on session/5432
- Add a `pool_mode == transaction && db_port == 6543` rewrite + stderr note
- Escape hatch via `GSTACK_SUPABASE_TRUST_API_PORT=1` for forward-compat
- 5 new tests covering rewrite, no-op shapes, env opt-out, array path

Fixes #1301.

* fix(browse): GSTACK_CHROMIUM_NO_SANDBOX opt-out for Ubuntu/AppArmor (#1562)

Ubuntu/AppArmor configurations often block unprivileged Chromium sandboxing
for headless agent sessions even for normal users — /qa hangs without
--no-sandbox. The kernel policy denies the unprivileged user namespaces
Chromium needs.

Adds GSTACK_CHROMIUM_NO_SANDBOX=1 as an explicit user override that forces
the sandbox off without changing the default for everyone else. Re-authored
from PR #1562 onto v1.42.2.0's shouldEnableChromiumSandbox() helper —
purely additive, preserves the headed-launch sandbox-on-by-default behavior
that v1.42.2.0 shipped to kill the --no-sandbox yellow infobar.

Three new regression tests cover:
  - linux + override=1 → false (the named use case)
  - darwin + override=1 → false (env wins on any platform)
  - override=0 → does NOT trigger (must be exactly "1")

Original diff by @techcenter68 via #1562.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): mirror isCustomChromium() guard in headless launch()

When BROWSE_EXTENSIONS_DIR is set alongside GSTACK_CHROMIUM_PATH pointing
at a baked-extension build (GBrowser / GStack Browser), the headless launch()
path was unconditionally adding --disable-extensions-except / --load-extension.
This causes the same ServiceWorkerState::SetWorkerId DCHECK crash that
launchHeaded() already guards against via isCustomChromium().

Mirror the existing guard: skip --load-extension flags when isCustomChromium()
returns true; always push the off-screen window geometry args.

* fix(browse): daemonize macOS/Linux server via setsid()

`Bun.spawn().unref()` only releases the child from Bun's event loop —
it does NOT call setsid(). The spawned bun server inherits the spawning
shell's process session. When the CLI runs inside a session-managed shell
that exits shortly after the CLI returns (Claude Code's per-command Bash
sandbox, Conductor, OpenClaw, CI step runners), the session leader's exit
sends SIGHUP to every PID in the session — killing the bun server and
its Chromium grandchildren within seconds of a successful `connect`.

Setting `BROWSE_PARENT_PID=0` (already done by the `connect` command and
pair-agent) disables the parent-process watchdog but does NOT save the
server here: SIGHUP from session teardown still reaps it.

Replace the macOS/Linux `Bun.spawn().unref()` with Node's
`child_process.spawn({ detached: true })`, which calls setsid() and
gives the server its own session leader role (PPID=1, STAT=Ss). This
mirrors the Windows path's rationale (PR #191 by @fqueiro) — same root
cause, different OS surface.

Verified on macOS in Conductor: pre-fix the server dies ~10–15s after
connect across separate Bash invocations; post-fix the same PID stays
alive (PPID=1, SESS=0, STAT=Ss) and responds to `status`/`goto`/
`snapshot` across many separate shell calls.

The `proc?.stderr` startup-error branch is removed since both platforms
now spawn with `stdio: 'ignore'`; both fall through to the on-disk
`browse-startup-error.log` written by `server.ts`'s start().catch.

* fix(design): bump image-gen timeout to 240s + pin gpt-image-2

The design binary calls /v1/responses (gpt-4o + image_generation tool,
quality:high, 1536x1024) but aborted the request after a hardcoded 120s.
That class of request consistently takes ~140-160s end-to-end, so every
generate/variants/evolve/iterate call aborted before the image returned.

In /design-shotgun this cascades: Step 3c launches N parallel agents,
each calling `$D generate`, each aborts at 120s and retries, all fail,
the comparison board never opens — the skill appears to hang indefinitely.

Reproduced the exact API call with a longer budget: HTTP 200, valid
image, 143.5s. A real /design-shotgun run after the patch generated 3
variants in parallel at 150.0s / 161.0s / 152.1s, all exit 0 — note the
161s case, which a naive 150s bump would still have failed.

- Bump AbortController timeout 120_000 -> 240_000 in generate.ts,
  variants.ts, evolve.ts, iterate.ts (both call sites)
- Pin the image_generation tool to model "gpt-image-2"

design/test/variants-retry-after.test.ts: 5 pass, 0 fail. The
feedback-roundtrip.test.ts failures are a pre-existing browse-module
breakage (session.clearLoadedHtml undefined), unrelated to this change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: fill coverage gaps for PRs #1606, #1612, #1620

Three cherry-picked PRs in this wave landed without unit-test coverage for
the specific invariant they protect:

  #1606 (@andrey-esipov) — LC_ALL=C pin in _gstack_gbrain_validate_varname
    8 tests by sourcing bin/gstack-gbrain-lib.sh and calling the validator
    directly. Asserts uppercase/digit/underscore accepted, lowercase
    REJECTED (the macOS-locale regression case), mixed-case rejected,
    LC_ALL=C scoping is local (doesn't leak to caller).

  #1612 (@bharat2913) — setsid daemonize via Node child_process.spawn
    4 static-invariant tests on browse/src/cli.ts. The actual setsid
    syscall is hard to assert without a real spawn, so we pin the source
    shape: nodeSpawn imported from child_process; non-Windows branch uses
    nodeSpawn(...) with detached:true and .unref(); comment documents
    setsid/SIGHUP root cause; Bun.spawn() is NOT used on macOS/Linux.

  #1620 (@davidfoy, re-authored into .tmpl per A3) — §4a-postfail
    12 static invariants on land-and-deploy/SKILL.md.tmpl + generated
    SKILL.md. Pins all three state branches (MERGED/OPEN/CLOSED), the
    authoritative state query, the merge-SHA capture, non-destructive
    worktree cleanup with uncommitted-work guard, autoMergeRequest probe
    on OPEN, hard "never retry gh pr merge" rule, and atomic regen
    propagation.

Failing build if any of the three invariants regresses.

Note: gbrain-lib-validate-varname.test.ts also surfaces a pre-existing
glob-pattern overpermissiveness (hyphens + dots accepted) — not in
#1606's scope; documented inline as a separate cleanup target.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(learnings): align injection-prevention tests with PR #1619 tagged-line shape

PR #1619 (preserve current entries in cross-project search) refactored
gstack-learnings-search to tag rows inline (`current\t<json>` vs
`cross\t<json>`) instead of filtering inside the bun block via
process.env.GSTACK_SEARCH_SLUG. The bun block no longer reads SLUG or
CROSS env vars — it parses the per-line tag and sets a per-entry
_crossProject flag.

The pre-existing test/learnings-injection.test.ts still asserted on the
old SLUG + CROSS env var shape. Updates:

  - Remove the SLUG env var assertion (no longer set on bash command line)
  - Remove the bun-block CROSS env var assertion (block reads the tag now,
    not the env)
  - Add a new positive assertion that the bun block parses the tag
    (sourceTag | tabIndex | crossProject)
  - Keep the shell-interpolation safety assertion unchanged — that's
    independent of the SLUG refactor

The CROSS env var is still SET on the bash command line (it controls
whether the cross-project find runs at all), but the bun child no longer
reads it. The existing "env vars set on bash command line" test continues
to pin that.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(fixtures): regenerate ship-SKILL.md golden baselines

ship/SKILL.md consumes the Confidence Calibration resolver via the
preamble pipeline. This wave's #1539 pre-emit verification gate extends
the resolver text, which propagated to ship/SKILL.md via gen:skill-docs.
The golden fixtures in test/fixtures/golden/ matched the pre-#1539 shape
and failed the host-config regression check.

Refreshes claude-ship-SKILL.md, codex-ship-SKILL.md, and factory-ship-SKILL.md
to match the current generated output. Matches the Daegu wave's bisect
commit 23 ("test(fixtures): regenerate ship-SKILL.md golden baselines").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain-detect): include gbrain_pooler_mode in schema regression (PR #1591)

PR #1591 (PgBouncer transaction-mode detection, @mikeangstadt) added
gbrain_pooler_mode to the gstack-gbrain-detect JSON output but did not
update the schema regression check in
test/gstack-gbrain-detect-mcp-mode.test.ts. Adding the key in alphabetical
order matching the rest of the schema array. Downstream sync-gbrain ignores
unknown keys, so this is forward-compat.

Without this, the test fails with a diff:
  + "gbrain_pooler_mode"
because keys is the actual set returned and the expected array was
pre-#1591.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.43.0.0 — post-Daegu paper-cut wave

Bumps VERSION 1.42.2.0 → 1.43.0.0 (MINOR per scale-aware bump rules: new
env-var surface GSTACK_SYNC_*_TIMEOUT_MS + GSTACK_CHROMIUM_NO_SANDBOX,
behavior expansion in browse/src/browser-manager.ts headless launch,
three skill-template prompt changes affecting /retro, /review,
/sync-gbrain).

CHANGELOG entry leads with what stopped happening: /retro stops
fabricating retros against stale bases, /sync-gbrain stops SIGTERM-looping
35-min restarts on big brains, /review stops shipping framework FPs the
reviewer never grep'd.

18 fixes total — 15 community PRs + 3 self-filed silent-failure issues
(#1624, #1611, #1539) — in one bundled PR with 26 bisect commits and 7
new regression test files. Every wave-touched test file passes in
isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump v1.43.0.0 → v1.43.2.0 for queue collision

CI check-version-stale flagged v1.43.0.0 already claimed by PR #1574
(garrytan/colombo-v3). PR #1639 (garrytan/muscat-v3) claims v1.43.1.0.
Next available MINOR slot is v1.43.2.0.

Bump VERSION + package.json + CHANGELOG entry header. No behavior
changes — purely re-versioning to clear the queue collision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com>
Co-authored-by: Andrey Esipov <andrey.esipov@outlook.com>
Co-authored-by: David Foy <davidfoy@users.noreply.github.com>
Co-authored-by: mikeangstadt <mike.angstadt@closedloop.ai>
Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
Co-authored-by: techcenter68 <techcenter68@users.noreply.github.com>
Co-authored-by: shohu <shohu33@gmail.com>
Co-authored-by: Bharat <bharat@theysaid.io>
Co-authored-by: Matteo Hertel <info@matteohertel.com>
2026-05-21 21:21:07 -07:00
Garry Tan 7ca04d8ef0 v1.42.0.0 Daegu wave: 23 community-filed bugs + PTY classifier enforcement (24 bisect commits) (#1594)
* fix(gstack-paths): guard CLAUDE_PLUGIN_DATA against cross-plugin contamination (#1569)

gstack-paths previously trusted CLAUDE_PLUGIN_DATA as a fallback for
GSTACK_STATE_ROOT whenever GSTACK_HOME was unset. When another plugin
(e.g. Codex) persists its own CLAUDE_PLUGIN_DATA into the session env
via CLAUDE_ENV_FILE, gstack picked it up and wrote checkpoints,
analytics, and learnings into that plugin's directory. Anyone with the
Codex plugin installed alongside gstack hit this silently.

Fix: guard the CLAUDE_PLUGIN_DATA branch so it only fires when
CLAUDE_PLUGIN_ROOT confirms we're running as the gstack plugin (path
contains "gstack"). Skill installs fall through to \$HOME/.gstack.

Contributed by @ElliotDrel via #1570. Closes #1569.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(gbrain-sync): sourceLocalPath handles wrapped {sources:[...]} shape from gbrain v0.20+

gbrain v0.20+ changed `gbrain sources list --json` to return
{sources: [...]} instead of a flat array. sourceLocalPath crashed
upstream with `list.find is not a function` on every /sync-gbrain
invocation against modern gbrain. Accept both shapes for
forward/backward compat, matching probeSource/sourcePageCount in
lib/gbrain-sources.ts.

Contributed by @jakehann11 via #1571. Closes #1567. Supersedes #1564
(@tonyjzhou, same fix, different shape — credit retained).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(brain-context-load): probe gbrain via execFile, not shell builtin (#1559)

gbrainAvailable() used `execFileSync("command", ["-v", "gbrain"])`,
which fails in any environment where the `command` builtin isn't on
the spawned process's PATH (most non-interactive shells). The probe
then reported gbrain as missing even when it was installed, and
context-load silently skipped vector/list queries.

Fix: probe `gbrain --version` directly with a 500ms timeout (matching
the rest of the file's MCP_TIMEOUT_MS). Same semantics, works
everywhere execFile works.

Contributed by @jbetala7 via #1560. Closes #1559.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(gbrain-doctor): pin schema_version:2 doctor parse path (#1418)

Adds an exec-path regression test that runs a fake gbrain shim emitting
the v0.25+ doctor JSON shape (schema_version: 2, status: "warnings",
exit 1 for health_score < 100, no top-level `engine` field). Confirms
freshDetectEngineTier recovers stdout from the non-zero exit and falls
back to GBRAIN_HOME/config.json for the engine label.

The pre-existing test for #1415 only stripped gbrain from PATH; this
test exercises the actual doctor parse path, closing the gap that
codex's plan review flagged.

Also documents the schema_version separation in
lib/gbrain-local-status.ts: the local CacheEntry stays at version 1,
distinct from the doctor-output schema_version which we accept across
versions in gstack-memory-helpers.

Closes #1418 (credit @mvanhorn for surfacing the doctor + schema_v2
collapse). The fix landed pre-emptively in v1.29.x; this commit pins
it with a stronger test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(memory-ingest): pin put_page regression + scrub stale name from --help and comments (#1346)

#1346 reported that gstack-memory-ingest still called the renamed
gbrain put_page subcommand on gbrain v0.18+. The actual code migrated
to `gbrain put` and later to batch `gbrain import <dir>` before this
report landed — only documentation lag remained.

This commit:
- Updates the --help string ("Skip gbrain put calls (still updates
  state file)") so user-facing docs match the shipped subcommand
- Updates two inline comments that still referenced the old name
- Adds test/memory-ingest-no-put_page.test.ts: a regression pin that
  strips comments from bin/gstack-memory-ingest.ts and fails the build
  if "put_page" appears in any active code or string literal, plus a
  sanity check that the file still calls a supported gbrain page-write
  verb (put or import)

Closes #1346. Reporter @kylma-code surfaced the doc lag; the original
code migration credit is on the v1.27.x wave.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(resolvers): rewrite all gbrain put_page instructions to canonical put <slug>

scripts/resolvers/gbrain.ts emitted user-facing copy-paste instructions
using the renamed `gbrain put_page` subcommand across 10 skills
(office-hours, investigate, plan-ceo-review, retro, plan-eng-review,
ship, cso, design-consultation, fallback, entity-stub). Every gstack
user copying those snippets hit "unknown command: put_page" on gbrain
v0.18+.

This commit:
- Rewrites all 10 instruction templates to use `gbrain put <slug>
  --content "$(cat <<EOF...EOF)"` with title/tags moved into YAML
  frontmatter inside --content, matching the v0.18+ subcommand shape
- Updates README.md and USING_GBRAIN_WITH_GSTACK.md "common commands"
  table to reference `gbrain put` and `gbrain get`
- Adds test/resolvers-gbrain-put-rewrite.test.ts pinning two
  invariants: (a) resolver source ships only canonical instructions,
  (b) every tracked SKILL.md file is free of `gbrain put_page`

CHANGELOG entries are deliberately left untouched (historical record).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(build): extract package.json build to scripts/build.sh for Windows Bun compat (#1538, #1537, #1530, #1457, #1561)

Bun's Windows shell parser rejects multiple constructs the inline
package.json build chain used: brace groups `{ cmd; }`, subshells with
redirection `( git ... ) > path/.version`, and (in Bun 1.3.x) subshells
near redirections in general. Every Windows install + every
auto-upgrade since v1.34.2.0 has failed on `bun run build`.

Extracts the build chain to scripts/build.sh and the .version writes to
scripts/write-version-files.sh. POSIX-portable, no Bun shell parsing
involved. Also adds Windows-specific bun.exe handling for non-ASCII
PATHs (a separate Windows footgun where Bun's --compile fails when the
binary lives under a path with non-ASCII chars).

Updates test/build-script-shell-compat.test.ts to assert the new shape:
no subshells with redirections anywhere in the build chain, and build
delegates to scripts/build.sh which delegates .version writes.

Contributed by @Charlie-El via #1544. Supersedes #1531 (@scarson, fixed
in build helper), #1480 (@mikepsinn, partial overlap), #1460
(@realcarsonterry, brace-group fix subsumed) — credit retained.
Closes #1538, #1537, #1530, #1457, #1561.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows): .exe glob in .gitignore + .exe extension resolution in find-browse (#1554)

bun build --compile on Windows appends .exe to the output filename,
producing browse.exe instead of browse. find-browse's existsSync probe
only checked the bare path and returned null on Windows even when the
binary was correctly built. .gitignore similarly only excluded the
bare bin/gstack-global-discover path, leaving the .exe variant
tracked.

This commit:
- .gitignore: changes `bin/gstack-global-discover` →
  `bin/gstack-global-discover*` so the Windows .exe variant is ignored
- browse/src/find-browse.ts: adds isExecutable + findExecutable helpers
  that fall back to .exe/.cmd/.bat probing on Windows, mirroring the
  same helper already in make-pdf/src/browseClient.ts and pdftotext.ts

Contributed by @Mike-E-Log via #1554.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(windows): add fresh-install E2E gate that runs bun run build on windows-latest

Adds .github/workflows/windows-setup-e2e.yml as the gate that catches
Bun shell-parser regressions in the build chain before they reach
users. Triggers on PRs touching package.json, scripts/build.sh,
scripts/write-version-files.sh, setup, browse cli/find-browse, or
gstack-paths.

What it verifies:
1. bun run build completes on Windows (the previously-broken path that
   #1538/#1537/#1530/#1457/#1561 reported)
2. All compiled binaries land on disk (browse.exe, find-browse.exe,
   design.exe, gstack-global-discover.exe)
3. find-browse resolves to the .exe variant on Windows (regression
   gate for #1554)
4. gstack-paths returns non-empty GSTACK_STATE_ROOT/PLAN_ROOT/TMP_ROOT
   on Windows (regression gate for #1570)

Complements the existing windows-free-tests.yml (curated unit subset);
this new workflow exercises the install path itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codex): move diff scope into prompt instead of --base (Codex CLI 0.130+ argv conflict) (#1209)

Codex CLI ≥ 0.130.0 rejects passing a custom prompt and --base together
(mutually exclusive at argv level). Every /codex review, /review, and
/ship structured Codex review call ended with an argv error before the
model ran.

Fix: scope the diff in prompt text using
"Run git diff origin/<base>...HEAD 2>/dev/null || git diff <base>...HEAD"
instead of `--base <base>`. Preserves the filesystem boundary
instruction across all invocations and keeps Codex's review prompt
tuning.

Touches:
- codex/SKILL.md.tmpl + regenerated codex/SKILL.md
- scripts/resolvers/review.ts + regenerated review/SKILL.md, ship/SKILL.md
- test/gen-skill-docs.test.ts: new regression that fails if any of the
  five known files still contain the prompt+--base shape
- test/skill-validation.test.ts: corresponding negative + positive pin
  on the rendered SKILL.md files

Contributed by @jbetala7 via #1209. Closes #1479. Supersedes #1527
(@mvanhorn — same intent, different patch shape, CONFLICTING) and
#1449 (@Gujiassh — broader refactor, CONFLICTING). Credit retained
in CHANGELOG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(review): diff from git merge-base, not git diff origin/<base> (#1492)

git diff origin/<base> shows everything since the common ancestor in
both directions — it includes commits that landed on origin/<base>
after this branch was created as deletions. That made /review and
/ship's pre-landing structured review report inflated diff totals and
flagged "removed" code that was actually still present in the working
tree.

Fix: compute DIFF_BASE via git merge-base origin/<base> HEAD and diff
the working tree against that point. Same coverage of uncommitted
edits, no phantom deletions from out-of-order base advancement.

Applies to /review's Step 1 (diff existence check), Step 3 (get the
diff), the build-on-intent scope-creep check, the structured review
DIFF_INS/DIFF_DEL stats, and the Claude adversarial subagent prompt.
Same change flows into ship/SKILL.md via the shared resolver.

Touches:
- review/SKILL.md.tmpl + regenerated review/SKILL.md, ship/SKILL.md
- scripts/resolvers/review.ts
- scripts/resolvers/review-army.ts

Contributed by @mvanhorn via #1492.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(codex): pin filesystem-boundary preservation across all codex review surfaces (#1503, #1522)

#1503 reported that the bare codex review --base path stripped the
filesystem boundary instruction, letting Codex spend tokens reading
.claude/skills/ and agents/. #1522 proposed adding a skill-path
detector that switched to the custom-instructions route when the diff
touched skill files.

After C10 (#1209) restructured codex review to always carry the
boundary in the prompt (the prompt+--base argv conflict forced the
restructure), the skill-path detector becomes redundant — every
default call already preserves the boundary.

This commit pins the post-#1209 invariant with a test that fails the
build if any future refactor strips the boundary from codex/SKILL.md,
review/SKILL.md, or ship/SKILL.md. Closes #1503 by regression test.

#1522 (@genisis0x) is superseded by #1209 (the prompt rewrite covers
its safety concern); credit retained in CHANGELOG.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(skills): use command -v instead of which for codex detection (#1197)

`which` is not on PATH in every shell — some Windows shells, BusyBox-
only containers, and minimal CI images all fail when skills probe
codex availability via `which codex`. `command -v` is a POSIX builtin
and always available where the skill is running.

Touched:
- codex/SKILL.md.tmpl: CODEX_BIN=$(command -v codex || echo "")
- scripts/resolvers/review.ts and scripts/resolvers/design.ts:
  3 + 3 sites each rewritten to `command -v codex >/dev/null 2>&1`
- Regenerated all 10 affected SKILL.md files (codex, review, ship,
  design-consultation, design-review, office-hours, plan-ceo-review,
  plan-design-review, plan-devex-review, plan-eng-review)
- test/skill-validation.test.ts: updated pin + defensive regression
  test that fails if `which codex` returns to codex/SKILL.md
- test/skill-e2e-plan.test.ts: updated summary regex

Contributed by @mvanhorn via #1197.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(codex): surface non-zero exits so wrappers stop reading as silent stalls (#1467, #1327)

When codex exits non-zero (parse errors, arg-shape breaks, model API
errors that propagate as non-zero status), the calling agent
previously saw an empty output and burned 30-60 minutes misdiagnosing
as a silent model/API stall. The hang-detection block only caught
exit 124 (the timeout-wrapper signal).

Adds elif blocks in all four codex invocation sites (Review default,
Challenge, Consult new-session, Consult resume) that:
- Echo "[codex exit N] <stderr first line>" to stdout
- Indent the first 20 stderr lines for inline context
- Log codex_nonzero_exit telemetry tagged with the call site

Contributed by @genisis0x via #1467. Closes #1327.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(design): disclose OpenAI key source + warn on cwd .env match (#1278, closes #1248)

The design binary previously called process.env.OPENAI_API_KEY without
checking where the key came from. If a user ran $D inside someone
else's project that had OPENAI_API_KEY in its .env, the resulting
generation billed that project's account. Silent and irreversible.

Fix: resolveApiKeyInfo() returns both the key and its source. When the
env-var path matches an OPENAI_API_KEY entry in the current
directory's .env, .env.<NODE_ENV>, or .env.local file, we set a
warning. requireApiKey() prints "Using OpenAI key from <source>" plus
the warning before the run — never the key itself.

Adds 6 unit tests covering: config-vs-env precedence, env-only (no
match), env+cwd .env match, quoted/exported values, value-mismatch
(no false positive), and the no-leak invariant for requireApiKey
stderr output.

Contributed by @jbetala7 via #1278. Closes #1248.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(browse): guard full-page screenshots against Anthropic vision API >2000px brick (#1214)

Full-page screenshots of tall pages routinely exceeded 2000px on the
longest dimension, silently bricking the agent's session: the
resulting base64 reached the Anthropic vision API which rejected the
oversized image, leaving the agent burning turns on a useless blob
with no stderr trace from the browse side.

Adds browse/src/screenshot-size-guard.ts as a shared helper:
- guardScreenshotBuffer(buf) → downscales in-memory if max(w,h) > 2000
- guardScreenshotPath(path) → file-mode variant that rewrites in place
- Aspect ratio preserved via sharp's resize fit:inside
- Stderr diagnostic on any downscale so callers can see when it fired
- Lazy sharp import so non-screenshot paths pay no startup cost

Wires the guard into all three full-page callsites codex review
flagged:
- browse/src/snapshot.ts: annotated + heatmap fullPage captures
- browse/src/meta-commands.ts: screenshot command (path + base64
  fullPage modes) plus the responsive 3-viewport sweep
- browse/src/write-commands.ts: prettyscreenshot fullPage path

Covers seven unit cases (pass-through, downscale, aspect ratio,
exactly-2000px edge, file-mode rewrite) plus a static invariant test
that fails the build if any of the three callsites stops importing the
guard.

Closes #1214.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): add Node sidecar entry for L4 prompt-injection classifier (#1370)

The L4 TestSavant classifier in browse/src/security-classifier.ts
can't be imported into the compiled browse server (onnxruntime-node
dlopen fails from Bun's compile extract dir per CLAUDE.md). The agent
that used to host it (sidebar-agent.ts) was removed when the PTY
proved out — leaving the classifier file shipped but with zero
callers. Exactly the gap codex flagged in #1370.

Adds browse/src/security-sidecar-entry.ts: a Node script that runs the
classifier as a subprocess of the browse server. It reads NDJSON
requests from stdin and writes id-correlated NDJSON responses to
stdout, supporting:
  - op: "scan-page-content" — full L4 classifier scan
  - op: "ping" — liveness probe for the client's health check
  - op: "status" — classifier readiness (used by /pty-inject-scan to
    surface l4 { available: bool } in its response)

Plus browse/src/find-security-sidecar.ts: a resolver that locates
node + the bundled JS entry (browse/dist/security-sidecar.js, built in
a follow-up package.json change) or falls back to the dev TS entry.
Returns null cleanly when node isn't on PATH so the calling endpoint
can degrade per D7 (extension WARN + user confirm).

C17 of the security-stack wave. C18 adds the IPC client + lifecycle
management; C19 wires the endpoint; C20 routes the extension through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): sidecar IPC client with lifecycle + circuit breaker (#1370)

Adds browse/src/security-sidecar-client.ts to manage the Node L4
classifier subprocess from the compiled browse server:

- Lazy spawn on first scan; reuses the same process across requests
- Id-correlated request/response via NDJSON over stdio
- 5s default per-scan timeout; 64KB payload cap (short-circuits before
  spawn so oversized requests don't waste a process)
- 3-in-10-minutes respawn cap → trips circuit breaker; subsequent
  scans throw immediately so the /pty-inject-scan endpoint can surface
  l4 { available: false } to the extension and degrade to WARN+confirm
- process.on('exit') sends SIGTERM to the child for clean teardown
- isSidecarAvailable() lets the endpoint probe before scan calls so
  the response shape reflects degraded mode honestly

Unit tests cover the payload cap, the availability probe, and the
breaker-doesn't-crash invariant under repeated rejected calls.

C18 of the security-stack wave. C19 adds POST /pty-inject-scan; C20
routes the extension through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(security): add POST /pty-inject-scan endpoint for pre-PTY-inject scans (#1370)

The sidebar's gstackInjectToTerminal callers (toolbar Cleanup,
Inspector "Send to Code") were piping page-derived text directly into
the live claude PTY with ZERO classifier processing — the gap codex
flagged in #1370. The documented sidebar security stack had a hole
the size of every Cleanup-button click.

Adds POST /pty-inject-scan to browse/src/server.ts:
- Local-only binding (NOT in TUNNEL_PATHS — tunnel attempts get the
  general 404 path; never reaches the scan logic)
- Root-token auth via existing validateAuth() — 401 on unauth
- 64KB request cap → 413 + payload-too-large body
- 5s scan timeout via sidecar client
- URL-blocklist forced to BLOCK in PTY context (page-derived REPL
  input is higher-risk than ordinary tool output)
- L4 ML classifier via the sidecar when available; degrades to WARN
  per D7 when sidecar is unavailable
- Response goes through JSON.stringify(..., sanitizeReplacer) per
  v1.38.0.0 Unicode-egress hardening
- Imports only from security-sidecar-client.ts, never directly from
  security-classifier.ts (which would brick the compiled Bun binary)

Seven static-invariant tests pin the POST verb, auth gate, 64KB cap,
tunnel-listener exclusion, sanitizeReplacer wrapping, l4 availability
shape, and the no-direct-classifier-import rule.

C19 of the security-stack wave. C20 routes the extension through it;
C21 adds the invariant AST check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(extension): route gstackInjectToTerminal through /pty-inject-scan (#1370)

Closes the documented-vs-shipped gap codex flagged in #1370. The
sidebar's two PTY-injection call sites (Inspector "Send to Code" and
toolbar Cleanup) now pre-scan via the new /pty-inject-scan endpoint
before writing to the live claude REPL.

Adds window.gstackScanForPTYInject(text, origin) to
extension/sidepanel-terminal.js:
- Async, returns { allow, verdict, reasons, l4 }
- POST to /pty-inject-scan with the existing root-token auth
- WARN+confirm on scan failure (network down, sidecar absent, etc.)
  rather than silent PASS — D7 honest-degradation

gstackInjectToTerminal stays synchronous, returns boolean. Per D6:
keeping the inject sync means existing `const ok = ...?.()` callers
don't break, and the invariant test in
test/extension-pty-inject-invariant.test.ts can statically pin that
every call goes through the scan first.

extension/sidepanel.js call sites updated:
- inspectorSendBtn click → await scan, BLOCK drops + WARN prompts via
  window.confirm, PASS injects silently
- runCleanup() → same flow. Static cleanup prompt always PASSes but
  still routes through scan to honor the invariant.

C20 of the security-stack wave. C21 adds the static invariant test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(security): invariant — extension PTY inject must be scan-gated (#1370)

Static-analysis invariant test that fails the build if any
extension/*.js path calls window.gstackInjectToTerminal without a
preceding window.gstackScanForPTYInject in the same enclosing
function. Closes the documented-vs-shipped gap codex demanded a
machine check on.

Rules:
- Rule 1: any file that calls inject must also reference scan
- Rule 2: in the enclosing function (function declaration, arrow,
  async (), event handler), a scan call must appear before the inject
  call by source position
- Exemption: sidepanel-terminal.js (the file that DEFINES the inject
  function) is exempt from Rule 2 since the definition is not a call

Plus two structural checks:
- sidepanel-terminal.js defines both the inject and scan functions
- inject stays SYNCHRONOUS (no `async` modifier) per D6 — async would
  silently break the `const ok = ...?.()` pattern at every caller

C21 of the security-stack wave. The sidecar architecture (#1370) is
complete: server-side L1-L3 + L4-via-sidecar (C17+C18+C19), extension
pre-scan wiring (C20), and now the regression gate (C21).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(browse): opt-in extended stealth mode with 6 detection-vector patches (#1112)

Rebases @garrytan's PR #1112 (Apr 2026, abandoned) onto the current
browse/src/stealth.ts contract. The existing minimal "codex narrowed"
stealth (webdriver-mask + AutomationControlled launch arg) stays the
default. PR #1112's six additional patches are added behind an opt-in
GSTACK_STEALTH=extended env flag.

Extended-mode patches (applied AFTER the default mask, in order):
  1. delete navigator.webdriver from prototype (not just the getter —
     detectors check `"webdriver" in navigator`)
  2. WebGL renderer spoof to Apple M1 Pro (SwiftShader was the #1
     software-GPU tell in containers)
  3. navigator.plugins returns a PluginArray-prototype-passing array
     with MimeType objects and namedItem()
  4. window.chrome populated with chrome.app, chrome.runtime,
     chrome.loadTimes(), chrome.csi() with realistic shapes
  5. navigator.mediaDevices backfilled when headless drops it
  6. CDP cdc_*-prefixed window globals cleared

Why opt-in: the default mode's contract is fingerprint CONSISTENCY,
which protects against detectors that flag spoofing mismatch. Extended
mode actively lies about the environment; sites that reflect on these
properties can break. Users who hit detection in default mode can flip
GSTACK_STEALTH=extended for SannySoft 100% pass-rate.

Twenty unit tests pin the env-flag semantics, all six patches' code
presence, and the applyStealth wiring order. Live SannySoft pass-rate
verification stays in the periodic-tier E2E suite.

Contributed by @garrytan via #1112 (rebased — original PR opened
before the codex-narrowed minimum landed; rebase preserves the
narrowed default while adding the SannySoft-passing path as opt-in).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(fixtures): regenerate ship-SKILL.md golden baselines after C10-C13 + C16 templates

Updates the three ship-SKILL.md golden baselines (claude, codex,
factory hosts) to match the new shape produced by:
- C10 #1209 codex argv (prompt + diff scope, no --base)
- C11 #1492 merge-base diff (DIFF_BASE= preamble)
- C13 #1197 command -v for codex detection
- C12 + boundary preservation per regen-enforcing test

Per CLAUDE.md SKILL.md workflow: edit the .tmpl, run gen:skill-docs,
commit the regenerated outputs together. Goldens are part of the
regen contract — without this commit, test/host-config.test.ts'
golden-baseline checks fail with the diff codex review surfaced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.41.0.0 — Daegu wave (24 bisect commits, 14 user-facing fixes)

Bumps VERSION 1.40.0.0 → 1.41.0.0. CHANGELOG entry follows the
release-summary format in CLAUDE.md: two-line headline, lead
paragraph, "The numbers that matter" table, "What this means for
builders" closer, then itemized Added/Changed/Fixed/For contributors
with inline credit to every PR author and original issue reporter.

Scale-aware bump per CLAUDE.md: 24 commits, ~6000 LOC net,
substantial new capability across security (PTY sidecar wiring),
install (Windows build chain), compat (gbrain 0.18-0.35, Codex CLI
0.130+), and quality (screenshot guard, design key disclosure,
extended stealth opt-in). MINOR is the right call.

Closes for users: #1567, #1559, #1569, #1346, #1418, #1538, #1537,
#1530, #1457, #1561, #1554, #1479, #1503, #1248, #1214, #1370, #1327,
#1193 pattern, #1152 pattern. Credit retained inline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(find-browse): resolve source-checkout layout <git-root>/browse/dist/browse[.exe]

windows-setup-e2e.yml runs `bun browse/src/find-browse.ts` against a
freshly-built repo where binaries land at browse/dist/browse.exe (no
.claude/skills/gstack/ install layout). The previous markers chain
only matched .codex/.agents/.claude prefixed paths, so find-browse
exited "not found" even when the binary was present.

Adds a source-checkout fallback after the marker scan: if no
installed layout resolves but <git-root>/browse/dist/browse[.exe]
exists, return that. Three real callers hit this path:
- gstack repo dev workflow before `./setup` runs
- windows-setup-e2e.yml CI (the breakage that surfaced this)
- make-pdf consumers running from a sibling source checkout

Smoke-verified: a fresh git repo with browse/dist/browse on disk now
resolves through the source-checkout branch (was returning null
before this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): bump v1.41.0.0 → v1.42.0.0 to clear queue collision with #1574

The version-gate workflow flagged a collision: PR #1574
(garrytan/colombo-v3) already claims v1.41.0.0, and #1592
(fix/audit-critical-high-bugs) claims v1.41.1.0. Per CLAUDE.md's
workspace-aware ship rule, queue-advancing past a claimed version
within the same bump level is permitted — MINOR work landing on top
of a queued MINOR still reads as MINOR relative to main.

Util's suggested next slot is v1.42.0.0; taking it. CHANGELOG entry
header bumped + dated 2026-05-19; entry body unchanged (same wave
content, same credit list).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-20 07:35:01 -07:00
Garry Tan b9371d716e v1.34.2.0 fix wave: /codex review on CLI 0.130+, /investigate learnings, /sync-gbrain on Supabase (3 community-reported bugs) (#1478)
* fix(learnings): accept type:"investigation" in gstack-learnings-log

The /investigate skill instructed agents to log learnings with type:"investigation",
but bin/gstack-learnings-log:22 rejected anything not in
[pattern, pitfall, preference, architecture, tool, operational]. Every
investigation run exited 1 to stderr and the learning was dropped, silently
to the user.

Fix: add 'investigation' to ALLOWED_TYPES.

Regression test: round-trips a learning with type:"investigation" and asserts
exit 0 + file write; second test reads investigate/SKILL.md.tmpl and asserts
it emits the literal type:"investigation" string, guarding the
template/validator contract at both ends.

Fixes #1423. Reported by diogolealassis.

* fix(gbrain): engine detection survives gbrain ≥0.25 schema + non-zero doctor exit

freshDetectEngineTier() in lib/gstack-memory-helpers.ts returned engine:
"unknown" for every Supabase user on gbrain ≥0.25. Two stacking bugs:

1. execSync("gbrain doctor --json --fast 2>/dev/null") threw on non-zero
   exit. gbrain doctor exits 1 whenever health_score < 100, which is
   essentially every fresh install due to resolver_health warnings. The
   JSON output never reached the parser.
2. gbrain ≥0.25 shipped schema_version:2 doctor output that dropped the
   top-level 'engine' field entirely.

Result: every /sync-gbrain on Supabase logged 'engine=unknown' and skipped
all sync stages silently.

Fix:
- Replace execSync with execFileSync (no shell, no bash-specific 2>/dev/null
  redirect; portable to Windows).
- Recover stdout from the thrown error object so non-zero exits still parse.
- Fall back to reading gbrain's config.json (respecting GBRAIN_HOME env var,
  defaulting to ~/.gbrain/config.json) when doctor output doesn't surface
  an engine field.
- Add logGbrainError() helper that appends one-line JSONL to
  ~/.gstack/.gbrain-errors.jsonl on parse failure, so future regressions
  leave a forensic trail.

The "supabase" tier here means "remote postgres" in practice — gbrain
config uses engine:"postgres" for both real Supabase and any other
remote postgres (e.g. local-postgres-for-testing). Downstream sync code
treats them identically, so the label compression is intentional and
documented inline.

Regression test: existing detectEngineTier suite now isolates HOME +
GBRAIN_HOME + PATH to temp dirs (closes a flake source where the prior
tests would read whatever was on the reviewer's machine). New test
forces gbrain off PATH, writes a synthetic config.json with
engine:"postgres", asserts detectEngineTier() returns
engine:"supabase".

Fixes #1415. Patch shape contributed by Shiv @shivasymbl (tested on
gstack v1.31.0.0 + gbrain v0.31.3 + Supabase).

* fix(codex): /codex review works on Codex CLI ≥0.130.0

Codex CLI 0.130.0 made [PROMPT] and --base <BRANCH> mutually exclusive at
argv level. Step 2A of codex/SKILL.md.tmpl had always passed both (the
filesystem boundary prefix as the prompt argument + the base branch), so
every /codex review call died with:

  error: the argument '[PROMPT]' cannot be used with '--base <BRANCH>'

Fix: split Step 2A into two paths.

Default (no custom user instructions): bare 'codex review --base <base>'.
Codex's review prompt is internally diff-scoped, so the model focuses on
the changes against base. The filesystem boundary prefix is dropped here
because Codex 0.130 has no documented system-prompt config key
(probed -c 'system_prompt="..."' against 0.130 — the flag is silently
accepted but the value isn't applied). Skill files under .claude/ and
agents/ are public, so this is a token-efficiency concern, not a safety
one.

Custom instructions (/codex review <focus>): route through codex exec
with the diff written to a tempfile, inlined into the prompt between
explicit DIFF_START / DIFF_END markers. The boundary is preserved here
because codex exec isn't auto-scoped to the diff. The DIFF_START/END
delimiters tell the model where data ends and instructions resume, which
materially reduces prompt-injection hijack rates when the diff contains
adversarial content.

Note on bash semantics: codex's earlier review flagged the exec route as
"command injection via $_DIFF interpolation." That framing is wrong —
bash parameter expansion does not re-evaluate $(...) or backticks inside
the expanded value, so a diff containing $(rm -rf /) is plain string
data to codex exec. The real risk is prompt injection (model-side, not
shell-side), which the DIFF_START/END pattern mitigates.

Regression tests in test/codex-hardening.test.ts assert across BOTH
codex/SKILL.md.tmpl AND the generated codex/SKILL.md:
1. No 'codex review' invocation line combines a quoted-string OR variable
   positional argument with --base.
2. Step 2A still contains either bare 'codex review --base' OR 'codex
   exec' (guards against accidental deletion of both fix paths).

Fixes #1428. Reported by Stashub.

* test: raise timeouts for slow integration tests

Two test files were timing out at the default 5s on developer machines,
both pre-existing on origin/main but unrelated to this branch's bug fixes:

- test/gstack-artifacts-init.test.ts: 13 tests spawning real subprocesses
  via fake gh/glab/git shims in PATH. bun's fork+exec overhead pushed
  these past 5s consistently. Added a local test-wrapper that aliases
  test() with a 30s timeout (matches the brain-sync.test.ts pattern
  already in the repo).
- test/gstack-next-version.test.ts: one integration smoke test that
  spawns 'bun run ./bin/gstack-next-version' and parses the resulting
  JSON. The subprocess does a 'gh pr list' against the live GitHub API
  to enumerate claimed version slots. Network latency makes 5s tight;
  raised this single test to 30s.

No production code changed. The tests already passed deterministically
once given enough wall-clock time.

* chore: bump version and changelog (v1.34.2.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 11:11:52 -04:00
Garry Tan bf65487162 v1.26.0.0 feat: V1 transcript ingest + per-skill gbrain manifests + retrieval surface (#1298)
* feat: lib/gstack-memory-helpers shared module for V1 memory ingest pipeline

Lane 0 foundation per plan §"Eng review additions". 5 public functions
imported by the V1 helpers (Lanes A/B/C):

  canonicalizeRemote(url)  — normalize git remote → host/org/repo
  secretScanFile(path)     — gitleaks wrapper with discriminated return
  detectEngineTier()       — cached 60s in ~/.gstack/.gbrain-engine-cache.json
  parseSkillManifest(path) — extract gbrain.context_queries: from frontmatter
  withErrorContext(op,fn,caller) — async-aware error logging

22 unit tests, all passing. State files use schema_version: 1 +
last_writer field per Section 2A standardization. Manifest parser
handles all three kinds (vector/list/filesystem) and ignores
incomplete items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: bin/gstack-memory-ingest — V1 unified memory ingest helper

Lane A. Walks coding-agent transcripts (Claude Code + Codex; Cursor V1.0.1
follow-up) AND ~/.gstack/ curated artifacts (eureka, learnings, timeline,
ceo-plans, design-docs, retros, builder-profile). Calls gbrain put_page
with type-tagged frontmatter. Uses gstack-memory-helpers (Lane 0):

  - Modes: --probe / --incremental (default, mtime fast-path) / --bulk
  - Default 90-day window; --all-history opts into full archive
  - --sources subset filter; --include-unattributed opt-in for no-remote sessions
  - --limit N for smoke testing; --benchmark for throughput reporting
  - Tolerant JSONL parser handles truncated last lines (D10 partial-flag)
  - State file at ~/.gstack/.transcript-ingest-state.json (LOCAL per ED1)
  - schema_version: 1 with backup-on-mismatch + JSON-corrupt recovery
  - gitleaks via secretScanFile() before every put_page (D19)
  - withErrorContext wraps every put_page for forensic ~/.gstack/.gbrain-errors.jsonl

15 unit tests cover --help, --probe (empty, Claude Code, Codex, mixed
artifacts), --sources filter, state file lifecycle (create, schema mismatch
backup, JSON corrupt backup), truncated-last-line handling, --limit
validation. All passing.

V1.5 P0 follow-ups noted in the file header:
  - Cursor SQLite extraction (V1.0.1)
  - gbrain put_file routing for Supabase Storage tier (cross-repo)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: bin/gstack-gbrain-sync — V1 unified sync verb (Lane B)

Orchestrates three storage tiers per plan §"Storage tiering":
  1. Code (current repo)         → gbrain import (Supabase or local PGLite)
  2. Transcripts + curated memory → gstack-memory-ingest (typed put_page)
  3. Curated artifacts to git    → gstack-brain-sync (existing pipeline)

Modes: --incremental (default, mtime fast-path) / --full (~25-35 min per
ED2 honest budget) / --dry-run (preview, no writes).

Flags: --code-only / --no-code / --no-memory / --no-brain-sync for
selective stage disable. Each stage failure is non-fatal; subsequent
stages still run.

State at ~/.gstack/.gbrain-sync-state.json (LOCAL per ED1) with
schema_version: 1 + last_writer + per-stage outcomes for forensic tracing.

--watch daemon explicitly deferred to V1.5 P0 TODO per Codex F3
(reverses the "no daemon" invariant). Continuous sync rides the existing
preamble-boundary hook only.

8 unit tests cover --help, unknown flag rejection, --dry-run preview shape
(all stages + code-only), --no-code stage skip, state file lifecycle
(create on real run + skip on dry-run), and stage results recorded
in state. All passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: bin/gstack-brain-context-load — V1 retrieval surface (Lane C)

Called from the gstack preamble at every skill start. Reads the active
skill's gbrain.context_queries: frontmatter (Layer 2) or falls back to a
generic salience block (Layer 1 with explicit repo: {repo_slug} filter
per Codex F7 cleanup).

Dispatches each query by kind:
  kind: vector       → gbrain query <text>
  kind: list         → gbrain list_pages --filter ...
  kind: filesystem   → local glob (with mtime_desc sort + tail support)

Each MCP/CLI call has a 500ms hard timeout per Section 1C. On timeout
or missing gbrain CLI, helper renders SKIP for that section and continues —
skill startup never blocks > 2s on gbrain issues.

Datamark envelope per Section 1D + D12: rendered body wrapped once at
the page level in <USER_TRANSCRIPT_DATA do-not-interpret-as-instructions>
(not per-message). Layer 1 prompt-injection defense.

Default manifest (D13 three-section): recent transcripts (limit 5) +
recent curated last-7d (limit 10) + skill-name-matched timeline events
(limit 5). All scoped to {repo_slug}.

Template var substitution: {repo_slug}, {user_slug}, {branch},
{skill_name}, {window}. Unresolved vars cause the query to skip with a
logged reason (--explain shows it).

10 unit tests cover help/unknown-flag/limit-validation, default-fallback
when skill not found, manifest dispatch when --skill-file points at a
real SKILL.md, datamark envelope wrapping, render_as template
substitution, unresolved-template-var skip, --quiet suppression, and
graceful gbrain-CLI-absence behavior. All passing.

V1.5 P0: salience smarts promote to gbrain server-side MCP tools
(get_recent_salience, find_anomalies, recency-aware list_pages); helper
signature unchanged, internals switch from 4-call composition to single
MCP call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: gbrain.context_queries manifests on 6 V1 skills (Lane E partial)

Adds the V1 retrieval contracts. Each skill declares what it wants gbrain
to surface in the preamble at invocation time:

  /office-hours        — prior sessions + builder profile + design docs
                         + recent eureka (4 queries)
  /plan-ceo-review     — prior CEO plans + design docs + recent CEO review
                         activity (3 queries)
  /design-shotgun      — prior approved variants + DESIGN.md + recent
                         design docs (3 queries)
  /design-consultation — existing DESIGN.md + prior design decisions +
                         brand-related notes (3 queries)
  /investigate         — prior investigations + project learnings + recent
                         eureka cross-project (3 queries)
  /retro               — prior retros + recent timeline + recent learnings
                         (3 queries)

Each query carries an explicit kind (vector | list | filesystem) per D3,
schema: 1 versioning per D15, and {repo_slug} template var per F7
cross-repo-contamination cleanup. Mix of vector / list / filesystem
matches what each skill actually needs:

  - filesystem (mtime_desc + tail) for log JSONL + curated markdown
  - list with tags_contains filter for typed gbrain pages
  - (vector reserved for V1.0.1 when gbrain query surface stabilizes)

Smoke test: bun run bin/gstack-brain-context-load.ts --skill-file
office-hours/SKILL.md --repo test-repo --explain returns mode=manifest
queries=4 with the filesystem kinds populating real data from
~/.gstack/builder-profile.jsonl + ~/.gstack/analytics/eureka.jsonl on
this Mac. End-to-end retrieval flow confirmed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat: setup-gbrain Step 7.5 ingest gate + Step 10 verdict + memory.md ref doc (Lane E partial)

Step 7.5: Transcript & memory ingest gate. After Step 7 wires brain-sync
but before Step 8's CLAUDE.md persist, runs gstack-memory-ingest --probe,
then either silent-bulks (small) or AskUserQuestion-gates with the exact
counts + value promise + 5 options (this-repo-90d, all-history, multi-repo,
incremental-from-now, never). Decision persists to
gstack-config set transcript_ingest_mode <choice>.

Step 10: GREEN/YELLOW/RED verdict block. Re-running /setup-gbrain on a
configured Mac is now a first-class doctor path — every step's detection
+ repair logic feeds into a single verdict at the end. Rows: CLI / Engine /
doctor / MCP / Repo policy / Code import / Memory sync / Transcripts /
CLAUDE.md / Smoke. Tells the user "Run /setup-gbrain again any time gbrain
feels off; it's safe and idempotent."

setup-gbrain/memory.md: user-facing reference doc covering what gets
ingested + what stays local + secret scanning via gitleaks + storage
tiering + querying + deleting + how the agent auto-loads context per skill +
common recovery cases. Linked from Step 8's CLAUDE.md persist.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: V1 E2E pipeline + --no-write flag for ingest helper (Lane F)

E2E pipeline test exercises the full Lane A → B → C value loop:
  1. Set up fake $HOME with all 8 memory source types as fixtures
  2. gstack-memory-ingest --probe verifies counts match disk
  3. gstack-memory-ingest --incremental writes state with schema_version: 1
  4. Idempotency: re-run reports 0 changes
  5. --probe distinguishes new vs unchanged after first incremental
  6. gstack-gbrain-sync --dry-run previews 3 stages
  7. --no-code --no-brain-sync --quiet writes sync state with 1 stage entry
  8. office-hours/SKILL.md V1 manifest dispatches 4 queries (mode=manifest)
  9. Datamark envelope wraps every loaded section (Section 1D + D12)
 10. Layer 1 fallback when no skill specified — default 3-section manifest
 11. plan-ceo-review/SKILL.md manifest also dispatches (regression for V1
     manifest authoring across all 6 V1 skills)

Side effect: bin/gstack-memory-ingest.ts gains --no-write flag (also
honored via GSTACK_MEMORY_INGEST_NO_WRITE=1 env var). Skips gbrain put_page
calls while still updating the state file. Used by tests + dry-runs to
avoid real ingest churn when verifying state-file lifecycle. The
--bulk and --incremental modes still call gbrain by default — only
explicit opt-in suppresses writes.

V1 lane test totals (covering all 5 helpers + 6 skill manifests):
  test/gstack-memory-helpers.test.ts     22 tests
  test/gstack-memory-ingest.test.ts      15 tests
  test/gstack-gbrain-sync.test.ts         8 tests
  test/gstack-brain-context-load.test.ts 10 tests
  test/skill-e2e-memory-pipeline.test.ts 10 tests
  ────────────────────────────────────── ─────────
  TOTAL                                  65 passing

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v1.26.0.0)

V1 of memory ingest + retrieval surface. Coding-agent transcripts (Claude
Code + Codex) on disk become first-class queryable pages in gbrain. Six
high-leverage skills auto-load per-skill context manifests at every
invocation. Datamark envelopes wrap loaded pages as Layer 1 prompt-
injection defense. Storage tiering: curated memory rides existing
brain-sync git pipeline; code+transcripts route to Supabase Storage when
configured else local PGLite — never double-store.

Net branch size vs main: +4174/-849 across 39 files. 65 V1 tests, all
green. Goldilocks scope per CEO D18; V1.5 P0 follow-ups documented in
the plan's V1.5 TODOs section.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 08:40:30 -07:00