gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-07-11 10:23:44 +02:00

Author	SHA1	Message	Date
Garry Tan	66f3a180d3	v1.43.2.0 fix wave: post-Daegu paper-cut — 18 fixes, 28 bisect commits (#1642) * fix(gbrain-sync): --full produces an empty code index on first run of a new repo `gbrain reindex-code` only RE-EMBEDS pages that already exist; it never walks the filesystem. On a freshly-registered source (0 pages), a --full run that called reindex-code alone found nothing ("No code pages to reindex"), finished in ~1s, and left the code index permanently empty while still reporting OK. Fix: --full now runs `sync --strategy code` FIRST to create pages via the file walk, then runs `reindex-code` to honor the documented "full walk + reindex" contract for both fresh and populated sources. Contributed by @jetsetterfl via #1584. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gbrain-local-status): classifier falsely reports broken-db inside repos with their own DATABASE_URL The freshClassify probe ran `gbrain sources list --json` with the inherited process env. When the probe ran from inside a repo with its own .env (an app DATABASE_URL on a different port), Bun autoloaded the project's .env, gbrain connected to the wrong database, and the classifier reported broken-db on otherwise-healthy brains. Fix: route the probe env through `buildGbrainEnv` from lib/gbrain-exec, the same helper the sync orchestrator uses. DATABASE_URL is seeded from ~/.gbrain/config.json so the result is cwd-independent. The 60s cache can no longer propagate a poisoned negative to clean directories. Contributed by @jetsetterfl via #1583. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(retro): stale-base + bad-today-anchor pre-flight guard (#1624) /retro silently produced confidently-wrong output when "today" drifted (model session-context error) or when origin/<default> was materially behind the actual remote — git log --since returned zero or near-zero commits and the narrative was fabricated from nothing.
Adds Step 0.5 with four ordered pre-check branches before any window analysis: A. No 'origin' remote → skip with "base freshness not verified" note B. Detached HEAD → skip with "base freshness not verified" note C. `git fetch origin <default>` fails (offline) → warn, proceed against last-known origin/<default> D. Fetch succeeded → compare today vs latest origin/<default> commit; if gap > window-days, BLOCK with explicit citation of latest-commit date. Skip paths still proceed to Step 1, but the disclosure is carried into the retro narrative ("offline run, window not freshness-verified") so the output is never silently confidently-wrong. Atomic .tmpl + gen:skill-docs regen commit (T-Codex-3 pattern). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(retro): regression for #1624 stale-base pre-flight guard 13 static-invariant tests pinning the four ordered pre-check branches in retro/SKILL.md.tmpl:Step 0.5: A. no-remote skip — must check origin presence + set verdict B. detached-HEAD skip — must gate behind prior verdict (ordering) C. fetch-fail warn — must match `if !` or `||` shape, gate by verdict D. stale-base BLOCK — must read latest-commit ISO date, cite remediation Plus a disclosure-survives-to-narrative invariant: skip-path verdicts must be named in prose so the retro output carries the cited reason rather than silently misreporting. Failing build if Step 0.5 is removed, branches re-ordered (no-remote no longer wins), or the BLOCK message stops citing today/latest-commit/remediation path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(gbrain-sync): configurable timeouts + resume from gbrain checkpoint (#1611) The memory and code stages hardcoded a 35-min spawn timeout. On brains with ~2000+ staged files, /sync-gbrain --full reliably SIGTERM'd the child at exactly 35 minutes with exit 143.
gbrain left ~/.gbrain/import-checkpoint.json pointing at the staging dir, but gstack-memory-ingest's SIGTERM handler unconditionally cleaned the dir up — so the next run found a checkpoint pointing at nothing and restaged from scratch, repeating the SIGTERM forever. Three changes: 1. Configurable timeouts via env (bounds 60_000ms - 86_400_000ms, default 2_100_000ms = 35min unchanged): GSTACK_SYNC_MEMORY_TIMEOUT_MS GSTACK_SYNC_CODE_TIMEOUT_MS Out-of-range or non-numeric values warn and fall back to the default. 2. SIGTERM in gstack-memory-ingest no longer always cleans up the staging dir. If gbrain has written ~/.gbrain/import-checkpoint.json pointing at the active staging dir, the dir is PRESERVED for next-run resume. Otherwise (no checkpoint pointing here, crash before gbrain ever touched it) it's cleaned up as before. 3. Next /sync-gbrain run detects gbrain's checkpoint via decideResume() in gstack-gbrain-sync.ts: - no checkpoint → fresh ingest pass - checkpoint + staging ok → set GSTACK_INGEST_RESUME_DIR; child reuses staging dir and skips writeStaged; gbrain import resumes from processedIndex+1 - checkpoint + staging gone → warn "previous checkpoint stale (staging dir gone), restaging from scratch" and proceed Reuses gbrain's own checkpoint as the source of truth (D1 — no double-store state). Detect-then-fallback semantics per C1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(gbrain-sync): regression for #1611 timeouts + resume 19 tests across three surfaces: - resolveStageTimeoutMs (10 tests): undefined/empty → default; non-numeric, zero, negative, below-floor, above-ceiling → warn + default; at-floor, at-ceiling, valid mid-range → accepted as-is. - decideResume (6 tests): no checkpoint, corrupt JSON, checkpoint + staging ok, checkpoint + staging missing, checkpoint with no dir, checkpoint with empty dir. 
- SIGTERM staging preservation (3 static invariants): memory-ingest signal handler must check stagingDirIsCheckpointed BEFORE cleanup; preserve branch must come before cleanup branch (ordering); orchestrator must pass GSTACK_INGEST_RESUME_DIR to the grandchild on resume. Also threads process.env.HOME through readGbrainCheckpoint and stagingDirIsCheckpointed so tests can redirect home. os.homedir() caches at process start and ignores later mutation, so the env override is the only reliable test injection point. Failing build if the timeout bounds are removed, the resume detection short-circuits incorrectly, or the SIGTERM handler regresses to unconditional cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): pre-emit verification gate kills Django-shape FP class (#1539) External user filed 4/8 false positives on a /review run against a Django + DRF + PostgreSQL repo (Sprint 2.5). Every FP class was the same shape: "resolvable in <5 minutes by viewing the actual code or running a simple grep" — fields that don't exist on the model, dict.get()-might-be-None on a form that returns {}-initialized cleaned_data, standard ORM save behavior called out as data loss. Extends the Confidence Calibration resolver (consumed by review, cso, plan-eng-review, ship) with a Pre-emit verification gate: Every finding MUST quote the specific code line that motivates it (file:line + verbatim text). If the reviewer cannot produce the quote, the finding is unverified — its confidence is forced to 4-5 so the existing "Suppress from main report" rule fires automatically. The finding still goes to the appendix for calibration audit, but the user does not see it in the critical-pass output. Reuses the existing suppression mechanism — no new code path. The FP classes the gate kills are enumerated in the resolver text so reviewers see the named patterns. 
Framework-meta nudge included for Django Meta, Rails associations, SQLAlchemy relationships, TypeORM decorators, Sequelize init, Prisma generated client — the reviewer must quote the meta-construct that generates the symbol, not just grep for the literal name. Deeper framework-aware ORM verification (model introspection, migration-history-aware checks) is deliberately deferred to a future wave per T-Codex-2. Atomic .tmpl-equivalent (resolver) edit + gen:skill-docs regen commit per T-Codex-3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(review): regression for #1539 pre-emit verification gate 12 tests pinning the gate behavior: - Resolver emits the gate header + #1539 reference - Gate requires quoting file:line + verbatim text - Unverified findings forced to confidence 4-5 (auto-suppress via existing <7-rule, no new mechanism) - Framework-meta nudge names Django, Rails, SQLAlchemy, TypeORM, Sequelize, Prisma - Deferred design doc reference present (1539-framework-aware-review.md) - Four named FP classes from #1539 enumerated: * field doesn't exist on model * dict.get() might be None * save() might lose fields * update_fields might miss X - All four downstream SKILL.md consumers (review, cso, plan-eng-review, ship) carry the gate text after gen:skill-docs - Existing confidence 9-10 'Show normally' + 3-4 'Suppress' rows unchanged (regression on existing behavior) Failing build if the gate is removed, the suppression mechanism is re-invented separately, the framework-meta nudge drops a framework, or gen:skill-docs stops propagating the gate to consumers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(config): expose explain_level default * fix(benchmark): parse positional prompt after flags * fix(artifacts): reject malformed remote paths * fix(learnings): preserve current entries in cross-project search * fix(setup): register root gstack slash alias * fix(memory): probe gitleaks without shell builtin * fix(gbrain-lib): pin LC_ALL=C in varname validator (macOS locale guard) In many macOS shells the default locale (e.g. en_US.UTF-8) makes bash glob brackets like `[A-Z]` match lowercase letters too, so the existing `case "$name" in [A-Z_][A-Z0-9_])` branch lets names like `lower-case` through validation. The function then trips `printf -v "$varname"` and `export "$varname"` with `not a valid identifier` errors that surface mid-prompt, which is exactly what the validator was supposed to prevent. Pinning `LC_ALL=C` inside the function gives ASCII-only bracket semantics on both macOS and Linux, matching the documented `[A-Z_][A-Z0-9_]` contract. Declared `local` so it doesn't leak to the calling shell — `gstack-gbrain-lib.sh` is documented as a sourced helper, so a bare assignment would mutate the caller's locale for the rest of the process (silently affecting downstream `sort`, `tr`, locale-aware globs in the same shell, etc.). The existing regression test `test/gbrain-lib-verify.test.ts:'rejects invalid var names'` already covers the macOS repro shape (passes `lower-case` and expects the validator to reject + emit `invalid var name`). On Linux CI the test silently passed because `LC_ALL=C` is the typical default; on macOS dev boxes it fails. Verified: - `bun test test/gbrain-lib-verify.test.ts`: 22 pass, 0 fail (on macOS). - `_gstack_gbrain_validate_varname lower-case; echo $?` → 2. - `_gstack_gbrain_validate_varname FOO_BAR; echo $?` → 0. - Caller's LC_ALL preserved across calls (confirmed via sourced bash). 
* fix(land-and-deploy): detect merged PR after gh failure After `gh pr merge` exits non-zero, the PR may already be MERGED server-side (concurrent merge landed, or local cleanup phase failed AFTER the merge succeeded). Calling `gh pr merge` a second time then errors with a confusing "already merged" — and worse, the deploy workflow never runs because we stopped on the first failure. Adds a Post-failure PR-state check (§4a-postfail) that runs after ANY non-zero exit from `gh pr merge`: - state == MERGED → record MERGE_PATH=direct, OFFER (don't force) stale-worktree cleanup on the base branch with uncommitted-work guard, proceed to §4a CI watch - state == OPEN → check autoMergeRequest; if non-null treat as merge-queue wait; if null surface both errors and STOP - state == CLOSED → STOP Hard invariant: never retry `gh pr merge` after a non-zero exit. Server state is authoritative. Re-authored from PR #1620 into land-and-deploy/SKILL.md.tmpl (the source of truth) instead of the generated SKILL.md, so the next gen:skill-docs run preserves the change. Original diff by @davidfoy via #1620. Related: cli/cli#3442, cli/cli#13380. Contributed by @davidfoy via #1620. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: detect PgBouncer transaction-mode pooler and set GBRAIN_PREPARE=true (#1435) When gbrain connects through a PgBouncer transaction-mode pooler (port 6543), it auto-disables prepared statements. This breaks `gbrain search` silently — the /sync-gbrain capability check fails and the GBrain Search Guidance block never gets written to CLAUDE.md. Three-layer fix: 1. lib/gbrain-exec.ts — `buildGbrainEnv()` now detects port 6543 in the effective DATABASE_URL and sets `GBRAIN_PREPARE=true` in the env passed to every gbrain spawn. This is the single chokepoint — all gstack gbrain invocations inherit the fix. Caller can opt out with `GBRAIN_PREPARE=false`. 2. 
sync-gbrain/SKILL.md{,.tmpl} — capability check now exports `GBRAIN_PREPARE=true` explicitly and retries search up to 3x with 1s delay for async index propagation under connection pooling. 3. bin/gstack-gbrain-detect — surfaces `gbrain_pooler_mode` field ("transaction" | "session" | null) in the preamble probe JSON so /setup-gbrain and /sync-gbrain can advise users about pooler state. Closes #1435 Built with [ClosedLoop.AI](https://closedloop.ai) | [GitHub](https://github.com/closedloop-ai/claude-plugins) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(supabase-provision): rewrite transaction/6543 -> session/5432 for new projects - Single-object pooler API responses default to transaction-mode at 6543, but the shared pooler tenant on new projects only listens on session/5432 - Add a `pool_mode == transaction && db_port == 6543` rewrite + stderr note - Escape hatch via `GSTACK_SUPABASE_TRUST_API_PORT=1` for forward-compat - 5 new tests covering rewrite, no-op shapes, env opt-out, array path Fixes #1301. * fix(browse): GSTACK_CHROMIUM_NO_SANDBOX opt-out for Ubuntu/AppArmor (#1562) Ubuntu/AppArmor configurations often block unprivileged Chromium sandboxing for headless agent sessions even for normal users — /qa hangs without --no-sandbox. The kernel policy denies the unprivileged user namespaces Chromium needs. Adds GSTACK_CHROMIUM_NO_SANDBOX=1 as an explicit user override that forces the sandbox off without changing the default for everyone else. Re-authored from PR #1562 onto v1.42.2.0's shouldEnableChromiumSandbox() helper — purely additive, preserves the headed-launch sandbox-on-by-default behavior that v1.42.2.0 shipped to kill the --no-sandbox yellow infobar. Three new regression tests cover: - linux + override=1 → false (the named use case) - darwin + override=1 → false (env wins on any platform) - override=0 → does NOT trigger (must be exactly "1") Original diff by @techcenter68 via #1562.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(browse): mirror isCustomChromium() guard in headless launch() When BROWSE_EXTENSIONS_DIR is set alongside GSTACK_CHROMIUM_PATH pointing at a baked-extension build (GBrowser / GStack Browser), the headless launch() path was unconditionally adding --disable-extensions-except / --load-extension. This causes the same ServiceWorkerState::SetWorkerId DCHECK crash that launchHeaded() already guards against via isCustomChromium(). Mirror the existing guard: skip --load-extension flags when isCustomChromium() returns true; always push the off-screen window geometry args. * fix(browse): daemonize macOS/Linux server via setsid() `Bun.spawn().unref()` only releases the child from Bun's event loop — it does NOT call setsid(). The spawned bun server inherits the spawning shell's process session. When the CLI runs inside a session-managed shell that exits shortly after the CLI returns (Claude Code's per-command Bash sandbox, Conductor, OpenClaw, CI step runners), the session leader's exit sends SIGHUP to every PID in the session — killing the bun server and its Chromium grandchildren within seconds of a successful `connect`. Setting `BROWSE_PARENT_PID=0` (already done by the `connect` command and pair-agent) disables the parent-process watchdog but does NOT save the server here: SIGHUP from session teardown still reaps it. Replace the macOS/Linux `Bun.spawn().unref()` with Node's `child_process.spawn({ detached: true })`, which calls setsid() and gives the server its own session leader role (PPID=1, STAT=Ss). This mirrors the Windows path's rationale (PR #191 by @fqueiro) — same root cause, different OS surface. Verified on macOS in Conductor: pre-fix the server dies ~10–15s after connect across separate Bash invocations; post-fix the same PID stays alive (PPID=1, SESS=0, STAT=Ss) and responds to `status`/`goto`/ `snapshot` across many separate shell calls. 
The `proc?.stderr` startup-error branch is removed since both platforms now spawn with `stdio: 'ignore'`; both fall through to the on-disk `browse-startup-error.log` written by `server.ts`'s start().catch. * fix(design): bump image-gen timeout to 240s + pin gpt-image-2 The design binary calls /v1/responses (gpt-4o + image_generation tool, quality:high, 1536x1024) but aborted the request after a hardcoded 120s. That class of request consistently takes ~140-160s end-to-end, so every generate/variants/evolve/iterate call aborted before the image returned. In /design-shotgun this cascades: Step 3c launches N parallel agents, each calling `$D generate`, each aborts at 120s and retries, all fail, the comparison board never opens — the skill appears to hang indefinitely. Reproduced the exact API call with a longer budget: HTTP 200, valid image, 143.5s. A real /design-shotgun run after the patch generated 3 variants in parallel at 150.0s / 161.0s / 152.1s, all exit 0 — note the 161s case, which a naive 150s bump would still have failed. - Bump AbortController timeout 120_000 -> 240_000 in generate.ts, variants.ts, evolve.ts, iterate.ts (both call sites) - Pin the image_generation tool to model "gpt-image-2" design/test/variants-retry-after.test.ts: 5 pass, 0 fail. The feedback-roundtrip.test.ts failures are a pre-existing browse-module breakage (session.clearLoadedHtml undefined), unrelated to this change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: fill coverage gaps for PRs #1606, #1612, #1620 Three cherry-picked PRs in this wave landed without unit-test coverage for the specific invariant they protect: #1606 (@andrey-esipov) — LC_ALL=C pin in _gstack_gbrain_validate_varname 8 tests by sourcing bin/gstack-gbrain-lib.sh and calling the validator directly. Asserts uppercase/digit/underscore accepted, lowercase REJECTED (the macOS-locale regression case), mixed-case rejected, LC_ALL=C scoping is local (doesn't leak to caller). 
#1612 (@bharat2913) — setsid daemonize via Node child_process.spawn 4 static-invariant tests on browse/src/cli.ts. The actual setsid syscall is hard to assert without a real spawn, so we pin the source shape: nodeSpawn imported from child_process; non-Windows branch uses nodeSpawn(...) with detached:true and .unref(); comment documents setsid/SIGHUP root cause; Bun.spawn() is NOT used on macOS/Linux. #1620 (@davidfoy, re-authored into .tmpl per A3) — §4a-postfail 12 static invariants on land-and-deploy/SKILL.md.tmpl + generated SKILL.md. Pins all three state branches (MERGED/OPEN/CLOSED), the authoritative state query, the merge-SHA capture, non-destructive worktree cleanup with uncommitted-work guard, autoMergeRequest probe on OPEN, hard "never retry gh pr merge" rule, and atomic regen propagation. Failing build if any of the three invariants regresses. Note: gbrain-lib-validate-varname.test.ts also surfaces a pre-existing glob-pattern overpermissiveness (hyphens + dots accepted) — not in #1606's scope; documented inline as a separate cleanup target. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(learnings): align injection-prevention tests with PR #1619 tagged-line shape PR #1619 (preserve current entries in cross-project search) refactored gstack-learnings-search to tag rows inline (`current\t<json>` vs `cross\t<json>`) instead of filtering inside the bun block via process.env.GSTACK_SEARCH_SLUG. The bun block no longer reads SLUG or CROSS env vars — it parses the per-line tag and sets a per-entry _crossProject flag. The pre-existing test/learnings-injection.test.ts still asserted on the old SLUG + CROSS env var shape. 
Updates: - Remove the SLUG env var assertion (no longer set on bash command line) - Remove the bun-block CROSS env var assertion (block reads the tag now, not the env) - Add a new positive assertion that the bun block parses the tag (sourceTag | tabIndex | crossProject) - Keep the shell-interpolation safety assertion unchanged — that's independent of the SLUG refactor The CROSS env var is still SET on the bash command line (it controls whether the cross-project find runs at all), but the bun child no longer reads it. The existing "env vars set on bash command line" test continues to pin that. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(fixtures): regenerate ship-SKILL.md golden baselines ship/SKILL.md consumes the Confidence Calibration resolver via the preamble pipeline. This wave's #1539 pre-emit verification gate extends the resolver text, which propagated to ship/SKILL.md via gen:skill-docs. The golden fixtures in test/fixtures/golden/ matched the pre-#1539 shape and failed the host-config regression check. Refreshes claude-ship-SKILL.md, codex-ship-SKILL.md, and factory-ship-SKILL.md to match the current generated output. Matches the Daegu wave's bisect commit 23 ("test(fixtures): regenerate ship-SKILL.md golden baselines"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(gbrain-detect): include gbrain_pooler_mode in schema regression (PR #1591) PR #1591 (PgBouncer transaction-mode detection, @mikeangstadt) added gbrain_pooler_mode to the gstack-gbrain-detect JSON output but did not update the schema regression check in test/gstack-gbrain-detect-mcp-mode.test.ts. Adding the key in alphabetical order matching the rest of the schema array. Downstream sync-gbrain ignores unknown keys, so this is forward-compat. Without this, the test fails with a diff: + "gbrain_pooler_mode" because keys is the actual set returned and the expected array was pre-#1591.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(release): v1.43.0.0 — post-Daegu paper-cut wave Bumps VERSION 1.42.2.0 → 1.43.0.0 (MINOR per scale-aware bump rules: new env-var surface GSTACK_SYNC__TIMEOUT_MS + GSTACK_CHROMIUM_NO_SANDBOX, behavior expansion in browse/src/browser-manager.ts headless launch, three skill-template prompt changes affecting /retro, /review, /sync-gbrain). CHANGELOG entry leads with what stopped happening: /retro stops fabricating retros against stale bases, /sync-gbrain stops SIGTERM-looping 35-min restarts on big brains, /review stops shipping framework FPs the reviewer never grep'd. 18 fixes total — 15 community PRs + 3 self-filed silent-failure issues (#1624, #1611, #1539) — in one bundled PR with 26 bisect commits and 7 new regression test files. Every wave-touched test file passes in isolation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> chore(release): bump v1.43.0.0 → v1.43.2.0 for queue collision CI check-version-stale flagged v1.43.0.0 already claimed by PR #1574 (garrytan/colombo-v3). PR #1639 (garrytan/muscat-v3) claims v1.43.1.0. Next available MINOR slot is v1.43.2.0. Bump VERSION + package.json + CHANGELOG entry header. No behavior changes — purely re-versioning to clear the queue collision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Jayesh Betala <jayesh.betala7@gmail.com> Co-authored-by: Andrey Esipov <andrey.esipov@outlook.com> Co-authored-by: David Foy <davidfoy@users.noreply.github.com> Co-authored-by: mikeangstadt <mike.angstadt@closedloop.ai> Co-authored-by: 0xDevNinja <manmit0x@gmail.com> Co-authored-by: techcenter68 <techcenter68@users.noreply.github.com> Co-authored-by: shohu <shohu33@gmail.com> Co-authored-by: Bharat <bharat@theysaid.io> Co-authored-by: Matteo Hertel <info@matteohertel.com>	2026-05-21 21:21:07 -07:00
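The timeout handling described for #1611 in the entry above (env override, bounds of 60_000 ms to 86_400_000 ms, default 2_100_000 ms, warn and fall back on bad input) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from gstack-gbrain-sync.ts: the function name `resolveStageTimeoutSketch`, its signature, and the warning text are hypothetical, and only the numeric bounds and the fallback behavior come from the commit message.

```typescript
// Hypothetical sketch of the #1611 timeout resolution; the real
// resolveStageTimeoutMs signature is not shown in the commit message.
const DEFAULT_TIMEOUT_MS = 2_100_000; // 35 minutes, the unchanged default
const MIN_TIMEOUT_MS = 60_000; // floor stated in the commit message
const MAX_TIMEOUT_MS = 86_400_000; // ceiling stated in the commit message

function resolveStageTimeoutSketch(
  raw: string | undefined,
  warn: (message: string) => void = console.warn,
): number {
  // Unset or empty values silently use the default.
  if (raw === undefined || raw.trim() === "") return DEFAULT_TIMEOUT_MS;

  const parsed = Number(raw);
  // Non-numeric, zero, negative, below-floor, and above-ceiling values
  // all warn and fall back to the default.
  if (!Number.isInteger(parsed) || parsed < MIN_TIMEOUT_MS || parsed > MAX_TIMEOUT_MS) {
    warn(`invalid timeout "${raw}", falling back to ${DEFAULT_TIMEOUT_MS}ms`);
    return DEFAULT_TIMEOUT_MS;
  }
  // Values at the floor, at the ceiling, or in between are accepted as-is.
  return parsed;
}
```

A caller would presumably pass something like `process.env.GSTACK_SYNC_MEMORY_TIMEOUT_MS` as `raw`; rejecting non-integers is an extra assumption of this sketch.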
Garry Tan	1a4f0c9c15	v1.33.1.0 fix(learnings): token-OR query + task-shaped retrieval in 3 long skills (#1442) * fix(learnings): use token-OR matching in gstack-learnings-search --query Split the query on whitespace into tokens; a learning matches if ANY token appears as a substring in ANY of key/insight/files. Previously the whole query was a single substring, so multi-word queries like "debug investigation" only matched learnings whose insight contained that exact contiguous phrase, which is usually nothing. Whitespace-only query falls through to no-query (matches today's no-flag behavior). Single-word queries behave exactly as before. Adds test/gstack-learnings-search.test.ts: 3 assertions covering multi-token, single-token, and no-query backwards compat. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(resolver): parameterized LEARNINGS_SEARCH with shell-injection guard The {{LEARNINGS_SEARCH}} macro now accepts a query=KEYWORD argument that gets interpolated as --query "<keyword>" into the generated bash. Empty value falls through to no-query (principle of least surprise: a stray {{LEARNINGS_SEARCH:query=}} placeholder gets today's behavior, not a build failure). Pattern reuses the parameterized-macro parsing from composition.ts. The 13 templates that don't pass a query stay byte-identical in their generated SKILL.md output. Shell-injection guard: the query value is whitelisted to ^[A-Za-z0-9 _-]+$ at gen-skill-docs time. Any \$(), backticks, semicolons, or quotes throw a loud build error instead of emitting executable bash. Static template queries are safe by inspection; this defends against future contributors writing dangerous values. Adds 5 assertions to test/gen-skill-docs.test.ts covering no-args, claude+query=foo bar on both cross-project and project-scoped branches, codex host variant, empty value semantics, and shell-injection payloads (\$(whoami), backticks, ;, &, ", \\, \$x) throwing build errors.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(skills): task-shaped queries + mid-flow refresh in /investigate /qa /ship The three long skills now pull learnings keyed to their theme at the top, then re-pull at phase boundaries as work shifts to new sub-tasks. Top-of-skill queries (5-6 token unions, token-OR matched): - investigate: "debug investigation root cause hypothesis bug fix" - qa: "qa testing bug regression flake fixture" - ship: "release ship version changelog merge pr" Mid-flow refresh blocks (concrete keyword recipe + worked examples): - investigate: between Phase 1 (hypothesis) and Phase 2 (analysis), keyed to the hypothesis noun. Examples: auth-cookie, session-expiry. - qa: between Phase 7 (triage) and Phase 8 (fix loop), keyed to the buggy component name. Examples: checkout-button, signup-form. - ship: just before Step 12 (VERSION bump), keyed to the headline feature. Examples: learnings-search, pacing, worktree-ship. Keyword recipe enforces alphanumeric+hyphen only (no quotes, slashes, dots, colons) so dynamic queries cannot inject shell metacharacters. The other 13 short-lived skills keep the bare {{LEARNINGS_SEARCH}} form. Backwards-compat verified via diff: their generated SKILL.md output is byte-identical to before this change. Golden ship fixtures regenerated to match the new ship/SKILL.md output. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore: bump version and changelog (v1.33.1.0) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * test: refresh codex+factory ship golden fixtures Follow-up to `513c9660` — the codex and factory host outputs needed regeneration too, missed in the initial commit because gen:skill-docs was only run for the claude host. Now matches gen:skill-docs --host all. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-11 19:34:33 -07:00
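The token-OR matching and the query whitelist described in the v1.33.1.0 entry above can be illustrated with a short sketch. It is not the code from gstack-learnings-search or gen-skill-docs: the `LearningSketch` shape, both function names, and the case-sensitive comparison are assumptions made for illustration. Only the matching rule (any token, any of key/insight/files), the whitespace-only fallthrough, the empty-value fallthrough, and the `^[A-Za-z0-9 _-]+$` pattern come from the commit message.

```typescript
// Hypothetical record shape; the real learnings schema is not shown here.
interface LearningSketch {
  key: string;
  insight: string;
  files: string[];
}

// Token-OR: a learning matches if ANY whitespace-separated token appears
// as a substring in ANY of key / insight / files.
function matchesTokenOr(learning: LearningSketch, query: string): boolean {
  const tokens = query.split(/\s+/).filter((t) => t.length > 0);
  // A whitespace-only query behaves like no query at all: everything matches.
  if (tokens.length === 0) return true;
  const haystacks = [learning.key, learning.insight, ...learning.files];
  return tokens.some((token) => haystacks.some((h) => h.includes(token)));
}

// Build-time guard: only letters, digits, space, underscore, and hyphen.
const SAFE_QUERY = /^[A-Za-z0-9 _-]+$/;

function validateMacroQuerySketch(value: string): string {
  // An empty value falls through to the no-query form instead of failing.
  if (value === "") return "";
  if (!SAFE_QUERY.test(value)) {
    throw new Error(`unsafe LEARNINGS_SEARCH query: ${JSON.stringify(value)}`);
  }
  return value;
}
```

With this rule a multi-word query such as "debug session" matches a learning whose insight mentions only "session", which the earlier whole-phrase substring match would have missed.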
Garry Tan	7e96fe299b	fix: security wave 3 — 12 fixes, 7 contributors (v0.16.4.0) (#988) * fix(security): validateOutputPath symlink bypass — check file-level symlinks validateOutputPath() previously only resolved symlinks on the parent directory. A symlink at /tmp/evil.png → /etc/crontab passed the parent check (parent is /tmp, which is safe) but the write followed the symlink outside safe dirs. Add lstatSync() check: if the target file exists and is a symlink, resolve through it and verify the real target is within SAFE_DIRECTORIES. ENOENT (file doesn't exist yet) falls through to the existing parent-dir check. Closes #921 Co-Authored-By: Yunsu <Hybirdss@users.noreply.github.com> * fix(security): shell injection in bin/ scripts — use env vars instead of interpolation gstack-settings-hook interpolated $SETTINGS_FILE directly into bun -e double-quoted blocks. A path containing quotes or backticks breaks the JS string context, enabling arbitrary code execution. Replace direct interpolation with environment variables (process.env). Same fix applied to gstack-team-init which had the same pattern. Systematic audit confirmed only these two scripts were vulnerable — all other bin/ scripts already use stdin piping or env vars. Closes #858 Co-Authored-By: Gus <garagon@users.noreply.github.com> * fix(security): cookie-import path validation bypass + hardcoded /tmp Two fixes: 1. cookie-import relative path bypass (#707): path.isAbsolute() gated the entire validation, so relative paths like "sensitive-file.json" bypassed the safe-directory check entirely. Now always resolves to absolute path with realpathSync for symlink resolution, matching validateOutputPath(). 2. Hardcoded /tmp in cookie-import-browser (#708): openDbFromCopy used /tmp directly instead of os.tmpdir(), breaking Windows support. Also adds explicit imports for SAFE_DIRECTORIES and isPathWithin in write-commands.ts (previously resolved implicitly through bundler).
Closes #852 Co-Authored-By: Toby Morning <urbantech@users.noreply.github.com> * fix(security): redact form fields with sensitive names, not just type=password Form redaction only applied to type="password" fields. Hidden and text fields named csrf_token, api_key, session_id, etc. were exposed unredacted in LLM context, leaking secrets. Extend redaction to check field name and id against sensitive patterns: token, secret, key, password, credential, auth, jwt, session, csrf, sid, api_key. Uses the same pattern style as SENSITIVE_COOKIE_NAME. Closes #860 Co-Authored-By: Gus <garagon@users.noreply.github.com> * fix(security): restrict session file permissions to owner-only Design session files written to /tmp with default umask (0644) were world-readable on shared systems. Sessions contain design prompts and feedback history. Set mode 0o600 (owner read/write only) on both create and update paths. Closes #859 Co-Authored-By: Gus <garagon@users.noreply.github.com> * fix(security): enforce frozen lockfile during setup bun install without --frozen-lockfile resolves ^semver ranges from npm on every run. If an attacker publishes a compromised compatible version of any dependency, the next ./setup pulls it silently. Add --frozen-lockfile with fallback to plain install (for fresh clones where bun.lock may not exist yet). Matches the pattern already used in the .agents/ generation block (line 237). Closes #614 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * fix: remove duplicate recursive chmod on /tmp in Dockerfile.ci chmod -R 1777 /tmp recursively sets sticky bit on files (no defined behavior), not just the directory. Deduplicate to single chmod 1777 /tmp. Closes #747 Co-Authored-By: Maksim Soltan <Gonzih@users.noreply.github.com> * fix(security): learnings input validation + cross-project trust gate Three fixes to the learnings system: 1. 
Input validation in gstack-learnings-log: type must be from allowed list, key must be alphanumeric, confidence must be 1-10 integer, source must be from allowed list. Prevents injection via malformed fields. 2. Prompt injection defense: insight field checked against 10 instruction-like patterns (ignore previous, system:, override, etc.). Rejected with clear error message. 3. Cross-project trust gate in gstack-learnings-search: AI-generated learnings from other projects are filtered out. Only user-stated learnings cross project boundaries. Prevents silent prompt injection across codebases. Also adds trusted field (true for user-stated source, false for AI-generated) to enable the trust gate at read time. Closes #841 Co-Authored-By: Ziad Al Sharif <Ziadstr@users.noreply.github.com> * feat(security): track cookie-imported domains and scope cookie imports Foundation for origin-pinned JS execution (#616). Tracks which domains cookies were imported from so the JS/eval commands can verify execution stays within imported origins. Changes: - BrowserManager: new cookieImportedDomains Set with track/get/has methods - cookie-import: tracks imported cookie domains after addCookies - cookie-import-browser: tracks domains on --domain direct import - cookie-import-browser --all: new explicit opt-in for all-domain import (previously implicit behavior, now requires deliberate flag) Closes #615 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * feat(security): pin JS/eval execution to cookie-imported origins When cookies have been imported for specific domains, block JS execution on pages whose origin doesn't match. Prevents the attack chain: 1. Agent imports cookies for github.com 2. Prompt injection navigates to attacker.com 3. Agent runs js document.cookie → exfiltrates github cookies assertJsOriginAllowed() checks the current page hostname against imported cookie domains with subdomain matching (.github.com allows api.github.com). 
When no cookies are imported, all origins allowed (nothing to protect). about:blank and data: URIs are allowed (no cookies at risk). Depends on #615 (cookie domain tracking). Closes #616 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * feat(security): add persistent command audit log Append-only JSONL audit trail for all browse server commands. Unlike in-memory ring buffers, the audit log persists across restarts and is never truncated. Each entry records: timestamp, command, args (truncated to 200 chars), page origin, duration, status, error (truncated to 300 chars), hasCookies flag, connection mode. All writes are best-effort — audit failures never block command execution. Log stored at ~/.gstack/.browse/browse-audit.jsonl. Closes #617 Co-Authored-By: Alberto Martinez <halbert04@users.noreply.github.com> * fix(security): block hex-encoded IPv4-mapped IPv6 metadata bypass URL constructor normalizes ::ffff:169.254.169.254 to ::ffff:a9fe:a9fe (hex form), which was not in the blocklist. Similarly, ::169.254.169.254 normalizes to ::a9fe:a9fe. Add both hex-encoded forms to BLOCKED_METADATA_HOSTS so they're caught by the direct hostname check in validateNavigationUrl. Closes #739 Co-Authored-By: Osman Mehmood <mehmoodosman@users.noreply.github.com> * chore: bump version and changelog (v0.16.4.0) Security wave 3: 12 fixes, 7 contributors. Cookie origin pinning, command audit log, domain tracking. Symlink bypass, path validation, shell injection, form redaction, learnings injection, IPv6 SSRF, session permissions, frozen lockfile. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Yunsu <Hybirdss@users.noreply.github.com> Co-authored-by: Gus <garagon@users.noreply.github.com> Co-authored-by: Toby Morning <urbantech@users.noreply.github.com> Co-authored-by: Alberto Martinez <halbert04@users.noreply.github.com> Co-authored-by: Maksim Soltan <Gonzih@users.noreply.github.com> Co-authored-by: Ziad Al Sharif <Ziadstr@users.noreply.github.com> Co-authored-by: Osman Mehmood <mehmoodosman@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-13 07:49:37 -10:00
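The origin-pinning rule described in the commit above (assertJsOriginAllowed: hostname matched against cookie-imported domains, with subdomain matching) can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the function name isJsOriginAllowed, its signature, and the choices to fail closed on unparseable URLs and to treat a domain without a leading dot as an exact-match host are all assumptions.

```typescript
// Hypothetical sketch of origin pinning; not gstack's actual code.
// importedDomains holds the domains cookies were imported for,
// e.g. ".github.com" (leading dot: domain plus subdomains).
function isJsOriginAllowed(pageUrl: string, importedDomains: Set<string>): boolean {
  // No cookies imported: nothing to protect, so every origin is allowed.
  if (importedDomains.size === 0) return true;

  // about:blank and data: URIs have no cookies at risk.
  if (pageUrl === "about:blank" || pageUrl.startsWith("data:")) return true;

  let hostname: string;
  try {
    hostname = new URL(pageUrl).hostname.toLowerCase();
  } catch {
    // Assumption: an unparseable URL is rejected rather than allowed.
    return false;
  }

  for (const raw of Array.from(importedDomains)) {
    const domain = raw.toLowerCase();
    if (domain.startsWith(".")) {
      // ".github.com" allows github.com and api.github.com,
      // but not evilgithub.com (the dot boundary is required).
      const bare = domain.slice(1);
      if (hostname === bare || hostname.endsWith(domain)) return true;
    } else if (hostname === domain) {
      // Assumption: a domain without a leading dot matches that host only.
      return true;
    }
  }
  return false;
}
```

With cookies imported for .github.com, a page on api.github.com passes the check while a page on attacker.com does not, which is the step that breaks the exfiltration chain listed in the commit message.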
Garry Tan	03973c2fab	fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847) * fix(bin): pass search params via env vars (RCE fix) (#819) Replace shell string interpolation with process.env in gstack-learnings-search to prevent arbitrary code execution via crafted learnings entries. Also fixes the CROSS_PROJECT interpolation that the original PR missed. Adds 3 regression tests verifying no shell interpolation remains in the bun -e block. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): add path validation to upload command (#821) Add isPathWithin() and path traversal checks to the upload command, blocking file exfiltration via crafted upload paths. Uses existing SAFE_DIRECTORIES constant instead of a local copy. Adds 3 regression tests. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): symlink resolution in meta-commands validateOutputPath (#820) Add realpathSync to validateOutputPath in meta-commands.ts to catch symlink-based directory escapes in screenshot, pdf, and responsive commands. Resolves SAFE_DIRECTORIES through realpathSync to handle macOS /tmp -> /private/tmp symlinks. Existing path validation tests pass with the hardened implementation. Co-authored-by: garagon <garagon@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add uninstall instructions to README (#812) Community PR #812 by @0531Kim. Adds two uninstall paths: the gstack-uninstall script (handles everything) and manual removal steps for when the repo isn't cloned. Includes CLAUDE.md cleanup note and Playwright cache guidance.
Co-Authored-By: 0531Kim <0531Kim@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): Windows launcher extraEnv + headed-mode token (#822) Community PR #822 by @pieterklue. Three fixes: 1. Windows launcher now merges extraEnv into spawned server env (was only passing BROWSE_STATE_FILE, dropping all other env vars) 2. Welcome page fallback serves inline HTML instead of about:blank redirect (avoids ERR_UNSAFE_REDIRECT on Windows) 3. /health returns auth token in headed mode even without Origin header (fixes Playwright Chromium extensions that don't send it) Also adds HOME/USERPROFILE fallback for cross-platform compatibility. Co-Authored-By: pieterklue <pieterklue@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(browse): terminate orphan server when parent process exits (#808) Community PR #808 by @mmporong. Passes BROWSE_PARENT_PID to the spawned server process. The server polls every 15s with signal 0 and calls shutdown() if the parent is gone. Prevents orphaned chrome-headless-shell processes when Claude Code sessions exit abnormally. Co-Authored-By: mmporong <mmporong@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): IPv6 ULA blocking, cookie redaction, per-tab cancel, targeted token (#664) Community PR #664 by @mr-k-man (security audit round 1, new parts only). 
- IPv6 ULA prefix blocking (fc00::/7) in url-validation.ts with false-positive guard for hostnames like fd.example.com - Cookie value redaction for tokens, API keys, JWTs in browse cookies command - Per-tab cancel files in killAgent() replacing broken global kill-signal - design/serve.ts: realpathSync upgrade prevents symlink bypass in /api/reload - extension: targeted getToken handler replaces token-in-health-broadcast - Supabase migration 003: column-level GRANT restricts anon UPDATE scope - Telemetry sync: upsert error logging - 10 new tests for IPv6, cookie redaction, DNS rebinding, path traversal Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): CSS injection guard, timeout clamping, session validation, tests (#806) Community PR #806 by @mr-k-man (security audit round 2, new parts only). - CSS value validation (DANGEROUS_CSS) in cdp-inspector, write-commands, extension inspector - Queue file permissions (0o700/0o600) in cli, server, sidebar-agent - escapeRegExp for frame --url ReDoS fix - Responsive screenshot path validation with validateOutputPath - State load cookie filtering (reject localhost/.internal/metadata cookies) - Session ID format validation in loadSession - /health endpoint: remove currentUrl and currentMessage fields - QueueEntry interface + isValidQueueEntry validator for sidebar-agent - SIGTERM->SIGKILL escalation in timeout handler - Viewport dimension clamping (1-16384), wait timeout clamping (1s-300s) - Cookie domain validation in cookie-import and cookie-import-browser - DocumentFragment-based tab switching (XSS fix in sidepanel) - pollInProgress reentrancy guard for pollChat - toggleClass/injectCSS input validation in extension inspector - Snapshot annotated path validation with realpathSync - 714-line security-audit-r2.test.ts + 33-line learnings-injection.test.ts Co-Authored-By: mr-k-man <mr-k-man@users.noreply.github.com> Co-Authored-By: 
Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.15.13.0) Community security wave: 8 PRs from 4 contributors (@garagon, @mr-k-man, @mmporong, @0531Kim, @pieterklue). IPv6 ULA blocking, cookie redaction, per-tab cancel signaling, CSS injection guards, timeout clamping, session validation, DocumentFragment XSS fix, parent process watchdog, uninstall docs, Windows fixes, and 750+ lines of security regression tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: garagon <garagon@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: 0531Kim <0531Kim@users.noreply.github.com> Co-authored-by: pieterklue <pieterklue@users.noreply.github.com> Co-authored-by: mmporong <mmporong@users.noreply.github.com> Co-authored-by: mr-k-man <mr-k-man@users.noreply.github.com>	2026-04-06 00:47:04 -07:00
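The IPv6 ULA check mentioned in the commit above (fc00::/7 blocking with a false-positive guard for hostnames like fd.example.com) might look something like the sketch below. It is an illustration under stated assumptions, not the code in url-validation.ts: the function name isIpv6UlaHost is hypothetical, and the guard here simply requires a colon, which DNS hostnames never contain.

```typescript
// Hypothetical sketch of a ULA (fc00::/7) hostname check; not gstack's code.
function isIpv6UlaHost(hostname: string): boolean {
  // URL.hostname wraps IPv6 literals in brackets, e.g. "[fd00::1]".
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, "");

  // False-positive guard: "fd.example.com" starts with "fd" but has no colon,
  // so it cannot be an IPv6 literal.
  if (!host.includes(":")) return false;

  // The first 16-bit group decides membership in fc00::/7.
  const firstGroup = host.split(":")[0] ?? "";
  if (!/^[0-9a-f]{1,4}$/.test(firstGroup)) return false; // e.g. "::1" has an empty first group

  // fc00::/7 fixes the top 7 bits to 1111110, i.e. groups 0xfc00 through 0xfdff.
  const value = parseInt(firstGroup, 16);
  return (value & 0xfe00) === 0xfc00;
}
```

Masking the first group with 0xfe00 keeps the comparison tied to the /7 prefix length, so a short group such as fd (which expands to 00fd) is correctly treated as outside the range.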
Garry Tan	ae0a9ad195	feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622) * feat: learnings + confidence resolvers — cross-skill memory infrastructure Three new resolvers for the self-learning system: - LEARNINGS_SEARCH: tells skills to load prior learnings before analysis - LEARNINGS_LOG: tells skills to capture discoveries after completing work - CONFIDENCE_CALIBRATION: adds 1-10 confidence scoring to all review findings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: learnings bin scripts — append-only JSONL read/write gstack-learnings-log: validates JSON, auto-injects timestamp, appends to ~/.gstack/projects/$SLUG/learnings.jsonl. Append-only (no mutation). gstack-learnings-search: reads/filters/dedupes learnings with confidence decay (observed/inferred lose 1pt/30d), cross-project discovery, and "latest winner" resolution per key+type. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: learnings count in preamble output Every skill now prints "LEARNINGS: N entries loaded" during preamble, making the compounding loop visible to the user. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: integrate learnings + confidence into 9 skill templates Add {{LEARNINGS_SEARCH}}, {{LEARNINGS_LOG}}, and {{CONFIDENCE_CALIBRATION}} placeholders to review, ship, plan-eng-review, plan-ceo-review, office-hours, investigate, retro, and cso templates. Regenerated all SKILL.md files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /learn skill — manage project learnings New skill for reviewing, searching, pruning, and exporting what gstack has learned across sessions. Commands: /learn, /learn search, /learn prune, /learn export, /learn stats, /learn add.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: self-learning roadmap — 5-release design doc Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony (v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound Engineering, adapted to GStack's architecture. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: learnings bin script unit tests — 13 tests, free Tests gstack-learnings-log (valid/invalid JSON, timestamp injection, append-only) and gstack-learnings-search (dedup, type/query/limit filters, confidence decay, user-stated no-decay, malformed JSONL skip). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.4.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: learnings resolver + bin script edge case tests — 21 new tests, free Adds gen-skill-docs coverage for LEARNINGS_SEARCH, LEARNINGS_LOG, and CONFIDENCE_CALIBRATION resolvers. Adds bin script edge cases: timestamp preservation, special characters, files array, sort order, type grouping, combined filtering, missing fields, confidence floor at 0. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sync package.json version with VERSION file (0.13.4.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: gitignore .factory/ — generated output, not source Same pattern as .claude/skills/ and .agents/. These SKILL.md files are generated from .tmpl templates by gen:skill-docs --host factory. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: /learn E2E — seed 3 learnings, verify agent surfaces them Seeds N+1 query pattern, stale cache pitfall, and rubocop preference into learnings.jsonl, then runs /learn and checks that at least 2/3 appear in the agent's output. Gate tier, ~$0.25/run. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-29 17:02:01 -06:00
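The read-side rules this commit describes for gstack-learnings-search (observed/inferred entries lose 1 point per 30 days, user-stated entries do not decay, confidence floors at 0, and the latest entry wins per key+type) can be sketched as below. This is a hedged illustration only: the Learning field names (key, type, confidence, source, ts), both function names, and the use of whole 30-day periods are assumptions, not the script's actual schema or logic.

```typescript
// Hypothetical sketch of confidence decay and "latest winner" dedup.
type LearningSource = "observed" | "inferred" | "user-stated";

interface Learning {
  key: string;
  type: string;
  confidence: number; // 1-10 when logged
  source: LearningSource;
  ts: string; // ISO 8601 timestamp (assumed field name)
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Observed and inferred learnings lose 1 point per full 30 days of age;
// user-stated learnings keep their confidence. The result never goes below 0.
function effectiveConfidence(entry: Learning, now: Date): number {
  if (entry.source === "user-stated") return entry.confidence;
  const ageDays = Math.max(0, (now.getTime() - Date.parse(entry.ts)) / MS_PER_DAY);
  const decay = Math.floor(ageDays / 30);
  return Math.max(0, entry.confidence - decay);
}

// Append-only logs can hold several entries for the same key+type;
// keep only the most recent one for each pair.
function latestWinners(entries: Learning[]): Learning[] {
  const winners = new Map<string, Learning>();
  for (const entry of entries) {
    const id = `${entry.key}|${entry.type}`;
    const current = winners.get(id);
    if (!current || Date.parse(entry.ts) >= Date.parse(current.ts)) {
      winners.set(id, entry);
    }
  }
  return Array.from(winners.values());
}
```

Because the log is append-only, resolving duplicates at read time lets a later entry supersede an earlier one without mutating the file.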

5 Commits