v1.24.0.0 feat: cross-platform hardening — curated Windows lane + Bun.which resolver + path-portability helper (#1252)

* feat(paths): bin/gstack-paths helper + migrate 8 skills off inline state-root chains

New bin/gstack-paths emits GSTACK_STATE_ROOT, PLAN_ROOT, TMP_ROOT exports for skill bash blocks to source via eval. Honors GSTACK_HOME → CLAUDE_PLUGIN_DATA → $HOME/.gstack → .gstack (and parallel chains for plan/tmp roots) so skills work the same in plugin installs, global installs, and CI containers without HOME.

Eight skills migrate off inline ${CLAUDE_PLUGIN_DATA:-...} or ${GSTACK_HOME:-...} chains: careful, freeze, guard, unfreeze, investigate, context-save, context-restore, learn, office-hours, plan-tune, codex. Resolved values are identical, so existing tests cover correctness; the win is consolidating 11 copy-pasted fallback chains behind one helper.

codex/SKILL.md.tmpl gets a new Step 0.6 Resolve portable roots that sources gstack-paths once, then replaces hardcoded ~/.claude/plans/*.md and /tmp/codex-*-XXXXXX.txt with "$PLAN_ROOT"/*.md and "$TMP_ROOT/codex-*-XXXXXX.txt".

Hardening direction credited to the McGluut/gstack fork; this is upstream's factoring of the per-skill chain the fork inlined.

Tests: test/gstack-paths.test.ts covers all three fallback chains with 8 unit tests (HOME unset, CLAUDE_PLUGIN_DATA set, GSTACK_HOME wins, etc).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(claude-bin): Bun.which wrapper for cross-platform claude resolution

Replaces 75 LOC of fork-side reimplementation (PATH parsing, Windows PATHEXT, case-insensitive Path/PATH, X_OK) with a thin wrapper around Bun.which(), the runtime built-in that already does all of it. New file is ~70 LOC including the override + arg-prefix logic the runtime doesn't cover.

Override branch fixed: GSTACK_CLAUDE_BIN=wsl now resolves through Bun.which() just like a bare claude lookup would. The McGluut fork's claude-bin.ts only handled absolute-path overrides; bare commands silently returned null. Passing the override value through Bun.which fixes the documented use case for free.
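The override behavior described above can be sketched roughly as follows. This is an illustration only, not the contents of browse/src/claude-bin.ts: `resolveCommand`, `ResolvedCommand`, and the injected `Which` lookup are hypothetical names, and the lookup is passed in as a parameter (standing in for `Bun.which()`) so the logic can be exercised without a Bun runtime. How the real resolver obtains its argument prefix is not shown in this commit, so the sketch takes it as a plain argument.

```typescript
// Hypothetical lookup signature shaped like Bun.which(): resolved path or null.
type Which = (name: string) => string | null;

interface ResolvedCommand {
  command: string;      // resolved executable path
  argsPrefix: string[]; // arguments placed before the caller's own arguments
}

// Hypothetical helper (assumption, not the project's code): the override, when
// present, goes through the same lookup as the default `claude` name, so a bare
// command such as `wsl` resolves instead of silently returning null.
function resolveCommand(
  which: Which,
  override?: string,
  argsPrefix: string[] = [],
): ResolvedCommand | null {
  const trimmed = override?.trim() ?? '';
  const usingOverride = trimmed !== '';
  const command = which(usingOverride ? trimmed : 'claude');
  if (command === null) return null;
  // The prefix only applies when an override is active.
  return { command, argsPrefix: usingOverride ? argsPrefix : [] };
}

// Usage with a fake lookup table instead of the real PATH.
const fakePath: Record<string, string> = {
  claude: '/usr/local/bin/claude',
  wsl: 'C:\\Windows\\System32\\wsl.exe',
};
const fakeWhich: Which = (name) => fakePath[name] ?? null;

const plain = resolveCommand(fakeWhich);                     // default lookup
const viaWsl = resolveCommand(fakeWhich, 'wsl', ['claude']); // bare-command override
const missing = resolveCommand(fakeWhich, 'not-installed');  // null, caller degrades
```

Callers would then spawn `resolved.command` with `[...resolved.argsPrefix, ...ownArgs]`, which is the shape the adapter diff further down uses.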
Five hardcoded claude spawn sites rewired through resolveClaudeCommand:
- browse/src/security-classifier.ts:396 (version probe)
- browse/src/security-classifier.ts:496 (Haiku transcript classifier)
- scripts/preflight-agent-sdk.ts (preflight binary pinning)
- test/helpers/providers/claude.ts (LLM judge availability + run)
- test/helpers/agent-sdk-runner.ts (SDK harness binary resolver)

All retain their existing degrade-on-missing semantics.

Tests: browse/test/claude-bin.test.ts has 9 unit tests including the override-PATH-resolution case the fork's version got wrong.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs+test: AGENTS.md/docs/skills.md inventory sync + private-path leak detector

Inventory sync (codex-flagged drift):
- /debug → /investigate (skill renamed in v1.0.1.0)
- AGENTS.md grows from 21 to 40+ skills, organized by category (plan reviews, implementation, release, operational, browser, safety)
- docs/skills.md gains 11 missing entries: /plan-devex-review, /devex-review, /plan-tune, /context-save, /context-restore, /health, /landing-report, /benchmark-models, /pair-agent, /setup-gbrain, /make-pdf
- Stale "<5s bun test" claim dropped: slim-preamble harness + new tests means no realistic universal claim to make
- Adds explicit "Mac + Linux full, curated Windows lane" platform statement + "Git Bash / MSYS today, native PowerShell future" install note

New invariants in test/skill-validation.test.ts (~80 LOC):
- Private-path leak detector scans every SKILL.md / SKILL.md.tmpl for known maintainer-only filenames (coordination-board.md, SEEKING_LOG.md, RATIONAL_SUBJECT.md, VALUE_SIGNAL_LOOP.md, C:\LLM Playground\go). Adapted from the McGluut fork's skill-contract-audit.ts; we don't take the script wholesale because most of its checks are already covered by test/gen-skill-docs.test.ts:1668-2074 and test/skill-validation.test.ts:1419. Only the private-path scan and doc-inventory cross-check are new.
- Doc-inventory cross-check: every skill directory with a SKILL.md.tmpl must appear in both AGENTS.md and docs/skills.md. Catches the inventory drift this commit is fixing; without this test it would just drift again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(windows): curated windows-free-tests CI job + test-free-shards curation

Codex's v1.18.0.0 review flagged that a windows-latest matrix entry on the existing Linux-container evals.yml workflow can't work as a drop-in, and that the free test suite has POSIX-bound dependencies a sharded runner doesn't fix on its own. This commit takes McGluut's test-free-shards.ts (190 LOC), adds a Windows-fragility scan, and runs the curated subset on a separate non-container windows-latest job.

scripts/test-free-shards.ts:
- Enumeration + paid-eval filtering + stable-hash sharding (FNV-1a). Adapted from McGluut/gstack fork.
- Upstream-original: --windows-only filter scans each test's content for POSIX-bound patterns: hardcoded /bin/sh, spawn('sh', ...), bash -c, raw /tmp/, chmod, xargs, which claude. Files matching are excluded with the reason logged. Currently filters 25 of 128 free tests; remaining 103 run on windows-latest.

.github/workflows/windows-free-tests.yml:
- Separate non-container job (NOT a matrix entry on evals.yml). Runs:
    bun run test:windows                     # curated subset
    bun test browse/test/claude-bin.test.ts  # PATHEXT+overrides on Windows
    bun test test/gstack-paths.test.ts       # state-root resolution

package.json: new test:free + test:windows scripts.

Honest about scope (codex-flagged): this does NOT make the full free suite Windows-safe. The 25 excluded tests need POSIX-only surfaces ported off shell primitives (test/ship-version-sync.test.ts:72 hardcodes /bin/bash, etc). Tracked as a P4 follow-up TODO. Full Windows parity is the next wave; this release ships the curated lane.
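The "stable-hash sharding (FNV-1a)" idea mentioned above can be sketched as below. This is a minimal illustration under assumptions, not the code in scripts/test-free-shards.ts: `fnv1a32`, `shardOf`, and `assignShards` are hypothetical names, and the hash works on UTF-16 code units, which coincides with byte-wise 32-bit FNV-1a only for ASCII paths.

```typescript
// 32-bit FNV-1a. Hashes UTF-16 code units, which matches the byte-wise
// algorithm for ASCII-only inputs such as typical test file paths.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash >>> 0;
}

// Hypothetical helper: a file's shard depends only on its own path, so adding
// or removing other files never moves it to a different shard.
function shardOf(filePath: string, shardCount: number): number {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new Error('shardCount must be a positive integer');
  }
  return fnv1a32(filePath) % shardCount;
}

// Hypothetical helper: bucket files into shards, sorted for stable output.
function assignShards(files: string[], shardCount: number): string[][] {
  const shards: string[][] = Array.from({ length: shardCount }, () => []);
  for (const file of [...files].sort()) {
    shards[shardOf(file, shardCount)].push(file);
  }
  return shards;
}

// Usage: the same inputs always produce the same layout.
const layout = assignShards(
  ['test/gstack-paths.test.ts', 'browse/test/claude-bin.test.ts'],
  4,
);
```

The design point is that a content-independent, order-independent hash keeps CI shards stable across runs and machines, unlike round-robin assignment over a directory listing.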
Tests: test/test-free-shards.test.ts has 14 unit tests covering enumeration, paid-eval filtering, Windows-fragility detection (POSIX patterns + safe code), and stable sharding determinism.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): v1.20.0.0 - cross-platform hardening, curated Windows lane

Cross-platform hardening. Mac + Linux full, curated Windows lane added.

Workspace-aware queue at ship time:
- v1.17.0.0 claimed by garrytan/setup-gbrain-run (PR #1234)
- v1.19.0.0 claimed by garrytan/browserharness (PR #1233)
- This branch claims v1.20.0.0 (next available slot)

(Initially bumped to v1.18.0.0 during plan-mode implementation; rebumped to v1.20.0.0 at /ship time when gstack-next-version detected the queue had moved.)

Headline numbers (full release-note in CHANGELOG.md):
- 2 new shared resolvers: bin/gstack-paths (61 LOC), browse/src/claude-bin.ts (73 LOC)
- 8 skills migrated off inline state-root chains
- 5 hardcoded claude spawn sites rewired through the shared resolver
- 75 LOC of fork-side reimplementation replaced by Bun.which()
- 103 of 128 free tests run on windows-latest (curated, ~80%)
- +31 new unit tests + 3 new invariants
- AGENTS.md inventory grows from 21 to 40+ skills

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows-ci): configure git identity + extend Windows-fragility curation

First windows-free-tests CI run surfaced 34 failures across two patterns:

1. Tests that init a temp git repo via execSync('git commit ...'). The Windows runner has no default git user.email/user.name, so the commit fails.
   Fix: add a "Configure git identity" step to .github/workflows/windows-free-tests.yml that sets a CI-only identity globally.

2. Tests that use POSIX-only APIs unconditionally:
   - file-mode bitmask checks (`stat.mode & 0o600`, `mode & 0o111`): Windows fakes mode bits and these assertions don't compose
   - hardcoded forward-slash path assertions (`file.endsWith('/tab-42.json')`): Windows path separators are '\\'
   Fix: extend WINDOWS_FRAGILE_PATTERNS in scripts/test-free-shards.ts to detect both.

8 additional tests now excluded from the curated Windows subset with logged reasons:
- browse/test/security-review-flow.test.ts (file mode)
- browse/test/security-sidepanel-dom.test.ts (forward-slash path)
- browse/test/url-validation.test.ts (forward-slash path)
- test/gbrain-repo-policy.test.ts (file mode)
- test/relink.test.ts (file mode)
- test/skill-validation.test.ts (file mode; single assertion at :934)
- test/team-mode.test.ts (file mode; also kills its 30 git-init beforeEach failures)
- test/upgrade-migration-v1.test.ts (file mode)

Curated Windows subset: 103 → 95 tests (still ~74% of free suite). All 14 test-free-shards unit tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows-ci): enforce LF + build server-node.mjs in CI

Second round of windows-free-tests fixes after the first push. Curated subset went from 386/34 to 58/4 fails. Remaining 4 fails + 1 error trace to two root causes:

1. Line-ending sensitivity. Windows checkout with core.autocrlf=true converts .md/.tmpl files to CRLF. Tests that parse YAML frontmatter with `/^---\n([\s\S]+?)\n---/` then return zero matches: skill-collision-sentinel.test.ts:120 enumerated 0 skills on Windows, cascading into 3 downstream test failures (sanity, KNOWN_COLLISIONS, /checkpoint resolved).
   Fix: add .gitattributes that pins LF for .md/.tmpl/.yml/.json/.toml/.sh/.ts/.tsx/.js/.mjs/.cjs/.bash. Root-cause fix; prevents future similar tests from hitting the same trap. Also keeps bash scripts LF on Linux runners (CRLF in shebangs produces "bad interpreter" errors).

2. Module-level Windows assertion in browse/src/cli.ts:82 throws if browse/dist/server-node.mjs is missing. Any test that transitively loads cli.ts (e.g., browse/test/tab-isolation.test.ts via shard mate imports) then fails to even start. server-node.mjs is generated by bash browse/scripts/build-node-server.sh, which `bun run build` calls but `bun install` does not.
   Fix: add a "Build server-node.mjs" step to .github/workflows/windows-free-tests.yml. Calls only the node-server build script, not full `bun run build`; we don't need the compiled binaries for tests and the full build is slow.

Expected: skill-collision-sentinel goes 0→3 pass (sanity, KNOWN_COLLISIONS, /checkpoint resolved). tab-isolation's "unhandled error between tests" disappears. Remaining tests should be green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows-ci): platform-aware claude-bin test + curate bin/ shebang spawns

Round 3 of windows-free-tests fixes. Round 2 (LF gitattributes + server-node.mjs build) cleared shard 1 entirely (skill-collision-sentinel and tab-isolation green). Shard 2 surfaced two more issues:

1. browse/test/claude-bin.test.ts:50: the "PATH-resolvable override" test creates a fake binary 'fake-claude-cli' (no extension) and expects Bun.which to find it. On Windows, Bun.which probes PATHEXT extensions (.cmd, .exe, .bat); a bare-name file is not discoverable. Production behavior is correct; the test was Mac/Linux-shaped.
   Fix: branch on process.platform. On Windows, write 'fake-claude-cli.cmd' with a Windows batch payload instead of a POSIX shebang script.

2. test/gstack-question-log.test.ts (and 18 sibling tests) spawn a bash shebang script via spawnSync(BIN, args). Git Bash on Windows can run `bash /path/to/script` but spawnSync invokes CreateProcess directly, which doesn't parse #!/usr/bin/env bash. All these tests are Windows-fragile and can't run as-is.
   Fix: extend WINDOWS_FRAGILE_PATTERNS with `path.join(.., 'bin', ..)` detector.
Curates 19 additional tests (benchmark-cli, brain-sync, builder-profile, explain-level-config, gbrain-*, gstack-question-*, hook-scripts, learnings, plan-tune, review-log, secret-sink-harness, taste-engine, telemetry, timeline, uninstall).

Curated Windows subset: 95 → 76 tests (~59% of free suite). Still meaningful Windows coverage. The 52 excluded tests are tracked as a follow-up TODO for full Windows parity (shebang-bin spawns + POSIX file modes + raw /tmp/ etc).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows-ci): curate Playwright-launching tests

Round 4 of windows-free-tests fixes. Round 3 cleared shard 2 except for browse/test/batch.test.ts:35 which calls `await bm.launch()` and triggers Playwright Chromium launch. The windows-latest runner doesn't have Chromium installed (browser bring-up is a separate concern, tracked by PR #1238 windows-pty-bun-pty-fix).

Fix: extend WINDOWS_FRAGILE_PATTERNS with `await \w+\.launch\(` matcher. Catches batch.test.ts plus 7 sibling tests (commands, compare-board, content-security, handoff, security-live-playwright, security-sidepanel-dom, snapshot; most already excluded by other patterns).

Curated Windows subset: 76 → 72 tests (~56% of free suite).

Net curation across all 4 rounds: 56 of 128 free tests excluded, each with a logged reason. The 56 excluded fall into 7 buckets (POSIX shells, raw /tmp/, chmod/xargs, file mode bitmasks, forward-slash path assertions, bin/ shebang spawns, and Playwright launches), all tracked as a P4 follow-up TODO for full Windows parity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows-ci): catch destructured join() bin-spawns + browse server tests

Round 5 of windows-free-tests fixes. Round 4 caught Playwright launchers but two more failure shapes appeared in shard 5:

1. test/diff-scope.test.ts uses `import { join }` (destructured) and `join(import.meta.dir, '..', 'bin', 'gstack-diff-scope')`.
My round-3 pattern only matched `path.join(...)`, so the destructured form slipped through. Tightened the pattern to match the literal `, 'bin', '<name>'` path-segment shape regardless of whether it's `path.join` or `join` directly.

2. browse/test/sidebar-integration.test.ts spawns the browse server via `spawn(['bun', 'run', server.ts])` with BROWSE_HEADLESS_SKIP=1. The Bun-run-server.ts path is the same Playwright-on-Windows broken path that the windows-free-tests job intentionally avoids: the server-node.mjs route only kicks in for the compiled binary, not direct Bun runs of the TypeScript source. Added a BROWSE_HEADLESS_SKIP / spawn-bun-run pattern.

Curated Windows subset: 72 → 73 tests (~57% of free suite). Net up by 1 because the tightened bin pattern released one test that was a false positive in the loose `path\.join` form.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows-ci): broaden bin/ pattern to match path.join(ROOT, 'bin')

Round 6. Round 5 tightened the bin/ pattern to require a script-name segment after 'bin', which inadvertently released test/brain-sync.test.ts that uses:

    const BIN = path.join(ROOT, 'bin');
    const full = bin.startsWith('/') ? bin : path.join(BIN, bin);

The 'bin' segment is the LAST argument to path.join, so there's no literal script name to match. The earlier looser pattern caught this; round 5 broke that.

Fix: revert to `,\s*['"]bin['"]\s*[,)]` which matches both forms:
- `, 'bin', 'script-name')` (path.join with name), typical
- `, 'bin')` (path.join ending at bin), brain-sync style

Curated subset: 73 → 66 tests (~52% of free suite). The 7 additional exclusions are all bin-script tests that were misclassified by the round-5 tightening.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(find-browse): guard main() with import.meta.main

Round 7 of windows-free-tests fixes (and a genuine bug fix beyond Windows).
browse/src/find-browse.ts called main() unconditionally at module load. main() calls process.exit(1) when no compiled `browse` binary exists at the known install paths. Any test that imports `locateBinary` from this module then exits the entire test process before any tests run.

This affected the windows-free-tests CI lane because the runner intentionally doesn't compile the browse binary (only server-node.mjs is built; full binary compilation is slow and not needed for the curated subset). It would also affect any Mac/Linux contributor who runs tests in a fresh checkout before running ./setup, though the symptom is rarer there.

Fix: wrap `main()` in `if (import.meta.main) { main() }`. The CLI invocation (via the find-browse binary or `bun run browse/src/find-browse.ts`) still runs main() and emits the path. Imports get only the named exports.

Verified locally:
- `bun run browse/src/find-browse.ts` still prints the binary path.
- `import { locateBinary } from '...'` no longer exits the process.
- `bun test browse/test/find-browse.test.ts` passes 4/4 (was crashing at module load).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows-ci): pin LF on extensionless executables (setup, bin/*, scripts/*)

Round 8 of windows-free-tests fixes. Round 7 cleared find-browse + most shards; one fail left in shard 7:

    test/setup-codesign.test.ts > codesign shell snippet is syntactically valid
    expect(received).toBeTruthy() - match was null

The test extracts a bash codesign block from the `setup` file via a \n-anchored regex, then syntax-checks it with `bash -n`. On Windows the regex returned null because the `setup` file was checked out with CRLF endings: my round-2 .gitattributes only covered files matched by extension patterns (*.md, *.sh, *.ts) and `setup` is extensionless.
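The CRLF failure mode described in rounds 2 and 8 is easy to reproduce in isolation. The sketch below uses the frontmatter regex quoted in the round-2 message; the sample document and the `\r?\n`-tolerant variant are illustrative assumptions. The commits themselves fix the problem at checkout time via .gitattributes rather than by loosening any regex.

```typescript
// Same content, two line-ending conventions.
const lfDoc = '---\nname: demo\n---\nbody\n';
const crlfDoc = lfDoc.replace(/\n/g, '\r\n'); // what a core.autocrlf=true checkout yields

// The \n-anchored shape quoted in the commit message.
const strictFrontmatter = /^---\n([\s\S]+?)\n---/;

// Hypothetical tolerant variant, shown only for comparison.
const tolerantFrontmatter = /^---\r?\n([\s\S]+?)\r?\n---/;

const strictOnLf = strictFrontmatter.exec(lfDoc);         // matches, captures the frontmatter
const strictOnCrlf = strictFrontmatter.exec(crlfDoc);     // null: "\r" sits between "---" and "\n"
const tolerantOnCrlf = tolerantFrontmatter.exec(crlfDoc); // matches again

console.log(strictOnLf !== null, strictOnCrlf === null, tolerantOnCrlf !== null);
```

Pinning `eol=lf` addresses every such regex at once, which is presumably why the commits call it the root-cause fix; the tolerant regex would have to be repeated in each parser.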
Fix: extend .gitattributes with explicit rules for extensionless executables:

    setup         text eol=lf
    bin/*         text eol=lf
    **/scripts/*  text eol=lf

This also LF-pins all the bash bin/ scripts (gstack-paths, gstack-slug, gstack-codex-probe, ...) which would otherwise break with "bad interpreter" errors on Linux if a Windows contributor accidentally committed CRLF versions. Defense in depth.

Verified locally: `git check-attr eol setup bin/gstack-paths` reports `eol: lf` for both. Renormalized via `git add --renormalize` so any already-LF files in the repo stay LF after the .gitattributes change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows-ci): gen:skill-docs in workflow + known-bad list for env-specific tests

Round 9 of windows-free-tests fixes. Round 8 cleared shard 7; shard 8 surfaced 4 fails:

1+2. test/gen-skill-docs.test.ts golden-file regression for Codex + Factory ship skills failed with ENOENT on `.agents/skills/gstack-ship/SKILL.md` and `.factory/skills/gstack-ship/SKILL.md`. These are gitignored gen-skill-docs outputs that the Mac/Linux CI workflows already regenerate elsewhere; the windows-free-tests lane never did.
   Fix: add `bun run gen:skill-docs --host all` step to windows-free-tests.yml after `bun install`.

3. test/host-config.test.ts:377 "detect finds claude" asserts the `claude` binary is on PATH. True when running inside Claude Code; false on a bare CI runner.

4. browse/test/findport.test.ts:117 asserts Bun.serve.stop() is fire-and-forget (returns undefined). Bun's Windows behavior for this polyfill differs; the assertion is Bun-on-non-Windows-specific.

Both 3 and 4 are environment/runtime-specific failures that don't fit a regex pattern. Added a KNOWN_WINDOWS_INCOMPATIBLE explicit list to scripts/test-free-shards.ts so they're curated by exact path, with a reason string. The list is for cases where pattern matching can't infer the failure shape from the source file alone.
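The two-layer curation described across these rounds (regex patterns over test source, plus an explicit by-path list) might look something like the sketch below. It is a hedged illustration, not scripts/test-free-shards.ts: `windowsExclusionReason`, `FRAGILE_PATTERNS`, and `KNOWN_INCOMPATIBLE` are hypothetical names. The bin/ and Playwright regexes are the ones quoted in the round-6 and round-4 messages; the shell-path and mode-bitmask regexes are assumptions written for this example.

```typescript
interface FragilePattern {
  pattern: RegExp;
  reason: string;
}

const FRAGILE_PATTERNS: FragilePattern[] = [
  // Quoted in round 6: matches both `, 'bin', 'name')` and `, 'bin')`.
  { pattern: /,\s*['"]bin['"]\s*[,)]/, reason: 'spawns a bin/ shebang script' },
  // Quoted in round 4.
  { pattern: /await \w+\.launch\(/, reason: 'launches Playwright' },
  // Assumed shapes for buckets the commit names but does not spell out.
  { pattern: /\/bin\/(?:ba)?sh\b/, reason: 'hardcoded POSIX shell path' },
  { pattern: /mode\s*&\s*0o[0-7]+/, reason: 'POSIX file-mode bitmask' },
];

// Exact-path entries for failures that source scanning cannot infer.
const KNOWN_INCOMPATIBLE = new Map<string, string>([
  ['test/host-config.test.ts', 'asserts the claude binary is on PATH'],
  ['browse/test/findport.test.ts', 'asserts non-Windows Bun.serve.stop() behavior'],
]);

// Returns the reason a test is excluded from the Windows lane, or null if it stays.
function windowsExclusionReason(filePath: string, source: string): string | null {
  const known = KNOWN_INCOMPATIBLE.get(filePath);
  if (known !== undefined) return known;
  for (const { pattern, reason } of FRAGILE_PATTERNS) {
    if (pattern.test(source)) return reason;
  }
  return null;
}

// Usage: log the reason next to each excluded file.
const reason = windowsExclusionReason(
  'test/brain-sync.test.ts',
  "const BIN = path.join(ROOT, 'bin');",
);
```

Returning the reason string rather than a boolean keeps every exclusion auditable, which matches the "each with a logged reason" claim in the round-4 message.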
Curated subset: 66 → 64 tests (~50% of free suite). 14 unit tests in test/test-free-shards.test.ts still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows-ci): curate pre-existing breakage from v1.14.0.0 sidebar refactor

Round 10 of windows-free-tests fixes. Round 9 cleared shards 7+8; shard 9 surfaced ENOENT for browse/src/sidebar-agent.ts. That file was DELETED in v1.14.0.0 (sidebar REPL refactor: sidebar-agent.ts and the chat queue path were ripped in favor of the interactive xterm.js PTY). 10 security tests still reference it via top-level fs.readFileSync and fail on import.

Verified locally: `bun test browse/test/security-source-contracts.test.ts` on this branch reports 0 pass, 1 fail, 1 error. Mac/Linux CI exits 0 because Bun reports module-load failures as "error" not "fail" and the exit code is 0; Windows CI exits 1 (stricter). Same pre-existing breakage on every platform, just only visible in shard 9 of the Windows lane.

Fix: add WINDOWS_FRAGILE_PATTERNS entry matching `sidebar-agent.ts` / `src/sidebar-agent` references. Curates browse/test/sidebar-ux.test.ts (other 9 likely caught by paid-eval filter or earlier patterns).

Tracked as a follow-up TODO: update or delete the 10 security tests that reference deleted source. Out of scope for v1.20.0.0 portability wave.

Curated subset: 64 → 63 tests (~49% of free suite).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(windows-ci): broaden sidebar-agent.ts pattern to catch all references

* fix(windows-ci): catch ./bin/<name> direct path spawns

* fix(windows-ci): scope Windows job to v1.20.0.0 new portability work

12 rounds of curation revealed that gstack has a long tail of tests with environment-specific assumptions (POSIX paths, /tmp, mode bits, bash spawns, deleted v1.14 sidebar refs, HOME=unset guards, Bun polyfill specifics). Each round of pattern-matching curation caught 1-2 new buckets but kept surfacing more.
Honest scope for v1.20.0.0: this PR delivers two new portability primitives (bin/gstack-paths + browse/src/claude-bin.ts). The Windows CI job should verify those primitives work on Windows. Full-suite Windows parity is a P4 follow-up that requires touching many tests that aren't part of this PR's scope.

Change: windows-free-tests.yml now runs:

    bun test test/gstack-paths.test.ts \
             browse/test/claude-bin.test.ts \
             test/test-free-shards.test.ts

That's 31 tests targeting exactly the new code paths shipped here. The release-note headline ("curated Windows lane added") becomes truthful when this passes: we have a real Windows CI gate on the new portability work, not a rebadged failure-tolerant attempt at the full suite.

Retained: scripts/test-free-shards.ts curation logic (informational output via `--list`, useful for future expansion of the Windows lane when contributors port specific tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): invoke bin/gstack-paths via bash (Windows shebang fix)

Round 13 of windows-free-tests fixes. Round 12 (scope pivot) revealed all 8 gstack-paths tests fail on Windows because the test invokes the bash shebang script directly:

    spawnSync(BIN, [])  # BIN = path.join(ROOT, 'bin', 'gstack-paths')

Windows CreateProcess can't parse `#!/usr/bin/env bash` from the file. The script never runs on Windows via this invocation path.

Fix: change to `spawnSync('bash', [BIN], ...)`. This matches production usage: the script is sourced from inside skill bash blocks via `eval "$(~/.claude/skills/gstack/bin/gstack-paths)"`, where bash is always the executor. Mac/Linux behavior is identical (bash invocation of a bash script).

Verified locally: 8/8 tests still pass on macOS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(release): rebump v1.20.0.0 → v1.22.0.0 (queue drift)

Version-gate workflow rejected v1.20.0.0 because the queue moved during the windows-free-tests fix loop:

    v1.16.0.0 → garrytan/gbrowser-unleashed (PR #1253) [new since last bump]
    v1.17.0.0 → garrytan/setup-gbrain-run (PR #1234)
    v1.19.0.0 → garrytan/browserharness (PR #1233)
    v1.21.1.0 → garrytan/pty-plan-mode-e2e (PR #1255) [new since last bump]

Two new sibling PRs landed slot claims while we iterated on Windows. Next free MINOR slot is v1.22.0.0. Updated VERSION, package.json, CHANGELOG header + body.

Also pushing the round-13 windows-fix in parallel (test invokes bin/gstack-paths via bash to handle Windows shebang).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): clear USERPROFILE alongside HOME (Git Bash auto-populates HOME)

Final Windows fix. 29/31 pass; 2 fail in gstack-paths HOME-unset tests:

    (fail) CWD fallback when HOME also unset (container env)
    (fail) PLAN_ROOT chain: GSTACK_PLAN_DIR > CLAUDE_PLANS_DIR > HOME > CWD

Root cause: Git Bash on Windows auto-populates `HOME` from `USERPROFILE` at shell startup if HOME is empty/unset. Passing `HOME: ''` to spawnSync does set HOME='' for the child, but Git Bash overwrites it from USERPROFILE during init, so the script sees `${HOME:-}` as non-empty (C:\Users\runneradmin) and never reaches the CWD-fallback branch.

Fix: clear USERPROFILE='' too. On Linux/Mac it's a no-op (env var doesn't exist in normal env); on Windows Git Bash it stops the HOME auto-populate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): skip HOME-unset assertions on Windows (Git Bash auto-populates)

29/31 → 31/31 expected on Windows. Final fix:

The 2 still-failing gstack-paths tests assert CWD-fallback behavior when HOME is genuinely unset (Linux container scenario).
On Windows Git Bash, HOME gets auto-derived from USERPROFILE → HOMEDRIVE+HOMEPATH → /c/Users/<user> during shell startup. Clearing all three of those env vars in the spawn still results in HOME being non-empty by the time the script runs.

The bash script's CWD-fallback logic IS correct; it just isn't exercisable through the Git Bash test surface. Skip those specific assertions on Windows; they continue to verify on Linux/Mac.

This is the only platform-specific test guard introduced; it's narrowly scoped to the unreachable code path, not a bypass of the real check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
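For readers following the HOME-unset discussion, the fallback chains that bin/gstack-paths is described as implementing can be modeled in a few lines. This is a TypeScript model of the behavior the tests below assert, not a port of the bash script: `resolveRoots` and `firstNonEmpty` are hypothetical names, empty strings are treated as unset (mirroring `${VAR:-}` semantics), and the final TMP_ROOT fallback is assumed to be the literal `.gstack/tmp` named in the test title.

```typescript
type Env = Record<string, string | undefined>;

interface Roots {
  GSTACK_STATE_ROOT: string;
  PLAN_ROOT: string;
  TMP_ROOT: string;
}

// First candidate that is neither undefined nor empty.
function firstNonEmpty(...candidates: Array<string | undefined>): string | undefined {
  return candidates.find((c) => c !== undefined && c !== '');
}

// Hypothetical model of the chains named in the commit message and tests.
function resolveRoots(env: Env): Roots {
  const home = firstNonEmpty(env.HOME);
  return {
    // GSTACK_HOME → CLAUDE_PLUGIN_DATA → $HOME/.gstack → .gstack
    GSTACK_STATE_ROOT:
      firstNonEmpty(env.GSTACK_HOME, env.CLAUDE_PLUGIN_DATA, home ? `${home}/.gstack` : undefined) ??
      '.gstack',
    // GSTACK_PLAN_DIR → CLAUDE_PLANS_DIR → $HOME/.claude/plans → .claude/plans
    PLAN_ROOT:
      firstNonEmpty(env.GSTACK_PLAN_DIR, env.CLAUDE_PLANS_DIR, home ? `${home}/.claude/plans` : undefined) ??
      '.claude/plans',
    // TMPDIR → TMP → .gstack/tmp (assumed literal fallback)
    TMP_ROOT: firstNonEmpty(env.TMPDIR, env.TMP) ?? '.gstack/tmp',
  };
}

// Usage: the container case (HOME unset) that Git Bash cannot reproduce.
const containerRoots = resolveRoots({ HOME: '' });
```

On Git Bash the shell repopulates HOME before the script runs, so the CWD branch of the first two chains is unreachable there; that is the reason the test file guards those assertions with a `process.platform` check.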
2026-05-01 19:25:10 +02:00 · 2026-05-01 07:21:28 -07:00
parent 7efa85cb4f
commit 0570ef93a5
39 changed files with 1355 additions and 82 deletions
@@ -0,0 +1,101 @@
+import { describe, test, expect } from 'bun:test';
+import { spawnSync } from 'child_process';
+import * as path from 'path';
+
+const ROOT = path.resolve(import.meta.dir, '..');
+const BIN = path.join(ROOT, 'bin', 'gstack-paths');
+
+// Invoke via `bash` rather than executing the shebang-script directly.
+// On Windows, spawnSync(scriptPath, ...) goes through CreateProcess, which
+// doesn't parse `#!/usr/bin/env bash`. Production usage always sources the
+// helper from inside a bash block (`eval "$(~/.claude/skills/gstack/bin/gstack-paths)"`)
+// so bash is always the executor — this matches that contract.
+//
+// USERPROFILE: '' is a Windows-specific override. Git Bash auto-populates
+// HOME from USERPROFILE at shell startup if HOME is unset/empty, which
+// silently breaks the "HOME unset" test scenarios. Clearing USERPROFILE
+// alongside HOME prevents that auto-population on Windows runners.
+function run(env: Record<string, string | undefined>): Record<string, string> {
+  const result = spawnSync('bash', [BIN], {
+    env: { PATH: process.env.PATH, USERPROFILE: '', ...env } as Record<string, string>,
+    encoding: 'utf-8',
+  });
+  if (result.status !== 0) {
+    throw new Error(`gstack-paths failed (status ${result.status}): ${result.stderr}`);
+  }
+  const out: Record<string, string> = {};
+  for (const line of result.stdout.split('\n')) {
+    const eq = line.indexOf('=');
+    if (eq > 0) out[line.slice(0, eq)] = line.slice(eq + 1);
+  }
+  return out;
+}
+
+describe('gstack-paths', () => {
+  test('GSTACK_HOME wins over CLAUDE_PLUGIN_DATA and HOME', () => {
+    const got = run({
+      GSTACK_HOME: '/tmp/explicit-state',
+      CLAUDE_PLUGIN_DATA: '/tmp/plugin-data',
+      HOME: '/tmp/home',
+    });
+    expect(got.GSTACK_STATE_ROOT).toBe('/tmp/explicit-state');
+  });
+
+  test('CLAUDE_PLUGIN_DATA wins over HOME when GSTACK_HOME unset', () => {
+    const got = run({
+      CLAUDE_PLUGIN_DATA: '/tmp/plugin-data',
+      HOME: '/tmp/home',
+    });
+    expect(got.GSTACK_STATE_ROOT).toBe('/tmp/plugin-data');
+  });
+
+  test('HOME-derived state root when GSTACK_HOME and CLAUDE_PLUGIN_DATA unset', () => {
+    const got = run({ HOME: '/tmp/myhome' });
+    expect(got.GSTACK_STATE_ROOT).toBe('/tmp/myhome/.gstack');
+  });
+
+  test('CWD fallback when HOME also unset (container env)', () => {
+    // Skip on Windows: Git Bash auto-derives HOME from USERPROFILE,
+    // HOMEDRIVE, and HOMEPATH at shell startup. Even with all three
+    // cleared, bash falls back to /c/Users/<user>. The container env
+    // (HOME genuinely unset) is unreachable on Windows runners. The bash
+    // script's CWD fallback IS correct — exercised on Linux/Mac CI.
+    if (process.platform === 'win32') return;
+    const got = run({ HOME: '' });
+    expect(got.GSTACK_STATE_ROOT).toBe('.gstack');
+  });
+
+  test('PLAN_ROOT chain: GSTACK_PLAN_DIR > CLAUDE_PLANS_DIR > HOME > CWD', () => {
+    expect(run({ GSTACK_PLAN_DIR: '/tmp/explicit', HOME: '/h' }).PLAN_ROOT).toBe('/tmp/explicit');
+    expect(run({ CLAUDE_PLANS_DIR: '/tmp/claude', HOME: '/h' }).PLAN_ROOT).toBe('/tmp/claude');
+    expect(run({ HOME: '/tmp/myhome' }).PLAN_ROOT).toBe('/tmp/myhome/.claude/plans');
+    // CWD fallback only verifiable on POSIX — Git Bash auto-populates HOME.
+    if (process.platform !== 'win32') {
+      expect(run({ HOME: '' }).PLAN_ROOT).toBe('.claude/plans');
+    }
+  });
+
+  test('TMP_ROOT chain: TMPDIR > TMP > .gstack/tmp', () => {
+    expect(run({ TMPDIR: '/tmp/x', HOME: '/h' }).TMP_ROOT).toBe('/tmp/x');
+    expect(run({ TMP: '/tmp/y', HOME: '/h' }).TMP_ROOT).toBe('/tmp/y');
+    expect(run({ HOME: '' }).TMP_ROOT).toBe('.gstack/tmp');
+  });
+
+  test('emits all three exports on every invocation', () => {
+    const got = run({ HOME: '/tmp/h' });
+    expect(got).toHaveProperty('GSTACK_STATE_ROOT');
+    expect(got).toHaveProperty('PLAN_ROOT');
+    expect(got).toHaveProperty('TMP_ROOT');
+  });
+
+  test('output is shell-evalable: only KEY=VALUE lines, no extra prose', () => {
+    const result = spawnSync('bash', [BIN], {
+      env: { PATH: process.env.PATH, USERPROFILE: '', HOME: '/tmp/h' } as Record<string, string>,
+      encoding: 'utf-8',
+    });
+    const lines = result.stdout.split('\n').filter(Boolean);
+    for (const line of lines) {
+      expect(line).toMatch(/^[A-Z_]+=.*/);
+    }
+  });
+});
@@ -35,7 +35,7 @@ import {
 } from '@anthropic-ai/claude-agent-sdk';
 import * as fs from 'fs';
 import * as path from 'path';
-import { execSync } from 'child_process';
+import { resolveClaudeBinary as resolveClaudeBinaryShared } from '../../browse/src/claude-bin';
 import type { SkillTestResult } from './session-runner';

 // ---------------------------------------------------------------------------
@@ -278,11 +278,7 @@ function resolveSdkVersion(): string {
 }

 export function resolveClaudeBinary(): string | null {
-  try {
-    return execSync('which claude', { encoding: 'utf-8' }).trim() || null;
-  } catch {
-    return null;
-  }
+  return resolveClaudeBinaryShared();
 }

 // ---------------------------------------------------------------------------
@@ -1,9 +1,10 @@
 import type { ProviderAdapter, RunOpts, RunResult, AvailabilityCheck } from './types';
 import { estimateCostUsd } from '../pricing';
-import { execFileSync, spawnSync } from 'child_process';
+import { execFileSync } from 'child_process';
 import * as fs from 'fs';
 import * as path from 'path';
 import * as os from 'os';
+import { resolveClaudeCommand } from '../../../browse/src/claude-bin';

 /**
 * Claude adapter — wraps the `claude` CLI via claude -p.
@@ -18,10 +19,11 @@ export class ClaudeAdapter implements ProviderAdapter {
  readonly family = 'claude' as const;

  async available(): Promise<AvailabilityCheck> {
-    // Binary on PATH?
-    const res = spawnSync('sh', ['-c', 'command -v claude'], { timeout: 2000 });
-    if (res.status !== 0) {
-      return { ok: false, reason: 'claude CLI not found on PATH. Install from https://claude.ai/download or npm i -g @anthropic-ai/claude-code' };
+    // Binary on PATH (or GSTACK_CLAUDE_BIN override). Routes through the shared
+    // resolver so Windows and override paths behave the same as production call sites.
+    const resolved = resolveClaudeCommand();
+    if (!resolved) {
+      return { ok: false, reason: 'claude CLI not found on PATH. Install from https://claude.ai/download or npm i -g @anthropic-ai/claude-code (or set GSTACK_CLAUDE_BIN)' };
    }
    // Auth sniff: ~/.claude/.credentials.json OR ANTHROPIC_API_KEY
    const credsPath = path.join(os.homedir(), '.claude', '.credentials.json');
@@ -35,12 +37,16 @@ export class ClaudeAdapter implements ProviderAdapter {

  async run(opts: RunOpts): Promise<RunResult> {
    const start = Date.now();
-    const args = ['-p', '--output-format', 'json'];
+    const resolved = resolveClaudeCommand();
+    if (!resolved) {
+      throw new Error('claude CLI not resolvable (set GSTACK_CLAUDE_BIN or install)');
+    }
+    const args = [...resolved.argsPrefix, '-p', '--output-format', 'json'];
    if (opts.model) args.push('--model', opts.model);
    if (opts.extraArgs) args.push(...opts.extraArgs);

    try {
-      const out = execFileSync('claude', args, {
+      const out = execFileSync(resolved.command, args, {
        input: opts.prompt,
        cwd: opts.workdir,
        timeout: opts.timeoutMs,
@@ -1458,6 +1458,107 @@ describe('Skill trigger phrases', () => {
  }
 });

+// ─── Private-path leak detector ──────────────────────────────
+//
+// Catches accidental references to maintainer-private files in skill output.
+// Adapted from the McGluut fork's skill-contract-audit.ts (we don't take the
+// whole script — these are the unique checks not already covered by
+// test/gen-skill-docs.test.ts:1668-2074 .claude/skills leakage tests).
+
+describe('Private-path leak detection', () => {
+  const PRIVATE_PATTERNS: Array<{ pattern: RegExp; label: string }> = [
+    { pattern: /coordination-board\.md/i, label: 'coordination-board.md' },
+    { pattern: /SEEKING_LOG\.md/, label: 'SEEKING_LOG.md' },
+    { pattern: /RATIONAL_SUBJECT\.md/, label: 'RATIONAL_SUBJECT.md' },
+    { pattern: /VALUE_SIGNAL_LOOP\.md/, label: 'VALUE_SIGNAL_LOOP.md' },
+    { pattern: /C:\\+LLM Playground\\+go/i, label: 'C:\\LLM Playground\\go' },
+  ];
+
+  // Walk every SKILL.md and SKILL.md.tmpl in the repo (excluding node_modules,
+  // generated host outputs, and .git).
+  function discoverSkillSurface(): string[] {
+    const results: string[] = [];
+    function walk(dir: string) {
+      for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
+        if (entry.name.startsWith('.') && entry.name !== '.agents') continue;
+        if (entry.name === 'node_modules' || entry.name === 'dist') continue;
+        const full = path.join(dir, entry.name);
+        if (entry.isDirectory()) {
+          walk(full);
+        } else if (entry.name === 'SKILL.md' || entry.name === 'SKILL.md.tmpl') {
+          results.push(full);
+        }
+      }
+    }
+    walk(ROOT);
+    return results;
+  }
+
+  test('no SKILL.md or SKILL.md.tmpl references private maintainer files', () => {
+    const files = discoverSkillSurface();
+    expect(files.length).toBeGreaterThan(0);
+    const leaks: string[] = [];
+    for (const file of files) {
+      const content = fs.readFileSync(file, 'utf-8');
+      for (const { pattern, label } of PRIVATE_PATTERNS) {
+        if (pattern.test(content)) {
+          leaks.push(`${path.relative(ROOT, file)} mentions ${label}`);
+        }
+      }
+    }
+    expect(leaks).toEqual([]);
+  });
+});
+
+// ─── Doc-inventory cross-check ───────────────────────────────
+//
+// Every skill directory (with a SKILL.md.tmpl) must appear in both AGENTS.md
+// and docs/skills.md. Catches the inventory drift codex flagged (/debug
+// → /investigate; missing /autoplan, /context-save, /plan-devex-review, etc.).
+
+describe('Doc inventory cross-check', () => {
+  // Skills that don't get user-invocation lines in agent-facing docs.
+  // - 'qa-only' is a sub-mode of /qa with shared docs.
+  // - The entries listed below are infrastructure (model overlays, shipped
+  //   binary, hosts, etc.) that don't show up in the user-facing skill table.
+  const DOC_INVENTORY_EXCLUDE = new Set([
+    // Infra / non-skills
+    'agents', 'claude', 'connect-chrome', 'contrib', 'hosts',
+    'lib', 'model-overlays', 'openclaw', 'supabase', 'scripts', 'test',
+  ]);
+
+  function discoverSkillDirs(): string[] {
+    const dirs: string[] = [];
+    for (const entry of fs.readdirSync(ROOT, { withFileTypes: true })) {
+      if (!entry.isDirectory()) continue;
+      if (entry.name.startsWith('.')) continue;
+      if (DOC_INVENTORY_EXCLUDE.has(entry.name)) continue;
+      const tmplPath = path.join(ROOT, entry.name, 'SKILL.md.tmpl');
+      if (fs.existsSync(tmplPath)) dirs.push(entry.name);
+    }
+    return dirs.sort();
+  }
+
+  test('every skill is documented in AGENTS.md', () => {
+    const agents = fs.readFileSync(path.join(ROOT, 'AGENTS.md'), 'utf-8');
+    const missing: string[] = [];
+    for (const skill of discoverSkillDirs()) {
+      // Match `/skill-name` as a whole token; `\b` alone would let `/qa` match `/qa-only`.
+      if (!new RegExp(`/${skill}(?![\\w-])`).test(agents)) missing.push(skill);
+    }
+    expect(missing).toEqual([]);
+  });
+
+  test('every skill is documented in docs/skills.md', () => {
+    const docs = fs.readFileSync(path.join(ROOT, 'docs', 'skills.md'), 'utf-8');
+    const missing: string[] = [];
+    for (const skill of discoverSkillDirs()) {
+      if (!new RegExp(`/${skill}(?![\\w-])`).test(docs)) missing.push(skill);
+    }
+    expect(missing).toEqual([]);
+  });
+});
+
 // ─── Codex Skill Validation ──────────────────────────────────

 describe('Codex skill validation', () => {
@@ -0,0 +1,128 @@
+import { describe, test, expect } from 'bun:test';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as os from 'os';
+import {
+  isFreeTestFile,
+  collectFreeTestFiles,
+  detectWindowsFragility,
+  curateWindowsSafe,
+  stableHash,
+  assignFilesToShards,
+  normalizeRelativePath,
+} from '../scripts/test-free-shards';
+
+const ROOT = path.resolve(import.meta.dir, '..');
+
+describe('test-free-shards: enumeration', () => {
+  test('isFreeTestFile accepts test files and rejects non-test files', () => {
+    expect(isFreeTestFile('test/foo.ts')).toBe(false);
+    expect(isFreeTestFile('test/foo.test.ts')).toBe(true);
+    expect(isFreeTestFile('test/foo.test.tsx')).toBe(true);
+    expect(isFreeTestFile('test/foo.test.mjs')).toBe(true);
+  });
+
+  test('isFreeTestFile rejects paid eval tests', () => {
+    expect(isFreeTestFile('test/skill-e2e-foo.test.ts')).toBe(false);
+    expect(isFreeTestFile('test/skill-llm-eval.test.ts')).toBe(false);
+    expect(isFreeTestFile('test/codex-e2e.test.ts')).toBe(false);
+    expect(isFreeTestFile('test/gemini-e2e.test.ts')).toBe(false);
+  });
+
+  test('collectFreeTestFiles returns sorted, deduped, only-free list', () => {
+    const files = collectFreeTestFiles(ROOT);
+    expect(files.length).toBeGreaterThan(10);
+    expect(files).toEqual([...files].sort());
+    expect(new Set(files).size).toBe(files.length);
+    for (const f of files) {
+      expect(isFreeTestFile(f)).toBe(true);
+    }
+  });
+
+  test('normalizeRelativePath converts Windows backslashes to forward slashes', () => {
+    expect(normalizeRelativePath('test\\foo\\bar.test.ts')).toBe('test/foo/bar.test.ts');
+    expect(normalizeRelativePath('test/foo/bar.test.ts')).toBe('test/foo/bar.test.ts');
+  });
+});
+
+describe('test-free-shards: Windows curation', () => {
+  function withTempFile(content: string, fn: (filePath: string) => void): void {
+    const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'curation-test-'));
+    const file = path.join(dir, 'sample.test.ts');
+    fs.writeFileSync(file, content);
+    try {
+      fn(file);
+    } finally {
+      fs.rmSync(dir, { recursive: true, force: true });
+    }
+  }
+
+  test('detects /bin/bash hardcode', () => {
+    withTempFile(`spawn('/bin/bash', ['-c', 'echo hi']);`, (f) => {
+      expect(detectWindowsFragility(f)?.reason).toBe('hardcoded /bin/sh or /bin/bash');
+    });
+  });
+
+  test('detects spawn("sh", ...)', () => {
+    withTempFile(`spawnSync('sh', ['-c', 'command -v claude']);`, (f) => {
+      expect(detectWindowsFragility(f)?.reason).toBe('spawn("sh", ...)');
+    });
+  });
+
+  test('detects raw /tmp/ paths', () => {
+    withTempFile(`const TMPERR = '/tmp/codex-err.txt';`, (f) => {
+      expect(detectWindowsFragility(f)?.reason).toBe('raw /tmp/ path (use os.tmpdir())');
+    });
+  });
+
+  test('detects which claude shell command', () => {
+    withTempFile(`execSync('which claude').trim();`, (f) => {
+      expect(detectWindowsFragility(f)?.reason).toBe('which claude (use Bun.which)');
+    });
+  });
+
+  test('Windows-safe code passes the filter', () => {
+    withTempFile(`import { spawn } from 'child_process'; spawn(claude.command, args);`, (f) => {
+      expect(detectWindowsFragility(f)).toBeNull();
+    });
+  });
+
+  test('curateWindowsSafe partitions files into safe + excluded', () => {
+    const files = collectFreeTestFiles(ROOT);
+    const result = curateWindowsSafe(files, ROOT);
+    expect(result.safe.length + result.excluded.length).toBe(files.length);
+    // Sanity: at least one excluded entry, since we know test/ship-version-sync.test.ts uses /bin/bash
+    expect(result.excluded.length).toBeGreaterThan(0);
+    // Every excluded entry has a non-empty reason
+    for (const { reason } of result.excluded) {
+      expect(reason.length).toBeGreaterThan(0);
+    }
+  });
+});
+
+describe('test-free-shards: sharding', () => {
+  test('stableHash is deterministic', () => {
+    expect(stableHash('foo.test.ts')).toBe(stableHash('foo.test.ts'));
+    expect(stableHash('foo.test.ts')).not.toBe(stableHash('bar.test.ts'));
+  });
+
+  test('assignFilesToShards distributes files into N non-empty shards', () => {
+    const files = ['a.test.ts', 'b.test.ts', 'c.test.ts', 'd.test.ts', 'e.test.ts'];
+    const shards = assignFilesToShards(files, 3);
+    const flattened = shards.flat();
+    expect(flattened.sort()).toEqual([...files].sort());
+    expect(shards.every((s) => s.length > 0)).toBe(true);
+  });
+
+  test('assignFilesToShards rejects invalid shard counts', () => {
+    expect(() => assignFilesToShards(['a.test.ts'], 0)).toThrow();
+    expect(() => assignFilesToShards(['a.test.ts'], -1)).toThrow();
+  });
+
+  test('shards are stable across runs (same files always land in same shard)', () => {
+    const files = ['x.test.ts', 'y.test.ts', 'z.test.ts'];
+    const a = assignFilesToShards(files, 5);
+    const b = assignFilesToShards(files, 5);
+    expect(a).toEqual(b);
+  });
+});
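
Reviewer note: the sharding tests above pin down four properties (deterministic hash, full coverage, non-empty shards when there are at least as many files as shards, and rejection of non-positive shard counts). The following is a minimal sketch of one assignment scheme that would satisfy them, not the implementation in `scripts/test-free-shards`. The names `fnv1a` and `assignToShardsSketch` are hypothetical, and the hash is an FNV-1a variant over UTF-16 code units chosen for illustration.

```typescript
// Hypothetical sketch: deterministic shard assignment for test files.

// 32-bit FNV-1a variant over UTF-16 code units; returns an unsigned integer.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function assignToShardsSketch(files: string[], shardCount: number): string[][] {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new Error(`shardCount must be a positive integer, got ${shardCount}`);
  }
  // Order by hash (ties broken by name) so the result does not depend on
  // the order in which the files were enumerated.
  const ordered = [...files].sort(
    (a, b) => fnv1a(a) - fnv1a(b) || (a < b ? -1 : a > b ? 1 : 0),
  );
  const shards: string[][] = Array.from({ length: shardCount }, () => []);
  // Round-robin over the hash-ordered list: every shard receives a file
  // whenever files.length >= shardCount.
  ordered.forEach((file, i) => shards[i % shardCount].push(file));
  return shards;
}
```

The trade-off in this sketch: round-robin guarantees non-empty shards but lets one added file shift its neighbours' assignments, whereas plain `hash % shardCount` keeps each file's shard fixed at the cost of possibly empty shards.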