Second round of windows-free-tests fixes after the first push. Curated subset
went from 386/34 to 58/4 fails. Remaining 4 fails + 1 error trace to two root
causes:
1. Line-ending sensitivity. Windows checkout with core.autocrlf=true converts
.md/.tmpl files to CRLF. Tests that parse YAML frontmatter with
`/^---\n([\\s\\S]+?)\n---/` then return zero matches — skill-collision-
sentinel.test.ts:120 enumerated 0 skills on Windows, cascading into 3
downstream test failures (sanity, KNOWN_COLLISIONS, /checkpoint resolved).
Fix: add .gitattributes that pins LF for .md/.tmpl/.yml/.json/.toml/.sh/
.ts/.tsx/.js/.mjs/.cjs/.bash. Root-cause fix; prevents future similar
tests from hitting the same trap. Also keeps bash scripts LF on Linux
runners (CRLF in shebangs produces "bad interpreter" errors).
2. Module-level Windows assertion in browse/src/cli.ts:82 throws if
browse/dist/server-node.mjs is missing. Any test that transitively loads
cli.ts (e.g., browse/test/tab-isolation.test.ts via shard mate imports)
then fails to even start. server-node.mjs is generated by bash
browse/scripts/build-node-server.sh, which `bun run build` calls but
`bun install` does not.
Fix: add a "Build server-node.mjs" step to .github/workflows/
windows-free-tests.yml. Calls only the node-server build script, not
full `bun run build` — we don't need the compiled binaries for tests
and the full build is slow.
Expected: skill-collision-sentinel goes 0→3 pass (sanity, KNOWN_COLLISIONS,
/checkpoint resolved). tab-isolation's "unhandled error between tests"
disappears. Remaining tests should be green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First windows-free-tests CI run surfaced 34 failures across two patterns:
1. Tests that init a temp git repo via execSync('git commit ...') — Windows
runner has no default git user.email/user.name, so the commit fails.
Fix: add a "Configure git identity" step to .github/workflows/windows-free-tests.yml
that sets a CI-only identity globally.
2. Tests that use POSIX-only APIs unconditionally:
- file-mode bitmask checks (`stat.mode & 0o600`, `mode & 0o111`) — Windows
fakes mode bits and these assertions don't compose
- hardcoded forward-slash path assertions (`file.endsWith('/tab-42.json')`)
— Windows path separators are '\\'
Fix: extend WINDOWS_FRAGILE_PATTERNS in scripts/test-free-shards.ts to
detect both. 8 additional tests now excluded from the curated Windows
subset with logged reasons:
- browse/test/security-review-flow.test.ts (file mode)
- browse/test/security-sidepanel-dom.test.ts (forward-slash path)
- browse/test/url-validation.test.ts (forward-slash path)
- test/gbrain-repo-policy.test.ts (file mode)
- test/relink.test.ts (file mode)
- test/skill-validation.test.ts (file mode — single assertion at :934)
- test/team-mode.test.ts (file mode — also kills its 30 git-init beforeEach failures)
- test/upgrade-migration-v1.test.ts (file mode)
Curated Windows subset: 103 → 95 tests (still ~74% of free suite). All
14 test-free-shards unit tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex's v1.18.0.0 review flagged that a windows-latest matrix entry on the
existing Linux-container evals.yml workflow can't work as a drop-in, and that
the free test suite has POSIX-bound dependencies a sharded runner doesn't fix
on its own. This commit takes McGluut's test-free-shards.ts (190 LOC), adds a
Windows-fragility scan, and runs the curated subset on a separate non-container
windows-latest job.
scripts/test-free-shards.ts:
- Enumeration + paid-eval filtering + stable-hash sharding (FNV-1a). Adapted
from McGluut/gstack fork.
- Upstream-original: --windows-only filter scans each test's content for
POSIX-bound patterns: hardcoded /bin/sh, spawn('sh', ...), bash -c, raw
/tmp/, chmod, xargs, which claude. Files matching are excluded with the
reason logged. Currently filters 25 of 128 free tests; remaining 103 run
on windows-latest.
.github/workflows/windows-free-tests.yml:
- Separate non-container job (NOT a matrix entry on evals.yml). Runs:
bun run test:windows # curated subset
bun test browse/test/claude-bin.test.ts # PATHEXT+overrides on Windows
bun test test/gstack-paths.test.ts # state-root resolution
package.json: new test:free + test:windows scripts.
Honest about scope (codex-flagged): this does NOT make the full free suite
Windows-safe. The 25 excluded tests need POSIX-only surfaces ported off shell
primitives (test/ship-version-sync.test.ts:72 hardcodes /bin/bash, etc).
Tracked as a P4 follow-up TODO. Full Windows parity is the next wave; this
release ships the curated lane.
Tests: test/test-free-shards.test.ts has 14 unit tests covering enumeration,
paid-eval filtering, Windows-fragility detection (POSIX patterns + safe code),
and stable sharding determinism.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>