First windows-free-tests CI run surfaced 34 failures across two patterns:
1. Tests that init a temp git repo via execSync('git commit ...') — Windows
runner has no default git user.email/user.name, so the commit fails.
Fix: add a "Configure git identity" step to .github/workflows/windows-free-tests.yml
that sets a CI-only identity globally.
2. Tests that use POSIX-only APIs unconditionally:
- file-mode bitmask checks (`stat.mode & 0o600`, `mode & 0o111`) — Windows
fakes mode bits and these assertions don't compose
- hardcoded forward-slash path assertions (`file.endsWith('/tab-42.json')`)
— Windows path separators are '\\'
Fix: extend WINDOWS_FRAGILE_PATTERNS in scripts/test-free-shards.ts to
detect both. 8 additional tests now excluded from the curated Windows
subset with logged reasons:
- browse/test/security-review-flow.test.ts (file mode)
- browse/test/security-sidepanel-dom.test.ts (forward-slash path)
- browse/test/url-validation.test.ts (forward-slash path)
- test/gbrain-repo-policy.test.ts (file mode)
- test/relink.test.ts (file mode)
- test/skill-validation.test.ts (file mode — single assertion at :934)
- test/team-mode.test.ts (file mode — also kills its 30 git-init beforeEach failures)
- test/upgrade-migration-v1.test.ts (file mode)
Curated Windows subset: 103 → 95 tests (still ~74% of free suite). All
14 test-free-shards unit tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex's v1.18.0.0 review flagged that a windows-latest matrix entry on the
existing Linux-container evals.yml workflow can't work as a drop-in, and that
the free test suite has POSIX-bound dependencies a sharded runner doesn't fix
on its own. This commit takes McGluut's test-free-shards.ts (190 LOC), adds a
Windows-fragility scan, and runs the curated subset on a separate non-container
windows-latest job.
scripts/test-free-shards.ts:
- Enumeration + paid-eval filtering + stable-hash sharding (FNV-1a). Adapted
from McGluut/gstack fork.
- Upstream-original: --windows-only filter scans each test's content for
POSIX-bound patterns: hardcoded /bin/sh, spawn('sh', ...), bash -c, raw
/tmp/, chmod, xargs, which claude. Files matching are excluded with the
reason logged. Currently filters 25 of 128 free tests; remaining 103 run
on windows-latest.
.github/workflows/windows-free-tests.yml:
- Separate non-container job (NOT a matrix entry on evals.yml). Runs:
bun run test:windows # curated subset
bun test browse/test/claude-bin.test.ts # PATHEXT+overrides on Windows
bun test test/gstack-paths.test.ts # state-root resolution
package.json: new test:free + test:windows scripts.
Honest about scope (codex-flagged): this does NOT make the full free suite
Windows-safe. The 25 excluded tests need POSIX-only surfaces ported off shell
primitives (test/ship-version-sync.test.ts:72 hardcodes /bin/bash, etc).
Tracked as a P4 follow-up TODO. Full Windows parity is the next wave; this
release ships the curated lane.
Tests: test/test-free-shards.test.ts has 14 unit tests covering enumeration,
paid-eval filtering, Windows-fragility detection (POSIX patterns + safe code),
and stable sharding determinism.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>