mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-07 05:56:41 +02:00
8745f89ad4
Codex's v1.18.0.0 review flagged that a windows-latest matrix entry on the
existing Linux-container evals.yml workflow can't work as a drop-in, and that
the free test suite has POSIX-bound dependencies a sharded runner doesn't fix
on its own. This commit takes McGluut's test-free-shards.ts (190 LOC), adds a
Windows-fragility scan, and runs the curated subset on a separate non-container
windows-latest job.
scripts/test-free-shards.ts:
- Enumeration + paid-eval filtering + stable-hash sharding (FNV-1a). Adapted
  from McGluut/gstack fork.
- Upstream-original: --windows-only filter scans each test's content for
  POSIX-bound patterns: hardcoded /bin/sh, spawn('sh', ...), bash -c, raw
  /tmp/, chmod, xargs, which claude. Files matching are excluded with the
  reason logged. Currently filters 25 of 128 free tests; remaining 103 run
  on windows-latest.
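The content scan described above could look roughly like the following. This is a minimal sketch, not the code in scripts/test-free-shards.ts: the names `POSIX_PATTERNS`, `windowsFragility`, and `filterWindowsSafe` are hypothetical, and the regexes are guesses derived from the pattern list in this message.

```typescript
// Hypothetical pattern table. Labels and regexes are illustrative guesses
// based on the commit message, not the contents of the real script.
const POSIX_PATTERNS: Array<{ reason: string; pattern: RegExp }> = [
  { reason: "hardcoded /bin/sh", pattern: /\/bin\/sh\b/ },
  { reason: "spawn('sh', ...)", pattern: /spawn(?:Sync)?\(\s*['"]sh['"]/ },
  { reason: "bash -c", pattern: /\bbash\s+-c\b/ },
  { reason: "raw /tmp/ path", pattern: /['"`]\/tmp\// },
  { reason: "chmod", pattern: /\bchmod\b/ },
  { reason: "xargs", pattern: /\bxargs\b/ },
  { reason: "which claude", pattern: /\bwhich\s+claude\b/ },
];

// Returns the first matching reason, or null when no pattern matches.
function windowsFragility(source: string): string | null {
  for (const { reason, pattern } of POSIX_PATTERNS) {
    if (pattern.test(source)) return reason;
  }
  return null;
}

// Splits a map of { file path: file content } into the subset to run on
// Windows and the excluded files, each paired with the reason to log.
function filterWindowsSafe(files: Record<string, string>) {
  const kept: string[] = [];
  const excluded: Array<{ file: string; reason: string }> = [];
  for (const [file, source] of Object.entries(files)) {
    const reason = windowsFragility(source);
    if (reason === null) kept.push(file);
    else excluded.push({ file, reason });
  }
  return { kept, excluded };
}
```

A text scan like this is deliberately conservative: it can exclude a test that only mentions a pattern in a comment, which is the safer failure mode for a curated lane.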
.github/workflows/windows-free-tests.yml:
- Separate non-container job (NOT a matrix entry on evals.yml). Runs:
    bun run test:windows                     # curated subset
    bun test browse/test/claude-bin.test.ts  # PATHEXT+overrides on Windows
    bun test test/gstack-paths.test.ts       # state-root resolution
package.json: new test:free + test:windows scripts.
Honest about scope (codex-flagged): this does NOT make the full free suite
Windows-safe. The 25 excluded tests depend on POSIX-only surfaces that must
first be ported off shell primitives (e.g. test/ship-version-sync.test.ts:72
hardcodes /bin/bash). Tracked as a P4 follow-up TODO. Full Windows parity is
the next wave; this release ships the curated lane.
Tests: test/test-free-shards.test.ts has 14 unit tests covering enumeration,
paid-eval filtering, Windows-fragility detection (POSIX patterns + safe code),
and stable sharding determinism.
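The stable-sharding property those tests check can be illustrated with a standard 32-bit FNV-1a hash. This is a sketch under assumptions: `fnv1a`, `shardOf`, and `selectShard` are hypothetical names, and hashing UTF-16 code units (rather than bytes) is a simplification that matches byte-wise FNV-1a only for ASCII paths.

```typescript
// 32-bit FNV-1a over the code units of the input string.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, wrapped to 32 bits
  }
  return hash >>> 0; // reinterpret as unsigned
}

// Maps a file path to a shard index in [0, shardCount).
function shardOf(file: string, shardCount: number): number {
  return fnv1a(file) % shardCount;
}

// Picks the files belonging to one shard. The result depends only on each
// file's own path, not on list order or on which other files exist.
function selectShard(files: string[], shardIndex: number, shardCount: number): string[] {
  return files.filter((f) => shardOf(f, shardCount) === shardIndex).sort();
}
```

Because assignment is a pure function of the path, adding or removing one test does not reshuffle the others between shards, and every file lands in exactly one shard.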
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
58 lines
1.9 KiB
YAML
name: Windows Free Tests

# Curated subset of the free test suite that runs on windows-latest.
#
# Codex's v1.18.0.0 review flagged that the existing evals.yml workflow uses
# a Linux container, so a windows-latest matrix entry there isn't a drop-in.
# This workflow is non-container, runs the curated Windows-safe subset, plus
# targeted resolver tests that exercise the Bun.which-based claude binary
# resolution + the GSTACK_CLAUDE_BIN override path on Windows.
#
# What this DOES NOT do (out of scope for v1.18.0.0):
#   - Run the full free suite on Windows. The 24 tests that hardcode /bin/sh,
#     spawn('sh',...), or raw /tmp/ paths are excluded by scripts/test-free-shards.ts
#     --windows-only. They need POSIX-bound surfaces to be ported off shell
#     primitives before they can run on Windows. Tracked as a follow-up TODO.
#   - Run Playwright/browser-backed tests. Browse server bring-up on Windows is
#     a separate concern (PR #1238 windows-pty-bun-pty-fix is in flight).

on:
  pull_request:
    branches: [main]
  workflow_dispatch:

concurrency:
  group: windows-free-${{ github.head_ref }}
  cancel-in-progress: true

jobs:
  windows-free-tests:
    runs-on: windows-latest
    timeout-minutes: 15

    steps:
      - uses: actions/checkout@v4

      - uses: oven-sh/setup-bun@v1
        with:
          bun-version: latest

      - name: Install dependencies
        run: bun install --frozen-lockfile

      - name: Show curated subset (for build log audit trail)
        run: bun run scripts/test-free-shards.ts --windows-only --list
        shell: bash

      - name: Run curated Windows-safe subset
        run: bun run test:windows
        shell: bash

      - name: Targeted Claude resolver tests (real PATHEXT coverage on Windows)
        run: bun test browse/test/claude-bin.test.ts
        shell: bash

      - name: gstack-paths helper test (resolves $GSTACK_STATE_ROOT etc. on Windows)
        run: bun test test/gstack-paths.test.ts
        shell: bash
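
The resolver behaviour that the targeted claude-bin step exercises (the GSTACK_CLAUDE_BIN override plus a PATH lookup that honours PATHEXT on Windows) might be sketched as below. Everything here is an assumption for illustration: `resolveClaudeBin`, `pathextCandidates`, and the `Which` type are hypothetical, and the real logic covered by browse/test/claude-bin.test.ts may differ. The lookup function is injected so the sketch runs outside Bun.

```typescript
// A lookup function with the same shape as a PATH search: returns the
// resolved path, or null when the command is not found.
type Which = (command: string) => string | null;

// Hypothetical resolver: a non-empty GSTACK_CLAUDE_BIN wins; otherwise
// fall back to searching PATH for "claude".
function resolveClaudeBin(
  env: Record<string, string | undefined>,
  which: Which,
): string | null {
  const override = env.GSTACK_CLAUDE_BIN?.trim();
  if (override) return override;
  return which("claude");
}

// Illustrates why Windows needs real coverage: a bare command name expands
// to several candidate file names, one per PATHEXT entry.
function pathextCandidates(command: string, pathext: string | undefined): string[] {
  const exts = (pathext ?? ".COM;.EXE;.BAT;.CMD")
    .split(";")
    .map((e) => e.trim().toLowerCase())
    .filter((e) => e.length > 0);
  return [command, ...exts.map((e) => command + e)];
}
```

Under Bun, the injected lookup could be `(cmd) => Bun.which(cmd)`. Keeping it as a parameter lets the override precedence be unit-tested on any platform, while the workflow above covers the real PATHEXT behaviour on windows-latest.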