gstack/test at 6f67406e01da94020064ba4c4c644cf66a5cc25c - gstack - MS-GitHub-Backup (Gitea)

CalvinBackup/gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-08-03 04:48:42 +02:00

Garry Tan 6f67406e01 fix(test): skill-e2e-autoplan-dual-voice was shipped broken

The test shipped on main in v0.18.4.0 used wrong option names and
wrong result fields throughout. It could not have passed in any
environment:

Broken API calls:
- `workdir` → should be `workingDirectory`
  The fixture setup (git init, copy autoplan + plan-*-review dirs,
  write TEST_PLAN.md) was completely ignored. claude -p spawned with
  undefined cwd instead of the tmp workdir.
- `timeoutMs: 300_000` → should be `timeout: 300_000`
  Fell back to default 120s. Explains the observed ~170s failure
  (test harness overhead + retry startup).
- `name: 'autoplan-dual-voice'` → should be `testName: 'autoplan-dual-voice'`
  No per-test run directory was created.
- `evalCollector` → not a recognized `runSkillTest` option at all.
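
The failure mode in the list above, where misspelled option keys are silently ignored and defaults take over, can be sketched in TypeScript. The `RunOptions` interface and `resolveOptions` helper are hypothetical stand-ins, not the actual `runSkillTest` implementation; only the option names and the 120s default come from this commit message.

```typescript
// Hypothetical stand-in for the options runSkillTest reads
// (key names taken from the commit message).
interface RunOptions {
  workingDirectory?: string;
  timeout?: number;
  testName?: string;
}

const DEFAULT_TIMEOUT_MS = 120_000; // "default 120s" per the commit message

// Hypothetical resolver: reads only the recognized keys, ignores the rest.
function resolveOptions(opts: RunOptions) {
  return {
    cwd: opts.workingDirectory,
    timeoutMs: opts.timeout ?? DEFAULT_TIMEOUT_MS,
    hasRunDir: opts.testName !== undefined,
  };
}

// Keys as the shipped test spelled them. The cast mimics a call site
// where excess-property checking did not apply.
const brokenCall = {
  workdir: "/tmp/fixture",
  timeoutMs: 300_000,
  name: "autoplan-dual-voice",
};
const broken = resolveOptions(brokenCall as unknown as RunOptions);

// Same values under the recognized key names.
const fixed = resolveOptions({
  workingDirectory: "/tmp/fixture",
  timeout: 300_000,
  testName: "autoplan-dual-voice",
});

console.log(broken); // cwd is undefined, timeoutMs is 120000, hasRunDir is false
console.log(fixed);  // cwd is "/tmp/fixture", timeoutMs is 300000, hasRunDir is true
```

Passing a fresh object literal straight to a typed parameter would normally trip TypeScript's excess-property check, so keys like these tend to slip through only where the options are untyped or loosely typed. Whether that was the case for the shipped test is not stated here.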

Broken result access:
- `result.stdout + result.stderr` → SkillTestResult has neither
  field. `out` was literally "undefinedundefined" every time.
- Every regex match fired false. All 3 assertions (claudeVoiceFired,
  codex-or-unavailable, reachedPhase1) failed on every attempt.
- `logCost(result)` → signature is `logCost(label, result)`.
- `recordE2E('autoplan-dual-voice', result)` → signature is
  `recordE2E(evalCollector, name, suite, result, extra)`.
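
A minimal sketch of why every regex missed, assuming the two missing fields were joined as strings (for example by template interpolation; the exact expression in the shipped test is not shown here, and a bare `undefined + undefined` would yield `NaN` instead). The `SkillTestResult` shape, the sample text, and the pattern are hypothetical; only the `output` field name comes from this commit message.

```typescript
// Hypothetical minimal result shape: `output` exists, `stdout`/`stderr` do not.
interface SkillTestResult {
  output: string;
}

const result: SkillTestResult = {
  output: "Phase 1 started. CODEX SAYS: looks fine.",
};

// Reading the non-existent fields through a loosely typed view
// yields undefined for both.
const loose = result as unknown as Record<string, string | undefined>;
const out = `${loose["stdout"]}${loose["stderr"]}`;

// Illustrative pattern, not the test's actual regex.
const codexVoice = /CODEX SAYS/;

console.log(out);                            // "undefinedundefined"
console.log(codexVoice.test(out));           // false
console.log(codexVoice.test(result.output)); // true
```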

Fixes:
- Renamed all 4 broken options in the runSkillTest call.
- Changed assertion source to `result.output` plus JSON-serialized
  `result.transcript` (broader net for voice fingerprints in tool
  inputs/outputs).
- Widened regex alternatives: codex voice now matches "CODEX SAYS"
  and "codex-plan-review"; Claude voice now matches subagent_type;
  unavailable matches CODEX_NOT_AVAILABLE.
- Added Agent + Skill + Edit + Grep + Glob to allowedTools. Without
  Agent, /autoplan can't spawn subagents and never reaches Phase 1.
- Raised maxTurns 15 → 30 (autoplan is a long multi-phase skill).
- Fixed logCost + recordE2E signatures, passing `passed:` flag into
  recordE2E per the neighboring context-save pattern.
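
The assertion approach described in the fixes, searching `result.output` plus the JSON-serialized `result.transcript` with widened alternatives, might look roughly like this. The regexes, transcript entries, and `SkillTestResult` shape are illustrative assumptions; only the field names and the matched strings (`CODEX SAYS`, `codex-plan-review`, `subagent_type`, `CODEX_NOT_AVAILABLE`) come from this commit message.

```typescript
// Hypothetical result shape using the two fields named in the fix.
interface SkillTestResult {
  output: string;
  transcript: unknown[];
}

// Hypothetical run: the voice fingerprints appear only in tool
// inputs/outputs, not in the final output text.
const result: SkillTestResult = {
  output: "Plan review complete.",
  transcript: [
    { tool: "Agent", input: { subagent_type: "general-purpose" } },
    { tool: "Bash", output: "CODEX_NOT_AVAILABLE" },
  ],
};

// Broader net: final output plus everything recorded in the transcript.
const haystack = result.output + "\n" + JSON.stringify(result.transcript);

const claudeVoiceFired = /subagent_type/.test(haystack);
const codexVoiceFired = /CODEX SAYS|codex-plan-review/.test(haystack);
const codexUnavailable = /CODEX_NOT_AVAILABLE/.test(haystack);

console.log({
  claudeVoiceFired,
  codexOrUnavailable: codexVoiceFired || codexUnavailable,
});
```

In this made-up run, searching `result.output` alone would miss both fingerprints, which is the case the serialized transcript is meant to cover.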

2026-04-18 22:50:34 +08:00

merge: origin/main v1.0.0.0 into garrytan/fix-checkpoints

2026-04-18 17:24:03 +08:00

analytics.test.ts

feat: safety hook skills + skill usage telemetry (v0.7.1) (#189)

2026-03-18 23:57:59 -05:00

audit-compliance.test.ts

fix: security audit round 2 (v0.13.4.0) (#640)

2026-03-29 22:46:33 -06:00

builder-profile.test.ts

feat: relationship closing — office-hours adapts to repeat users (v0.16.2.0) (#937)

2026-04-08 22:21:28 -10:00

codex-e2e.test.ts

feat: worktree isolation for E2E tests + infrastructure elegance (v0.11.12.0) (#425)

2026-03-23 23:05:22 -07:00

codex-hardening.test.ts

codex + Apple Silicon hardening wave (v0.18.4.0) (#1056)

2026-04-18 12:30:54 +08:00

diff-scope.test.ts

feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692)

2026-03-30 22:07:50 -06:00

explain-level-config.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039)

2026-04-18 15:05:42 +08:00

gemini-e2e.test.ts

feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005)

2026-04-16 10:41:38 -07:00

gen-skill-docs.test.ts

codex + Apple Silicon hardening wave (v0.18.4.0) (#1056)

2026-04-18 12:30:54 +08:00

global-discover.test.ts

fix: close redundant PRs + friendly error on all design commands (v0.15.8.1) (#817)

2026-04-05 02:02:06 -07:00

gstack-developer-profile.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039)

2026-04-18 15:05:42 +08:00

gstack-question-log.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039)

2026-04-18 15:05:42 +08:00

gstack-question-preference.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039)

2026-04-18 15:05:42 +08:00

hook-scripts.test.ts

feat: safety hook skills + skill usage telemetry (v0.7.1) (#189)

2026-03-18 23:57:59 -05:00

host-config.test.ts

community wave: 6 PRs + hardening (v0.18.1.0) (#1028)

2026-04-17 00:45:13 -07:00

jargon-list.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039)

2026-04-18 15:05:42 +08:00

learnings-injection.test.ts

fix: community security wave — 8 PRs, 4 contributors (v0.15.13.0) (#847)

2026-04-06 00:47:04 -07:00

learnings.test.ts

feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622)

2026-03-29 17:02:01 -06:00

migration-checkpoint-ownership.test.ts

merge: origin/main v1.0.0.0 into garrytan/fix-checkpoints

2026-04-18 17:24:03 +08:00

openclaw-native-skills.test.ts

community wave: 6 PRs + hardening (v0.18.1.0) (#1028)

2026-04-17 00:45:13 -07:00

plan-tune.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039)

2026-04-18 15:05:42 +08:00

readme-throughput.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039)

2026-04-18 15:05:42 +08:00

relink.test.ts

fix: headed browser auto-shutdown + disconnect cleanup (v0.18.1.0) (#1025)

2026-04-16 15:39:44 -07:00

review-log.test.ts

fix: community PRs + security hardening + E2E stability (v0.12.7.0) (#552)

2026-03-26 23:21:27 -06:00

setup-codesign.test.ts

codex + Apple Silicon hardening wave (v0.18.4.0) (#1056)

2026-04-18 12:30:54 +08:00

skill-e2e-autoplan-dual-voice.test.ts

fix(test): skill-e2e-autoplan-dual-voice was shipped broken

2026-04-18 22:50:34 +08:00

skill-e2e-bws.test.ts

fix: cookie picker auth token leak (v0.15.17.0) (#904)

2026-04-08 10:10:13 -07:00

skill-e2e-cso.test.ts

feat: /cso v2 — infrastructure-first security audit (v0.11.6.0) (#384)

2026-03-23 06:57:22 -07:00

skill-e2e-deploy.test.ts

feat: /land-and-deploy first-run dry run + staging-first + trust ladder (v0.12.2.0) (#518)

2026-03-26 11:08:31 -07:00

skill-e2e-design.test.ts

feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360)

2026-03-23 10:17:33 -07:00

skill-e2e-learnings.test.ts

feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647)

2026-03-31 23:08:22 -06:00

skill-e2e-plan-tune.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039)

2026-04-18 15:05:42 +08:00

skill-e2e-plan.test.ts

test: E2E tests for plan review report and Codex offering (v0.11.15.0) (#449)

2026-03-24 07:30:24 -07:00

skill-e2e-qa-bugs.test.ts

feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360)

2026-03-23 10:17:33 -07:00

skill-e2e-qa-workflow.test.ts

feat: CI evals on Ubicloud — 12 parallel runners + Docker image (v0.11.10.0) (#360)

2026-03-23 10:17:33 -07:00

skill-e2e-review-army.test.ts

feat: Review Army — parallel specialist reviewers for /review (v0.14.3.0) (#692)

2026-03-30 22:07:50 -06:00

skill-e2e-review.test.ts

feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005)

2026-04-16 10:41:38 -07:00

skill-e2e-session-intelligence.test.ts

tests: split checkpoint-save-resume into context-save + context-restore E2Es

2026-04-18 16:42:52 +08:00

skill-e2e-sidebar.test.ts

feat: declarative multi-host platform + OpenCode, Slate, Cursor, OpenClaw (v0.15.5.0) (#793)

2026-04-04 15:32:20 -07:00

skill-e2e-workflow.test.ts

refactor: extract TabSession for per-tab state isolation (v0.15.16.0) (#873)

2026-04-07 00:23:36 -07:00

skill-e2e.test.ts

feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647)

2026-03-31 23:08:22 -06:00

skill-llm-eval.test.ts

feat: voice directive for all skills (v0.12.3.0) (#520)

2026-03-26 17:31:53 -06:00

skill-parser.test.ts

feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41)

2026-03-13 21:08:12 -07:00

skill-routing-e2e.test.ts

feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005)

2026-04-16 10:41:38 -07:00

skill-validation.test.ts

feat: context rot defense for /ship — subagent isolation + clean step numbering (v0.18.1.0) (#1030)

2026-04-16 23:14:03 -07:00

team-mode.test.ts

feat: Confusion Protocol, Hermes + GBrain hosts, brain-first resolver (v0.18.0.0) (#1005)

2026-04-16 10:41:38 -07:00

telemetry.test.ts

feat: community wave — 7 fixes, relink, sidebar Write, discoverability (v0.13.5.0) (#641)

2026-03-29 21:43:36 -06:00

timeline.test.ts

feat: Session Intelligence Layer — /checkpoint + /health + context recovery (v0.15.0.0) (#733)

2026-04-01 00:50:42 -06:00

touchfiles.test.ts

feat: recursive self-improvement — operational learning + full skill wiring (v0.13.8.0) (#647)

2026-03-31 23:08:22 -06:00

uninstall.test.ts

feat: community PRs — faster install, skill namespacing, uninstall, Codex fallback, Windows fix, Python patterns (v0.12.9.0) (#561)

2026-03-27 00:44:37 -06:00

upgrade-migration-v1.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039)

2026-04-18 15:05:42 +08:00

v0-dormancy.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039)

2026-04-18 15:05:42 +08:00

worktree.test.ts

feat: content security — 4-layer prompt injection defense for pair-agent (#815)

2026-04-06 14:41:06 -07:00

writing-style-resolver.test.ts

feat: gstack v1 — simpler prompts + real LOC receipts (v1.0.0.0) (#1039)

2026-04-18 15:05:42 +08:00