test: /review hardening — NOT-READY env isolation, workdir cleanup, perf

Applied from the adversarial subagent pass during /review on this branch:

- test/benchmark-cli.test.ts: new "NOT READY path fires when auth env vars are stripped" test. The default dry-run test always showed OK on dev machines with auth configured, which hid regressions in the remediation-hint branch. A stripped env (no auth vars, HOME pointed at an empty tmpdir) now force-exercises the gpt and gemini NOT READY paths and asserts that every NOT READY line includes a concrete remediation hint (install/login/export). (The claude adapter's os.homedir() call can be cached by Bun and miss the HOME override; 2-of-3 adapter coverage is sufficient to exercise the branch.)
- test/taste-engine.test.ts: session-cap test rewritten to seed the profile with 50 entries plus one real CLI call, instead of 55 sequential subprocess spawns. Same coverage (FIFO eviction at the boundary), roughly 5s faster in CI. Also pins first-casing-wins on the Geist/GEIST merge assertion: bumpPref() keeps the first-arrival casing, so the test documents that policy.
- test/skill-e2e-benchmark-providers.test.ts: workdir creation moved from module load into beforeAll, with cleanup added in afterAll. The previous shape leaked a /tmp/bench-e2e-* dir on every CI run.
- test/publish-dry-run.test.ts: removed the unused mkdirSync of an empty test/helpers dir from the sandbox setup. The bin doesn't import from there, so the empty dir was a footgun for future maintainers.
- test/helpers/providers/gpt.ts: expanded the inline comment on `--skip-git-repo-check` to note explicitly that `-s read-only` is now load-bearing safety (the trust prompt was the secondary boundary; removing read-only while keeping skip-git-repo-check would be unsafe).

Net: 45 passing tests (was 44), session-cap test 5s faster, and one real regression surface now covered that had no coverage before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
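The taste-engine bullet describes two policies at once: a FIFO session cap and a case-insensitive merge that keeps the first-arrival casing. Below is a minimal sketch of how those could combine, assuming a cap of 50 (inferred from the seed count, not confirmed by this commit) and assuming entries are merged by lowercased key. `SessionProfile`, `SESSION_CAP`, and this `bumpPref()` signature are hypothetical; the real taste-engine code isn't part of this diff.

```typescript
// Hypothetical sketch only: the real taste-engine types and bumpPref()
// signature are not shown in this commit.
interface PrefEntry {
  value: string; // casing of the first arrival, never overwritten
  count: number;
}

// Assumption: cap inferred from the "seed 50 entries + one call" test shape.
const SESSION_CAP = 50;

class SessionProfile {
  // Map iterates in insertion order, so the first key is always the oldest.
  private entries = new Map<string, PrefEntry>();

  bumpPref(value: string): void {
    const key = value.toLowerCase(); // assumption: merge is case-insensitive
    const existing = this.entries.get(key);
    if (existing) {
      // Merge into the existing entry; keep first-arrival casing and position.
      existing.count += 1;
      return;
    }
    if (this.entries.size >= SESSION_CAP) {
      // FIFO eviction: drop the oldest distinct entry to make room.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, count: 1 });
  }

  get size(): number {
    return this.entries.size;
  }

  get(value: string): PrefEntry | undefined {
    return this.entries.get(value.toLowerCase());
  }

  values(): string[] {
    return Array.from(this.entries.values()).map((e) => e.value);
  }
}
```

Because a merge doesn't move an entry's position, eviction order here is first-arrival (FIFO) rather than least-recently-used. A profile seeded to the cap plus one new distinct entry is enough to hit the eviction boundary, which is why seeding replaces dozens of sequential spawns without losing coverage.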
2026-05-06 21:46:40 +02:00 · 2026-04-18 06:54:09 +08:00
parent 6a8b637669
commit 620f5dbaea
5 changed files with 90 additions and 13 deletions
@@ -97,6 +97,49 @@ describe('gstack-model-benchmark --dry-run', () => {
    }
  });

+  test('NOT READY path fires when auth env vars are stripped', () => {
+    // On a dev machine with full auth configured, the default --dry-run output
+    // shows OK for every provider with credentials. Strip auth env vars AND
+    // point HOME at an empty temp dir so adapters can't find file-based creds.
+    // This test exists to catch regressions where the NOT READY branch itself
+    // breaks (crash, missing remediation hint, wrong message format).
+    //
+    // Note: claude adapter's `os.homedir()` call is sometimes cached by Bun and
+    // doesn't always pick up the HOME override, so this test asserts only on
+    // gpt + gemini adapters where HOME redirection reliably makes the adapter's
+    // credentials-path check fail. Two adapters hitting NOT READY with full
+    // remediation messages is sufficient coverage for the branch.
+    const emptyHome = fs.mkdtempSync(path.join(os.tmpdir(), 'bench-noauth-home-'));
+    try {
+      const minimalEnv: Record<string, string> = {
+        PATH: process.env.PATH ?? '',
+        TERM: process.env.TERM ?? 'xterm',
+        HOME: emptyHome,
+      };
+      const result = spawnSync('bun', ['run', BIN, '--prompt', 'hi', '--models', 'claude,gpt,gemini', '--dry-run'], {
+        cwd: ROOT,
+        env: minimalEnv,
+        encoding: 'utf-8',
+        timeout: 15000,
+      });
+      expect(result.status).toBe(0);
+      const out = result.stdout?.toString() ?? '';
+      // gpt + gemini must report NOT READY in this clean env (their auth check
+      // reads paths under the overridden HOME).
+      expect(out).toMatch(/gpt:\s+NOT READY/);
+      expect(out).toMatch(/gemini:\s+NOT READY/);
+      // Every NOT READY line must include a concrete remediation hint so users
+      // can resolve the missing auth. This is the regression we care about.
+      const notReadyLines = out.split('\n').filter(l => l.includes('NOT READY'));
+      expect(notReadyLines.length).toBeGreaterThanOrEqual(2);
+      for (const line of notReadyLines) {
+        expect(line).toMatch(/(install|Install|login|export|Run|Log in)/);
+      }
+    } finally {
+      fs.rmSync(emptyHome, { recursive: true, force: true });
+    }
+  });
+
  test('long prompt is truncated in dry-run display', () => {
    const longPrompt = 'x'.repeat(200);
    const r = run(['--prompt', longPrompt, '--dry-run']);