Mirror of https://github.com/garrytan/gstack.git, synced 2026-05-02 11:45:20 +02:00
656df0e37e
* feat(v1.5.2.0): Opus 4.7 migration — model overlay, voice, routing
Adapts GStack skill text for Claude Opus 4.7's behavioral changes per
Anthropic's migration guide and community findings.
Key changes:
model-overlays/claude.md:
- Fan out explicitly (4.7 spawns fewer subagents by default)
- Effort-match the step (avoid overthinking simple tasks at max)
- Batch questions in one AskUserQuestion turn
- Literal interpretation awareness (deliver full scope)
hosts/claude.ts:
- coAuthorTrailer updated to Claude Opus 4.7
SKILL.md.tmpl:
- Expanded routing triggers with colloquial variants ("wtf",
"this doesn't work", "send it", "where was I") — 4.7 won't
generalize from sparse trigger patterns like 4.6 did
- Added missing routes: /context-save, /context-restore, /cso, /make-pdf
- Changed routing fallback from strict "do NOT answer directly" to
"when in doubt, invoke the skill" — false positives are cheaper
than false negatives on 4.7's literal interpreter
generate-voice-directive.ts:
- Added concrete good/bad voice example — 4.7 needs shown examples,
not just described tone. "auth.ts:47 returns undefined..." vs
"I've identified a potential issue..."
Regenerated all 38 SKILL.md files. All tests pass.
* refactor(opus-4.7): split overlay, align routing, fix trailer fallback
Follow-up to wintermute's initial Opus 4.7 migration commit (addresses
ship-quality review findings before v1.6.1.0 release).
Overlay split (model-overlays/):
- Move 4 Opus-4.7-specific nudges (Fan out, Effort-match, Batch your
questions, Literal interpretation) from claude.md into new
opus-4-7.md with {{INHERIT:claude}}
- claude.md now holds only model-agnostic nudges (Todo discipline,
Think before heavy, Dedicated tools over Bash)
- Prevents Opus-4.7-specific guidance leaking onto Sonnet/Haiku
- Uses existing {{INHERIT:claude}} mechanism at
scripts/resolvers/model-overlay.ts:28-43
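One way to picture the `{{INHERIT:claude}}` mechanism is as a recursive textual include. The directive is replaced by the parent overlay's fully resolved text, so opus-4-7.md ends up with the model-agnostic nudges plus its own. The sketch below is a hypothetical illustration under that assumption, not the resolver in scripts/resolvers/model-overlay.ts. `resolveOverlay`, the in-memory `overlays` map, and the cycle check are all assumptions.

```typescript
// Hypothetical sketch of {{INHERIT:name}} expansion; the real resolver may differ.
function resolveOverlay(
  name: string,
  overlays: Record<string, string>,
  seen: ReadonlySet<string> = new Set(),
): string {
  if (seen.has(name)) {
    throw new Error(`INHERIT cycle detected at overlay "${name}"`);
  }
  const raw = overlays[name];
  if (raw === undefined) {
    throw new Error(`Unknown overlay "${name}"`);
  }
  // Copy the visited set so sibling INHERIT directives don't share a path.
  const visited = new Set(seen).add(name);
  const inheritRe = /\{\{INHERIT:([\w.-]+)\}\}/g;
  // A function replacer inserts the returned text literally (no `$` patterns).
  return raw.replace(inheritRe, (_match, parent: string) =>
    resolveOverlay(parent, overlays, visited),
  );
}
```

Expanding in place keeps the child overlay in control of ordering: text before the directive comes first, then the inherited nudges, then the rest.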
scripts/models.ts:
- Add opus-4-7 to ALL_MODEL_NAMES
- resolveModel: claude-opus-4-7-* variants route to opus-4-7,
all other claude-* variants continue to route to claude
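Here is a minimal sketch of that routing rule. It assumes model IDs arrive as plain strings and that anything unrecognised resolves to `null`. `resolveModelSketch` and `OverlayModel` are hypothetical names that cover only the two overlays discussed here. The real `resolveModel` in scripts/models.ts handles other hosts and may differ in detail. The regex requires a `-` or end of string after `claude-opus-4-7`, so a made-up ID like `claude-opus-4-70` is not caught by the prefix.

```typescript
// Hypothetical: only the two overlays discussed in this commit.
type OverlayModel = 'claude' | 'opus-4-7';

function resolveModelSketch(modelId: string): OverlayModel | null {
  const id = modelId.trim().toLowerCase();
  // Overlay names pass through unchanged.
  if (id === 'opus-4-7') return 'opus-4-7';
  if (id === 'claude') return 'claude';
  // claude-opus-4-7 and any dash-suffixed variant get the 4.7-specific overlay.
  if (/^claude-opus-4-7(-|$)/.test(id)) return 'opus-4-7';
  // Every other Claude model keeps the model-agnostic overlay.
  if (id.startsWith('claude-')) return 'claude';
  return null;
}
```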
scripts/resolvers/utility.ts:
- Update coAuthor trailer fallback: Opus 4.6 -> Opus 4.7
(fallback was missed in the initial migration commit)
scripts/resolvers/preamble/generate-routing-injection.ts:
- Align policy with new SKILL.md.tmpl: soft "when in doubt, invoke"
instead of hard "ALWAYS invoke... Do NOT answer directly"
- Replace stale /checkpoint reference with /context-save +
/context-restore (skills were renamed in v1.0.1.0)
- Expand route coverage to match full skill inventory:
/plan-devex-review, /qa-only, /devex-review, /land-and-deploy,
/setup-deploy, /canary, /open-gstack-browser,
/setup-browser-cookies, /benchmark, /learn, /plan-tune, /health
scripts/resolvers/preamble/generate-voice-directive.ts:
- Voice example closing: "Want me to ship it?" -> "Want me to fix it?"
- Preserves directness while routing through review gates
SKILL.md.tmpl:
- Add routing triggers for skills that were missing from the list:
/plan-devex-review, /qa-only, /devex-review, /land-and-deploy,
/setup-deploy, /canary, /open-gstack-browser,
/setup-browser-cookies, /benchmark, /learn, /plan-tune, /health
- Within Opus 4.7 overlay, added scope boundary to
"Literal interpretation" nudge ("fix tests that this branch
introduced or is responsible for")
- Added pacing exception to "Batch your questions" nudge so skills
that require one-question-at-a-time pacing still win
Follow-up commit will regenerate SKILL.md files + update goldens.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(opus-4.7): regenerate SKILL.md files + update golden fixtures
Mechanical consequence of the preceding source changes (overlay split,
routing alignment, voice example, routing expansion). No behavior change
beyond what that commit introduced.
- 36 SKILL.md files regenerated via bun run gen:skill-docs
- 3 golden fixtures updated (claude, codex, factory ship skill)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(routing): assert slash-prefixed skills + new policy + current names
Align gen-skill-docs.test.ts routing assertions with the remediated
routing-injection output:
- Expect '/office-hours' slash-prefixed form (matches SKILL.md.tmpl style)
- Add test asserting /context-save + /context-restore references
(guards against stale '/checkpoint' name regression)
- Add test asserting "When in doubt, invoke the skill" soft policy
(guards against "Do NOT answer directly" hard policy regression)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(binary-guard): replace xargs-per-file loops with fs.statSync + mode filter
The "no compiled binaries in git" describe block had two flaky tests:
- "git tracks no files larger than 2MB" timed out at 5s regularly because
it spawned one `sh -c` per tracked file via `xargs -I{}` (~571 shells
on every run, ~11s locally).
- "git tracks no Mach-O or ELF binaries" ran `file --mime-type` over every
tracked file (~3-10s, flaky near the timeout).
Both were pre-existing — not caused by any recent change — but showed up
as red in every local `bun test` run and masked legit failures in the
same suite.
Rewrites:
- 2MB test: `fs.statSync(f).size` in a filter. Millisecond-fast.
- Mach-O test: pre-filter to mode 100755 files via `git ls-files -s`,
then batch-invoke `file --mime-type` once across all executables.
With zero executables tracked, the `file` invocation is skipped.
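Both rewrites can be sketched roughly as follows. The sketch makes four assumptions: paths are relative to the repo root, "2MB" means 2 * 1024 * 1024 bytes, the MIME types below are the ones `file --mime-type` commonly reports for Mach-O and ELF objects, and the function names are hypothetical. `git ls-files -s` prints one `<mode> <object> <stage>\t<path>` line per index entry. Keeping only lines that begin with `100755 ` therefore selects executables before `file` ever runs.

```typescript
import * as fs from 'fs';
import * as path from 'path';
import { execFileSync } from 'child_process';

const MAX_TRACKED_BYTES = 2 * 1024 * 1024; // assumed meaning of "2MB"

// Size check: one stat() per tracked path instead of one `sh -c` per path.
function oversizedFiles(repoRoot: string, paths: string[], maxBytes = MAX_TRACKED_BYTES): string[] {
  return paths.filter((p) => {
    const abs = path.join(repoRoot, p);
    // Skip entries deleted from the working tree but still in the index.
    return fs.existsSync(abs) && fs.statSync(abs).size > maxBytes;
  });
}

// Keep only mode-100755 (executable) entries from `git ls-files -s` output.
function executablePaths(lsFilesStage: string): string[] {
  return lsFilesStage
    .split('\n')
    .filter((line) => line.startsWith('100755 '))
    .map((line) => line.slice(line.indexOf('\t') + 1));
}

// Assumed MIME types for Mach-O and ELF objects.
const NATIVE_MIME = /application\/(x-mach-binary|x-executable|x-sharedlib|x-pie-executable)$/;

// Mach-O/ELF check: at most one `file --mime-type` call, none if nothing is executable.
function nativeBinaries(repoRoot: string): string[] {
  const staged = execFileSync('git', ['ls-files', '-s'], { cwd: repoRoot, encoding: 'utf-8' });
  const exes = executablePaths(staged);
  if (exes.length === 0) return [];
  const out = execFileSync('file', ['--mime-type', '--', ...exes], { cwd: repoRoot, encoding: 'utf-8' });
  return out
    .split('\n')
    .filter((line) => NATIVE_MIME.test(line.trim()))
    .map((line) => line.slice(0, line.lastIndexOf(':')));
}
```

The size check never spawns a process. The binary check spawns at most two (`git` and `file`) however many files are tracked.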
Test suite: 320 pass, 0 fail, 907ms (was ~12.7s with 2 fails).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(team-mode): give setup -q / setup --local tests a 3-minute budget
./setup runs a full install, Bun binary build, and skill regeneration.
On a cold cache it takes 60-90s, comfortably above bun test's 5s default.
Both "setup -q produces no stdout" and "setup --local prints deprecation
warning" have been flaky-to-failing for a while with [5001.78ms] timeouts.
The test logic was fine, the budget wasn't. Bumped both to 180s via the
third-arg timeout.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(opus-4.7): E2E eval for fanout rate + routing precision
Closes the measurement gap flagged by the ship-quality review: "zero
tests exercise Opus 4.7 behavior; every skill-e2e hardcodes 4.6."
Two cases, both pinned to claude-opus-4-7:
1. Fanout rate (A/B)
- Arm A: regen SKILL.md with --model opus-4-7 (overlay ON, includes
"Fan out explicitly" nudge).
- Arm B: regen SKILL.md with --model claude (overlay OFF, only
model-agnostic nudges).
- Prompt: "Read alpha.txt, beta.txt, gamma.txt. These are independent."
- Measure: parallel tool calls in first assistant turn.
- Assert: arm A >= arm B.
2. Routing precision (6-case mini-benchmark)
- 3 positive prompts that should route (wtf bug, send it, does it work)
- 3 negative prompts that match keywords but should NOT route
(syntax question, algorithm question, slack message)
- Assert: TP rate >= 66%, FP rate <= 33%.
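The sketch below is a hypothetical version of the routing check that takes the thresholds literally. With three cases per side, the only attainable rates are 0, 1/3, 2/3 and 1. So `TP >= 66%` requires at least two routed positives. `FP <= 33%` admits no false positives at all, because 1/3 is about 33.3%. The `RoutingResult` shape and the threshold constants are assumptions, not the eval's actual code.

```typescript
// Hypothetical shape of one routing eval result.
interface RoutingResult {
  name: string;
  shouldRoute: boolean; // true for the positive prompts
  routed: boolean;      // whether the agent routed to a skill
}

const MIN_TP_RATE = 0.66;
const MAX_FP_RATE = 0.33;

function routingRates(results: RoutingResult[]): { tpRate: number; fpRate: number } {
  const positives = results.filter((r) => r.shouldRoute);
  const negatives = results.filter((r) => !r.shouldRoute);
  const tp = positives.filter((r) => r.routed).length;
  const fp = negatives.filter((r) => r.routed).length;
  return {
    tpRate: positives.length ? tp / positives.length : 0,
    fpRate: negatives.length ? fp / negatives.length : 0,
  };
}

function routingPasses(results: RoutingResult[]): boolean {
  const { tpRate, fpRate } = routingRates(results);
  return tpRate >= MIN_TP_RATE && fpRate <= MAX_FP_RATE;
}
```

To allow one false positive out of three, compare against `1 / 3` rather than `0.33`.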
Cost estimate: ~$3-5 per full run. Classified as periodic tier per
CLAUDE.md convention (Opus model, non-deterministic). Runs only with
EVALS=1 env var, touchfile-gated so unrelated diffs don't trigger it.
Test plan artifact at
~/.gstack/projects/garrytan-gstack/garrytan-feat-opus-4.7-migration-eng-review-test-plan-20260421-230611.md
tracks the full specification.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor(opus-4.7): rewrite fanout nudge to show parallel tool_use pattern
The original fanout nudge told 4.7 to "spawn subagents in the same turn"
and "run independent checks concurrently" in prose. An E2E eval on
claude-opus-4-7 reading 3 independent files showed zero effect: both
overlay-ON and overlay-OFF arms emitted serial Reads across 3-4 turns.
Rewrite follows the same "show not tell" principle the PR introduced for
voice examples. The nudge now includes a concrete wrong/right contrast
showing the exact tool_use structure:
Wrong (3 turns):
Turn 1: Read(foo.ts), then wait
Turn 2: Read(bar.ts), then wait
Turn 3: Read(baz.ts)
Right (1 turn, 3 parallel tool_use blocks in one assistant message):
Turn 1: [Read(foo.ts), Read(bar.ts), Read(baz.ts)]
Applies to Read, Bash, Grep, Glob, WebFetch, Agent, and any tool where
sub-calls don't depend on each other's output.
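The contrast maps directly onto message structure. In the Anthropic Messages API, an assistant message's `content` is an array of blocks. Parallel calls are several `tool_use` blocks in that one array, and their results come back as `tool_result` blocks in the next user message. The sketch below counts `tool_use` blocks in the first assistant message, which is the metric the fanout eval uses. The types are trimmed to the fields used here, and the `Read` input key `file_path` is illustrative.

```typescript
// Trimmed-down content block shapes (only the fields used here).
type ContentBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> }
  | { type: 'tool_result'; tool_use_id: string; content: string };

interface Message {
  role: 'user' | 'assistant';
  content: ContentBlock[];
}

// Number of tool_use blocks in the first assistant message (0 if none).
function firstTurnToolUses(transcript: Message[]): number {
  const first = transcript.find((m) => m.role === 'assistant');
  return first ? first.content.filter((b) => b.type === 'tool_use').length : 0;
}

const read = (id: string, file: string): ContentBlock => ({
  type: 'tool_use', id, name: 'Read', input: { file_path: file },
});

// "Right": one assistant message carrying three independent Reads.
const parallel: Message[] = [
  { role: 'assistant', content: [read('t1', 'foo.ts'), read('t2', 'bar.ts'), read('t3', 'baz.ts')] },
];

// "Wrong": one Read per turn, waiting for each result before the next call.
const serial: Message[] = [
  { role: 'assistant', content: [read('t1', 'foo.ts')] },
  { role: 'user', content: [{ type: 'tool_result', tool_use_id: 't1', content: '...' }] },
  { role: 'assistant', content: [read('t2', 'bar.ts')] },
];
```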
Effect on test/skill-e2e-opus-47.test.ts fanout eval: unchanged (both
arms still 0 parallel in first turn via `claude -p`). May land better in
Claude Code's interactive harness, where the system prompt + tool
handlers differ. Tracked as P0 TODO for follow-up verification in the
correct harness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(opus-4.7): tighten ambiguous /qa routing prompt
"does this feature work on mobile? can you check the deploy?" was too
vague — a reasonable agent asks "which feature?" via AskUserQuestion
instead of routing to /qa. That's not a routing miss; it's an
under-specified prompt.
Replaced with "I just pushed the login flow changes. Test the deployed
site and find any bugs." — concrete subject + clear QA verb.
Result: pos-does-it-work went from MISS to OK, routing TP rate 2/3 -> 3/3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(opus-4.7): rewrite scratch-root helper + add afterAll cleanup
First run of the Opus 4.7 eval exposed two test-setup gaps that made
results misleading:
- Only the root gstack SKILL.md was installed. Claude Code does
auto-discovery per-directory under .claude/skills/{name}/SKILL.md, so
without individual skill dirs the Skill tool had nothing to route to.
Positive routing cases all failed.
- `claude -p` does not load SKILL.md content as system context the way
the Claude Code harness does. The overlay nudges in SKILL.md were
invisible to the model, so the fanout A/B could not actually differ.
New `mkEvalRoot(suffix, includeOverlay)` helper, modelled on the pattern
in skill-routing-e2e.test.ts:
- Installs per-skill SKILL.md under .claude/skills/ for ~14 key skills
so the Skill tool has discoverable targets.
- Writes an explicit routing block into project CLAUDE.md.
- When includeOverlay is true, inlines the content of
model-overlays/opus-4-7.md into CLAUDE.md too. This is what makes the
fanout A/B observable in `claude -p`: arm ON gets the overlay in
context, arm OFF does not.
Plus an afterAll that re-runs gen-skill-docs at the default model so
the working tree is not left with opus-4-7-generated SKILL.md files
after the eval finishes (would break golden-file tests in the next
`bun test` run otherwise).
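Here is a rough sketch of such a helper. It assumes the skill bodies, the routing block and the overlay text are passed in rather than read from the repo. `mkEvalRootSketch` and `EvalRootInputs` are hypothetical names; the real `mkEvalRoot(suffix, includeOverlay)` takes only two arguments. The sketch reproduces the two properties described above:

- one `.claude/skills/{name}/SKILL.md` per skill;
- the overlay inlined into CLAUDE.md only for the ON arm.

```typescript
import * as fs from 'fs';
import * as os from 'os';
import * as path from 'path';

// Hypothetical inputs; the real helper reads these from the repo itself.
interface EvalRootInputs {
  skills: Record<string, string>; // skill name -> SKILL.md body
  routingBlock: string;           // routing instructions for project CLAUDE.md
  overlay: string;                // contents of model-overlays/opus-4-7.md
}

function mkEvalRootSketch(suffix: string, includeOverlay: boolean, inputs: EvalRootInputs): string {
  const root = fs.mkdtempSync(path.join(os.tmpdir(), `gstack-opus47-${suffix}-`));
  // One directory per skill so the Skill tool can discover .claude/skills/{name}/SKILL.md.
  for (const [name, body] of Object.entries(inputs.skills)) {
    const dir = path.join(root, '.claude', 'skills', name);
    fs.mkdirSync(dir, { recursive: true });
    fs.writeFileSync(path.join(dir, 'SKILL.md'), body);
  }
  // Routing block always; overlay only for the ON arm of the A/B.
  const sections = [inputs.routingBlock];
  if (includeOverlay) sections.push(inputs.overlay);
  fs.writeFileSync(path.join(root, 'CLAUDE.md'), sections.join('\n\n') + '\n');
  return root;
}
```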
With this setup in place: routing went from 3/3 FAIL to 3/3 PASS
(correct skill or clarification in every positive case, zero false
positives on negatives). Fanout A/B is now a fair comparison; still
shows 0 parallel in both arms under `claude -p` (tracked as a P0 TODO
for re-measurement inside Claude Code's harness, where fanout may land
differently).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* docs(todos): verify Opus 4.7 fanout nudge in Claude Code harness (P0)
v1.6.1.0 shipped a rewritten "Fan out explicitly" nudge with a concrete
tool_use example. Under `claude -p` on claude-opus-4-7, the A/B eval
showed zero parallel tool calls in the first turn for both arms
(overlay ON and OFF). Routing verified 3/3 in the same harness, so the
gap is specific to fanout and likely to `claude -p`'s system prompt +
tool wiring.
This TODO closes the measurement loop the ship-quality review flagged:
re-run the fanout A/B inside Claude Code's real harness (or a faithful
replica) before landing another Opus migration claim.
P0 because it is a ship-quality commitment from the v1.6.1.0 release
notes, not a nice-to-have.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(release): v1.6.1.0 — Opus 4.7 migration, reviewed
Bump VERSION + package.json from 1.6.0.0 to 1.6.1.0. New CHANGELOG
entry describing the ship-quality remediation of PR #1117:
- Overlay split (model-agnostic claude.md + opus-4-7.md with INHERIT)
- Routing-injection aligned with SKILL.md.tmpl ("when in doubt" policy,
current skill names, full skill inventory)
- utility.ts trailer fallback updated
- Voice example closes through review gate instead of ship-bypass
- Literal-interpretation nudge bounded to branch scope
- Batch-questions nudge has explicit pacing exception
- First Opus 4.7 eval: routing verified 3/3, fanout A/B unverified
under `claude -p` (tracked as P0 TODO for next rev)
- Pre-existing test failures fixed: fs.statSync binary guard, 180s
setup timeout, golden-file updates
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(opus-4.7): key touchfile entries by testName, not describe text
TOUCHFILES completeness scan in test/touchfiles.test.ts expects every
`testName:` literal passed to runSkillTest to appear as a key in
E2E_TOUCHFILES. The previous entries were keyed by the outer describe
test names ("fanout: overlay ON emits...") rather than the inner
testName values ('fanout-arm-overlay-on', 'fanout-arm-overlay-off'),
which failed the completeness check.
Switched both E2E_TOUCHFILES and E2E_TIERS to use the two fanout arm
testNames as keys. The routing sub-tests use a template literal
(`routing-${c.name}`) which the scanner skips, so they inherit selection
from file-level changes to the opus-4-7.md / routing-injection.ts paths
already covered by the fanout entries.
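The completeness rule can be sketched as a literal scan. It collects every quoted `testName:` value and requires each one to be a key in the touchfile map. Template literals such as `` `routing-${c.name}` `` are not matched, which is why the routing sub-tests fall back to file-level selection. This is a hypothetical approximation of test/touchfiles.test.ts. The regex, the function names and the touchfile paths are assumptions.

```typescript
// Collect `testName: '...'` / `testName: "..."` literals; backtick templates are skipped.
function extractTestNames(source: string): string[] {
  const re = /testName:\s*(['"])([^'"\n]+)\1/g;
  return [...source.matchAll(re)].map((m) => m[2]);
}

// Test names with no entry in the touchfile map (own keys only, not prototype keys).
function missingTouchfileKeys(source: string, touchfiles: Record<string, string[]>): string[] {
  return extractTestNames(source).filter(
    (name) => !Object.prototype.hasOwnProperty.call(touchfiles, name),
  );
}
```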
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: gstack <ship@gstack.dev>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
351 lines
14 KiB
TypeScript
```typescript
import { describe, test, expect, beforeEach, afterEach } from 'bun:test';
import * as fs from 'fs';
import * as path from 'path';
import * as os from 'os';
import { execSync } from 'child_process';

const ROOT = path.resolve(import.meta.dir, '..');
const SETTINGS_HOOK = path.join(ROOT, 'bin', 'gstack-settings-hook');
const SESSION_UPDATE = path.join(ROOT, 'bin', 'gstack-session-update');
const TEAM_INIT = path.join(ROOT, 'bin', 'gstack-team-init');

function mkTmpDir(): string {
  return fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-team-test-'));
}

// Runs a shell command and returns stdout/stderr/exit code instead of throwing.
// `timeout` is the per-command execSync limit in ms (default 10s).
function run(
  cmd: string,
  opts: { cwd?: string; env?: Record<string, string>; timeout?: number } = {},
): { stdout: string; stderr: string; exitCode: number } {
  try {
    const stdout = execSync(cmd, {
      cwd: opts.cwd,
      env: { ...process.env, ...opts.env },
      encoding: 'utf-8',
      timeout: opts.timeout ?? 10000,
    });
    return { stdout, stderr: '', exitCode: 0 };
  } catch (e: any) {
    return { stdout: e.stdout || '', stderr: e.stderr || '', exitCode: e.status ?? 1 };
  }
}

describe('gstack-settings-hook', () => {
  let tmpDir: string;
  let settingsFile: string;

  beforeEach(() => {
    tmpDir = mkTmpDir();
    settingsFile = path.join(tmpDir, 'settings.json');
  });

  afterEach(() => {
    fs.rmSync(tmpDir, { recursive: true, force: true });
  });

  test('add creates settings.json if missing', () => {
    const result = run(`${SETTINGS_HOOK} add /path/to/gstack-session-update`, {
      env: { GSTACK_SETTINGS_FILE: settingsFile },
    });
    expect(result.exitCode).toBe(0);
    const settings = JSON.parse(fs.readFileSync(settingsFile, 'utf-8'));
    expect(settings.hooks.SessionStart).toHaveLength(1);
    expect(settings.hooks.SessionStart[0].hooks[0].command).toBe('/path/to/gstack-session-update');
  });

  test('add preserves existing settings', () => {
    fs.writeFileSync(settingsFile, JSON.stringify({ effortLevel: 'high', permissions: { defaultMode: 'auto' } }, null, 2));
    const result = run(`${SETTINGS_HOOK} add /path/to/gstack-session-update`, {
      env: { GSTACK_SETTINGS_FILE: settingsFile },
    });
    expect(result.exitCode).toBe(0);
    const settings = JSON.parse(fs.readFileSync(settingsFile, 'utf-8'));
    expect(settings.effortLevel).toBe('high');
    expect(settings.permissions.defaultMode).toBe('auto');
    expect(settings.hooks.SessionStart).toHaveLength(1);
  });

  test('add deduplicates (running twice does not double-add)', () => {
    run(`${SETTINGS_HOOK} add /path/to/gstack-session-update`, {
      env: { GSTACK_SETTINGS_FILE: settingsFile },
    });
    run(`${SETTINGS_HOOK} add /path/to/gstack-session-update`, {
      env: { GSTACK_SETTINGS_FILE: settingsFile },
    });
    const settings = JSON.parse(fs.readFileSync(settingsFile, 'utf-8'));
    expect(settings.hooks.SessionStart).toHaveLength(1);
  });

  test('remove removes the hook', () => {
    run(`${SETTINGS_HOOK} add /path/to/gstack-session-update`, {
      env: { GSTACK_SETTINGS_FILE: settingsFile },
    });
    const result = run(`${SETTINGS_HOOK} remove /path/to/gstack-session-update`, {
      env: { GSTACK_SETTINGS_FILE: settingsFile },
    });
    expect(result.exitCode).toBe(0);
    const settings = JSON.parse(fs.readFileSync(settingsFile, 'utf-8'));
    expect(settings.hooks).toBeUndefined();
  });

  test('remove exits 1 when settings.json does not exist', () => {
    const result = run(`${SETTINGS_HOOK} remove /path/to/gstack-session-update`, {
      env: { GSTACK_SETTINGS_FILE: settingsFile },
    });
    expect(result.exitCode).toBe(1);
  });

  test('remove preserves other hooks', () => {
    fs.writeFileSync(settingsFile, JSON.stringify({
      hooks: {
        SessionStart: [
          { hooks: [{ type: 'command', command: '/path/to/gstack-session-update' }] },
          { hooks: [{ type: 'command', command: '/other/hook' }] },
        ],
      },
    }, null, 2));
    run(`${SETTINGS_HOOK} remove /path/to/gstack-session-update`, {
      env: { GSTACK_SETTINGS_FILE: settingsFile },
    });
    const settings = JSON.parse(fs.readFileSync(settingsFile, 'utf-8'));
    expect(settings.hooks.SessionStart).toHaveLength(1);
    expect(settings.hooks.SessionStart[0].hooks[0].command).toBe('/other/hook');
  });

  test('atomic write (no partial file on success)', () => {
    run(`${SETTINGS_HOOK} add /path/to/gstack-session-update`, {
      env: { GSTACK_SETTINGS_FILE: settingsFile },
    });
    // .tmp file should not exist after successful write
    expect(fs.existsSync(settingsFile + '.tmp')).toBe(false);
    // File should be valid JSON
    expect(() => JSON.parse(fs.readFileSync(settingsFile, 'utf-8'))).not.toThrow();
  });
});

describe('gstack-session-update', () => {
  let tmpDir: string;
  let gstackDir: string;
  let stateDir: string;

  beforeEach(() => {
    tmpDir = mkTmpDir();
    gstackDir = path.join(tmpDir, 'gstack');
    stateDir = path.join(tmpDir, 'state');
    fs.mkdirSync(gstackDir, { recursive: true });
    fs.mkdirSync(stateDir, { recursive: true });

    // Init a git repo to pass the .git guard
    execSync('git init', { cwd: gstackDir });
    execSync('git commit --allow-empty -m "init"', { cwd: gstackDir });
    fs.writeFileSync(path.join(gstackDir, 'VERSION'), '0.1.0');

    // Create a minimal gstack-config that returns auto_upgrade=true
    const binDir = path.join(gstackDir, 'bin');
    fs.mkdirSync(binDir, { recursive: true });
    fs.writeFileSync(path.join(binDir, 'gstack-config'), '#!/bin/bash\necho "true"');
    fs.chmodSync(path.join(binDir, 'gstack-config'), 0o755);
  });

  afterEach(() => {
    fs.rmSync(tmpDir, { recursive: true, force: true });
  });

  test('exits 0 when .git is missing', () => {
    fs.rmSync(path.join(gstackDir, '.git'), { recursive: true });
    const result = run(SESSION_UPDATE, {
      env: { GSTACK_DIR: gstackDir, GSTACK_STATE_DIR: stateDir },
    });
    expect(result.exitCode).toBe(0);
  });

  test('exits 0 when auto_upgrade is not true', () => {
    // Override gstack-config to return false
    fs.writeFileSync(path.join(gstackDir, 'bin', 'gstack-config'), '#!/bin/bash\necho "false"');
    const result = run(SESSION_UPDATE, {
      env: { GSTACK_DIR: gstackDir, GSTACK_STATE_DIR: stateDir },
    });
    expect(result.exitCode).toBe(0);
  });

  test('throttle: skips when checked recently', () => {
    // Write a recent throttle timestamp
    const throttleFile = path.join(stateDir, '.last-session-update');
    fs.writeFileSync(throttleFile, String(Math.floor(Date.now() / 1000)));

    const result = run(SESSION_UPDATE, {
      env: { GSTACK_DIR: gstackDir, GSTACK_STATE_DIR: stateDir },
    });
    expect(result.exitCode).toBe(0);
    // No log file should be created (throttled before forking)
  });

  test('always exits 0 (non-fatal)', () => {
    // Even with a broken setup, should exit 0
    const result = run(SESSION_UPDATE, {
      env: { GSTACK_DIR: '/nonexistent/path', GSTACK_STATE_DIR: stateDir },
    });
    expect(result.exitCode).toBe(0);
  });
});

describe('gstack-team-init', () => {
  let tmpDir: string;

  beforeEach(() => {
    tmpDir = mkTmpDir();
    execSync('git init', { cwd: tmpDir });
    execSync('git commit --allow-empty -m "init"', { cwd: tmpDir });
  });

  afterEach(() => {
    fs.rmSync(tmpDir, { recursive: true, force: true });
  });

  test('errors without a mode argument', () => {
    const result = run(TEAM_INIT, { cwd: tmpDir });
    expect(result.exitCode).not.toBe(0);
    expect(result.stderr).toContain('Usage');
  });

  test('errors outside a git repo', () => {
    const nonGitDir = mkTmpDir();
    const result = run(`${TEAM_INIT} optional`, { cwd: nonGitDir });
    expect(result.exitCode).not.toBe(0);
    expect(result.stderr).toContain('not in a git repository');
    fs.rmSync(nonGitDir, { recursive: true, force: true });
  });

  test('optional: creates CLAUDE.md with recommended section', () => {
    const result = run(`${TEAM_INIT} optional`, { cwd: tmpDir });
    expect(result.exitCode).toBe(0);
    const claude = fs.readFileSync(path.join(tmpDir, 'CLAUDE.md'), 'utf-8');
    expect(claude).toContain('## gstack (recommended)');
    expect(claude).toContain('./setup --team');
  });

  test('required: creates CLAUDE.md with required section', () => {
    const result = run(`${TEAM_INIT} required`, { cwd: tmpDir });
    expect(result.exitCode).toBe(0);
    const claude = fs.readFileSync(path.join(tmpDir, 'CLAUDE.md'), 'utf-8');
    expect(claude).toContain('## gstack (REQUIRED');
    expect(claude).toContain('GSTACK_MISSING');
  });

  test('required: creates enforcement hook', () => {
    run(`${TEAM_INIT} required`, { cwd: tmpDir });
    const hookPath = path.join(tmpDir, '.claude', 'hooks', 'check-gstack.sh');
    expect(fs.existsSync(hookPath)).toBe(true);
    const hook = fs.readFileSync(hookPath, 'utf-8');
    expect(hook).toContain('BLOCKED: gstack is not installed');
    // Should be executable
    const stat = fs.statSync(hookPath);
    expect(stat.mode & 0o111).toBeGreaterThan(0);
  });

  test('required: creates project settings.json with PreToolUse hook', () => {
    run(`${TEAM_INIT} required`, { cwd: tmpDir });
    const settingsPath = path.join(tmpDir, '.claude', 'settings.json');
    expect(fs.existsSync(settingsPath)).toBe(true);
    const settings = JSON.parse(fs.readFileSync(settingsPath, 'utf-8'));
    expect(settings.hooks.PreToolUse).toHaveLength(1);
    expect(settings.hooks.PreToolUse[0].matcher).toBe('Skill');
    expect(settings.hooks.PreToolUse[0].hooks[0].command).toContain('check-gstack');
  });

  test('idempotent: running twice does not duplicate CLAUDE.md section', () => {
    run(`${TEAM_INIT} optional`, { cwd: tmpDir });
    run(`${TEAM_INIT} optional`, { cwd: tmpDir });
    const claude = fs.readFileSync(path.join(tmpDir, 'CLAUDE.md'), 'utf-8');
    const matches = claude.match(/## gstack/g);
    expect(matches).toHaveLength(1);
  });

  test('removes vendored copy when present', () => {
    // Create a fake vendored gstack with VERSION file
    const vendoredDir = path.join(tmpDir, '.claude', 'skills', 'gstack');
    fs.mkdirSync(vendoredDir, { recursive: true });
    fs.writeFileSync(path.join(vendoredDir, 'VERSION'), '0.14.0.0');
    fs.writeFileSync(path.join(vendoredDir, 'README.md'), 'vendored');
    // Track it in git
    execSync('git add .claude/skills/gstack/', { cwd: tmpDir });
    execSync('git commit -m "add vendored gstack"', { cwd: tmpDir });

    const result = run(`${TEAM_INIT} optional`, { cwd: tmpDir });
    expect(result.exitCode).toBe(0);
    expect(result.stdout).toContain('Found vendored gstack copy');
    expect(result.stdout).toContain('Removed vendored copy');
    // Vendored dir should be gone
    expect(fs.existsSync(vendoredDir)).toBe(false);
    // .gitignore should have the entry
    const gitignore = fs.readFileSync(path.join(tmpDir, '.gitignore'), 'utf-8');
    expect(gitignore).toContain('.claude/skills/gstack/');
  });

  test('skips when no vendored copy present', () => {
    const result = run(`${TEAM_INIT} optional`, { cwd: tmpDir });
    expect(result.exitCode).toBe(0);
    expect(result.stdout).not.toContain('Found vendored gstack copy');
  });

  test('skips when .claude/skills/gstack is a symlink', () => {
    // Create a symlink (not a real vendored copy)
    const skillsDir = path.join(tmpDir, '.claude', 'skills');
    fs.mkdirSync(skillsDir, { recursive: true });
    const targetDir = mkTmpDir();
    fs.writeFileSync(path.join(targetDir, 'VERSION'), '0.14.0.0');
    fs.symlinkSync(targetDir, path.join(skillsDir, 'gstack'));

    const result = run(`${TEAM_INIT} optional`, { cwd: tmpDir });
    expect(result.exitCode).toBe(0);
    expect(result.stdout).not.toContain('Found vendored gstack copy');
    // Symlink should still exist
    expect(fs.lstatSync(path.join(skillsDir, 'gstack')).isSymbolicLink()).toBe(true);
    fs.rmSync(targetDir, { recursive: true, force: true });
  });

  test('does not duplicate .gitignore entry on re-run', () => {
    // Create vendored copy
    const vendoredDir = path.join(tmpDir, '.claude', 'skills', 'gstack');
    fs.mkdirSync(vendoredDir, { recursive: true });
    fs.writeFileSync(path.join(vendoredDir, 'VERSION'), '0.14.0.0');
    execSync('git add .claude/skills/gstack/', { cwd: tmpDir });
    execSync('git commit -m "add vendored"', { cwd: tmpDir });

    run(`${TEAM_INIT} optional`, { cwd: tmpDir });

    // Re-create vendored dir to simulate re-run scenario
    fs.mkdirSync(vendoredDir, { recursive: true });
    fs.writeFileSync(path.join(vendoredDir, 'VERSION'), '0.14.0.0');
    run(`${TEAM_INIT} optional`, { cwd: tmpDir });

    const gitignore = fs.readFileSync(path.join(tmpDir, '.gitignore'), 'utf-8');
    const matches = gitignore.match(/\.claude\/skills\/gstack\//g);
    expect(matches).toHaveLength(1);
  });
});

describe('setup --team / --no-team / -q', () => {
  // `./setup` does a full install + build + skill regeneration. On a cold cache
  // it routinely takes 60-90s. Give both tests a 3-minute budget so CI doesn't
  // report pre-existing timeouts as failures. The per-command execSync timeout
  // is raised too: with run()'s 10s default, a cold-cache setup would be killed
  // partway through and the assertions would only see truncated output.
  const SETUP_TIMEOUT_MS = 170_000; // just under the 180s test budget

  test(
    'setup -q produces no stdout',
    () => {
      const result = run(`${path.join(ROOT, 'setup')} -q`, { cwd: ROOT, timeout: SETUP_TIMEOUT_MS });
      // -q should suppress informational output (may still have some output from build)
      // The key test is that the "Skill naming:" prompt and "gstack ready" messages are suppressed
      expect(result.stdout).not.toContain('Skill naming:');
      expect(result.stdout).not.toContain('gstack ready');
    },
    180_000,
  );

  test(
    'setup --local prints deprecation warning',
    () => {
      // stderr capture: run via bash redirect so we can capture stderr
      const result = run(`bash -c '${path.join(ROOT, 'setup')} --local -q 2>&1'`, {
        cwd: ROOT,
        timeout: SETUP_TIMEOUT_MS,
      });
      expect(result.stdout).toContain('deprecated');
    },
    180_000,
  );
});
```
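The `gstack-settings-hook` tests above pin down a small contract:

- `add` creates settings.json if needed, keeps unrelated keys, and is idempotent.
- `remove` drops only the matching SessionStart entry, deletes `hooks` once nothing is left, and fails when there is no file.
- Writes go through a `.tmp` file, so a successful run leaves no partial file behind.

The TypeScript sketch below models that contract. It is not the implementation of bin/gstack-settings-hook, whose language and internals aren't shown here. The function names are hypothetical, and removing a whole hook group when any of its commands matches is a simplifying assumption.

```typescript
import * as fs from 'fs';

interface HookCommand { type: 'command'; command: string }
interface HookGroup { matcher?: string; hooks: HookCommand[] }
interface Settings { hooks?: Record<string, HookGroup[]>; [key: string]: unknown }

// Write to `<file>.tmp`, then rename over the target (atomic on the same filesystem).
function writeJsonAtomic(file: string, data: unknown): void {
  const tmp = `${file}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(data, null, 2) + '\n');
  fs.renameSync(tmp, file);
}

function hasCommand(group: HookGroup, command: string): boolean {
  return group.hooks.some((h) => h.command === command);
}

function addSessionStartHook(file: string, command: string): void {
  const settings: Settings = fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, 'utf-8')) : {};
  const hooks = (settings.hooks ??= {});
  const groups = (hooks.SessionStart ??= []);
  // Idempotent: only append when no existing group already runs this command.
  if (!groups.some((g) => hasCommand(g, command))) {
    groups.push({ hooks: [{ type: 'command', command }] });
  }
  writeJsonAtomic(file, settings);
}

// Returns false when there is no settings file (the CLI exits 1 in that case).
function removeSessionStartHook(file: string, command: string): boolean {
  if (!fs.existsSync(file)) return false;
  const settings: Settings = JSON.parse(fs.readFileSync(file, 'utf-8'));
  const groups = settings.hooks?.SessionStart;
  if (settings.hooks && groups) {
    const kept = groups.filter((g) => !hasCommand(g, command));
    if (kept.length > 0) settings.hooks.SessionStart = kept;
    else delete settings.hooks.SessionStart;
    // Drop the empty `hooks` object entirely, as the 'remove removes the hook' test expects.
    if (Object.keys(settings.hooks).length === 0) delete settings.hooks;
  }
  writeJsonAtomic(file, settings);
  return true;
}
```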