mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
7ff0f84b1e
* refactor: extract {{TEST_COVERAGE_AUDIT}} shared resolver
DRY extraction of the test coverage audit methodology into a shared
generator function with three explicit placeholders:
- TEST_COVERAGE_AUDIT_PLAN (plan-eng-review)
- TEST_COVERAGE_AUDIT_SHIP (ship)
- TEST_COVERAGE_AUDIT_REVIEW (review)
Shared across all modes: codepath tracing, ASCII diagram format,
quality scoring rubric, E2E test decision matrix, regression rule,
and test framework detection via CLAUDE.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: plan-eng-review uses shared test coverage audit
Replace the thin 6-line Section 3 test review with the full shared
methodology via {{TEST_COVERAGE_AUDIT_PLAN}}. Plan mode now:
- Traces every codepath with full ASCII diagrams
- Adds missing tests to the plan (not just "check for tests")
- Writes test plan artifact for /qa consumption
- Includes E2E/eval recommendations and regression detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor: ship uses shared test coverage audit
Replace 135 lines of inline Step 3.4 methodology with
{{TEST_COVERAGE_AUDIT_SHIP}}. Functionally identical output plus:
- E2E test decision matrix (marks paths needing E2E vs unit)
- Eval recommendations for LLM prompt changes
- Regression detection iron rule
- Test framework detection via CLAUDE.md first
- Test plan artifact for /qa consumption
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: /review Step 4.75 test coverage diagram
Add codepath tracing to the pre-landing review via
{{TEST_COVERAGE_AUDIT_REVIEW}}. Review mode:
- Produces ASCII coverage diagram (same methodology as plan/ship)
- Generates tests for gaps via Fix-First (ASK user)
- Subsumes Pass 2 "Test Gaps" checklist category
- Gaps are INFORMATIONAL findings
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: mode differentiation + regression guard for coverage audit
10 new tests verifying the three TEST_COVERAGE_AUDIT placeholders:
- All modes share: codepath tracing, E2E matrix, regression rule
- Plan mode: adds to plan + artifact, no ship-specific content
- Ship mode: auto-generates + before/after count + coverage summary
- Review mode: Fix-First ASK + INFORMATIONAL, no artifact
- Regression guard: ship SKILL.md preserves all key phrases
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: extract shared coverage audit fixture + review E2E
- Extract billing.ts fixture into coverage-audit-fixture.ts (DRY)
- Refactor ship-coverage-audit E2E to use shared fixture
- Add review-coverage-audit E2E for Step 4.75
- Update touchfiles: both E2Es depend on shared fixture
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: strengthen E2E assertions for coverage audit tests
The coverage audit E2E tests (ship + review) were only asserting
exitReason === 'success' and readCalls > 0 — they passed even
if the agent produced no coverage diagram. Add assertion that
the output contains either GAP or TESTED markers.
Found during /review.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: plan mode traces the plan, not the git diff
Codex adversarial review caught that plan-eng-review was inheriting
"git diff origin/<base>...HEAD" from the shared resolver, but plan mode
reviews a plan document, not a code diff. Plan mode now says:
"Trace every codepath in the plan" and "Read the plan document."
Ship and review modes keep the git diff instruction.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.9.5.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: test coverage catalog + failure triage (merged branches) (#285)
* feat: add bin/gstack-repo-mode — solo vs collaborative detection with caching
Detects whether a repo is solo-dev (one person does 80%+ of recent commits)
or collaborative. Uses 90-day git shortlog window with 7-day cache in
~/.gstack/projects/{SLUG}/repo-mode.json. Config override via
`gstack-config set repo_mode solo|collaborative` takes precedence over
the heuristic. Minimum 5 commits required to classify (otherwise unknown).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: test failure ownership triage — see something say something
Adds two new preamble sections to all gstack skills:
- Repo Ownership Mode: explains solo vs collaborative behavior
- See Something, Say Something: proactive issue flagging principle
Adds {{TEST_FAILURE_TRIAGE}} template variable (opt-in, used by /ship):
- Classifies test failures as in-branch vs pre-existing
- Solo mode defaults to "investigate and fix now"
- Collaborative mode offers "blame + assign GitHub issue" option
- Also offers P0 TODO and skip options
/ship Step 3 now triages test failures instead of hard-stopping on all
failures. In-branch failures still block shipping. Pre-existing failures
get user-directed triage based on repo mode.
Adds P2 TODO for gstack notes system (deferred lightweight reminder).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate SKILL.md files for Claude and Codex hosts
All 22 Claude skills and 21 Codex skills regenerated with new preamble
sections (Repo Ownership Mode, See Something Say Something) and
{{TEST_FAILURE_TRIAGE}} resolved in ship/SKILL.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: validate repo mode values to prevent shell injection
Codex adversarial review found that unvalidated config/cache values
could be injected into shell via source <(gstack-repo-mode). Added
validate_mode() that only allows solo|collaborative|unknown — anything
else becomes "unknown". Prevents persistent code execution through
malicious config.yaml or tampered cache JSON.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: shell injection via branch names + feature-branch sampling bias
Codex code review found two issues:
P1: eval $(gstack-slug) in gstack-repo-mode executes branch names as
shell. Branch names like foo$(touch${IFS}pwned) are valid git refs and
would execute arbitrary commands. Fix: compute SLUG directly with sed
instead of eval'ing gstack-slug output.
P2: git shortlog HEAD only sees current branch history. On feature
branches that haven't merged main recently, other contributors disappear
from the sample. Fix: use git shortlog on the default branch
(origin/main) instead of HEAD.
Also improved blame lookup in collaborative triage to check both the
test file and the production code it covers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: broaden codex-host stripping test to accommodate triage section
"Investigate and fix" now appears in TEST_FAILURE_TRIAGE (not just the
Codex review step). Use CODEX_REVIEWS config string as a more specific
marker for detecting the Codex review step in Codex-hosted skills.
* fix: replace template placeholder in TODOS.md with readable text
{{TEST_FAILURE_TRIAGE}} is template syntax but TODOS.md is not processed
by gen-skill-docs — replaced with human-readable reference.
* chore: bump version and changelog (v0.9.5.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add bin/ directory to project structure in CLAUDE.md
* test: add triage resolver unit tests, plan-eng coverage audit E2E, and triage E2E
- TEST_FAILURE_TRIAGE resolver: 6 unit tests verifying all triage steps (T1-T4),
REPO_MODE branching, and safety default for ambiguous failures
- plan-eng-coverage-audit E2E: tests /plan-eng-review coverage audit codepath
(gap identified during eng review — existed on neither branch)
- ship-triage E2E: planted-bug fixture with in-branch (truncate null) and
pre-existing (divide-by-zero) failures; verifies correct classification
- Touchfile entries for diff-based test selection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate stale Codex SKILL.md for retro
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: gstack-repo-mode handles repos without origin remote
Split `git remote get-url origin` into a separate variable with `|| true`
so the script doesn't crash under `set -euo pipefail` in local-only repos.
Falls back to REPO_MODE=unknown gracefully.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: REPO_MODE defaults to unknown when helper emits nothing
Changed preamble from `source <(...) || REPO_MODE=unknown` (which doesn't
catch empty output) to `source <(...) || true` followed by
`REPO_MODE=${REPO_MODE:-unknown}`. Regenerated all SKILL.md files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: triage E2E runs both test files in subprocesses
math.test.js called process.exit(1) which killed the runner before
string.test.js could execute. Changed test runner to use child_process
so each test runs independently and both failure classes are exercised.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: gstack-repo-mode handles repos without origin remote
Fall back through origin/main → origin/master → HEAD when
git symbolic-ref refs/remotes/origin/HEAD is not set. Prevents
shortlog crash in repos where origin/HEAD isn't configured.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: triage E2E runs both test files in subprocesses
Add assertions verifying both math.test.js (pre-existing failure) and
string.test.js (in-branch failure) actually executed during triage.
Prevents false passes where only one failure class is exercised.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: REPO_MODE defaults to unknown when helper emits nothing
- Remove head -20 truncation that biased solo classification by
dropping low-volume contributors from the denominator
- Use atomic write (mktemp + mv) for cache to prevent concurrent
preamble reads from seeing partial JSON
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add test coverage catalog to CHANGELOG + update project structure
- CHANGELOG: add 6 entries for coverage audit, review Step 4.75, E2E
recommendations, regression iron rule, failure triage, repo-mode fix
- CLAUDE.md: add missing skill directories (autoplan, benchmark, canary,
codex, land-and-deploy, setup-deploy) to project structure
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.10.1.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: CHANGELOG rules — branch-scoped versions, never fold into old entries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1510 lines
57 KiB
TypeScript
1510 lines
57 KiB
TypeScript
import { describe, test, expect } from 'bun:test';
|
||
import { validateSkill, extractRemoteSlugPatterns, extractWeightsFromTable } from './helpers/skill-parser';
|
||
import { ALL_COMMANDS, COMMAND_DESCRIPTIONS, READ_COMMANDS, WRITE_COMMANDS, META_COMMANDS } from '../browse/src/commands';
|
||
import { SNAPSHOT_FLAGS } from '../browse/src/snapshot';
|
||
import * as fs from 'fs';
|
||
import * as path from 'path';
|
||
|
||
const ROOT = path.resolve(import.meta.dir, '..');
|
||
|
||
describe('SKILL.md command validation', () => {
|
||
test('all $B commands in SKILL.md are valid browse commands', () => {
|
||
const result = validateSkill(path.join(ROOT, 'SKILL.md'));
|
||
expect(result.invalid).toHaveLength(0);
|
||
expect(result.valid.length).toBeGreaterThan(0);
|
||
});
|
||
|
||
test('all snapshot flags in SKILL.md are valid', () => {
|
||
const result = validateSkill(path.join(ROOT, 'SKILL.md'));
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in browse/SKILL.md are valid browse commands', () => {
|
||
const result = validateSkill(path.join(ROOT, 'browse', 'SKILL.md'));
|
||
expect(result.invalid).toHaveLength(0);
|
||
expect(result.valid.length).toBeGreaterThan(0);
|
||
});
|
||
|
||
test('all snapshot flags in browse/SKILL.md are valid', () => {
|
||
const result = validateSkill(path.join(ROOT, 'browse', 'SKILL.md'));
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in qa/SKILL.md are valid browse commands', () => {
|
||
const qaSkill = path.join(ROOT, 'qa', 'SKILL.md');
|
||
if (!fs.existsSync(qaSkill)) return; // skip if missing
|
||
const result = validateSkill(qaSkill);
|
||
expect(result.invalid).toHaveLength(0);
|
||
});
|
||
|
||
test('all snapshot flags in qa/SKILL.md are valid', () => {
|
||
const qaSkill = path.join(ROOT, 'qa', 'SKILL.md');
|
||
if (!fs.existsSync(qaSkill)) return;
|
||
const result = validateSkill(qaSkill);
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in qa-only/SKILL.md are valid browse commands', () => {
|
||
const qaOnlySkill = path.join(ROOT, 'qa-only', 'SKILL.md');
|
||
if (!fs.existsSync(qaOnlySkill)) return;
|
||
const result = validateSkill(qaOnlySkill);
|
||
expect(result.invalid).toHaveLength(0);
|
||
});
|
||
|
||
test('all snapshot flags in qa-only/SKILL.md are valid', () => {
|
||
const qaOnlySkill = path.join(ROOT, 'qa-only', 'SKILL.md');
|
||
if (!fs.existsSync(qaOnlySkill)) return;
|
||
const result = validateSkill(qaOnlySkill);
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in plan-design-review/SKILL.md are valid browse commands', () => {
|
||
const skill = path.join(ROOT, 'plan-design-review', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.invalid).toHaveLength(0);
|
||
});
|
||
|
||
test('all snapshot flags in plan-design-review/SKILL.md are valid', () => {
|
||
const skill = path.join(ROOT, 'plan-design-review', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in design-review/SKILL.md are valid browse commands', () => {
|
||
const skill = path.join(ROOT, 'design-review', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.invalid).toHaveLength(0);
|
||
});
|
||
|
||
test('all snapshot flags in design-review/SKILL.md are valid', () => {
|
||
const skill = path.join(ROOT, 'design-review', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in design-consultation/SKILL.md are valid browse commands', () => {
|
||
const skill = path.join(ROOT, 'design-consultation', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.invalid).toHaveLength(0);
|
||
});
|
||
|
||
test('all snapshot flags in design-consultation/SKILL.md are valid', () => {
|
||
const skill = path.join(ROOT, 'design-consultation', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in autoplan/SKILL.md are valid browse commands', () => {
|
||
const skill = path.join(ROOT, 'autoplan', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.invalid).toHaveLength(0);
|
||
});
|
||
|
||
test('all snapshot flags in autoplan/SKILL.md are valid', () => {
|
||
const skill = path.join(ROOT, 'autoplan', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
});
|
||
|
||
describe('Command registry consistency', () => {
|
||
test('COMMAND_DESCRIPTIONS covers all commands in sets', () => {
|
||
const allCmds = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS]);
|
||
const descKeys = new Set(Object.keys(COMMAND_DESCRIPTIONS));
|
||
for (const cmd of allCmds) {
|
||
expect(descKeys.has(cmd)).toBe(true);
|
||
}
|
||
});
|
||
|
||
test('COMMAND_DESCRIPTIONS has no extra commands not in sets', () => {
|
||
const allCmds = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS]);
|
||
for (const key of Object.keys(COMMAND_DESCRIPTIONS)) {
|
||
expect(allCmds.has(key)).toBe(true);
|
||
}
|
||
});
|
||
|
||
test('ALL_COMMANDS matches union of all sets', () => {
|
||
const union = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS]);
|
||
expect(ALL_COMMANDS.size).toBe(union.size);
|
||
for (const cmd of union) {
|
||
expect(ALL_COMMANDS.has(cmd)).toBe(true);
|
||
}
|
||
});
|
||
|
||
test('SNAPSHOT_FLAGS option keys are valid SnapshotOptions fields', () => {
|
||
const validKeys = new Set([
|
||
'interactive', 'compact', 'depth', 'selector',
|
||
'diff', 'annotate', 'outputPath', 'cursorInteractive',
|
||
]);
|
||
for (const flag of SNAPSHOT_FLAGS) {
|
||
expect(validKeys.has(flag.optionKey)).toBe(true);
|
||
}
|
||
});
|
||
});
|
||
|
||
describe('Usage string consistency', () => {
|
||
// Normalize a usage string to its structural skeleton for comparison.
|
||
// Replaces <param-names> with <>, [optional] with [], strips parenthetical hints.
|
||
// This catches format mismatches (e.g., <name>:<value> vs <name> <value>)
|
||
// without tripping on abbreviation differences (e.g., <sel> vs <selector>).
|
||
function skeleton(usage: string): string {
|
||
return usage
|
||
.replace(/\(.*?\)/g, '') // strip parenthetical hints like (e.g., Enter, Tab)
|
||
.replace(/<[^>]*>/g, '<>') // normalize <param-name> → <>
|
||
.replace(/\[[^\]]*\]/g, '[]') // normalize [optional] → []
|
||
.replace(/\s+/g, ' ') // collapse whitespace
|
||
.trim();
|
||
}
|
||
|
||
// Cross-check Usage: patterns in implementation against COMMAND_DESCRIPTIONS
|
||
test('implementation Usage: structural format matches COMMAND_DESCRIPTIONS', () => {
|
||
const implFiles = [
|
||
path.join(ROOT, 'browse', 'src', 'write-commands.ts'),
|
||
path.join(ROOT, 'browse', 'src', 'read-commands.ts'),
|
||
path.join(ROOT, 'browse', 'src', 'meta-commands.ts'),
|
||
];
|
||
|
||
// Extract "Usage: browse <pattern>" from throw new Error(...) calls
|
||
const usagePattern = /throw new Error\(['"`]Usage:\s*browse\s+(.+?)['"`]\)/g;
|
||
const implUsages = new Map<string, string>();
|
||
|
||
for (const file of implFiles) {
|
||
const content = fs.readFileSync(file, 'utf-8');
|
||
let match;
|
||
while ((match = usagePattern.exec(content)) !== null) {
|
||
const usage = match[1].split('\\n')[0].trim();
|
||
const cmd = usage.split(/\s/)[0];
|
||
implUsages.set(cmd, usage);
|
||
}
|
||
}
|
||
|
||
// Compare structural skeletons
|
||
const mismatches: string[] = [];
|
||
for (const [cmd, implUsage] of implUsages) {
|
||
const desc = COMMAND_DESCRIPTIONS[cmd];
|
||
if (!desc) continue;
|
||
if (!desc.usage) continue;
|
||
const descSkel = skeleton(desc.usage);
|
||
const implSkel = skeleton(implUsage);
|
||
if (descSkel !== implSkel) {
|
||
mismatches.push(`${cmd}: docs "${desc.usage}" (${descSkel}) vs impl "${implUsage}" (${implSkel})`);
|
||
}
|
||
}
|
||
|
||
expect(mismatches).toEqual([]);
|
||
});
|
||
});
|
||
|
||
describe('Generated SKILL.md freshness', () => {
|
||
test('no unresolved {{placeholders}} in generated SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||
const unresolved = content.match(/\{\{\w+\}\}/g);
|
||
expect(unresolved).toBeNull();
|
||
});
|
||
|
||
test('no unresolved {{placeholders}} in generated browse/SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'browse', 'SKILL.md'), 'utf-8');
|
||
const unresolved = content.match(/\{\{\w+\}\}/g);
|
||
expect(unresolved).toBeNull();
|
||
});
|
||
|
||
test('generated SKILL.md has AUTO-GENERATED header', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('AUTO-GENERATED');
|
||
});
|
||
});
|
||
|
||
// --- Update check preamble validation ---
|
||
|
||
describe('Update check preamble', () => {
|
||
const skillsWithUpdateCheck = [
|
||
'SKILL.md', 'browse/SKILL.md', 'qa/SKILL.md',
|
||
'qa-only/SKILL.md',
|
||
'setup-browser-cookies/SKILL.md',
|
||
'ship/SKILL.md', 'review/SKILL.md',
|
||
'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md',
|
||
'retro/SKILL.md',
|
||
'office-hours/SKILL.md', 'investigate/SKILL.md',
|
||
'plan-design-review/SKILL.md',
|
||
'design-review/SKILL.md',
|
||
'design-consultation/SKILL.md',
|
||
'document-release/SKILL.md',
|
||
'canary/SKILL.md',
|
||
'benchmark/SKILL.md',
|
||
'land-and-deploy/SKILL.md',
|
||
'setup-deploy/SKILL.md',
|
||
];
|
||
|
||
for (const skill of skillsWithUpdateCheck) {
|
||
test(`${skill} update check line ends with || true`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
// The second line of the bash block must end with || true
|
||
// to avoid exit code 1 when _UPD is empty (up to date)
|
||
const match = content.match(/\[ -n "\$_UPD" \].*$/m);
|
||
expect(match).not.toBeNull();
|
||
expect(match![0]).toContain('|| true');
|
||
});
|
||
}
|
||
|
||
test('all skills with update check are generated from .tmpl', () => {
|
||
for (const skill of skillsWithUpdateCheck) {
|
||
const tmplPath = path.join(ROOT, skill + '.tmpl');
|
||
expect(fs.existsSync(tmplPath)).toBe(true);
|
||
}
|
||
});
|
||
|
||
test('update check bash block exits 0 when up to date', () => {
|
||
// Simulate the exact preamble command from SKILL.md
|
||
const result = Bun.spawnSync(['bash', '-c',
|
||
'_UPD=$(echo "" || true); [ -n "$_UPD" ] && echo "$_UPD" || true'
|
||
], { stdout: 'pipe', stderr: 'pipe' });
|
||
expect(result.exitCode).toBe(0);
|
||
});
|
||
|
||
test('update check bash block exits 0 when upgrade available', () => {
|
||
const result = Bun.spawnSync(['bash', '-c',
|
||
'_UPD=$(echo "UPGRADE_AVAILABLE 0.3.3 0.4.0" || true); [ -n "$_UPD" ] && echo "$_UPD" || true'
|
||
], { stdout: 'pipe', stderr: 'pipe' });
|
||
expect(result.exitCode).toBe(0);
|
||
expect(result.stdout.toString().trim()).toBe('UPGRADE_AVAILABLE 0.3.3 0.4.0');
|
||
});
|
||
});
|
||
|
||
// --- Part 7: Cross-skill path consistency (A1) ---
|
||
|
||
describe('Cross-skill path consistency', () => {
|
||
test('REMOTE_SLUG derivation pattern is identical across files that use it', () => {
|
||
const patterns = extractRemoteSlugPatterns(ROOT, ['qa', 'review']);
|
||
const allPatterns: string[] = [];
|
||
|
||
for (const [, filePatterns] of patterns) {
|
||
allPatterns.push(...filePatterns);
|
||
}
|
||
|
||
// Should find at least 2 occurrences (qa/SKILL.md + review/greptile-triage.md)
|
||
expect(allPatterns.length).toBeGreaterThanOrEqual(2);
|
||
|
||
// All occurrences must be character-for-character identical
|
||
const unique = new Set(allPatterns);
|
||
if (unique.size > 1) {
|
||
const variants = Array.from(unique);
|
||
throw new Error(
|
||
`REMOTE_SLUG pattern differs across files:\n` +
|
||
variants.map((v, i) => ` ${i + 1}: ${v}`).join('\n')
|
||
);
|
||
}
|
||
});
|
||
|
||
test('all greptile-history write references specify both per-project and global paths', () => {
|
||
const filesToCheck = [
|
||
'review/SKILL.md',
|
||
'ship/SKILL.md',
|
||
'review/greptile-triage.md',
|
||
];
|
||
|
||
for (const file of filesToCheck) {
|
||
const filePath = path.join(ROOT, file);
|
||
if (!fs.existsSync(filePath)) continue;
|
||
const content = fs.readFileSync(filePath, 'utf-8');
|
||
|
||
const hasBoth = (content.includes('per-project') && content.includes('global')) ||
|
||
(content.includes('$REMOTE_SLUG/greptile-history') && content.includes('~/.gstack/greptile-history'));
|
||
|
||
expect(hasBoth).toBe(true);
|
||
}
|
||
});
|
||
|
||
test('greptile-triage.md contains both project and global history paths', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'review', 'greptile-triage.md'), 'utf-8');
|
||
expect(content).toContain('$REMOTE_SLUG/greptile-history.md');
|
||
expect(content).toContain('~/.gstack/greptile-history.md');
|
||
});
|
||
|
||
test('retro/SKILL.md reads global greptile-history (not per-project)', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('~/.gstack/greptile-history.md');
|
||
// Should NOT reference per-project path for reads
|
||
expect(content).not.toContain('$REMOTE_SLUG/greptile-history.md');
|
||
});
|
||
});
|
||
|
||
// --- Part 7: QA skill structure validation (A2) ---
|
||
|
||
describe('QA skill structure validation', () => {
|
||
const qaContent = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
|
||
test('qa/SKILL.md has all 11 phases', () => {
|
||
const phases = [
|
||
'Phase 1', 'Initialize',
|
||
'Phase 2', 'Authenticate',
|
||
'Phase 3', 'Orient',
|
||
'Phase 4', 'Explore',
|
||
'Phase 5', 'Document',
|
||
'Phase 6', 'Wrap Up',
|
||
'Phase 7', 'Triage',
|
||
'Phase 8', 'Fix Loop',
|
||
'Phase 9', 'Final QA',
|
||
'Phase 10', 'Report',
|
||
'Phase 11', 'TODOS',
|
||
];
|
||
for (const phase of phases) {
|
||
expect(qaContent).toContain(phase);
|
||
}
|
||
});
|
||
|
||
test('has all four QA modes defined', () => {
|
||
const modes = [
|
||
'Diff-aware',
|
||
'Full',
|
||
'Quick',
|
||
'Regression',
|
||
];
|
||
for (const mode of modes) {
|
||
expect(qaContent).toContain(mode);
|
||
}
|
||
|
||
// Mode triggers/flags
|
||
expect(qaContent).toContain('--quick');
|
||
expect(qaContent).toContain('--regression');
|
||
});
|
||
|
||
test('has all three tiers defined', () => {
|
||
const tiers = ['Quick', 'Standard', 'Exhaustive'];
|
||
for (const tier of tiers) {
|
||
expect(qaContent).toContain(tier);
|
||
}
|
||
});
|
||
|
||
test('health score weights sum to 100%', () => {
|
||
const weights = extractWeightsFromTable(qaContent);
|
||
expect(weights.size).toBeGreaterThan(0);
|
||
|
||
let sum = 0;
|
||
for (const pct of weights.values()) {
|
||
sum += pct;
|
||
}
|
||
expect(sum).toBe(100);
|
||
});
|
||
|
||
test('health score has all 8 categories', () => {
|
||
const weights = extractWeightsFromTable(qaContent);
|
||
const expectedCategories = [
|
||
'Console', 'Links', 'Visual', 'Functional',
|
||
'UX', 'Performance', 'Content', 'Accessibility',
|
||
];
|
||
for (const cat of expectedCategories) {
|
||
expect(weights.has(cat)).toBe(true);
|
||
}
|
||
expect(weights.size).toBe(8);
|
||
});
|
||
|
||
test('has four mode definitions (Diff-aware/Full/Quick/Regression)', () => {
|
||
expect(qaContent).toContain('### Diff-aware');
|
||
expect(qaContent).toContain('### Full');
|
||
expect(qaContent).toContain('### Quick');
|
||
expect(qaContent).toContain('### Regression');
|
||
});
|
||
|
||
test('output structure references report directory layout', () => {
|
||
expect(qaContent).toContain('qa-report-');
|
||
expect(qaContent).toContain('baseline.json');
|
||
expect(qaContent).toContain('screenshots/');
|
||
expect(qaContent).toContain('.gstack/qa-reports/');
|
||
});
|
||
});
|
||
|
||
// --- Part 7: Greptile history format consistency (A3) ---
|
||
|
||
describe('Greptile history format consistency', () => {
|
||
test('greptile-triage.md defines the canonical history format', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'review', 'greptile-triage.md'), 'utf-8');
|
||
expect(content).toContain('<YYYY-MM-DD>');
|
||
expect(content).toContain('<owner/repo>');
|
||
expect(content).toContain('<type');
|
||
expect(content).toContain('<file-pattern>');
|
||
expect(content).toContain('<category>');
|
||
});
|
||
|
||
test('review/SKILL.md and ship/SKILL.md both reference greptile-triage.md for write details', () => {
|
||
const reviewContent = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
|
||
expect(reviewContent.toLowerCase()).toContain('greptile-triage.md');
|
||
expect(shipContent.toLowerCase()).toContain('greptile-triage.md');
|
||
});
|
||
|
||
test('greptile-triage.md defines all 9 valid categories', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'review', 'greptile-triage.md'), 'utf-8');
|
||
const categories = [
|
||
'race-condition', 'null-check', 'error-handling', 'style',
|
||
'type-safety', 'security', 'performance', 'correctness', 'other',
|
||
];
|
||
for (const cat of categories) {
|
||
expect(content).toContain(cat);
|
||
}
|
||
});
|
||
});
|
||
|
||
// --- Hardcoded branch name detection in templates ---
|
||
|
||
describe('No hardcoded branch names in SKILL templates', () => {
|
||
const tmplFiles = [
|
||
'ship/SKILL.md.tmpl',
|
||
'review/SKILL.md.tmpl',
|
||
'qa/SKILL.md.tmpl',
|
||
'plan-ceo-review/SKILL.md.tmpl',
|
||
'retro/SKILL.md.tmpl',
|
||
'document-release/SKILL.md.tmpl',
|
||
'plan-eng-review/SKILL.md.tmpl',
|
||
'plan-design-review/SKILL.md.tmpl',
|
||
'codex/SKILL.md.tmpl',
|
||
];
|
||
|
||
// Patterns that indicate hardcoded 'main' in git commands
|
||
const gitMainPatterns = [
|
||
/\bgit\s+diff\s+(?:origin\/)?main\b/,
|
||
/\bgit\s+log\s+(?:origin\/)?main\b/,
|
||
/\bgit\s+fetch\s+origin\s+main\b/,
|
||
/\bgit\s+merge\s+origin\/main\b/,
|
||
/\borigin\/main\b/,
|
||
];
|
||
|
||
// Lines that are allowed to mention 'main' (fallback logic, prose)
|
||
const allowlist = [
|
||
/fall\s*back\s+to\s+`main`/i,
|
||
/fall\s*back\s+to\s+`?main`?/i,
|
||
/typically\s+`?main`?/i,
|
||
/If\s+on\s+`main`/i, // old pattern — should not exist
|
||
];
|
||
|
||
for (const tmplFile of tmplFiles) {
|
||
test(`${tmplFile} has no hardcoded 'main' in git commands`, () => {
|
||
const filePath = path.join(ROOT, tmplFile);
|
||
if (!fs.existsSync(filePath)) return;
|
||
const lines = fs.readFileSync(filePath, 'utf-8').split('\n');
|
||
const violations: string[] = [];
|
||
|
||
for (let i = 0; i < lines.length; i++) {
|
||
const line = lines[i];
|
||
const isAllowlisted = allowlist.some(p => p.test(line));
|
||
if (isAllowlisted) continue;
|
||
|
||
for (const pattern of gitMainPatterns) {
|
||
if (pattern.test(line)) {
|
||
violations.push(`Line ${i + 1}: ${line.trim()}`);
|
||
break;
|
||
}
|
||
}
|
||
}
|
||
|
||
if (violations.length > 0) {
|
||
throw new Error(
|
||
`${tmplFile} has hardcoded 'main' in git commands:\n` +
|
||
violations.map(v => ` ${v}`).join('\n')
|
||
);
|
||
}
|
||
});
|
||
}
|
||
});
|
||
|
||
// --- Part 7b: TODOS-format.md reference consistency ---
|
||
|
||
describe('TODOS-format.md reference consistency', () => {
|
||
test('review/TODOS-format.md exists and defines canonical format', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'review', 'TODOS-format.md'), 'utf-8');
|
||
expect(content).toContain('**What:**');
|
||
expect(content).toContain('**Why:**');
|
||
expect(content).toContain('**Priority:**');
|
||
expect(content).toContain('**Effort:**');
|
||
expect(content).toContain('## Completed');
|
||
});
|
||
|
||
test('skills that write TODOs reference TODOS-format.md', () => {
|
||
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
const ceoPlanContent = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
||
const engPlanContent = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
|
||
|
||
expect(shipContent).toContain('TODOS-format.md');
|
||
expect(ceoPlanContent).toContain('TODOS-format.md');
|
||
expect(engPlanContent).toContain('TODOS-format.md');
|
||
});
|
||
});
|
||
|
||
// --- v0.4.1 feature coverage: RECOMMENDATION format, session awareness, enum completeness ---
|
||
|
||
describe('v0.4.1 preamble features', () => {
|
||
const skillsWithPreamble = [
|
||
'SKILL.md', 'browse/SKILL.md', 'qa/SKILL.md',
|
||
'qa-only/SKILL.md',
|
||
'setup-browser-cookies/SKILL.md',
|
||
'ship/SKILL.md', 'review/SKILL.md',
|
||
'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md',
|
||
'retro/SKILL.md',
|
||
'office-hours/SKILL.md', 'investigate/SKILL.md',
|
||
'plan-design-review/SKILL.md',
|
||
'design-review/SKILL.md',
|
||
'design-consultation/SKILL.md',
|
||
'document-release/SKILL.md',
|
||
'canary/SKILL.md',
|
||
'benchmark/SKILL.md',
|
||
'land-and-deploy/SKILL.md',
|
||
'setup-deploy/SKILL.md',
|
||
];
|
||
|
||
for (const skill of skillsWithPreamble) {
|
||
test(`${skill} contains RECOMMENDATION format`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('RECOMMENDATION: Choose');
|
||
expect(content).toContain('AskUserQuestion');
|
||
});
|
||
|
||
test(`${skill} contains session awareness`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('_SESSIONS');
|
||
expect(content).toContain('RECOMMENDATION');
|
||
});
|
||
}
|
||
|
||
for (const skill of skillsWithPreamble) {
|
||
test(`${skill} contains escalation protocol`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('DONE_WITH_CONCERNS');
|
||
expect(content).toContain('BLOCKED');
|
||
expect(content).toContain('NEEDS_CONTEXT');
|
||
});
|
||
}
|
||
});
|
||
|
||
// --- Structural tests for new skills ---
|
||
|
||
describe('office-hours skill structure', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');
|
||
|
||
// Original structural assertions
|
||
for (const section of ['Phase 1', 'Phase 2', 'Phase 3', 'Phase 4', 'Phase 5', 'Phase 6',
|
||
'Design Doc', 'Supersedes', 'APPROVED', 'Premise Challenge',
|
||
'Alternatives', 'Smart-skip']) {
|
||
test(`contains ${section}`, () => expect(content).toContain(section));
|
||
}
|
||
|
||
// Dual-mode structure
|
||
for (const section of ['Startup mode', 'Builder mode']) {
|
||
test(`contains ${section}`, () => expect(content).toContain(section));
|
||
}
|
||
|
||
// Mode detection question
|
||
test('contains explicit mode detection question', () => {
|
||
expect(content).toContain("what's your goal");
|
||
});
|
||
|
||
// Six forcing questions (startup mode)
|
||
for (const question of ['Demand Reality', 'Status Quo', 'Desperate Specificity',
|
||
'Narrowest Wedge', 'Observation & Surprise', 'Future-Fit']) {
|
||
test(`contains forcing question: ${question}`, () => expect(content).toContain(question));
|
||
}
|
||
|
||
// Builder mode questions
|
||
test('contains builder brainstorming questions', () => {
|
||
expect(content).toContain('coolest version');
|
||
expect(content).toContain('delightful');
|
||
});
|
||
|
||
// Intrapreneurship adaptation
|
||
test('contains intrapreneurship adaptation', () => {
|
||
expect(content).toContain('Intrapreneurship');
|
||
});
|
||
|
||
// YC founder discovery engine
|
||
test('contains YC apply CTA with ref tracking', () => {
|
||
expect(content).toContain('ycombinator.com/apply?ref=gstack');
|
||
});
|
||
|
||
test('contains "What I noticed" design doc section', () => {
|
||
expect(content).toContain('What I noticed about how you think');
|
||
});
|
||
|
||
test('contains golden age framing', () => {
|
||
expect(content).toContain('golden age');
|
||
});
|
||
|
||
test('contains Garry Tan personal plea', () => {
|
||
expect(content).toContain('Garry Tan, the creator of GStack');
|
||
});
|
||
|
||
test('contains founder signal synthesis phase', () => {
|
||
expect(content).toContain('Founder Signal Synthesis');
|
||
});
|
||
|
||
test('contains three-tier decision rubric', () => {
|
||
expect(content).toContain('Top tier');
|
||
expect(content).toContain('Middle tier');
|
||
expect(content).toContain('Base tier');
|
||
});
|
||
|
||
test('contains anti-slop examples', () => {
|
||
expect(content).toContain('GOOD:');
|
||
expect(content).toContain('BAD:');
|
||
});
|
||
|
||
test('contains "One more thing" transition beat', () => {
|
||
expect(content).toContain('One more thing');
|
||
});
|
||
|
||
// Operating principles per mode
|
||
test('contains startup operating principles', () => {
|
||
expect(content).toContain('Specificity is the only currency');
|
||
});
|
||
|
||
test('contains builder operating principles', () => {
|
||
expect(content).toContain('Delight is the currency');
|
||
});
|
||
|
||
// Spec Review Loop (Phase 5.5)
|
||
test('contains spec review loop', () => {
|
||
expect(content).toContain('Spec Review Loop');
|
||
});
|
||
|
||
test('contains adversarial review dimensions', () => {
|
||
for (const dim of ['Completeness', 'Consistency', 'Clarity', 'Scope', 'Feasibility']) {
|
||
expect(content).toContain(dim);
|
||
}
|
||
});
|
||
|
||
test('contains subagent dispatch instruction', () => {
|
||
expect(content).toMatch(/Agent.*tool|subagent/i);
|
||
});
|
||
|
||
test('contains max 3 iterations', () => {
|
||
expect(content).toMatch(/3.*iteration|maximum.*3/i);
|
||
});
|
||
|
||
test('contains quality score', () => {
|
||
expect(content).toContain('quality score');
|
||
});
|
||
|
||
test('contains spec review metrics path', () => {
|
||
expect(content).toContain('spec-review.jsonl');
|
||
});
|
||
|
||
test('contains convergence guard', () => {
|
||
expect(content).toMatch(/convergence/i);
|
||
});
|
||
|
||
// Visual Sketch (Phase 4.5)
|
||
test('contains visual sketch section', () => {
|
||
expect(content).toContain('Visual Sketch');
|
||
});
|
||
|
||
test('contains wireframe generation', () => {
|
||
expect(content).toMatch(/wireframe|sketch/i);
|
||
});
|
||
|
||
test('contains DESIGN.md awareness', () => {
|
||
expect(content).toContain('DESIGN.md');
|
||
});
|
||
|
||
test('contains browse rendering', () => {
|
||
expect(content).toContain('$B goto');
|
||
expect(content).toContain('$B screenshot');
|
||
});
|
||
|
||
test('contains rough aesthetic instruction', () => {
|
||
expect(content).toMatch(/rough|hand-drawn/i);
|
||
});
|
||
});
|
||
|
||
describe('investigate skill structure', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'investigate', 'SKILL.md'), 'utf-8');
|
||
for (const section of ['Iron Law', 'Root Cause', 'Pattern Analysis', 'Hypothesis',
|
||
'DEBUG REPORT', '3-strike', 'BLOCKED']) {
|
||
test(`contains ${section}`, () => expect(content).toContain(section));
|
||
}
|
||
});
|
||
|
||
// --- Contributor mode preamble structure validation ---
|
||
|
||
describe('Contributor mode preamble structure', () => {
|
||
const skillsWithPreamble = [
|
||
'SKILL.md', 'browse/SKILL.md', 'qa/SKILL.md',
|
||
'qa-only/SKILL.md',
|
||
'setup-browser-cookies/SKILL.md',
|
||
'ship/SKILL.md', 'review/SKILL.md',
|
||
'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md',
|
||
'retro/SKILL.md',
|
||
'plan-design-review/SKILL.md',
|
||
'design-review/SKILL.md',
|
||
'design-consultation/SKILL.md',
|
||
'document-release/SKILL.md',
|
||
'canary/SKILL.md',
|
||
'benchmark/SKILL.md',
|
||
'land-and-deploy/SKILL.md',
|
||
'setup-deploy/SKILL.md',
|
||
];
|
||
|
||
for (const skill of skillsWithPreamble) {
|
||
test(`${skill} has 0-10 rating in contributor mode`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('0 to 10');
|
||
expect(content).toContain('My rating');
|
||
});
|
||
|
||
test(`${skill} has calibration example`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('Calibration');
|
||
expect(content).toContain('the bar');
|
||
});
|
||
|
||
test(`${skill} has "what would make this a 10" field`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('What would make this a 10');
|
||
});
|
||
|
||
test(`${skill} uses periodic reflection (not per-command)`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('workflow step');
|
||
expect(content).not.toContain('After you use gstack-provided CLIs');
|
||
});
|
||
}
|
||
});
|
||
|
||
describe('Enum & Value Completeness in review checklist', () => {
|
||
const checklist = fs.readFileSync(path.join(ROOT, 'review', 'checklist.md'), 'utf-8');
|
||
|
||
test('checklist has Enum & Value Completeness section', () => {
|
||
expect(checklist).toContain('Enum & Value Completeness');
|
||
});
|
||
|
||
test('Enum & Value Completeness is classified as CRITICAL', () => {
|
||
// It should appear under Pass 1 — CRITICAL, not Pass 2
|
||
const pass1Start = checklist.indexOf('### Pass 1');
|
||
const pass2Start = checklist.indexOf('### Pass 2');
|
||
const enumStart = checklist.indexOf('Enum & Value Completeness');
|
||
expect(enumStart).toBeGreaterThan(pass1Start);
|
||
expect(enumStart).toBeLessThan(pass2Start);
|
||
});
|
||
|
||
test('Enum & Value Completeness mentions tracing through consumers', () => {
|
||
expect(checklist).toContain('Trace it through every consumer');
|
||
expect(checklist).toContain('case');
|
||
expect(checklist).toContain('allowlist');
|
||
});
|
||
|
||
test('Enum & Value Completeness is in the severity classification as CRITICAL', () => {
|
||
const gateSection = checklist.slice(checklist.indexOf('## Severity Classification'));
|
||
// The ASCII art has CRITICAL on the left and INFORMATIONAL on the right
|
||
// Enum & Value Completeness should appear on a line with the CRITICAL tree (├─ or └─)
|
||
const enumLine = gateSection.split('\n').find(l => l.includes('Enum & Value Completeness'));
|
||
expect(enumLine).toBeDefined();
|
||
// It's on the left (CRITICAL) side — starts with ├─ or └─
|
||
expect(enumLine!.trimStart().startsWith('├─') || enumLine!.trimStart().startsWith('└─')).toBe(true);
|
||
});
|
||
|
||
test('Fix-First Heuristic exists in checklist and is referenced by review + ship', () => {
|
||
expect(checklist).toContain('## Fix-First Heuristic');
|
||
expect(checklist).toContain('AUTO-FIX');
|
||
expect(checklist).toContain('ASK');
|
||
|
||
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review/SKILL.md'), 'utf-8');
|
||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship/SKILL.md'), 'utf-8');
|
||
expect(reviewSkill).toContain('AUTO-FIX');
|
||
expect(reviewSkill).toContain('[AUTO-FIXED]');
|
||
expect(shipSkill).toContain('AUTO-FIX');
|
||
expect(shipSkill).toContain('[AUTO-FIXED]');
|
||
});
|
||
});
|
||
|
||
// --- Completeness Principle spot-check ---
|
||
|
||
describe('Completeness Principle in generated SKILL.md files', () => {
|
||
const skillsWithPreamble = [
|
||
'SKILL.md', 'browse/SKILL.md', 'qa/SKILL.md',
|
||
'qa-only/SKILL.md',
|
||
'setup-browser-cookies/SKILL.md',
|
||
'ship/SKILL.md', 'review/SKILL.md',
|
||
'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md',
|
||
'retro/SKILL.md',
|
||
'plan-design-review/SKILL.md',
|
||
'design-review/SKILL.md',
|
||
'design-consultation/SKILL.md',
|
||
'document-release/SKILL.md',
|
||
];
|
||
|
||
for (const skill of skillsWithPreamble) {
|
||
test(`${skill} contains Completeness Principle section`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('Completeness Principle');
|
||
expect(content).toContain('Boil the Lake');
|
||
});
|
||
}
|
||
|
||
test('Completeness Principle includes compression table', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('CC+gstack');
|
||
expect(content).toContain('Compression');
|
||
});
|
||
|
||
test('Completeness Principle includes anti-patterns', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('BAD:');
|
||
expect(content).toContain('Anti-patterns');
|
||
});
|
||
});
|
||
|
||
// --- Part 7: Planted-bug fixture validation (A4) ---
|
||
|
||
describe('Planted-bug fixture validation', () => {
|
||
test('qa-eval ground truth has exactly 5 planted bugs', () => {
|
||
const groundTruth = JSON.parse(
|
||
fs.readFileSync(path.join(ROOT, 'test', 'fixtures', 'qa-eval-ground-truth.json'), 'utf-8')
|
||
);
|
||
expect(groundTruth.bugs).toHaveLength(5);
|
||
expect(groundTruth.total_bugs).toBe(5);
|
||
});
|
||
|
||
test('qa-eval-spa ground truth has exactly 5 planted bugs', () => {
|
||
const groundTruth = JSON.parse(
|
||
fs.readFileSync(path.join(ROOT, 'test', 'fixtures', 'qa-eval-spa-ground-truth.json'), 'utf-8')
|
||
);
|
||
expect(groundTruth.bugs).toHaveLength(5);
|
||
expect(groundTruth.total_bugs).toBe(5);
|
||
});
|
||
|
||
test('qa-eval-checkout ground truth has exactly 5 planted bugs', () => {
|
||
const groundTruth = JSON.parse(
|
||
fs.readFileSync(path.join(ROOT, 'test', 'fixtures', 'qa-eval-checkout-ground-truth.json'), 'utf-8')
|
||
);
|
||
expect(groundTruth.bugs).toHaveLength(5);
|
||
expect(groundTruth.total_bugs).toBe(5);
|
||
});
|
||
|
||
test('qa-eval.html contains the planted bugs', () => {
|
||
const html = fs.readFileSync(path.join(ROOT, 'browse', 'test', 'fixtures', 'qa-eval.html'), 'utf-8');
|
||
// BUG 1: broken link
|
||
expect(html).toContain('/nonexistent-404-page');
|
||
// BUG 2: disabled submit
|
||
expect(html).toContain('disabled');
|
||
// BUG 3: overflow
|
||
expect(html).toContain('overflow: hidden');
|
||
// BUG 4: missing alt
|
||
expect(html).toMatch(/<img[^>]*src="\/logo\.png"[^>]*>/);
|
||
expect(html).not.toMatch(/<img[^>]*src="\/logo\.png"[^>]*alt=/);
|
||
// BUG 5: console error
|
||
expect(html).toContain("Cannot read properties of undefined");
|
||
});
|
||
|
||
test('review-eval-vuln.rb contains expected vulnerability patterns', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'test', 'fixtures', 'review-eval-vuln.rb'), 'utf-8');
|
||
expect(content).toContain('params[:id]');
|
||
expect(content).toContain('update_column');
|
||
});
|
||
});
|
||
|
||
// --- CEO review mode validation ---
|
||
|
||
describe('CEO review mode validation', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
||
|
||
test('has all four CEO review modes defined', () => {
|
||
const modes = ['SCOPE EXPANSION', 'SELECTIVE EXPANSION', 'HOLD SCOPE', 'SCOPE REDUCTION'];
|
||
for (const mode of modes) {
|
||
expect(content).toContain(mode);
|
||
}
|
||
});
|
||
|
||
test('has CEO plan persistence step', () => {
|
||
expect(content).toContain('ceo-plans');
|
||
expect(content).toContain('status: ACTIVE');
|
||
});
|
||
|
||
test('has docs/designs promotion section', () => {
|
||
expect(content).toContain('docs/designs');
|
||
expect(content).toContain('PROMOTED');
|
||
});
|
||
|
||
test('mode quick reference has four columns', () => {
|
||
expect(content).toContain('EXPANSION');
|
||
expect(content).toContain('SELECTIVE');
|
||
expect(content).toContain('HOLD SCOPE');
|
||
expect(content).toContain('REDUCTION');
|
||
});
|
||
|
||
// Skill chaining (benefits-from)
|
||
test('contains prerequisite skill offer for office-hours', () => {
|
||
expect(content).toContain('Prerequisite Skill Offer');
|
||
expect(content).toContain('/office-hours');
|
||
});
|
||
|
||
test('contains mid-session detection', () => {
|
||
expect(content).toContain('Mid-session detection');
|
||
expect(content).toMatch(/still figuring out|seems lost/i);
|
||
});
|
||
|
||
// Spec review on CEO plans
|
||
test('contains spec review loop for CEO plan documents', () => {
|
||
expect(content).toContain('Spec Review Loop');
|
||
});
|
||
});
|
||
|
||
// --- gstack-slug helper ---
|
||
|
||
describe('gstack-slug', () => {
|
||
const SLUG_BIN = path.join(ROOT, 'bin', 'gstack-slug');
|
||
|
||
test('binary exists and is executable', () => {
|
||
expect(fs.existsSync(SLUG_BIN)).toBe(true);
|
||
const stat = fs.statSync(SLUG_BIN);
|
||
expect(stat.mode & 0o111).toBeGreaterThan(0);
|
||
});
|
||
|
||
test('outputs SLUG and BRANCH lines in a git repo', () => {
|
||
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
|
||
expect(result.exitCode).toBe(0);
|
||
const output = result.stdout.toString();
|
||
expect(output).toContain('SLUG=');
|
||
expect(output).toContain('BRANCH=');
|
||
});
|
||
|
||
test('SLUG does not contain forward slashes', () => {
|
||
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
|
||
const slug = result.stdout.toString().match(/SLUG=(.*)/)?.[1] ?? '';
|
||
expect(slug).not.toContain('/');
|
||
expect(slug.length).toBeGreaterThan(0);
|
||
});
|
||
|
||
test('BRANCH does not contain forward slashes', () => {
|
||
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
|
||
const branch = result.stdout.toString().match(/BRANCH=(.*)/)?.[1] ?? '';
|
||
expect(branch).not.toContain('/');
|
||
expect(branch.length).toBeGreaterThan(0);
|
||
});
|
||
|
||
test('output is eval-compatible (KEY=VALUE format)', () => {
|
||
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
|
||
const lines = result.stdout.toString().trim().split('\n');
|
||
expect(lines.length).toBe(2);
|
||
expect(lines[0]).toMatch(/^SLUG=.+/);
|
||
expect(lines[1]).toMatch(/^BRANCH=.+/);
|
||
});
|
||
});
|
||
|
||
// --- Test Bootstrap validation ---
|
||
|
||
describe('Test Bootstrap ({{TEST_BOOTSTRAP}}) integration', () => {
|
||
test('TEST_BOOTSTRAP resolver produces valid content', () => {
|
||
const qaContent = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(qaContent).toContain('Test Framework Bootstrap');
|
||
expect(qaContent).toContain('RUNTIME:ruby');
|
||
expect(qaContent).toContain('RUNTIME:node');
|
||
expect(qaContent).toContain('RUNTIME:python');
|
||
expect(qaContent).toContain('no-test-bootstrap');
|
||
expect(qaContent).toContain('BOOTSTRAP_DECLINED');
|
||
});
|
||
|
||
test('TEST_BOOTSTRAP appears in qa/SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Test Framework Bootstrap');
|
||
expect(content).toContain('TESTING.md');
|
||
expect(content).toContain('CLAUDE.md');
|
||
});
|
||
|
||
test('TEST_BOOTSTRAP appears in ship/SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Test Framework Bootstrap');
|
||
expect(content).toContain('Step 2.5');
|
||
});
|
||
|
||
test('TEST_BOOTSTRAP appears in design-review/SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'design-review', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Test Framework Bootstrap');
|
||
});
|
||
|
||
test('TEST_BOOTSTRAP does NOT appear in qa-only/SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa-only', 'SKILL.md'), 'utf-8');
|
||
expect(content).not.toContain('Test Framework Bootstrap');
|
||
// But should have the recommendation note
|
||
expect(content).toContain('No test framework detected');
|
||
expect(content).toContain('Run `/qa` to bootstrap');
|
||
});
|
||
|
||
test('bootstrap includes framework knowledge table', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('vitest');
|
||
expect(content).toContain('minitest');
|
||
expect(content).toContain('pytest');
|
||
expect(content).toContain('cargo test');
|
||
expect(content).toContain('phpunit');
|
||
expect(content).toContain('ExUnit');
|
||
});
|
||
|
||
test('bootstrap includes CI/CD pipeline generation', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('.github/workflows/test.yml');
|
||
expect(content).toContain('GitHub Actions');
|
||
});
|
||
|
||
test('bootstrap includes first real tests step', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('First real tests');
|
||
expect(content).toContain('git log --since=30.days');
|
||
expect(content).toContain('Prioritize by risk');
|
||
});
|
||
|
||
test('bootstrap includes vibe coding philosophy', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('vibe coding');
|
||
expect(content).toContain('100% test coverage');
|
||
});
|
||
|
||
test('WebSearch is in allowed-tools for qa, ship, design-review', () => {
|
||
const qa = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
const ship = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
const qaDesign = fs.readFileSync(path.join(ROOT, 'design-review', 'SKILL.md'), 'utf-8');
|
||
expect(qa).toContain('WebSearch');
|
||
expect(ship).toContain('WebSearch');
|
||
expect(qaDesign).toContain('WebSearch');
|
||
});
|
||
});
|
||
|
||
// --- Phase 8e.5 regression test validation ---
|
||
|
||
describe('Phase 8e.5 regression test generation', () => {
|
||
test('qa/SKILL.md contains Phase 8e.5', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('8e.5. Regression Test');
|
||
expect(content).toContain('test(qa): regression test');
|
||
expect(content).toContain('WTF-likelihood exclusion');
|
||
});
|
||
|
||
test('qa/SKILL.md Rule 13 is amended for regression tests', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Only modify tests when generating regression tests in Phase 8e.5');
|
||
expect(content).not.toContain('Never modify tests or CI configuration');
|
||
});
|
||
|
||
test('design-review has CSS-aware Phase 8e.5 variant', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'design-review', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('8e.5. Regression Test (design-review variant)');
|
||
expect(content).toContain('CSS-only');
|
||
expect(content).toContain('test(design): regression test');
|
||
});
|
||
|
||
test('regression test includes full attribution comment format', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('// Regression: ISSUE-NNN');
|
||
expect(content).toContain('// Found by /qa on');
|
||
expect(content).toContain('// Report: .gstack/qa-reports/');
|
||
});
|
||
|
||
test('regression test uses auto-incrementing names', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('auto-incrementing');
|
||
expect(content).toContain('max number + 1');
|
||
});
|
||
});
|
||
|
||
// --- Step 3.4 coverage audit validation ---
|
||
|
||
describe('Step 3.4 test coverage audit', () => {
|
||
test('ship/SKILL.md contains Step 3.4', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Step 3.4: Test Coverage Audit');
|
||
expect(content).toContain('CODE PATH COVERAGE');
|
||
});
|
||
|
||
test('Step 3.4 includes quality scoring rubric', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('★★★');
|
||
expect(content).toContain('★★');
|
||
expect(content).toContain('edge cases AND error paths');
|
||
expect(content).toContain('happy path only');
|
||
});
|
||
|
||
test('Step 3.4 includes before/after test count', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Count test files before');
|
||
expect(content).toContain('Count test files after');
|
||
});
|
||
|
||
test('ship PR body includes Test Coverage section', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('## Test Coverage');
|
||
});
|
||
|
||
test('ship rules include test generation rule', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Step 3.4 generates coverage tests');
|
||
expect(content).toContain('Never commit failing tests');
|
||
});
|
||
|
||
test('Step 3.4 includes vibe coding philosophy', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('vibe coding becomes yolo coding');
|
||
});
|
||
|
||
test('Step 3.4 traces actual codepaths, not just syntax', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Trace every codepath');
|
||
expect(content).toContain('Trace data flow');
|
||
expect(content).toContain('Diagram the execution');
|
||
});
|
||
|
||
test('Step 3.4 maps user flows and interaction edge cases', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Map user flows');
|
||
expect(content).toContain('Interaction edge cases');
|
||
expect(content).toContain('Double-click');
|
||
expect(content).toContain('Navigate away');
|
||
expect(content).toContain('Error states the user can see');
|
||
expect(content).toContain('Empty/zero/boundary states');
|
||
});
|
||
|
||
test('Step 3.4 diagram includes USER FLOW COVERAGE section', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('USER FLOW COVERAGE');
|
||
expect(content).toContain('Code paths:');
|
||
expect(content).toContain('User flows:');
|
||
});
|
||
});
|
||
|
||
// --- Retro test health validation ---
|
||
|
||
describe('Retro test health tracking', () => {
|
||
test('retro/SKILL.md has test health data gathering commands', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('# 10. Test file count');
|
||
expect(content).toContain('# 11. Regression test commits');
|
||
expect(content).toContain('# 12. Test files changed');
|
||
});
|
||
|
||
test('retro/SKILL.md has Test Health metrics row', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Test Health');
|
||
expect(content).toContain('regression tests');
|
||
});
|
||
|
||
test('retro/SKILL.md has Test Health narrative section', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('### Test Health');
|
||
expect(content).toContain('Total test files');
|
||
expect(content).toContain('vibe coding safe');
|
||
});
|
||
|
||
test('retro JSON schema includes test_health field', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('test_health');
|
||
expect(content).toContain('total_test_files');
|
||
expect(content).toContain('regression_test_commits');
|
||
});
|
||
});
|
||
|
||
// --- QA report template regression tests section ---
|
||
|
||
describe('QA report template', () => {
|
||
test('qa-report-template.md has Regression Tests section', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'templates', 'qa-report-template.md'), 'utf-8');
|
||
expect(content).toContain('## Regression Tests');
|
||
expect(content).toContain('committed / deferred / skipped');
|
||
expect(content).toContain('### Deferred Tests');
|
||
expect(content).toContain('**Precondition:**');
|
||
});
|
||
});
|
||
|
||
// --- Codex skill validation ---
|
||
|
||
describe('Codex skill', () => {
|
||
test('codex/SKILL.md exists and has correct frontmatter', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('name: codex');
|
||
expect(content).toContain('version: 1.0.0');
|
||
expect(content).toContain('allowed-tools:');
|
||
});
|
||
|
||
test('codex/SKILL.md contains all three modes', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Step 2A: Review Mode');
|
||
expect(content).toContain('Step 2B: Challenge');
|
||
expect(content).toContain('Step 2C: Consult Mode');
|
||
});
|
||
|
||
test('codex/SKILL.md contains gate verdict logic', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('[P1]');
|
||
expect(content).toContain('GATE: PASS');
|
||
expect(content).toContain('GATE: FAIL');
|
||
});
|
||
|
||
test('codex/SKILL.md contains session continuity', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('codex-session-id');
|
||
expect(content).toContain('codex exec resume');
|
||
});
|
||
|
||
test('codex/SKILL.md contains cost tracking', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('tokens used');
|
||
expect(content).toContain('Est. cost');
|
||
});
|
||
|
||
test('codex/SKILL.md contains cross-model comparison', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('CROSS-MODEL ANALYSIS');
|
||
expect(content).toContain('Agreement rate');
|
||
});
|
||
|
||
test('codex/SKILL.md contains review log persistence', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('codex-review');
|
||
expect(content).toContain('gstack-review-log');
|
||
});
|
||
|
||
test('codex/SKILL.md uses which for binary discovery, not hardcoded path', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('which codex');
|
||
expect(content).not.toContain('/opt/homebrew/bin/codex');
|
||
});
|
||
|
||
test('codex/SKILL.md contains error handling for missing binary and auth', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('NOT_FOUND');
|
||
expect(content).toContain('codex login');
|
||
});
|
||
|
||
test('codex/SKILL.md uses mktemp for temp files', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('mktemp');
|
||
});
|
||
|
||
test('adversarial review in /review auto-scales by diff size', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Adversarial review (auto-scaled)');
|
||
// Diff size thresholds
|
||
expect(content).toContain('< 50');
|
||
expect(content).toContain('50–199');
|
||
expect(content).toContain('200+');
|
||
// All three tiers present
|
||
expect(content).toContain('Small');
|
||
expect(content).toContain('Medium tier');
|
||
expect(content).toContain('Large tier');
|
||
// Claude adversarial subagent dispatch
|
||
expect(content).toContain('Agent tool');
|
||
expect(content).toContain('FIXABLE');
|
||
expect(content).toContain('INVESTIGATE');
|
||
// Codex fallback logic
|
||
expect(content).toContain('CODEX_NOT_AVAILABLE');
|
||
expect(content).toContain('fall back to the Claude adversarial subagent');
|
||
// Review log uses new skill name
|
||
expect(content).toContain('adversarial-review');
|
||
expect(content).toContain('xhigh');
|
||
expect(content).toContain('ADVERSARIAL REVIEW SYNTHESIS');
|
||
});
|
||
|
||
test('adversarial review in /ship auto-scales by diff size', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Adversarial review (auto-scaled)');
|
||
expect(content).toContain('< 50');
|
||
expect(content).toContain('200+');
|
||
expect(content).toContain('adversarial-review');
|
||
expect(content).toContain('xhigh');
|
||
expect(content).toContain('Investigate and fix');
|
||
});
|
||
|
||
test('codex-host ship/review do NOT contain adversarial review step', () => {
|
||
const shipContent = fs.readFileSync(path.join(ROOT, '.agents', 'skills', 'gstack-ship', 'SKILL.md'), 'utf-8');
|
||
expect(shipContent).not.toContain('codex review --base');
|
||
expect(shipContent).not.toContain('CODEX_REVIEWS');
|
||
|
||
const reviewContent = fs.readFileSync(path.join(ROOT, '.agents', 'skills', 'gstack-review', 'SKILL.md'), 'utf-8');
|
||
expect(reviewContent).not.toContain('codex review --base');
|
||
expect(reviewContent).not.toContain('codex_reviews');
|
||
expect(reviewContent).not.toContain('CODEX_REVIEWS');
|
||
expect(reviewContent).not.toContain('adversarial-review');
|
||
expect(reviewContent).not.toContain('Investigate and fix');
|
||
});
|
||
|
||
test('codex integration in /plan-eng-review offers plan critique', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Codex');
|
||
expect(content).toContain('codex exec');
|
||
});
|
||
|
||
test('Review Readiness Dashboard includes Adversarial Review row', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Adversarial');
|
||
expect(content).toContain('codex-review');
|
||
});
|
||
});
|
||
|
||
// --- Trigger phrase validation ---
|
||
|
||
describe('Skill trigger phrases', () => {
|
||
// Skills that must have "Use when" trigger phrases in their description.
|
||
// Excluded: root gstack (browser tool), gstack-upgrade (gstack-specific),
|
||
// humanizer (text tool)
|
||
const SKILLS_REQUIRING_TRIGGERS = [
|
||
'qa', 'qa-only', 'ship', 'review', 'investigate', 'office-hours',
|
||
'plan-ceo-review', 'plan-eng-review', 'plan-design-review',
|
||
'design-review', 'design-consultation', 'retro', 'document-release',
|
||
'codex', 'browse', 'setup-browser-cookies',
|
||
];
|
||
|
||
for (const skill of SKILLS_REQUIRING_TRIGGERS) {
|
||
test(`${skill}/SKILL.md has "Use when" trigger phrases`, () => {
|
||
const skillPath = path.join(ROOT, skill, 'SKILL.md');
|
||
if (!fs.existsSync(skillPath)) return;
|
||
const content = fs.readFileSync(skillPath, 'utf-8');
|
||
// Extract description from frontmatter
|
||
const frontmatterEnd = content.indexOf('---', 4);
|
||
const frontmatter = content.slice(0, frontmatterEnd);
|
||
expect(frontmatter).toMatch(/Use when/i);
|
||
});
|
||
}
|
||
|
||
// Skills with proactive triggers should have "Proactively suggest" in description
|
||
const SKILLS_REQUIRING_PROACTIVE = [
|
||
'qa', 'qa-only', 'ship', 'review', 'investigate', 'office-hours',
|
||
'plan-ceo-review', 'plan-eng-review', 'plan-design-review',
|
||
'design-review', 'design-consultation', 'retro', 'document-release',
|
||
];
|
||
|
||
for (const skill of SKILLS_REQUIRING_PROACTIVE) {
|
||
test(`${skill}/SKILL.md has "Proactively suggest" phrase`, () => {
|
||
const skillPath = path.join(ROOT, skill, 'SKILL.md');
|
||
if (!fs.existsSync(skillPath)) return;
|
||
const content = fs.readFileSync(skillPath, 'utf-8');
|
||
const frontmatterEnd = content.indexOf('---', 4);
|
||
const frontmatter = content.slice(0, frontmatterEnd);
|
||
expect(frontmatter).toMatch(/Proactively suggest/i);
|
||
});
|
||
}
|
||
});
|
||
|
||
// ─── Codex Skill Validation ──────────────────────────────────
|
||
|
||
describe('Codex skill validation', () => {
|
||
const AGENTS_DIR = path.join(ROOT, '.agents', 'skills');
|
||
|
||
// Discover all Claude skills with templates (except /codex which is Claude-only)
|
||
const CLAUDE_SKILLS_WITH_TEMPLATES = (() => {
|
||
const skills: string[] = [];
|
||
for (const entry of fs.readdirSync(ROOT, { withFileTypes: true })) {
|
||
if (!entry.isDirectory() || entry.name.startsWith('.') || entry.name === 'node_modules') continue;
|
||
if (entry.name === 'codex') continue; // Claude-only skill
|
||
if (fs.existsSync(path.join(ROOT, entry.name, 'SKILL.md.tmpl'))) {
|
||
skills.push(entry.name);
|
||
}
|
||
}
|
||
return skills;
|
||
})();
|
||
|
||
test('all skills (except /codex) have both Claude and Codex variants', () => {
|
||
for (const skillDir of CLAUDE_SKILLS_WITH_TEMPLATES) {
|
||
// Claude variant
|
||
const claudeMd = path.join(ROOT, skillDir, 'SKILL.md');
|
||
expect(fs.existsSync(claudeMd)).toBe(true);
|
||
|
||
// Codex variant
|
||
const codexName = skillDir.startsWith('gstack-') ? skillDir : `gstack-${skillDir}`;
|
||
const codexMd = path.join(AGENTS_DIR, codexName, 'SKILL.md');
|
||
expect(fs.existsSync(codexMd)).toBe(true);
|
||
}
|
||
// Root template has both too
|
||
expect(fs.existsSync(path.join(ROOT, 'SKILL.md'))).toBe(true);
|
||
expect(fs.existsSync(path.join(AGENTS_DIR, 'gstack', 'SKILL.md'))).toBe(true);
|
||
});
|
||
|
||
test('/codex skill is Claude-only — no Codex variant', () => {
|
||
// Claude variant should exist
|
||
expect(fs.existsSync(path.join(ROOT, 'codex', 'SKILL.md'))).toBe(true);
|
||
// Codex variant must NOT exist
|
||
expect(fs.existsSync(path.join(AGENTS_DIR, 'gstack-codex', 'SKILL.md'))).toBe(false);
|
||
});
|
||
|
||
test('Codex skill names follow gstack-{name} convention', () => {
|
||
const codexDirs = fs.readdirSync(AGENTS_DIR);
|
||
for (const dir of codexDirs) {
|
||
// Every directory should start with gstack
|
||
expect(dir.startsWith('gstack')).toBe(true);
|
||
// Root is just 'gstack', others are 'gstack-{name}'
|
||
if (dir !== 'gstack') {
|
||
expect(dir.startsWith('gstack-')).toBe(true);
|
||
}
|
||
}
|
||
});
|
||
|
||
test('$B commands in Codex SKILL.md files are valid browse commands', () => {
|
||
const codexDirs = fs.readdirSync(AGENTS_DIR);
|
||
for (const dir of codexDirs) {
|
||
const skillMd = path.join(AGENTS_DIR, dir, 'SKILL.md');
|
||
if (!fs.existsSync(skillMd)) continue;
|
||
const content = fs.readFileSync(skillMd, 'utf-8');
|
||
// Only validate if the skill contains $B commands
|
||
if (!content.includes('$B ')) continue;
|
||
const result = validateSkill(skillMd);
|
||
expect(result.invalid).toHaveLength(0);
|
||
}
|
||
});
|
||
});
|
||
|
||
// --- Repo mode and test failure triage validation ---
|
||
|
||
describe('Repo mode preamble validation', () => {
|
||
test('generated SKILL.md preamble contains REPO_MODE output', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('REPO_MODE:');
|
||
expect(content).toContain('gstack-repo-mode');
|
||
});
|
||
|
||
test('generated SKILL.md contains See Something Say Something section', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('See Something, Say Something');
|
||
expect(content).toContain('REPO_MODE');
|
||
expect(content).toContain('solo');
|
||
expect(content).toContain('collaborative');
|
||
});
|
||
});
|
||
|
||
describe('Test failure triage in ship skill', () => {
|
||
test('ship/SKILL.md contains Test Failure Ownership Triage', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Test Failure Ownership Triage');
|
||
});
|
||
|
||
test('ship/SKILL.md triage uses git diff for classification', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('git diff origin/<base>...HEAD --name-only');
|
||
});
|
||
|
||
test('ship/SKILL.md triage has solo and collaborative paths', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('REPO_MODE');
|
||
expect(content).toContain('solo');
|
||
expect(content).toContain('collaborative');
|
||
expect(content).toContain('Investigate and fix now');
|
||
expect(content).toContain('Add as P0 TODO');
|
||
});
|
||
|
||
test('ship/SKILL.md triage has GitHub issue assignment for collaborative mode', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('gh issue create');
|
||
expect(content).toContain('--assignee');
|
||
});
|
||
|
||
test('{{TEST_FAILURE_TRIAGE}} placeholder is fully resolved in ship/SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).not.toContain('{{TEST_FAILURE_TRIAGE}}');
|
||
});
|
||
|
||
test('ship/SKILL.md uses in-branch language for stop condition', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('In-branch test failures');
|
||
});
|
||
});
|