mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
a1a933614c
* feat: CDP inspector module — persistent sessions, CSS cascade, style modification New browse/src/cdp-inspector.ts with full CDP inspection engine: - inspectElement() via CSS.getMatchedStylesForNode + DOM.getBoxModel - modifyStyle() via CSS.setStyleTexts with headless page.evaluate fallback - Persistent CDP session lifecycle (create, reuse, detach on nav, re-create) - Specificity sorting, overridden property detection, UA rule filtering - Modification history with undo support - formatInspectorResult() for CLI output Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: browse server inspector endpoints + inspect/style/cleanup/prettyscreenshot CLI Server endpoints: POST /inspector/pick, GET /inspector, POST /inspector/apply, POST /inspector/reset, GET /inspector/history, GET /inspector/events (SSE). CLI commands: inspect (CDP cascade), style (live CSS mod), cleanup (page clutter removal), prettyscreenshot (clean screenshot pipeline). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar CSS inspector — element picker, box model, rule cascade, quick edit Extension changes for the visual CSS inspector: - inspector.js: element picker with hover highlight, CSS selector generation, basic mode fallback (getComputedStyle + CSSOM), page alteration handlers - inspector.css: picker overlay styles (blue highlight + tooltip) - background.js: inspector message routing (picker <-> server <-> sidepanel) - sidepanel: Inspector tab with box model viz (gstack palette), matched rules with specificity badges, computed styles, click-to-edit quick edit, Send to Agent/Code button, empty/loading/error states Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: document inspect, style, cleanup, prettyscreenshot browse commands Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: auto-track user-created tabs and handle tab close browser-manager.ts changes: - context.on('page') listener: automatically tracks tabs opened by the user (Cmd+T, right-click open in new tab, window.open). Previously only programmatic newTab() was tracked, so user tabs were invisible. - page.on('close') handler in wirePageEvents: removes closed tabs from the pages map and switches activeTabId to the last remaining tab. - syncActiveTabByUrl: match Chrome extension's active tab URL to the correct Playwright page for accurate tab identity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: per-tab agent isolation via BROWSE_TAB environment variable Prevents parallel sidebar agents from interfering with each other's tab context. Three-layer fix: - sidebar-agent.ts: passes BROWSE_TAB=<tabId> env var to each claude process, per-tab processing set allows concurrent agents across tabs - cli.ts: reads process.env.BROWSE_TAB and includes tabId in command request body - server.ts: handleCommand() temporarily switches activeTabId when tabId is present, restores after command completes (safe: Bun event loop is single-threaded) Also: per-tab agent state (TabAgentState map), per-tab message queuing, per-tab chat buffers, verbose streaming narration, stop button endpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar per-tab chat context, tab bar sync, stop button, UX polish Extension changes: - sidepanel.js: per-tab chat history (tabChatHistories map), switchChatTab() swaps entire chat view, browserTabActivated handler for instant tab sync, stop button wired to /sidebar-agent/stop, pollTabs renders tab bar - sidepanel.html: updated banner text ("Browser co-pilot"), stop button markup, input placeholder "Ask about this page..." - sidepanel.css: tab bar styles, stop button styles, loading state fixes - background.js: chrome.tabs.onActivated sends browserTabActivated to sidepanel with tab URL for instant tab switch detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: per-tab isolation, BROWSE_TAB pinning, tab tracking, sidebar UX sidebar-agent.test.ts (new tests): - BROWSE_TAB env var passed to claude process - CLI reads BROWSE_TAB and sends tabId in body - handleCommand accepts tabId, saves/restores activeTabId - Tab pinning only activates when tabId provided - Per-tab agent state, queue, concurrency - processingTabs set for parallel agents sidebar-ux.test.ts (new tests): - context.on('page') tracks user-created tabs - page.on('close') removes tabs from pages map - Tab isolation uses BROWSE_TAB not system prompt hack - Per-tab chat context in sidepanel - Tab bar rendering, stop button, banner text Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve merge conflicts — keep security defenses + per-tab isolation Merged main's security improvements (XML escaping, prompt injection defense, allowed commands whitelist, --model opus, Write tool, stderr capture) with our branch's per-tab isolation (BROWSE_TAB env var, processingTabs set, no --resume). Updated test expectations for expanded system prompt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.9.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add inspector message types to background.js allowlist Pre-existing bug found by Codex: ALLOWED_TYPES in background.js was missing all inspector message types (startInspector, stopInspector, elementPicked, pickerCancelled, applyStyle, toggleClass, injectCSS, resetAll, inspectResult). Messages were silently rejected, making the inspector broken on ALL pages. Also: separate executeScript and insertCSS into individual try blocks in injectInspector(), store inspectorMode for routing, and add content.js fallback when script injection fails (CSP, chrome:// pages). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: basic element picker in content.js for CSP-restricted pages When inspector.js can't be injected (CSP, chrome:// pages), content.js provides a basic picker using getComputedStyle + CSSOM: - startBasicPicker/stopBasicPicker message handlers - captureBasicData() with ~30 key CSS properties, box model, matched rules - Hover highlight with outline save/restore (never leaves artifacts) - Click uses e.target directly (no re-querying by selector) - Sends inspectResult with mode:'basic' for sidebar rendering - Escape key cancels picker and restores outlines Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cleanup + screenshot buttons in sidebar inspector toolbar Two action buttons in the inspector toolbar: - Cleanup (🧹): POSTs cleanup --all to server, shows spinner, chat notification on success, resets inspector state (element may be removed) - Screenshot (📸): POSTs screenshot to server, shows spinner, chat notification with saved file path Shared infrastructure: - .inspector-action-btn CSS with loading spinner via ::after pseudo-element - chat-notification type in addChatEntry() for system messages - package.json version bump to 0.13.9.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: inspector allowlist, CSP fallback, cleanup/screenshot buttons 16 new tests in sidebar-ux.test.ts: - Inspector message allowlist includes all inspector types - content.js basic picker (startBasicPicker, captureBasicData, CSSOM, outline save/restore, inspectResult with mode basic, Escape cleanup) - background.js CSP fallback (separate try blocks, inspectorMode, fallback) - Cleanup button (POST /command, inspector reset after success) - Screenshot button (POST /command, notification rendering) - Chat notification type and CSS styles Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.13.9.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cleanup + screenshot buttons in chat toolbar (not just inspector) Quick actions toolbar (🧹 Cleanup, 📸 Screenshot) now appears above the chat input, always visible. Both inspector and chat buttons share runCleanup() and runScreenshot() helper functions. Clicking either set shows loading state on both simultaneously. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: chat toolbar buttons, shared helpers, quick-action-btn styles Tests that chat toolbar exists (chat-cleanup-btn, chat-screenshot-btn, quick-actions container), CSS styles (.quick-action-btn, .quick-action-btn.loading), shared runCleanup/runScreenshot helper functions, and cleanup inspector reset. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: aggressive cleanup heuristics — overlays, scroll unlock, blur removal Massively expanded CLEANUP_SELECTORS with patterns from uBlock Origin and Readability.js research: - ads: 30+ selectors (Google, Amazon, Outbrain, Taboola, Criteo, etc.) - cookies: OneTrust, Cookiebot, TrustArc, Quantcast + generic patterns - overlays (NEW): paywalls, newsletter popups, interstitials, push prompts, app download banners, survey modals - social: follow prompts, share tools - Cleanup now defaults to --all when no args (sidebar button fix) - Uses !important on all display:none (overrides inline styles) - Unlocks body/html scroll (overflow:hidden from modal lockout) - Removes blur/filter effects (paywall content blur) - Removes max-height truncation (article teaser truncation) - Collapses empty ad placeholder whitespace (empty divs after ad removal) - Skips gstack-ctrl indicator in sticky removal Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: disable action buttons when disconnected, no error spam - setActionButtonsEnabled() toggles .disabled class on all cleanup/screenshot buttons (both chat toolbar and inspector toolbar) - Called with false in updateConnection when server URL is null - Called with true when connection established - runCleanup/runScreenshot silently return when disconnected instead of showing 'Not connected' error notifications - CSS .disabled style: pointer-events:none, opacity:0.3, cursor:not-allowed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: cleanup heuristics, button disabled state, overlay selectors 17 new tests: - cleanup defaults to --all on empty args - CLEANUP_SELECTORS overlays category (paywall, newsletter, interstitial) - Major ad networks in selectors (doubleclick, taboola, criteo, etc.) - Major consent frameworks (OneTrust, Cookiebot, TrustArc, Quantcast) - !important override for inline styles - Scroll unlock (body overflow:hidden) - Blur removal (paywall content blur) - Article truncation removal (max-height) - Empty placeholder collapse - gstack-ctrl indicator skip in sticky cleanup - setActionButtonsEnabled function - Buttons disabled when disconnected - No error spam from cleanup/screenshot when disconnected - CSS disabled styles for action buttons Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: LLM-based page cleanup — agent analyzes page semantically Instead of brittle CSS selectors, the cleanup button now sends a prompt to the sidebar agent (which IS an LLM). The agent: 1. Runs deterministic $B cleanup --all as a quick first pass 2. Takes a snapshot to see what's left 3. Analyzes the page semantically to identify remaining clutter 4. Removes elements intelligently, preserving site branding This means cleanup works correctly on any site without site-specific selectors. The LLM understands that "Your Daily Puzzles" is clutter, "ADVERTISEMENT" is junk, but the SF Chronicle masthead should stay. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: aggressive cleanup heuristics + preserve top nav bar Deterministic cleanup improvements (used as first pass before LLM analysis): - New 'clutter' category: audio players, podcast widgets, sidebar puzzles/games, recirculation widgets (taboola, outbrain, nativo), cross-promotion banners - Text-content detection: removes "ADVERTISEMENT", "Article continues below", "Sponsored", "Paid content" labels and their parent wrappers - Sticky fix: preserves the topmost full-width element near viewport top (site nav bar) instead of hiding all sticky/fixed elements. Sorts by vertical position, preserves the first one that spans >80% viewport width. Tests: clutter category, ad label removal, nav bar preservation logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: LLM-based cleanup architecture, deterministic heuristics, sticky nav 22 new tests covering: - Cleanup button uses /sidebar-command (agent) not /command (deterministic) - Cleanup prompt includes deterministic first pass + agent snapshot analysis - Cleanup prompt lists specific clutter categories for agent guidance - Cleanup prompt preserves site identity (masthead, headline, body, byline) - Cleanup prompt instructs scroll unlock and $B eval removal - Loading state management (async agent, setTimeout) - Deterministic clutter: audio/podcast, games/puzzles, recirculation - Ad label text patterns (ADVERTISEMENT, Sponsored, Article continues) - Ad label parent wrapper hiding for small containers - Sticky nav preservation (sort by position, first full-width near top) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent repeat chat message rendering on reconnect/replay Root cause: server persists chat to disk (chat.jsonl) and replays on restart. Client had no dedup, so every reconnect re-rendered the entire history. Messages from an old HN session would repeat endlessly on the SF Chronicle tab. Fix: renderedEntryIds Set tracks which entry IDs have been rendered. addChatEntry skips entries already in the set. Entries without an id (local notifications) bypass the check. Clear chat resets the set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: agent stops when done, no focus stealing, opus for prompt injection safety Three fixes for sidebar agent UX: - System prompt: "Be CONCISE. STOP as soon as the task is done. Do NOT keep exploring or doing bonus work." Prevents agent from endlessly taking screenshots and highlighting elements after answering the question. - switchTab(id, opts): new bringToFront option. Internal tab pinning (BROWSE_TAB) uses bringToFront: false so agent commands never steal window focus from the user's active app. - Keep opus model (not sonnet) for prompt injection resistance on untrusted web pages. Remove Write from allowedTools (agent only needs Bash for $B). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: agent conciseness, focus stealing, opus model, switchTab opts Tests for the three UX fixes: - System prompt contains STOP/CONCISE/Do NOT keep exploring - sidebar agent uses opus (not sonnet) for prompt injection resistance - switchTab has bringToFront option, defaults to true (opt-out) - handleCommand tab pinning uses bringToFront: false (no focus steal) - Updated stale tests: switchTab signature, allowedTools excludes Write, narration -> conciseness, tab pinning restore calls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: sidebar CSS interaction E2E — HN comment highlight round-trip New E2E test (periodic tier, ~$2/run) that exercises the full sidebar agent pipeline with CSS interaction: 1. Agent navigates to Hacker News 2. Clicks into the top story's comments 3. Reads comments and identifies the most insightful one 4. Highlights it with a 4px solid orange outline via style injection Tests: navigation, snapshot, text reading, LLM judgment, CSS modification. Requires real browser + real Claude (ANTHROPIC_API_KEY). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sidebar CSS E2E test — correct idle timeout (ms not s), pipe stdio Root cause of test failure: BROWSE_IDLE_TIMEOUT is in milliseconds, not seconds. '600' = 0.6 seconds, server died immediately after health check. Fixed to '600000' (10 minutes). Also: use 'pipe' stdio instead of file descriptors (closing fds kills child on macOS/bun), catch ConnectionRefused on poll retry, 4 min poll timeout for the multi-step opus task. Test passes: agent navigates to HN, reads comments, identifies most insightful one, highlights it with orange CSS, stops. 114s, $0.00. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1578 lines
61 KiB
TypeScript
1578 lines
61 KiB
TypeScript
import { describe, test, expect } from 'bun:test';
|
||
import { validateSkill, extractRemoteSlugPatterns, extractWeightsFromTable } from './helpers/skill-parser';
|
||
import { ALL_COMMANDS, COMMAND_DESCRIPTIONS, READ_COMMANDS, WRITE_COMMANDS, META_COMMANDS } from '../browse/src/commands';
|
||
import { SNAPSHOT_FLAGS } from '../browse/src/snapshot';
|
||
import * as fs from 'fs';
|
||
import * as path from 'path';
|
||
|
||
const ROOT = path.resolve(import.meta.dir, '..');
|
||
|
||
describe('SKILL.md command validation', () => {
|
||
test('all $B commands in SKILL.md are valid browse commands', () => {
|
||
const result = validateSkill(path.join(ROOT, 'SKILL.md'));
|
||
expect(result.invalid).toHaveLength(0);
|
||
expect(result.valid.length).toBeGreaterThan(0);
|
||
});
|
||
|
||
test('all snapshot flags in SKILL.md are valid', () => {
|
||
const result = validateSkill(path.join(ROOT, 'SKILL.md'));
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in browse/SKILL.md are valid browse commands', () => {
|
||
const result = validateSkill(path.join(ROOT, 'browse', 'SKILL.md'));
|
||
expect(result.invalid).toHaveLength(0);
|
||
expect(result.valid.length).toBeGreaterThan(0);
|
||
});
|
||
|
||
test('all snapshot flags in browse/SKILL.md are valid', () => {
|
||
const result = validateSkill(path.join(ROOT, 'browse', 'SKILL.md'));
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in qa/SKILL.md are valid browse commands', () => {
|
||
const qaSkill = path.join(ROOT, 'qa', 'SKILL.md');
|
||
if (!fs.existsSync(qaSkill)) return; // skip if missing
|
||
const result = validateSkill(qaSkill);
|
||
expect(result.invalid).toHaveLength(0);
|
||
});
|
||
|
||
test('all snapshot flags in qa/SKILL.md are valid', () => {
|
||
const qaSkill = path.join(ROOT, 'qa', 'SKILL.md');
|
||
if (!fs.existsSync(qaSkill)) return;
|
||
const result = validateSkill(qaSkill);
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in qa-only/SKILL.md are valid browse commands', () => {
|
||
const qaOnlySkill = path.join(ROOT, 'qa-only', 'SKILL.md');
|
||
if (!fs.existsSync(qaOnlySkill)) return;
|
||
const result = validateSkill(qaOnlySkill);
|
||
expect(result.invalid).toHaveLength(0);
|
||
});
|
||
|
||
test('all snapshot flags in qa-only/SKILL.md are valid', () => {
|
||
const qaOnlySkill = path.join(ROOT, 'qa-only', 'SKILL.md');
|
||
if (!fs.existsSync(qaOnlySkill)) return;
|
||
const result = validateSkill(qaOnlySkill);
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in plan-design-review/SKILL.md are valid browse commands', () => {
|
||
const skill = path.join(ROOT, 'plan-design-review', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.invalid).toHaveLength(0);
|
||
});
|
||
|
||
test('all snapshot flags in plan-design-review/SKILL.md are valid', () => {
|
||
const skill = path.join(ROOT, 'plan-design-review', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in design-review/SKILL.md are valid browse commands', () => {
|
||
const skill = path.join(ROOT, 'design-review', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.invalid).toHaveLength(0);
|
||
});
|
||
|
||
test('all snapshot flags in design-review/SKILL.md are valid', () => {
|
||
const skill = path.join(ROOT, 'design-review', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in design-consultation/SKILL.md are valid browse commands', () => {
|
||
const skill = path.join(ROOT, 'design-consultation', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.invalid).toHaveLength(0);
|
||
});
|
||
|
||
test('all snapshot flags in design-consultation/SKILL.md are valid', () => {
|
||
const skill = path.join(ROOT, 'design-consultation', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
|
||
test('all $B commands in autoplan/SKILL.md are valid browse commands', () => {
|
||
const skill = path.join(ROOT, 'autoplan', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.invalid).toHaveLength(0);
|
||
});
|
||
|
||
test('all snapshot flags in autoplan/SKILL.md are valid', () => {
|
||
const skill = path.join(ROOT, 'autoplan', 'SKILL.md');
|
||
if (!fs.existsSync(skill)) return;
|
||
const result = validateSkill(skill);
|
||
expect(result.snapshotFlagErrors).toHaveLength(0);
|
||
});
|
||
});
|
||
|
||
describe('Command registry consistency', () => {
|
||
test('COMMAND_DESCRIPTIONS covers all commands in sets', () => {
|
||
const allCmds = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS]);
|
||
const descKeys = new Set(Object.keys(COMMAND_DESCRIPTIONS));
|
||
for (const cmd of allCmds) {
|
||
expect(descKeys.has(cmd)).toBe(true);
|
||
}
|
||
});
|
||
|
||
test('COMMAND_DESCRIPTIONS has no extra commands not in sets', () => {
|
||
const allCmds = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS]);
|
||
for (const key of Object.keys(COMMAND_DESCRIPTIONS)) {
|
||
expect(allCmds.has(key)).toBe(true);
|
||
}
|
||
});
|
||
|
||
test('ALL_COMMANDS matches union of all sets', () => {
|
||
const union = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS]);
|
||
expect(ALL_COMMANDS.size).toBe(union.size);
|
||
for (const cmd of union) {
|
||
expect(ALL_COMMANDS.has(cmd)).toBe(true);
|
||
}
|
||
});
|
||
|
||
test('SNAPSHOT_FLAGS option keys are valid SnapshotOptions fields', () => {
|
||
const validKeys = new Set([
|
||
'interactive', 'compact', 'depth', 'selector',
|
||
'diff', 'annotate', 'outputPath', 'cursorInteractive',
|
||
]);
|
||
for (const flag of SNAPSHOT_FLAGS) {
|
||
expect(validKeys.has(flag.optionKey)).toBe(true);
|
||
}
|
||
});
|
||
});
|
||
|
||
describe('Usage string consistency', () => {
|
||
// Normalize a usage string to its structural skeleton for comparison.
|
||
// Replaces <param-names> with <>, [optional] with [], strips parenthetical hints.
|
||
// This catches format mismatches (e.g., <name>:<value> vs <name> <value>)
|
||
// without tripping on abbreviation differences (e.g., <sel> vs <selector>).
|
||
function skeleton(usage: string): string {
|
||
return usage
|
||
.replace(/\(.*?\)/g, '') // strip parenthetical hints like (e.g., Enter, Tab)
|
||
.replace(/<[^>]*>/g, '<>') // normalize <param-name> → <>
|
||
.replace(/\[[^\]]*\]/g, '[]') // normalize [optional] → []
|
||
.replace(/\s+/g, ' ') // collapse whitespace
|
||
.trim();
|
||
}
|
||
|
||
// Cross-check Usage: patterns in implementation against COMMAND_DESCRIPTIONS
|
||
test('implementation Usage: structural format matches COMMAND_DESCRIPTIONS', () => {
|
||
const implFiles = [
|
||
path.join(ROOT, 'browse', 'src', 'write-commands.ts'),
|
||
path.join(ROOT, 'browse', 'src', 'read-commands.ts'),
|
||
path.join(ROOT, 'browse', 'src', 'meta-commands.ts'),
|
||
];
|
||
|
||
// Extract "Usage: browse <pattern>" from throw new Error(...) calls
|
||
const usagePattern = /throw new Error\(['"`]Usage:\s*browse\s+(.+?)['"`]\)/g;
|
||
const implUsages = new Map<string, string>();
|
||
|
||
for (const file of implFiles) {
|
||
const content = fs.readFileSync(file, 'utf-8');
|
||
let match;
|
||
while ((match = usagePattern.exec(content)) !== null) {
|
||
const usage = match[1].split('\\n')[0].trim();
|
||
const cmd = usage.split(/\s/)[0];
|
||
implUsages.set(cmd, usage);
|
||
}
|
||
}
|
||
|
||
// Compare structural skeletons
|
||
const mismatches: string[] = [];
|
||
for (const [cmd, implUsage] of implUsages) {
|
||
const desc = COMMAND_DESCRIPTIONS[cmd];
|
||
if (!desc) continue;
|
||
if (!desc.usage) continue;
|
||
const descSkel = skeleton(desc.usage);
|
||
const implSkel = skeleton(implUsage);
|
||
if (descSkel !== implSkel) {
|
||
mismatches.push(`${cmd}: docs "${desc.usage}" (${descSkel}) vs impl "${implUsage}" (${implSkel})`);
|
||
}
|
||
}
|
||
|
||
expect(mismatches).toEqual([]);
|
||
});
|
||
});
|
||
|
||
describe('Generated SKILL.md freshness', () => {
|
||
test('no unresolved {{placeholders}} in generated SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||
const unresolved = content.match(/\{\{\w+\}\}/g);
|
||
expect(unresolved).toBeNull();
|
||
});
|
||
|
||
test('no unresolved {{placeholders}} in generated browse/SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'browse', 'SKILL.md'), 'utf-8');
|
||
const unresolved = content.match(/\{\{\w+\}\}/g);
|
||
expect(unresolved).toBeNull();
|
||
});
|
||
|
||
test('generated SKILL.md has AUTO-GENERATED header', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('AUTO-GENERATED');
|
||
});
|
||
});
|
||
|
||
// --- Update check preamble validation ---
|
||
|
||
describe('Update check preamble', () => {
|
||
const skillsWithUpdateCheck = [
|
||
'SKILL.md', 'browse/SKILL.md', 'qa/SKILL.md',
|
||
'qa-only/SKILL.md',
|
||
'setup-browser-cookies/SKILL.md',
|
||
'ship/SKILL.md', 'review/SKILL.md',
|
||
'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md',
|
||
'retro/SKILL.md',
|
||
'office-hours/SKILL.md', 'investigate/SKILL.md',
|
||
'plan-design-review/SKILL.md',
|
||
'design-review/SKILL.md',
|
||
'design-consultation/SKILL.md',
|
||
'document-release/SKILL.md',
|
||
'canary/SKILL.md',
|
||
'benchmark/SKILL.md',
|
||
'land-and-deploy/SKILL.md',
|
||
'setup-deploy/SKILL.md',
|
||
'cso/SKILL.md',
|
||
];
|
||
|
||
for (const skill of skillsWithUpdateCheck) {
|
||
test(`${skill} update check line ends with || true`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
// The second line of the bash block must end with || true
|
||
// to avoid exit code 1 when _UPD is empty (up to date)
|
||
const match = content.match(/\[ -n "\$_UPD" \].*$/m);
|
||
expect(match).not.toBeNull();
|
||
expect(match![0]).toContain('|| true');
|
||
});
|
||
}
|
||
|
||
test('all skills with update check are generated from .tmpl', () => {
|
||
for (const skill of skillsWithUpdateCheck) {
|
||
const tmplPath = path.join(ROOT, skill + '.tmpl');
|
||
expect(fs.existsSync(tmplPath)).toBe(true);
|
||
}
|
||
});
|
||
|
||
test('update check bash block exits 0 when up to date', () => {
|
||
// Simulate the exact preamble command from SKILL.md
|
||
const result = Bun.spawnSync(['bash', '-c',
|
||
'_UPD=$(echo "" || true); [ -n "$_UPD" ] && echo "$_UPD" || true'
|
||
], { stdout: 'pipe', stderr: 'pipe' });
|
||
expect(result.exitCode).toBe(0);
|
||
});
|
||
|
||
test('update check bash block exits 0 when upgrade available', () => {
|
||
const result = Bun.spawnSync(['bash', '-c',
|
||
'_UPD=$(echo "UPGRADE_AVAILABLE 0.3.3 0.4.0" || true); [ -n "$_UPD" ] && echo "$_UPD" || true'
|
||
], { stdout: 'pipe', stderr: 'pipe' });
|
||
expect(result.exitCode).toBe(0);
|
||
expect(result.stdout.toString().trim()).toBe('UPGRADE_AVAILABLE 0.3.3 0.4.0');
|
||
});
|
||
});
|
||
|
||
// --- Part 7: Cross-skill path consistency (A1) ---
|
||
|
||
describe('Cross-skill path consistency', () => {
|
||
test('REMOTE_SLUG derivation pattern is identical across files that use it', () => {
|
||
const patterns = extractRemoteSlugPatterns(ROOT, ['qa', 'review']);
|
||
const allPatterns: string[] = [];
|
||
|
||
for (const [, filePatterns] of patterns) {
|
||
allPatterns.push(...filePatterns);
|
||
}
|
||
|
||
// Should find at least 2 occurrences (qa/SKILL.md + review/greptile-triage.md)
|
||
expect(allPatterns.length).toBeGreaterThanOrEqual(2);
|
||
|
||
// All occurrences must be character-for-character identical
|
||
const unique = new Set(allPatterns);
|
||
if (unique.size > 1) {
|
||
const variants = Array.from(unique);
|
||
throw new Error(
|
||
`REMOTE_SLUG pattern differs across files:\n` +
|
||
variants.map((v, i) => ` ${i + 1}: ${v}`).join('\n')
|
||
);
|
||
}
|
||
});
|
||
|
||
test('all greptile-history write references specify both per-project and global paths', () => {
|
||
const filesToCheck = [
|
||
'review/SKILL.md',
|
||
'ship/SKILL.md',
|
||
'review/greptile-triage.md',
|
||
];
|
||
|
||
for (const file of filesToCheck) {
|
||
const filePath = path.join(ROOT, file);
|
||
if (!fs.existsSync(filePath)) continue;
|
||
const content = fs.readFileSync(filePath, 'utf-8');
|
||
|
||
const hasBoth = (content.includes('per-project') && content.includes('global')) ||
|
||
(content.includes('$REMOTE_SLUG/greptile-history') && content.includes('~/.gstack/greptile-history'));
|
||
|
||
expect(hasBoth).toBe(true);
|
||
}
|
||
});
|
||
|
||
test('greptile-triage.md contains both project and global history paths', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'review', 'greptile-triage.md'), 'utf-8');
|
||
expect(content).toContain('$REMOTE_SLUG/greptile-history.md');
|
||
expect(content).toContain('~/.gstack/greptile-history.md');
|
||
});
|
||
|
||
test('retro/SKILL.md reads global greptile-history (not per-project)', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('~/.gstack/greptile-history.md');
|
||
// Should NOT reference per-project path for reads
|
||
expect(content).not.toContain('$REMOTE_SLUG/greptile-history.md');
|
||
});
|
||
});
|
||
|
||
// --- Part 7: QA skill structure validation (A2) ---
|
||
|
||
describe('QA skill structure validation', () => {
|
||
const qaContent = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
|
||
test('qa/SKILL.md has all 11 phases', () => {
|
||
const phases = [
|
||
'Phase 1', 'Initialize',
|
||
'Phase 2', 'Authenticate',
|
||
'Phase 3', 'Orient',
|
||
'Phase 4', 'Explore',
|
||
'Phase 5', 'Document',
|
||
'Phase 6', 'Wrap Up',
|
||
'Phase 7', 'Triage',
|
||
'Phase 8', 'Fix Loop',
|
||
'Phase 9', 'Final QA',
|
||
'Phase 10', 'Report',
|
||
'Phase 11', 'TODOS',
|
||
];
|
||
for (const phase of phases) {
|
||
expect(qaContent).toContain(phase);
|
||
}
|
||
});
|
||
|
||
test('has all four QA modes defined', () => {
|
||
const modes = [
|
||
'Diff-aware',
|
||
'Full',
|
||
'Quick',
|
||
'Regression',
|
||
];
|
||
for (const mode of modes) {
|
||
expect(qaContent).toContain(mode);
|
||
}
|
||
|
||
// Mode triggers/flags
|
||
expect(qaContent).toContain('--quick');
|
||
expect(qaContent).toContain('--regression');
|
||
});
|
||
|
||
test('has all three tiers defined', () => {
|
||
const tiers = ['Quick', 'Standard', 'Exhaustive'];
|
||
for (const tier of tiers) {
|
||
expect(qaContent).toContain(tier);
|
||
}
|
||
});
|
||
|
||
test('health score weights sum to 100%', () => {
|
||
const weights = extractWeightsFromTable(qaContent);
|
||
expect(weights.size).toBeGreaterThan(0);
|
||
|
||
let sum = 0;
|
||
for (const pct of weights.values()) {
|
||
sum += pct;
|
||
}
|
||
expect(sum).toBe(100);
|
||
});
|
||
|
||
test('health score has all 8 categories', () => {
|
||
const weights = extractWeightsFromTable(qaContent);
|
||
const expectedCategories = [
|
||
'Console', 'Links', 'Visual', 'Functional',
|
||
'UX', 'Performance', 'Content', 'Accessibility',
|
||
];
|
||
for (const cat of expectedCategories) {
|
||
expect(weights.has(cat)).toBe(true);
|
||
}
|
||
expect(weights.size).toBe(8);
|
||
});
|
||
|
||
test('has four mode definitions (Diff-aware/Full/Quick/Regression)', () => {
|
||
expect(qaContent).toContain('### Diff-aware');
|
||
expect(qaContent).toContain('### Full');
|
||
expect(qaContent).toContain('### Quick');
|
||
expect(qaContent).toContain('### Regression');
|
||
});
|
||
|
||
test('output structure references report directory layout', () => {
|
||
expect(qaContent).toContain('qa-report-');
|
||
expect(qaContent).toContain('baseline.json');
|
||
expect(qaContent).toContain('screenshots/');
|
||
expect(qaContent).toContain('.gstack/qa-reports/');
|
||
});
|
||
});
|
||
|
||
// --- Part 7: Greptile history format consistency (A3) ---
|
||
|
||
describe('Greptile history format consistency', () => {
|
||
test('greptile-triage.md defines the canonical history format', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'review', 'greptile-triage.md'), 'utf-8');
|
||
expect(content).toContain('<YYYY-MM-DD>');
|
||
expect(content).toContain('<owner/repo>');
|
||
expect(content).toContain('<type');
|
||
expect(content).toContain('<file-pattern>');
|
||
expect(content).toContain('<category>');
|
||
});
|
||
|
||
test('review/SKILL.md and ship/SKILL.md both reference greptile-triage.md for write details', () => {
|
||
const reviewContent = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
|
||
expect(reviewContent.toLowerCase()).toContain('greptile-triage.md');
|
||
expect(shipContent.toLowerCase()).toContain('greptile-triage.md');
|
||
});
|
||
|
||
test('greptile-triage.md defines all 9 valid categories', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'review', 'greptile-triage.md'), 'utf-8');
|
||
const categories = [
|
||
'race-condition', 'null-check', 'error-handling', 'style',
|
||
'type-safety', 'security', 'performance', 'correctness', 'other',
|
||
];
|
||
for (const cat of categories) {
|
||
expect(content).toContain(cat);
|
||
}
|
||
});
|
||
});
|
||
|
||
// --- Hardcoded branch name detection in templates ---
|
||
|
||
describe('No hardcoded branch names in SKILL templates', () => {
|
||
const tmplFiles = [
|
||
'ship/SKILL.md.tmpl',
|
||
'review/SKILL.md.tmpl',
|
||
'qa/SKILL.md.tmpl',
|
||
'plan-ceo-review/SKILL.md.tmpl',
|
||
'retro/SKILL.md.tmpl',
|
||
'document-release/SKILL.md.tmpl',
|
||
'plan-eng-review/SKILL.md.tmpl',
|
||
'plan-design-review/SKILL.md.tmpl',
|
||
'codex/SKILL.md.tmpl',
|
||
];
|
||
|
||
// Patterns that indicate hardcoded 'main' in git commands
|
||
const gitMainPatterns = [
|
||
/\bgit\s+diff\s+(?:origin\/)?main\b/,
|
||
/\bgit\s+log\s+(?:origin\/)?main\b/,
|
||
/\bgit\s+fetch\s+origin\s+main\b/,
|
||
/\bgit\s+merge\s+origin\/main\b/,
|
||
/\borigin\/main\b/,
|
||
];
|
||
|
||
// Lines that are allowed to mention 'main' (fallback logic, prose)
|
||
const allowlist = [
|
||
/fall\s*back\s+to\s+`main`/i,
|
||
/fall\s*back\s+to\s+`?main`?/i,
|
||
/typically\s+`?main`?/i,
|
||
/If\s+on\s+`main`/i, // old pattern — should not exist
|
||
];
|
||
|
||
for (const tmplFile of tmplFiles) {
|
||
test(`${tmplFile} has no hardcoded 'main' in git commands`, () => {
|
||
const filePath = path.join(ROOT, tmplFile);
|
||
if (!fs.existsSync(filePath)) return;
|
||
const lines = fs.readFileSync(filePath, 'utf-8').split('\n');
|
||
const violations: string[] = [];
|
||
|
||
for (let i = 0; i < lines.length; i++) {
|
||
const line = lines[i];
|
||
const isAllowlisted = allowlist.some(p => p.test(line));
|
||
if (isAllowlisted) continue;
|
||
|
||
for (const pattern of gitMainPatterns) {
|
||
if (pattern.test(line)) {
|
||
violations.push(`Line ${i + 1}: ${line.trim()}`);
|
||
break;
|
||
}
|
||
}
|
||
}
|
||
|
||
if (violations.length > 0) {
|
||
throw new Error(
|
||
`${tmplFile} has hardcoded 'main' in git commands:\n` +
|
||
violations.map(v => ` ${v}`).join('\n')
|
||
);
|
||
}
|
||
});
|
||
}
|
||
});
|
||
|
||
// --- Part 7b: TODOS-format.md reference consistency ---
|
||
|
||
describe('TODOS-format.md reference consistency', () => {
|
||
test('review/TODOS-format.md exists and defines canonical format', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'review', 'TODOS-format.md'), 'utf-8');
|
||
expect(content).toContain('**What:**');
|
||
expect(content).toContain('**Why:**');
|
||
expect(content).toContain('**Priority:**');
|
||
expect(content).toContain('**Effort:**');
|
||
expect(content).toContain('## Completed');
|
||
});
|
||
|
||
test('skills that write TODOs reference TODOS-format.md', () => {
|
||
const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
const ceoPlanContent = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
||
const engPlanContent = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
|
||
|
||
expect(shipContent).toContain('TODOS-format.md');
|
||
expect(ceoPlanContent).toContain('TODOS-format.md');
|
||
expect(engPlanContent).toContain('TODOS-format.md');
|
||
});
|
||
});
|
||
|
||
// --- v0.4.1 feature coverage: RECOMMENDATION format, session awareness, enum completeness ---
|
||
|
||
describe('v0.4.1 preamble features', () => {
|
||
// Tier 1 skills have core preamble only (no AskUserQuestion format)
|
||
const tier1Skills = ['SKILL.md', 'browse/SKILL.md', 'setup-browser-cookies/SKILL.md', 'benchmark/SKILL.md'];
|
||
|
||
// Tier 2+ skills have AskUserQuestion format with RECOMMENDATION
|
||
const tier2PlusSkills = [
|
||
'qa/SKILL.md', 'qa-only/SKILL.md',
|
||
'ship/SKILL.md', 'review/SKILL.md',
|
||
'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md',
|
||
'retro/SKILL.md',
|
||
'office-hours/SKILL.md', 'investigate/SKILL.md',
|
||
'plan-design-review/SKILL.md',
|
||
'design-review/SKILL.md',
|
||
'design-consultation/SKILL.md',
|
||
'document-release/SKILL.md',
|
||
'canary/SKILL.md',
|
||
'land-and-deploy/SKILL.md',
|
||
'setup-deploy/SKILL.md',
|
||
'cso/SKILL.md',
|
||
];
|
||
|
||
const skillsWithPreamble = [...tier1Skills, ...tier2PlusSkills];
|
||
|
||
for (const skill of tier2PlusSkills) {
|
||
test(`${skill} contains RECOMMENDATION format`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('RECOMMENDATION: Choose');
|
||
expect(content).toContain('AskUserQuestion');
|
||
});
|
||
}
|
||
|
||
for (const skill of skillsWithPreamble) {
|
||
test(`${skill} contains session awareness`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('_SESSIONS');
|
||
});
|
||
}
|
||
|
||
for (const skill of skillsWithPreamble) {
|
||
test(`${skill} contains escalation protocol`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('DONE_WITH_CONCERNS');
|
||
expect(content).toContain('BLOCKED');
|
||
expect(content).toContain('NEEDS_CONTEXT');
|
||
});
|
||
}
|
||
});
|
||
|
||
// --- Structural tests for new skills ---
|
||
|
||
describe('office-hours skill structure', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'office-hours', 'SKILL.md'), 'utf-8');
|
||
|
||
// Original structural assertions
|
||
for (const section of ['Phase 1', 'Phase 2', 'Phase 3', 'Phase 4', 'Phase 5', 'Phase 6',
|
||
'Design Doc', 'Supersedes', 'APPROVED', 'Premise Challenge',
|
||
'Alternatives', 'Smart-skip']) {
|
||
test(`contains ${section}`, () => expect(content).toContain(section));
|
||
}
|
||
|
||
// Dual-mode structure
|
||
for (const section of ['Startup mode', 'Builder mode']) {
|
||
test(`contains ${section}`, () => expect(content).toContain(section));
|
||
}
|
||
|
||
// Mode detection question
|
||
test('contains explicit mode detection question', () => {
|
||
expect(content).toContain("what's your goal");
|
||
});
|
||
|
||
// Six forcing questions (startup mode)
|
||
for (const question of ['Demand Reality', 'Status Quo', 'Desperate Specificity',
|
||
'Narrowest Wedge', 'Observation & Surprise', 'Future-Fit']) {
|
||
test(`contains forcing question: ${question}`, () => expect(content).toContain(question));
|
||
}
|
||
|
||
// Builder mode questions
|
||
test('contains builder brainstorming questions', () => {
|
||
expect(content).toContain('coolest version');
|
||
expect(content).toContain('delightful');
|
||
});
|
||
|
||
// Intrapreneurship adaptation
|
||
test('contains intrapreneurship adaptation', () => {
|
||
expect(content).toContain('Intrapreneurship');
|
||
});
|
||
|
||
// YC founder discovery engine
|
||
test('contains YC apply CTA with ref tracking', () => {
|
||
expect(content).toContain('ycombinator.com/apply?ref=gstack');
|
||
});
|
||
|
||
test('contains "What I noticed" design doc section', () => {
|
||
expect(content).toContain('What I noticed about how you think');
|
||
});
|
||
|
||
test('contains golden age framing', () => {
|
||
expect(content).toContain('golden age');
|
||
});
|
||
|
||
test('contains Garry Tan personal plea', () => {
|
||
expect(content).toContain('Garry Tan, the creator of GStack');
|
||
});
|
||
|
||
test('contains founder signal synthesis phase', () => {
|
||
expect(content).toContain('Founder Signal Synthesis');
|
||
});
|
||
|
||
test('contains three-tier decision rubric', () => {
|
||
expect(content).toContain('Top tier');
|
||
expect(content).toContain('Middle tier');
|
||
expect(content).toContain('Base tier');
|
||
});
|
||
|
||
test('contains anti-slop examples', () => {
|
||
expect(content).toContain('GOOD:');
|
||
expect(content).toContain('BAD:');
|
||
});
|
||
|
||
test('contains "One more thing" transition beat', () => {
|
||
expect(content).toContain('One more thing');
|
||
});
|
||
|
||
// Operating principles per mode
|
||
test('contains startup operating principles', () => {
|
||
expect(content).toContain('Specificity is the only currency');
|
||
});
|
||
|
||
test('contains builder operating principles', () => {
|
||
expect(content).toContain('Delight is the currency');
|
||
});
|
||
|
||
// Spec Review Loop (Phase 5.5)
|
||
test('contains spec review loop', () => {
|
||
expect(content).toContain('Spec Review Loop');
|
||
});
|
||
|
||
test('contains adversarial review dimensions', () => {
|
||
for (const dim of ['Completeness', 'Consistency', 'Clarity', 'Scope', 'Feasibility']) {
|
||
expect(content).toContain(dim);
|
||
}
|
||
});
|
||
|
||
test('contains subagent dispatch instruction', () => {
|
||
expect(content).toMatch(/Agent.*tool|subagent/i);
|
||
});
|
||
|
||
test('contains max 3 iterations', () => {
|
||
expect(content).toMatch(/3.*iteration|maximum.*3/i);
|
||
});
|
||
|
||
test('contains quality score', () => {
|
||
expect(content).toContain('quality score');
|
||
});
|
||
|
||
test('contains spec review metrics path', () => {
|
||
expect(content).toContain('spec-review.jsonl');
|
||
});
|
||
|
||
test('contains convergence guard', () => {
|
||
expect(content).toMatch(/convergence/i);
|
||
});
|
||
|
||
// Visual Sketch (Phase 4.5)
|
||
test('contains visual sketch section', () => {
|
||
expect(content).toContain('Visual Sketch');
|
||
});
|
||
|
||
test('contains wireframe generation', () => {
|
||
expect(content).toMatch(/wireframe|sketch/i);
|
||
});
|
||
|
||
test('contains DESIGN.md awareness', () => {
|
||
expect(content).toContain('DESIGN.md');
|
||
});
|
||
|
||
test('contains browse rendering', () => {
|
||
expect(content).toContain('$B goto');
|
||
expect(content).toContain('$B screenshot');
|
||
});
|
||
|
||
test('contains rough aesthetic instruction', () => {
|
||
expect(content).toMatch(/rough|hand-drawn/i);
|
||
});
|
||
});
|
||
|
||
describe('investigate skill structure', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'investigate', 'SKILL.md'), 'utf-8');
|
||
for (const section of ['Iron Law', 'Root Cause', 'Pattern Analysis', 'Hypothesis',
|
||
'DEBUG REPORT', '3-strike', 'BLOCKED']) {
|
||
test(`contains ${section}`, () => expect(content).toContain(section));
|
||
}
|
||
});
|
||
|
||
// --- Contributor mode preamble structure validation ---
|
||
|
||
describe('Contributor mode preamble structure', () => {
|
||
const skillsWithPreamble = [
|
||
'SKILL.md', 'browse/SKILL.md', 'qa/SKILL.md',
|
||
'qa-only/SKILL.md',
|
||
'setup-browser-cookies/SKILL.md',
|
||
'ship/SKILL.md', 'review/SKILL.md',
|
||
'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md',
|
||
'retro/SKILL.md',
|
||
'plan-design-review/SKILL.md',
|
||
'design-review/SKILL.md',
|
||
'design-consultation/SKILL.md',
|
||
'document-release/SKILL.md',
|
||
'canary/SKILL.md',
|
||
'benchmark/SKILL.md',
|
||
'land-and-deploy/SKILL.md',
|
||
'setup-deploy/SKILL.md',
|
||
];
|
||
|
||
for (const skill of skillsWithPreamble) {
|
||
test(`${skill} has 0-10 rating in contributor mode`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('0-10');
|
||
expect(content).toContain('Rating');
|
||
});
|
||
|
||
test(`${skill} has "what would make this a 10" field`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('What would make this a 10');
|
||
});
|
||
|
||
test(`${skill} uses periodic reflection (not per-command)`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('workflow step');
|
||
expect(content).not.toContain('After you use gstack-provided CLIs');
|
||
});
|
||
}
|
||
});
|
||
|
||
describe('Enum & Value Completeness in review checklist', () => {
|
||
const checklist = fs.readFileSync(path.join(ROOT, 'review', 'checklist.md'), 'utf-8');
|
||
|
||
test('checklist has Enum & Value Completeness section', () => {
|
||
expect(checklist).toContain('Enum & Value Completeness');
|
||
});
|
||
|
||
test('Enum & Value Completeness is classified as CRITICAL', () => {
|
||
// It should appear under Pass 1 — CRITICAL, not Pass 2
|
||
const pass1Start = checklist.indexOf('### Pass 1');
|
||
const pass2Start = checklist.indexOf('### Pass 2');
|
||
const enumStart = checklist.indexOf('Enum & Value Completeness');
|
||
expect(enumStart).toBeGreaterThan(pass1Start);
|
||
expect(enumStart).toBeLessThan(pass2Start);
|
||
});
|
||
|
||
test('Enum & Value Completeness mentions tracing through consumers', () => {
|
||
expect(checklist).toContain('Trace it through every consumer');
|
||
expect(checklist).toContain('case');
|
||
expect(checklist).toContain('allowlist');
|
||
});
|
||
|
||
test('Enum & Value Completeness is in the severity classification as CRITICAL', () => {
|
||
const gateSection = checklist.slice(checklist.indexOf('## Severity Classification'));
|
||
// The ASCII art has CRITICAL on the left and INFORMATIONAL on the right
|
||
// Enum & Value Completeness should appear on a line with the CRITICAL tree (├─ or └─)
|
||
const enumLine = gateSection.split('\n').find(l => l.includes('Enum & Value Completeness'));
|
||
expect(enumLine).toBeDefined();
|
||
// It's on the left (CRITICAL) side — starts with ├─ or └─
|
||
expect(enumLine!.trimStart().startsWith('├─') || enumLine!.trimStart().startsWith('└─')).toBe(true);
|
||
});
|
||
|
||
test('Fix-First Heuristic exists in checklist and is referenced by review + ship', () => {
|
||
expect(checklist).toContain('## Fix-First Heuristic');
|
||
expect(checklist).toContain('AUTO-FIX');
|
||
expect(checklist).toContain('ASK');
|
||
|
||
const reviewSkill = fs.readFileSync(path.join(ROOT, 'review/SKILL.md'), 'utf-8');
|
||
const shipSkill = fs.readFileSync(path.join(ROOT, 'ship/SKILL.md'), 'utf-8');
|
||
expect(reviewSkill).toContain('AUTO-FIX');
|
||
expect(reviewSkill).toContain('[AUTO-FIXED]');
|
||
expect(shipSkill).toContain('AUTO-FIX');
|
||
expect(shipSkill).toContain('[AUTO-FIXED]');
|
||
});
|
||
});
|
||
|
||
// --- Completeness Principle spot-check ---
|
||
|
||
describe('Completeness Principle in generated SKILL.md files', () => {
|
||
const skillsWithPreamble = [
|
||
'SKILL.md', 'browse/SKILL.md', 'qa/SKILL.md',
|
||
'qa-only/SKILL.md',
|
||
'setup-browser-cookies/SKILL.md',
|
||
'ship/SKILL.md', 'review/SKILL.md',
|
||
'plan-ceo-review/SKILL.md', 'plan-eng-review/SKILL.md',
|
||
'retro/SKILL.md',
|
||
'plan-design-review/SKILL.md',
|
||
'design-review/SKILL.md',
|
||
'design-consultation/SKILL.md',
|
||
'document-release/SKILL.md',
|
||
'cso/SKILL.md', ];
|
||
|
||
for (const skill of skillsWithPreamble) {
|
||
test(`${skill} contains Completeness Principle section`, () => {
|
||
const content = fs.readFileSync(path.join(ROOT, skill), 'utf-8');
|
||
expect(content).toContain('Completeness Principle');
|
||
expect(content).toContain('Boil the Lake');
|
||
});
|
||
}
|
||
|
||
test('Completeness Principle includes compression table in tier 2+ skills', () => {
|
||
// Root is tier 1 (no completeness). Check tier 2+ skill.
|
||
const content = fs.readFileSync(path.join(ROOT, 'cso', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('CC+gstack');
|
||
expect(content).toContain('Compression');
|
||
});
|
||
});
|
||
|
||
// --- Part 7: Planted-bug fixture validation (A4) ---
|
||
|
||
describe('Planted-bug fixture validation', () => {
|
||
test('qa-eval ground truth has exactly 5 planted bugs', () => {
|
||
const groundTruth = JSON.parse(
|
||
fs.readFileSync(path.join(ROOT, 'test', 'fixtures', 'qa-eval-ground-truth.json'), 'utf-8')
|
||
);
|
||
expect(groundTruth.bugs).toHaveLength(5);
|
||
expect(groundTruth.total_bugs).toBe(5);
|
||
});
|
||
|
||
test('qa-eval-spa ground truth has exactly 5 planted bugs', () => {
|
||
const groundTruth = JSON.parse(
|
||
fs.readFileSync(path.join(ROOT, 'test', 'fixtures', 'qa-eval-spa-ground-truth.json'), 'utf-8')
|
||
);
|
||
expect(groundTruth.bugs).toHaveLength(5);
|
||
expect(groundTruth.total_bugs).toBe(5);
|
||
});
|
||
|
||
test('qa-eval-checkout ground truth has exactly 5 planted bugs', () => {
|
||
const groundTruth = JSON.parse(
|
||
fs.readFileSync(path.join(ROOT, 'test', 'fixtures', 'qa-eval-checkout-ground-truth.json'), 'utf-8')
|
||
);
|
||
expect(groundTruth.bugs).toHaveLength(5);
|
||
expect(groundTruth.total_bugs).toBe(5);
|
||
});
|
||
|
||
test('qa-eval.html contains the planted bugs', () => {
|
||
const html = fs.readFileSync(path.join(ROOT, 'browse', 'test', 'fixtures', 'qa-eval.html'), 'utf-8');
|
||
// BUG 1: broken link
|
||
expect(html).toContain('/nonexistent-404-page');
|
||
// BUG 2: disabled submit
|
||
expect(html).toContain('disabled');
|
||
// BUG 3: overflow
|
||
expect(html).toContain('overflow: hidden');
|
||
// BUG 4: missing alt
|
||
expect(html).toMatch(/<img[^>]*src="\/logo\.png"[^>]*>/);
|
||
expect(html).not.toMatch(/<img[^>]*src="\/logo\.png"[^>]*alt=/);
|
||
// BUG 5: console error
|
||
expect(html).toContain("Cannot read properties of undefined");
|
||
});
|
||
|
||
test('review-eval-vuln.rb contains expected vulnerability patterns', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'test', 'fixtures', 'review-eval-vuln.rb'), 'utf-8');
|
||
expect(content).toContain('params[:id]');
|
||
expect(content).toContain('update_column');
|
||
});
|
||
});
|
||
|
||
// --- CEO review mode validation ---
|
||
|
||
describe('CEO review mode validation', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
||
|
||
test('has all four CEO review modes defined', () => {
|
||
const modes = ['SCOPE EXPANSION', 'SELECTIVE EXPANSION', 'HOLD SCOPE', 'SCOPE REDUCTION'];
|
||
for (const mode of modes) {
|
||
expect(content).toContain(mode);
|
||
}
|
||
});
|
||
|
||
test('has CEO plan persistence step', () => {
|
||
expect(content).toContain('ceo-plans');
|
||
expect(content).toContain('status: ACTIVE');
|
||
});
|
||
|
||
test('has docs/designs promotion section', () => {
|
||
expect(content).toContain('docs/designs');
|
||
expect(content).toContain('PROMOTED');
|
||
});
|
||
|
||
test('mode quick reference has four columns', () => {
|
||
expect(content).toContain('EXPANSION');
|
||
expect(content).toContain('SELECTIVE');
|
||
expect(content).toContain('HOLD SCOPE');
|
||
expect(content).toContain('REDUCTION');
|
||
});
|
||
|
||
// Skill chaining (benefits-from)
|
||
test('contains prerequisite skill offer for office-hours', () => {
|
||
expect(content).toContain('Prerequisite Skill Offer');
|
||
expect(content).toContain('/office-hours');
|
||
});
|
||
|
||
test('contains mid-session detection', () => {
|
||
expect(content).toContain('Mid-session detection');
|
||
expect(content).toMatch(/still figuring out|seems lost/i);
|
||
});
|
||
|
||
// Spec review on CEO plans
|
||
test('contains spec review loop for CEO plan documents', () => {
|
||
expect(content).toContain('Spec Review Loop');
|
||
});
|
||
});
|
||
|
||
// --- gstack-slug helper ---
|
||
|
||
describe('gstack-slug', () => {
|
||
const SLUG_BIN = path.join(ROOT, 'bin', 'gstack-slug');
|
||
|
||
test('binary exists and is executable', () => {
|
||
expect(fs.existsSync(SLUG_BIN)).toBe(true);
|
||
const stat = fs.statSync(SLUG_BIN);
|
||
expect(stat.mode & 0o111).toBeGreaterThan(0);
|
||
});
|
||
|
||
test('outputs SLUG and BRANCH lines in a git repo', () => {
|
||
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
|
||
expect(result.exitCode).toBe(0);
|
||
const output = result.stdout.toString();
|
||
expect(output).toContain('SLUG=');
|
||
expect(output).toContain('BRANCH=');
|
||
});
|
||
|
||
test('SLUG does not contain forward slashes', () => {
|
||
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
|
||
const slug = result.stdout.toString().match(/SLUG=(.*)/)?.[1] ?? '';
|
||
expect(slug).not.toContain('/');
|
||
expect(slug.length).toBeGreaterThan(0);
|
||
});
|
||
|
||
test('BRANCH does not contain forward slashes', () => {
|
||
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
|
||
const branch = result.stdout.toString().match(/BRANCH=(.*)/)?.[1] ?? '';
|
||
expect(branch).not.toContain('/');
|
||
expect(branch.length).toBeGreaterThan(0);
|
||
});
|
||
|
||
test('output is eval-compatible (KEY=VALUE format)', () => {
|
||
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
|
||
const lines = result.stdout.toString().trim().split('\n');
|
||
expect(lines.length).toBe(2);
|
||
expect(lines[0]).toMatch(/^SLUG=.+/);
|
||
expect(lines[1]).toMatch(/^BRANCH=.+/);
|
||
});
|
||
|
||
test('output values contain only safe characters (no shell metacharacters)', () => {
|
||
const result = Bun.spawnSync([SLUG_BIN], { cwd: ROOT, stdout: 'pipe', stderr: 'pipe' });
|
||
const slug = result.stdout.toString().match(/SLUG=(.*)/)?.[1] ?? '';
|
||
const branch = result.stdout.toString().match(/BRANCH=(.*)/)?.[1] ?? '';
|
||
// Only alphanumeric, dot, dash, underscore are allowed (#133)
|
||
expect(slug).toMatch(/^[a-zA-Z0-9._-]+$/);
|
||
expect(branch).toMatch(/^[a-zA-Z0-9._-]+$/);
|
||
});
|
||
test('eval sets variables under bash with set -euo pipefail', () => {
|
||
const result = Bun.spawnSync(
|
||
['bash', '-c', 'set -euo pipefail; eval "$(./bin/gstack-slug 2>/dev/null)"; echo "SLUG=$SLUG"; echo "BRANCH=$BRANCH"'],
|
||
{ cwd: ROOT, stdout: 'pipe', stderr: 'pipe' }
|
||
);
|
||
expect(result.exitCode).toBe(0);
|
||
const output = result.stdout.toString();
|
||
expect(output).toMatch(/^SLUG=.+/m);
|
||
expect(output).toMatch(/^BRANCH=.+/m);
|
||
});
|
||
|
||
test('no templates or bin scripts use source process substitution for gstack-slug', () => {
|
||
const result = Bun.spawnSync(
|
||
['grep', '-r', 'source <(.*gstack-slug', '--include=*.tmpl', '--include=gstack-review-*', '.'],
|
||
{ cwd: ROOT, stdout: 'pipe', stderr: 'pipe' }
|
||
);
|
||
// grep returns exit code 1 when no matches found — that's what we want
|
||
expect(result.stdout.toString().trim()).toBe('');
|
||
});
|
||
});
|
||
|
||
// --- Test Bootstrap validation ---
|
||
|
||
describe('Test Bootstrap ({{TEST_BOOTSTRAP}}) integration', () => {
|
||
test('TEST_BOOTSTRAP resolver produces valid content', () => {
|
||
const qaContent = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(qaContent).toContain('Test Framework Bootstrap');
|
||
expect(qaContent).toContain('RUNTIME:ruby');
|
||
expect(qaContent).toContain('RUNTIME:node');
|
||
expect(qaContent).toContain('RUNTIME:python');
|
||
expect(qaContent).toContain('no-test-bootstrap');
|
||
expect(qaContent).toContain('BOOTSTRAP_DECLINED');
|
||
});
|
||
|
||
test('TEST_BOOTSTRAP appears in qa/SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Test Framework Bootstrap');
|
||
expect(content).toContain('TESTING.md');
|
||
expect(content).toContain('CLAUDE.md');
|
||
});
|
||
|
||
test('TEST_BOOTSTRAP appears in ship/SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Test Framework Bootstrap');
|
||
expect(content).toContain('Step 2.5');
|
||
});
|
||
|
||
test('TEST_BOOTSTRAP appears in design-review/SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'design-review', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Test Framework Bootstrap');
|
||
});
|
||
|
||
test('TEST_BOOTSTRAP does NOT appear in qa-only/SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa-only', 'SKILL.md'), 'utf-8');
|
||
expect(content).not.toContain('Test Framework Bootstrap');
|
||
// But should have the recommendation note
|
||
expect(content).toContain('No test framework detected');
|
||
expect(content).toContain('Run `/qa` to bootstrap');
|
||
});
|
||
|
||
test('bootstrap includes framework knowledge table', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('vitest');
|
||
expect(content).toContain('minitest');
|
||
expect(content).toContain('pytest');
|
||
expect(content).toContain('cargo test');
|
||
expect(content).toContain('phpunit');
|
||
expect(content).toContain('ExUnit');
|
||
});
|
||
|
||
test('bootstrap includes CI/CD pipeline generation', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('.github/workflows/test.yml');
|
||
expect(content).toContain('GitHub Actions');
|
||
});
|
||
|
||
test('bootstrap includes first real tests step', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('First real tests');
|
||
expect(content).toContain('git log --since=30.days');
|
||
expect(content).toContain('Prioritize by risk');
|
||
});
|
||
|
||
test('bootstrap includes vibe coding philosophy', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('vibe coding');
|
||
expect(content).toContain('100% test coverage');
|
||
});
|
||
|
||
test('WebSearch is in allowed-tools for qa, ship, design-review', () => {
|
||
const qa = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
const ship = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
const qaDesign = fs.readFileSync(path.join(ROOT, 'design-review', 'SKILL.md'), 'utf-8');
|
||
expect(qa).toContain('WebSearch');
|
||
expect(ship).toContain('WebSearch');
|
||
expect(qaDesign).toContain('WebSearch');
|
||
});
|
||
});
|
||
|
||
// --- Phase 8e.5 regression test validation ---
|
||
|
||
describe('Phase 8e.5 regression test generation', () => {
|
||
test('qa/SKILL.md contains Phase 8e.5', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('8e.5. Regression Test');
|
||
expect(content).toContain('test(qa): regression test');
|
||
expect(content).toContain('WTF-likelihood exclusion');
|
||
});
|
||
|
||
test('qa/SKILL.md Rule 13 is amended for regression tests', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Only modify tests when generating regression tests in Phase 8e.5');
|
||
expect(content).not.toContain('Never modify tests or CI configuration');
|
||
});
|
||
|
||
test('design-review has CSS-aware Phase 8e.5 variant', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'design-review', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('8e.5. Regression Test (design-review variant)');
|
||
expect(content).toContain('CSS-only');
|
||
expect(content).toContain('test(design): regression test');
|
||
});
|
||
|
||
test('regression test includes full attribution comment format', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('// Regression: ISSUE-NNN');
|
||
expect(content).toContain('// Found by /qa on');
|
||
expect(content).toContain('// Report: .gstack/qa-reports/');
|
||
});
|
||
|
||
test('regression test uses auto-incrementing names', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('auto-incrementing');
|
||
expect(content).toContain('max number + 1');
|
||
});
|
||
});
|
||
|
||
// --- Step 3.4 coverage audit validation ---
|
||
|
||
describe('Step 3.4 test coverage audit', () => {
|
||
test('ship/SKILL.md contains Step 3.4', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Step 3.4: Test Coverage Audit');
|
||
expect(content).toContain('CODE PATH COVERAGE');
|
||
});
|
||
|
||
test('Step 3.4 includes quality scoring rubric', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('★★★');
|
||
expect(content).toContain('★★');
|
||
expect(content).toContain('edge cases AND error paths');
|
||
expect(content).toContain('happy path only');
|
||
});
|
||
|
||
test('Step 3.4 includes before/after test count', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Count test files before');
|
||
expect(content).toContain('Count test files after');
|
||
});
|
||
|
||
test('ship PR body includes Test Coverage section', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('## Test Coverage');
|
||
});
|
||
|
||
test('ship rules include test generation rule', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Step 3.4 generates coverage tests');
|
||
expect(content).toContain('Never commit failing tests');
|
||
});
|
||
|
||
test('Step 3.4 includes vibe coding philosophy', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('vibe coding becomes yolo coding');
|
||
});
|
||
|
||
test('Step 3.4 traces actual codepaths, not just syntax', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Trace every codepath');
|
||
expect(content).toContain('Trace data flow');
|
||
expect(content).toContain('Diagram the execution');
|
||
});
|
||
|
||
test('Step 3.4 maps user flows and interaction edge cases', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Map user flows');
|
||
expect(content).toContain('Interaction edge cases');
|
||
expect(content).toContain('Double-click');
|
||
expect(content).toContain('Navigate away');
|
||
expect(content).toContain('Error states the user can see');
|
||
expect(content).toContain('Empty/zero/boundary states');
|
||
});
|
||
|
||
test('Step 3.4 diagram includes USER FLOW COVERAGE section', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('USER FLOW COVERAGE');
|
||
expect(content).toContain('Code paths:');
|
||
expect(content).toContain('User flows:');
|
||
});
|
||
});
|
||
|
||
// --- Retro test health validation ---
|
||
|
||
describe('Retro test health tracking', () => {
|
||
test('retro/SKILL.md has test health data gathering commands', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('# 10. Test file count');
|
||
expect(content).toContain('# 11. Regression test commits');
|
||
expect(content).toContain('# 12. Test files changed');
|
||
});
|
||
|
||
test('retro/SKILL.md has Test Health metrics row', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Test Health');
|
||
expect(content).toContain('regression tests');
|
||
});
|
||
|
||
test('retro/SKILL.md has Test Health narrative section', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('### Test Health');
|
||
expect(content).toContain('Total test files');
|
||
expect(content).toContain('vibe coding safe');
|
||
});
|
||
|
||
test('retro JSON schema includes test_health field', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'retro', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('test_health');
|
||
expect(content).toContain('total_test_files');
|
||
expect(content).toContain('regression_test_commits');
|
||
});
|
||
});
|
||
|
||
// --- QA report template regression tests section ---
|
||
|
||
describe('QA report template', () => {
|
||
test('qa-report-template.md has Regression Tests section', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'qa', 'templates', 'qa-report-template.md'), 'utf-8');
|
||
expect(content).toContain('## Regression Tests');
|
||
expect(content).toContain('committed / deferred / skipped');
|
||
expect(content).toContain('### Deferred Tests');
|
||
expect(content).toContain('**Precondition:**');
|
||
});
|
||
});
|
||
|
||
// --- Codex skill validation ---
|
||
|
||
describe('Codex skill', () => {
|
||
test('codex/SKILL.md exists and has correct frontmatter', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('name: codex');
|
||
expect(content).toContain('version: 1.0.0');
|
||
expect(content).toContain('allowed-tools:');
|
||
});
|
||
|
||
test('codex/SKILL.md contains all three modes', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Step 2A: Review Mode');
|
||
expect(content).toContain('Step 2B: Challenge');
|
||
expect(content).toContain('Step 2C: Consult Mode');
|
||
});
|
||
|
||
test('codex/SKILL.md contains gate verdict logic', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('[P1]');
|
||
expect(content).toContain('GATE: PASS');
|
||
expect(content).toContain('GATE: FAIL');
|
||
});
|
||
|
||
test('codex/SKILL.md contains session continuity', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('codex-session-id');
|
||
expect(content).toContain('codex exec resume');
|
||
});
|
||
|
||
test('codex/SKILL.md contains cost tracking', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('tokens used');
|
||
expect(content).toContain('Est. cost');
|
||
});
|
||
|
||
test('codex/SKILL.md contains cross-model comparison', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('CROSS-MODEL ANALYSIS');
|
||
expect(content).toContain('Agreement rate');
|
||
});
|
||
|
||
test('codex/SKILL.md contains review log persistence', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('codex-review');
|
||
expect(content).toContain('gstack-review-log');
|
||
});
|
||
|
||
test('codex/SKILL.md uses which for binary discovery, not hardcoded path', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('which codex');
|
||
expect(content).not.toContain('/opt/homebrew/bin/codex');
|
||
});
|
||
|
||
test('codex/SKILL.md contains error handling for missing binary and auth', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('NOT_FOUND');
|
||
expect(content).toContain('codex login');
|
||
});
|
||
|
||
test('codex/SKILL.md uses mktemp for temp files', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'codex', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('mktemp');
|
||
});
|
||
|
||
test('adversarial review in /review auto-scales by diff size', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Adversarial review (auto-scaled)');
|
||
// Diff size thresholds
|
||
expect(content).toContain('< 50');
|
||
expect(content).toContain('50–199');
|
||
expect(content).toContain('200+');
|
||
// All three tiers present
|
||
expect(content).toContain('Small');
|
||
expect(content).toContain('Medium tier');
|
||
expect(content).toContain('Large tier');
|
||
// Claude adversarial subagent dispatch
|
||
expect(content).toContain('Agent tool');
|
||
expect(content).toContain('FIXABLE');
|
||
expect(content).toContain('INVESTIGATE');
|
||
// Codex fallback logic
|
||
expect(content).toContain('CODEX_NOT_AVAILABLE');
|
||
expect(content).toContain('fall back to the Claude adversarial subagent');
|
||
// Review log uses new skill name
|
||
expect(content).toContain('adversarial-review');
|
||
expect(content).toContain('reasoning_effort="high"');
|
||
expect(content).toContain('ADVERSARIAL REVIEW SYNTHESIS');
|
||
});
|
||
|
||
test('adversarial review in /ship auto-scales by diff size', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Adversarial review (auto-scaled)');
|
||
expect(content).toContain('< 50');
|
||
expect(content).toContain('200+');
|
||
expect(content).toContain('adversarial-review');
|
||
expect(content).toContain('reasoning_effort="high"');
|
||
expect(content).toContain('Investigate and fix');
|
||
});
|
||
|
||
test('codex-host ship/review do NOT contain adversarial review step', () => {
|
||
// .agents/ is gitignored — generate on demand
|
||
Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'codex'], {
|
||
cwd: ROOT, stdout: 'pipe', stderr: 'pipe',
|
||
});
|
||
const shipContent = fs.readFileSync(path.join(ROOT, '.agents', 'skills', 'gstack-ship', 'SKILL.md'), 'utf-8');
|
||
expect(shipContent).not.toContain('codex review --base');
|
||
expect(shipContent).not.toContain('CODEX_REVIEWS');
|
||
|
||
const reviewContent = fs.readFileSync(path.join(ROOT, '.agents', 'skills', 'gstack-review', 'SKILL.md'), 'utf-8');
|
||
expect(reviewContent).not.toContain('codex review --base');
|
||
expect(reviewContent).not.toContain('codex_reviews');
|
||
expect(reviewContent).not.toContain('CODEX_REVIEWS');
|
||
expect(reviewContent).not.toContain('adversarial-review');
|
||
expect(reviewContent).not.toContain('Investigate and fix');
|
||
});
|
||
|
||
test('codex integration in /plan-eng-review offers plan critique', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Codex');
|
||
expect(content).toContain('codex exec');
|
||
});
|
||
|
||
test('/review persists a review-log entry for ship readiness', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'review', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('"skill":"review"');
|
||
expect(content).toContain('"issues_found":N');
|
||
expect(content).toContain('Persist Eng Review result');
|
||
});
|
||
|
||
test('Review Readiness Dashboard includes Adversarial Review row', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Adversarial');
|
||
expect(content).toContain('codex-review');
|
||
});
|
||
});
|
||
|
||
// --- Trigger phrase validation ---
|
||
|
||
describe('Skill trigger phrases', () => {
|
||
// Skills that must have "Use when" trigger phrases in their description.
|
||
// Excluded: root gstack (browser tool), gstack-upgrade (gstack-specific),
|
||
// humanizer (text tool)
|
||
const SKILLS_REQUIRING_TRIGGERS = [
|
||
'qa', 'qa-only', 'ship', 'review', 'investigate', 'office-hours',
|
||
'plan-ceo-review', 'plan-eng-review', 'plan-design-review',
|
||
'design-review', 'design-consultation', 'retro', 'document-release',
|
||
'codex', 'browse', 'setup-browser-cookies',
|
||
];
|
||
|
||
for (const skill of SKILLS_REQUIRING_TRIGGERS) {
|
||
test(`${skill}/SKILL.md has "Use when" trigger phrases`, () => {
|
||
const skillPath = path.join(ROOT, skill, 'SKILL.md');
|
||
if (!fs.existsSync(skillPath)) return;
|
||
const content = fs.readFileSync(skillPath, 'utf-8');
|
||
// Extract description from frontmatter
|
||
const frontmatterEnd = content.indexOf('---', 4);
|
||
const frontmatter = content.slice(0, frontmatterEnd);
|
||
expect(frontmatter).toMatch(/Use when/i);
|
||
});
|
||
}
|
||
|
||
// Skills with proactive triggers should have "Proactively suggest" in description
|
||
const SKILLS_REQUIRING_PROACTIVE = [
|
||
'qa', 'qa-only', 'ship', 'review', 'investigate', 'office-hours',
|
||
'plan-ceo-review', 'plan-eng-review', 'plan-design-review',
|
||
'design-review', 'design-consultation', 'retro', 'document-release',
|
||
];
|
||
|
||
for (const skill of SKILLS_REQUIRING_PROACTIVE) {
|
||
test(`${skill}/SKILL.md has proactive routing phrase`, () => {
|
||
const skillPath = path.join(ROOT, skill, 'SKILL.md');
|
||
if (!fs.existsSync(skillPath)) return;
|
||
const content = fs.readFileSync(skillPath, 'utf-8');
|
||
const frontmatterEnd = content.indexOf('---', 4);
|
||
const frontmatter = content.slice(0, frontmatterEnd);
|
||
expect(frontmatter).toMatch(/Proactively (suggest|invoke)/i);
|
||
});
|
||
}
|
||
});
|
||
|
||
// ─── Codex Skill Validation ──────────────────────────────────
|
||
|
||
describe('Codex skill validation', () => {
|
||
const AGENTS_DIR = path.join(ROOT, '.agents', 'skills');
|
||
|
||
// .agents/ is gitignored (v0.11.2.0) — generate on demand for tests
|
||
Bun.spawnSync(['bun', 'run', 'scripts/gen-skill-docs.ts', '--host', 'codex'], {
|
||
cwd: ROOT, stdout: 'pipe', stderr: 'pipe',
|
||
});
|
||
|
||
// Discover all Claude skills with templates (except /codex which is Claude-only)
|
||
const CLAUDE_SKILLS_WITH_TEMPLATES = (() => {
|
||
const skills: string[] = [];
|
||
for (const entry of fs.readdirSync(ROOT, { withFileTypes: true })) {
|
||
if (!entry.isDirectory() || entry.name.startsWith('.') || entry.name === 'node_modules') continue;
|
||
if (entry.name === 'codex') continue; // Claude-only skill
|
||
if (fs.existsSync(path.join(ROOT, entry.name, 'SKILL.md.tmpl'))) {
|
||
skills.push(entry.name);
|
||
}
|
||
}
|
||
return skills;
|
||
})();
|
||
|
||
test('all skills (except /codex) have both Claude and Codex variants', () => {
|
||
for (const skillDir of CLAUDE_SKILLS_WITH_TEMPLATES) {
|
||
// Claude variant
|
||
const claudeMd = path.join(ROOT, skillDir, 'SKILL.md');
|
||
expect(fs.existsSync(claudeMd)).toBe(true);
|
||
|
||
// Codex variant
|
||
const codexName = skillDir.startsWith('gstack-') ? skillDir : `gstack-${skillDir}`;
|
||
const codexMd = path.join(AGENTS_DIR, codexName, 'SKILL.md');
|
||
expect(fs.existsSync(codexMd)).toBe(true);
|
||
}
|
||
// Root template has both too
|
||
expect(fs.existsSync(path.join(ROOT, 'SKILL.md'))).toBe(true);
|
||
expect(fs.existsSync(path.join(AGENTS_DIR, 'gstack', 'SKILL.md'))).toBe(true);
|
||
});
|
||
|
||
test('/codex skill is Claude-only — no Codex variant', () => {
|
||
// Claude variant should exist
|
||
expect(fs.existsSync(path.join(ROOT, 'codex', 'SKILL.md'))).toBe(true);
|
||
// Codex variant must NOT exist
|
||
expect(fs.existsSync(path.join(AGENTS_DIR, 'gstack-codex', 'SKILL.md'))).toBe(false);
|
||
});
|
||
|
||
test('Codex skill names follow gstack-{name} convention', () => {
|
||
const codexDirs = fs.readdirSync(AGENTS_DIR);
|
||
for (const dir of codexDirs) {
|
||
// Every directory should start with gstack
|
||
expect(dir.startsWith('gstack')).toBe(true);
|
||
// Root is just 'gstack', others are 'gstack-{name}'
|
||
if (dir !== 'gstack') {
|
||
expect(dir.startsWith('gstack-')).toBe(true);
|
||
}
|
||
}
|
||
});
|
||
|
||
test('$B commands in Codex SKILL.md files are valid browse commands', () => {
|
||
const codexDirs = fs.readdirSync(AGENTS_DIR);
|
||
for (const dir of codexDirs) {
|
||
const skillMd = path.join(AGENTS_DIR, dir, 'SKILL.md');
|
||
if (!fs.existsSync(skillMd)) continue;
|
||
const content = fs.readFileSync(skillMd, 'utf-8');
|
||
// Only validate if the skill contains $B commands
|
||
if (!content.includes('$B ')) continue;
|
||
const result = validateSkill(skillMd);
|
||
expect(result.invalid).toHaveLength(0);
|
||
}
|
||
});
|
||
});
|
||
|
||
// --- Repo mode and test failure triage validation ---
|
||
|
||
describe('Repo mode preamble validation', () => {
|
||
test('generated SKILL.md preamble contains REPO_MODE output', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('REPO_MODE:');
|
||
expect(content).toContain('gstack-repo-mode');
|
||
});
|
||
|
||
test('tier 3+ skills contain See Something Say Something section', () => {
|
||
// Root SKILL.md is tier 1 (no Repo Mode). Check a tier 3 skill instead.
|
||
const content = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('See Something, Say Something');
|
||
expect(content).toContain('REPO_MODE');
|
||
expect(content).toContain('solo');
|
||
expect(content).toContain('collaborative');
|
||
});
|
||
});
|
||
|
||
describe('Test failure triage in ship skill', () => {
|
||
test('ship/SKILL.md contains Test Failure Ownership Triage', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('Test Failure Ownership Triage');
|
||
});
|
||
|
||
test('ship/SKILL.md triage uses git diff for classification', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('git diff origin/<base>...HEAD --name-only');
|
||
});
|
||
|
||
test('ship/SKILL.md triage has solo and collaborative paths', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('REPO_MODE');
|
||
expect(content).toContain('solo');
|
||
expect(content).toContain('collaborative');
|
||
expect(content).toContain('Investigate and fix now');
|
||
expect(content).toContain('Add as P0 TODO');
|
||
});
|
||
|
||
test('ship/SKILL.md triage has GitHub issue assignment for collaborative mode', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('gh issue create');
|
||
expect(content).toContain('--assignee');
|
||
});
|
||
|
||
test('{{TEST_FAILURE_TRIAGE}} placeholder is fully resolved in ship/SKILL.md', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).not.toContain('{{TEST_FAILURE_TRIAGE}}');
|
||
});
|
||
|
||
test('ship/SKILL.md uses in-branch language for stop condition', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
|
||
expect(content).toContain('In-branch test failures');
|
||
});
|
||
});
|
||
|
||
describe('sidebar agent (#584)', () => {
|
||
// #584 — Sidebar Write: sidebar-agent.ts allowedTools includes Write
|
||
test('sidebar-agent.ts allowedTools includes Write', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'browse', 'src', 'sidebar-agent.ts'), 'utf-8');
|
||
// Find the allowedTools line in the askClaude function
|
||
const match = content.match(/--allowedTools['"]\s*,\s*['"]([^'"]+)['"]/);
|
||
expect(match).not.toBeNull();
|
||
expect(match![1]).toContain('Write');
|
||
});
|
||
|
||
// #584 — Server Write: server.ts allowedTools includes Write (DRY parity)
|
||
test('server.ts allowedTools excludes Write (agent is read-only + Bash)', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'browse', 'src', 'server.ts'), 'utf-8');
|
||
// Find the sidebar allowedTools in the headed-mode path
|
||
const match = content.match(/--allowedTools['"]\s*,\s*['"]([^'"]+)['"]/);
|
||
expect(match).not.toBeNull();
|
||
expect(match![1]).toContain('Bash');
|
||
expect(match![1]).not.toContain('Write');
|
||
});
|
||
|
||
// #584 — Sidebar stderr: stderr handler is not empty
|
||
test('sidebar-agent.ts stderr handler is not empty', () => {
|
||
const content = fs.readFileSync(path.join(ROOT, 'browse', 'src', 'sidebar-agent.ts'), 'utf-8');
|
||
// The stderr handler should NOT be an empty arrow function
|
||
expect(content).not.toContain("proc.stderr.on('data', () => {})");
|
||
});
|
||
});
|