Files
gstack/scripts/resolvers/utility.ts
Garry Tan 66c09644a7 feat: composable skills — INVOKE_SKILL resolver + factoring infrastructure (v0.13.7.0) (#644)
* feat: add parameterized resolver support to gen-skill-docs

Extend the placeholder regex from {{WORD}} to {{WORD:arg1:arg2}},
enabling parameterized resolvers like {{INVOKE_SKILL:plan-ceo-review}}.

- Widen ResolverFn type to accept optional args?: string[]
- Update RESOLVERS record to use ResolverFn type
- Both replacement and unresolved-check regexes updated
- Fully backward compatible: existing {{WORD}} patterns unchanged

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add INVOKE_SKILL resolver for composable skill loading

New composition.ts resolver module that emits prose instructing Claude
to read another skill's SKILL.md and follow it, skipping preamble
sections. Supports optional skip= parameter for additional sections.

Usage: {{INVOKE_SKILL:plan-ceo-review}} or
       {{INVOKE_SKILL:plan-ceo-review:skip=Outside Voice}}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: use frontmatter name: for skill symlinks and Codex paths

Patch all 3 name-derivation paths to read name: from SKILL.md
frontmatter instead of relying solely on directory basenames.
This enables directory names that differ from invocation names
(e.g., run-tests/ directory with name: test).

- setup: link_claude_skill_dirs reads name: via grep, falls back to basename
- gen-skill-docs.ts: codexSkillName uses frontmatter name for Codex output paths
- gen-skill-docs.ts: moved frontmatter extraction before Codex path logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: extract CHANGELOG_WORKFLOW resolver from /ship

Move changelog generation logic into a reusable resolver. The resolver
is changelog-only (no version bump per Codex review recommendation).
Adds voice rules inline. /ship Step 5 now uses {{CHANGELOG_WORKFLOW}}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: use INVOKE_SKILL resolver for plan-ceo-review office-hours fallback

Replace inline skill loading prose (read file, skip sections) with
{{INVOKE_SKILL:office-hours}} in the mid-session detection path.
The BENEFITS_FROM prerequisite offer is unchanged (separate use case).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: BENEFITS_FROM resolver delegates to INVOKE_SKILL

Eliminate duplicated skip-list logic by having generateBenefitsFrom
call generateInvokeSkill internally. The wrapper (AskUserQuestion,
design doc re-check) stays in BENEFITS_FROM. The loading instructions
(read file, skip sections, error handling) come from INVOKE_SKILL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add resolver tests for INVOKE_SKILL, CHANGELOG_WORKFLOW, parameterized args

12 new tests covering:
- INVOKE_SKILL: template placeholder, default skip list, error handling,
  BENEFITS_FROM delegation
- CHANGELOG_WORKFLOW: content, cross-check, voice guidance, format
- Parameterized resolver infra: colon-separated args processing,
  no unresolved placeholders across all generated SKILL.md files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.7.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: journey routing tests — CLAUDE.md routing rules + stronger descriptions

Three journey E2E tests (ideation, ship, debug) were failing because
Claude answered directly instead of invoking the Skill tool. Root cause:
skill descriptions in system-reminder are too weak to override Claude's
default behavior for tasks it can handle natively.

Fix has two parts:
1. CLAUDE.md routing rules in test workdir — Claude weighs project-level
   instructions higher than skill description metadata
2. "Proactively invoke" (not "suggest") in office-hours, investigate,
   ship descriptions — reinforces the routing signal

10/10 journey tests now pass (was 7/10).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: one-time CLAUDE.md routing injection prompt

Add a preamble section that checks if the project's CLAUDE.md has
skill routing rules. If not (and user hasn't declined), asks once
via AskUserQuestion to inject a "## Skill routing" section.

Root cause: skill descriptions in system-reminder metadata are too
weak to reliably trigger proactive Skill tool invocation. CLAUDE.md
project instructions carry higher weight in Claude's decision making.

- Preamble bash checks for "## Skill routing" in CLAUDE.md
- Stores decline in gstack-config (routing_declined=true)
- Only asks once per project (HAS_ROUTING check + config check)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: annotated config file + routing injection tests

gstack-config now writes a documented header on first config creation
with every supported key explained (proactive, telemetry, auto_upgrade,
skill_prefix, routing_declined, codex_reviews, skip_eng_review, etc.).
Users can edit ~/.gstack/config.yaml directly, anytime.

Also fixes grep to use ^KEY: anchoring so commented header lines don't
shadow real config values.

Tests added:
- 7 new gstack-config tests (annotated header, no duplication, comment
  safety, routing_declined get/set/reset)
- 6 new gen-skill-docs tests (preamble routing injection: bash checks,
  config reads, AskUserQuestion, decline persistence, routing rules)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump to v0.13.9.0, separate CHANGELOG from main's releases

Split our branch's changes into a new 0.13.9.0 entry instead of
jamming them into 0.13.7.0 which already landed on main as
"Community Wave."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify branch-scoped VERSION/CHANGELOG after merging main

Add explicit rules: merging main doesn't mean adopting main's version.
Branch always gets its own entry on top with a higher version number.
Three-point checklist after every merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: put our 0.13.9.0 entry on top of CHANGELOG

Newest version goes on top. Our branch lands next, so our entry
must be above main's 0.13.8.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore missing 0.13.7.0 Community Wave entry

Accidentally dropped the 0.13.7.0 entry when reordering.
All entries now present: 0.13.9.0 > 0.13.8.0 > 0.13.7.0 > 0.13.6.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add CHANGELOG integrity check rule

After any edit that moves/adds/removes entries, grep for version
headers and verify no gaps or duplicates before committing.
Prevents accidentally dropping entries during reordering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 23:35:17 -06:00

422 lines
17 KiB
TypeScript
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
import type { TemplateContext } from './types';
export function generateSlugEval(ctx: TemplateContext): string {
return `eval "$(${ctx.paths.binDir}/gstack-slug 2>/dev/null)"`;
}
export function generateSlugSetup(ctx: TemplateContext): string {
return `eval "$(${ctx.paths.binDir}/gstack-slug 2>/dev/null)" && mkdir -p ~/.gstack/projects/$SLUG`;
}
export function generateBaseBranchDetect(_ctx: TemplateContext): string {
return `## Step 0: Detect platform and base branch
First, detect the git hosting platform from the remote URL:
\`\`\`bash
git remote get-url origin 2>/dev/null
\`\`\`
- If the URL contains "github.com" → platform is **GitHub**
- If the URL contains "gitlab" → platform is **GitLab**
- Otherwise, check CLI availability:
- \`gh auth status 2>/dev/null\` succeeds → platform is **GitHub** (covers GitHub Enterprise)
- \`glab auth status 2>/dev/null\` succeeds → platform is **GitLab** (covers self-hosted)
- Neither → **unknown** (use git-native commands only)
Determine which branch this PR/MR targets, or the repo's default branch if no
PR/MR exists. Use the result as "the base branch" in all subsequent steps.
**If GitHub:**
1. \`gh pr view --json baseRefName -q .baseRefName\` — if succeeds, use it
2. \`gh repo view --json defaultBranchRef -q .defaultBranchRef.name\` — if succeeds, use it
**If GitLab:**
1. \`glab mr view -F json 2>/dev/null\` and extract the \`target_branch\` field — if succeeds, use it
2. \`glab repo view -F json 2>/dev/null\` and extract the \`default_branch\` field — if succeeds, use it
**Git-native fallback (if unknown platform, or CLI commands fail):**
1. \`git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's|refs/remotes/origin/||'\`
2. If that fails: \`git rev-parse --verify origin/main 2>/dev/null\` → use \`main\`
3. If that fails: \`git rev-parse --verify origin/master 2>/dev/null\` → use \`master\`
If all fail, fall back to \`main\`.
Print the detected base branch name. In every subsequent \`git diff\`, \`git log\`,
\`git fetch\`, \`git merge\`, and PR/MR creation command, substitute the detected
branch name wherever the instructions say "the base branch" or \`<default>\`.
---`;
}
export function generateDeployBootstrap(_ctx: TemplateContext): string {
return `\`\`\`bash
# Check for persisted deploy config in CLAUDE.md
DEPLOY_CONFIG=$(grep -A 20 "## Deploy Configuration" CLAUDE.md 2>/dev/null || echo "NO_CONFIG")
echo "$DEPLOY_CONFIG"
# If config exists, parse it
if [ "$DEPLOY_CONFIG" != "NO_CONFIG" ]; then
PROD_URL=$(echo "$DEPLOY_CONFIG" | grep -i "production.*url" | head -1 | sed 's/.*: *//')
PLATFORM=$(echo "$DEPLOY_CONFIG" | grep -i "platform" | head -1 | sed 's/.*: *//')
echo "PERSISTED_PLATFORM:$PLATFORM"
echo "PERSISTED_URL:$PROD_URL"
fi
# Auto-detect platform from config files
[ -f fly.toml ] && echo "PLATFORM:fly"
[ -f render.yaml ] && echo "PLATFORM:render"
([ -f vercel.json ] || [ -d .vercel ]) && echo "PLATFORM:vercel"
[ -f netlify.toml ] && echo "PLATFORM:netlify"
[ -f Procfile ] && echo "PLATFORM:heroku"
([ -f railway.json ] || [ -f railway.toml ]) && echo "PLATFORM:railway"
# Detect deploy workflows
for f in $(find .github/workflows -maxdepth 1 \\( -name '*.yml' -o -name '*.yaml' \\) 2>/dev/null); do
[ -f "$f" ] && grep -qiE "deploy|release|production|cd" "$f" 2>/dev/null && echo "DEPLOY_WORKFLOW:$f"
[ -f "$f" ] && grep -qiE "staging" "$f" 2>/dev/null && echo "STAGING_WORKFLOW:$f"
done
\`\`\`
If \`PERSISTED_PLATFORM\` and \`PERSISTED_URL\` were found in CLAUDE.md, use them directly
and skip manual detection. If no persisted config exists, use the auto-detected platform
to guide deploy verification. If nothing is detected, ask the user via AskUserQuestion
in the decision tree below.
If you want to persist deploy settings for future runs, suggest the user run \`/setup-deploy\`.`;
}
export function generateQAMethodology(_ctx: TemplateContext): string {
return `## Modes
### Diff-aware (automatic when on a feature branch with no URL)
This is the **primary mode** for developers verifying their work. When the user says \`/qa\` without a URL and the repo is on a feature branch, automatically:
1. **Analyze the branch diff** to understand what changed:
\`\`\`bash
git diff main...HEAD --name-only
git log main..HEAD --oneline
\`\`\`
2. **Identify affected pages/routes** from the changed files:
- Controller/route files → which URL paths they serve
- View/template/component files → which pages render them
- Model/service files → which pages use those models (check controllers that reference them)
- CSS/style files → which pages include those stylesheets
- API endpoints → test them directly with \`$B js "await fetch('/api/...')"\`
- Static pages (markdown, HTML) → navigate to them directly
**If no obvious pages/routes are identified from the diff:** Do not skip browser testing. The user invoked /qa because they want browser-based verification. Fall back to Quick mode — navigate to the homepage, follow the top 5 navigation targets, check console for errors, and test any interactive elements found. Backend, config, and infrastructure changes affect app behavior — always verify the app still works.
3. **Detect the running app** — check common local dev ports:
\`\`\`bash
$B goto http://localhost:3000 2>/dev/null && echo "Found app on :3000" || \\
$B goto http://localhost:4000 2>/dev/null && echo "Found app on :4000" || \\
$B goto http://localhost:8080 2>/dev/null && echo "Found app on :8080"
\`\`\`
If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.
4. **Test each affected page/route:**
- Navigate to the page
- Take a screenshot
- Check console for errors
- If the change was interactive (forms, buttons, flows), test the interaction end-to-end
- Use \`snapshot -D\` before and after actions to verify the change had the expected effect
5. **Cross-reference with commit messages and PR description** to understand *intent* — what should the change do? Verify it actually does that.
6. **Check TODOS.md** (if it exists) for known bugs or issues related to the changed files. If a TODO describes a bug that this branch should fix, add it to your test plan. If you find a new bug during QA that isn't in TODOS.md, note it in the report.
7. **Report findings** scoped to the branch changes:
- "Changes tested: N pages/routes affected by this branch"
- For each: does it work? Screenshot evidence.
- Any regressions on adjacent pages?
**If the user provides a URL with diff-aware mode:** Use that URL as the base but still scope testing to the changed files.
### Full (default when URL is provided)
Systematic exploration. Visit every reachable page. Document 5-10 well-evidenced issues. Produce health score. Takes 5-15 minutes depending on app size.
### Quick (\`--quick\`)
30-second smoke test. Visit homepage + top 5 navigation targets. Check: page loads? Console errors? Broken links? Produce health score. No detailed issue documentation.
### Regression (\`--regression <baseline>\`)
Run full mode, then load \`baseline.json\` from a previous run. Diff: which issues are fixed? Which are new? What's the score delta? Append regression section to report.
---
## Workflow
### Phase 1: Initialize
1. Find browse binary (see Setup above)
2. Create output directories
3. Copy report template from \`qa/templates/qa-report-template.md\` to output dir
4. Start timer for duration tracking
### Phase 2: Authenticate (if needed)
**If the user specified auth credentials:**
\`\`\`bash
$B goto <login-url>
$B snapshot -i # find the login form
$B fill @e3 "user@example.com"
$B fill @e4 "[REDACTED]" # NEVER include real passwords in report
$B click @e5 # submit
$B snapshot -D # verify login succeeded
\`\`\`
**If the user provided a cookie file:**
\`\`\`bash
$B cookie-import cookies.json
$B goto <target-url>
\`\`\`
**If 2FA/OTP is required:** Ask the user for the code and wait.
**If CAPTCHA blocks you:** Tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."
### Phase 3: Orient
Get a map of the application:
\`\`\`bash
$B goto <target-url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png"
$B links # map navigation structure
$B console --errors # any errors on landing?
\`\`\`
**Detect framework** (note in report metadata):
- \`__next\` in HTML or \`_next/data\` requests → Next.js
- \`csrf-token\` meta tag → Rails
- \`wp-content\` in URLs → WordPress
- Client-side routing with no page reloads → SPA
**For SPAs:** The \`links\` command may return few results because navigation is client-side. Use \`snapshot -i\` to find nav elements (buttons, menu items) instead.
### Phase 4: Explore
Visit pages systematically. At each page:
\`\`\`bash
$B goto <page-url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"
$B console --errors
\`\`\`
Then follow the **per-page exploration checklist** (see \`qa/references/issue-taxonomy.md\`):
1. **Visual scan** — Look at the annotated screenshot for layout issues
2. **Interactive elements** — Click buttons, links, controls. Do they work?
3. **Forms** — Fill and submit. Test empty, invalid, edge cases
4. **Navigation** — Check all paths in and out
5. **States** — Empty state, loading, error, overflow
6. **Console** — Any new JS errors after interactions?
7. **Responsiveness** — Check mobile viewport if relevant:
\`\`\`bash
$B viewport 375x812
$B screenshot "$REPORT_DIR/screenshots/page-mobile.png"
$B viewport 1280x720
\`\`\`
**Depth judgment:** Spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).
**Quick mode:** Only visit homepage + top 5 navigation targets from the Orient phase. Skip the per-page checklist — just check: loads? Console errors? Broken links visible?
### Phase 5: Document
Document each issue **immediately when found** — don't batch them.
**Two evidence tiers:**
**Interactive bugs** (broken flows, dead buttons, form failures):
1. Take a screenshot before the action
2. Perform the action
3. Take a screenshot showing the result
4. Use \`snapshot -D\` to show what changed
5. Write repro steps referencing screenshots
\`\`\`bash
$B screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
$B click @e5
$B screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
$B snapshot -D
\`\`\`
**Static bugs** (typos, layout issues, missing images):
1. Take a single annotated screenshot showing the problem
2. Describe what's wrong
\`\`\`bash
$B snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"
\`\`\`
**Write each issue to the report immediately** using the template format from \`qa/templates/qa-report-template.md\`.
### Phase 6: Wrap Up
1. **Compute health score** using the rubric below
2. **Write "Top 3 Things to Fix"** — the 3 highest-severity issues
3. **Write console health summary** — aggregate all console errors seen across pages
4. **Update severity counts** in the summary table
5. **Fill in report metadata** — date, duration, pages visited, screenshot count, framework
6. **Save baseline** — write \`baseline.json\` with:
\`\`\`json
{
"date": "YYYY-MM-DD",
"url": "<target>",
"healthScore": N,
"issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
"categoryScores": { "console": N, "links": N, ... }
}
\`\`\`
**Regression mode:** After writing the report, load the baseline file. Compare:
- Health score delta
- Issues fixed (in baseline but not current)
- New issues (in current but not baseline)
- Append the regression section to the report
---
## Health Score Rubric
Compute each category score (0-100), then take the weighted average.
### Console (weight: 15%)
- 0 errors → 100
- 1-3 errors → 70
- 4-10 errors → 40
- 10+ errors → 10
### Links (weight: 10%)
- 0 broken → 100
- Each broken link → -15 (minimum 0)
### Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility)
Each category starts at 100. Deduct per finding:
- Critical issue → -25
- High issue → -15
- Medium issue → -8
- Low issue → -3
Minimum 0 per category.
### Weights
| Category | Weight |
|----------|--------|
| Console | 15% |
| Links | 10% |
| Visual | 10% |
| Functional | 20% |
| UX | 15% |
| Performance | 10% |
| Content | 5% |
| Accessibility | 15% |
### Final Score
\`score = Σ (category_score × weight)\`
---
## Framework-Specific Guidance
### Next.js
- Check console for hydration errors (\`Hydration failed\`, \`Text content did not match\`)
- Monitor \`_next/data\` requests in network — 404s indicate broken data fetching
- Test client-side navigation (click links, don't just \`goto\`) — catches routing issues
- Check for CLS (Cumulative Layout Shift) on pages with dynamic content
### Rails
- Check for N+1 query warnings in console (if development mode)
- Verify CSRF token presence in forms
- Test Turbo/Stimulus integration — do page transitions work smoothly?
- Check for flash messages appearing and dismissing correctly
### WordPress
- Check for plugin conflicts (JS errors from different plugins)
- Verify admin bar visibility for logged-in users
- Test REST API endpoints (\`/wp-json/\`)
- Check for mixed content warnings (common with WP)
### General SPA (React, Vue, Angular)
- Use \`snapshot -i\` for navigation — \`links\` command misses client-side routes
- Check for stale state (navigate away and back — does data refresh?)
- Test browser back/forward — does the app handle history correctly?
- Check for memory leaks (monitor console after extended use)
---
## Important Rules
1. **Repro is everything.** Every issue needs at least one screenshot. No exceptions.
2. **Verify before documenting.** Retry the issue once to confirm it's reproducible, not a fluke.
3. **Never include credentials.** Write \`[REDACTED]\` for passwords in repro steps.
4. **Write incrementally.** Append each issue to the report as you find it. Don't batch.
5. **Never read source code.** Test as a user, not a developer.
6. **Check console after every interaction.** JS errors that don't surface visually are still bugs.
7. **Test like a user.** Use realistic data. Walk through complete workflows end-to-end.
8. **Depth over breadth.** 5-10 well-documented issues with evidence > 20 vague descriptions.
9. **Never delete output files.** Screenshots and reports accumulate — that's intentional.
10. **Use \`snapshot -C\` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
11. **Show screenshots to the user.** After every \`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.
12. **Never refuse to use the browser.** When the user invokes /qa or /qa-only, they are requesting browser-based testing. Never suggest evals, unit tests, or other alternatives as a substitute. Even if the diff appears to have no UI changes, backend changes affect app behavior — always open the browser and test.`;
}
export function generateCoAuthorTrailer(ctx: TemplateContext): string {
if (ctx.host === 'codex') {
return 'Co-Authored-By: OpenAI Codex <noreply@openai.com>';
}
if (ctx.host === 'factory') {
return 'Co-Authored-By: Factory Droid <droid@users.noreply.github.com>';
}
return 'Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>';
}
export function generateChangelogWorkflow(_ctx: TemplateContext): string {
return `## CHANGELOG (auto-generate)
1. Read \`CHANGELOG.md\` header to know the format.
2. **First, enumerate every commit on the branch:**
\`\`\`bash
git log <base>..HEAD --oneline
\`\`\`
Copy the full list. Count the commits. You will use this as a checklist.
3. **Read the full diff** to understand what each commit actually changed:
\`\`\`bash
git diff <base>...HEAD
\`\`\`
4. **Group commits by theme** before writing anything. Common themes:
- New features / capabilities
- Performance improvements
- Bug fixes
- Dead code removal / cleanup
- Infrastructure / tooling / tests
- Refactoring
5. **Write the CHANGELOG entry** covering ALL groups:
- If existing CHANGELOG entries on the branch already cover some commits, replace them with one unified entry for the new version
- Categorize changes into applicable sections:
- \`### Added\` — new features
- \`### Changed\` — changes to existing functionality
- \`### Fixed\` — bug fixes
- \`### Removed\` — removed features
- Write concise, descriptive bullet points
- Insert after the file header (line 5), dated today
- Format: \`## [X.Y.Z.W] - YYYY-MM-DD\`
- **Voice:** Lead with what the user can now **do** that they couldn't before. Use plain language, not implementation details. Never mention TODOS.md, internal tracking, or contributor-facing details.
6. **Cross-check:** Compare your CHANGELOG entry against the commit list from step 2.
Every commit must map to at least one bullet point. If any commit is unrepresented,
add it now. If the branch has N commits spanning K themes, the CHANGELOG must
reflect all K themes.
**Do NOT ask the user to describe changes.** Infer from the diff and commit history.`;
}