Files
gstack/SKILL.md.tmpl
T
Garry Tan 66c09644a7 feat: composable skills — INVOKE_SKILL resolver + factoring infrastructure (v0.13.7.0) (#644)
* feat: add parameterized resolver support to gen-skill-docs

Extend the placeholder regex from {{WORD}} to {{WORD:arg1:arg2}},
enabling parameterized resolvers like {{INVOKE_SKILL:plan-ceo-review}}.

- Widen ResolverFn type to accept optional args?: string[]
- Update RESOLVERS record to use ResolverFn type
- Both replacement and unresolved-check regexes updated
- Fully backward compatible: existing {{WORD}} patterns unchanged

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add INVOKE_SKILL resolver for composable skill loading

New composition.ts resolver module that emits prose instructing Claude
to read another skill's SKILL.md and follow it, skipping preamble
sections. Supports optional skip= parameter for additional sections.

Usage: {{INVOKE_SKILL:plan-ceo-review}} or
       {{INVOKE_SKILL:plan-ceo-review:skip=Outside Voice}}

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: use frontmatter name: for skill symlinks and Codex paths

Patch all 3 name-derivation paths to read name: from SKILL.md
frontmatter instead of relying solely on directory basenames.
This enables directory names that differ from invocation names
(e.g., run-tests/ directory with name: test).

- setup: link_claude_skill_dirs reads name: via grep, falls back to basename
- gen-skill-docs.ts: codexSkillName uses frontmatter name for Codex output paths
- gen-skill-docs.ts: moved frontmatter extraction before Codex path logic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: extract CHANGELOG_WORKFLOW resolver from /ship

Move changelog generation logic into a reusable resolver. The resolver
is changelog-only (no version bump per Codex review recommendation).
Adds voice rules inline. /ship Step 5 now uses {{CHANGELOG_WORKFLOW}}.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: use INVOKE_SKILL resolver for plan-ceo-review office-hours fallback

Replace inline skill loading prose (read file, skip sections) with
{{INVOKE_SKILL:office-hours}} in the mid-session detection path.
The BENEFITS_FROM prerequisite offer is unchanged (separate use case).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: BENEFITS_FROM resolver delegates to INVOKE_SKILL

Eliminate duplicated skip-list logic by having generateBenefitsFrom
call generateInvokeSkill internally. The wrapper (AskUserQuestion,
design doc re-check) stays in BENEFITS_FROM. The loading instructions
(read file, skip sections, error handling) come from INVOKE_SKILL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add resolver tests for INVOKE_SKILL, CHANGELOG_WORKFLOW, parameterized args

12 new tests covering:
- INVOKE_SKILL: template placeholder, default skip list, error handling,
  BENEFITS_FROM delegation
- CHANGELOG_WORKFLOW: content, cross-check, voice guidance, format
- Parameterized resolver infra: colon-separated args processing,
  no unresolved placeholders across all generated SKILL.md files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.7.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: journey routing tests — CLAUDE.md routing rules + stronger descriptions

Three journey E2E tests (ideation, ship, debug) were failing because
Claude answered directly instead of invoking the Skill tool. Root cause:
skill descriptions in system-reminder are too weak to override Claude's
default behavior for tasks it can handle natively.

Fix has two parts:
1. CLAUDE.md routing rules in test workdir — Claude weighs project-level
   instructions higher than skill description metadata
2. "Proactively invoke" (not "suggest") in office-hours, investigate,
   ship descriptions — reinforces the routing signal

10/10 journey tests now pass (was 7/10).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: one-time CLAUDE.md routing injection prompt

Add a preamble section that checks if the project's CLAUDE.md has
skill routing rules. If not (and user hasn't declined), asks once
via AskUserQuestion to inject a "## Skill routing" section.

Root cause: skill descriptions in system-reminder metadata are too
weak to reliably trigger proactive Skill tool invocation. CLAUDE.md
project instructions carry higher weight in Claude's decision making.

- Preamble bash checks for "## Skill routing" in CLAUDE.md
- Stores decline in gstack-config (routing_declined=true)
- Only asks once per project (HAS_ROUTING check + config check)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: annotated config file + routing injection tests

gstack-config now writes a documented header on first config creation
with every supported key explained (proactive, telemetry, auto_upgrade,
skill_prefix, routing_declined, codex_reviews, skip_eng_review, etc.).
Users can edit ~/.gstack/config.yaml directly, anytime.

Also fixes grep to use ^KEY: anchoring so commented header lines don't
shadow real config values.

Tests added:
- 7 new gstack-config tests (annotated header, no duplication, comment
  safety, routing_declined get/set/reset)
- 6 new gen-skill-docs tests (preamble routing injection: bash checks,
  config reads, AskUserQuestion, decline persistence, routing rules)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump to v0.13.9.0, separate CHANGELOG from main's releases

Split our branch's changes into a new 0.13.9.0 entry instead of
jamming them into 0.13.7.0 which already landed on main as
"Community Wave."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: clarify branch-scoped VERSION/CHANGELOG after merging main

Add explicit rules: merging main doesn't mean adopting main's version.
Branch always gets its own entry on top with a higher version number.
Three-point checklist after every merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: put our 0.13.9.0 entry on top of CHANGELOG

Newest version goes on top. Our branch lands next, so our entry
must be above main's 0.13.8.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore missing 0.13.7.0 Community Wave entry

Accidentally dropped the 0.13.7.0 entry when reordering.
All entries now present: 0.13.9.0 > 0.13.8.0 > 0.13.7.0 > 0.13.6.0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add CHANGELOG integrity check rule

After any edit that moves/adds/removes entries, grep for version
headers and verify no gaps or duplicates before committing.
Prevents accidentally dropping entries during reordering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 23:35:17 -06:00

286 lines
9.1 KiB
Cheetah

---
name: gstack
preamble-tier: 1
version: 1.1.0
description: |
Fast headless browser for QA testing and site dogfooding. Navigate pages, interact with
elements, verify state, diff before/after, take annotated screenshots, test responsive
layouts, forms, uploads, dialogs, and capture bug evidence. Use when asked to open or
test a site, verify a deployment, dogfood a user flow, or file a bug with screenshots. (gstack)
allowed-tools:
- Bash
- Read
- AskUserQuestion
---
{{PREAMBLE}}
If `PROACTIVE` is `false`: do NOT proactively invoke or suggest other gstack skills during
this session. Only run skills the user explicitly invokes. This preference persists across
sessions via `gstack-config`.
If `PROACTIVE` is `true` (default): **invoke the Skill tool** when the user's request
matches a skill's purpose. Do NOT answer directly when a skill exists for the task.
Use the Skill tool to invoke it. The skill has specialized workflows, checklists, and
quality gates that produce better results than answering inline.
**Routing rules — when you see these patterns, INVOKE the skill via the Skill tool:**
- User describes a new idea, asks "is this worth building", wants to brainstorm → invoke `/office-hours`
- User asks about strategy, scope, ambition, "think bigger" → invoke `/plan-ceo-review`
- User asks to review architecture, lock in the plan → invoke `/plan-eng-review`
- User asks about design system, brand, visual identity → invoke `/design-consultation`
- User asks to review design of a plan → invoke `/plan-design-review`
- User wants all reviews done automatically → invoke `/autoplan`
- User reports a bug, error, broken behavior, asks "why is this broken" → invoke `/investigate`
- User asks to test the site, find bugs, QA → invoke `/qa`
- User asks to review code, check the diff, pre-landing review → invoke `/review`
- User asks about visual polish, design audit of a live site → invoke `/design-review`
- User asks to ship, deploy, push, create a PR → invoke `/ship`
- User asks to update docs after shipping → invoke `/document-release`
- User asks for a weekly retro, what did we ship → invoke `/retro`
- User asks for a second opinion, codex review → invoke `/codex`
- User asks for safety mode, careful mode → invoke `/careful` or `/guard`
- User asks to restrict edits to a directory → invoke `/freeze` or `/unfreeze`
- User asks to upgrade gstack → invoke `/gstack-upgrade`
**Do NOT answer the user's question directly when a matching skill exists.** The skill
provides a structured, multi-step workflow that is always better than an ad-hoc answer.
Invoke the skill first. If no skill matches, answer directly as usual.
If the user opts out of suggestions, run `gstack-config set proactive false`.
If they opt back in, run `gstack-config set proactive true`.
# gstack browse: QA Testing & Dogfooding
Persistent headless Chromium. First call auto-starts (~3s), then ~100-200ms per command.
Auto-shuts down after 30 min idle. State persists between calls (cookies, tabs, sessions).
{{BROWSE_SETUP}}
## IMPORTANT
- Use the compiled binary via Bash: `$B <command>`
- NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable.
- Browser persists between calls — cookies, login sessions, and tabs carry over.
- Dialogs (alert/confirm/prompt) are auto-accepted by default — no browser lockup.
- **Show screenshots:** After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible.
## QA Workflows
> **Credential safety:** Use environment variables for test credentials.
> Set them before running: `export TEST_EMAIL="..." TEST_PASSWORD="..."`
### Test a user flow (login, signup, checkout, etc.)
```bash
# 1. Go to the page
$B goto https://app.example.com/login
# 2. See what's interactive
$B snapshot -i
# 3. Fill the form using refs
$B fill @e3 "$TEST_EMAIL"
$B fill @e4 "$TEST_PASSWORD"
$B click @e5
# 4. Verify it worked
$B snapshot -D # diff shows what changed after clicking
$B is visible ".dashboard" # assert the dashboard appeared
$B screenshot /tmp/after-login.png
```
### Verify a deployment / check prod
```bash
$B goto https://yourapp.com
$B text # read the page — does it load?
$B console # any JS errors?
$B network # any failed requests?
$B js "document.title" # correct title?
$B is visible ".hero-section" # key elements present?
$B screenshot /tmp/prod-check.png
```
### Dogfood a feature end-to-end
```bash
# Navigate to the feature
$B goto https://app.example.com/new-feature
# Take annotated screenshot — shows every interactive element with labels
$B snapshot -i -a -o /tmp/feature-annotated.png
# Find ALL clickable things (including divs with cursor:pointer)
$B snapshot -C
# Walk through the flow
$B snapshot -i # baseline
$B click @e3 # interact
$B snapshot -D # what changed? (unified diff)
# Check element states
$B is visible ".success-toast"
$B is enabled "#next-step-btn"
$B is checked "#agree-checkbox"
# Check console for errors after interactions
$B console
```
### Test responsive layouts
```bash
# Quick: 3 screenshots at mobile/tablet/desktop
$B goto https://yourapp.com
$B responsive /tmp/layout
# Manual: specific viewport
$B viewport 375x812 # iPhone
$B screenshot /tmp/mobile.png
$B viewport 1440x900 # Desktop
$B screenshot /tmp/desktop.png
# Element screenshot (crop to specific element)
$B screenshot "#hero-banner" /tmp/hero.png
$B snapshot -i
$B screenshot @e3 /tmp/button.png
# Region crop
$B screenshot --clip 0,0,800,600 /tmp/above-fold.png
# Viewport only (no scroll)
$B screenshot --viewport /tmp/viewport.png
```
### Test file upload
```bash
$B goto https://app.example.com/upload
$B snapshot -i
$B upload @e3 /path/to/test-file.pdf
$B is visible ".upload-success"
$B screenshot /tmp/upload-result.png
```
### Test forms with validation
```bash
$B goto https://app.example.com/form
$B snapshot -i
# Submit empty — check validation errors appear
$B click @e10 # submit button
$B snapshot -D # diff shows error messages appeared
$B is visible ".error-message"
# Fill and resubmit
$B fill @e3 "valid input"
$B click @e10
$B snapshot -D # diff shows errors gone, success state
```
### Test dialogs (delete confirmations, prompts)
```bash
# Set up dialog handling BEFORE triggering
$B dialog-accept # will auto-accept next alert/confirm
$B click "#delete-button" # triggers confirmation dialog
$B dialog # see what dialog appeared
$B snapshot -D # verify the item was deleted
# For prompts that need input
$B dialog-accept "my answer" # accept with text
$B click "#rename-button" # triggers prompt
```
### Test authenticated pages (import real browser cookies)
```bash
# Import cookies from your real browser (opens interactive picker)
$B cookie-import-browser
# Or import a specific domain directly
$B cookie-import-browser comet --domain .github.com
# Now test authenticated pages
$B goto https://github.com/settings/profile
$B snapshot -i
$B screenshot /tmp/github-profile.png
```
> **Cookie safety:** `cookie-import-browser` transfers real session data.
> Only import cookies from browsers you control.
### Compare two pages / environments
```bash
$B diff https://staging.app.com https://prod.app.com
```
### Multi-step chain (efficient for long flows)
```bash
echo '[
["goto","https://app.example.com"],
["snapshot","-i"],
["fill","@e3","$TEST_EMAIL"],
["fill","@e4","$TEST_PASSWORD"],
["click","@e5"],
["snapshot","-D"],
["screenshot","/tmp/result.png"]
]' | $B chain
```
## Quick Assertion Patterns
```bash
# Element exists and is visible
$B is visible ".modal"
# Button is enabled/disabled
$B is enabled "#submit-btn"
$B is disabled "#submit-btn"
# Checkbox state
$B is checked "#agree"
# Input is editable
$B is editable "#name-field"
# Element has focus
$B is focused "#search-input"
# Page contains text
$B js "document.body.textContent.includes('Success')"
# Element count
$B js "document.querySelectorAll('.list-item').length"
# Specific attribute value
$B attrs "#logo" # returns all attributes as JSON
# CSS property
$B css ".button" "background-color"
```
## Snapshot System
{{SNAPSHOT_FLAGS}}
## Command Reference
{{COMMAND_REFERENCE}}
## Tips
1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `screenshot` all hit the loaded page instantly.
2. **Use `snapshot -i` first.** See all interactive elements, then click/fill by ref. No CSS selector guessing.
3. **Use `snapshot -D` to verify.** Baseline → action → diff. See exactly what changed.
4. **Use `is` for assertions.** `is visible .modal` is faster and more reliable than parsing page text.
5. **Use `snapshot -a` for evidence.** Annotated screenshots are great for bug reports.
6. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
7. **Check `console` after actions.** Catch JS errors that don't surface visually.
8. **Use `chain` for long flows.** Single command, no per-step CLI overhead.