Files
gstack/SKILL.md.tmpl
T
Garry Tan f8bb59094d v1.47.0.0 feat: /spec — author backlog-ready spec in 5 phases + optional agent spawn (#1698) (#1733)
* feat(issue): add /issue skill for backlog-ready GitHub issue authoring

Interrogates an ambiguous request through five strict phases (why, scope,
technical, draft, final) and produces a GitHub issue precise enough that an
unfamiliar engineer or AI agent can execute it without follow-up. Slots in
after /office-hours (when the idea has passed the "worth building" bar) and
before /plan-eng-review (which assumes a plan already exists).

- issue/SKILL.md.tmpl + generated SKILL.md
- routing entry in root SKILL.md.tmpl
- llms.txt regenerated to include the new skill

* chore(spec): rename /issue → /spec + fix duplicate analytics block

Foundation commit for the /spec skill (extends PR #1698 by @jayzalowitz).

- Renames issue/ → spec/ (template + generated)
- Removes the hand-rolled analytics block in spec/SKILL.md.tmpl (lines 46-49 of the original); {{PREAMBLE}} already emits the analytics write with the telemetry opt-out guard, so the duplicate would have bypassed gstack-config set telemetry off
- Updates frontmatter (name: spec, expanded description with magical-moment preview, triggers reordered to lead with "spec this out")
- Updates root SKILL.md.tmpl routing entry → /spec
- Regenerates spec/SKILL.md and gstack/llms.txt via bun run gen:skill-docs

Co-Authored-By: Jay Zalowitz <jayzalowitz@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(spec): expansions — flags, archive, quality gate, plan-mode-aware Phase 5, /ship integration, tests

Builds on the @jayzalowitz foundation (commit a4e6ee38) with the full
expansion set from CEO + Eng + DX review (24 user decisions + 23 of 28
codex adversarial findings).

spec/SKILL.md.tmpl additions:
- Flag reference table (--dedupe / --no-gate / --audit / --execute /
  --no-execute / --file-only / --plan-file / --sync-archive).
- Phase 1b --dedupe (default ON): gh issue list --search with graceful
  skip on gh-not-installed / unauthed / rate-limited / other errors.
  AskUserQuestion when matches found (merge / file-new / cancel).
- Phase 3 HARD requirement: agent MUST grep/read at least one piece of
  evidence before asking. Project-level fallback prose for prompts with
  no concrete file mapping. Greenfield escape clause.
- Phase 4.5 quality gate (default ON): codex adversarial dispatch with
  fail-closed redaction (AWS/GitHub/Anthropic/OpenAI/private-key regex),
  hard <<<USER_SPEC>>> delimiters + instruction boundary (prompt-injection
  defense), score 0-10 with <7 block, up to 3 iterations, AskUserQuestion
  escape on persistent <7 (ship anyway / save draft / one more try).
- Phase 5 plan-mode-aware dispatch: reads GSTACK_PLAN_MODE env. Active
  → file-only + load into plan file. Inactive → file + --execute spawn
  by default. CLI overrides for explicit control.
- Archive block via eval $(gstack-paths) → $GSTACK_STATE_ROOT/projects/
  $SLUG/specs/<datetime>-<pid>-<slug>.md. Atomic .tmp/mv write. Sync
  excluded by default; --sync-archive to opt in.
- --execute path: dirty-worktree gate (porcelain check + 3-option AUQ
  continue/stash/cancel), TOCTOU re-check after AUQ answer, SHA pin
  via git rev-parse HEAD, unique branch spec/<slug>-$$ + PID-suffixed
  worktree, mandatory final-confirm gate, stash policy with restore
  safety (preserve ref, never auto-drop).
- TTHW timestamps captured at Phase 1 / first citation / file-or-spawn,
  emitted as ttfc_ms + tthw_ms in preamble telemetry envelope.

Cross-system plumbing:
- scripts/resolvers/preamble/generate-preamble-bash.ts: emit
  GSTACK_PLAN_MODE=active|inactive based on CLAUDE_PLAN_FILE presence.
- scripts/resolvers/preamble/generate-routing-injection.ts: add /spec
  to the routing block injected into project CLAUDE.md.
- ship/SKILL.md.tmpl: new "Linked Spec" PR-body section. Reads archive
  frontmatter spec_issue_number and adds Closes #N when full delivery
  confirmed by existing plan-completion gate (codex F4 — conditional).
  Branch-name inference NOT used (codex F3 — fragile under rebase).

Tests (W7):
- test/spec-template-invariants.test.ts: 35 deterministic assertions
  covering Phase 1 hard gate, Phase 3 hard-grep mandate, --dedupe
  graceful-skip paths, --execute race + security hardening (TOCTOU,
  SHA pin, unique branch), quality-gate redaction + BLOCKED path,
  archive atomic write + sync exclusion, plan-mode-aware Phase 5.
- test/spec-template-sync.test.ts: regen + byte-identical check.
- test/skill-e2e-spec-execute.test.ts (periodic-tier scaffold).
- test/skill-llm-eval-spec.test.ts (periodic-tier scaffold).
- test/helpers/touchfiles.ts: register both periodics in E2E_TIERS +
  LLM_JUDGE_TOUCHFILES.

37/37 /spec tests pass. Full bun test exit 0 (pre-existing
url-validation timeout unrelated to /spec).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: v1.45.0.0 — regen all SKILL.md, bump VERSION, CHANGELOG entry

Mechanical regen pulling in two template-side changes:
- /spec expansion (spec/SKILL.md picks up ~1100 new lines)
- {{PREAMBLE}} now echoes GSTACK_PLAN_MODE env (every skill picks up
  the new echo line in the preamble bash block)

VERSION 1.44.0.0 → 1.45.0.0 (MINOR per scale-aware rules: substantial
new capability — /spec skill with 5 CLI flags + race/security
hardening + plan-mode-aware Phase 5 + /ship integration).

CHANGELOG entry frames /spec as agent feedstock with the two-line
headline, "numbers that matter" table, and "what this means for
builders" close. Credits @jayzalowitz for the foundation contribution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(spec): register /spec in scripts/proactive-suggestions.json

Auto-generated by bun run gen:skill-docs after the v1.46 catalog-trim
contract picked up /spec's frontmatter. lead + routing extracted from
spec/SKILL.md.tmpl description: block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(spec): TODOS deferrals + package.json sync for v1.47.0.0

- TODOS.md: add P2 entry for /spec --epic mode (deferred from CEO SCOPE
  EXPANSION review), P3 entry for --dedupe semantic matching upgrade.
  Both have full context blocks so future picker can resume cold.
- package.json: bump 1.46.0.0 → 1.47.0.0 to match VERSION (was stale
  from the main merge; /ship Step 12 idempotency caught it).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: register /spec skill in README, AGENTS, CLAUDE.md project tree

Adds /spec to the three discoverability surfaces it was missing:
- README.md sprint skills table (between /autoplan and /learn)
- AGENTS.md plan-mode reviews table
- CLAUDE.md project structure tree (between /investigate and /retro)

/spec shipped in v1.47.0.0 with CHANGELOG coverage but the entry-point
docs hadn't been updated; a user landing on README or AGENTS would not
discover the skill exists without reading CHANGELOG.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Jay Zalowitz <jayzalowitz@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 21:36:53 -07:00

311 lines
11 KiB
Cheetah

---
name: gstack
preamble-tier: 1
version: 1.1.0
description: |
Fast headless browser for QA testing and site dogfooding. Navigate pages, interact with
elements, verify state, diff before/after, take annotated screenshots, test responsive
layouts, forms, uploads, dialogs, and capture bug evidence. Use when asked to open or
test a site, verify a deployment, dogfood a user flow, or file a bug with screenshots. (gstack)
allowed-tools:
- Bash
- Read
- AskUserQuestion
triggers:
- browse this page
- take a screenshot
- navigate to url
- inspect the page
---
{{PREAMBLE}}
If `PROACTIVE` is `false`: do NOT proactively invoke or suggest other gstack skills during
this session. Only run skills the user explicitly invokes. This preference persists across
sessions via `gstack-config`.
If `PROACTIVE` is `true` (default): **invoke the Skill tool** when the user's request
matches a skill's purpose. Do NOT answer directly when a skill exists for the task.
Use the Skill tool to invoke it. The skill has specialized workflows, checklists, and
quality gates that produce better results than answering inline.
**Routing rules — when you see these patterns, INVOKE the skill via the Skill tool:**
- User describes a new idea, asks "is this worth building", brainstorms, pitches a concept → invoke `/office-hours`
- User asks to spec something out, file an issue, write up a ticket, "turn this into a GitHub issue", "backlog item" → invoke `/spec`
- User asks about strategy, scope, ambition, "think bigger", "what should we build" → invoke `/plan-ceo-review`
- User asks to review architecture, lock in the plan, "does this design make sense" → invoke `/plan-eng-review`
- User asks about design system, brand, visual identity, "how should this look" → invoke `/design-consultation`
- User asks to review design of a plan → invoke `/plan-design-review`
- User asks about developer experience of a plan, API/CLI/SDK design → invoke `/plan-devex-review`
- User wants all reviews done automatically, "review everything" → invoke `/autoplan`
- User reports a bug, error, broken behavior, "why is this broken", "this doesn't work", "wtf", "something's wrong" → invoke `/investigate`
- User asks to test the site, find bugs, QA, "does this work", "check the deploy" → invoke `/qa`
- User asks to just report bugs without fixing → invoke `/qa-only`
- User asks to review code, check the diff, pre-landing review, "look at my changes" → invoke `/review`
- User asks about visual polish, design audit of a live site, "this looks off" → invoke `/design-review`
- User asks to audit the live developer experience, time-to-hello-world → invoke `/devex-review`
- User asks to ship, deploy, push, create a PR, "let's land this", "send it" → invoke `/ship`
- User asks to merge + deploy + verify as one flow → invoke `/land-and-deploy`
- User asks to configure deployment for the project → invoke `/setup-deploy`
- User asks to monitor prod after shipping, post-deploy checks → invoke `/canary`
- User asks to update docs after shipping → invoke `/document-release`
- User asks to write docs from scratch, generate documentation, "document this feature/module" → invoke `/document-generate`
- User asks for a weekly retro, what did we ship, "how'd we do" → invoke `/retro`
- User asks for a second opinion, codex review → invoke `/codex`
- User asks for safety mode, careful mode → invoke `/careful` or `/guard`
- User asks to restrict edits to a directory → invoke `/freeze` or `/unfreeze`
- User asks to upgrade gstack → invoke `/gstack-upgrade`
- User asks to save progress, checkpoint, "save my work" → invoke `/context-save`
- User asks to resume, restore, "where was I" → invoke `/context-restore`
- User asks about security, OWASP, vulnerabilities, "is this secure" → invoke `/cso`
- User asks to make a PDF, document, publication → invoke `/make-pdf`
- User asks to launch a real browser for QA, "open the browser" → invoke `/open-gstack-browser`
- User asks to import cookies for authenticated testing → invoke `/setup-browser-cookies`
- User asks about page speed, performance regression, benchmarks → invoke `/benchmark`
- User asks what gstack has learned, "show learnings" → invoke `/learn`
- User asks to tune question sensitivity, "stop asking me that" → invoke `/plan-tune`
- User asks for code quality dashboard, "health check" → invoke `/health`
**When in doubt, invoke the skill.** A false positive (invoking a skill that wasn't
needed) is cheaper than a false negative (answering ad-hoc when a structured workflow
exists). The skill provides multi-step workflows, checklists, and quality gates that
always produce better results than an ad-hoc answer. If no skill matches, answer
directly as usual.
If the user opts out of suggestions, run `gstack-config set proactive false`.
If they opt back in, run `gstack-config set proactive true`.
# gstack browse: QA Testing & Dogfooding
Persistent headless Chromium. First call auto-starts (~3s), then ~100-200ms per command.
Auto-shuts down after 30 min idle. State persists between calls (cookies, tabs, sessions).
{{BROWSE_SETUP}}
## IMPORTANT
- Use the compiled binary via Bash: `$B <command>`
- NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable.
- Browser persists between calls — cookies, login sessions, and tabs carry over.
- Dialogs (alert/confirm/prompt) are auto-accepted by default — no browser lockup.
- **Show screenshots:** After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible.
## QA Workflows
> **Credential safety:** Use environment variables for test credentials.
> Set them before running: `export TEST_EMAIL="..." TEST_PASSWORD="..."`
### Test a user flow (login, signup, checkout, etc.)
```bash
# 1. Go to the page
$B goto https://app.example.com/login
# 2. See what's interactive
$B snapshot -i
# 3. Fill the form using refs
$B fill @e3 "$TEST_EMAIL"
$B fill @e4 "$TEST_PASSWORD"
$B click @e5
# 4. Verify it worked
$B snapshot -D # diff shows what changed after clicking
$B is visible ".dashboard" # assert the dashboard appeared
$B screenshot /tmp/after-login.png
```
### Verify a deployment / check prod
```bash
$B goto https://yourapp.com
$B text # read the page — does it load?
$B console # any JS errors?
$B network # any failed requests?
$B js "document.title" # correct title?
$B is visible ".hero-section" # key elements present?
$B screenshot /tmp/prod-check.png
```
### Dogfood a feature end-to-end
```bash
# Navigate to the feature
$B goto https://app.example.com/new-feature
# Take annotated screenshot — shows every interactive element with labels
$B snapshot -i -a -o /tmp/feature-annotated.png
# Find ALL clickable things (including divs with cursor:pointer)
$B snapshot -C
# Walk through the flow
$B snapshot -i # baseline
$B click @e3 # interact
$B snapshot -D # what changed? (unified diff)
# Check element states
$B is visible ".success-toast"
$B is enabled "#next-step-btn"
$B is checked "#agree-checkbox"
# Check console for errors after interactions
$B console
```
### Test responsive layouts
```bash
# Quick: 3 screenshots at mobile/tablet/desktop
$B goto https://yourapp.com
$B responsive /tmp/layout
# Manual: specific viewport
$B viewport 375x812 # iPhone
$B screenshot /tmp/mobile.png
$B viewport 1440x900 # Desktop
$B screenshot /tmp/desktop.png
# Element screenshot (crop to specific element)
$B screenshot "#hero-banner" /tmp/hero.png
$B snapshot -i
$B screenshot @e3 /tmp/button.png
# Region crop
$B screenshot --clip 0,0,800,600 /tmp/above-fold.png
# Viewport only (no scroll)
$B screenshot --viewport /tmp/viewport.png
```
### Test file upload
```bash
$B goto https://app.example.com/upload
$B snapshot -i
$B upload @e3 /path/to/test-file.pdf
$B is visible ".upload-success"
$B screenshot /tmp/upload-result.png
```
### Test forms with validation
```bash
$B goto https://app.example.com/form
$B snapshot -i
# Submit empty — check validation errors appear
$B click @e10 # submit button
$B snapshot -D # diff shows error messages appeared
$B is visible ".error-message"
# Fill and resubmit
$B fill @e3 "valid input"
$B click @e10
$B snapshot -D # diff shows errors gone, success state
```
### Test dialogs (delete confirmations, prompts)
```bash
# Set up dialog handling BEFORE triggering
$B dialog-accept # will auto-accept next alert/confirm
$B click "#delete-button" # triggers confirmation dialog
$B dialog # see what dialog appeared
$B snapshot -D # verify the item was deleted
# For prompts that need input
$B dialog-accept "my answer" # accept with text
$B click "#rename-button" # triggers prompt
```
### Test authenticated pages (import real browser cookies)
```bash
# Import cookies from your real browser (opens interactive picker)
$B cookie-import-browser
# Or import a specific domain directly
$B cookie-import-browser comet --domain .github.com
# Now test authenticated pages
$B goto https://github.com/settings/profile
$B snapshot -i
$B screenshot /tmp/github-profile.png
```
> **Cookie safety:** `cookie-import-browser` transfers real session data.
> Only import cookies from browsers you control.
### Compare two pages / environments
```bash
$B diff https://staging.app.com https://prod.app.com
```
### Multi-step chain (efficient for long flows)
```bash
echo '[
["goto","https://app.example.com"],
["snapshot","-i"],
["fill","@e3","$TEST_EMAIL"],
["fill","@e4","$TEST_PASSWORD"],
["click","@e5"],
["snapshot","-D"],
["screenshot","/tmp/result.png"]
]' | $B chain
```
## Quick Assertion Patterns
```bash
# Element exists and is visible
$B is visible ".modal"
# Button is enabled/disabled
$B is enabled "#submit-btn"
$B is disabled "#submit-btn"
# Checkbox state
$B is checked "#agree"
# Input is editable
$B is editable "#name-field"
# Element has focus
$B is focused "#search-input"
# Page contains text
$B js "document.body.textContent.includes('Success')"
# Element count
$B js "document.querySelectorAll('.list-item').length"
# Specific attribute value
$B attrs "#logo" # returns all attributes as JSON
# CSS property
$B css ".button" "background-color"
```
## Snapshot System
{{SNAPSHOT_FLAGS}}
## Command Reference
{{COMMAND_REFERENCE}}
## Tips
1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `screenshot` all hit the loaded page instantly.
2. **Use `snapshot -i` first.** See all interactive elements, then click/fill by ref. No CSS selector guessing.
3. **Use `snapshot -D` to verify.** Baseline → action → diff. See exactly what changed.
4. **Use `is` for assertions.** `is visible .modal` is faster and more reliable than parsing page text.
5. **Use `snapshot -a` for evidence.** Annotated screenshots are great for bug reports.
6. **Use `snapshot -C` for tricky UIs.** Finds clickable divs that the accessibility tree misses.
7. **Check `console` after actions.** Catch JS errors that don't surface visually.
8. **Use `chain` for long flows.** Single command, no per-step CLI overhead.