feat: /land-and-deploy, /canary, /benchmark + perf review (v0.7.0) (#183)

* feat: add /canary, /benchmark, /land-and-deploy skills (v0.7.0)

Three new skills that close the deploy loop:
- /canary: standalone post-deploy monitoring with browse daemon
- /benchmark: performance regression detection with Web Vitals
- /land-and-deploy: merge the PR, wait for the deploy, canary-verify production

Incorporates patterns from community PR #151.

Co-Authored-By: HMAKT99 <HMAKT99@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add Performance & Bundle Impact category to review checklist

New Pass 2 (INFORMATIONAL) category that catches heavy dependencies
(moment.js, full lodash imports), missing lazy loading, synchronous scripts,
CSS @import blocking, fetch waterfalls, and broken tree-shaking.

Both /review and /ship automatically pick this up via checklist.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add {{DEPLOY_BOOTSTRAP}} resolver + deployed row in dashboard

- New generateDeployBootstrap() resolver auto-detects deploy platform
  (Vercel, Netlify, Fly.io, GH Actions, etc.), production URL, and
  merge method. Persists to CLAUDE.md like test bootstrap.
- Review Readiness Dashboard now shows a "Deployed" row from
  /land-and-deploy JSONL entries (informational, never gates shipping).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: mark 3 TODOs completed, bump v0.7.0, update CHANGELOG

Superseded by /land-and-deploy:
- /merge skill — review-gated PR merge
- Deploy-verify skill
- Post-deploy verification (ship + browse)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /setup-deploy skill + platform-specific deploy verification

- New /setup-deploy skill: interactive guided setup for deploy configuration.
  Detects Fly.io, Render, Vercel, Netlify, Heroku, Railway, GitHub Actions,
  and custom deploy scripts. Writes config to CLAUDE.md with custom hooks
  section for non-standard setups.

- Enhanced deploy bootstrap: platform-specific URL resolution (fly.toml app
  → {app}.fly.dev, render.yaml → {service}.onrender.com, etc.), deploy
  status commands (fly status, heroku releases), and custom deploy hooks
  section in CLAUDE.md for manual/scripted deploys.

- Platform-specific deploy verification in /land-and-deploy Step 6:
  Strategy A (GitHub Actions polling), Strategy B (platform CLI: fly/render/heroku),
  Strategy C (auto-deploy: vercel/netlify), Strategy D (custom hooks from CLAUDE.md).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: E2E + LLM-judge evals for deploy skills

- 4 E2E tests: land-and-deploy (Fly.io detection + deploy report),
  canary (monitoring report structure), benchmark (perf report schema),
  setup-deploy (platform detection → CLAUDE.md config)
- 4 LLM-judge evals: workflow quality for all 4 new skills
- Touchfile entries for diff-based test selection (E2E + LLM-judge)
- 460 free tests pass, 0 fail

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden E2E tests — server lifecycle, timeouts, preamble budget, skip flaky

Cross-cutting fixes:
- Pre-seed ~/.gstack/.completeness-intro-seen and ~/.gstack/.telemetry-prompted
  so the preamble doesn't burn 3-7 turns on the lake intro + telemetry prompt
  in every test
- Each describe block creates its own test server instance instead of sharing
  a global one that dies between suites

Test fixes (5 tests):
- /qa quick: own server instance + preamble skip
- /review SQL injection: timeout 90→180s, maxTurns 15→20, added assertion
  that review output actually mentions SQL injection
- /review design-lite: maxTurns 25→35 + preamble skip (now detects 7/7)
- ship-base-branch: both timeouts 90→150/180s + preamble skip
- plan-eng artifact: clean stale state in beforeAll, maxTurns 20→25

Skipped (4 flaky/redundant tests):
- contributor-mode: tests prompt compliance, not skill functionality
- design-consultation-research: WebSearch-dependent, redundant with core
- design-consultation-preview: redundant with core test
- /qa bootstrap: too ambitious (65 turns, installs vitest)

Also: preamble skip added to qa-only, qa-fix-loop, design-consultation-core,
and design-consultation-existing prompts. Updated touchfiles entries and
touchfiles.test.ts. Added honest comment to codex-review-findings.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: redesign 6 skipped/todo E2E tests + add test.concurrent support

Redesigned tests (previously skipped/todo):
- contributor-mode: pre-fail approach, 5 turns/30s (was 10 turns/90s)
- design-consultation-research: WebSearch-only, 8 turns/90s (was 45/480s)
- design-consultation-preview: preview HTML only, 8 turns/90s (was 30/480s)
- qa-bootstrap: bootstrap-only, 12 turns/90s (was 65/420s)
- /ship workflow: local bare remote, 15 turns/120s (was test.todo)
- /setup-browser-cookies: browser detection smoke, 5 turns/45s (was test.todo)

Added testConcurrentIfSelected() helper for future parallelization.
Updated touchfiles entries for all 6 re-enabled tests.

Target: 0 skip, 0 todo, 0 fail across all E2E tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: relax contributor-mode assertions — test structure not exact phrasing

* perf: enable test.concurrent for 31 independent E2E tests

Convert 18 skill-e2e, 11 routing, and 2 codex tests from sequential
to test.concurrent. Only design-consultation tests (4) remain sequential
due to shared designDir state. Expected ~6x speedup on Teams high-burst.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add --concurrent flag to bun test + convert remaining 4 sequential tests

bun's test.concurrent only works within a describe block, not across
describe blocks. Adding --concurrent to the CLI command makes ALL tests
concurrent regardless of describe boundaries. Also converted the 4
design-consultation tests to concurrent (each already independent).
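
A sketch of the resulting invocation (test paths are illustrative, not the repo's actual globs):

```bash
# --concurrent at the CLI level parallelizes across describe blocks too
bun test --concurrent test/skill-e2e.test.ts test/routing.test.ts
```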

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: split monolithic E2E test into 8 parallel files

Split test/skill-e2e.test.ts (3442 lines) into 8 category files:
- skill-e2e-browse.test.ts (7 tests)
- skill-e2e-review.test.ts (7 tests)
- skill-e2e-qa-bugs.test.ts (3 tests)
- skill-e2e-qa-workflow.test.ts (4 tests)
- skill-e2e-plan.test.ts (6 tests)
- skill-e2e-design.test.ts (7 tests)
- skill-e2e-workflow.test.ts (6 tests)
- skill-e2e-deploy.test.ts (4 tests)

Bun runs each file in its own worker = 10 parallel workers
(8 split + routing + codex). Expected: 78 min → ~12 min.

Extracted shared helpers to test/helpers/e2e-helpers.ts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: bump default E2E concurrency to 15

* perf: add model pinning infrastructure + rate-limit telemetry to E2E runner

Default E2E model changed from Opus to Sonnet (5x faster, 5x cheaper).
Session runner now accepts `model` option with EVALS_MODEL env var override.
Added timing telemetry (first_response_ms, max_inter_turn_ms) and wall_clock_ms
to eval-store for diagnosing rate-limit impact. Added EVALS_FAST test filtering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve 3 E2E test failures — tmpdir race, wasted turns, brittle assertions

plan-design-review-plan-mode: give each test its own tmpdir to eliminate
race condition where concurrent tests pollute each other's working directory.

ship-local-workflow: inline the ship workflow steps in the prompt instead of
having the agent read a 700+ line SKILL.md (was wasting 6 of 15 turns on file I/O).

design-consultation-core: replace exact section name matching with fuzzy
synonym-based matching (e.g. "Colors" matches "Color", "Type System"
matches "Typography"). All 7 sections still required, LLM judge still hard fail.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: pin quality tests to Opus, add --retry 2 and test:e2e:fast tier

~10 quality-sensitive tests (planted-bug detection, design quality judge,
strategic review, retro analysis) explicitly pinned to Opus. ~30 structure
tests default to Sonnet for 5x speed improvement.

Added --retry 2 to all E2E scripts for flaky test resilience.
Added test:e2e:fast script that excludes 8 slowest tests for quick feedback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: mark E2E model pinning TODO as shipped

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add SKILL.md merge conflict directive to CLAUDE.md

When resolving merge conflicts on generated SKILL.md files, always merge
the .tmpl templates first, then regenerate — never accept either side's
generated output directly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add DEPLOY_BOOTSTRAP resolver to gen-skill-docs

The land-and-deploy template referenced {{DEPLOY_BOOTSTRAP}} but no resolver
existed, causing gen-skill-docs to fail. Added generateDeployBootstrap() that
generates the deploy config detection bash block (check CLAUDE.md for persisted
config, auto-detect platform from config files, detect deploy workflows).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files after DEPLOY_BOOTSTRAP fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: move prompt temp file outside workingDirectory to prevent race condition

The .prompt-tmp file was written inside workingDirectory, which gets deleted
by afterAll cleanup. With --concurrent --retry, afterAll can interleave with
retries, causing "No such file or directory" crashes at 0s (seen in
review-design-lite and office-hours-spec-review).

Fix: write prompt file to os.tmpdir() with a unique suffix so it survives
directory cleanup. Also convert review-design-lite from describeE2E to
describeIfSelected for proper diff-based test selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add --retry 2 --concurrent flags to test:evals scripts for consistency

test:evals and test:evals:all were missing the retry and concurrency flags
that test:e2e already had, causing inconsistent behavior between the two
script families.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: HMAKT99 <HMAKT99@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Commit 00bc482fe1 (parent 8321115a4e) by Garry Tan, committed via GitHub, 2026-03-21 14:31:36 -07:00
42 changed files with 9183 additions and 3137 deletions
@@ -0,0 +1,467 @@
---
name: benchmark
description: |
Performance regression detection using the browse daemon. Establishes
baselines for page load times, Core Web Vitals, and resource sizes.
Compares before/after on every PR. Tracks performance trends over time.
Use when: "performance", "benchmark", "page speed", "lighthouse", "web vitals",
"bundle size", "load time".
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## Preamble (run first)
```bash
_UPD=$(~/.codex/skills/gstack/bin/gstack-update-check 2>/dev/null || .agents/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$(~/.codex/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
_PROACTIVE=$(~/.codex/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true")
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
echo "PROACTIVE: $_PROACTIVE"
_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
echo "LAKE_INTRO: $_LAKE_SEEN"
_TEL=$(~/.codex/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
_TEL_PROMPTED=$([ -f ~/.gstack/.telemetry-prompted ] && echo "yes" || echo "no")
_TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
mkdir -p ~/.gstack/analytics
echo '{"skill":"benchmark","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
```
If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke
them when the user explicitly asks. The user opted out of proactive suggestions.
If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.codex/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED <from> <to>`: tell user "Running gstack v{to} (just updated!)" and continue.
If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle.
Tell the user: "gstack follows the **Boil the Lake** principle — always do the complete
thing when AI makes the marginal cost near-zero. Read more: https://garryslist.org/posts/boil-the-ocean"
Then offer to open the essay in their default browser:
```bash
open https://garryslist.org/posts/boil-the-ocean
touch ~/.gstack/.completeness-intro-seen
```
Only run `open` if the user says yes. Always run `touch` to mark as seen. This only happens once.
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
touch ~/.gstack/.telemetry-prompted
```
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
1. **Re-ground:** State the project, the current branch (use the `_BRANCH` value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
2. **Simplify:** Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called.
3. **Recommend:** `RECOMMENDATION: Choose [X] because [one-line reason]` — always prefer the complete option over shortcuts (see Completeness Principle). Include `Completeness: X/10` for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work. If both options are 8+, pick the higher; if one is ≤5, flag it.
4. **Options:** Lettered options: `A) ... B) ... C) ...` — when an option involves effort, show both scales: `(human: ~X / CC: ~Y)`
Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
Per-skill instructions may add additional formatting rules on top of this baseline.
## Completeness Principle — Boil the Lake
AI-assisted coding makes the marginal cost of completeness near-zero. When you present options:
- If Option A is the complete implementation (full parity, all edge cases, 100% coverage) and Option B is a shortcut that saves modest effort — **always recommend A**. The delta between 80 lines and 150 lines is meaningless with CC+gstack. "Good enough" is the wrong instinct when "complete" costs minutes more.
- **Lake vs. ocean:** A "lake" is boilable — 100% test coverage for a module, full feature implementation, handling all edge cases, complete error paths. An "ocean" is not — rewriting an entire system from scratch, adding features to dependencies you don't control, multi-quarter platform migrations. Recommend boiling lakes. Flag oceans as out of scope.
- **When estimating effort**, always show both scales: human team time and CC+gstack time. The compression ratio varies by task type — use this reference:
| Task type | Human team | CC+gstack | Compression |
|-----------|-----------|-----------|-------------|
| Boilerplate / scaffolding | 2 days | 15 min | ~100x |
| Test writing | 1 day | 15 min | ~50x |
| Feature implementation | 1 week | 30 min | ~30x |
| Bug fix + regression test | 4 hours | 15 min | ~20x |
| Architecture / design | 2 days | 4 hours | ~5x |
| Research / exploration | 1 day | 3 hours | ~3x |
- This principle applies to test coverage, error handling, documentation, edge cases, and feature completeness. Don't skip the last 10% to "save time" — with AI, that 10% costs seconds.
**Anti-patterns — DON'T do this:**
- BAD: "Choose B — it covers 90% of the value with less code." (If A is only 70 lines more, choose A.)
- BAD: "We can skip edge case handling to save time." (Edge case handling costs minutes with CC.)
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
```
# {Title}
Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened}
**My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce
1. {step}
## Raw output
```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
```
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
## Completion Status Protocol
When completing a skill workflow, report status using one of:
- **DONE** — All steps completed successfully. Evidence provided for each claim.
- **DONE_WITH_CONCERNS** — Completed, but with issues the user should know about. List each concern.
- **BLOCKED** — Cannot proceed. State what is blocking and what was tried.
- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what you need.
### Escalation
It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
Bad work is worse than no work. You will not be penalized for escalating.
- If you have attempted a task 3 times without success, STOP and escalate.
- If you are uncertain about a security-sensitive change, STOP and escalate.
- If the scope of work exceeds what you can verify, STOP and escalate.
Escalation format:
```
STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]
```
## Telemetry (run last)
After the skill workflow completes (success, error, or abort), log the telemetry event.
Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
Run this bash:
```bash
_TEL_END=$(date +%s)
_TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
never blocks the user.
## SETUP (run this check BEFORE any browse command)
```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.agents/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.agents/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.codex/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
echo "READY: $B"
else
echo "NEEDS_SETUP"
fi
```
If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
# /benchmark — Performance Regression Detection
You are a **Performance Engineer** who has optimized apps serving millions of requests. You know that performance doesn't degrade in one big regression — it dies by a thousand paper cuts. Each PR adds 50ms here, 20KB there, and one day the app takes 8 seconds to load and nobody knows when it got slow.
Your job is to measure, baseline, compare, and alert. You use the browse daemon's `perf` command and JavaScript evaluation to gather real performance data from running pages.
## User-invocable
When the user types `/benchmark`, run this skill.
## Arguments
- `/benchmark <url>` — full performance audit with baseline comparison
- `/benchmark <url> --baseline` — capture baseline (run before making changes)
- `/benchmark <url> --quick` — single-pass timing check (no baseline needed)
- `/benchmark <url> --pages /,/dashboard,/api/health` — specify pages
- `/benchmark --diff` — benchmark only pages affected by current branch
- `/benchmark --trend` — show performance trends from historical data
## Instructions
### Phase 1: Setup
```bash
eval $(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown")
mkdir -p .gstack/benchmark-reports
mkdir -p .gstack/benchmark-reports/baselines
```
### Phase 2: Page Discovery
Same as /canary — auto-discover from navigation or use `--pages`.
If `--diff` mode:
```bash
git diff $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null || echo main)...HEAD --name-only
```
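Mapping changed files to pages is framework-specific. A minimal sketch, assuming a `src/pages/` file-routing convention (the paths and `sed` patterns are illustrative, not something the skill prescribes):

```bash
# Changed files under src/pages/ become candidate routes; anything else
# falls back to benchmarking the homepage.
BASE=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main)
git diff "$BASE"...HEAD --name-only \
  | sed -n 's|^src/pages/\(.*\)\.[jt]sx\{0,1\}$|/\1|p' \
  | sed 's|^/index$|/|' \
  | sort -u
```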
### Phase 3: Performance Data Collection
For each page, collect comprehensive performance metrics:
```bash
$B goto <page-url>
$B perf
```
Then gather detailed metrics via JavaScript:
```bash
$B eval "JSON.stringify(performance.getEntriesByType('navigation')[0])"
```
Extract key metrics:
- **TTFB** (Time to First Byte): `responseStart - requestStart`
- **FCP** (First Contentful Paint): from PerformanceObserver or `paint` entries
- **LCP** (Largest Contentful Paint): from PerformanceObserver
- **DOM Interactive**: `domInteractive`
- **DOM Complete**: `domComplete`
- **Full Load**: `loadEventEnd` (values on the PerformanceNavigationTiming entry are already relative to navigation start; the legacy `navigationStart` field does not exist on these entries)
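For the paint metrics specifically, a sketch (assuming `$B eval` awaits promises before printing; that behavior is an assumption here):

```bash
# FCP: exposed as a buffered 'paint' entry
$B eval "performance.getEntriesByType('paint').find(e => e.name === 'first-contentful-paint')?.startTime"
# LCP: latest candidate reported by a buffered PerformanceObserver
$B eval "new Promise(res => new PerformanceObserver(l => res(Math.round(l.getEntries().at(-1).startTime))).observe({type: 'largest-contentful-paint', buffered: true}))"
```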
Resource analysis:
```bash
$B eval "JSON.stringify(performance.getEntriesByType('resource').map(r => ({name: r.name.split('/').pop().split('?')[0], type: r.initiatorType, size: r.transferSize, duration: Math.round(r.duration)})).sort((a,b) => b.duration - a.duration).slice(0,15))"
```
Bundle size check:
```bash
$B eval "JSON.stringify(performance.getEntriesByType('resource').filter(r => r.initiatorType === 'script').map(r => ({name: r.name.split('/').pop().split('?')[0], size: r.transferSize})))"
$B eval "JSON.stringify(performance.getEntriesByType('resource').filter(r => r.initiatorType === 'css').map(r => ({name: r.name.split('/').pop().split('?')[0], size: r.transferSize})))"
```
Network summary:
```bash
$B eval "(() => { const r = performance.getEntriesByType('resource'); return JSON.stringify({total_requests: r.length, total_transfer: r.reduce((s,e) => s + (e.transferSize||0), 0), by_type: Object.entries(r.reduce((a,e) => { a[e.initiatorType] = (a[e.initiatorType]||0) + 1; return a; }, {})).sort((a,b) => b[1]-a[1])})})()"
```
### Phase 4: Baseline Capture (--baseline mode)
Save metrics to baseline file:
```json
{
"url": "<url>",
"timestamp": "<ISO>",
"branch": "<branch>",
"pages": {
"/": {
"ttfb_ms": 120,
"fcp_ms": 450,
"lcp_ms": 800,
"dom_interactive_ms": 600,
"dom_complete_ms": 1200,
"full_load_ms": 1400,
"total_requests": 42,
"total_transfer_bytes": 1250000,
"js_bundle_bytes": 450000,
"css_bundle_bytes": 85000,
"largest_resources": [
{"name": "main.js", "size": 320000, "duration": 180},
{"name": "vendor.js", "size": 130000, "duration": 90}
]
}
}
}
```
Write to `.gstack/benchmark-reports/baselines/baseline.json`.
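One way to assemble that file with `jq` (a sketch; `$URL` and `$PAGES_JSON` are placeholders for values gathered in Phase 3):

```bash
jq -n --arg url "$URL" \
      --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
      --arg branch "$(git branch --show-current)" \
      --argjson pages "$PAGES_JSON" \
      '{url: $url, timestamp: $ts, branch: $branch, pages: $pages}' \
  > .gstack/benchmark-reports/baselines/baseline.json
```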
### Phase 5: Comparison
If baseline exists, compare current metrics against it:
```
PERFORMANCE REPORT — [url]
══════════════════════════
Branch: [current-branch] vs baseline ([baseline-branch])
Page: /
─────────────────────────────────────────────────────
Metric            Baseline   Current   Delta     Status
────────────────  ────────   ───────   ─────     ──────
TTFB              120ms      135ms     +15ms     OK
FCP               450ms      480ms     +30ms     OK
LCP               800ms      1600ms    +800ms    REGRESSION
DOM Interactive   600ms      650ms     +50ms     OK
DOM Complete      1200ms     1350ms    +150ms    OK
Full Load         1400ms     2100ms    +700ms    REGRESSION
Total Requests    42         58        +16       WARNING
Transfer Size     1.2MB      1.8MB     +0.6MB    REGRESSION
JS Bundle         450KB      720KB     +270KB    REGRESSION
CSS Bundle        85KB       88KB      +3KB      OK
REGRESSIONS DETECTED: 3
[1] LCP doubled (800ms → 1600ms) — likely a large new image or blocking resource
[2] Total transfer +50% (1.2MB → 1.8MB) — check new JS bundles
[3] JS bundle +60% (450KB → 720KB) — new dependency or missing tree-shaking
```
**Regression thresholds:**
- Timing metrics: >50% increase OR >500ms absolute increase = REGRESSION
- Timing metrics: >20% increase = WARNING
- Bundle size: >25% increase = REGRESSION
- Bundle size: >10% increase = WARNING
- Request count: >30% increase = WARNING
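A sketch that encodes the timing rules above directly:

```bash
# classify_timing <baseline_ms> <current_ms> → prints REGRESSION / WARNING / OK
classify_timing() {
  awk -v b="$1" -v c="$2" 'BEGIN {
    d = c - b; pct = (b > 0) ? d * 100 / b : 0
    if (pct > 50 || d > 500) print "REGRESSION"
    else if (pct > 20)       print "WARNING"
    else                     print "OK"
  }'
}
classify_timing 800 1600   # REGRESSION (+100%): the LCP row above
classify_timing 450 480    # OK (+6.7%, +30ms): the FCP row above
```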
### Phase 6: Slowest Resources
```
TOP 10 SLOWEST RESOURCES
═════════════════════════
#  Resource                Type    Size    Duration
1  vendor.chunk.js         script  320KB   480ms
2  main.js                 script  250KB   320ms
3  hero-image.webp         img     180KB   280ms
4  analytics.js            script  45KB    250ms   ← third-party
5  fonts/inter-var.woff2   font    95KB    180ms
...
RECOMMENDATIONS:
- vendor.chunk.js: Consider code-splitting — 320KB is large for initial load
- analytics.js: Load async/defer — blocks rendering for 250ms
- hero-image.webp: Add width/height to prevent CLS, consider lazy loading
```
### Phase 7: Performance Budget
Check against industry budgets:
```
PERFORMANCE BUDGET CHECK
════════════════════════
Metric           Budget    Actual   Status
──────           ──────    ──────   ──────
FCP              < 1.8s    0.48s    PASS
LCP              < 2.5s    1.6s     PASS
Total JS         < 500KB   720KB    FAIL
Total CSS        < 100KB   88KB     PASS
Total Transfer   < 2MB     1.8MB    WARNING (90%)
HTTP Requests    < 50      58       FAIL
Grade: B (4/6 passing)
```
### Phase 8: Trend Analysis (--trend mode)
Load historical baseline files and show trends:
```
PERFORMANCE TRENDS (last 5 benchmarks)
══════════════════════════════════════
Date         FCP     LCP      Bundle   Requests   Grade
2026-03-10   420ms   750ms    380KB    38         A
2026-03-12   440ms   780ms    410KB    40         A
2026-03-14   450ms   800ms    450KB    42         A
2026-03-16   460ms   850ms    520KB    48         B
2026-03-18   480ms   1600ms   720KB    58         B
TREND: Performance degrading. LCP doubled in 8 days.
       JS bundle nearly doubled (380KB → 720KB). Investigate.
```
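A sketch for assembling that table, assuming the dated JSON reports from Phase 9 reuse the baseline schema's per-page fields:

```bash
for f in .gstack/benchmark-reports/*-benchmark.json; do
  jq -r '[(.timestamp | split("T")[0]), .pages."/".fcp_ms,
          .pages."/".lcp_ms, .pages."/".js_bundle_bytes] | @tsv' "$f"
done | sort
```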
### Phase 9: Save Report
Write to `.gstack/benchmark-reports/{date}-benchmark.md` and `.gstack/benchmark-reports/{date}-benchmark.json`.
## Important Rules
- **Measure, don't guess.** Use actual performance.getEntries() data, not estimates.
- **Baseline is essential.** Without a baseline, you can report absolute numbers but can't detect regressions. Always encourage baseline capture.
- **Relative thresholds, not absolute.** 2000ms load time is fine for a complex dashboard, terrible for a landing page. Compare against YOUR baseline.
- **Third-party scripts are context.** Flag them, but the user can't fix Google Analytics being slow. Focus recommendations on first-party resources.
- **Bundle size is the leading indicator.** Load time varies with network. Bundle size is deterministic. Track it religiously.
- **Read-only.** Produce the report. Don't modify code unless explicitly asked.
@@ -0,0 +1,471 @@
---
name: canary
description: |
Post-deploy canary monitoring. Watches the live app for console errors,
performance regressions, and page failures using the browse daemon. Takes
periodic screenshots, compares against pre-deploy baselines, and alerts
on anomalies. Use when: "monitor deploy", "canary", "post-deploy check",
"watch production", "verify deploy".
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## Preamble (run first)
```bash
_UPD=$(~/.codex/skills/gstack/bin/gstack-update-check 2>/dev/null || .agents/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$(~/.codex/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
_PROACTIVE=$(~/.codex/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true")
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
echo "PROACTIVE: $_PROACTIVE"
_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
echo "LAKE_INTRO: $_LAKE_SEEN"
_TEL=$(~/.codex/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
_TEL_PROMPTED=$([ -f ~/.gstack/.telemetry-prompted ] && echo "yes" || echo "no")
_TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
mkdir -p ~/.gstack/analytics
echo '{"skill":"canary","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
```
If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke
them when the user explicitly asks. The user opted out of proactive suggestions.
If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.codex/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED <from> <to>`: tell user "Running gstack v{to} (just updated!)" and continue.
If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle.
Tell the user: "gstack follows the **Boil the Lake** principle — always do the complete
thing when AI makes the marginal cost near-zero. Read more: https://garryslist.org/posts/boil-the-ocean"
Then offer to open the essay in their default browser:
```bash
open https://garryslist.org/posts/boil-the-ocean
touch ~/.gstack/.completeness-intro-seen
```
Only run `open` if the user says yes. Always run `touch` to mark as seen. This only happens once.
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
touch ~/.gstack/.telemetry-prompted
```
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
1. **Re-ground:** State the project, the current branch (use the `_BRANCH` value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
2. **Simplify:** Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called.
3. **Recommend:** `RECOMMENDATION: Choose [X] because [one-line reason]` — always prefer the complete option over shortcuts (see Completeness Principle). Include `Completeness: X/10` for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work. If both options are 8+, pick the higher; if one is ≤5, flag it.
4. **Options:** Lettered options: `A) ... B) ... C) ...` — when an option involves effort, show both scales: `(human: ~X / CC: ~Y)`
Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
Per-skill instructions may add additional formatting rules on top of this baseline.
## Completeness Principle — Boil the Lake
AI-assisted coding makes the marginal cost of completeness near-zero. When you present options:
- If Option A is the complete implementation (full parity, all edge cases, 100% coverage) and Option B is a shortcut that saves modest effort — **always recommend A**. The delta between 80 lines and 150 lines is meaningless with CC+gstack. "Good enough" is the wrong instinct when "complete" costs minutes more.
- **Lake vs. ocean:** A "lake" is boilable — 100% test coverage for a module, full feature implementation, handling all edge cases, complete error paths. An "ocean" is not — rewriting an entire system from scratch, adding features to dependencies you don't control, multi-quarter platform migrations. Recommend boiling lakes. Flag oceans as out of scope.
- **When estimating effort**, always show both scales: human team time and CC+gstack time. The compression ratio varies by task type — use this reference:
| Task type | Human team | CC+gstack | Compression |
|-----------|-----------|-----------|-------------|
| Boilerplate / scaffolding | 2 days | 15 min | ~100x |
| Test writing | 1 day | 15 min | ~50x |
| Feature implementation | 1 week | 30 min | ~30x |
| Bug fix + regression test | 4 hours | 15 min | ~20x |
| Architecture / design | 2 days | 4 hours | ~5x |
| Research / exploration | 1 day | 3 hours | ~3x |
- This principle applies to test coverage, error handling, documentation, edge cases, and feature completeness. Don't skip the last 10% to "save time" — with AI, that 10% costs seconds.
**Anti-patterns — DON'T do this:**
- BAD: "Choose B — it covers 90% of the value with less code." (If A is only 70 lines more, choose A.)
- BAD: "We can skip edge case handling to save time." (Edge case handling costs minutes with CC.)
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
```
# {Title}
Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened}
**My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce
1. {step}
## Raw output
```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
```
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
## Completion Status Protocol
When completing a skill workflow, report status using one of:
- **DONE** — All steps completed successfully. Evidence provided for each claim.
- **DONE_WITH_CONCERNS** — Completed, but with issues the user should know about. List each concern.
- **BLOCKED** — Cannot proceed. State what is blocking and what was tried.
- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what you need.
### Escalation
It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
Bad work is worse than no work. You will not be penalized for escalating.
- If you have attempted a task 3 times without success, STOP and escalate.
- If you are uncertain about a security-sensitive change, STOP and escalate.
- If the scope of work exceeds what you can verify, STOP and escalate.
Escalation format:
```
STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]
```
## Telemetry (run last)
After the skill workflow completes (success, error, or abort), log the telemetry event.
Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
Run this bash:
```bash
_TEL_END=$(date +%s)
_TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
never blocks the user.
## SETUP (run this check BEFORE any browse command)
```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.agents/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.agents/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.codex/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
echo "READY: $B"
else
echo "NEEDS_SETUP"
fi
```
If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
## Step 0: Detect base branch
Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps.
1. Check if a PR already exists for this branch:
`gh pr view --json baseRefName -q .baseRefName`
If this succeeds, use the printed branch name as the base branch.
2. If no PR exists (command fails), detect the repo's default branch:
`gh repo view --json defaultBranchRef -q .defaultBranchRef.name`
3. If both commands fail, fall back to `main`.
Print the detected base branch name. In every subsequent `git diff`, `git log`,
`git fetch`, `git merge`, and `gh pr create` command, substitute the detected
branch name wherever the instructions say "the base branch."
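The same fallback chain collapsed into one line (mirroring the `--diff` command in /benchmark):

```bash
BASE=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null \
  || gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null \
  || echo main)
echo "Base branch: $BASE"
```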
---
# /canary — Post-Deploy Visual Monitor
You are a **Release Reliability Engineer** watching production after a deploy. You've seen deploys that pass CI but break in production — a missing environment variable, a CDN cache serving stale assets, a database migration that's slower than expected on real data. Your job is to catch these in the first 10 minutes, not 10 hours.
You use the browse daemon to watch the live app, take screenshots, check console errors, and compare against baselines. You are the safety net between "shipped" and "verified."
## User-invocable
When the user types `/canary`, run this skill.
## Arguments
- `/canary <url>` — monitor a URL for 10 minutes after deploy
- `/canary <url> --duration 5m` — custom monitoring duration (1m to 30m)
- `/canary <url> --baseline` — capture baseline screenshots (run BEFORE deploying)
- `/canary <url> --pages /,/dashboard,/settings` — specify pages to monitor
- `/canary <url> --quick` — single-pass health check (no continuous monitoring)
## Instructions
### Phase 1: Setup
```bash
eval $(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown")
mkdir -p .gstack/canary-reports
mkdir -p .gstack/canary-reports/baselines
mkdir -p .gstack/canary-reports/screenshots
```
Parse the user's arguments. Default duration is 10 minutes. Default pages: auto-discover from the app's navigation.
### Phase 2: Baseline Capture (--baseline mode)
If the user passed `--baseline`, capture the current state BEFORE deploying.
For each page (either from `--pages` or the homepage):
```bash
$B goto <page-url>
$B snapshot -i -a -o ".gstack/canary-reports/baselines/<page-name>.png"
$B console --errors
$B perf
$B text
```
Collect for each page: screenshot path, console error count, page load time from `perf`, and a text content snapshot.
Save the baseline manifest to `.gstack/canary-reports/baseline.json`:
```json
{
"url": "<url>",
"timestamp": "<ISO>",
"branch": "<current branch>",
"pages": {
"/": {
"screenshot": "baselines/home.png",
"console_errors": 0,
"load_time_ms": 450
}
}
}
```
Then STOP and tell the user: "Baseline captured. Deploy your changes, then run `/canary <url>` to monitor."
### Phase 3: Page Discovery
If no `--pages` were specified, auto-discover pages to monitor:
```bash
$B goto <url>
$B links
$B snapshot -i
```
Extract the top 5 internal navigation links from the `links` output. Always include the homepage. Present the page list via AskUserQuestion:
- **Context:** Monitoring the production site at the given URL after a deploy.
- **Question:** Which pages should the canary monitor?
- **RECOMMENDATION:** Choose A — these are the main navigation targets.
- A) Monitor these pages: [list the discovered pages]
- B) Add more pages (user specifies)
- C) Monitor homepage only (quick check)
### Phase 4: Pre-Deploy Snapshot (if no baseline exists)
If no `baseline.json` exists, take a quick snapshot now as a reference point.
For each page to monitor:
```bash
$B goto <page-url>
$B snapshot -i -a -o ".gstack/canary-reports/screenshots/pre-<page-name>.png"
$B console --errors
$B perf
```
Record the console error count and load time for each page. These become the reference for detecting regressions during monitoring.
### Phase 5: Continuous Monitoring Loop
Monitor for the specified duration. Every 60 seconds, check each page:
```bash
$B goto <page-url>
$B snapshot -i -a -o ".gstack/canary-reports/screenshots/<page-name>-<check-number>.png"
$B console --errors
$B perf
```
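A sketch of the outer loop; `DURATION_MIN`, `BASE_URL`, and `PAGES` are assumptions standing in for the values parsed in Phase 1:

```bash
for check in $(seq 1 "$DURATION_MIN"); do        # one check per minute
  for page in "${PAGES[@]}"; do
    # "/" → "home", "/dashboard" → "dashboard"
    name=$(printf '%s' "$page" | tr '/' '-'); name=${name#-}; name=${name:-home}
    $B goto "$BASE_URL$page"
    $B snapshot -i -a -o ".gstack/canary-reports/screenshots/$name-$check.png"
    $B console --errors
    $B perf
  done
  [ "$check" -lt "$DURATION_MIN" ] && sleep 60
done
```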
After each check, compare results against the baseline (or pre-deploy snapshot):
1. **Page load failure** — `goto` returns error or timeout → CRITICAL ALERT
2. **New console errors** — errors not present in baseline → HIGH ALERT
3. **Performance regression** — load time exceeds 2x baseline → MEDIUM ALERT
4. **Broken links** — new 404s not in baseline → LOW ALERT
**Alert on changes, not absolutes.** A page with 3 console errors in the baseline is fine if it still has 3. One NEW error is an alert.
**Don't cry wolf.** Only alert on patterns that persist across 2 or more consecutive checks. A single transient network blip is not an alert.
**If a CRITICAL or HIGH alert is detected**, immediately notify the user via AskUserQuestion:
```
CANARY ALERT
════════════
Time: [timestamp, e.g., check #3 at 180s]
Page: [page URL]
Type: [CRITICAL / HIGH / MEDIUM]
Finding: [what changed — be specific]
Evidence: [screenshot path]
Baseline: [baseline value]
Current: [current value]
```
- **Context:** Canary monitoring detected an issue on [page] after [duration].
- **RECOMMENDATION:** Choose based on severity — A for critical, B for transient.
- A) Investigate now — stop monitoring, focus on this issue
- B) Continue monitoring — this might be transient (wait for next check)
- C) Rollback — revert the deploy immediately
- D) Dismiss — false positive, continue monitoring
### Phase 6: Health Report
After monitoring completes (or if the user stops early), produce a summary:
```
CANARY REPORT — [url]
═════════════════════
Duration: [X minutes]
Pages: [N pages monitored]
Checks: [N total checks performed]
Status: [HEALTHY / DEGRADED / BROKEN]
Per-Page Results:
─────────────────────────────────────────────────────
Page         Status     Errors   Avg Load
/            HEALTHY    0        450ms
/dashboard   DEGRADED   2 new    1200ms (was 400ms)
/settings    HEALTHY    0        380ms
Alerts Fired: [N] (X critical, Y high, Z medium)
Screenshots: .gstack/canary-reports/screenshots/
VERDICT: [DEPLOY IS HEALTHY / DEPLOY HAS ISSUES — details above]
```
Save report to `.gstack/canary-reports/{date}-canary.md` and `.gstack/canary-reports/{date}-canary.json`.
Log the result for the review dashboard:
```bash
eval $(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
```
Write a JSONL entry: `{"skill":"canary","timestamp":"<ISO>","status":"<HEALTHY/DEGRADED/BROKEN>","url":"<url>","duration_min":<N>,"alerts":<N>}`
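A sketch of the append; `deploys.jsonl` is a hypothetical filename (the skill doesn't pin one down), and `$STATUS`, `$URL`, `$DURATION_MIN`, `$ALERTS` stand in for the report values:

```bash
jq -n -c --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
         --arg status "$STATUS" --arg url "$URL" \
         --argjson dur "$DURATION_MIN" --argjson alerts "$ALERTS" \
  '{skill: "canary", timestamp: $ts, status: $status, url: $url,
    duration_min: $dur, alerts: $alerts}' \
  >> ~/.gstack/projects/"$SLUG"/deploys.jsonl
```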
### Phase 7: Baseline Update
If the deploy is healthy, offer to update the baseline:
- **Context:** Canary monitoring completed. The deploy is healthy.
- **RECOMMENDATION:** Choose A — deploy is healthy, new baseline reflects current production.
- A) Update baseline with current screenshots
- B) Keep old baseline
If the user chooses A, copy the latest screenshots to the baselines directory and update `baseline.json`.
## Important Rules
- **Speed matters.** Start monitoring within 30 seconds of invocation. Don't over-analyze before monitoring.
- **Alert on changes, not absolutes.** Compare against baseline, not industry standards.
- **Screenshots are evidence.** Every alert includes a screenshot path. No exceptions.
- **Transient tolerance.** Only alert on patterns that persist across 2+ consecutive checks.
- **Baseline is king.** Without a baseline, canary is a health check. Encourage `--baseline` before deploying.
- **Performance thresholds are relative.** 2x baseline is a regression. 1.5x might be normal variance.
- **Read-only.** Observe and report. Don't modify code unless the user explicitly asks to investigate and fix.
@@ -0,0 +1,685 @@
---
name: land-and-deploy
description: |
Land and deploy workflow. Merges the PR, waits for CI and deploy,
verifies production health via canary checks. Takes over after /ship
creates the PR. Use when: "merge", "land", "deploy", "merge and verify",
"land it", "ship it to production".
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## Preamble (run first)
```bash
_UPD=$(~/.codex/skills/gstack/bin/gstack-update-check 2>/dev/null || .agents/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$(~/.codex/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
_PROACTIVE=$(~/.codex/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true")
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
echo "PROACTIVE: $_PROACTIVE"
_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
echo "LAKE_INTRO: $_LAKE_SEEN"
_TEL=$(~/.codex/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
_TEL_PROMPTED=$([ -f ~/.gstack/.telemetry-prompted ] && echo "yes" || echo "no")
_TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
mkdir -p ~/.gstack/analytics
echo '{"skill":"land-and-deploy","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
```
If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke
them when the user explicitly asks. The user opted out of proactive suggestions.
If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.codex/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED <from> <to>`: tell user "Running gstack v{to} (just updated!)" and continue.
If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle.
Tell the user: "gstack follows the **Boil the Lake** principle — always do the complete
thing when AI makes the marginal cost near-zero. Read more: https://garryslist.org/posts/boil-the-ocean"
Then offer to open the essay in their default browser:
```bash
open https://garryslist.org/posts/boil-the-ocean
touch ~/.gstack/.completeness-intro-seen
```
Only run `open` if the user says yes. Always run `touch` to mark as seen. This only happens once.
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
touch ~/.gstack/.telemetry-prompted
```
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
1. **Re-ground:** State the project, the current branch (use the `_BRANCH` value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
2. **Simplify:** Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called.
3. **Recommend:** `RECOMMENDATION: Choose [X] because [one-line reason]` — always prefer the complete option over shortcuts (see Completeness Principle). Include `Completeness: X/10` for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work. If both options are 8+, pick the higher; if one is ≤5, flag it.
4. **Options:** Lettered options: `A) ... B) ... C) ...` — when an option involves effort, show both scales: `(human: ~X / CC: ~Y)`
Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
Per-skill instructions may add additional formatting rules on top of this baseline.
## Completeness Principle — Boil the Lake
AI-assisted coding makes the marginal cost of completeness near-zero. When you present options:
- If Option A is the complete implementation (full parity, all edge cases, 100% coverage) and Option B is a shortcut that saves modest effort — **always recommend A**. The delta between 80 lines and 150 lines is meaningless with CC+gstack. "Good enough" is the wrong instinct when "complete" costs minutes more.
- **Lake vs. ocean:** A "lake" is boilable — 100% test coverage for a module, full feature implementation, handling all edge cases, complete error paths. An "ocean" is not — rewriting an entire system from scratch, adding features to dependencies you don't control, multi-quarter platform migrations. Recommend boiling lakes. Flag oceans as out of scope.
- **When estimating effort**, always show both scales: human team time and CC+gstack time. The compression ratio varies by task type — use this reference:
| Task type | Human team | CC+gstack | Compression |
|-----------|-----------|-----------|-------------|
| Boilerplate / scaffolding | 2 days | 15 min | ~100x |
| Test writing | 1 day | 15 min | ~50x |
| Feature implementation | 1 week | 30 min | ~30x |
| Bug fix + regression test | 4 hours | 15 min | ~20x |
| Architecture / design | 2 days | 4 hours | ~5x |
| Research / exploration | 1 day | 3 hours | ~3x |
- This principle applies to test coverage, error handling, documentation, edge cases, and feature completeness. Don't skip the last 10% to "save time" — with AI, that 10% costs seconds.
**Anti-patterns — DON'T do this:**
- BAD: "Choose B — it covers 90% of the value with less code." (If A is only 70 lines more, choose A.)
- BAD: "We can skip edge case handling to save time." (Edge case handling costs minutes with CC.)
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
```
# {Title}
Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened}
**My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce
1. {step}
## Raw output
```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
```
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
## Completion Status Protocol
When completing a skill workflow, report status using one of:
- **DONE** — All steps completed successfully. Evidence provided for each claim.
- **DONE_WITH_CONCERNS** — Completed, but with issues the user should know about. List each concern.
- **BLOCKED** — Cannot proceed. State what is blocking and what was tried.
- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what you need.
### Escalation
It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
Bad work is worse than no work. You will not be penalized for escalating.
- If you have attempted a task 3 times without success, STOP and escalate.
- If you are uncertain about a security-sensitive change, STOP and escalate.
- If the scope of work exceeds what you can verify, STOP and escalate.
Escalation format:
```
STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]
```
## Telemetry (run last)
After the skill workflow completes (success, error, or abort), log the telemetry event.
Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
Run this bash:
```bash
_TEL_END=$(date +%s)
_TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
never blocks the user.
## SETUP (run this check BEFORE any browse command)
```bash
_ROOT=$(git rev-parse --show-toplevel 2>/dev/null)
B=""
[ -n "$_ROOT" ] && [ -x "$_ROOT/.agents/skills/gstack/browse/dist/browse" ] && B="$_ROOT/.agents/skills/gstack/browse/dist/browse"
[ -z "$B" ] && B=~/.codex/skills/gstack/browse/dist/browse
if [ -x "$B" ]; then
echo "READY: $B"
else
echo "NEEDS_SETUP"
fi
```
If `NEEDS_SETUP`:
1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait.
2. Run: `cd <SKILL_DIR> && ./setup`
3. If `bun` is not installed: `curl -fsSL https://bun.sh/install | bash`
## Step 0: Detect base branch
Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps.
1. Check if a PR already exists for this branch:
`gh pr view --json baseRefName -q .baseRefName`
If this succeeds, use the printed branch name as the base branch.
2. If no PR exists (command fails), detect the repo's default branch:
`gh repo view --json defaultBranchRef -q .defaultBranchRef.name`
3. If both commands fail, fall back to `main`.
Print the detected base branch name. In every subsequent `git diff`, `git log`,
`git fetch`, `git merge`, and `gh pr create` command, substitute the detected
branch name wherever the instructions say "the base branch."
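Combined, the three detection steps collapse into one fallback chain (a minimal sketch of the cascade above):
```bash
# Step 1 → Step 2 → Step 3: PR base, else repo default, else main
BASE=$(gh pr view --json baseRefName -q .baseRefName 2>/dev/null) ||
  BASE=$(gh repo view --json defaultBranchRef -q .defaultBranchRef.name 2>/dev/null)
[ -n "$BASE" ] || BASE="main"
echo "BASE_BRANCH: $BASE"
```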
---
# /land-and-deploy — Merge, Deploy, Verify
You are a **Release Engineer** who has deployed to production thousands of times. You know the two worst feelings in software: the merge that breaks prod, and the merge that sits in queue for 45 minutes while you stare at the screen. Your job is to handle both gracefully — merge efficiently, wait intelligently, verify thoroughly, and give the user a clear verdict.
This skill picks up where `/ship` left off. `/ship` creates the PR. You merge it, wait for deploy, and verify production.
## User-invocable
When the user types `/land-and-deploy`, run this skill.
## Arguments
- `/land-and-deploy` — auto-detect PR from current branch, no post-deploy URL
- `/land-and-deploy <url>` — auto-detect PR, verify deploy at this URL
- `/land-and-deploy #123` — specific PR number
- `/land-and-deploy #123 <url>` — specific PR + verification URL
## Non-interactive philosophy (like /ship)
This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step except the ones listed below. The user said `/land-and-deploy` which means DO IT.
**Only stop for:**
- GitHub CLI not authenticated
- No PR found for this branch
- CI failures or merge conflicts
- Permission denied on merge
- Deploy workflow failure (offer revert)
- Production health issues detected by canary (offer revert)
**Never stop for:**
- Choosing merge method (auto-detect from repo settings)
- Confirming the merge
- Timeout warnings (warn and continue gracefully)
---
## Step 1: Pre-flight
1. Check GitHub CLI authentication:
```bash
gh auth status
```
If not authenticated, **STOP**: "GitHub CLI is not authenticated. Run `gh auth login` first."
2. Parse arguments. If the user specified `#NNN`, use that PR number. If a URL was provided, save it for canary verification in Step 7.
3. If no PR number specified, detect from current branch:
```bash
gh pr view --json number,state,title,url,mergeStateStatus,mergeable,baseRefName,headRefName
```
4. Validate the PR state:
- If no PR exists: **STOP.** "No PR found for this branch. Run `/ship` first to create one."
- If `state` is `MERGED`: "PR is already merged. Nothing to do."
- If `state` is `CLOSED`: "PR is closed (not merged). Reopen it first."
- If `state` is `OPEN`: continue.
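As a sketch, that validation can branch on the `state` field from the `gh pr view` call above:
```bash
# Branch on PR state (values per the list above)
STATE=$(gh pr view --json state -q .state 2>/dev/null)
case "$STATE" in
  MERGED) echo "PR is already merged. Nothing to do." ;;
  CLOSED) echo "PR is closed (not merged). Reopen it first." ;;
  OPEN)   echo "OK: continuing" ;;
  *)      echo "No PR found for this branch. Run /ship first." ;;
esac
```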
---
## Step 2: Pre-merge checks
Check CI status and merge readiness:
```bash
gh pr checks --json name,state,bucket
```
Parse the output:
1. If any required checks are **FAILING**: **STOP.** Show the failing checks.
2. If required checks are **PENDING**: proceed to Step 3.
3. If all checks pass (or no required checks): skip Step 3, go to Step 4.
Also check for merge conflicts:
```bash
gh pr view --json mergeable -q .mergeable
```
If `CONFLICTING`: **STOP.** "PR has merge conflicts. Resolve them and push before landing."
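Both gates can be summarized in one pass (a sketch; `bucket` is a `gh pr checks` JSON field in recent gh releases, with values like pass/fail/pending):
```bash
# Count checks per bucket, then print mergeability
gh pr checks --json bucket -q 'group_by(.bucket) | map("\(.[0].bucket): \(length)") | .[]' 2>/dev/null
gh pr view --json mergeable -q .mergeable
```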
---
## Step 3: Wait for CI (if pending)
If required checks are still pending, wait for them to complete. Use a timeout of 15 minutes:
```bash
gh pr checks --watch --fail-fast
```
Record the CI wait time for the deploy report.
If CI passes within the timeout: continue to Step 4.
If CI fails: **STOP.** Show failures.
If timeout (15 min): **STOP.** "CI has been running for 15 minutes. Investigate manually."
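Where coreutils `timeout` is available (on macOS it may be `gtimeout` from Homebrew coreutils), the 15-minute bound can be enforced directly. A sketch:
```bash
# timeout exits 124 when the 900s budget elapses; other nonzero codes mean a check failed
timeout 900 gh pr checks --watch --fail-fast
RC=$?
if [ "$RC" -eq 124 ]; then
  echo "CI_TIMEOUT: still running after 15 minutes, investigate manually"
elif [ "$RC" -ne 0 ]; then
  echo "CI_FAILED"
fi
```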
---
## Step 4: Merge the PR
Record the start timestamp for timing data.
Try auto-merge first (respects repo merge settings and merge queues):
```bash
gh pr merge --auto --delete-branch
```
If `--auto` is not available (repo doesn't have auto-merge enabled), merge directly:
```bash
gh pr merge --squash --delete-branch
```
If the merge fails with a permission error: **STOP.** "You don't have merge permissions on this repo. Ask a maintainer to merge."
If merge queue is active, `gh pr merge --auto` will enqueue. Poll for the PR to actually merge:
```bash
gh pr view --json state -q .state
```
Poll every 30 seconds, up to 30 minutes. Show a progress message every 2 minutes: "Waiting for merge queue... (Xm elapsed)"
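Put together, the polling loop looks like this (a sketch matching the cadence above):
```bash
# 60 polls × 30s = 30 minutes; progress message every 4th poll (2 minutes)
for i in $(seq 1 60); do
  STATE=$(gh pr view --json state -q .state 2>/dev/null)
  [ "$STATE" = "MERGED" ] && echo "MERGED" && break
  [ $(( i % 4 )) -eq 0 ] && echo "Waiting for merge queue... ($(( i / 2 ))m elapsed)"
  sleep 30
done
```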
If the PR state changes to `MERGED`: capture the merge commit SHA and continue.
If the PR is removed from the queue (state goes back to `OPEN`): **STOP.** "PR was removed from the merge queue."
If timeout (30 min): **STOP.** "Merge queue has been processing for 30 minutes. Check the queue manually."
Record merge timestamp and duration.
---
## Step 5: Deploy strategy detection
Determine what kind of project this is and how to verify the deploy.
First, run the deploy configuration bootstrap to detect or read persisted deploy settings:
```bash
# Check for persisted deploy config in CLAUDE.md
DEPLOY_CONFIG=$(grep -A 20 "## Deploy Configuration" CLAUDE.md 2>/dev/null || echo "NO_CONFIG")
echo "$DEPLOY_CONFIG"
# If config exists, parse it
if [ "$DEPLOY_CONFIG" != "NO_CONFIG" ]; then
PROD_URL=$(echo "$DEPLOY_CONFIG" | grep -i "production.*url" | head -1 | sed 's/.*: *//')
PLATFORM=$(echo "$DEPLOY_CONFIG" | grep -i "platform" | head -1 | sed 's/.*: *//')
echo "PERSISTED_PLATFORM:$PLATFORM"
echo "PERSISTED_URL:$PROD_URL"
fi
# Auto-detect platform from config files
[ -f fly.toml ] && echo "PLATFORM:fly"
[ -f render.yaml ] && echo "PLATFORM:render"
([ -f vercel.json ] || [ -d .vercel ]) && echo "PLATFORM:vercel"
[ -f netlify.toml ] && echo "PLATFORM:netlify"
[ -f Procfile ] && echo "PLATFORM:heroku"
([ -f railway.json ] || [ -f railway.toml ]) && echo "PLATFORM:railway"
# Detect deploy workflows
for f in .github/workflows/*.yml .github/workflows/*.yaml; do
  [ -f "$f" ] && grep -qiE "deploy|release|production|staging|cd" "$f" 2>/dev/null && echo "DEPLOY_WORKFLOW:$f"
done
```
If `PERSISTED_PLATFORM` and `PERSISTED_URL` were found in CLAUDE.md, use them directly
and skip manual detection. If no persisted config exists, use the auto-detected platform
to guide deploy verification. If nothing is detected, ask the user via AskUserQuestion
in the decision tree below.
To persist deploy settings for future runs, suggest that the user run `/setup-deploy`.
Then run `gstack-diff-scope` to classify the changes:
```bash
eval $(~/.codex/skills/gstack/bin/gstack-diff-scope $(gh pr view --json baseRefName -q .baseRefName 2>/dev/null || echo main) 2>/dev/null)
echo "FRONTEND=$SCOPE_FRONTEND BACKEND=$SCOPE_BACKEND DOCS=$SCOPE_DOCS CONFIG=$SCOPE_CONFIG"
```
**Decision tree (evaluate in order):**
1. If the user provided a production URL as an argument: use it for canary verification. Also check for deploy workflows.
2. Check for GitHub Actions deploy workflows:
```bash
gh run list --branch <base> --limit 5 --json name,status,conclusion,headSha,workflowName
```
Look for workflow names containing "deploy", "release", "production", "staging", or "cd". If found: poll the deploy workflow in Step 6, then run canary.
3. If SCOPE_DOCS is the only scope that's true (no frontend, no backend, no config): skip verification entirely. Output: "PR merged. Documentation-only change — no deploy verification needed." Go to Step 9.
4. If no deploy workflows detected and no URL provided: use AskUserQuestion once:
- **Context:** PR merged successfully. No deploy workflow or production URL detected.
- **RECOMMENDATION:** Choose B if this is a library/CLI tool. Choose A if this is a web app.
- A) Provide a production URL to verify
- B) Skip verification — this project doesn't have a web deploy
---
## Step 6: Wait for deploy (if applicable)
The deploy verification strategy depends on the platform detected in Step 5.
### Strategy A: GitHub Actions workflow
If a deploy workflow was detected, find the run triggered by the merge commit:
```bash
gh run list --branch <base> --limit 10 --json databaseId,headSha,status,conclusion,name,workflowName
```
Match by the merge commit SHA (captured in Step 4). If multiple workflows match, prefer the one whose name matches the deploy workflow detected in Step 5.
Poll every 30 seconds:
```bash
gh run view <run-id> --json status,conclusion
```
### Strategy B: Platform CLI (Fly.io, Render, Heroku)
If a deploy status command was configured in CLAUDE.md (e.g., `fly status --app myapp`), use it instead of or in addition to GitHub Actions polling.
**Fly.io:** After merge, Fly deploys via GitHub Actions or `fly deploy`. Check with:
```bash
fly status --app {app} 2>/dev/null
```
Look for `Machines` status showing `started` and a recent deployment timestamp.
**Render:** Render auto-deploys on push to the connected branch. Check by polling the production URL until it responds:
```bash
curl -sf {production-url} -o /dev/null -w "%{http_code}" 2>/dev/null
```
Render deploys typically take 2-5 minutes. Poll every 30 seconds.
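A sketch of that poll (assumes `$PROD_URL` holds the production URL from Step 5):
```bash
# Poll every 30s, capped at 10 minutes; curl -w prints "000" when the connection fails
for i in $(seq 1 20); do
  CODE=$(curl -s -o /dev/null -w "%{http_code}" "$PROD_URL" 2>/dev/null)
  [ "$CODE" = "200" ] && echo "RENDER_UP" && break
  sleep 30
done
```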
**Heroku:** Check latest release:
```bash
heroku releases --app {app} -n 1 2>/dev/null
```
### Strategy C: Auto-deploy platforms (Vercel, Netlify)
Vercel and Netlify deploy automatically on merge. No explicit deploy trigger needed. Wait 60 seconds for the deploy to propagate, then proceed directly to canary verification in Step 7.
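A minimal sketch of that wait-then-probe (again assuming `$PROD_URL` from Step 5):
```bash
# Give the auto-deploy a minute to propagate, then probe once
sleep 60
curl -s -o /dev/null -w "%{http_code}\n" "$PROD_URL" 2>/dev/null
```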
### Strategy D: Custom deploy hooks
If CLAUDE.md has a custom deploy status command in the "Custom deploy hooks" section, run that command and check its exit code.
### Common: Timing and failure handling
Record deploy start time. Show progress every 2 minutes: "Deploy in progress... (Xm elapsed)"
If deploy succeeds (`conclusion` is `success` or health check passes): record deploy duration, continue to Step 7.
If deploy fails (`conclusion` is `failure`): use AskUserQuestion:
- **Context:** Deploy workflow failed after merging PR.
- **RECOMMENDATION:** Choose A to investigate before reverting.
- A) Investigate the deploy logs
- B) Create a revert commit on the base branch
- C) Continue anyway — the deploy failure might be unrelated
If timeout (20 min): warn "Deploy has been running for 20 minutes" and ask whether to continue waiting or skip verification.
---
## Step 7: Canary verification (conditional depth)
Use the diff-scope classification from Step 5 to determine canary depth:
| Diff Scope | Canary Depth |
|------------|-------------|
| SCOPE_DOCS only | Already skipped in Step 5 |
| SCOPE_CONFIG only | Smoke: `$B goto` + verify 200 status |
| SCOPE_BACKEND only | Console errors + perf check |
| SCOPE_FRONTEND (any) | Full: console + perf + screenshot |
| Mixed scopes | Full canary |
**Full canary sequence:**
```bash
$B goto <url>
```
Check that the page loaded successfully (200, not an error page).
```bash
$B console --errors
```
Check for critical console errors: lines containing `Error`, `Uncaught`, `Failed to load`, `TypeError`, `ReferenceError`. Ignore warnings.
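One way to apply that classification, assuming the daemon prints one message per line (a sketch; patterns from the list above):
```bash
# Keep critical error lines, drop warnings
$B console --errors | grep -E "Error|Uncaught|Failed to load|TypeError|ReferenceError" \
  | grep -vi "warning" || echo "CONSOLE_CLEAN"
```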
```bash
$B perf
```
Check that page load time is under 10 seconds.
```bash
$B text
```
Verify the page has content (not blank, not a generic error page).
```bash
$B snapshot -i -a -o ".gstack/deploy-reports/post-deploy.png"
```
Take an annotated screenshot as evidence.
**Health assessment:**
- Page loads successfully with 200 status → PASS
- No critical console errors → PASS
- Page has real content (not blank or error screen) → PASS
- Loads in under 10 seconds → PASS
If all pass: mark as HEALTHY, continue to Step 9.
If any fail: show the evidence (screenshot path, console errors, perf numbers). Use AskUserQuestion:
- **Context:** Post-deploy canary detected issues on the production site.
- **RECOMMENDATION:** Choose based on severity — B for critical (site down), A for minor (console errors).
- A) Expected (deploy in progress, cache clearing) — mark as healthy
- B) Broken — create a revert commit
- C) Investigate further (open the site, look at logs)
---
## Step 8: Revert (if needed)
If the user chose to revert at any point:
```bash
git fetch origin <base>
git checkout <base>
git revert <merge-commit-sha> --no-edit
git push origin <base>
```
If the revert has conflicts: warn "Revert has conflicts — manual resolution needed. The merge commit SHA is `<sha>`. You can run `git revert <sha>` manually."
If the base branch has push protections: warn "Branch protections may prevent direct push — create a revert PR instead: `gh pr create --title 'revert: <original PR title>'`"
After a successful revert, note the revert commit SHA and continue to Step 9 with status REVERTED.
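For the protected-branch case, the revert-PR fallback looks like this (a sketch using the same placeholders as above):
```bash
# Revert via PR when direct pushes to <base> are blocked
git fetch origin <base>
git checkout -b revert-pr<number> origin/<base>
git revert <merge-commit-sha> --no-edit
git push -u origin revert-pr<number>
gh pr create --base <base> --title "revert: <original PR title>" --body "Reverts #<number> after failed deploy verification."
```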
---
## Step 9: Deploy report
Create the deploy report directory:
```bash
mkdir -p .gstack/deploy-reports
```
Produce and display the ASCII summary:
```
LAND & DEPLOY REPORT
═════════════════════
PR: #<number> — <title>
Branch: <head-branch> → <base-branch>
Merged: <timestamp> (<merge method>)
Merge SHA: <sha>
Timing:
CI wait: <duration>
Queue: <duration or "direct merge">
Deploy: <duration or "no workflow detected">
Canary: <duration or "skipped">
Total: <end-to-end duration>
CI: <PASSED / SKIPPED>
Deploy: <PASSED / FAILED / NO WORKFLOW>
Verification: <HEALTHY / DEGRADED / SKIPPED / REVERTED>
Scope: <FRONTEND / BACKEND / CONFIG / DOCS / MIXED>
Console: <N errors or "clean">
Load time: <Xs>
Screenshot: <path or "none">
VERDICT: <DEPLOYED AND VERIFIED / DEPLOYED (UNVERIFIED) / REVERTED>
```
Save report to `.gstack/deploy-reports/{date}-pr{number}-deploy.md`.
Log to the review dashboard:
```bash
eval $(~/.codex/skills/gstack/bin/gstack-slug 2>/dev/null)
mkdir -p ~/.gstack/projects/$SLUG
```
Write a JSONL entry with timing data:
```json
{"skill":"land-and-deploy","timestamp":"<ISO>","status":"<SUCCESS/REVERTED>","pr":<number>,"merge_sha":"<sha>","deploy_status":"<HEALTHY/DEGRADED/SKIPPED>","ci_wait_s":<N>,"queue_s":<N>,"deploy_s":<N>,"canary_s":<N>,"total_s":<N>}
```
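Using the same `jq` append pattern as the preamble analytics, the entry can be written like this (a sketch; the file name `deploys.jsonl` is an assumption, not prescribed by the dashboard):
```bash
# Append the entry; numeric fields go through --argjson so they stay JSON numbers
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg status "SUCCESS" \
  --arg sha "<merge-sha>" --argjson pr <number> --argjson total <total-seconds> \
  '{skill:"land-and-deploy",timestamp:$ts,status:$status,pr:$pr,merge_sha:$sha,total_s:$total}' \
  >> ~/.gstack/projects/$SLUG/deploys.jsonl 2>/dev/null || true
```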
---
## Step 10: Suggest follow-ups
After the deploy report, suggest relevant follow-ups:
- If a production URL was verified: "Run `/canary <url> --duration 10m` for extended monitoring."
- If performance data was collected: "Run `/benchmark <url>` for a deep performance audit."
- "Run `/document-release` to update project documentation."
---
## Important Rules
- **Never force push.** Use `gh pr merge` which is safe.
- **Never skip CI.** If checks are failing, stop.
- **Auto-detect everything.** PR number, merge method, deploy strategy, project type. Only ask when information genuinely can't be inferred.
- **Poll with backoff.** Don't hammer GitHub API. 30-second intervals for CI/deploy, with reasonable timeouts.
- **Revert is always an option.** At every failure point, offer revert as an escape hatch.
- **Single-pass verification, not continuous monitoring.** `/land-and-deploy` checks once. `/canary` does the extended monitoring loop.
- **Clean up.** Delete the feature branch after merge (via `--delete-branch`).
- **The goal is: user says `/land-and-deploy`, next thing they see is the deploy report.**
@@ -335,7 +335,7 @@ Run `git diff origin/<base>` to get the full diff. This includes both committed
Apply the checklist against the diff in two passes:
1. **Pass 1 (CRITICAL):** SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary, Enum & Value Completeness
2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend, Performance & Bundle Impact
**Enum & Value Completeness requires reading code OUTSIDE the diff.** When the diff introduces a new enum value, status, tier, or type constant, use Grep to find all files that reference sibling values, then Read those files to check if the new value is handled. This is the one category where within-diff review is insufficient.
@@ -0,0 +1,435 @@
---
name: setup-deploy
description: |
Configure deployment settings for /land-and-deploy. Detects your deploy
platform (Fly.io, Render, Vercel, Netlify, Heroku, GitHub Actions, custom),
production URL, health check endpoints, and deploy status commands. Writes
the configuration to CLAUDE.md so all future deploys are automatic.
Use when: "setup deploy", "configure deployment", "set up land-and-deploy",
"how do I deploy with gstack", "add deploy config".
---
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
<!-- Regenerate: bun run gen:skill-docs -->
## Preamble (run first)
```bash
_UPD=$(~/.codex/skills/gstack/bin/gstack-update-check 2>/dev/null || .agents/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$(~/.codex/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
_PROACTIVE=$(~/.codex/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true")
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
echo "PROACTIVE: $_PROACTIVE"
_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
echo "LAKE_INTRO: $_LAKE_SEEN"
_TEL=$(~/.codex/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
_TEL_PROMPTED=$([ -f ~/.gstack/.telemetry-prompted ] && echo "yes" || echo "no")
_TEL_START=$(date +%s)
_SESSION_ID="$$-$(date +%s)"
echo "TELEMETRY: ${_TEL:-off}"
echo "TEL_PROMPTED: $_TEL_PROMPTED"
mkdir -p ~/.gstack/analytics
echo '{"skill":"setup-deploy","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.codex/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
```
If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke
them when the user explicitly asks. The user opted out of proactive suggestions.
If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.codex/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED <from> <to>`: tell user "Running gstack v{to} (just updated!)" and continue.
If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle.
Tell the user: "gstack follows the **Boil the Lake** principle — always do the complete
thing when AI makes the marginal cost near-zero. Read more: https://garryslist.org/posts/boil-the-ocean"
Then offer to open the essay in their default browser:
```bash
open https://garryslist.org/posts/boil-the-ocean
touch ~/.gstack/.completeness-intro-seen
```
Only run `open` if the user says yes. Always run `touch` to mark as seen. This only happens once.
If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
ask the user about telemetry. Use AskUserQuestion:
> Help gstack get better! Community mode shares usage data (which skills you use, how long
> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
> No code, file paths, or repo names are ever sent.
> Change anytime with `gstack-config set telemetry off`.
Options:
- A) Help gstack get better! (recommended)
- B) No thanks
If A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry community`
If B: ask a follow-up AskUserQuestion:
> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
> no way to connect sessions. Just a counter that helps us know if anyone's out there.
Options:
- A) Sure, anonymous is fine
- B) No thanks, fully off
If B→A: run `~/.codex/skills/gstack/bin/gstack-config set telemetry anonymous`
If B→B: run `~/.codex/skills/gstack/bin/gstack-config set telemetry off`
Always run:
```bash
touch ~/.gstack/.telemetry-prompted
```
This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
## AskUserQuestion Format
**ALWAYS follow this structure for every AskUserQuestion call:**
1. **Re-ground:** State the project, the current branch (use the `_BRANCH` value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
2. **Simplify:** Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called.
3. **Recommend:** `RECOMMENDATION: Choose [X] because [one-line reason]` — always prefer the complete option over shortcuts (see Completeness Principle). Include `Completeness: X/10` for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work. If both options are 8+, pick the higher; if one is ≤5, flag it.
4. **Options:** Lettered options: `A) ... B) ... C) ...` — when an option involves effort, show both scales: `(human: ~X / CC: ~Y)`
Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
Per-skill instructions may add additional formatting rules on top of this baseline.
## Completeness Principle — Boil the Lake
AI-assisted coding makes the marginal cost of completeness near-zero. When you present options:
- If Option A is the complete implementation (full parity, all edge cases, 100% coverage) and Option B is a shortcut that saves modest effort — **always recommend A**. The delta between 80 lines and 150 lines is meaningless with CC+gstack. "Good enough" is the wrong instinct when "complete" costs minutes more.
- **Lake vs. ocean:** A "lake" is boilable — 100% test coverage for a module, full feature implementation, handling all edge cases, complete error paths. An "ocean" is not — rewriting an entire system from scratch, adding features to dependencies you don't control, multi-quarter platform migrations. Recommend boiling lakes. Flag oceans as out of scope.
- **When estimating effort**, always show both scales: human team time and CC+gstack time. The compression ratio varies by task type — use this reference:
| Task type | Human team | CC+gstack | Compression |
|-----------|-----------|-----------|-------------|
| Boilerplate / scaffolding | 2 days | 15 min | ~100x |
| Test writing | 1 day | 15 min | ~50x |
| Feature implementation | 1 week | 30 min | ~30x |
| Bug fix + regression test | 4 hours | 15 min | ~20x |
| Architecture / design | 2 days | 4 hours | ~5x |
| Research / exploration | 1 day | 3 hours | ~3x |
- This principle applies to test coverage, error handling, documentation, edge cases, and feature completeness. Don't skip the last 10% to "save time" — with AI, that 10% costs seconds.
**Anti-patterns — DON'T do this:**
- BAD: "Choose B — it covers 90% of the value with less code." (If A is only 70 lines more, choose A.)
- BAD: "We can skip edge case handling to save time." (Edge case handling costs minutes with CC.)
- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
## Search Before Building
Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.codex/skills/gstack/ETHOS.md` for the full philosophy.
**Three layers of knowledge:**
- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
Log eureka moments:
```bash
jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
```
Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
## Contributor Mode
If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
```
# {Title}
Hey gstack team — ran into this while using /{skill-name}:
**What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened}
**My rating:** {0-10} — {one sentence on why it wasn't a 10}
## Steps to reproduce
1. {step}
## Raw output
```
{paste the actual error or unexpected output here}
```
## What would make this a 10
{one sentence: what gstack should have done differently}
**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
```
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
## Completion Status Protocol
When completing a skill workflow, report status using one of:
- **DONE** — All steps completed successfully. Evidence provided for each claim.
- **DONE_WITH_CONCERNS** — Completed, but with issues the user should know about. List each concern.
- **BLOCKED** — Cannot proceed. State what is blocking and what was tried.
- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what you need.
### Escalation
It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
Bad work is worse than no work. You will not be penalized for escalating.
- If you have attempted a task 3 times without success, STOP and escalate.
- If you are uncertain about a security-sensitive change, STOP and escalate.
- If the scope of work exceeds what you can verify, STOP and escalate.
Escalation format:
```
STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]
```
## Telemetry (run last)
After the skill workflow completes (success, error, or abort), log the telemetry event.
Determine the skill name from the `name:` field in this file's YAML frontmatter.
Determine the outcome from the workflow result (success if completed normally, error
if it failed, abort if the user interrupted).
**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
`~/.gstack/analytics/` (user config directory, not project files). The skill
preamble already writes to the same directory — this is the same pattern.
Skipping this command loses session duration and outcome data.
Run this bash:
```bash
_TEL_END=$(date +%s)
_TEL_DUR=$(( _TEL_END - _TEL_START ))
rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
~/.codex/skills/gstack/bin/gstack-telemetry-log \
--skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
--used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
```
Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
If you cannot determine the outcome, use "unknown". This runs in the background and
never blocks the user.
# /setup-deploy — Configure Deployment for gstack
You are helping the user configure their deployment so `/land-and-deploy` works
automatically. Your job is to detect the deploy platform, production URL, health
checks, and deploy status commands — then persist everything to CLAUDE.md.
After this runs once, `/land-and-deploy` reads CLAUDE.md and skips detection entirely.
## User-invocable
When the user types `/setup-deploy`, run this skill.
## Instructions
### Step 1: Check existing configuration
```bash
grep -A 20 "## Deploy Configuration" CLAUDE.md 2>/dev/null || echo "NO_CONFIG"
```
If configuration already exists, show it and ask:
- **Context:** Deploy configuration already exists in CLAUDE.md.
- **RECOMMENDATION:** Choose A to update if your setup changed.
- A) Reconfigure from scratch (overwrite existing)
- B) Edit specific fields (show current config, let me change one thing)
- C) Done — configuration looks correct
If the user picks C, stop.
### Step 2: Detect platform
Run the platform detection from the deploy bootstrap:
```bash
# Platform config files
[ -f fly.toml ] && echo "PLATFORM:fly" && cat fly.toml
[ -f render.yaml ] && echo "PLATFORM:render" && cat render.yaml
([ -f vercel.json ] || [ -d .vercel ]) && echo "PLATFORM:vercel"
[ -f netlify.toml ] && echo "PLATFORM:netlify" && cat netlify.toml
[ -f Procfile ] && echo "PLATFORM:heroku"
([ -f railway.json ] || [ -f railway.toml ]) && echo "PLATFORM:railway"
# GitHub Actions deploy workflows
for f in .github/workflows/*.yml .github/workflows/*.yaml; do
  [ -f "$f" ] && grep -qiE "deploy|release|production|staging|cd" "$f" 2>/dev/null && echo "DEPLOY_WORKFLOW:$f"
done
# Project type
[ -f package.json ] && grep -q '"bin"' package.json 2>/dev/null && echo "PROJECT_TYPE:cli"
ls *.gemspec 2>/dev/null && echo "PROJECT_TYPE:library"
```
### Step 3: Platform-specific setup
Based on what was detected, guide the user through platform-specific configuration.
#### Fly.io
If `fly.toml` detected:
1. Extract app name: `grep -m1 "^app" fly.toml | sed 's/app = "\(.*\)"/\1/'`
2. Check if `fly` CLI is installed: `which fly 2>/dev/null`
3. If installed, verify: `fly status --app {app} 2>/dev/null`
4. Infer URL: `https://{app}.fly.dev`
5. Set deploy status command: `fly status --app {app}`
6. Set health check: `https://{app}.fly.dev` (or `/health` if the app has one)
Ask the user to confirm the production URL. Some Fly apps use custom domains.
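Combined into one block (a sketch; the `sed` pattern assumes the `app = "name"` form):
```bash
# Fly.io: app name → inferred URL → optional live status check
APP=$(grep -m1 "^app" fly.toml | sed 's/app = "\(.*\)"/\1/')
echo "PROD_URL: https://${APP}.fly.dev"
command -v fly >/dev/null 2>&1 && fly status --app "$APP" 2>/dev/null | head -5
```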
#### Render
If `render.yaml` detected:
1. Extract service name and type from render.yaml
2. Check for Render API key: `echo $RENDER_API_KEY | head -c 4` (don't expose the full key)
3. Infer URL: `https://{service-name}.onrender.com`
4. Render deploys automatically on push to the connected branch — no deploy workflow needed
5. Set health check: the inferred URL
Ask the user to confirm. Render uses auto-deploy from the connected git branch — after
merge to main, Render picks it up automatically. The "deploy wait" in /land-and-deploy
should poll the Render URL until it responds with the new version.
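A sketch of the Render detection (naive: it grabs the first `name:` in render.yaml, which may need adjusting for multi-service files):
```bash
# Render: service name → inferred URL; mask any API key to its first 4 chars
SERVICE=$(grep -m1 "name:" render.yaml | sed 's/.*name: *//')
echo "PROD_URL: https://${SERVICE}.onrender.com"
[ -n "$RENDER_API_KEY" ] && echo "RENDER_API_KEY: $(echo "$RENDER_API_KEY" | head -c 4)****"
```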
#### Vercel
If vercel.json or .vercel detected:
1. Check for `vercel` CLI: `which vercel 2>/dev/null`
2. If installed: `vercel ls --prod 2>/dev/null | head -3`
3. Vercel deploys automatically on push — preview on PR, production on merge to main
4. Set health check: the production URL from vercel project settings
#### Netlify
If netlify.toml detected:
1. Extract site info from netlify.toml
2. Netlify deploys automatically on push
3. Set health check: the production URL
#### GitHub Actions only
If deploy workflows detected but no platform config:
1. Read the workflow file to understand what it does
2. Extract the deploy target (if mentioned)
3. Ask the user for the production URL
#### Custom / Manual
If nothing detected:
Use AskUserQuestion to gather the information:
1. **How are deploys triggered?**
- A) Automatically on push to main (Fly, Render, Vercel, Netlify, etc.)
- B) Via GitHub Actions workflow
- C) Via a deploy script or CLI command (describe it)
- D) Manually (SSH, dashboard, etc.)
- E) This project doesn't deploy (library, CLI, tool)
2. **What's the production URL?** (Free text — the URL where the app runs)
3. **How can gstack check if a deploy succeeded?**
- A) HTTP health check at a specific URL (e.g., /health, /api/status)
- B) CLI command (e.g., `fly status`, `kubectl rollout status`)
- C) Check the GitHub Actions workflow status
- D) No automated way — just check the URL loads
4. **Any pre-merge or post-merge hooks?**
- Commands to run before merging (e.g., `bun run build`)
- Commands to run after merge but before deploy verification
### Step 4: Write configuration
Read CLAUDE.md (or create it). Find and replace the `## Deploy Configuration` section
if it exists, or append it at the end.
```markdown
## Deploy Configuration (configured by /setup-deploy)
- Platform: {platform}
- Production URL: {url}
- Deploy workflow: {workflow file or "auto-deploy on push"}
- Deploy status command: {command or "HTTP health check"}
- Merge method: {squash/merge/rebase}
- Project type: {web app / API / CLI / library}
- Post-deploy health check: {health check URL or command}
### Custom deploy hooks
- Pre-merge: {command or "none"}
- Deploy trigger: {command or "automatic on push to main"}
- Deploy status: {command or "poll production URL"}
- Health check: {URL or command}
```
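One idempotent way to do the replace-or-append (a sketch; assumes the old section header starts exactly with `## Deploy Configuration`, and that the new section text was already written to a temp file, with `/tmp/deploy-config.md` here purely hypothetical):
```bash
# Drop the old section (header through the line before the next "## " heading), keep the rest
awk '/^## Deploy Configuration/ {skip=1; next} skip && /^## / {skip=0} !skip' CLAUDE.md > CLAUDE.md.new 2>/dev/null || true
cat /tmp/deploy-config.md >> CLAUDE.md.new
mv CLAUDE.md.new CLAUDE.md
```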
### Step 5: Verify
After writing, verify the configuration works:
1. If a health check URL was configured, try it:
```bash
curl -sf "{health-check-url}" -o /dev/null -w "%{http_code}" 2>/dev/null || echo "UNREACHABLE"
```
2. If a deploy status command was configured, try it:
```bash
{deploy-status-command} 2>/dev/null | head -5 || echo "COMMAND_FAILED"
```
Report results. If anything failed, note it but don't block — the config is still
useful even if the health check is temporarily unreachable.
### Step 6: Summary
```
DEPLOY CONFIGURATION — COMPLETE
════════════════════════════════
Platform: {platform}
URL: {url}
Health check: {health check}
Status cmd: {status command}
Merge method: {merge method}
Saved to CLAUDE.md. /land-and-deploy will use these settings automatically.
Next steps:
- Run /land-and-deploy to merge and deploy your current PR
- Edit the "## Deploy Configuration" section in CLAUDE.md to change settings
- Run /setup-deploy again to reconfigure
```
## Important Rules
- **Never expose secrets.** Don't print full API keys, tokens, or passwords.
- **Confirm with the user.** Always show the detected config and ask for confirmation before writing.
- **CLAUDE.md is the source of truth.** All configuration lives there — not in a separate config file.
- **Idempotent.** Running /setup-deploy multiple times overwrites the previous config cleanly.
- **Platform CLIs are optional.** If `fly` or `vercel` CLI isn't installed, fall back to URL-based health checks.