feat: /land-and-deploy, /canary, /benchmark + perf review (v0.7.0) (#183)

* feat: add /canary, /benchmark, /land-and-deploy skills (v0.7.0) Three new skills that close the deploy loop: - /canary: standalone post-deploy monitoring with browse daemon - /benchmark: performance regression detection with Web Vitals - /land-and-deploy: merge PR, wait for deploy, canary verify production Incorporates patterns from community PR #151. Co-Authored-By: HMAKT99 <HMAKT99@users.noreply.github.com> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Performance & Bundle Impact category to review checklist New Pass 2 (INFORMATIONAL) category catching heavy dependencies (moment.js, lodash full), missing lazy loading, synchronous scripts, CSS @import blocking, fetch waterfalls, and tree-shaking breaks. Both /review and /ship automatically pick this up via checklist.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add {{DEPLOY_BOOTSTRAP}} resolver + deployed row in dashboard - New generateDeployBootstrap() resolver auto-detects deploy platform (Vercel, Netlify, Fly.io, GH Actions, etc.), production URL, and merge method. Persists to CLAUDE.md like test bootstrap. - Review Readiness Dashboard now shows a "Deployed" row from /land-and-deploy JSONL entries (informational, never gates shipping). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: mark 3 TODOs completed, bump v0.7.0, update CHANGELOG Superseded by /land-and-deploy: - /merge skill — review-gated PR merge - Deploy-verify skill - Post-deploy verification (ship + browse) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /setup-deploy skill + platform-specific deploy verification - New /setup-deploy skill: interactive guided setup for deploy configuration. Detects Fly.io, Render, Vercel, Netlify, Heroku, Railway, GitHub Actions, and custom deploy scripts. Writes config to CLAUDE.md with custom hooks section for non-standard setups. - Enhanced deploy bootstrap: platform-specific URL resolution (fly.toml app → {app}.fly.dev, render.yaml → {service}.onrender.com, etc.), deploy status commands (fly status, heroku releases), and custom deploy hooks section in CLAUDE.md for manual/scripted deploys. - Platform-specific deploy verification in /land-and-deploy Step 6: Strategy A (GitHub Actions polling), Strategy B (platform CLI: fly/render/heroku), Strategy C (auto-deploy: vercel/netlify), Strategy D (custom hooks from CLAUDE.md). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: E2E + LLM-judge evals for deploy skills - 4 E2E tests: land-and-deploy (Fly.io detection + deploy report), canary (monitoring report structure), benchmark (perf report schema), setup-deploy (platform detection → CLAUDE.md config) - 4 LLM-judge evals: workflow quality for all 4 new skills - Touchfile entries for diff-based test selection (E2E + LLM-judge) - 460 free tests pass, 0 fail Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: harden E2E tests — server lifecycle, timeouts, preamble budget, skip flaky Cross-cutting fixes: - Pre-seed ~/.gstack/.completeness-intro-seen and ~/.gstack/.telemetry-prompted so preamble doesn't burn 3-7 turns on lake intro + telemetry in every test - Each describe block creates its own test server instance instead of sharing a global that dies between suites Test fixes (5 tests): - /qa quick: own server instance + preamble skip - /review SQL injection: timeout 90→180s, maxTurns 15→20, added assertion that review output actually mentions SQL injection - /review design-lite: maxTurns 25→35 + preamble skip (now detects 7/7) - ship-base-branch: both timeouts 90→150/180s + preamble skip - plan-eng artifact: clean stale state in beforeAll, maxTurns 20→25 Skipped (4 flaky/redundant tests): - contributor-mode: tests prompt compliance, not skill functionality - design-consultation-research: WebSearch-dependent, redundant with core - design-consultation-preview: redundant with core test - /qa bootstrap: too ambitious (65 turns, installs vitest) Also: preamble skip added to qa-only, qa-fix-loop, design-consultation-core, and design-consultation-existing prompts. Updated touchfiles entries and touchfiles.test.ts. Added honest comment to codex-review-findings. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: redesign 6 skipped/todo E2E tests + add test.concurrent support Redesigned tests (previously skipped/todo): - contributor-mode: pre-fail approach, 5 turns/30s (was 10 turns/90s) - design-consultation-research: WebSearch-only, 8 turns/90s (was 45/480s) - design-consultation-preview: preview HTML only, 8 turns/90s (was 30/480s) - qa-bootstrap: bootstrap-only, 12 turns/90s (was 65/420s) - /ship workflow: local bare remote, 15 turns/120s (was test.todo) - /setup-browser-cookies: browser detection smoke, 5 turns/45s (was test.todo) Added testConcurrentIfSelected() helper for future parallelization. Updated touchfiles entries for all 6 re-enabled tests. Target: 0 skip, 0 todo, 0 fail across all E2E tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: relax contributor-mode assertions — test structure not exact phrasing * perf: enable test.concurrent for 31 independent E2E tests Convert 18 skill-e2e, 11 routing, and 2 codex tests from sequential to test.concurrent. Only design-consultation tests (4) remain sequential due to shared designDir state. Expected ~6x speedup on Teams high-burst. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add --concurrent flag to bun test + convert remaining 4 sequential tests bun's test.concurrent only works within a describe block, not across describe blocks. Adding --concurrent to the CLI command makes ALL tests concurrent regardless of describe boundaries. Also converted the 4 design-consultation tests to concurrent (each already independent). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: split monolithic E2E test into 8 parallel files Split test/skill-e2e.test.ts (3442 lines) into 8 category files: - skill-e2e-browse.test.ts (7 tests) - skill-e2e-review.test.ts (7 tests) - skill-e2e-qa-bugs.test.ts (3 tests) - skill-e2e-qa-workflow.test.ts (4 tests) - skill-e2e-plan.test.ts (6 tests) - skill-e2e-design.test.ts (7 tests) - skill-e2e-workflow.test.ts (6 tests) - skill-e2e-deploy.test.ts (4 tests) Bun runs each file in its own worker = 10 parallel workers (8 split + routing + codex). Expected: 78 min → ~12 min. Extracted shared helpers to test/helpers/e2e-helpers.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: bump default E2E concurrency to 15 * perf: add model pinning infrastructure + rate-limit telemetry to E2E runner Default E2E model changed from Opus to Sonnet (5x faster, 5x cheaper). Session runner now accepts `model` option with EVALS_MODEL env var override. Added timing telemetry (first_response_ms, max_inter_turn_ms) and wall_clock_ms to eval-store for diagnosing rate-limit impact. Added EVALS_FAST test filtering. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve 3 E2E test failures — tmpdir race, wasted turns, brittle assertions plan-design-review-plan-mode: give each test its own tmpdir to eliminate race condition where concurrent tests pollute each other's working directory. ship-local-workflow: inline ship workflow steps in prompt instead of having agent read 700+ line SKILL.md (was wasting 6 of 15 turns on file I/O). design-consultation-core: replace exact section name matching with fuzzy synonym-based matching (e.g. "Colors" matches "Color", "Type System" matches "Typography"). All 7 sections still required, LLM judge still hard fail. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: pin quality tests to Opus, add --retry 2 and test:e2e:fast tier ~10 quality-sensitive tests (planted-bug detection, design quality judge, strategic review, retro analysis) explicitly pinned to Opus. ~30 structure tests default to Sonnet for 5x speed improvement. Added --retry 2 to all E2E scripts for flaky test resilience. Added test:e2e:fast script that excludes 8 slowest tests for quick feedback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: mark E2E model pinning TODO as shipped Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add SKILL.md merge conflict directive to CLAUDE.md When resolving merge conflicts on generated SKILL.md files, always merge the .tmpl templates first, then regenerate — never accept either side's generated output directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add DEPLOY_BOOTSTRAP resolver to gen-skill-docs The land-and-deploy template referenced {{DEPLOY_BOOTSTRAP}} but no resolver existed, causing gen-skill-docs to fail. Added generateDeployBootstrap() that generates the deploy config detection bash block (check CLAUDE.md for persisted config, auto-detect platform from config files, detect deploy workflows). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files after DEPLOY_BOOTSTRAP fix Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: move prompt temp file outside workingDirectory to prevent race condition The .prompt-tmp file was written inside workingDirectory, which gets deleted by afterAll cleanup. With --concurrent --retry, afterAll can interleave with retries, causing "No such file or directory" crashes at 0s (seen in review-design-lite and office-hours-spec-review). Fix: write prompt file to os.tmpdir() with a unique suffix so it survives directory cleanup. Also convert review-design-lite from describeE2E to describeIfSelected for proper diff-based test selection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add --retry 2 --concurrent flags to test:evals scripts for consistency test:evals and test:evals:all were missing the retry and concurrency flags that test:e2e already had, causing inconsistent behavior between the two script families. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: HMAKT99 <HMAKT99@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-01 19:25:10 +02:00 · 2026-03-21 14:31:36 -07:00
parent 8321115a4e
commit 00bc482fe1
42 changed files with 9183 additions and 3137 deletions
@@ -0,0 +1,444 @@
+---
+name: setup-deploy
+version: 1.0.0
+description: |
+  Configure deployment settings for /land-and-deploy. Detects your deploy
+  platform (Fly.io, Render, Vercel, Netlify, Heroku, GitHub Actions, custom),
+  production URL, health check endpoints, and deploy status commands. Writes
+  the configuration to CLAUDE.md so all future deploys are automatic.
+  Use when: "setup deploy", "configure deployment", "set up land-and-deploy",
+  "how do I deploy with gstack", "add deploy config".
+allowed-tools:
+  - Bash
+  - Read
+  - Write
+  - Edit
+  - Glob
+  - Grep
+  - AskUserQuestion
+---
+<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
+<!-- Regenerate: bun run gen:skill-docs -->
+
+## Preamble (run first)
+
+```bash
+_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
+[ -n "$_UPD" ] && echo "$_UPD" || true
+mkdir -p ~/.gstack/sessions
+touch ~/.gstack/sessions/"$PPID"
+_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
+find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
+_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)
+_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true")
+_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
+echo "BRANCH: $_BRANCH"
+echo "PROACTIVE: $_PROACTIVE"
+_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
+echo "LAKE_INTRO: $_LAKE_SEEN"
+_TEL=$(~/.claude/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
+_TEL_PROMPTED=$([ -f ~/.gstack/.telemetry-prompted ] && echo "yes" || echo "no")
+_TEL_START=$(date +%s)
+_SESSION_ID="$$-$(date +%s)"
+echo "TELEMETRY: ${_TEL:-off}"
+echo "TEL_PROMPTED: $_TEL_PROMPTED"
+mkdir -p ~/.gstack/analytics
+echo '{"skill":"setup-deploy","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}'  >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
+for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.claude/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
+```
+
+If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke
+them when the user explicitly asks. The user opted out of proactive suggestions.
+
+If output shows `UPGRADE_AVAILABLE <old> <new>`: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED <from> <to>`: tell user "Running gstack v{to} (just updated!)" and continue.
+
+If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle.
+Tell the user: "gstack follows the **Boil the Lake** principle — always do the complete
+thing when AI makes the marginal cost near-zero. Read more: https://garryslist.org/posts/boil-the-ocean"
+Then offer to open the essay in their default browser:
+
+```bash
+open https://garryslist.org/posts/boil-the-ocean
+touch ~/.gstack/.completeness-intro-seen
+```
+
+Only run `open` if the user says yes. Always run `touch` to mark as seen. This only happens once.
+
+If `TEL_PROMPTED` is `no` AND `LAKE_INTRO` is `yes`: After the lake intro is handled,
+ask the user about telemetry. Use AskUserQuestion:
+
+> Help gstack get better! Community mode shares usage data (which skills you use, how long
+> they take, crash info) with a stable device ID so we can track trends and fix bugs faster.
+> No code, file paths, or repo names are ever sent.
+> Change anytime with `gstack-config set telemetry off`.
+
+Options:
+- A) Help gstack get better! (recommended)
+- B) No thanks
+
+If A: run `~/.claude/skills/gstack/bin/gstack-config set telemetry community`
+
+If B: ask a follow-up AskUserQuestion:
+
+> How about anonymous mode? We just learn that *someone* used gstack — no unique ID,
+> no way to connect sessions. Just a counter that helps us know if anyone's out there.
+
+Options:
+- A) Sure, anonymous is fine
+- B) No thanks, fully off
+
+If B→A: run `~/.claude/skills/gstack/bin/gstack-config set telemetry anonymous`
+If B→B: run `~/.claude/skills/gstack/bin/gstack-config set telemetry off`
+
+Always run:
+```bash
+touch ~/.gstack/.telemetry-prompted
+```
+
+This only happens once. If `TEL_PROMPTED` is `yes`, skip this entirely.
+
+## AskUserQuestion Format
+
+**ALWAYS follow this structure for every AskUserQuestion call:**
+1. **Re-ground:** State the project, the current branch (use the `_BRANCH` value printed by the preamble — NOT any branch from conversation history or gitStatus), and the current plan/task. (1-2 sentences)
+2. **Simplify:** Explain the problem in plain English a smart 16-year-old could follow. No raw function names, no internal jargon, no implementation details. Use concrete examples and analogies. Say what it DOES, not what it's called.
+3. **Recommend:** `RECOMMENDATION: Choose [X] because [one-line reason]` — always prefer the complete option over shortcuts (see Completeness Principle). Include `Completeness: X/10` for each option. Calibration: 10 = complete implementation (all edge cases, full coverage), 7 = covers happy path but skips some edges, 3 = shortcut that defers significant work. If both options are 8+, pick the higher; if one is ≤5, flag it.
+4. **Options:** Lettered options: `A) ... B) ... C) ...` — when an option involves effort, show both scales: `(human: ~X / CC: ~Y)`
+
+Assume the user hasn't looked at this window in 20 minutes and doesn't have the code open. If you'd need to read the source to understand your own explanation, it's too complex.
+
+Per-skill instructions may add additional formatting rules on top of this baseline.
+
+## Completeness Principle — Boil the Lake
+
+AI-assisted coding makes the marginal cost of completeness near-zero. When you present options:
+
+- If Option A is the complete implementation (full parity, all edge cases, 100% coverage) and Option B is a shortcut that saves modest effort — **always recommend A**. The delta between 80 lines and 150 lines is meaningless with CC+gstack. "Good enough" is the wrong instinct when "complete" costs minutes more.
+- **Lake vs. ocean:** A "lake" is boilable — 100% test coverage for a module, full feature implementation, handling all edge cases, complete error paths. An "ocean" is not — rewriting an entire system from scratch, adding features to dependencies you don't control, multi-quarter platform migrations. Recommend boiling lakes. Flag oceans as out of scope.
+- **When estimating effort**, always show both scales: human team time and CC+gstack time. The compression ratio varies by task type — use this reference:
+
+| Task type | Human team | CC+gstack | Compression |
+|-----------|-----------|-----------|-------------|
+| Boilerplate / scaffolding | 2 days | 15 min | ~100x |
+| Test writing | 1 day | 15 min | ~50x |
+| Feature implementation | 1 week | 30 min | ~30x |
+| Bug fix + regression test | 4 hours | 15 min | ~20x |
+| Architecture / design | 2 days | 4 hours | ~5x |
+| Research / exploration | 1 day | 3 hours | ~3x |
+
+- This principle applies to test coverage, error handling, documentation, edge cases, and feature completeness. Don't skip the last 10% to "save time" — with AI, that 10% costs seconds.
+
+**Anti-patterns — DON'T do this:**
+- BAD: "Choose B — it covers 90% of the value with less code." (If A is only 70 lines more, choose A.)
+- BAD: "We can skip edge case handling to save time." (Edge case handling costs minutes with CC.)
+- BAD: "Let's defer test coverage to a follow-up PR." (Tests are the cheapest lake to boil.)
+- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")
+
+## Search Before Building
+
+Before building infrastructure, unfamiliar patterns, or anything the runtime might have a built-in — **search first.** Read `~/.claude/skills/gstack/ETHOS.md` for the full philosophy.
+
+**Three layers of knowledge:**
+- **Layer 1** (tried and true — in distribution). Don't reinvent the wheel. But the cost of checking is near-zero, and once in a while, questioning the tried-and-true is where brilliance occurs.
+- **Layer 2** (new and popular — search for these). But scrutinize: humans are subject to mania. Search results are inputs to your thinking, not answers.
+- **Layer 3** (first principles — prize these above all). Original observations derived from reasoning about the specific problem. The most valuable of all.
+
+**Eureka moment:** When first-principles reasoning reveals conventional wisdom is wrong, name it:
+"EUREKA: Everyone does X because [assumption]. But [evidence] shows this is wrong. Y is better because [reasoning]."
+
+Log eureka moments:
+```bash
+jq -n --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --arg skill "SKILL_NAME" --arg branch "$(git branch --show-current 2>/dev/null)" --arg insight "ONE_LINE_SUMMARY" '{ts:$ts,skill:$skill,branch:$branch,insight:$insight}' >> ~/.gstack/analytics/eureka.jsonl 2>/dev/null || true
+```
+Replace SKILL_NAME and ONE_LINE_SUMMARY. Runs inline — don't stop the workflow.
+
+**WebSearch fallback:** If WebSearch is unavailable, skip the search step and note: "Search unavailable — proceeding with in-distribution knowledge only."
+
+## Contributor Mode
+
+If `_CONTRIB` is `true`: you are in **contributor mode**. You're a gstack user who also helps make it better.
+
+**At the end of each major workflow step** (not after every single command), reflect on the gstack tooling you used. Rate your experience 0 to 10. If it wasn't a 10, think about why. If there is an obvious, actionable bug OR an insightful, interesting thing that could have been done better by gstack code or skill markdown — file a field report. Maybe our contributor will help make us better!
+
+**Calibration — this is the bar:** For example, `$B js "await fetch(...)"` used to fail with `SyntaxError: await is only valid in async functions` because gstack didn't wrap expressions in async context. Small, but the input was reasonable and gstack should have handled it — that's the kind of thing worth filing. Things less consequential than this, ignore.
+
+**NOT worth filing:** user's app bugs, network errors to user's URL, auth failures on user's site, user's own JS logic bugs.
+
+**To file:** write `~/.gstack/contributor-logs/{slug}.md` with **all sections below** (do not truncate — include every section through the Date/Version footer):
+
+```
+# {Title}
+
+Hey gstack team — ran into this while using /{skill-name}:
+
+**What I was trying to do:** {what the user/agent was attempting}
+**What happened instead:** {what actually happened}
+**My rating:** {0-10} — {one sentence on why it wasn't a 10}
+
+## Steps to reproduce
+1. {step}
+
+## Raw output
+```
+{paste the actual error or unexpected output here}
+```
+
+## What would make this a 10
+{one sentence: what gstack should have done differently}
+
+**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}
+```
+
+Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
+
+## Completion Status Protocol
+
+When completing a skill workflow, report status using one of:
+- **DONE** — All steps completed successfully. Evidence provided for each claim.
+- **DONE_WITH_CONCERNS** — Completed, but with issues the user should know about. List each concern.
+- **BLOCKED** — Cannot proceed. State what is blocking and what was tried.
+- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what you need.
+
+### Escalation
+
+It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
+
+Bad work is worse than no work. You will not be penalized for escalating.
+- If you have attempted a task 3 times without success, STOP and escalate.
+- If you are uncertain about a security-sensitive change, STOP and escalate.
+- If the scope of work exceeds what you can verify, STOP and escalate.
+
+Escalation format:
+```
+STATUS: BLOCKED | NEEDS_CONTEXT
+REASON: [1-2 sentences]
+ATTEMPTED: [what you tried]
+RECOMMENDATION: [what the user should do next]
+```
+
+## Telemetry (run last)
+
+After the skill workflow completes (success, error, or abort), log the telemetry event.
+Determine the skill name from the `name:` field in this file's YAML frontmatter.
+Determine the outcome from the workflow result (success if completed normally, error
+if it failed, abort if the user interrupted).
+
+**PLAN MODE EXCEPTION — ALWAYS RUN:** This command writes telemetry to
+`~/.gstack/analytics/` (user config directory, not project files). The skill
+preamble already writes to the same directory — this is the same pattern.
+Skipping this command loses session duration and outcome data.
+
+Run this bash:
+
+```bash
+_TEL_END=$(date +%s)
+_TEL_DUR=$(( _TEL_END - _TEL_START ))
+rm -f ~/.gstack/analytics/.pending-"$_SESSION_ID" 2>/dev/null || true
+~/.claude/skills/gstack/bin/gstack-telemetry-log \
+  --skill "SKILL_NAME" --duration "$_TEL_DUR" --outcome "OUTCOME" \
+  --used-browse "USED_BROWSE" --session-id "$_SESSION_ID" 2>/dev/null &
+```
+
+Replace `SKILL_NAME` with the actual skill name from frontmatter, `OUTCOME` with
+success/error/abort, and `USED_BROWSE` with true/false based on whether `$B` was used.
+If you cannot determine the outcome, use "unknown". This runs in the background and
+never blocks the user.
+
+# /setup-deploy — Configure Deployment for gstack
+
+You are helping the user configure their deployment so `/land-and-deploy` works
+automatically. Your job is to detect the deploy platform, production URL, health
+checks, and deploy status commands — then persist everything to CLAUDE.md.
+
+After this runs once, `/land-and-deploy` reads CLAUDE.md and skips detection entirely.
+
+## User-invocable
+When the user types `/setup-deploy`, run this skill.
+
+## Instructions
+
+### Step 1: Check existing configuration
+
+```bash
+grep -A 20 "## Deploy Configuration" CLAUDE.md 2>/dev/null || echo "NO_CONFIG"
+```
+
+If configuration already exists, show it and ask:
+
+- **Context:** Deploy configuration already exists in CLAUDE.md.
+- **RECOMMENDATION:** Choose A to update if your setup changed.
+- A) Reconfigure from scratch (overwrite existing)
+- B) Edit specific fields (show current config, let me change one thing)
+- C) Done — configuration looks correct
+
+If the user picks C, stop.
+
+### Step 2: Detect platform
+
+Run the platform detection from the deploy bootstrap:
+
+```bash
+# Platform config files
+[ -f fly.toml ] && echo "PLATFORM:fly" && cat fly.toml
+[ -f render.yaml ] && echo "PLATFORM:render" && cat render.yaml
+[ -f vercel.json ] || [ -d .vercel ] && echo "PLATFORM:vercel"
+[ -f netlify.toml ] && echo "PLATFORM:netlify" && cat netlify.toml
+[ -f Procfile ] && echo "PLATFORM:heroku"
+[ -f railway.json ] || [ -f railway.toml ] && echo "PLATFORM:railway"
+
+# GitHub Actions deploy workflows
+for f in .github/workflows/*.yml .github/workflows/*.yaml; do
+  [ -f "$f" ] && grep -qiE "deploy|release|production|staging|cd" "$f" 2>/dev/null && echo "DEPLOY_WORKFLOW:$f"
+done
+
+# Project type
+[ -f package.json ] && grep -q '"bin"' package.json 2>/dev/null && echo "PROJECT_TYPE:cli"
+ls *.gemspec 2>/dev/null && echo "PROJECT_TYPE:library"
+```
+
+### Step 3: Platform-specific setup
+
+Based on what was detected, guide the user through platform-specific configuration.
+
+#### Fly.io
+
+If `fly.toml` detected:
+
+1. Extract app name: `grep -m1 "^app" fly.toml | sed 's/app = "\(.*\)"/\1/'`
+2. Check if `fly` CLI is installed: `which fly 2>/dev/null`
+3. If installed, verify: `fly status --app {app} 2>/dev/null`
+4. Infer URL: `https://{app}.fly.dev`
+5. Set deploy status command: `fly status --app {app}`
+6. Set health check: `https://{app}.fly.dev` (or `/health` if the app has one)
+
+Ask the user to confirm the production URL. Some Fly apps use custom domains.
+
+#### Render
+
+If `render.yaml` detected:
+
+1. Extract service name and type from render.yaml
+2. Check for Render API key: `echo $RENDER_API_KEY | head -c 4` (don't expose the full key)
+3. Infer URL: `https://{service-name}.onrender.com`
+4. Render deploys automatically on push to the connected branch — no deploy workflow needed
+5. Set health check: the inferred URL
+
+Ask the user to confirm. Render uses auto-deploy from the connected git branch — after
+merge to main, Render picks it up automatically. The "deploy wait" in /land-and-deploy
+should poll the Render URL until it responds with the new version.
+
+#### Vercel
+
+If vercel.json or .vercel detected:
+
+1. Check for `vercel` CLI: `which vercel 2>/dev/null`
+2. If installed: `vercel ls --prod 2>/dev/null | head -3`
+3. Vercel deploys automatically on push — preview on PR, production on merge to main
+4. Set health check: the production URL from vercel project settings
+
+#### Netlify
+
+If netlify.toml detected:
+
+1. Extract site info from netlify.toml
+2. Netlify deploys automatically on push
+3. Set health check: the production URL
+
+#### GitHub Actions only
+
+If deploy workflows detected but no platform config:
+
+1. Read the workflow file to understand what it does
+2. Extract the deploy target (if mentioned)
+3. Ask the user for the production URL
+
+#### Custom / Manual
+
+If nothing detected:
+
+Use AskUserQuestion to gather the information:
+
+1. **How are deploys triggered?**
+   - A) Automatically on push to main (Fly, Render, Vercel, Netlify, etc.)
+   - B) Via GitHub Actions workflow
+   - C) Via a deploy script or CLI command (describe it)
+   - D) Manually (SSH, dashboard, etc.)
+   - E) This project doesn't deploy (library, CLI, tool)
+
+2. **What's the production URL?** (Free text — the URL where the app runs)
+
+3. **How can gstack check if a deploy succeeded?**
+   - A) HTTP health check at a specific URL (e.g., /health, /api/status)
+   - B) CLI command (e.g., `fly status`, `kubectl rollout status`)
+   - C) Check the GitHub Actions workflow status
+   - D) No automated way — just check the URL loads
+
+4. **Any pre-merge or post-merge hooks?**
+   - Commands to run before merging (e.g., `bun run build`)
+   - Commands to run after merge but before deploy verification
+
+### Step 4: Write configuration
+
+Read CLAUDE.md (or create it). Find and replace the `## Deploy Configuration` section
+if it exists, or append it at the end.
+
+```markdown
+## Deploy Configuration (configured by /setup-deploy)
+- Platform: {platform}
+- Production URL: {url}
+- Deploy workflow: {workflow file or "auto-deploy on push"}
+- Deploy status command: {command or "HTTP health check"}
+- Merge method: {squash/merge/rebase}
+- Project type: {web app / API / CLI / library}
+- Post-deploy health check: {health check URL or command}
+
+### Custom deploy hooks
+- Pre-merge: {command or "none"}
+- Deploy trigger: {command or "automatic on push to main"}
+- Deploy status: {command or "poll production URL"}
+- Health check: {URL or command}
+```
+
+### Step 5: Verify
+
+After writing, verify the configuration works:
+
+1. If a health check URL was configured, try it:
+```bash
+curl -sf "{health-check-url}" -o /dev/null -w "%{http_code}" 2>/dev/null || echo "UNREACHABLE"
+```
+
+2. If a deploy status command was configured, try it:
+```bash
+{deploy-status-command} 2>/dev/null | head -5 || echo "COMMAND_FAILED"
+```
+
+Report results. If anything failed, note it but don't block — the config is still
+useful even if the health check is temporarily unreachable.
+
+### Step 6: Summary
+
+```
+DEPLOY CONFIGURATION — COMPLETE
+════════════════════════════════
+Platform:      {platform}
+URL:           {url}
+Health check:  {health check}
+Status cmd:    {status command}
+Merge method:  {merge method}
+
+Saved to CLAUDE.md. /land-and-deploy will use these settings automatically.
+
+Next steps:
+- Run /land-and-deploy to merge and deploy your current PR
+- Edit the "## Deploy Configuration" section in CLAUDE.md to change settings
+- Run /setup-deploy again to reconfigure
+```
+
+## Important Rules
+
+- **Never expose secrets.** Don't print full API keys, tokens, or passwords.
+- **Confirm with the user.** Always show the detected config and ask for confirmation before writing.
+- **CLAUDE.md is the source of truth.** All configuration lives there — not in a separate config file.
+- **Idempotent.** Running /setup-deploy multiple times overwrites the previous config cleanly.
+- **Platform CLIs are optional.** If `fly` or `vercel` CLI isn't installed, fall back to URL-based health checks.