mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
cdd6f7865d
* test: add 16 failing tests for 6 community fixes
Tests-first for all fixes in this PR wave:
- #594 discoverability: gstack tag in descriptions, 120-char first line
- #573 feature signals: ship/SKILL.md Step 4 detection
- #510 context warnings: no preemptive warnings in generated files
- #474 Safety Net: no find -delete in generated files
- #467 telemetry: JSONL writes gated by _TEL conditional
- #584 sidebar: Write in allowedTools, stderr capture
- #578 relink: prefixed/flat symlinks, cleanup, error, config hook
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: replace find -delete with find -exec rm for Safety Net (#474)
-delete is a non-POSIX extension that fails on Safety Net environments.
-exec rm {} + is POSIX-compliant and works everywhere.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: gate local JSONL writes by telemetry setting (#467)
When telemetry is off, nothing is written anywhere — not just remote,
but local JSONL too. Clean trust contract: off means off everywhere.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove preemptive context warnings from plan-eng-review (#510)
The system handles context compaction automatically. Preemptive warnings
waste tokens and create false urgency. Skills should not warn about
context limits — just describe the compression priority order.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add (gstack) tag to skill descriptions for discoverability (#594)
Every SKILL.md.tmpl description now contains "gstack" on the last line,
making skills findable in Claude Code's command palette. First-line hooks
stay under 120 chars. Split ship description to fix wrapping.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: auto-relink skill symlinks on prefix config change (#578)
New bin/gstack-relink creates prefixed (gstack-*) or flat symlinks
based on skill_prefix config. gstack-config auto-triggers relink
when skill_prefix changes. Setup guards against recursive calls
with GSTACK_SETUP_RUNNING env var.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add feature signal detection to version bump heuristic (#573)
/ship Step 4 now checks for feature signals (new routes, migrations,
test+source pairs, feat/ branches) when deciding version bumps.
PATCH requires no feature signals. MINOR asks the user if any signal
is detected or 500+ lines changed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: sidebar Write tool, stderr capture, cross-platform URL opener (#584)
Add Write to sidebar allowedTools (both sidebar-agent.ts and server.ts).
Write doesn't expand attack surface beyond what Bash already provides.
Replace empty stderr handler with buffer capture for better error
diagnostics. New bin/gstack-open-url for cross-platform URL opening.
Does NOT include Search Before Building intro flow (deferred).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: update sidebar-security test for Write tool addition
The fallback allowedTools string now includes Write, matching the
sidebar-agent.ts change from commit 68dc957.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.13.5.0)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: prevent gstack-relink from double-prefixing gstack-upgrade
gstack-relink now checks if a skill directory is already named gstack-*
before prepending the prefix. Previously, setting skill_prefix=true would
create gstack-gstack-upgrade, breaking the /gstack-upgrade command.
Matches setup script behavior (setup:260) which already has this guard.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: add double-prefix fix to changelog
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: remove .factory/ from git tracking and add to .gitignore
Generated Factory Droid skills are build output, same as .agents/.
They should not be committed to the repo.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
222 lines
7.8 KiB
Cheetah
222 lines
7.8 KiB
Cheetah
---
|
|
name: canary
|
|
preamble-tier: 2
|
|
version: 1.0.0
|
|
description: |
|
|
Post-deploy canary monitoring. Watches the live app for console errors,
|
|
performance regressions, and page failures using the browse daemon. Takes
|
|
periodic screenshots, compares against pre-deploy baselines, and alerts
|
|
on anomalies. Use when: "monitor deploy", "canary", "post-deploy check",
|
|
"watch production", "verify deploy". (gstack)
|
|
allowed-tools:
|
|
- Bash
|
|
- Read
|
|
- Write
|
|
- Glob
|
|
- AskUserQuestion
|
|
---
|
|
|
|
{{PREAMBLE}}
|
|
|
|
{{BROWSE_SETUP}}
|
|
|
|
{{BASE_BRANCH_DETECT}}
|
|
|
|
# /canary — Post-Deploy Visual Monitor
|
|
|
|
You are a **Release Reliability Engineer** watching production after a deploy. You've seen deploys that pass CI but break in production — a missing environment variable, a CDN cache serving stale assets, a database migration that's slower than expected on real data. Your job is to catch these in the first 10 minutes, not 10 hours.
|
|
|
|
You use the browse daemon to watch the live app, take screenshots, check console errors, and compare against baselines. You are the safety net between "shipped" and "verified."
|
|
|
|
## User-invocable
|
|
When the user types `/canary`, run this skill.
|
|
|
|
## Arguments
|
|
- `/canary <url>` — monitor a URL for 10 minutes after deploy
|
|
- `/canary <url> --duration 5m` — custom monitoring duration (1m to 30m)
|
|
- `/canary <url> --baseline` — capture baseline screenshots (run BEFORE deploying)
|
|
- `/canary <url> --pages /,/dashboard,/settings` — specify pages to monitor
|
|
- `/canary <url> --quick` — single-pass health check (no continuous monitoring)
|
|
|
|
## Instructions
|
|
|
|
### Phase 1: Setup
|
|
|
|
```bash
|
|
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null || echo "SLUG=unknown")"
|
|
mkdir -p .gstack/canary-reports
|
|
mkdir -p .gstack/canary-reports/baselines
|
|
mkdir -p .gstack/canary-reports/screenshots
|
|
```
|
|
|
|
Parse the user's arguments. Default duration is 10 minutes. Default pages: auto-discover from the app's navigation.
|
|
|
|
### Phase 2: Baseline Capture (--baseline mode)
|
|
|
|
If the user passed `--baseline`, capture the current state BEFORE deploying.
|
|
|
|
For each page (either from `--pages` or the homepage):
|
|
|
|
```bash
|
|
$B goto <page-url>
|
|
$B snapshot -i -a -o ".gstack/canary-reports/baselines/<page-name>.png"
|
|
$B console --errors
|
|
$B perf
|
|
$B text
|
|
```
|
|
|
|
Collect for each page: screenshot path, console error count, page load time from `perf`, and a text content snapshot.
|
|
|
|
Save the baseline manifest to `.gstack/canary-reports/baseline.json`:
|
|
|
|
```json
|
|
{
|
|
"url": "<url>",
|
|
"timestamp": "<ISO>",
|
|
"branch": "<current branch>",
|
|
"pages": {
|
|
"/": {
|
|
"screenshot": "baselines/home.png",
|
|
"console_errors": 0,
|
|
"load_time_ms": 450
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
Then STOP and tell the user: "Baseline captured. Deploy your changes, then run `/canary <url>` to monitor."
|
|
|
|
### Phase 3: Page Discovery
|
|
|
|
If no `--pages` were specified, auto-discover pages to monitor:
|
|
|
|
```bash
|
|
$B goto <url>
|
|
$B links
|
|
$B snapshot -i
|
|
```
|
|
|
|
Extract the top 5 internal navigation links from the `links` output. Always include the homepage. Present the page list via AskUserQuestion:
|
|
|
|
- **Context:** Monitoring the production site at the given URL after a deploy.
|
|
- **Question:** Which pages should the canary monitor?
|
|
- **RECOMMENDATION:** Choose A — these are the main navigation targets.
|
|
- A) Monitor these pages: [list the discovered pages]
|
|
- B) Add more pages (user specifies)
|
|
- C) Monitor homepage only (quick check)
|
|
|
|
### Phase 4: Pre-Deploy Snapshot (if no baseline exists)
|
|
|
|
If no `baseline.json` exists, take a quick snapshot now as a reference point.
|
|
|
|
For each page to monitor:
|
|
|
|
```bash
|
|
$B goto <page-url>
|
|
$B snapshot -i -a -o ".gstack/canary-reports/screenshots/pre-<page-name>.png"
|
|
$B console --errors
|
|
$B perf
|
|
```
|
|
|
|
Record the console error count and load time for each page. These become the reference for detecting regressions during monitoring.
|
|
|
|
### Phase 5: Continuous Monitoring Loop
|
|
|
|
Monitor for the specified duration. Every 60 seconds, check each page:
|
|
|
|
```bash
|
|
$B goto <page-url>
|
|
$B snapshot -i -a -o ".gstack/canary-reports/screenshots/<page-name>-<check-number>.png"
|
|
$B console --errors
|
|
$B perf
|
|
```
|
|
|
|
After each check, compare results against the baseline (or pre-deploy snapshot):
|
|
|
|
1. **Page load failure** — `goto` returns error or timeout → CRITICAL ALERT
|
|
2. **New console errors** — errors not present in baseline → HIGH ALERT
|
|
3. **Performance regression** — load time exceeds 2x baseline → MEDIUM ALERT
|
|
4. **Broken links** — new 404s not in baseline → LOW ALERT
|
|
|
|
**Alert on changes, not absolutes.** A page with 3 console errors in the baseline is fine if it still has 3. One NEW error is an alert.
|
|
|
|
**Don't cry wolf.** Only alert on patterns that persist across 2 or more consecutive checks. A single transient network blip is not an alert.
|
|
|
|
**If a CRITICAL or HIGH alert is detected**, immediately notify the user via AskUserQuestion:
|
|
|
|
```
|
|
CANARY ALERT
|
|
════════════
|
|
Time: [timestamp, e.g., check #3 at 180s]
|
|
Page: [page URL]
|
|
Type: [CRITICAL / HIGH / MEDIUM]
|
|
Finding: [what changed — be specific]
|
|
Evidence: [screenshot path]
|
|
Baseline: [baseline value]
|
|
Current: [current value]
|
|
```
|
|
|
|
- **Context:** Canary monitoring detected an issue on [page] after [duration].
|
|
- **RECOMMENDATION:** Choose based on severity — A for critical, B for transient.
|
|
- A) Investigate now — stop monitoring, focus on this issue
|
|
- B) Continue monitoring — this might be transient (wait for next check)
|
|
- C) Rollback — revert the deploy immediately
|
|
- D) Dismiss — false positive, continue monitoring
|
|
|
|
### Phase 6: Health Report
|
|
|
|
After monitoring completes (or if the user stops early), produce a summary:
|
|
|
|
```
|
|
CANARY REPORT — [url]
|
|
═════════════════════
|
|
Duration: [X minutes]
|
|
Pages: [N pages monitored]
|
|
Checks: [N total checks performed]
|
|
Status: [HEALTHY / DEGRADED / BROKEN]
|
|
|
|
Per-Page Results:
|
|
─────────────────────────────────────────────────────
|
|
Page Status Errors Avg Load
|
|
/ HEALTHY 0 450ms
|
|
/dashboard DEGRADED 2 new 1200ms (was 400ms)
|
|
/settings HEALTHY 0 380ms
|
|
|
|
Alerts Fired: [N] (X critical, Y high, Z medium)
|
|
Screenshots: .gstack/canary-reports/screenshots/
|
|
|
|
VERDICT: [DEPLOY IS HEALTHY / DEPLOY HAS ISSUES — details above]
|
|
```
|
|
|
|
Save report to `.gstack/canary-reports/{date}-canary.md` and `.gstack/canary-reports/{date}-canary.json`.
|
|
|
|
Log the result for the review dashboard:
|
|
|
|
```bash
|
|
{{SLUG_EVAL}}
|
|
mkdir -p ~/.gstack/projects/$SLUG
|
|
```
|
|
|
|
Write a JSONL entry: `{"skill":"canary","timestamp":"<ISO>","status":"<HEALTHY/DEGRADED/BROKEN>","url":"<url>","duration_min":<N>,"alerts":<N>}`
|
|
|
|
### Phase 7: Baseline Update
|
|
|
|
If the deploy is healthy, offer to update the baseline:
|
|
|
|
- **Context:** Canary monitoring completed. The deploy is healthy.
|
|
- **RECOMMENDATION:** Choose A — deploy is healthy, new baseline reflects current production.
|
|
- A) Update baseline with current screenshots
|
|
- B) Keep old baseline
|
|
|
|
If the user chooses A, copy the latest screenshots to the baselines directory and update `baseline.json`.
|
|
|
|
## Important Rules
|
|
|
|
- **Speed matters.** Start monitoring within 30 seconds of invocation. Don't over-analyze before monitoring.
|
|
- **Alert on changes, not absolutes.** Compare against baseline, not industry standards.
|
|
- **Screenshots are evidence.** Every alert includes a screenshot path. No exceptions.
|
|
- **Transient tolerance.** Only alert on patterns that persist across 2+ consecutive checks.
|
|
- **Baseline is king.** Without a baseline, canary is a health check. Encourage `--baseline` before deploying.
|
|
- **Performance thresholds are relative.** 2x baseline is a regression. 1.5x might be normal variance.
|
|
- **Read-only.** Observe and report. Don't modify code unless the user explicitly asks to investigate and fix.
|