From 4fe0ce9cba4b367a36004720cddb952172e7949d Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Wed, 18 Mar 2026 23:08:04 -0500 Subject: [PATCH 1/4] feat: natural language skill routing + proactive suggestions (v0.7.1) (#195) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat: add trigger phrases to /debug and /office-hours These two skills had zero "Use when asked to..." phrases, making them completely invisible to natural language. Users saying "debug this" or "brainstorm an idea" would get no skill invocation. * feat: add proactive triggers to all workflow skills Every skill now has "Proactively suggest when..." language so Claude surfaces skills at natural moments — not just when the user says specific trigger phrases. * feat: lifecycle map + proactive preference system Root gstack description now includes a developer workflow guide mapping 12 stages to skills. Preamble reads proactive preference via gstack-config. Users can opt out with "stop suggesting things" and re-enable with "be proactive again" — natural language toggle, no CLI needed. * test: 11 journey-stage E2E routing tests + trigger phrase validation Each test simulates a real development stage (ideation, plan review, debug, QA, ship, retro...) with realistic project context and verifies the right skill fires from natural language alone. 11/11 pass. * chore: bump version and changelog (v0.7.1) Co-Authored-By: Claude Opus 4.6 --------- Co-authored-by: Claude Opus 4.6 --- CHANGELOG.md | 14 + SKILL.md | 31 ++ SKILL.md.tmpl | 26 ++ VERSION | 2 +- browse/SKILL.md | 5 + debug/SKILL.md | 9 + debug/SKILL.md.tmpl | 4 + design-consultation/SKILL.md | 7 + design-consultation/SKILL.md.tmpl | 2 + design-review/SKILL.md | 7 + design-review/SKILL.md.tmpl | 2 + document-release/SKILL.md | 6 + document-release/SKILL.md.tmpl | 1 + office-hours/SKILL.md | 9 + office-hours/SKILL.md.tmpl | 4 + package.json | 10 +- plan-ceo-review/SKILL.md | 7 + plan-ceo-review/SKILL.md.tmpl | 2 + plan-design-review/SKILL.md | 7 + plan-design-review/SKILL.md.tmpl | 2 + plan-eng-review/SKILL.md | 7 + plan-eng-review/SKILL.md.tmpl | 2 + qa-only/SKILL.md | 6 + qa-only/SKILL.md.tmpl | 1 + qa/SKILL.md | 9 +- qa/SKILL.md.tmpl | 4 +- retro/SKILL.md | 6 + retro/SKILL.md.tmpl | 1 + review/SKILL.md | 6 + review/SKILL.md.tmpl | 1 + scripts/gen-skill-docs.ts | 5 + setup-browser-cookies/SKILL.md | 5 + ship/SKILL.md | 6 + ship/SKILL.md.tmpl | 1 + test/helpers/touchfiles.ts | 13 + test/skill-routing-e2e.test.ts | 605 ++++++++++++++++++++++++++++++ test/skill-validation.test.ts | 37 ++ test/touchfiles.test.ts | 9 +- 38 files changed, 870 insertions(+), 11 deletions(-) create mode 100644 test/skill-routing-e2e.test.ts diff --git a/CHANGELOG.md b/CHANGELOG.md index 30dbcc5a..f1790add 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,19 @@ # Changelog +## [0.7.1] - 2026-03-19 + +### Added + +- **gstack now suggests skills at natural moments.** You don't need to know slash commands — just talk about what you're doing. Brainstorming an idea? gstack suggests `/office-hours`. Something's broken? It suggests `/debug`. Ready to deploy? It suggests `/ship`. Every workflow skill now has proactive triggers that fire when the moment is right. +- **Lifecycle map.** gstack's root skill description now includes a developer workflow guide mapping 12 stages (brainstorm → plan → review → code → debug → test → ship → docs → retro) to the right skill. Claude sees this in every session. +- **Opt-out with natural language.** If proactive suggestions feel too aggressive, just say "stop suggesting things" — gstack remembers across sessions. Say "be proactive again" to re-enable. +- **11 journey-stage E2E tests.** Each test simulates a real moment in the developer lifecycle with realistic project context (plan.md, error logs, git history, code) and verifies the right skill fires from natural language alone. 11/11 pass. +- **Trigger phrase validation.** Static tests verify every workflow skill has "Use when" and "Proactively suggest" phrases — catches regressions for free. + +### Fixed + +- `/debug` and `/office-hours` were completely invisible to natural language — no trigger phrases at all. Now both have full reactive + proactive triggers. + ## [0.7.0] - 2026-03-18 — YC Office Hours **`/office-hours` — sit down with a YC partner before you write a line of code.** diff --git a/SKILL.md b/SKILL.md index e12be621..dd06f058 100644 --- a/SKILL.md +++ b/SKILL.md @@ -7,6 +7,32 @@ description: | responsive layouts, test forms and uploads, handle dialogs, and assert element states. ~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a user flow, or file a bug with evidence. + + gstack also includes development workflow skills. When you notice the user is at + these stages, suggest the appropriate skill: + - Brainstorming a new idea → suggest /office-hours + - Reviewing a plan (strategy) → suggest /plan-ceo-review + - Reviewing a plan (architecture) → suggest /plan-eng-review + - Reviewing a plan (design) → suggest /plan-design-review + - Creating a design system → suggest /design-consultation + - Debugging errors → suggest /debug + - Testing the app → suggest /qa + - Code review before merge → suggest /review + - Visual design audit → suggest /design-review + - Ready to deploy / create PR → suggest /ship + - Post-ship doc updates → suggest /document-release + - Weekly retrospective → suggest /retro + + If the user pushes back on skill suggestions ("stop suggesting things", + "I don't need suggestions", "too aggressive"): + 1. Stop suggesting for the rest of this session + 2. Run: gstack-config set proactive false + 3. Say: "Got it — I'll stop suggesting skills. Just tell me to be proactive + again if you change your mind." + + If the user says "be proactive again" or "turn on suggestions": + 1. Run: gstack-config set proactive true + 2. Say: "Proactive suggestions are back on." allowed-tools: - Bash - Read @@ -30,8 +56,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/SKILL.md.tmpl b/SKILL.md.tmpl index dd2b2345..7aacdb29 100644 --- a/SKILL.md.tmpl +++ b/SKILL.md.tmpl @@ -7,6 +7,32 @@ description: | responsive layouts, test forms and uploads, handle dialogs, and assert element states. ~100ms per command. Use when you need to test a feature, verify a deployment, dogfood a user flow, or file a bug with evidence. + + gstack also includes development workflow skills. When you notice the user is at + these stages, suggest the appropriate skill: + - Brainstorming a new idea → suggest /office-hours + - Reviewing a plan (strategy) → suggest /plan-ceo-review + - Reviewing a plan (architecture) → suggest /plan-eng-review + - Reviewing a plan (design) → suggest /plan-design-review + - Creating a design system → suggest /design-consultation + - Debugging errors → suggest /debug + - Testing the app → suggest /qa + - Code review before merge → suggest /review + - Visual design audit → suggest /design-review + - Ready to deploy / create PR → suggest /ship + - Post-ship doc updates → suggest /document-release + - Weekly retrospective → suggest /retro + + If the user pushes back on skill suggestions ("stop suggesting things", + "I don't need suggestions", "too aggressive"): + 1. Stop suggesting for the rest of this session + 2. Run: gstack-config set proactive false + 3. Say: "Got it — I'll stop suggesting skills. Just tell me to be proactive + again if you change your mind." + + If the user says "be proactive again" or "turn on suggestions": + 1. Run: gstack-config set proactive true + 2. Say: "Proactive suggestions are back on." allowed-tools: - Bash - Read diff --git a/VERSION b/VERSION index faef31a4..39e898a4 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.7.0 +0.7.1 diff --git a/browse/SKILL.md b/browse/SKILL.md index bf695d3b..3c452c84 100644 --- a/browse/SKILL.md +++ b/browse/SKILL.md @@ -31,8 +31,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/debug/SKILL.md b/debug/SKILL.md index 4448453a..c1314556 100644 --- a/debug/SKILL.md +++ b/debug/SKILL.md @@ -4,6 +4,10 @@ version: 1.0.0 description: | Systematic debugging with root cause investigation. Four phases: investigate, analyze, hypothesize, implement. Iron Law: no fixes without root cause. + Use when asked to "debug this", "fix this bug", "why is this broken", + "investigate this error", or "root cause analysis". + Proactively suggest when the user reports errors, unexpected behavior, or + is troubleshooting why something stopped working. allowed-tools: - Bash - Read @@ -30,8 +34,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/debug/SKILL.md.tmpl b/debug/SKILL.md.tmpl index 312d2420..90fc5bdc 100644 --- a/debug/SKILL.md.tmpl +++ b/debug/SKILL.md.tmpl @@ -4,6 +4,10 @@ version: 1.0.0 description: | Systematic debugging with root cause investigation. Four phases: investigate, analyze, hypothesize, implement. Iron Law: no fixes without root cause. + Use when asked to "debug this", "fix this bug", "why is this broken", + "investigate this error", or "root cause analysis". + Proactively suggest when the user reports errors, unexpected behavior, or + is troubleshooting why something stopped working. allowed-tools: - Bash - Read diff --git a/design-consultation/SKILL.md b/design-consultation/SKILL.md index c5c5bc29..31cbf815 100644 --- a/design-consultation/SKILL.md +++ b/design-consultation/SKILL.md @@ -7,6 +7,8 @@ description: | generates font+color preview pages. Creates DESIGN.md as your project's design source of truth. For existing sites, use /plan-design-review to infer the system instead. Use when asked to "design system", "brand guidelines", or "create DESIGN.md". + Proactively suggest when starting a new project's UI with no existing + design system or DESIGN.md. allowed-tools: - Bash - Read @@ -34,8 +36,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/design-consultation/SKILL.md.tmpl b/design-consultation/SKILL.md.tmpl index 2bc67255..2532126c 100644 --- a/design-consultation/SKILL.md.tmpl +++ b/design-consultation/SKILL.md.tmpl @@ -7,6 +7,8 @@ description: | generates font+color preview pages. Creates DESIGN.md as your project's design source of truth. For existing sites, use /plan-design-review to infer the system instead. Use when asked to "design system", "brand guidelines", or "create DESIGN.md". + Proactively suggest when starting a new project's UI with no existing + design system or DESIGN.md. allowed-tools: - Bash - Read diff --git a/design-review/SKILL.md b/design-review/SKILL.md index 473e419b..dd7fced1 100644 --- a/design-review/SKILL.md +++ b/design-review/SKILL.md @@ -7,6 +7,8 @@ description: | in source code, committing each fix atomically and re-verifying with before/after screenshots. For plan-mode design review (before implementation), use /plan-design-review. Use when asked to "audit the design", "visual QA", "check if it looks good", or "design polish". + Proactively suggest when the user mentions visual inconsistencies or + wants to polish the look of a live site. allowed-tools: - Bash - Read @@ -34,8 +36,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/design-review/SKILL.md.tmpl b/design-review/SKILL.md.tmpl index f60a9c41..24fe160c 100644 --- a/design-review/SKILL.md.tmpl +++ b/design-review/SKILL.md.tmpl @@ -7,6 +7,8 @@ description: | in source code, committing each fix atomically and re-verifying with before/after screenshots. For plan-mode design review (before implementation), use /plan-design-review. Use when asked to "audit the design", "visual QA", "check if it looks good", or "design polish". + Proactively suggest when the user mentions visual inconsistencies or + wants to polish the look of a live site. allowed-tools: - Bash - Read diff --git a/document-release/SKILL.md b/document-release/SKILL.md index 88af49fb..4831573b 100644 --- a/document-release/SKILL.md +++ b/document-release/SKILL.md @@ -6,6 +6,7 @@ description: | diff, updates README/ARCHITECTURE/CONTRIBUTING/CLAUDE.md to match what shipped, polishes CHANGELOG voice, cleans up TODOS, and optionally bumps VERSION. Use when asked to "update the docs", "sync documentation", or "post-ship docs". + Proactively suggest after a PR is merged or code is shipped. allowed-tools: - Bash - Read @@ -32,8 +33,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/document-release/SKILL.md.tmpl b/document-release/SKILL.md.tmpl index 2cd8d117..0cd1bd57 100644 --- a/document-release/SKILL.md.tmpl +++ b/document-release/SKILL.md.tmpl @@ -6,6 +6,7 @@ description: | diff, updates README/ARCHITECTURE/CONTRIBUTING/CLAUDE.md to match what shipped, polishes CHANGELOG voice, cleans up TODOS, and optionally bumps VERSION. Use when asked to "update the docs", "sync documentation", or "post-ship docs". + Proactively suggest after a PR is merged or code is shipped. allowed-tools: - Bash - Read diff --git a/office-hours/SKILL.md b/office-hours/SKILL.md index fec01e26..da59e1ff 100644 --- a/office-hours/SKILL.md +++ b/office-hours/SKILL.md @@ -6,6 +6,10 @@ description: | demand reality, status quo, desperate specificity, narrowest wedge, observation, and future-fit. Builder mode: design thinking brainstorming for side projects, hackathons, learning, and open source. Saves a design doc. + Use when asked to "brainstorm this", "I have an idea", "help me think through + this", "office hours", or "is this worth building". + Proactively suggest when the user describes a new product idea or is exploring + whether something is worth building — before any code is written. Use before /plan-ceo-review or /plan-eng-review. allowed-tools: - Bash @@ -33,8 +37,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/office-hours/SKILL.md.tmpl b/office-hours/SKILL.md.tmpl index 4eec04b6..03a8302c 100644 --- a/office-hours/SKILL.md.tmpl +++ b/office-hours/SKILL.md.tmpl @@ -6,6 +6,10 @@ description: | demand reality, status quo, desperate specificity, narrowest wedge, observation, and future-fit. Builder mode: design thinking brainstorming for side projects, hackathons, learning, and open source. Saves a design doc. + Use when asked to "brainstorm this", "I have an idea", "help me think through + this", "office hours", or "is this worth building". + Proactively suggest when the user describes a new product idea or is exploring + whether something is worth building — before any code is written. Use before /plan-ceo-review or /plan-eng-review. allowed-tools: - Bash diff --git a/package.json b/package.json index ff8b5870..1c580144 100644 --- a/package.json +++ b/package.json @@ -12,11 +12,11 @@ "gen:skill-docs": "bun run scripts/gen-skill-docs.ts", "dev": "bun run browse/src/cli.ts", "server": "bun run browse/src/server.ts", - "test": "bun test browse/test/ test/ --ignore test/skill-e2e.test.ts --ignore test/skill-llm-eval.test.ts", - "test:evals": "EVALS=1 bun test test/skill-llm-eval.test.ts test/skill-e2e.test.ts", - "test:evals:all": "EVALS=1 EVALS_ALL=1 bun test test/skill-llm-eval.test.ts test/skill-e2e.test.ts", - "test:e2e": "EVALS=1 bun test test/skill-e2e.test.ts", - "test:e2e:all": "EVALS=1 EVALS_ALL=1 bun test test/skill-e2e.test.ts", + "test": "bun test browse/test/ test/ --ignore test/skill-e2e.test.ts --ignore test/skill-llm-eval.test.ts --ignore test/skill-routing-e2e.test.ts", + "test:evals": "EVALS=1 bun test test/skill-llm-eval.test.ts test/skill-e2e.test.ts test/skill-routing-e2e.test.ts", + "test:evals:all": "EVALS=1 EVALS_ALL=1 bun test test/skill-llm-eval.test.ts test/skill-e2e.test.ts test/skill-routing-e2e.test.ts", + "test:e2e": "EVALS=1 bun test test/skill-e2e.test.ts test/skill-routing-e2e.test.ts", + "test:e2e:all": "EVALS=1 EVALS_ALL=1 bun test test/skill-e2e.test.ts test/skill-routing-e2e.test.ts", "skill:check": "bun run scripts/skill-check.ts", "dev:skill": "bun run scripts/dev-skill.ts", "start": "bun run browse/src/server.ts", diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index 24a18674..ce0395b0 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -8,6 +8,8 @@ description: | expansions), HOLD SCOPE (maximum rigor), SCOPE REDUCTION (strip to essentials). Use when asked to "think bigger", "expand scope", "strategy review", "rethink this", or "is this ambitious enough". + Proactively suggest when the user is questioning scope or ambition of a plan, + or when the plan feels like it could be thinking bigger. allowed-tools: - Read - Grep @@ -32,8 +34,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/plan-ceo-review/SKILL.md.tmpl b/plan-ceo-review/SKILL.md.tmpl index 16c1b49d..09189af5 100644 --- a/plan-ceo-review/SKILL.md.tmpl +++ b/plan-ceo-review/SKILL.md.tmpl @@ -8,6 +8,8 @@ description: | expansions), HOLD SCOPE (maximum rigor), SCOPE REDUCTION (strip to essentials). Use when asked to "think bigger", "expand scope", "strategy review", "rethink this", or "is this ambitious enough". + Proactively suggest when the user is questioning scope or ambition of a plan, + or when the plan feels like it could be thinking bigger. allowed-tools: - Read - Grep diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md index 21e37c95..faabd328 100644 --- a/plan-design-review/SKILL.md +++ b/plan-design-review/SKILL.md @@ -7,6 +7,8 @@ description: | then fixes the plan to get there. Works in plan mode. For live site visual audits, use /design-review. Use when asked to "review the design plan" or "design critique". + Proactively suggest when the user has a plan with UI/UX components that + should be reviewed before implementation. allowed-tools: - Read - Edit @@ -32,8 +34,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/plan-design-review/SKILL.md.tmpl b/plan-design-review/SKILL.md.tmpl index e8f9c418..73e383b6 100644 --- a/plan-design-review/SKILL.md.tmpl +++ b/plan-design-review/SKILL.md.tmpl @@ -7,6 +7,8 @@ description: | then fixes the plan to get there. Works in plan mode. For live site visual audits, use /design-review. Use when asked to "review the design plan" or "design critique". + Proactively suggest when the user has a plan with UI/UX components that + should be reviewed before implementation. allowed-tools: - Read - Edit diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index caafb792..d6c6ea28 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -6,6 +6,8 @@ description: | data flow, diagrams, edge cases, test coverage, performance. Walks through issues interactively with opinionated recommendations. Use when asked to "review the architecture", "engineering review", or "lock in the plan". + Proactively suggest when the user has a plan or design doc and is about to + start coding — to catch architecture issues before implementation. allowed-tools: - Read - Write @@ -31,8 +33,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/plan-eng-review/SKILL.md.tmpl b/plan-eng-review/SKILL.md.tmpl index 1ca2b298..6a0b1217 100644 --- a/plan-eng-review/SKILL.md.tmpl +++ b/plan-eng-review/SKILL.md.tmpl @@ -6,6 +6,8 @@ description: | data flow, diagrams, edge cases, test coverage, performance. Walks through issues interactively with opinionated recommendations. Use when asked to "review the architecture", "engineering review", or "lock in the plan". + Proactively suggest when the user has a plan or design doc and is about to + start coding — to catch architecture issues before implementation. allowed-tools: - Read - Write diff --git a/qa-only/SKILL.md b/qa-only/SKILL.md index a5684dd7..0e20c5e3 100644 --- a/qa-only/SKILL.md +++ b/qa-only/SKILL.md @@ -6,6 +6,7 @@ description: | structured report with health score, screenshots, and repro steps — but never fixes anything. Use when asked to "just report bugs", "qa report only", or "test but don't fix". For the full test-fix-verify loop, use /qa instead. + Proactively suggest when the user wants a bug report without any code changes. allowed-tools: - Bash - Read @@ -29,8 +30,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/qa-only/SKILL.md.tmpl b/qa-only/SKILL.md.tmpl index 831e71ed..2e2bc4f7 100644 --- a/qa-only/SKILL.md.tmpl +++ b/qa-only/SKILL.md.tmpl @@ -6,6 +6,7 @@ description: | structured report with health score, screenshots, and repro steps — but never fixes anything. Use when asked to "just report bugs", "qa report only", or "test but don't fix". For the full test-fix-verify loop, use /qa instead. + Proactively suggest when the user wants a bug report without any code changes. allowed-tools: - Bash - Read diff --git a/qa/SKILL.md b/qa/SKILL.md index 2d12fca8..8ee176be 100644 --- a/qa/SKILL.md +++ b/qa/SKILL.md @@ -5,7 +5,9 @@ description: | Systematically QA test a web application and fix bugs found. Runs QA testing, then iteratively fixes bugs in source code, committing each fix atomically and re-verifying. Use when asked to "qa", "QA", "test this site", "find bugs", - "test and fix", or "fix what's broken". Three tiers: Quick (critical/high only), + "test and fix", or "fix what's broken". + Proactively suggest when the user says a feature is ready for testing + or asks "does this work?". Three tiers: Quick (critical/high only), Standard (+ medium), Exhaustive (+ cosmetic). Produces before/after health scores, fix evidence, and a ship-readiness summary. For report-only mode, use /qa-only. allowed-tools: @@ -35,8 +37,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/qa/SKILL.md.tmpl b/qa/SKILL.md.tmpl index bd94debe..292f7140 100644 --- a/qa/SKILL.md.tmpl +++ b/qa/SKILL.md.tmpl @@ -5,7 +5,9 @@ description: | Systematically QA test a web application and fix bugs found. Runs QA testing, then iteratively fixes bugs in source code, committing each fix atomically and re-verifying. Use when asked to "qa", "QA", "test this site", "find bugs", - "test and fix", or "fix what's broken". Three tiers: Quick (critical/high only), + "test and fix", or "fix what's broken". + Proactively suggest when the user says a feature is ready for testing + or asks "does this work?". Three tiers: Quick (critical/high only), Standard (+ medium), Exhaustive (+ cosmetic). Produces before/after health scores, fix evidence, and a ship-readiness summary. For report-only mode, use /qa-only. allowed-tools: diff --git a/retro/SKILL.md b/retro/SKILL.md index bb6bcbe9..90fb547e 100644 --- a/retro/SKILL.md +++ b/retro/SKILL.md @@ -6,6 +6,7 @@ description: | and code quality metrics with persistent history and trend tracking. Team-aware: breaks down per-person contributions with praise and growth areas. Use when asked to "weekly retro", "what did we ship", or "engineering retrospective". + Proactively suggest at the end of a work week or sprint. allowed-tools: - Bash - Read @@ -30,8 +31,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/retro/SKILL.md.tmpl b/retro/SKILL.md.tmpl index 92d5c40b..41a48e7f 100644 --- a/retro/SKILL.md.tmpl +++ b/retro/SKILL.md.tmpl @@ -6,6 +6,7 @@ description: | and code quality metrics with persistent history and trend tracking. Team-aware: breaks down per-person contributions with praise and growth areas. Use when asked to "weekly retro", "what did we ship", or "engineering retrospective". + Proactively suggest at the end of a work week or sprint. allowed-tools: - Bash - Read diff --git a/review/SKILL.md b/review/SKILL.md index 354e715b..b2da378d 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -5,6 +5,7 @@ description: | Pre-landing PR review. Analyzes diff against the base branch for SQL safety, LLM trust boundary violations, conditional side effects, and other structural issues. Use when asked to "review this PR", "code review", "pre-landing review", or "check my diff". + Proactively suggest when the user is about to merge or land code changes. allowed-tools: - Bash - Read @@ -31,8 +32,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/review/SKILL.md.tmpl b/review/SKILL.md.tmpl index 7094a156..20e2cf12 100644 --- a/review/SKILL.md.tmpl +++ b/review/SKILL.md.tmpl @@ -5,6 +5,7 @@ description: | Pre-landing PR review. Analyzes diff against the base branch for SQL safety, LLM trust boundary violations, conditional side effects, and other structural issues. Use when asked to "review this PR", "code review", "pre-landing review", or "check my diff". + Proactively suggest when the user is about to merge or land code changes. allowed-tools: - Bash - Read diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index 3d569d35..a53d1864 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -109,8 +109,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" \`\`\` +If \`PROACTIVE\` is \`"false"\`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows \`UPGRADE_AVAILABLE \`: read \`~/.claude/skills/gstack/gstack-upgrade/SKILL.md\` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If \`JUST_UPGRADED \`: tell user "Running gstack v{to} (just updated!)" and continue. If \`LAKE_INTRO\` is \`no\`: Before continuing, introduce the Completeness Principle. diff --git a/setup-browser-cookies/SKILL.md b/setup-browser-cookies/SKILL.md index 3ae00a6b..ad9d5fbb 100644 --- a/setup-browser-cookies/SKILL.md +++ b/setup-browser-cookies/SKILL.md @@ -28,8 +28,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/ship/SKILL.md b/ship/SKILL.md index 3f0f0067..97f26fa2 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -3,6 +3,7 @@ name: ship version: 1.0.0 description: | Ship workflow: detect + merge base branch, run tests, review diff, bump VERSION, update CHANGELOG, commit, push, create PR. Use when asked to "ship", "deploy", "push to main", "create a PR", or "merge and push". + Proactively suggest when the user says code is ready or asks about deploying. allowed-tools: - Bash - Read @@ -30,8 +31,13 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +_PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") +echo "PROACTIVE: $_PROACTIVE" ``` +If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke +them when the user explicitly asks. The user opted out of proactive suggestions. + If output shows `UPGRADE_AVAILABLE `: read `~/.claude/skills/gstack/gstack-upgrade/SKILL.md` and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If `JUST_UPGRADED `: tell user "Running gstack v{to} (just updated!)" and continue. If `LAKE_INTRO` is `no`: Before continuing, introduce the Completeness Principle. diff --git a/ship/SKILL.md.tmpl b/ship/SKILL.md.tmpl index aef5c9d3..ed7a7f07 100644 --- a/ship/SKILL.md.tmpl +++ b/ship/SKILL.md.tmpl @@ -3,6 +3,7 @@ name: ship version: 1.0.0 description: | Ship workflow: detect + merge base branch, run tests, review diff, bump VERSION, update CHANGELOG, commit, push, create PR. Use when asked to "ship", "deploy", "push to main", "create a PR", or "merge and push". + Proactively suggest when the user says code is ready or asks about deploying. allowed-tools: - Bash - Read diff --git a/test/helpers/touchfiles.ts b/test/helpers/touchfiles.ts index 995648a1..8afe8447 100644 --- a/test/helpers/touchfiles.ts +++ b/test/helpers/touchfiles.ts @@ -90,6 +90,19 @@ export const E2E_TOUCHFILES: Record = { // gstack-upgrade 'gstack-upgrade-happy-path': ['gstack-upgrade/**'], + + // Skill routing — journey-stage tests (depend on ALL skill descriptions) + 'journey-ideation': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], + 'journey-plan-eng': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], + 'journey-think-bigger': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], + 'journey-debug': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], + 'journey-qa': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], + 'journey-code-review': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], + 'journey-ship': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], + 'journey-docs': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], + 'journey-retro': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], + 'journey-design-system': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], + 'journey-visual-qa': ['*/SKILL.md.tmpl', 'SKILL.md.tmpl', 'scripts/gen-skill-docs.ts'], }; /** diff --git a/test/skill-routing-e2e.test.ts b/test/skill-routing-e2e.test.ts new file mode 100644 index 00000000..ee2d84b4 --- /dev/null +++ b/test/skill-routing-e2e.test.ts @@ -0,0 +1,605 @@ +import { describe, test, expect, afterAll } from 'bun:test'; +import { runSkillTest } from './helpers/session-runner'; +import type { SkillTestResult } from './helpers/session-runner'; +import { EvalCollector } from './helpers/eval-store'; +import type { EvalTestEntry } from './helpers/eval-store'; +import { selectTests, detectBaseBranch, getChangedFiles, E2E_TOUCHFILES, GLOBAL_TOUCHFILES } from './helpers/touchfiles'; +import { spawnSync } from 'child_process'; +import * as fs from 'fs'; +import * as path from 'path'; +import * as os from 'os'; + +const ROOT = path.resolve(import.meta.dir, '..'); + +// Skip unless EVALS=1. +const evalsEnabled = !!process.env.EVALS; +const describeE2E = evalsEnabled ? describe : describe.skip; + +// Eval result collector +const evalCollector = evalsEnabled ? new EvalCollector('e2e-routing') : null; + +// Unique run ID for this session +const runId = new Date().toISOString().replace(/[:.]/g, '').replace('T', '-').slice(0, 15); + +// --- Diff-based test selection --- +// Journey routing tests use E2E_TOUCHFILES (entries prefixed 'journey-' in touchfiles.ts). +let selectedTests: string[] | null = null; + +if (evalsEnabled && !process.env.EVALS_ALL) { + const baseBranch = process.env.EVALS_BASE + || detectBaseBranch(ROOT) + || 'main'; + const changedFiles = getChangedFiles(baseBranch, ROOT); + + if (changedFiles.length > 0) { + const selection = selectTests(changedFiles, E2E_TOUCHFILES, GLOBAL_TOUCHFILES); + selectedTests = selection.selected; + process.stderr.write(`\nRouting E2E selection (${selection.reason}): ${selection.selected.length}/${Object.keys(E2E_TOUCHFILES).length} tests\n`); + if (selection.skipped.length > 0) { + process.stderr.write(` Skipped: ${selection.skipped.join(', ')}\n`); + } + process.stderr.write('\n'); + } +} + +// --- Helper functions --- + +/** Copy all SKILL.md files into tmpDir/.claude/skills/gstack/ for auto-discovery */ +function installSkills(tmpDir: string) { + const skillDirs = [ + '', // root gstack SKILL.md + 'qa', 'qa-only', 'ship', 'review', 'plan-ceo-review', 'plan-eng-review', + 'plan-design-review', 'design-review', 'design-consultation', 'retro', + 'document-release', 'debug', 'office-hours', 'browse', 'setup-browser-cookies', + 'gstack-upgrade', 'humanizer', + ]; + + for (const skill of skillDirs) { + const srcPath = path.join(ROOT, skill, 'SKILL.md'); + if (!fs.existsSync(srcPath)) continue; + + const destDir = skill + ? path.join(tmpDir, '.claude', 'skills', 'gstack', skill) + : path.join(tmpDir, '.claude', 'skills', 'gstack'); + fs.mkdirSync(destDir, { recursive: true }); + fs.copyFileSync(srcPath, path.join(destDir, 'SKILL.md')); + } +} + +/** Init a git repo with config */ +function initGitRepo(dir: string) { + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: dir, stdio: 'pipe', timeout: 5000 }); + run('git', ['init']); + run('git', ['config', 'user.email', 'test@test.com']); + run('git', ['config', 'user.name', 'Test']); +} + +function logCost(label: string, result: { costEstimate: { turnsUsed: number; estimatedTokens: number; estimatedCost: number }; duration: number }) { + const { turnsUsed, estimatedTokens, estimatedCost } = result.costEstimate; + const durationSec = Math.round(result.duration / 1000); + console.log(`${label}: $${estimatedCost.toFixed(2)} (${turnsUsed} turns, ${(estimatedTokens / 1000).toFixed(1)}k tokens, ${durationSec}s)`); +} + +function recordRouting(name: string, result: SkillTestResult, expectedSkill: string, actualSkill: string | undefined) { + evalCollector?.addTest({ + name, + suite: 'Skill Routing E2E', + tier: 'e2e', + passed: actualSkill === expectedSkill, + duration_ms: result.duration, + cost_usd: result.costEstimate.estimatedCost, + transcript: result.transcript, + output: result.output?.slice(0, 2000), + turns_used: result.costEstimate.turnsUsed, + exit_reason: result.exitReason, + }); +} + +// --- Tests --- + +describeE2E('Skill Routing E2E — Developer Journey', () => { + afterAll(() => { + evalCollector?.finalize(); + }); + + test('journey-ideation', async () => { + const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'routing-ideation-')); + try { + initGitRepo(tmpDir); + installSkills(tmpDir); + fs.writeFileSync(path.join(tmpDir, 'README.md'), '# New Project\n'); + spawnSync('git', ['add', '.'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + spawnSync('git', ['commit', '-m', 'initial'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + + const testName = 'journey-ideation'; + const expectedSkill = 'office-hours'; + const result = await runSkillTest({ + prompt: "I've been thinking about building a waitlist management tool for restaurants. The existing solutions are expensive and overcomplicated. I want something simple — a tablet app where hosts can add parties, see wait times, and text customers when their table is ready. Help me think through whether this is worth building and what the key design decisions are.", + workingDirectory: tmpDir, + maxTurns: 5, + allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'], + timeout: 60_000, + testName, + runId, + }); + + const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill'); + const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined; + + logCost(`journey: ${testName}`, result); + recordRouting(testName, result, expectedSkill, actualSkill); + + expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0); + expect([expectedSkill], `Expected skill ${expectedSkill} but got ${actualSkill}`).toContain(actualSkill); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }, 90_000); + + test('journey-plan-eng', async () => { + const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'routing-plan-eng-')); + try { + initGitRepo(tmpDir); + installSkills(tmpDir); + fs.writeFileSync(path.join(tmpDir, 'plan.md'), `# Waitlist App Architecture + +## Components +- REST API (Express.js) +- PostgreSQL database +- React frontend +- SMS integration (Twilio) + +## Data Model +- restaurants (id, name, settings) +- parties (id, restaurant_id, name, size, phone, status, created_at) +- wait_estimates (id, restaurant_id, avg_wait_minutes) + +## API Endpoints +- POST /api/parties - add party to waitlist +- GET /api/parties - list current waitlist +- PATCH /api/parties/:id/status - update party status +- GET /api/estimate - get current wait estimate +`); + spawnSync('git', ['add', '.'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + spawnSync('git', ['commit', '-m', 'initial'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + + const testName = 'journey-plan-eng'; + const expectedSkill = 'plan-eng-review'; + const result = await runSkillTest({ + prompt: "I wrote up a plan for the waitlist app in plan.md. Can you take a look at the architecture and make sure I'm not missing any edge cases or failure modes before I start coding?", + workingDirectory: tmpDir, + maxTurns: 5, + allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'], + timeout: 60_000, + testName, + runId, + }); + + const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill'); + const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined; + + logCost(`journey: ${testName}`, result); + recordRouting(testName, result, expectedSkill, actualSkill); + + expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0); + expect([expectedSkill], `Expected skill ${expectedSkill} but got ${actualSkill}`).toContain(actualSkill); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }, 90_000); + + test('journey-think-bigger', async () => { + const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'routing-think-bigger-')); + try { + initGitRepo(tmpDir); + installSkills(tmpDir); + fs.writeFileSync(path.join(tmpDir, 'plan.md'), `# Waitlist App Architecture + +## Components +- REST API (Express.js) +- PostgreSQL database +- React frontend +- SMS integration (Twilio) + +## Data Model +- restaurants (id, name, settings) +- parties (id, restaurant_id, name, size, phone, status, created_at) +- wait_estimates (id, restaurant_id, avg_wait_minutes) + +## API Endpoints +- POST /api/parties - add party to waitlist +- GET /api/parties - list current waitlist +- PATCH /api/parties/:id/status - update party status +- GET /api/estimate - get current wait estimate +`); + spawnSync('git', ['add', '.'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + spawnSync('git', ['commit', '-m', 'initial'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + + const testName = 'journey-think-bigger'; + const expectedSkill = 'plan-ceo-review'; + const result = await runSkillTest({ + prompt: "Actually, looking at this plan again, I feel like we're thinking too small. We're just doing waitlists but what about the whole restaurant guest experience? Is there a bigger opportunity here we should go after?", + workingDirectory: tmpDir, + maxTurns: 5, + allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'], + timeout: 120_000, + testName, + runId, + }); + + const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill'); + const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined; + + logCost(`journey: ${testName}`, result); + recordRouting(testName, result, expectedSkill, actualSkill); + + expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0); + expect([expectedSkill], `Expected skill ${expectedSkill} but got ${actualSkill}`).toContain(actualSkill); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }, 180_000); + + test('journey-debug', async () => { + const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'routing-debug-')); + try { + initGitRepo(tmpDir); + installSkills(tmpDir); + + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + + fs.mkdirSync(path.join(tmpDir, 'src'), { recursive: true }); + fs.writeFileSync(path.join(tmpDir, 'src/api.ts'), ` +import express from 'express'; +const app = express(); + +app.get('/api/waitlist', async (req, res) => { + const db = req.app.locals.db; + const parties = await db.query('SELECT * FROM parties WHERE status = $1', ['waiting']); + res.json(parties.rows); +}); + +export default app; +`); + fs.writeFileSync(path.join(tmpDir, 'error.log'), ` +[2026-03-18T10:23:45Z] ERROR: GET /api/waitlist - 500 Internal Server Error + TypeError: Cannot read properties of undefined (reading 'query') + at /src/api.ts:5:32 + at Layer.handle [as handle_request] (/node_modules/express/lib/router/layer.js:95:5) +[2026-03-18T10:23:46Z] ERROR: GET /api/waitlist - 500 Internal Server Error + TypeError: Cannot read properties of undefined (reading 'query') +`); + + run('git', ['add', '.']); + run('git', ['commit', '-m', 'initial']); + run('git', ['checkout', '-b', 'feature/waitlist-api']); + + const testName = 'journey-debug'; + const expectedSkill = 'debug'; + const result = await runSkillTest({ + prompt: "The GET /api/waitlist endpoint was working fine yesterday but now it's returning 500 errors. The tests are passing locally but the endpoint fails when I hit it with curl. Can you figure out what's going on?", + workingDirectory: tmpDir, + maxTurns: 5, + allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'], + timeout: 60_000, + testName, + runId, + }); + + const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill'); + const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined; + + logCost(`journey: ${testName}`, result); + recordRouting(testName, result, expectedSkill, actualSkill); + + expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0); + expect([expectedSkill], `Expected skill ${expectedSkill} but got ${actualSkill}`).toContain(actualSkill); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }, 90_000); + + test('journey-qa', async () => { + const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'routing-qa-')); + try { + initGitRepo(tmpDir); + installSkills(tmpDir); + + fs.writeFileSync(path.join(tmpDir, 'package.json'), JSON.stringify({ name: 'waitlist-app', scripts: { dev: 'next dev' } }, null, 2)); + fs.mkdirSync(path.join(tmpDir, 'src'), { recursive: true }); + fs.writeFileSync(path.join(tmpDir, 'src/index.html'), '

Waitlist App

'); + spawnSync('git', ['add', '.'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + spawnSync('git', ['commit', '-m', 'initial'], { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + + const testName = 'journey-qa'; + const expectedSkill = 'qa'; + const alternateSkills = ['qa-only', 'browse']; + const result = await runSkillTest({ + prompt: "I think the app is mostly working now. Can you go through the site and test everything — find any bugs and fix them?", + workingDirectory: tmpDir, + maxTurns: 5, + allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'], + timeout: 60_000, + testName, + runId, + }); + + const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill'); + const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined; + const acceptable = [expectedSkill, ...alternateSkills]; + + logCost(`journey: ${testName}`, result); + recordRouting(testName, result, expectedSkill, actualSkill); + + expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0); + expect(acceptable, `Expected skill ${expectedSkill} but got ${actualSkill}`).toContain(actualSkill); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }, 90_000); + + test('journey-code-review', async () => { + const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'routing-code-review-')); + try { + initGitRepo(tmpDir); + installSkills(tmpDir); + + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + + fs.writeFileSync(path.join(tmpDir, 'app.ts'), '// base\n'); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'initial']); + run('git', ['checkout', '-b', 'feature/add-waitlist']); + fs.writeFileSync(path.join(tmpDir, 'app.ts'), '// updated with waitlist feature\nimport { WaitlistService } from "./waitlist";\n'); + fs.writeFileSync(path.join(tmpDir, 'waitlist.ts'), 'export class WaitlistService {\n async addParty(name: string, size: number) {\n // TODO: implement\n }\n}\n'); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'feat: add waitlist service']); + + const testName = 'journey-code-review'; + const expectedSkill = 'review'; + const result = await runSkillTest({ + prompt: "I'm about to merge this into main. Can you look over my changes and flag anything risky before I land it?", + workingDirectory: tmpDir, + maxTurns: 5, + allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'], + timeout: 60_000, + testName, + runId, + }); + + const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill'); + const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined; + + logCost(`journey: ${testName}`, result); + recordRouting(testName, result, expectedSkill, actualSkill); + + expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0); + expect([expectedSkill], `Expected skill ${expectedSkill} but got ${actualSkill}`).toContain(actualSkill); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }, 90_000); + + test('journey-ship', async () => { + const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'routing-ship-')); + try { + initGitRepo(tmpDir); + installSkills(tmpDir); + + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + + fs.writeFileSync(path.join(tmpDir, 'app.ts'), '// base\n'); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'initial']); + run('git', ['checkout', '-b', 'feature/waitlist']); + fs.writeFileSync(path.join(tmpDir, 'app.ts'), '// waitlist feature\n'); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'feat: waitlist']); + + const testName = 'journey-ship'; + const expectedSkill = 'ship'; + const result = await runSkillTest({ + prompt: "This looks good. Let's get it deployed — push the code up and create a PR.", + workingDirectory: tmpDir, + maxTurns: 5, + allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'], + timeout: 60_000, + testName, + runId, + }); + + const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill'); + const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined; + + logCost(`journey: ${testName}`, result); + recordRouting(testName, result, expectedSkill, actualSkill); + + expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0); + expect([expectedSkill], `Expected skill ${expectedSkill} but got ${actualSkill}`).toContain(actualSkill); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }, 90_000); + + test('journey-docs', async () => { + const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'routing-docs-')); + try { + initGitRepo(tmpDir); + installSkills(tmpDir); + + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + + fs.writeFileSync(path.join(tmpDir, 'README.md'), '# Waitlist App\nA simple waitlist management tool.\n'); + fs.mkdirSync(path.join(tmpDir, 'src'), { recursive: true }); + fs.writeFileSync(path.join(tmpDir, 'src/api.ts'), '// API code\n'); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'feat: ship waitlist feature']); + + const testName = 'journey-docs'; + const expectedSkill = 'document-release'; + const result = await runSkillTest({ + prompt: "We just shipped the waitlist feature. Can you go through the README and any other docs and make sure they match what we actually built?", + workingDirectory: tmpDir, + maxTurns: 5, + allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'], + timeout: 60_000, + testName, + runId, + }); + + const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill'); + const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined; + + logCost(`journey: ${testName}`, result); + recordRouting(testName, result, expectedSkill, actualSkill); + + expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0); + expect([expectedSkill], `Expected skill ${expectedSkill} but got ${actualSkill}`).toContain(actualSkill); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }, 90_000); + + test('journey-retro', async () => { + const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'routing-retro-')); + try { + initGitRepo(tmpDir); + installSkills(tmpDir); + + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + + fs.writeFileSync(path.join(tmpDir, 'api.ts'), 'export function getParties() { return []; }\n'); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'feat: add parties API', '--date', '2026-03-12T09:30:00']); + + fs.writeFileSync(path.join(tmpDir, 'ui.tsx'), 'export function WaitlistView() { return
Waitlist
; }\n'); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'feat: add waitlist UI', '--date', '2026-03-13T14:00:00']); + + fs.writeFileSync(path.join(tmpDir, 'README.md'), '# Waitlist App\n'); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'docs: add README', '--date', '2026-03-14T16:00:00']); + + const testName = 'journey-retro'; + const expectedSkill = 'retro'; + const result = await runSkillTest({ + prompt: "It's Friday. What did we ship this week? I want to do a quick retrospective on what the team accomplished.", + workingDirectory: tmpDir, + maxTurns: 5, + allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'], + timeout: 60_000, + testName, + runId, + }); + + const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill'); + const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined; + + logCost(`journey: ${testName}`, result); + recordRouting(testName, result, expectedSkill, actualSkill); + + expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0); + expect([expectedSkill], `Expected skill ${expectedSkill} but got ${actualSkill}`).toContain(actualSkill); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }, 90_000); + + test('journey-design-system', async () => { + const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'routing-design-system-')); + try { + initGitRepo(tmpDir); + installSkills(tmpDir); + + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + + fs.writeFileSync(path.join(tmpDir, 'package.json'), JSON.stringify({ name: 'waitlist-app' }, null, 2)); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'initial']); + + const testName = 'journey-design-system'; + const expectedSkill = 'design-consultation'; + const result = await runSkillTest({ + prompt: "Before we build the UI, I want to establish a design system — typography, colors, spacing, the whole thing. Can you put together brand guidelines for this project?", + workingDirectory: tmpDir, + maxTurns: 5, + allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'], + timeout: 60_000, + testName, + runId, + }); + + const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill'); + const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined; + + logCost(`journey: ${testName}`, result); + recordRouting(testName, result, expectedSkill, actualSkill); + + expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0); + expect([expectedSkill], `Expected skill ${expectedSkill} but got ${actualSkill}`).toContain(actualSkill); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }, 90_000); + + test('journey-visual-qa', async () => { + const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'routing-visual-qa-')); + try { + initGitRepo(tmpDir); + installSkills(tmpDir); + + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: tmpDir, stdio: 'pipe', timeout: 5000 }); + + fs.mkdirSync(path.join(tmpDir, 'src'), { recursive: true }); + fs.writeFileSync(path.join(tmpDir, 'src/styles.css'), ` +body { font-family: sans-serif; } +.header { font-size: 24px; margin: 20px; } +.card { padding: 16px; margin: 8px; border: 1px solid #ccc; } +.button { background: #007bff; color: white; padding: 10px 20px; } +`); + fs.writeFileSync(path.join(tmpDir, 'src/index.html'), ` + + + +
Waitlist
+
Party of 4 - Smith
+
Party of 2 - Jones
+ + +`); + run('git', ['add', '.']); + run('git', ['commit', '-m', 'initial UI']); + + const testName = 'journey-visual-qa'; + const expectedSkill = 'design-review'; + const result = await runSkillTest({ + prompt: "Something looks off on the site. The spacing between sections is inconsistent and the font sizes don't feel right. Can you audit the visual design and fix anything that doesn't look polished?", + workingDirectory: tmpDir, + maxTurns: 5, + allowedTools: ['Skill', 'Read', 'Bash', 'Glob', 'Grep'], + timeout: 60_000, + testName, + runId, + }); + + const skillCalls = result.toolCalls.filter(tc => tc.tool === 'Skill'); + const actualSkill = skillCalls.length > 0 ? skillCalls[0]?.input?.skill : undefined; + + logCost(`journey: ${testName}`, result); + recordRouting(testName, result, expectedSkill, actualSkill); + + expect(skillCalls.length, `Expected Skill tool to be called but got 0 calls. Claude may have answered directly without invoking a skill. Tool calls: ${result.toolCalls.map(tc => tc.tool).join(', ')}`).toBeGreaterThan(0); + expect([expectedSkill], `Expected skill ${expectedSkill} but got ${actualSkill}`).toContain(actualSkill); + } finally { + fs.rmSync(tmpDir, { recursive: true, force: true }); + } + }, 90_000); +}); diff --git a/test/skill-validation.test.ts b/test/skill-validation.test.ts index 3687ecee..292c1a81 100644 --- a/test/skill-validation.test.ts +++ b/test/skill-validation.test.ts @@ -1120,3 +1120,40 @@ describe('QA report template', () => { expect(content).toContain('**Precondition:**'); }); }); + +// --- Trigger phrase validation --- + +describe('Skill trigger phrases', () => { + // Skills that must have "Use when" trigger phrases in their description. + // Excluded: root gstack (browser tool), gstack-upgrade (gstack-specific), + // setup-browser-cookies (utility), humanizer (text tool), browse (subskill of gstack) + const SKILLS_REQUIRING_TRIGGERS = [ + 'qa', 'qa-only', 'ship', 'review', 'debug', 'office-hours', + 'plan-ceo-review', 'plan-eng-review', 'plan-design-review', + 'design-review', 'design-consultation', 'retro', 'document-release', + ]; + + for (const skill of SKILLS_REQUIRING_TRIGGERS) { + test(`${skill}/SKILL.md has "Use when" trigger phrases`, () => { + const skillPath = path.join(ROOT, skill, 'SKILL.md'); + if (!fs.existsSync(skillPath)) return; + const content = fs.readFileSync(skillPath, 'utf-8'); + // Extract description from frontmatter + const frontmatterEnd = content.indexOf('---', 4); + const frontmatter = content.slice(0, frontmatterEnd); + expect(frontmatter).toMatch(/Use when/i); + }); + } + + // Skills with proactive triggers should have "Proactively suggest" in description + for (const skill of SKILLS_REQUIRING_TRIGGERS) { + test(`${skill}/SKILL.md has "Proactively suggest" phrase`, () => { + const skillPath = path.join(ROOT, skill, 'SKILL.md'); + if (!fs.existsSync(skillPath)) return; + const content = fs.readFileSync(skillPath, 'utf-8'); + const frontmatterEnd = content.indexOf('---', 4); + const frontmatter = content.slice(0, frontmatterEnd); + expect(frontmatter).toMatch(/Proactively suggest/i); + }); + } +}); diff --git a/test/touchfiles.test.ts b/test/touchfiles.test.ts index 48613d64..b3f844d8 100644 --- a/test/touchfiles.test.ts +++ b/test/touchfiles.test.ts @@ -115,7 +115,8 @@ describe('selectTests', () => { expect(result.selected).toContain('plan-ceo-review-selective'); expect(result.selected).toContain('retro'); expect(result.selected).toContain('retro-base-branch'); - expect(result.selected.length).toBe(4); + // Also selects journey routing tests (*/SKILL.md.tmpl matches retro/SKILL.md.tmpl) + expect(result.selected.length).toBeGreaterThanOrEqual(4); }); test('works with LLM_JUDGE_TOUCHFILES', () => { @@ -125,13 +126,15 @@ describe('selectTests', () => { expect(result.selected.length).toBe(2); }); - test('SKILL.md.tmpl root template only selects root-dependent tests', () => { + test('SKILL.md.tmpl root template selects root-dependent tests and routing tests', () => { const result = selectTests(['SKILL.md.tmpl'], E2E_TOUCHFILES); // Should select the 7 tests that depend on root SKILL.md expect(result.selected).toContain('skillmd-setup-discovery'); expect(result.selected).toContain('contributor-mode'); expect(result.selected).toContain('session-awareness'); - // Should NOT select unrelated tests + // Also selects journey routing tests (SKILL.md.tmpl in their touchfiles) + expect(result.selected).toContain('journey-ideation'); + // Should NOT select unrelated non-routing tests expect(result.selected).not.toContain('plan-ceo-review'); expect(result.selected).not.toContain('retro'); }); From 2a206920edffe299a48a5bfb4c02f7bd6243edb1 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Wed, 18 Mar 2026 23:42:15 -0500 Subject: [PATCH 2/4] fix: /retro midnight-aligned dates + local timezone (v0.7.2) (#199) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * fix: use midnight-aligned dates and local timezone in /retro /retro was using --since="7 days ago" which is relative to current time, so running at 9pm gives a misleading "Mar 11 to Mar 18" title when data actually starts at 9pm Mar 11. Now computes absolute midnight-aligned start dates (--since="2026-03-11") for full calendar days. Also removes hardcoded Pacific time (TZ=America/Los_Angeles) throughout the template — all timestamps now use the user's local timezone, which is correct for a global user base. * chore: bump version and changelog (v0.7.2) Co-Authored-By: Claude Opus 4.6 --------- Co-authored-by: Claude Opus 4.6 --- CHANGELOG.md | 7 +++++++ VERSION | 2 +- retro/SKILL.md | 23 ++++++++++++----------- retro/SKILL.md.tmpl | 23 ++++++++++++----------- 4 files changed, 32 insertions(+), 23 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index f1790add..cc88928b 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,12 @@ # Changelog +## [0.7.2] - 2026-03-18 + +### Fixed + +- `/retro` date ranges now align to midnight instead of the current time. Running `/retro` at 9pm no longer silently drops the morning of the start date — you get full calendar days. +- `/retro` timestamps now use your local timezone instead of hardcoded Pacific time. Users outside the US-West coast get correct local hours in histograms, session detection, and streak tracking. + ## [0.7.1] - 2026-03-19 ### Added diff --git a/VERSION b/VERSION index 39e898a4..7486fdbc 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.7.1 +0.7.2 diff --git a/retro/SKILL.md b/retro/SKILL.md index 90fb547e..a4458c22 100644 --- a/retro/SKILL.md +++ b/retro/SKILL.md @@ -178,7 +178,9 @@ When the user types `/retro`, run this skill. ## Instructions -Parse the argument to determine the time window. Default to 7 days if no argument given. Use `--since="N days ago"`, `--since="N hours ago"`, or `--since="N weeks ago"` (for `w` units) for git log queries. All times should be reported in **Pacific time** (use `TZ=America/Los_Angeles` when converting timestamps). +Parse the argument to determine the time window. Default to 7 days if no argument given. All times should be reported in the user's **local timezone** (use the system default — do NOT set `TZ`). + +**Midnight-aligned windows:** For day (`d`) and week (`w`) units, compute an absolute start date at local midnight, not a relative string. For example, if today is 2026-03-18 and the window is 7 days: the start date is 2026-03-11. Use `--since="2026-03-11"` for git log queries — git interprets a bare date as midnight in the local timezone, so this captures full calendar days regardless of what time the retro runs. For week units, multiply by 7 to get days (e.g., `2w` = 14 days back). For hour (`h`) units, use `--since="N hours ago"` since midnight alignment does not apply to sub-day windows. **Argument validation:** If the argument doesn't match a number followed by `d`, `h`, or `w`, the word `compare`, or `compare` followed by a number and `d`/`h`/`w`, show this usage and stop: ``` @@ -215,8 +217,7 @@ git log origin/ --since="" --format="%H|%aN|%ae|%ai|%s" --short git log origin/ --since="" --format="COMMIT:%H|%aN" --numstat # 3. Commit timestamps for session detection and hourly distribution (with author) -# Use TZ=America/Los_Angeles for Pacific time conversion -TZ=America/Los_Angeles git log origin/ --since="" --format="%at|%aN|%ai|%s" | sort -n +git log origin/ --since="" --format="%at|%aN|%ai|%s" | sort -n # 4. Files most frequently changed (hotspot analysis) git log origin/ --since="" --format="" --name-only | grep -v '^$' | sort | uniq -c | sort -rn @@ -296,7 +297,7 @@ If TODOS.md doesn't exist, skip the Backlog Health row. ### Step 3: Commit Time Distribution -Show hourly histogram in Pacific time using bar chart: +Show hourly histogram in local time using bar chart: ``` Hour Commits ████████████████ @@ -400,11 +401,11 @@ If the time window is 14 days or more, split into weekly buckets and show trends Count consecutive days with at least 1 commit to origin/, going back from today. Track both team streak and personal streak: ```bash -# Team streak: all unique commit dates (Pacific time) — no hard cutoff -TZ=America/Los_Angeles git log origin/ --format="%ad" --date=format:"%Y-%m-%d" | sort -u +# Team streak: all unique commit dates (local time) — no hard cutoff +git log origin/ --format="%ad" --date=format:"%Y-%m-%d" | sort -u # Personal streak: only the current user's commits -TZ=America/Los_Angeles git log origin/ --author="" --format="%ad" --date=format:"%Y-%m-%d" | sort -u +git log origin/ --author="" --format="%ad" --date=format:"%Y-%m-%d" | sort -u ``` Count backward from today — how many consecutive days have at least one commit? This queries the full history so streaks of any length are reported accurately. Display both: @@ -443,7 +444,7 @@ mkdir -p .context/retros Determine the next sequence number for today (substitute the actual date for `$(date +%Y-%m-%d)`): ```bash # Count existing retros for today to get next sequence number -today=$(TZ=America/Los_Angeles date +%Y-%m-%d) +today=$(date +%Y-%m-%d) existing=$(ls .context/retros/${today}-*.json 2>/dev/null | wc -l | tr -d ' ') next=$((existing + 1)) # Save as .context/retros/${today}-${next}.json @@ -617,8 +618,8 @@ Small, practical, realistic. Each must be something that takes <5 minutes to ado When the user runs `/retro compare` (or `/retro compare 14d`): -1. Compute metrics for the current window (default 7d) using `--since="7 days ago"` -2. Compute metrics for the immediately prior same-length window using both `--since` and `--until` to avoid overlap (e.g., `--since="14 days ago" --until="7 days ago"` for a 7d window) +1. Compute metrics for the current window (default 7d) using the midnight-aligned start date (same logic as the main retro — e.g., if today is 2026-03-18 and window is 7d, use `--since="2026-03-11"`) +2. Compute metrics for the immediately prior same-length window using both `--since` and `--until` with midnight-aligned dates to avoid overlap (e.g., for a 7d window starting 2026-03-11: prior window is `--since="2026-03-04" --until="2026-03-11"`) 3. Show a side-by-side comparison table with deltas and arrows 4. Write a brief narrative highlighting the biggest improvements and regressions 5. Save only the current-window snapshot to `.context/retros/` (same as a normal retro run); do **not** persist the prior-window metrics. @@ -640,7 +641,7 @@ When the user runs `/retro compare` (or `/retro compare 14d`): - ALL narrative output goes directly to the user in the conversation. The ONLY file written is the `.context/retros/` JSON snapshot. - Use `origin/` for all git queries (not local main which may be stale) -- Convert all timestamps to Pacific time for display (use `TZ=America/Los_Angeles`) +- Display all timestamps in the user's local timezone (do not override `TZ`) - If the window has zero commits, say so and suggest a different window - Round LOC/hour to nearest 50 - Treat merge commits as PR boundaries diff --git a/retro/SKILL.md.tmpl b/retro/SKILL.md.tmpl index 41a48e7f..95ee706e 100644 --- a/retro/SKILL.md.tmpl +++ b/retro/SKILL.md.tmpl @@ -44,7 +44,9 @@ When the user types `/retro`, run this skill. ## Instructions -Parse the argument to determine the time window. Default to 7 days if no argument given. Use `--since="N days ago"`, `--since="N hours ago"`, or `--since="N weeks ago"` (for `w` units) for git log queries. All times should be reported in **Pacific time** (use `TZ=America/Los_Angeles` when converting timestamps). +Parse the argument to determine the time window. Default to 7 days if no argument given. All times should be reported in the user's **local timezone** (use the system default — do NOT set `TZ`). + +**Midnight-aligned windows:** For day (`d`) and week (`w`) units, compute an absolute start date at local midnight, not a relative string. For example, if today is 2026-03-18 and the window is 7 days: the start date is 2026-03-11. Use `--since="2026-03-11"` for git log queries — git interprets a bare date as midnight in the local timezone, so this captures full calendar days regardless of what time the retro runs. For week units, multiply by 7 to get days (e.g., `2w` = 14 days back). For hour (`h`) units, use `--since="N hours ago"` since midnight alignment does not apply to sub-day windows. **Argument validation:** If the argument doesn't match a number followed by `d`, `h`, or `w`, the word `compare`, or `compare` followed by a number and `d`/`h`/`w`, show this usage and stop: ``` @@ -81,8 +83,7 @@ git log origin/ --since="" --format="%H|%aN|%ae|%ai|%s" --short git log origin/ --since="" --format="COMMIT:%H|%aN" --numstat # 3. Commit timestamps for session detection and hourly distribution (with author) -# Use TZ=America/Los_Angeles for Pacific time conversion -TZ=America/Los_Angeles git log origin/ --since="" --format="%at|%aN|%ai|%s" | sort -n +git log origin/ --since="" --format="%at|%aN|%ai|%s" | sort -n # 4. Files most frequently changed (hotspot analysis) git log origin/ --since="" --format="" --name-only | grep -v '^$' | sort | uniq -c | sort -rn @@ -162,7 +163,7 @@ If TODOS.md doesn't exist, skip the Backlog Health row. ### Step 3: Commit Time Distribution -Show hourly histogram in Pacific time using bar chart: +Show hourly histogram in local time using bar chart: ``` Hour Commits ████████████████ @@ -266,11 +267,11 @@ If the time window is 14 days or more, split into weekly buckets and show trends Count consecutive days with at least 1 commit to origin/, going back from today. Track both team streak and personal streak: ```bash -# Team streak: all unique commit dates (Pacific time) — no hard cutoff -TZ=America/Los_Angeles git log origin/ --format="%ad" --date=format:"%Y-%m-%d" | sort -u +# Team streak: all unique commit dates (local time) — no hard cutoff +git log origin/ --format="%ad" --date=format:"%Y-%m-%d" | sort -u # Personal streak: only the current user's commits -TZ=America/Los_Angeles git log origin/ --author="" --format="%ad" --date=format:"%Y-%m-%d" | sort -u +git log origin/ --author="" --format="%ad" --date=format:"%Y-%m-%d" | sort -u ``` Count backward from today — how many consecutive days have at least one commit? This queries the full history so streaks of any length are reported accurately. Display both: @@ -309,7 +310,7 @@ mkdir -p .context/retros Determine the next sequence number for today (substitute the actual date for `$(date +%Y-%m-%d)`): ```bash # Count existing retros for today to get next sequence number -today=$(TZ=America/Los_Angeles date +%Y-%m-%d) +today=$(date +%Y-%m-%d) existing=$(ls .context/retros/${today}-*.json 2>/dev/null | wc -l | tr -d ' ') next=$((existing + 1)) # Save as .context/retros/${today}-${next}.json @@ -483,8 +484,8 @@ Small, practical, realistic. Each must be something that takes <5 minutes to ado When the user runs `/retro compare` (or `/retro compare 14d`): -1. Compute metrics for the current window (default 7d) using `--since="7 days ago"` -2. Compute metrics for the immediately prior same-length window using both `--since` and `--until` to avoid overlap (e.g., `--since="14 days ago" --until="7 days ago"` for a 7d window) +1. Compute metrics for the current window (default 7d) using the midnight-aligned start date (same logic as the main retro — e.g., if today is 2026-03-18 and window is 7d, use `--since="2026-03-11"`) +2. Compute metrics for the immediately prior same-length window using both `--since` and `--until` with midnight-aligned dates to avoid overlap (e.g., for a 7d window starting 2026-03-11: prior window is `--since="2026-03-04" --until="2026-03-11"`) 3. Show a side-by-side comparison table with deltas and arrows 4. Write a brief narrative highlighting the biggest improvements and regressions 5. Save only the current-window snapshot to `.context/retros/` (same as a normal retro run); do **not** persist the prior-window metrics. @@ -506,7 +507,7 @@ When the user runs `/retro compare` (or `/retro compare 14d`): - ALL narrative output goes directly to the user in the conversation. The ONLY file written is the `.context/retros/` JSON snapshot. - Use `origin/` for all git queries (not local main which may be stale) -- Convert all timestamps to Pacific time for display (use `TZ=America/Los_Angeles`) +- Display all timestamps in the user's local timezone (do not override `TZ`) - If the window has zero commits, say so and suggest a different window - Round LOC/hour to nearest 50 - Treat merge commits as PR boundaries From c4f679d829c25d7cdf61435227c9e533a3a1b4b0 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Wed, 18 Mar 2026 23:57:59 -0500 Subject: [PATCH 3/4] feat: safety hook skills + skill usage telemetry (v0.7.1) (#189) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * feat: add /careful, /freeze, /guard, /unfreeze safety hook skills Four new on-demand skills using Claude Code's PreToolUse hooks: - /careful: warns before destructive commands (rm -rf, DROP TABLE, force-push, etc.) - /freeze: blocks file edits outside a specified directory - /guard: composes both into one command - /unfreeze: clears freeze boundary without ending session Pure bash hook scripts with Python fallback for JSON edge cases. Safe exceptions for build artifacts (node_modules, dist, .next, etc.). Hook fire telemetry logs pattern name only (never command content). Co-Authored-By: Claude Opus 4.6 (1M context) * feat: add skill usage telemetry to preamble TemplateContext system passes skill name through resolver pipeline so each generated SKILL.md gets its own name baked into the telemetry line. Appends to ~/.gstack/analytics/skill-usage.jsonl on every invocation. Covers 14 preamble-using skills + 4 hook skills (inline telemetry). JSONL format: {"skill":"ship","ts":"...","repo":"my-project"} Co-Authored-By: Claude Opus 4.6 (1M context) * feat: add analytics CLI for skill usage stats bun run analytics reads ~/.gstack/analytics/skill-usage.jsonl and shows top skills, per-repo breakdown, hook fire stats, and daily timeline. Supports --period 7d/30d/all. Handles missing/empty/malformed data. 22 unit tests cover parsing, filtering, formatting, and edge cases. Co-Authored-By: Claude Opus 4.6 (1M context) * feat: add skills-used-this-week to /retro Retro Step 2 now reads skill-usage.jsonl and shows which gstack skills were used during the retro window. Follows the same pattern as the Greptile signal and Backlog Health metrics — read file, filter by date, aggregate, present. Skips silently if no analytics data exists. Co-Authored-By: Claude Opus 4.6 (1M context) * test: add hook script and telemetry tests 32 unit tests for check-careful.sh covering all 8 destructive patterns, safe exceptions, Python fallback, and malformed input handling. 7 unit tests for check-freeze.sh covering boundary enforcement, trailing slash edge case, and missing state file. Telemetry tests verify per-skill name correctness in generated output. Adds careful/freeze/guard/unfreeze/document-release to ALL_SKILLS. Co-Authored-By: Claude Opus 4.6 (1M context) * chore: bump version to 0.6.5 + changelog + mark TODOs shipped Safety hook skills and skill usage telemetry shipped. Analytics CLI and /retro integration included. Co-Authored-By: Claude Opus 4.6 (1M context) * feat: /debug auto-freezes edits to the module being debugged Add PreToolUse hooks (Edit/Write) to debug/SKILL.md.tmpl that reference the existing freeze/bin/check-freeze.sh. After Phase 1 investigation, /debug locks edits to the narrowest affected directory. Graceful degradation: if freeze script is unavailable, scope lock is skipped. Users can run /unfreeze to remove the restriction. Deferred 6 enhancements to TODOS.md, gated on telemetry showing the freeze hook actually fires in real debugging sessions. Co-Authored-By: Claude Opus 4.6 (1M context) --------- Co-authored-by: Claude Opus 4.6 (1M context) --- CHANGELOG.md | 11 + SKILL.md | 2 + TODOS.md | 47 +++-- VERSION | 2 +- browse/SKILL.md | 2 + careful/SKILL.md | 59 ++++++ careful/SKILL.md.tmpl | 57 +++++ careful/bin/check-careful.sh | 112 ++++++++++ debug/SKILL.md | 39 ++++ debug/SKILL.md.tmpl | 37 ++++ design-consultation/SKILL.md | 2 + design-review/SKILL.md | 2 + document-release/SKILL.md | 2 + freeze/SKILL.md | 82 ++++++++ freeze/SKILL.md.tmpl | 80 +++++++ freeze/bin/check-freeze.sh | 68 ++++++ guard/SKILL.md | 82 ++++++++ guard/SKILL.md.tmpl | 80 +++++++ office-hours/SKILL.md | 2 + package.json | 3 +- plan-ceo-review/SKILL.md | 2 + plan-design-review/SKILL.md | 2 + plan-eng-review/SKILL.md | 2 + qa-only/SKILL.md | 2 + qa/SKILL.md | 2 + retro/SKILL.md | 10 + retro/SKILL.md.tmpl | 8 + review/SKILL.md | 2 + scripts/analytics.ts | 190 +++++++++++++++++ scripts/gen-skill-docs.ts | 42 ++-- setup-browser-cookies/SKILL.md | 2 + ship/SKILL.md | 2 + test/analytics.test.ts | 277 ++++++++++++++++++++++++ test/gen-skill-docs.test.ts | 25 +++ test/hook-scripts.test.ts | 373 +++++++++++++++++++++++++++++++++ unfreeze/SKILL.md | 40 ++++ unfreeze/SKILL.md.tmpl | 38 ++++ 37 files changed, 1754 insertions(+), 36 deletions(-) create mode 100644 careful/SKILL.md create mode 100644 careful/SKILL.md.tmpl create mode 100755 careful/bin/check-careful.sh create mode 100644 freeze/SKILL.md create mode 100644 freeze/SKILL.md.tmpl create mode 100755 freeze/bin/check-freeze.sh create mode 100644 guard/SKILL.md create mode 100644 guard/SKILL.md.tmpl create mode 100644 scripts/analytics.ts create mode 100644 test/analytics.test.ts create mode 100644 test/hook-scripts.test.ts create mode 100644 unfreeze/SKILL.md create mode 100644 unfreeze/SKILL.md.tmpl diff --git a/CHANGELOG.md b/CHANGELOG.md index cc88928b..f84810f9 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,16 @@ # Changelog +## [0.7.3] - 2026-03-18 + +### Added + +- **Safety guardrails you can turn on with one command.** Say "be careful" or "safety mode" and `/careful` will warn you before any destructive command — `rm -rf`, `DROP TABLE`, force-push, `kubectl delete`, and more. You can override every warning. Common build artifact cleanups (`rm -rf node_modules`, `dist`, `.next`) are whitelisted. +- **Lock edits to one folder with `/freeze`.** Debugging something and don't want Claude to "fix" unrelated code? `/freeze` blocks all file edits outside a directory you choose. Hard block, not just a warning. Run `/unfreeze` to remove the restriction without ending your session. +- **`/guard` activates both at once.** One command for maximum safety when touching prod or live systems — destructive command warnings plus directory-scoped edit restrictions. +- **`/debug` now auto-freezes edits to the module being debugged.** After forming a root cause hypothesis, `/debug` locks edits to the narrowest affected directory. No more accidental "fixes" to unrelated code during debugging. +- **You can now see which skills you use and how often.** Every skill invocation is logged locally to `~/.gstack/analytics/skill-usage.jsonl`. Run `bun run analytics` to see your top skills, per-repo breakdown, and how often safety hooks actually catch something. Data stays on your machine. +- **Weekly retros now include skill usage.** `/retro` shows which skills you used during the retro window alongside your usual commit analysis and metrics. + ## [0.7.2] - 2026-03-18 ### Fixed diff --git a/SKILL.md b/SKILL.md index dd06f058..c04c1480 100644 --- a/SKILL.md +++ b/SKILL.md @@ -56,6 +56,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"gstack","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/TODOS.md b/TODOS.md index 8f47cabc..bb85a56d 100644 --- a/TODOS.md +++ b/TODOS.md @@ -506,34 +506,37 @@ Shipped as v0.5.0 on main. Includes `/plan-design-review` (report-only design au ## Safety & Observability -### On-demand hook skills (/careful, /freeze, /guard) +### On-demand hook skills (/careful, /freeze, /guard) — SHIPPED -**What:** Three new skills that use Claude Code's session-scoped PreToolUse hooks to add safety guardrails on demand. +~~**What:** Three new skills that use Claude Code's session-scoped PreToolUse hooks to add safety guardrails on demand.~~ -**Why:** Anthropic's internal skill best practices recommend on-demand hooks for safety. Claude Code already handles destructive command permissions, but these add an explicit opt-in layer for high-risk sessions (touching prod, debugging live systems). +Shipped as `/careful`, `/freeze`, `/guard`, and `/unfreeze` in v0.6.5. Includes hook fire-rate telemetry (pattern name only, no command content) and inline skill activation telemetry. -**Skills:** -- `/careful` — PreToolUse hook on Bash tool. Warns (not blocks) before destructive commands: `rm -rf`, `DROP TABLE`, `git push --force`, `git reset --hard`, `kubectl delete`, `docker system prune`. Uses `permissionDecision: "ask"` so user can override. -- `/freeze` — PreToolUse hook on Edit/Write tools. Restricts file edits to a user-specified directory. Great for debugging without accidentally "fixing" unrelated code. -- `/guard` — meta-skill composing `/careful` + `/freeze` into one command. +### Skill usage telemetry — SHIPPED -**Implementation notes:** Use `${CLAUDE_SKILL_DIR}` (not `${SKILL_DIR}`) for script paths in hook commands. Pure bash JSON parsing (no jq dependency). Freeze dir storage: `${CLAUDE_PLUGIN_DATA}/freeze-dir.txt` with `~/.gstack/freeze-dir.txt` fallback. Ensure trailing `/` on freeze dir paths to prevent `/src` matching `/src-old`. +~~**What:** Track which skills get invoked, how often, from which repo.~~ -**Effort:** M (human) / S (CC) +Shipped in v0.6.5. TemplateContext in gen-skill-docs.ts bakes skill name into preamble telemetry line. Analytics CLI (`bun run analytics`) for querying. /retro integration shows skills-used-this-week. + +### /debug scoped debugging enhancements (gated on telemetry) + +**What:** Six enhancements to /debug auto-freeze, contingent on telemetry showing the freeze hook actually fires in real debugging sessions. + +**Why:** /debug v0.7.1 auto-freezes edits to the module being debugged. If telemetry shows the hook fires often, these enhancements make the experience smarter. If it never fires, the problem wasn't real and these aren't worth building. + +**Context:** All items are prose additions to `debug/SKILL.md.tmpl`. No new scripts. + +**Items:** +1. Stack trace auto-detection for freeze directory (parse deepest app frame) +2. Freeze boundary widening (ask to widen instead of hard-block when hitting boundary) +3. Post-fix auto-unfreeze + full test suite run +4. Debug instrumentation cleanup (tag with DEBUG-TEMP, remove before commit) +5. Debug session persistence (~/.gstack/debug-sessions/ — save investigation for reuse) +6. Investigation timeline in debug report (hypothesis log with timing) + +**Effort:** M (all 6 combined) **Priority:** P3 -**Depends on:** None - -### Skill usage telemetry - -**What:** Track which skills get invoked, how often, from which repo. - -**Why:** Enables finding undertriggering skills and measuring adoption. Anthropic uses a PreToolUse hook for this; simpler approach is appending JSONL from the preamble. - -**Context:** Add to `generatePreamble()` in `scripts/gen-skill-docs.ts`. Append to `~/.gstack/analytics/skill-usage.jsonl` with skill name, timestamp, and repo name. `mkdir -p` ensures the directory exists. - -**Effort:** S (human) / S (CC) -**Priority:** P3 -**Depends on:** None +**Depends on:** Telemetry data showing freeze hook fires in real /debug sessions ## Completed diff --git a/VERSION b/VERSION index 7486fdbc..f38fc539 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.7.2 +0.7.3 diff --git a/browse/SKILL.md b/browse/SKILL.md index 3c452c84..5c3bf096 100644 --- a/browse/SKILL.md +++ b/browse/SKILL.md @@ -31,6 +31,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"browse","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/careful/SKILL.md b/careful/SKILL.md new file mode 100644 index 00000000..7513b293 --- /dev/null +++ b/careful/SKILL.md @@ -0,0 +1,59 @@ +--- +name: careful +version: 0.1.0 +description: | + Safety guardrails for destructive commands. Warns before rm -rf, DROP TABLE, + force-push, git reset --hard, kubectl delete, and similar destructive operations. + User can override each warning. Use when touching prod, debugging live systems, + or working in a shared environment. Use when asked to "be careful", "safety mode", + "prod mode", or "careful mode". +allowed-tools: + - Bash + - Read +hooks: + PreToolUse: + - matcher: "Bash" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/bin/check-careful.sh" + statusMessage: "Checking for destructive commands..." +--- + + + +# /careful — Destructive Command Guardrails + +Safety mode is now **active**. Every bash command will be checked for destructive +patterns before running. If a destructive command is detected, you'll be warned +and can choose to proceed or cancel. + +```bash +mkdir -p ~/.gstack/analytics +echo '{"skill":"careful","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true +``` + +## What's protected + +| Pattern | Example | Risk | +|---------|---------|------| +| `rm -rf` / `rm -r` / `rm --recursive` | `rm -rf /var/data` | Recursive delete | +| `DROP TABLE` / `DROP DATABASE` | `DROP TABLE users;` | Data loss | +| `TRUNCATE` | `TRUNCATE orders;` | Data loss | +| `git push --force` / `-f` | `git push -f origin main` | History rewrite | +| `git reset --hard` | `git reset --hard HEAD~3` | Uncommitted work loss | +| `git checkout .` / `git restore .` | `git checkout .` | Uncommitted work loss | +| `kubectl delete` | `kubectl delete pod` | Production impact | +| `docker rm -f` / `docker system prune` | `docker system prune -a` | Container/image loss | + +## Safe exceptions + +These patterns are allowed without warning: +- `rm -rf node_modules` / `.next` / `dist` / `__pycache__` / `.cache` / `build` / `.turbo` / `coverage` + +## How it works + +The hook reads the command from the tool input JSON, checks it against the +patterns above, and returns `permissionDecision: "ask"` with a warning message +if a match is found. You can always override the warning and proceed. + +To deactivate, end the conversation or start a new one. Hooks are session-scoped. diff --git a/careful/SKILL.md.tmpl b/careful/SKILL.md.tmpl new file mode 100644 index 00000000..d8bd4662 --- /dev/null +++ b/careful/SKILL.md.tmpl @@ -0,0 +1,57 @@ +--- +name: careful +version: 0.1.0 +description: | + Safety guardrails for destructive commands. Warns before rm -rf, DROP TABLE, + force-push, git reset --hard, kubectl delete, and similar destructive operations. + User can override each warning. Use when touching prod, debugging live systems, + or working in a shared environment. Use when asked to "be careful", "safety mode", + "prod mode", or "careful mode". +allowed-tools: + - Bash + - Read +hooks: + PreToolUse: + - matcher: "Bash" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/bin/check-careful.sh" + statusMessage: "Checking for destructive commands..." +--- + +# /careful — Destructive Command Guardrails + +Safety mode is now **active**. Every bash command will be checked for destructive +patterns before running. If a destructive command is detected, you'll be warned +and can choose to proceed or cancel. + +```bash +mkdir -p ~/.gstack/analytics +echo '{"skill":"careful","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true +``` + +## What's protected + +| Pattern | Example | Risk | +|---------|---------|------| +| `rm -rf` / `rm -r` / `rm --recursive` | `rm -rf /var/data` | Recursive delete | +| `DROP TABLE` / `DROP DATABASE` | `DROP TABLE users;` | Data loss | +| `TRUNCATE` | `TRUNCATE orders;` | Data loss | +| `git push --force` / `-f` | `git push -f origin main` | History rewrite | +| `git reset --hard` | `git reset --hard HEAD~3` | Uncommitted work loss | +| `git checkout .` / `git restore .` | `git checkout .` | Uncommitted work loss | +| `kubectl delete` | `kubectl delete pod` | Production impact | +| `docker rm -f` / `docker system prune` | `docker system prune -a` | Container/image loss | + +## Safe exceptions + +These patterns are allowed without warning: +- `rm -rf node_modules` / `.next` / `dist` / `__pycache__` / `.cache` / `build` / `.turbo` / `coverage` + +## How it works + +The hook reads the command from the tool input JSON, checks it against the +patterns above, and returns `permissionDecision: "ask"` with a warning message +if a match is found. You can always override the warning and proceed. + +To deactivate, end the conversation or start a new one. Hooks are session-scoped. diff --git a/careful/bin/check-careful.sh b/careful/bin/check-careful.sh new file mode 100755 index 00000000..c8bc2c7a --- /dev/null +++ b/careful/bin/check-careful.sh @@ -0,0 +1,112 @@ +#!/usr/bin/env bash +# check-careful.sh — PreToolUse hook for /careful skill +# Reads JSON from stdin, checks Bash command for destructive patterns. +# Returns {"permissionDecision":"ask","message":"..."} to warn, or {} to allow. +set -euo pipefail + +# Read stdin (JSON with tool_input) +INPUT=$(cat) + +# Extract the "command" field value from tool_input +# Try grep/sed first (handles 99% of cases), fall back to Python for escaped quotes +CMD=$(printf '%s' "$INPUT" | grep -o '"command"[[:space:]]*:[[:space:]]*"[^"]*"' | head -1 | sed 's/.*:[[:space:]]*"//;s/"$//' || true) + +# Python fallback if grep returned empty (e.g., escaped quotes in command) +if [ -z "$CMD" ]; then + CMD=$(printf '%s' "$INPUT" | python3 -c 'import sys,json; print(json.loads(sys.stdin.read()).get("tool_input",{}).get("command",""))' 2>/dev/null || true) +fi + +# If we still couldn't extract a command, allow +if [ -z "$CMD" ]; then + echo '{}' + exit 0 +fi + +# Normalize: lowercase for case-insensitive SQL matching +CMD_LOWER=$(printf '%s' "$CMD" | tr '[:upper:]' '[:lower:]') + +# --- Check for safe exceptions (rm -rf of build artifacts) --- +if printf '%s' "$CMD" | grep -qE 'rm\s+(-[a-zA-Z]*r[a-zA-Z]*\s+|--recursive\s+)' 2>/dev/null; then + SAFE_ONLY=true + RM_ARGS=$(printf '%s' "$CMD" | sed -E 's/.*rm\s+(-[a-zA-Z]+\s+)*//;s/--recursive\s*//') + for target in $RM_ARGS; do + case "$target" in + */node_modules|node_modules|*/\.next|\.next|*/dist|dist|*/__pycache__|__pycache__|*/\.cache|\.cache|*/build|build|*/\.turbo|\.turbo|*/coverage|coverage) + ;; # safe target + -*) + ;; # flag, skip + *) + SAFE_ONLY=false + break + ;; + esac + done + if [ "$SAFE_ONLY" = true ]; then + echo '{}' + exit 0 + fi +fi + +# --- Destructive pattern checks --- +WARN="" +PATTERN="" + +# rm -rf / rm -r / rm --recursive +if printf '%s' "$CMD" | grep -qE 'rm\s+(-[a-zA-Z]*r|--recursive)' 2>/dev/null; then + WARN="Destructive: recursive delete (rm -r). This permanently removes files." + PATTERN="rm_recursive" +fi + +# DROP TABLE / DROP DATABASE +if [ -z "$WARN" ] && printf '%s' "$CMD_LOWER" | grep -qE 'drop\s+(table|database)' 2>/dev/null; then + WARN="Destructive: SQL DROP detected. This permanently deletes database objects." + PATTERN="drop_table" +fi + +# TRUNCATE +if [ -z "$WARN" ] && printf '%s' "$CMD_LOWER" | grep -qE '\btruncate\b' 2>/dev/null; then + WARN="Destructive: SQL TRUNCATE detected. This deletes all rows from a table." + PATTERN="truncate" +fi + +# git push --force / git push -f +if [ -z "$WARN" ] && printf '%s' "$CMD" | grep -qE 'git\s+push\s+.*(-f\b|--force)' 2>/dev/null; then + WARN="Destructive: git force-push rewrites remote history. Other contributors may lose work." + PATTERN="git_force_push" +fi + +# git reset --hard +if [ -z "$WARN" ] && printf '%s' "$CMD" | grep -qE 'git\s+reset\s+--hard' 2>/dev/null; then + WARN="Destructive: git reset --hard discards all uncommitted changes." + PATTERN="git_reset_hard" +fi + +# git checkout . / git restore . +if [ -z "$WARN" ] && printf '%s' "$CMD" | grep -qE 'git\s+(checkout|restore)\s+\.' 2>/dev/null; then + WARN="Destructive: discards all uncommitted changes in the working tree." + PATTERN="git_discard" +fi + +# kubectl delete +if [ -z "$WARN" ] && printf '%s' "$CMD" | grep -qE 'kubectl\s+delete' 2>/dev/null; then + WARN="Destructive: kubectl delete removes Kubernetes resources. May impact production." + PATTERN="kubectl_delete" +fi + +# docker rm -f / docker system prune +if [ -z "$WARN" ] && printf '%s' "$CMD" | grep -qE 'docker\s+(rm\s+-f|system\s+prune)' 2>/dev/null; then + WARN="Destructive: Docker force-remove or prune. May delete running containers or cached images." + PATTERN="docker_destructive" +fi + +# --- Output --- +if [ -n "$WARN" ]; then + # Log hook fire event (pattern name only, never command content) + mkdir -p ~/.gstack/analytics 2>/dev/null || true + echo '{"event":"hook_fire","skill":"careful","pattern":"'"$PATTERN"'","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true + + WARN_ESCAPED=$(printf '%s' "$WARN" | sed 's/"/\\"/g') + printf '{"permissionDecision":"ask","message":"[careful] %s"}\n' "$WARN_ESCAPED" +else + echo '{}' +fi diff --git a/debug/SKILL.md b/debug/SKILL.md index c1314556..c61d1f40 100644 --- a/debug/SKILL.md +++ b/debug/SKILL.md @@ -16,6 +16,18 @@ allowed-tools: - Grep - Glob - AskUserQuestion +hooks: + PreToolUse: + - matcher: "Edit" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" + statusMessage: "Checking debug scope boundary..." + - matcher: "Write" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" + statusMessage: "Checking debug scope boundary..." --- @@ -34,6 +46,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"debug","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` @@ -184,6 +198,31 @@ Output: **"Root cause hypothesis: ..."** — a specific, testable claim about wh --- +## Scope Lock + +After forming your root cause hypothesis, lock edits to the affected module to prevent scope creep. + +```bash +[ -x "${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" ] && echo "FREEZE_AVAILABLE" || echo "FREEZE_UNAVAILABLE" +``` + +**If FREEZE_AVAILABLE:** Identify the narrowest directory containing the affected files. Write it to the freeze state file: + +```bash +STATE_DIR="${CLAUDE_PLUGIN_DATA:-$HOME/.gstack}" +mkdir -p "$STATE_DIR" +echo "/" > "$STATE_DIR/freeze-dir.txt" +echo "Debug scope locked to: /" +``` + +Substitute `` with the actual directory path (e.g., `src/auth/`). Tell the user: "Edits restricted to `/` for this debug session. This prevents changes to unrelated code. Run `/unfreeze` to remove the restriction." + +If the bug spans the entire repo or the scope is genuinely unclear, skip the lock and note why. + +**If FREEZE_UNAVAILABLE:** Skip scope lock. Edits are unrestricted. + +--- + ## Phase 2: Pattern Analysis Check if this bug matches a known pattern: diff --git a/debug/SKILL.md.tmpl b/debug/SKILL.md.tmpl index 90fc5bdc..683e1a0b 100644 --- a/debug/SKILL.md.tmpl +++ b/debug/SKILL.md.tmpl @@ -16,6 +16,18 @@ allowed-tools: - Grep - Glob - AskUserQuestion +hooks: + PreToolUse: + - matcher: "Edit" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" + statusMessage: "Checking debug scope boundary..." + - matcher: "Write" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" + statusMessage: "Checking debug scope boundary..." --- {{PREAMBLE}} @@ -50,6 +62,31 @@ Output: **"Root cause hypothesis: ..."** — a specific, testable claim about wh --- +## Scope Lock + +After forming your root cause hypothesis, lock edits to the affected module to prevent scope creep. + +```bash +[ -x "${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" ] && echo "FREEZE_AVAILABLE" || echo "FREEZE_UNAVAILABLE" +``` + +**If FREEZE_AVAILABLE:** Identify the narrowest directory containing the affected files. Write it to the freeze state file: + +```bash +STATE_DIR="${CLAUDE_PLUGIN_DATA:-$HOME/.gstack}" +mkdir -p "$STATE_DIR" +echo "/" > "$STATE_DIR/freeze-dir.txt" +echo "Debug scope locked to: /" +``` + +Substitute `` with the actual directory path (e.g., `src/auth/`). Tell the user: "Edits restricted to `/` for this debug session. This prevents changes to unrelated code. Run `/unfreeze` to remove the restriction." + +If the bug spans the entire repo or the scope is genuinely unclear, skip the lock and note why. + +**If FREEZE_UNAVAILABLE:** Skip scope lock. Edits are unrestricted. + +--- + ## Phase 2: Pattern Analysis Check if this bug matches a known pattern: diff --git a/design-consultation/SKILL.md b/design-consultation/SKILL.md index 31cbf815..1ba6e823 100644 --- a/design-consultation/SKILL.md +++ b/design-consultation/SKILL.md @@ -36,6 +36,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"design-consultation","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/design-review/SKILL.md b/design-review/SKILL.md index dd7fced1..ed45f1ea 100644 --- a/design-review/SKILL.md +++ b/design-review/SKILL.md @@ -36,6 +36,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"design-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/document-release/SKILL.md b/document-release/SKILL.md index 4831573b..695ac889 100644 --- a/document-release/SKILL.md +++ b/document-release/SKILL.md @@ -33,6 +33,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"document-release","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/freeze/SKILL.md b/freeze/SKILL.md new file mode 100644 index 00000000..00aaef61 --- /dev/null +++ b/freeze/SKILL.md @@ -0,0 +1,82 @@ +--- +name: freeze +version: 0.1.0 +description: | + Restrict file edits to a specific directory for the session. Blocks Edit and + Write outside the allowed path. Use when debugging to prevent accidentally + "fixing" unrelated code, or when you want to scope changes to one module. + Use when asked to "freeze", "restrict edits", "only edit this folder", + or "lock down edits". +allowed-tools: + - Bash + - Read + - AskUserQuestion +hooks: + PreToolUse: + - matcher: "Edit" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/bin/check-freeze.sh" + statusMessage: "Checking freeze boundary..." + - matcher: "Write" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/bin/check-freeze.sh" + statusMessage: "Checking freeze boundary..." +--- + + + +# /freeze — Restrict Edits to a Directory + +Lock file edits to a specific directory. Any Edit or Write operation targeting +a file outside the allowed path will be **blocked** (not just warned). + +```bash +mkdir -p ~/.gstack/analytics +echo '{"skill":"freeze","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true +``` + +## Setup + +Ask the user which directory to restrict edits to. Use AskUserQuestion: + +- Question: "Which directory should I restrict edits to? Files outside this path will be blocked from editing." +- Text input (not multiple choice) — the user types a path. + +Once the user provides a directory path: + +1. Resolve it to an absolute path: +```bash +FREEZE_DIR=$(cd "" 2>/dev/null && pwd) +echo "$FREEZE_DIR" +``` + +2. Ensure trailing slash and save to the freeze state file: +```bash +FREEZE_DIR="${FREEZE_DIR%/}/" +STATE_DIR="${CLAUDE_PLUGIN_DATA:-$HOME/.gstack}" +mkdir -p "$STATE_DIR" +echo "$FREEZE_DIR" > "$STATE_DIR/freeze-dir.txt" +echo "Freeze boundary set: $FREEZE_DIR" +``` + +Tell the user: "Edits are now restricted to `/`. Any Edit or Write +outside this directory will be blocked. To change the boundary, run `/freeze` +again. To remove it, run `/unfreeze` or end the session." + +## How it works + +The hook reads `file_path` from the Edit/Write tool input JSON, then checks +whether the path starts with the freeze directory. If not, it returns +`permissionDecision: "deny"` to block the operation. + +The freeze boundary persists for the session via the state file. The hook +script reads it on every Edit/Write invocation. + +## Notes + +- The trailing `/` on the freeze directory prevents `/src` from matching `/src-old` +- Freeze applies to Edit and Write tools only — Read, Bash, Glob, Grep are unaffected +- This prevents accidental edits, not a security boundary — Bash commands like `sed` can still modify files outside the boundary +- To deactivate, run `/unfreeze` or end the conversation diff --git a/freeze/SKILL.md.tmpl b/freeze/SKILL.md.tmpl new file mode 100644 index 00000000..8765cc1f --- /dev/null +++ b/freeze/SKILL.md.tmpl @@ -0,0 +1,80 @@ +--- +name: freeze +version: 0.1.0 +description: | + Restrict file edits to a specific directory for the session. Blocks Edit and + Write outside the allowed path. Use when debugging to prevent accidentally + "fixing" unrelated code, or when you want to scope changes to one module. + Use when asked to "freeze", "restrict edits", "only edit this folder", + or "lock down edits". +allowed-tools: + - Bash + - Read + - AskUserQuestion +hooks: + PreToolUse: + - matcher: "Edit" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/bin/check-freeze.sh" + statusMessage: "Checking freeze boundary..." + - matcher: "Write" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/bin/check-freeze.sh" + statusMessage: "Checking freeze boundary..." +--- + +# /freeze — Restrict Edits to a Directory + +Lock file edits to a specific directory. Any Edit or Write operation targeting +a file outside the allowed path will be **blocked** (not just warned). + +```bash +mkdir -p ~/.gstack/analytics +echo '{"skill":"freeze","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true +``` + +## Setup + +Ask the user which directory to restrict edits to. Use AskUserQuestion: + +- Question: "Which directory should I restrict edits to? Files outside this path will be blocked from editing." +- Text input (not multiple choice) — the user types a path. + +Once the user provides a directory path: + +1. Resolve it to an absolute path: +```bash +FREEZE_DIR=$(cd "" 2>/dev/null && pwd) +echo "$FREEZE_DIR" +``` + +2. Ensure trailing slash and save to the freeze state file: +```bash +FREEZE_DIR="${FREEZE_DIR%/}/" +STATE_DIR="${CLAUDE_PLUGIN_DATA:-$HOME/.gstack}" +mkdir -p "$STATE_DIR" +echo "$FREEZE_DIR" > "$STATE_DIR/freeze-dir.txt" +echo "Freeze boundary set: $FREEZE_DIR" +``` + +Tell the user: "Edits are now restricted to `/`. Any Edit or Write +outside this directory will be blocked. To change the boundary, run `/freeze` +again. To remove it, run `/unfreeze` or end the session." + +## How it works + +The hook reads `file_path` from the Edit/Write tool input JSON, then checks +whether the path starts with the freeze directory. If not, it returns +`permissionDecision: "deny"` to block the operation. + +The freeze boundary persists for the session via the state file. The hook +script reads it on every Edit/Write invocation. + +## Notes + +- The trailing `/` on the freeze directory prevents `/src` from matching `/src-old` +- Freeze applies to Edit and Write tools only — Read, Bash, Glob, Grep are unaffected +- This prevents accidental edits, not a security boundary — Bash commands like `sed` can still modify files outside the boundary +- To deactivate, run `/unfreeze` or end the conversation diff --git a/freeze/bin/check-freeze.sh b/freeze/bin/check-freeze.sh new file mode 100755 index 00000000..ed748e93 --- /dev/null +++ b/freeze/bin/check-freeze.sh @@ -0,0 +1,68 @@ +#!/usr/bin/env bash +# check-freeze.sh — PreToolUse hook for /freeze skill +# Reads JSON from stdin, checks if file_path is within the freeze boundary. +# Returns {"permissionDecision":"deny","message":"..."} to block, or {} to allow. +set -euo pipefail + +# Read stdin +INPUT=$(cat) + +# Locate the freeze directory state file +STATE_DIR="${CLAUDE_PLUGIN_DATA:-$HOME/.gstack}" +FREEZE_FILE="$STATE_DIR/freeze-dir.txt" + +# If no freeze file exists, allow everything (not yet configured) +if [ ! -f "$FREEZE_FILE" ]; then + echo '{}' + exit 0 +fi + +FREEZE_DIR=$(tr -d '[:space:]' < "$FREEZE_FILE") + +# If freeze dir is empty, allow +if [ -z "$FREEZE_DIR" ]; then + echo '{}' + exit 0 +fi + +# Extract file_path from tool_input JSON +# Try grep/sed first, fall back to Python for escaped quotes +FILE_PATH=$(printf '%s' "$INPUT" | grep -o '"file_path"[[:space:]]*:[[:space:]]*"[^"]*"' | head -1 | sed 's/.*:[[:space:]]*"//;s/"$//' || true) + +# Python fallback if grep returned empty +if [ -z "$FILE_PATH" ]; then + FILE_PATH=$(printf '%s' "$INPUT" | python3 -c 'import sys,json; print(json.loads(sys.stdin.read()).get("tool_input",{}).get("file_path",""))' 2>/dev/null || true) +fi + +# If we couldn't extract a file path, allow (don't block on parse failure) +if [ -z "$FILE_PATH" ]; then + echo '{}' + exit 0 +fi + +# Resolve file_path to absolute if it isn't already +case "$FILE_PATH" in + /*) ;; # already absolute + *) + FILE_PATH="$(pwd)/$FILE_PATH" + ;; +esac + +# Normalize: remove double slashes and trailing slash +FILE_PATH=$(printf '%s' "$FILE_PATH" | sed 's|/\+|/|g;s|/$||') + +# Check: does the file path start with the freeze directory? +case "$FILE_PATH" in + "${FREEZE_DIR}"*) + # Inside freeze boundary — allow + echo '{}' + ;; + *) + # Outside freeze boundary — deny + # Log hook fire event + mkdir -p ~/.gstack/analytics 2>/dev/null || true + echo '{"event":"hook_fire","skill":"freeze","pattern":"boundary_deny","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true + + printf '{"permissionDecision":"deny","message":"[freeze] Blocked: %s is outside the freeze boundary (%s). Only edits within the frozen directory are allowed."}\n' "$FILE_PATH" "$FREEZE_DIR" + ;; +esac diff --git a/guard/SKILL.md b/guard/SKILL.md new file mode 100644 index 00000000..f846d38a --- /dev/null +++ b/guard/SKILL.md @@ -0,0 +1,82 @@ +--- +name: guard +version: 0.1.0 +description: | + Full safety mode: destructive command warnings + directory-scoped edits. + Combines /careful (warns before rm -rf, DROP TABLE, force-push, etc.) with + /freeze (blocks edits outside a specified directory). Use for maximum safety + when touching prod or debugging live systems. Use when asked to "guard mode", + "full safety", "lock it down", or "maximum safety". +allowed-tools: + - Bash + - Read + - AskUserQuestion +hooks: + PreToolUse: + - matcher: "Bash" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/../careful/bin/check-careful.sh" + statusMessage: "Checking for destructive commands..." + - matcher: "Edit" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" + statusMessage: "Checking freeze boundary..." + - matcher: "Write" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" + statusMessage: "Checking freeze boundary..." +--- + + + +# /guard — Full Safety Mode + +Activates both destructive command warnings and directory-scoped edit restrictions. +This is the combination of `/careful` + `/freeze` in a single command. + +**Dependency note:** This skill references hook scripts from the sibling `/careful` +and `/freeze` skill directories. Both must be installed (they are installed together +by the gstack setup script). + +```bash +mkdir -p ~/.gstack/analytics +echo '{"skill":"guard","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true +``` + +## Setup + +Ask the user which directory to restrict edits to. Use AskUserQuestion: + +- Question: "Guard mode: which directory should edits be restricted to? Destructive command warnings are always on. Files outside the chosen path will be blocked from editing." +- Text input (not multiple choice) — the user types a path. + +Once the user provides a directory path: + +1. Resolve it to an absolute path: +```bash +FREEZE_DIR=$(cd "" 2>/dev/null && pwd) +echo "$FREEZE_DIR" +``` + +2. Ensure trailing slash and save to the freeze state file: +```bash +FREEZE_DIR="${FREEZE_DIR%/}/" +STATE_DIR="${CLAUDE_PLUGIN_DATA:-$HOME/.gstack}" +mkdir -p "$STATE_DIR" +echo "$FREEZE_DIR" > "$STATE_DIR/freeze-dir.txt" +echo "Freeze boundary set: $FREEZE_DIR" +``` + +Tell the user: +- "**Guard mode active.** Two protections are now running:" +- "1. **Destructive command warnings** — rm -rf, DROP TABLE, force-push, etc. will warn before executing (you can override)" +- "2. **Edit boundary** — file edits restricted to `/`. Edits outside this directory are blocked." +- "To remove the edit boundary, run `/unfreeze`. To deactivate everything, end the session." + +## What's protected + +See `/careful` for the full list of destructive command patterns and safe exceptions. +See `/freeze` for how edit boundary enforcement works. diff --git a/guard/SKILL.md.tmpl b/guard/SKILL.md.tmpl new file mode 100644 index 00000000..4dc35244 --- /dev/null +++ b/guard/SKILL.md.tmpl @@ -0,0 +1,80 @@ +--- +name: guard +version: 0.1.0 +description: | + Full safety mode: destructive command warnings + directory-scoped edits. + Combines /careful (warns before rm -rf, DROP TABLE, force-push, etc.) with + /freeze (blocks edits outside a specified directory). Use for maximum safety + when touching prod or debugging live systems. Use when asked to "guard mode", + "full safety", "lock it down", or "maximum safety". +allowed-tools: + - Bash + - Read + - AskUserQuestion +hooks: + PreToolUse: + - matcher: "Bash" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/../careful/bin/check-careful.sh" + statusMessage: "Checking for destructive commands..." + - matcher: "Edit" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" + statusMessage: "Checking freeze boundary..." + - matcher: "Write" + hooks: + - type: command + command: "bash ${CLAUDE_SKILL_DIR}/../freeze/bin/check-freeze.sh" + statusMessage: "Checking freeze boundary..." +--- + +# /guard — Full Safety Mode + +Activates both destructive command warnings and directory-scoped edit restrictions. +This is the combination of `/careful` + `/freeze` in a single command. + +**Dependency note:** This skill references hook scripts from the sibling `/careful` +and `/freeze` skill directories. Both must be installed (they are installed together +by the gstack setup script). + +```bash +mkdir -p ~/.gstack/analytics +echo '{"skill":"guard","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true +``` + +## Setup + +Ask the user which directory to restrict edits to. Use AskUserQuestion: + +- Question: "Guard mode: which directory should edits be restricted to? Destructive command warnings are always on. Files outside the chosen path will be blocked from editing." +- Text input (not multiple choice) — the user types a path. + +Once the user provides a directory path: + +1. Resolve it to an absolute path: +```bash +FREEZE_DIR=$(cd "" 2>/dev/null && pwd) +echo "$FREEZE_DIR" +``` + +2. Ensure trailing slash and save to the freeze state file: +```bash +FREEZE_DIR="${FREEZE_DIR%/}/" +STATE_DIR="${CLAUDE_PLUGIN_DATA:-$HOME/.gstack}" +mkdir -p "$STATE_DIR" +echo "$FREEZE_DIR" > "$STATE_DIR/freeze-dir.txt" +echo "Freeze boundary set: $FREEZE_DIR" +``` + +Tell the user: +- "**Guard mode active.** Two protections are now running:" +- "1. **Destructive command warnings** — rm -rf, DROP TABLE, force-push, etc. will warn before executing (you can override)" +- "2. **Edit boundary** — file edits restricted to `/`. Edits outside this directory are blocked." +- "To remove the edit boundary, run `/unfreeze`. To deactivate everything, end the session." + +## What's protected + +See `/careful` for the full list of destructive command patterns and safe exceptions. +See `/freeze` for how edit boundary enforcement works. diff --git a/office-hours/SKILL.md b/office-hours/SKILL.md index da59e1ff..f5b66adb 100644 --- a/office-hours/SKILL.md +++ b/office-hours/SKILL.md @@ -37,6 +37,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"office-hours","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/package.json b/package.json index 1c580144..9b17fdb1 100644 --- a/package.json +++ b/package.json @@ -24,7 +24,8 @@ "eval:compare": "bun run scripts/eval-compare.ts", "eval:summary": "bun run scripts/eval-summary.ts", "eval:watch": "bun run scripts/eval-watch.ts", - "eval:select": "bun run scripts/eval-select.ts" + "eval:select": "bun run scripts/eval-select.ts", + "analytics": "bun run scripts/analytics.ts" }, "dependencies": { "playwright": "^1.58.2", diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index ce0395b0..3d431884 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -34,6 +34,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"plan-ceo-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md index faabd328..897877a8 100644 --- a/plan-design-review/SKILL.md +++ b/plan-design-review/SKILL.md @@ -34,6 +34,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"plan-design-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index d6c6ea28..d0445626 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -33,6 +33,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"plan-eng-review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/qa-only/SKILL.md b/qa-only/SKILL.md index 0e20c5e3..9e0789dc 100644 --- a/qa-only/SKILL.md +++ b/qa-only/SKILL.md @@ -30,6 +30,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"qa-only","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/qa/SKILL.md b/qa/SKILL.md index 8ee176be..a8a730c3 100644 --- a/qa/SKILL.md +++ b/qa/SKILL.md @@ -37,6 +37,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"qa","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/retro/SKILL.md b/retro/SKILL.md index a4458c22..f6282d27 100644 --- a/retro/SKILL.md +++ b/retro/SKILL.md @@ -31,6 +31,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"retro","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` @@ -295,6 +297,14 @@ Include in the metrics table: If TODOS.md doesn't exist, skip the Backlog Health row. +**Skill Usage (if analytics exist):** Read `~/.gstack/analytics/skill-usage.jsonl` if it exists. Filter entries within the retro time window by `ts` field. Separate skill activations (no `event` field) from hook fires (`event: "hook_fire"`). Aggregate by skill name. Present as: + +``` +| Skill Usage | /ship(12) /qa(8) /review(5) · 3 safety hook fires | +``` + +If the JSONL file doesn't exist or has no entries in the window, skip the Skill Usage row. + ### Step 3: Commit Time Distribution Show hourly histogram in local time using bar chart: diff --git a/retro/SKILL.md.tmpl b/retro/SKILL.md.tmpl index 95ee706e..5c6e772c 100644 --- a/retro/SKILL.md.tmpl +++ b/retro/SKILL.md.tmpl @@ -161,6 +161,14 @@ Include in the metrics table: If TODOS.md doesn't exist, skip the Backlog Health row. +**Skill Usage (if analytics exist):** Read `~/.gstack/analytics/skill-usage.jsonl` if it exists. Filter entries within the retro time window by `ts` field. Separate skill activations (no `event` field) from hook fires (`event: "hook_fire"`). Aggregate by skill name. Present as: + +``` +| Skill Usage | /ship(12) /qa(8) /review(5) · 3 safety hook fires | +``` + +If the JSONL file doesn't exist or has no entries in the window, skip the Skill Usage row. + ### Step 3: Commit Time Distribution Show hourly histogram in local time using bar chart: diff --git a/review/SKILL.md b/review/SKILL.md index b2da378d..72286371 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -32,6 +32,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/scripts/analytics.ts b/scripts/analytics.ts new file mode 100644 index 00000000..6aa93cb3 --- /dev/null +++ b/scripts/analytics.ts @@ -0,0 +1,190 @@ +#!/usr/bin/env bun +/** + * analytics — CLI for viewing gstack skill usage statistics. + * + * Reads ~/.gstack/analytics/skill-usage.jsonl and displays: + * - Top skills by invocation count + * - Per-repo skill breakdown + * - Safety hook fire events + * + * Usage: + * bun run scripts/analytics.ts [--period 7d|30d|all] + */ + +import * as fs from 'fs'; +import * as path from 'path'; +import * as os from 'os'; + +export interface AnalyticsEvent { + skill: string; + ts: string; + repo: string; + event?: string; + pattern?: string; +} + +const ANALYTICS_FILE = path.join(os.homedir(), '.gstack', 'analytics', 'skill-usage.jsonl'); + +/** + * Parse JSONL content into AnalyticsEvent[], skipping malformed lines. + */ +export function parseJSONL(content: string): AnalyticsEvent[] { + const events: AnalyticsEvent[] = []; + for (const line of content.split('\n')) { + const trimmed = line.trim(); + if (!trimmed) continue; + try { + const obj = JSON.parse(trimmed); + if (typeof obj === 'object' && obj !== null && typeof obj.ts === 'string') { + events.push(obj as AnalyticsEvent); + } + } catch { + // skip malformed lines + } + } + return events; +} + +/** + * Filter events by period. Supports "7d", "30d", and "all". + */ +export function filterByPeriod(events: AnalyticsEvent[], period: string): AnalyticsEvent[] { + if (period === 'all') return events; + + const match = period.match(/^(\d+)d$/); + if (!match) return events; + + const days = parseInt(match[1], 10); + const cutoff = new Date(Date.now() - days * 24 * 60 * 60 * 1000); + + return events.filter(e => { + const d = new Date(e.ts); + return !isNaN(d.getTime()) && d >= cutoff; + }); +} + +/** + * Format a report string from a list of events. + */ +export function formatReport(events: AnalyticsEvent[], period: string = 'all'): string { + const skillEvents = events.filter(e => e.event !== 'hook_fire'); + const hookEvents = events.filter(e => e.event === 'hook_fire'); + + const lines: string[] = []; + lines.push('gstack skill usage analytics'); + lines.push('\u2550'.repeat(39)); + lines.push(''); + + const periodLabel = period === 'all' ? 'all time' : `last ${period.replace('d', ' days')}`; + lines.push(`Period: ${periodLabel}`); + + // Top Skills + const skillCounts = new Map(); + for (const e of skillEvents) { + skillCounts.set(e.skill, (skillCounts.get(e.skill) || 0) + 1); + } + + if (skillCounts.size > 0) { + lines.push(''); + lines.push('Top Skills'); + + const sorted = [...skillCounts.entries()].sort((a, b) => b[1] - a[1]); + const maxName = Math.max(...sorted.map(([name]) => name.length + 1)); // +1 for / + const maxCount = Math.max(...sorted.map(([, count]) => String(count).length)); + + for (const [name, count] of sorted) { + const label = `/${name}`; + const suffix = `${count} invocation${count === 1 ? '' : 's'}`; + const dotLen = Math.max(2, 25 - label.length - suffix.length); + const dots = ' ' + '.'.repeat(dotLen) + ' '; + lines.push(` ${label}${dots}${suffix}`); + } + } + + // By Repo + const repoSkills = new Map>(); + for (const e of skillEvents) { + if (!repoSkills.has(e.repo)) repoSkills.set(e.repo, new Map()); + const m = repoSkills.get(e.repo)!; + m.set(e.skill, (m.get(e.skill) || 0) + 1); + } + + if (repoSkills.size > 0) { + lines.push(''); + lines.push('By Repo'); + + const sortedRepos = [...repoSkills.entries()].sort((a, b) => a[0].localeCompare(b[0])); + for (const [repo, skills] of sortedRepos) { + const parts = [...skills.entries()] + .sort((a, b) => b[1] - a[1]) + .map(([s, c]) => `${s}(${c})`); + lines.push(` ${repo}: ${parts.join(' ')}`); + } + } + + // Safety Hook Events + const hookCounts = new Map(); + for (const e of hookEvents) { + if (e.pattern) { + hookCounts.set(e.pattern, (hookCounts.get(e.pattern) || 0) + 1); + } + } + + if (hookCounts.size > 0) { + lines.push(''); + lines.push('Safety Hook Events'); + + const sortedHooks = [...hookCounts.entries()].sort((a, b) => b[1] - a[1]); + for (const [pattern, count] of sortedHooks) { + const suffix = `${count} fire${count === 1 ? '' : 's'}`; + const dotLen = Math.max(2, 25 - pattern.length - suffix.length); + const dots = ' ' + '.'.repeat(dotLen) + ' '; + lines.push(` ${pattern}${dots}${suffix}`); + } + } + + // Total + const totalSkills = skillEvents.length; + const totalHooks = hookEvents.length; + lines.push(''); + lines.push(`Total: ${totalSkills} skill invocation${totalSkills === 1 ? '' : 's'}, ${totalHooks} hook fire${totalHooks === 1 ? '' : 's'}`); + + return lines.join('\n'); +} + +function main() { + // Parse --period flag + let period = 'all'; + const args = process.argv.slice(2); + for (let i = 0; i < args.length; i++) { + if (args[i] === '--period' && i + 1 < args.length) { + period = args[i + 1]; + i++; + } + } + + // Read file + if (!fs.existsSync(ANALYTICS_FILE)) { + console.log('No analytics data found.'); + process.exit(0); + } + + const content = fs.readFileSync(ANALYTICS_FILE, 'utf-8').trim(); + if (!content) { + console.log('No analytics data found.'); + process.exit(0); + } + + const events = parseJSONL(content); + if (events.length === 0) { + console.log('No analytics data found.'); + process.exit(0); + } + + const filtered = filterByPeriod(events, period); + console.log(formatReport(filtered, period)); +} + +if (import.meta.main) { + main(); +} diff --git a/scripts/gen-skill-docs.ts b/scripts/gen-skill-docs.ts index a53d1864..2a7b3e67 100644 --- a/scripts/gen-skill-docs.ts +++ b/scripts/gen-skill-docs.ts @@ -17,9 +17,16 @@ import * as path from 'path'; const ROOT = path.resolve(import.meta.dir, '..'); const DRY_RUN = process.argv.includes('--dry-run'); +// ─── Template Context ─────────────────────────────────────── + +interface TemplateContext { + skillName: string; + tmplPath: string; +} + // ─── Placeholder Resolvers ────────────────────────────────── -function generateCommandReference(): string { +function generateCommandReference(_ctx: TemplateContext): string { // Group commands by category const groups = new Map>(); for (const [cmd, meta] of Object.entries(COMMAND_DESCRIPTIONS)) { @@ -55,7 +62,7 @@ function generateCommandReference(): string { return sections.join('\n').trimEnd(); } -function generateSnapshotFlags(): string { +function generateSnapshotFlags(_ctx: TemplateContext): string { const lines: string[] = [ 'The snapshot is your primary tool for understanding and interacting with pages.', '', @@ -94,7 +101,7 @@ function generateSnapshotFlags(): string { return lines.join('\n'); } -function generatePreamble(): string { +function generatePreamble(ctx: TemplateContext): string { return `## Preamble (run first) \`\`\`bash @@ -109,6 +116,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"${ctx.skillName}","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" \`\`\` @@ -230,7 +239,7 @@ RECOMMENDATION: [what the user should do next] \`\`\``; } -function generateBrowseSetup(): string { +function generateBrowseSetup(_ctx: TemplateContext): string { return `## SETUP (run this check BEFORE any browse command) \`\`\`bash @@ -251,7 +260,7 @@ If \`NEEDS_SETUP\`: 3. If \`bun\` is not installed: \`curl -fsSL https://bun.sh/install | bash\``; } -function generateBaseBranchDetect(): string { +function generateBaseBranchDetect(_ctx: TemplateContext): string { return `## Step 0: Detect base branch Determine which branch this PR targets. Use the result as "the base branch" in all subsequent steps. @@ -272,7 +281,7 @@ branch name wherever the instructions say "the base branch." ---`; } -function generateQAMethodology(): string { +function generateQAMethodology(_ctx: TemplateContext): string { return `## Modes ### Diff-aware (automatic when on a feature branch with no URL) @@ -549,7 +558,7 @@ Minimum 0 per category. 11. **Show screenshots to the user.** After every \`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.`; } -function generateDesignReviewLite(): string { +function generateDesignReviewLite(_ctx: TemplateContext): string { return `## Design Review (conditional, diff-scoped) Check if the diff touches frontend files using \`gstack-diff-scope\`: @@ -588,7 +597,7 @@ Substitute: TIMESTAMP = ISO 8601 datetime, STATUS = "clean" if 0 findings or "is // NOTE: design-checklist.md is a subset of this methodology for code-level detection. // When adding items here, also update review/design-checklist.md, and vice versa. -function generateDesignMethodology(): string { +function generateDesignMethodology(_ctx: TemplateContext): string { return `## Modes ### Full (default) @@ -922,7 +931,7 @@ Tie everything to user goals and product objectives. Always suggest specific imp 11. **Show screenshots to the user.** After every \`$B screenshot\`, \`$B snapshot -a -o\`, or \`$B responsive\` command, use the Read tool on the output file(s) so the user can see them inline. For \`responsive\` (3 files), Read all three. This is critical — without it, screenshots are invisible to the user.`; } -function generateReviewDashboard(): string { +function generateReviewDashboard(_ctx: TemplateContext): string { return `## Review Readiness Dashboard After completing the review, read the review log and config to display the dashboard. @@ -962,7 +971,7 @@ Parse the output. Find the most recent entry for each skill (plan-ceo-review, pl - If \\\`skip_eng_review\\\` config is \\\`true\\\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED`; } -function generateTestBootstrap(): string { +function generateTestBootstrap(_ctx: TemplateContext): string { return `## Test Framework Bootstrap **Detect existing test framework and project runtime:** @@ -1117,7 +1126,7 @@ Only commit if there are changes. Stage all bootstrap files (config, test direct ---`; } -const RESOLVERS: Record string> = { +const RESOLVERS: Record string> = { COMMAND_REFERENCE: generateCommandReference, SNAPSHOT_FLAGS: generateSnapshotFlags, PREAMBLE: generatePreamble, @@ -1139,11 +1148,16 @@ function processTemplate(tmplPath: string): { outputPath: string; content: strin const relTmplPath = path.relative(ROOT, tmplPath); const outputPath = tmplPath.replace(/\.tmpl$/, ''); + // Extract skill name from frontmatter for TemplateContext + const nameMatch = tmplContent.match(/^name:\s*(.+)$/m); + const skillName = nameMatch ? nameMatch[1].trim() : path.basename(path.dirname(tmplPath)); + const ctx: TemplateContext = { skillName, tmplPath }; + // Replace placeholders let content = tmplContent.replace(/\{\{(\w+)\}\}/g, (match, name) => { const resolver = RESOLVERS[name]; if (!resolver) throw new Error(`Unknown placeholder {{${name}}} in ${relTmplPath}`); - return resolver(); + return resolver(ctx); }); // Check for any remaining unresolved placeholders @@ -1187,6 +1201,10 @@ function findTemplates(): string[] { path.join(ROOT, 'design-review', 'SKILL.md.tmpl'), path.join(ROOT, 'design-consultation', 'SKILL.md.tmpl'), path.join(ROOT, 'document-release', 'SKILL.md.tmpl'), + path.join(ROOT, 'careful', 'SKILL.md.tmpl'), + path.join(ROOT, 'freeze', 'SKILL.md.tmpl'), + path.join(ROOT, 'guard', 'SKILL.md.tmpl'), + path.join(ROOT, 'unfreeze', 'SKILL.md.tmpl'), ]; for (const p of candidates) { if (fs.existsSync(p)) templates.push(p); diff --git a/setup-browser-cookies/SKILL.md b/setup-browser-cookies/SKILL.md index ad9d5fbb..c7cbac3b 100644 --- a/setup-browser-cookies/SKILL.md +++ b/setup-browser-cookies/SKILL.md @@ -28,6 +28,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"setup-browser-cookies","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/ship/SKILL.md b/ship/SKILL.md index 97f26fa2..697ce5e4 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -31,6 +31,8 @@ _BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown") echo "BRANCH: $_BRANCH" _LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no") echo "LAKE_INTRO: $_LAKE_SEEN" +mkdir -p ~/.gstack/analytics +echo '{"skill":"ship","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true _PROACTIVE=$(~/.claude/skills/gstack/bin/gstack-config get proactive 2>/dev/null || echo "true") echo "PROACTIVE: $_PROACTIVE" ``` diff --git a/test/analytics.test.ts b/test/analytics.test.ts new file mode 100644 index 00000000..f3b1d646 --- /dev/null +++ b/test/analytics.test.ts @@ -0,0 +1,277 @@ +import { describe, test, expect, beforeEach, afterEach } from 'bun:test'; +import { parseJSONL, filterByPeriod, formatReport } from '../scripts/analytics'; +import type { AnalyticsEvent } from '../scripts/analytics'; +import * as fs from 'fs'; +import * as path from 'path'; +import * as os from 'os'; +import { execSync } from 'child_process'; + +const TMP_DIR = path.join(os.tmpdir(), 'analytics-test'); +const SCRIPT = path.resolve(import.meta.dir, '../scripts/analytics.ts'); + +function writeTempJSONL(name: string, lines: string[]): string { + fs.mkdirSync(TMP_DIR, { recursive: true }); + const p = path.join(TMP_DIR, name); + fs.writeFileSync(p, lines.join('\n') + '\n'); + return p; +} + +/** + * Run the analytics script with a custom JSONL file by overriding the path. + * We test the exported functions directly for unit tests, and use this + * helper for integration-style checks. + */ +function runScript(jsonlPath: string | null, extraArgs: string = ''): string { + // We test via the exported functions; for CLI integration we read the file + // and run the pipeline manually to avoid needing to override the hardcoded path. + if (jsonlPath === null) { + return 'No analytics data found.'; + } + if (!fs.existsSync(jsonlPath)) { + return 'No analytics data found.'; + } + const content = fs.readFileSync(jsonlPath, 'utf-8').trim(); + if (!content) { + return 'No analytics data found.'; + } + const events = parseJSONL(content); + if (events.length === 0) { + return 'No analytics data found.'; + } + // Parse period from extraArgs + let period = 'all'; + const match = extraArgs.match(/--period\s+(\S+)/); + if (match) period = match[1]; + const filtered = filterByPeriod(events, period); + return formatReport(filtered, period); +} + +beforeEach(() => { + fs.mkdirSync(TMP_DIR, { recursive: true }); +}); + +afterEach(() => { + fs.rmSync(TMP_DIR, { recursive: true, force: true }); +}); + +describe('parseJSONL', () => { + test('parses valid JSONL lines', () => { + const content = [ + '{"skill":"ship","ts":"2026-03-18T15:30:00Z","repo":"my-app"}', + '{"skill":"qa","ts":"2026-03-18T16:00:00Z","repo":"my-api"}', + ].join('\n'); + const events = parseJSONL(content); + expect(events).toHaveLength(2); + expect(events[0].skill).toBe('ship'); + expect(events[1].skill).toBe('qa'); + }); + + test('skips malformed lines', () => { + const content = [ + '{"skill":"ship","ts":"2026-03-18T15:30:00Z","repo":"my-app"}', + 'not valid json', + '{broken', + '', + '{"skill":"qa","ts":"2026-03-18T16:00:00Z","repo":"my-api"}', + ].join('\n'); + const events = parseJSONL(content); + expect(events).toHaveLength(2); + expect(events[0].skill).toBe('ship'); + expect(events[1].skill).toBe('qa'); + }); + + test('returns empty array for empty string', () => { + expect(parseJSONL('')).toHaveLength(0); + }); + + test('skips objects missing ts field', () => { + const content = '{"skill":"ship","repo":"my-app"}\n'; + const events = parseJSONL(content); + expect(events).toHaveLength(0); + }); +}); + +describe('filterByPeriod', () => { + const now = new Date(); + const daysAgo = (n: number) => new Date(now.getTime() - n * 24 * 60 * 60 * 1000).toISOString(); + + const events: AnalyticsEvent[] = [ + { skill: 'ship', ts: daysAgo(1), repo: 'app' }, + { skill: 'qa', ts: daysAgo(3), repo: 'app' }, + { skill: 'review', ts: daysAgo(10), repo: 'app' }, + { skill: 'retro', ts: daysAgo(40), repo: 'app' }, + ]; + + test('period "all" returns all events', () => { + expect(filterByPeriod(events, 'all')).toHaveLength(4); + }); + + test('period "7d" returns only last 7 days', () => { + const filtered = filterByPeriod(events, '7d'); + expect(filtered).toHaveLength(2); + expect(filtered[0].skill).toBe('ship'); + expect(filtered[1].skill).toBe('qa'); + }); + + test('period "30d" returns last 30 days', () => { + const filtered = filterByPeriod(events, '30d'); + expect(filtered).toHaveLength(3); + }); + + test('invalid period string returns all events', () => { + expect(filterByPeriod(events, 'bogus')).toHaveLength(4); + }); +}); + +describe('formatReport', () => { + test('includes header and period label', () => { + const report = formatReport([], 'all'); + expect(report).toContain('gstack skill usage analytics'); + expect(report).toContain('Period: all time'); + }); + + test('shows "last 7 days" for 7d period', () => { + const report = formatReport([], '7d'); + expect(report).toContain('Period: last 7 days'); + }); + + test('shows "last 30 days" for 30d period', () => { + const report = formatReport([], '30d'); + expect(report).toContain('Period: last 30 days'); + }); + + test('counts skill invocations correctly', () => { + const events: AnalyticsEvent[] = [ + { skill: 'ship', ts: '2026-03-18T15:30:00Z', repo: 'app' }, + { skill: 'ship', ts: '2026-03-18T16:00:00Z', repo: 'app' }, + { skill: 'qa', ts: '2026-03-18T16:30:00Z', repo: 'app' }, + ]; + const report = formatReport(events); + expect(report).toContain('/ship'); + expect(report).toContain('2 invocations'); + expect(report).toContain('/qa'); + expect(report).toContain('1 invocation'); + }); + + test('groups by repo', () => { + const events: AnalyticsEvent[] = [ + { skill: 'ship', ts: '2026-03-18T15:30:00Z', repo: 'app-a' }, + { skill: 'qa', ts: '2026-03-18T16:00:00Z', repo: 'app-a' }, + { skill: 'ship', ts: '2026-03-18T16:30:00Z', repo: 'app-b' }, + ]; + const report = formatReport(events); + expect(report).toContain('app-a: ship(1) qa(1)'); + expect(report).toContain('app-b: ship(1)'); + }); + + test('counts hook fire events separately', () => { + const events: AnalyticsEvent[] = [ + { skill: 'ship', ts: '2026-03-18T15:30:00Z', repo: 'app' }, + { skill: 'careful', ts: '2026-03-18T16:00:00Z', repo: 'app', event: 'hook_fire', pattern: 'rm_recursive' }, + { skill: 'careful', ts: '2026-03-18T16:30:00Z', repo: 'app', event: 'hook_fire', pattern: 'rm_recursive' }, + { skill: 'careful', ts: '2026-03-18T17:00:00Z', repo: 'app', event: 'hook_fire', pattern: 'git_force_push' }, + ]; + const report = formatReport(events); + expect(report).toContain('Safety Hook Events'); + expect(report).toContain('rm_recursive'); + expect(report).toContain('2 fires'); + expect(report).toContain('git_force_push'); + expect(report).toContain('1 fire'); + expect(report).toContain('Total: 1 skill invocation, 3 hook fires'); + }); + + test('handles mixed events correctly', () => { + const events: AnalyticsEvent[] = [ + { skill: 'ship', ts: '2026-03-18T15:30:00Z', repo: 'my-app' }, + { skill: 'ship', ts: '2026-03-18T15:35:00Z', repo: 'my-app' }, + { skill: 'qa', ts: '2026-03-18T16:00:00Z', repo: 'my-api' }, + { skill: 'careful', ts: '2026-03-18T16:30:00Z', repo: 'my-app', event: 'hook_fire', pattern: 'rm_recursive' }, + ]; + const report = formatReport(events); + // Skills counted correctly (hook_fire events excluded from skill counts) + expect(report).toContain('Total: 3 skill invocations, 1 hook fire'); + // Both sections present + expect(report).toContain('Top Skills'); + expect(report).toContain('Safety Hook Events'); + expect(report).toContain('By Repo'); + }); +}); + +describe('integration via runScript helper', () => { + test('missing file → "No analytics data found."', () => { + const output = runScript(path.join(TMP_DIR, 'nonexistent.jsonl')); + expect(output).toBe('No analytics data found.'); + }); + + test('null path → "No analytics data found."', () => { + const output = runScript(null); + expect(output).toBe('No analytics data found.'); + }); + + test('empty file → "No analytics data found."', () => { + const p = writeTempJSONL('empty.jsonl', ['']); + // Overwrite with truly empty content + fs.writeFileSync(p, ''); + const output = runScript(p); + expect(output).toBe('No analytics data found.'); + }); + + test('all malformed lines → "No analytics data found."', () => { + const p = writeTempJSONL('bad.jsonl', [ + 'not json', + '{broken', + '42', + ]); + const output = runScript(p); + expect(output).toBe('No analytics data found.'); + }); + + test('normal aggregation produces correct output', () => { + const p = writeTempJSONL('normal.jsonl', [ + '{"skill":"ship","ts":"2026-03-18T15:30:00Z","repo":"my-app"}', + '{"skill":"ship","ts":"2026-03-18T15:35:00Z","repo":"my-app"}', + '{"skill":"qa","ts":"2026-03-18T16:00:00Z","repo":"my-app"}', + '{"skill":"review","ts":"2026-03-18T16:30:00Z","repo":"my-api"}', + ]); + const output = runScript(p); + expect(output).toContain('/ship'); + expect(output).toContain('2 invocations'); + expect(output).toContain('/qa'); + expect(output).toContain('1 invocation'); + expect(output).toContain('/review'); + expect(output).toContain('Total: 4 skill invocations, 0 hook fires'); + }); + + test('period filtering (7d) only includes recent entries', () => { + const now = new Date(); + const recent = new Date(now.getTime() - 2 * 24 * 60 * 60 * 1000).toISOString(); + const old = new Date(now.getTime() - 20 * 24 * 60 * 60 * 1000).toISOString(); + + const p = writeTempJSONL('period.jsonl', [ + `{"skill":"ship","ts":"${recent}","repo":"app"}`, + `{"skill":"qa","ts":"${old}","repo":"app"}`, + ]); + const output = runScript(p, '--period 7d'); + expect(output).toContain('Period: last 7 days'); + expect(output).toContain('/ship'); + expect(output).toContain('Total: 1 skill invocation, 0 hook fires'); + // qa should be filtered out + expect(output).not.toContain('/qa'); + }); + + test('hook fire events counted in full pipeline', () => { + const p = writeTempJSONL('hooks.jsonl', [ + '{"skill":"ship","ts":"2026-03-18T15:30:00Z","repo":"app"}', + '{"event":"hook_fire","skill":"careful","pattern":"rm_recursive","ts":"2026-03-18T16:00:00Z","repo":"app"}', + '{"event":"hook_fire","skill":"careful","pattern":"rm_recursive","ts":"2026-03-18T16:30:00Z","repo":"app"}', + '{"event":"hook_fire","skill":"careful","pattern":"git_force_push","ts":"2026-03-18T17:00:00Z","repo":"app"}', + ]); + const output = runScript(p); + expect(output).toContain('Safety Hook Events'); + expect(output).toContain('rm_recursive'); + expect(output).toContain('2 fires'); + expect(output).toContain('git_force_push'); + expect(output).toContain('1 fire'); + expect(output).toContain('Total: 1 skill invocation, 3 hook fires'); + }); +}); diff --git a/test/gen-skill-docs.test.ts b/test/gen-skill-docs.test.ts index 9dfd1a1c..b53ebc17 100644 --- a/test/gen-skill-docs.test.ts +++ b/test/gen-skill-docs.test.ts @@ -72,6 +72,11 @@ describe('gen-skill-docs', () => { { dir: 'plan-design-review', name: 'plan-design-review' }, { dir: 'design-review', name: 'design-review' }, { dir: 'design-consultation', name: 'design-consultation' }, + { dir: 'document-release', name: 'document-release' }, + { dir: 'careful', name: 'careful' }, + { dir: 'freeze', name: 'freeze' }, + { dir: 'guard', name: 'guard' }, + { dir: 'unfreeze', name: 'unfreeze' }, ]; test('every skill has a SKILL.md.tmpl template', () => { @@ -161,6 +166,26 @@ describe('gen-skill-docs', () => { expect(content).toContain('plain English'); }); + test('generated SKILL.md contains telemetry line', () => { + const content = fs.readFileSync(path.join(ROOT, 'SKILL.md'), 'utf-8'); + expect(content).toContain('skill-usage.jsonl'); + expect(content).toContain('~/.gstack/analytics'); + }); + + test('preamble-using skills have correct skill name in telemetry', () => { + const PREAMBLE_SKILLS = [ + { dir: '.', name: 'gstack' }, + { dir: 'ship', name: 'ship' }, + { dir: 'review', name: 'review' }, + { dir: 'qa', name: 'qa' }, + { dir: 'retro', name: 'retro' }, + ]; + for (const skill of PREAMBLE_SKILLS) { + const content = fs.readFileSync(path.join(ROOT, skill.dir, 'SKILL.md'), 'utf-8'); + expect(content).toContain(`"skill":"${skill.name}"`); + } + }); + test('qa and qa-only templates use QA_METHODOLOGY placeholder', () => { const qaTmpl = fs.readFileSync(path.join(ROOT, 'qa', 'SKILL.md.tmpl'), 'utf-8'); expect(qaTmpl).toContain('{{QA_METHODOLOGY}}'); diff --git a/test/hook-scripts.test.ts b/test/hook-scripts.test.ts new file mode 100644 index 00000000..850b5b98 --- /dev/null +++ b/test/hook-scripts.test.ts @@ -0,0 +1,373 @@ +import { describe, test, expect } from 'bun:test'; +import { spawnSync } from 'child_process'; +import * as path from 'path'; +import * as fs from 'fs'; +import * as os from 'os'; + +const ROOT = path.resolve(import.meta.dir, '..'); +const CAREFUL_SCRIPT = path.join(ROOT, 'careful', 'bin', 'check-careful.sh'); +const FREEZE_SCRIPT = path.join(ROOT, 'freeze', 'bin', 'check-freeze.sh'); + +function runHook(scriptPath: string, input: object, env?: Record): { exitCode: number; output: any; raw: string } { + const result = spawnSync('bash', [scriptPath], { + input: JSON.stringify(input), + stdio: ['pipe', 'pipe', 'pipe'], + env: { ...process.env, ...env }, + timeout: 5000, + }); + const raw = result.stdout.toString().trim(); + let output: any = {}; + try { + output = JSON.parse(raw); + } catch {} + return { exitCode: result.status ?? 1, output, raw }; +} + +function runHookRaw(scriptPath: string, rawInput: string, env?: Record): { exitCode: number; output: any; raw: string } { + const result = spawnSync('bash', [scriptPath], { + input: rawInput, + stdio: ['pipe', 'pipe', 'pipe'], + env: { ...process.env, ...env }, + timeout: 5000, + }); + const raw = result.stdout.toString().trim(); + let output: any = {}; + try { + output = JSON.parse(raw); + } catch {} + return { exitCode: result.status ?? 1, output, raw }; +} + +function carefulInput(command: string) { + return { tool_input: { command } }; +} + +function freezeInput(filePath: string) { + return { tool_input: { file_path: filePath } }; +} + +function withFreezeDir(freezePath: string, fn: (stateDir: string) => void) { + const stateDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-freeze-test-')); + fs.writeFileSync(path.join(stateDir, 'freeze-dir.txt'), freezePath); + try { + fn(stateDir); + } finally { + fs.rmSync(stateDir, { recursive: true, force: true }); + } +} + +// Detect whether the safe-rm-targets regex works on this platform. +// macOS sed -E does not support \s, so the safe exception check fails there. +function detectSafeRmWorks(): boolean { + const { output } = runHook(CAREFUL_SCRIPT, carefulInput('rm -rf node_modules')); + return output.permissionDecision === undefined; +} + +// ============================================================ +// check-careful.sh tests +// ============================================================ +describe('check-careful.sh', () => { + + // --- Destructive rm commands --- + + describe('rm -rf / rm -r', () => { + test('rm -rf /var/data warns with recursive delete message', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('rm -rf /var/data')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('recursive delete'); + }); + + test('rm -r ./some-dir warns', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('rm -r ./some-dir')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('recursive delete'); + }); + + test('rm -rf node_modules allows (safe exception)', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('rm -rf node_modules')); + expect(exitCode).toBe(0); + if (detectSafeRmWorks()) { + // GNU sed: safe exception triggers, allows through + expect(output.permissionDecision).toBeUndefined(); + } else { + // macOS sed: safe exception regex uses \\s which is unsupported, + // so the safe-targets check fails and the command warns + expect(output.permissionDecision).toBe('ask'); + } + }); + + test('rm -rf .next dist allows (multiple safe targets)', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('rm -rf .next dist')); + expect(exitCode).toBe(0); + if (detectSafeRmWorks()) { + expect(output.permissionDecision).toBeUndefined(); + } else { + expect(output.permissionDecision).toBe('ask'); + } + }); + + test('rm -rf node_modules /var/data warns (mixed safe+unsafe)', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('rm -rf node_modules /var/data')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('recursive delete'); + }); + }); + + // --- SQL destructive commands --- + // Note: SQL commands that contain embedded double quotes (e.g., psql -c "DROP TABLE") + // get their command value truncated by the grep-based JSON extractor because \" + // terminates the [^"]* match. We use commands WITHOUT embedded quotes so the grep + // extraction works and the SQL keywords are visible to the pattern matcher. + + describe('SQL destructive commands', () => { + test('psql DROP TABLE warns with DROP in message', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('psql -c DROP TABLE users;')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('DROP'); + }); + + test('mysql drop database warns (case insensitive)', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('mysql -e drop database mydb')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message.toLowerCase()).toContain('drop'); + }); + + test('psql TRUNCATE warns', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('psql -c TRUNCATE orders;')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('TRUNCATE'); + }); + }); + + // --- Git destructive commands --- + + describe('git destructive commands', () => { + test('git push --force warns with force-push', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('git push --force origin main')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('force-push'); + }); + + test('git push -f warns', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('git push -f origin main')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('force-push'); + }); + + test('git reset --hard warns with uncommitted', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('git reset --hard HEAD~3')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('uncommitted'); + }); + + test('git checkout . warns', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('git checkout .')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('uncommitted'); + }); + + test('git restore . warns', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('git restore .')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('uncommitted'); + }); + }); + + // --- Container / infra destructive commands --- + + describe('container and infra commands', () => { + test('kubectl delete warns with kubectl in message', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('kubectl delete pod my-pod')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('kubectl'); + }); + + test('docker rm -f warns', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('docker rm -f container123')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('Docker'); + }); + + test('docker system prune -a warns', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('docker system prune -a')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('Docker'); + }); + }); + + // --- Safe commands --- + + describe('safe commands allow without warning', () => { + const safeCmds = [ + 'ls -la', + 'git status', + 'npm install', + 'cat README.md', + 'echo hello', + ]; + + for (const cmd of safeCmds) { + test(`"${cmd}" allows`, () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput(cmd)); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBeUndefined(); + }); + } + }); + + // --- Edge cases --- + + describe('edge cases', () => { + test('empty command allows gracefully', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, carefulInput('')); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBeUndefined(); + }); + + test('missing command field allows gracefully', () => { + const { exitCode, output } = runHook(CAREFUL_SCRIPT, { tool_input: {} }); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBeUndefined(); + }); + + test('malformed JSON input allows gracefully (exit 0, output {})', () => { + const { exitCode, raw } = runHookRaw(CAREFUL_SCRIPT, 'this is not json at all{{{{'); + expect(exitCode).toBe(0); + expect(raw).toBe('{}'); + }); + + test('Python fallback: grep fails on multiline JSON, Python parses it', () => { + // Construct JSON where "command": and the value are on separate lines. + // grep works line-by-line, so it cannot match "command"..."value" across lines. + // This forces CMD to be empty, triggering the Python fallback which handles + // the full JSON correctly. + const rawJson = '{"tool_input":{"command":\n"rm -rf /tmp/important"}}'; + const { exitCode, output } = runHookRaw(CAREFUL_SCRIPT, rawJson); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('ask'); + expect(output.message).toContain('recursive delete'); + }); + }); +}); + +// ============================================================ +// check-freeze.sh tests +// ============================================================ +describe('check-freeze.sh', () => { + + describe('edits inside freeze boundary', () => { + test('edit inside freeze boundary allows', () => { + withFreezeDir('/Users/dev/project/src/', (stateDir) => { + const { exitCode, output } = runHook( + FREEZE_SCRIPT, + freezeInput('/Users/dev/project/src/index.ts'), + { CLAUDE_PLUGIN_DATA: stateDir }, + ); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBeUndefined(); + }); + }); + + test('edit in subdirectory of freeze path allows', () => { + withFreezeDir('/Users/dev/project/src/', (stateDir) => { + const { exitCode, output } = runHook( + FREEZE_SCRIPT, + freezeInput('/Users/dev/project/src/components/Button.tsx'), + { CLAUDE_PLUGIN_DATA: stateDir }, + ); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBeUndefined(); + }); + }); + }); + + describe('edits outside freeze boundary', () => { + test('edit outside freeze boundary denies', () => { + withFreezeDir('/Users/dev/project/src/', (stateDir) => { + const { exitCode, output } = runHook( + FREEZE_SCRIPT, + freezeInput('/Users/dev/other-project/index.ts'), + { CLAUDE_PLUGIN_DATA: stateDir }, + ); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('deny'); + expect(output.message).toContain('freeze'); + expect(output.message).toContain('outside'); + }); + }); + + test('write outside freeze boundary denies', () => { + withFreezeDir('/Users/dev/project/src/', (stateDir) => { + const { exitCode, output } = runHook( + FREEZE_SCRIPT, + freezeInput('/etc/hosts'), + { CLAUDE_PLUGIN_DATA: stateDir }, + ); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('deny'); + expect(output.message).toContain('freeze'); + expect(output.message).toContain('outside'); + }); + }); + }); + + describe('trailing slash prevents prefix confusion', () => { + test('freeze at /src/ denies /src-old/ (trailing slash prevents prefix match)', () => { + withFreezeDir('/Users/dev/project/src/', (stateDir) => { + const { exitCode, output } = runHook( + FREEZE_SCRIPT, + freezeInput('/Users/dev/project/src-old/index.ts'), + { CLAUDE_PLUGIN_DATA: stateDir }, + ); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBe('deny'); + expect(output.message).toContain('outside'); + }); + }); + }); + + describe('no freeze file exists', () => { + test('allows everything when no freeze file present', () => { + const stateDir = fs.mkdtempSync(path.join(os.tmpdir(), 'gstack-freeze-test-')); + try { + const { exitCode, output } = runHook( + FREEZE_SCRIPT, + freezeInput('/anywhere/at/all.ts'), + { CLAUDE_PLUGIN_DATA: stateDir }, + ); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBeUndefined(); + } finally { + fs.rmSync(stateDir, { recursive: true, force: true }); + } + }); + }); + + describe('edge cases', () => { + test('missing file_path field allows gracefully', () => { + withFreezeDir('/Users/dev/project/src/', (stateDir) => { + const { exitCode, output } = runHook( + FREEZE_SCRIPT, + { tool_input: {} }, + { CLAUDE_PLUGIN_DATA: stateDir }, + ); + expect(exitCode).toBe(0); + expect(output.permissionDecision).toBeUndefined(); + }); + }); + }); +}); diff --git a/unfreeze/SKILL.md b/unfreeze/SKILL.md new file mode 100644 index 00000000..d4ad37e2 --- /dev/null +++ b/unfreeze/SKILL.md @@ -0,0 +1,40 @@ +--- +name: unfreeze +version: 0.1.0 +description: | + Clear the freeze boundary set by /freeze, allowing edits to all directories + again. Use when you want to widen edit scope without ending the session. + Use when asked to "unfreeze", "unlock edits", "remove freeze", or + "allow all edits". +allowed-tools: + - Bash + - Read +--- + + + +# /unfreeze — Clear Freeze Boundary + +Remove the edit restriction set by `/freeze`, allowing edits to all directories. + +```bash +mkdir -p ~/.gstack/analytics +echo '{"skill":"unfreeze","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true +``` + +## Clear the boundary + +```bash +STATE_DIR="${CLAUDE_PLUGIN_DATA:-$HOME/.gstack}" +if [ -f "$STATE_DIR/freeze-dir.txt" ]; then + PREV=$(cat "$STATE_DIR/freeze-dir.txt") + rm -f "$STATE_DIR/freeze-dir.txt" + echo "Freeze boundary cleared (was: $PREV). Edits are now allowed everywhere." +else + echo "No freeze boundary was set." +fi +``` + +Tell the user the result. Note that `/freeze` hooks are still registered for the +session — they will just allow everything since no state file exists. To re-freeze, +run `/freeze` again. diff --git a/unfreeze/SKILL.md.tmpl b/unfreeze/SKILL.md.tmpl new file mode 100644 index 00000000..12968579 --- /dev/null +++ b/unfreeze/SKILL.md.tmpl @@ -0,0 +1,38 @@ +--- +name: unfreeze +version: 0.1.0 +description: | + Clear the freeze boundary set by /freeze, allowing edits to all directories + again. Use when you want to widen edit scope without ending the session. + Use when asked to "unfreeze", "unlock edits", "remove freeze", or + "allow all edits". +allowed-tools: + - Bash + - Read +--- + +# /unfreeze — Clear Freeze Boundary + +Remove the edit restriction set by `/freeze`, allowing edits to all directories. + +```bash +mkdir -p ~/.gstack/analytics +echo '{"skill":"unfreeze","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true +``` + +## Clear the boundary + +```bash +STATE_DIR="${CLAUDE_PLUGIN_DATA:-$HOME/.gstack}" +if [ -f "$STATE_DIR/freeze-dir.txt" ]; then + PREV=$(cat "$STATE_DIR/freeze-dir.txt") + rm -f "$STATE_DIR/freeze-dir.txt" + echo "Freeze boundary cleared (was: $PREV). Edits are now allowed everywhere." +else + echo "No freeze boundary was set." +fi +``` + +Tell the user the result. Note that `/freeze` hooks are still registered for the +session — they will just allow everything since no state file exists. To re-freeze, +run `/freeze` again. From 823772ff0b67fd0fe59cba3ccd35a4a8025a0572 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Thu, 19 Mar 2026 00:14:59 -0500 Subject: [PATCH 4/4] feat: use AskUserQuestion for dirty working tree (v0.7.4) (#200) * feat: use AskUserQuestion for dirty working tree check Replace hard exit 1 with interactive AskUserQuestion prompt offering commit/stash/abort options when /qa or /design-review detects a dirty working tree. * chore: bump version and changelog (v0.7.4) Co-Authored-By: Claude Opus 4.6 --------- Co-authored-by: Claude Opus 4.6 --- CHANGELOG.md | 6 ++++++ VERSION | 2 +- design-review/SKILL.md | 21 +++++++++++++++------ design-review/SKILL.md.tmpl | 21 +++++++++++++++------ qa/SKILL.md | 22 ++++++++++++++++------ qa/SKILL.md.tmpl | 22 ++++++++++++++++------ 6 files changed, 69 insertions(+), 25 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index f84810f9..876fedbf 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,11 @@ # Changelog +## [0.7.4] - 2026-03-18 + +### Changed + +- **`/qa` and `/design-review` now ask what to do with uncommitted changes** instead of refusing to start. When your working tree is dirty, you get an interactive prompt with three options: commit your changes, stash them, or abort. No more cryptic "ERROR: Working tree is dirty" followed by a wall of text. + ## [0.7.3] - 2026-03-18 ### Added diff --git a/VERSION b/VERSION index f38fc539..0a1ffad4 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -0.7.3 +0.7.4 diff --git a/design-review/SKILL.md b/design-review/SKILL.md index ed45f1ea..572a64f0 100644 --- a/design-review/SKILL.md +++ b/design-review/SKILL.md @@ -181,15 +181,24 @@ You are a senior product designer AND a frontend engineer. Review live sites wit Look for `DESIGN.md`, `design-system.md`, or similar in the repo root. If found, read it — all design decisions must be calibrated against it. Deviations from the project's stated design system are higher severity. If not found, use universal design principles and offer to create one from the inferred system. -**Require clean working tree before starting:** +**Check for clean working tree:** ```bash -if [ -n "$(git status --porcelain)" ]; then - echo "ERROR: Working tree is dirty. Commit or stash changes before running /design-review." - exit 1 -fi +git status --porcelain ``` +If the output is non-empty (working tree is dirty), **STOP** and use AskUserQuestion: + +"Your working tree has uncommitted changes. /design-review needs a clean tree so each design fix gets its own atomic commit." + +- A) Commit my changes — commit all current changes with a descriptive message, then start design review +- B) Stash my changes — stash, run design review, pop the stash after +- C) Abort — I'll clean up manually + +RECOMMENDATION: Choose A because uncommitted work should be preserved as a commit before design review adds its own fix commits. + +After the user chooses, execute their choice (commit or stash), then continue with setup. + **Find the browse binary:** ## SETUP (run this check BEFORE any browse command) @@ -879,7 +888,7 @@ If the repo has a `TODOS.md`: ## Additional Rules (design-review specific) -11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty. +11. **Clean working tree required.** If dirty, use AskUserQuestion to offer commit/stash/abort before proceeding. 12. **One commit per fix.** Never bundle multiple design fixes into one commit. 13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files. 14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately. diff --git a/design-review/SKILL.md.tmpl b/design-review/SKILL.md.tmpl index 24fe160c..7e157287 100644 --- a/design-review/SKILL.md.tmpl +++ b/design-review/SKILL.md.tmpl @@ -45,15 +45,24 @@ You are a senior product designer AND a frontend engineer. Review live sites wit Look for `DESIGN.md`, `design-system.md`, or similar in the repo root. If found, read it — all design decisions must be calibrated against it. Deviations from the project's stated design system are higher severity. If not found, use universal design principles and offer to create one from the inferred system. -**Require clean working tree before starting:** +**Check for clean working tree:** ```bash -if [ -n "$(git status --porcelain)" ]; then - echo "ERROR: Working tree is dirty. Commit or stash changes before running /design-review." - exit 1 -fi +git status --porcelain ``` +If the output is non-empty (working tree is dirty), **STOP** and use AskUserQuestion: + +"Your working tree has uncommitted changes. /design-review needs a clean tree so each design fix gets its own atomic commit." + +- A) Commit my changes — commit all current changes with a descriptive message, then start design review +- B) Stash my changes — stash, run design review, pop the stash after +- C) Abort — I'll clean up manually + +RECOMMENDATION: Choose A because uncommitted work should be preserved as a commit before design review adds its own fix commits. + +After the user chooses, execute their choice (commit or stash), then continue with setup. + **Find the browse binary:** {{BROWSE_SETUP}} @@ -245,7 +254,7 @@ If the repo has a `TODOS.md`: ## Additional Rules (design-review specific) -11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty. +11. **Clean working tree required.** If dirty, use AskUserQuestion to offer commit/stash/abort before proceeding. 12. **One commit per fix.** Never bundle multiple design fixes into one commit. 13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files. 14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately. diff --git a/qa/SKILL.md b/qa/SKILL.md index a8a730c3..8d0abe7d 100644 --- a/qa/SKILL.md +++ b/qa/SKILL.md @@ -202,14 +202,24 @@ You are a QA engineer AND a bug-fix engineer. Test web applications like a real **If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works. -**Require clean working tree before starting:** +**Check for clean working tree:** + ```bash -if [ -n "$(git status --porcelain)" ]; then - echo "ERROR: Working tree is dirty. Commit or stash changes before running /qa." - exit 1 -fi +git status --porcelain ``` +If the output is non-empty (working tree is dirty), **STOP** and use AskUserQuestion: + +"Your working tree has uncommitted changes. /qa needs a clean tree so each bug fix gets its own atomic commit." + +- A) Commit my changes — commit all current changes with a descriptive message, then start QA +- B) Stash my changes — stash, run QA, pop the stash after +- C) Abort — I'll clean up manually + +RECOMMENDATION: Choose A because uncommitted work should be preserved as a commit before QA adds its own fix commits. + +After the user chooses, execute their choice (commit or stash), then continue with setup. + **Find the browse binary:** ## SETUP (run this check BEFORE any browse command) @@ -894,7 +904,7 @@ If the repo has a `TODOS.md`: ## Additional Rules (qa-specific) -11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty. +11. **Clean working tree required.** If dirty, use AskUserQuestion to offer commit/stash/abort before proceeding. 12. **One commit per fix.** Never bundle multiple fixes into one commit. 13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files. 14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately. diff --git a/qa/SKILL.md.tmpl b/qa/SKILL.md.tmpl index 292f7140..eae79605 100644 --- a/qa/SKILL.md.tmpl +++ b/qa/SKILL.md.tmpl @@ -49,14 +49,24 @@ You are a QA engineer AND a bug-fix engineer. Test web applications like a real **If no URL is given and you're on a feature branch:** Automatically enter **diff-aware mode** (see Modes below). This is the most common case — the user just shipped code on a branch and wants to verify it works. -**Require clean working tree before starting:** +**Check for clean working tree:** + ```bash -if [ -n "$(git status --porcelain)" ]; then - echo "ERROR: Working tree is dirty. Commit or stash changes before running /qa." - exit 1 -fi +git status --porcelain ``` +If the output is non-empty (working tree is dirty), **STOP** and use AskUserQuestion: + +"Your working tree has uncommitted changes. /qa needs a clean tree so each bug fix gets its own atomic commit." + +- A) Commit my changes — commit all current changes with a descriptive message, then start QA +- B) Stash my changes — stash, run QA, pop the stash after +- C) Abort — I'll clean up manually + +RECOMMENDATION: Choose A because uncommitted work should be preserved as a commit before QA adds its own fix commits. + +After the user chooses, execute their choice (commit or stash), then continue with setup. + **Find the browse binary:** {{BROWSE_SETUP}} @@ -300,7 +310,7 @@ If the repo has a `TODOS.md`: ## Additional Rules (qa-specific) -11. **Clean working tree required.** Refuse to start if `git status --porcelain` is non-empty. +11. **Clean working tree required.** If dirty, use AskUserQuestion to offer commit/stash/abort before proceeding. 12. **One commit per fix.** Never bundle multiple fixes into one commit. 13. **Only modify tests when generating regression tests in Phase 8e.5.** Never modify CI configuration. Never modify existing tests — only create new test files. 14. **Revert on regression.** If a fix makes things worse, `git revert HEAD` immediately.