diff --git a/CHANGELOG.md b/CHANGELOG.md index 6f81685b..e32a3610 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,6 @@ # Changelog -## [1.1.2.0] - 2026-04-19 +## [1.1.3.0] - 2026-04-19 ### Changed - **`/checkpoint` is now `/context-save` + `/context-restore`.** Claude Code treats `/checkpoint` as a native rewind alias in current environments, which was shadowing the gstack skill. Symptom: you'd type `/checkpoint`, the agent would describe it as a "built-in you need to type directly," and nothing would get saved. The fix is a clean rename and a split into two skills. One that saves, one that restores. Your old saved files still load via `/context-restore` (storage path unchanged). @@ -14,12 +14,33 @@ - **Empty-set bug on macOS.** If you ran `/checkpoint resume` (now `/context-restore`) with zero saved files, `find ... | xargs ls -1t` would fall back to listing your current directory. Confusing output, no clean "no saved contexts yet" message. Replaced with `find | sort -r | head` so empty input stays empty. ### For contributors -- New `gstack-upgrade/migrations/v1.1.2.0.sh` removes the stale on-disk `/checkpoint` install so Claude Code's native `/rewind` alias is no longer shadowed. Ownership-guarded across three install shapes (directory symlink into gstack, directory with SKILL.md symlinked into gstack, anything else). User-owned `/checkpoint` skills preserved with a notice. Migration hardened after adversarial review: explicit `HOME` unset/empty guard, `realpath` with python3 fallback, `rm --` flag, macOS sidecar handling. +- New `gstack-upgrade/migrations/v1.1.3.0.sh` removes the stale on-disk `/checkpoint` install so Claude Code's native `/rewind` alias is no longer shadowed. Ownership-guarded across three install shapes (directory symlink into gstack, directory with SKILL.md symlinked into gstack, anything else). User-owned `/checkpoint` skills preserved with a notice. Migration hardened after adversarial review: explicit `HOME` unset/empty guard, `realpath` with python3 fallback, `rm --` flag, macOS sidecar handling. - `test/migration-checkpoint-ownership.test.ts` ships 7 scenarios covering all 3 install shapes + idempotency + no-op-when-gstack-not-installed + SKILL.md-symlink-outside-gstack. Free tier, ~85ms. - Split `checkpoint-save-resume` E2E into `context-save-writes-file` and `context-restore-loads-latest`. The latter seeds two files with scrambled mtimes so the "filename-prefix, not mtime" guarantee is locked in. - `context-save` now sanitizes the title in bash (allowlist `[a-z0-9.-]`, cap 60 chars) instead of trusting LLM-side slugification, and appends a random suffix on same-second collisions to enforce the append-only contract. - `context-restore` caps its filename listing at 20 most-recent entries so users with 10k+ saved files don't blow the context window. - `test/skill-e2e-autoplan-dual-voice.test.ts` was shipped broken on main (wrong `runSkillTest` option names, wrong result-field access, wrong helper signatures, missing Agent/Skill tools). Fixed end-to-end: 1/1 pass on first attempt, $0.68, 211s. Voice-detection regexes now match JSON-shaped tool_use entries and phase-completion markers, not bare prompt-text mentions. +- Added 8 live-fire E2E tests in `test/skill-e2e-context-skills.test.ts` that spawn `claude -p` with the Skill tool enabled and assert on the routing path, not hand-fed section prompts. Covers: save routing, save-then-restore round-trip, fragment-match restore, empty-state graceful message, `/context-restore list` delegation to `/context-save list`, legacy file compat, branch-filter default, and `--all` flag. 21 additional free-tier hardening tests in `test/context-save-hardening.test.ts` pin the title-sanitizer allowlist, collision-safe filenames, empty-set fallback, and migration HOME guard. +- New `test/skill-collision-sentinel.test.ts` — insurance policy against upstream slash-command shadowing. Enumerates every gstack skill name and cross-checks against a per-host list of known built-in slash commands (23 Claude Code built-ins tracked so far). When a host ships a new built-in, add it to `KNOWN_BUILTINS` and the test flags the collision before users find it. `/review` collision with Claude Code's `/review` documented in `KNOWN_COLLISIONS_TOLERATED` with a written justification; the exception list is validated against live skills on every run so stale entries fail loud. +- `runSkillTest` in `test/helpers/session-runner.ts` now accepts an `env:` option for per-test env overrides. Prevents tests from having to stuff `GSTACK_HOME=...` into the prompt, which was causing the agent to bypass the Skill tool. All 8 new E2E tests use `env: { GSTACK_HOME: gstackHome }`. + +## [1.1.2.0] - 2026-04-19 + +### Fixed +- **`/plan-ceo-review` SCOPE EXPANSION mode stays expansive.** If you asked the CEO review to dream big, proposals were collapsing into dry feature bullets ("Add real-time notifications. Improves retention by Y%"). The V1 writing-style rules steered every outcome into diagnostic-pain framing. Rule 2 and rule 4 in the shared preamble now cover three framings: pain reduction, capability unlocked, and forcing-question pressure. Cathedral language survives the clarity layer. Ask for a 10x vision, get one. +- **`/office-hours` keeps its edge.** Startup-mode Q3 (Desperate Specificity) stopped collapsing into "Who is your target user?" The forcing question now stacks three pressures, matched to the domain of the idea — career impact for B2B, daily pain for consumer, weekend project unlocked for hobby and open-source. Builder mode stays wild: "what if you also..." riffs and adjacent unlocks come through, not PRD-voice feature roadmaps. + +### Added +- **Gate-tier eval tests catch mode-posture regressions on every PR.** Three new E2E tests fire when the shared preamble, the plan-ceo-review template, or the office-hours template change. A Sonnet judge scores each mode on two axes: felt-experience vs decision-preservation for expansion, stacked-pressure vs domain-matched-consequence for forcing, unexpected-combinations vs excitement-over-optimization for builder. The original V1 regression shipped because nothing caught it. This closes that gap. + +### For contributors +- Writing Style rule 2 and rule 4 in `scripts/resolvers/preamble.ts` each present three paired framing examples instead of one. Rule 3 adds an explicit exception for stacked forcing questions. +- `plan-ceo-review/SKILL.md.tmpl` gets a new `### 0D-prelude. Expansion Framing` subsection shared by SCOPE EXPANSION and SELECTIVE EXPANSION. +- `office-hours/SKILL.md.tmpl` gets inline forcing exemplar (Q3) and wild exemplar (builder operating principles). Anchored by stable heading, not line numbers. +- New `judgePosture(mode, text)` helper in `test/helpers/llm-judge.ts` (Sonnet judge, dual-axis rubric per mode). +- Three test fixtures in `test/fixtures/mode-posture/` — expansion plan, forcing pitch, builder idea. +- Three entries registered in `E2E_TOUCHFILES` + `E2E_TIERS`: `plan-ceo-review-expansion-energy`, `office-hours-forcing-energy`, `office-hours-builder-wildness` — all `gate` tier. +- Review history on this branch: CEO review (HOLD SCOPE) + Codex plan review (30 findings, drove approach pivot from "add new rule #5 taxonomy" to "rewrite rule 2-4 examples"). One eng review pass caught the test-infrastructure target (originally pointed at `test/skill-llm-eval.test.ts`, which does static analysis — actually needs E2E). ## [1.1.1.0] - 2026-04-18 diff --git a/VERSION b/VERSION index a6f417b8..9abfbcb9 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -1.1.2.0 +1.1.3.0 diff --git a/autoplan/SKILL.md b/autoplan/SKILL.md index 0d511fb4..d02e95b8 100644 --- a/autoplan/SKILL.md +++ b/autoplan/SKILL.md @@ -413,9 +413,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/canary/SKILL.md b/canary/SKILL.md index a75a764f..5886d194 100644 --- a/canary/SKILL.md +++ b/canary/SKILL.md @@ -405,9 +405,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/codex/SKILL.md b/codex/SKILL.md index d2cec367..710f145c 100644 --- a/codex/SKILL.md +++ b/codex/SKILL.md @@ -407,9 +407,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/context-restore/SKILL.md b/context-restore/SKILL.md index f059ca32..84835500 100644 --- a/context-restore/SKILL.md +++ b/context-restore/SKILL.md @@ -409,9 +409,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/context-save/SKILL.md b/context-save/SKILL.md index e190cb72..ce9d65ef 100644 --- a/context-save/SKILL.md +++ b/context-save/SKILL.md @@ -409,9 +409,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/cso/SKILL.md b/cso/SKILL.md index 37d40564..1f49371d 100644 --- a/cso/SKILL.md +++ b/cso/SKILL.md @@ -410,9 +410,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/design-consultation/SKILL.md b/design-consultation/SKILL.md index 0d53b7e7..d1c1e55b 100644 --- a/design-consultation/SKILL.md +++ b/design-consultation/SKILL.md @@ -410,9 +410,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/design-html/SKILL.md b/design-html/SKILL.md index 794dc16d..1f81a8e2 100644 --- a/design-html/SKILL.md +++ b/design-html/SKILL.md @@ -412,9 +412,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/design-review/SKILL.md b/design-review/SKILL.md index 0588a8ae..ee44d82a 100644 --- a/design-review/SKILL.md +++ b/design-review/SKILL.md @@ -410,9 +410,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/design-shotgun/SKILL.md b/design-shotgun/SKILL.md index 52998b67..d14b287a 100644 --- a/design-shotgun/SKILL.md +++ b/design-shotgun/SKILL.md @@ -407,9 +407,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/devex-review/SKILL.md b/devex-review/SKILL.md index 8e3d4a5f..09b9a74d 100644 --- a/devex-review/SKILL.md +++ b/devex-review/SKILL.md @@ -410,9 +410,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/document-release/SKILL.md b/document-release/SKILL.md index 3aaeb09e..338e361c 100644 --- a/document-release/SKILL.md +++ b/document-release/SKILL.md @@ -407,9 +407,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/gstack-upgrade/migrations/v1.1.2.0.sh b/gstack-upgrade/migrations/v1.1.3.0.sh similarity index 87% rename from gstack-upgrade/migrations/v1.1.2.0.sh rename to gstack-upgrade/migrations/v1.1.3.0.sh index ab5f7a8d..8523a802 100755 --- a/gstack-upgrade/migrations/v1.1.2.0.sh +++ b/gstack-upgrade/migrations/v1.1.3.0.sh @@ -1,5 +1,5 @@ #!/usr/bin/env bash -# Migration: v1.1.2.0 — Remove stale /checkpoint skill installs +# Migration: v1.1.3.0 — Remove stale /checkpoint skill installs # # Claude Code ships /checkpoint as a native alias for /rewind, which was # shadowing the gstack checkpoint skill. The skill has been split into @@ -25,7 +25,7 @@ set -euo pipefail # CI runners) survives and produces dangerous absolute paths like # "/.claude/skills/...". Abort cleanly. if [ -z "${HOME:-}" ]; then - echo " [v1.1.2.0] HOME is unset or empty — skipping migration." >&2 + echo " [v1.1.3.0] HOME is unset or empty — skipping migration." >&2 exit 0 fi @@ -75,10 +75,10 @@ if [ -L "$OLD_TOPLEVEL" ]; then target_real=$(resolve_real "$OLD_TOPLEVEL") if [ -n "$GSTACK_ROOT_REAL" ] && path_inside "$target_real" "$GSTACK_ROOT_REAL"; then rm -- "$OLD_TOPLEVEL" - echo " [v1.1.2.0] Removed stale /checkpoint symlink (was shadowing Claude Code's /rewind alias)." + echo " [v1.1.3.0] Removed stale /checkpoint symlink (was shadowing Claude Code's /rewind alias)." removed_any=1 else - echo " [v1.1.2.0] Leaving $OLD_TOPLEVEL alone — symlink target is outside gstack (or unresolvable)." + echo " [v1.1.3.0] Leaving $OLD_TOPLEVEL alone — symlink target is outside gstack (or unresolvable)." fi elif [ -d "$OLD_TOPLEVEL" ]; then # Regular directory. Only remove if it contains exactly one file named @@ -92,13 +92,13 @@ elif [ -d "$OLD_TOPLEVEL" ]; then # Strip macOS sidecars first (not user content), then remove the dir. find "$OLD_TOPLEVEL" -maxdepth 1 \( -name '.DS_Store' -o -name '._*' \) -type f -delete 2>/dev/null || true rm -r -- "$OLD_TOPLEVEL" - echo " [v1.1.2.0] Removed stale /checkpoint install directory (gstack prefix-mode)." + echo " [v1.1.3.0] Removed stale /checkpoint install directory (gstack prefix-mode)." removed_any=1 else - echo " [v1.1.2.0] Leaving $OLD_TOPLEVEL alone — SKILL.md symlink target is outside gstack." + echo " [v1.1.3.0] Leaving $OLD_TOPLEVEL alone — SKILL.md symlink target is outside gstack." fi else - echo " [v1.1.2.0] Leaving $OLD_TOPLEVEL alone — not a gstack-owned install (has custom content)." + echo " [v1.1.3.0] Leaving $OLD_TOPLEVEL alone — not a gstack-owned install (has custom content)." fi fi # Missing → no-op (idempotency). @@ -111,10 +111,10 @@ if [ -L "$OLD_NAMESPACED" ]; then target_real=$(resolve_real "$OLD_NAMESPACED") if [ -n "$GSTACK_ROOT_REAL" ] && path_inside "$target_real" "$GSTACK_ROOT_REAL"; then rm -- "$OLD_NAMESPACED" - echo " [v1.1.2.0] Removed stale ~/.claude/skills/gstack/checkpoint symlink." + echo " [v1.1.3.0] Removed stale ~/.claude/skills/gstack/checkpoint symlink." removed_any=1 else - echo " [v1.1.2.0] Leaving $OLD_NAMESPACED alone — symlink target is outside gstack." + echo " [v1.1.3.0] Leaving $OLD_NAMESPACED alone — symlink target is outside gstack." fi elif [ -d "$OLD_NAMESPACED" ]; then # Regular directory. This is the gstack-prefix install location. Check that @@ -123,15 +123,15 @@ elif [ -d "$OLD_NAMESPACED" ]; then target_real=$(resolve_real "$OLD_NAMESPACED") if [ -n "$GSTACK_ROOT_REAL" ] && path_inside "$target_real" "$GSTACK_ROOT_REAL"; then rm -rf -- "$OLD_NAMESPACED" - echo " [v1.1.2.0] Removed stale ~/.claude/skills/gstack/checkpoint/ (replaced by context-save + context-restore)." + echo " [v1.1.3.0] Removed stale ~/.claude/skills/gstack/checkpoint/ (replaced by context-save + context-restore)." removed_any=1 else - echo " [v1.1.2.0] Leaving $OLD_NAMESPACED alone — resolves outside gstack." + echo " [v1.1.3.0] Leaving $OLD_NAMESPACED alone — resolves outside gstack." fi fi if [ "$removed_any" = "1" ]; then - echo " [v1.1.2.0] /checkpoint is now Claude Code's native /rewind alias. Use /context-save to save state and /context-restore to resume." + echo " [v1.1.3.0] /checkpoint is now Claude Code's native /rewind alias. Use /context-save to save state and /context-restore to resume." fi exit 0 diff --git a/health/SKILL.md b/health/SKILL.md index 87c42eb4..3ff29b4a 100644 --- a/health/SKILL.md +++ b/health/SKILL.md @@ -407,9 +407,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/investigate/SKILL.md b/investigate/SKILL.md index 455c5f92..1fc0ddd5 100644 --- a/investigate/SKILL.md +++ b/investigate/SKILL.md @@ -424,9 +424,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/land-and-deploy/SKILL.md b/land-and-deploy/SKILL.md index 7652f053..4bb1be92 100644 --- a/land-and-deploy/SKILL.md +++ b/land-and-deploy/SKILL.md @@ -404,9 +404,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/learn/SKILL.md b/learn/SKILL.md index eadf5127..1ac0ca9b 100644 --- a/learn/SKILL.md +++ b/learn/SKILL.md @@ -407,9 +407,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/office-hours/SKILL.md b/office-hours/SKILL.md index d8a2be64..4ee7ee84 100644 --- a/office-hours/SKILL.md +++ b/office-hours/SKILL.md @@ -415,9 +415,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. @@ -984,6 +990,14 @@ If the framing is imprecise, **reframe constructively** — don't dissolve the q **Red flags:** Category-level answers. "Healthcare enterprises." "SMBs." "Marketing teams." These are filters, not people. You can't email a category. +**Forcing exemplar:** + +SOFTENED (avoid): "Who's your target user, and what gets them to buy? Worth thinking about before marketing spend ramps." + +FORCING (aim for): "Name the actual human. Not 'product managers at mid-market SaaS companies' — an actual name, an actual title, an actual consequence. What's the real thing they're avoiding that your product solves? If this is a career problem, whose career? If this is a daily pain, whose day? If this is a creative unlock, whose weekend project becomes possible? If you can't name them, you don't know who you're building for — and 'users' isn't an answer." + +The pressure is in the stacking — don't collapse it into a single ask. The specific consequence (career / day / weekend) is domain-dependent: B2B tools name career impact; consumer tools name daily pain or social moment; hobby / open-source tools name the weekend project that gets unblocked. Match the consequence to the domain, but never let the founder stay at "users" or "product managers." + #### Q4: Narrowest Wedge **Ask:** "What's the smallest possible version of this that someone would pay real money for — this week, not after you build the platform?" @@ -1038,6 +1052,14 @@ Use this mode when the user is building for fun, learning, hacking on open sourc 3. **The best side projects solve your own problem.** If you're building it for yourself, trust that instinct. 4. **Explore before you optimize.** Try the weird idea first. Polish later. +**Wild exemplar:** + +STRUCTURED (avoid): "Consider adding a share feature. This would improve user retention by enabling virality." + +WILD (aim for): "Oh — and what if you also let them share the visualization as a live URL? Or pipe it into a Slack thread? Or animate the generation so viewers see it draw itself? Each one's a 30-minute unlock. Any of them turn this from 'a tool I used' into 'a thing I showed a friend.'" + +Both are outcome-framed. Only one has the 'whoa.' Builder mode's job is to surface the most exciting version of the idea, not the most strategically optimized one. Lead with the fun; let the user edit it down. + ### Response Posture - **Enthusiastic, opinionated collaborator.** You're here to help them build the coolest thing possible. Riff on their ideas. Get excited about what's exciting. diff --git a/office-hours/SKILL.md.tmpl b/office-hours/SKILL.md.tmpl index afe063c9..5b9f762e 100644 --- a/office-hours/SKILL.md.tmpl +++ b/office-hours/SKILL.md.tmpl @@ -203,6 +203,14 @@ If the framing is imprecise, **reframe constructively** — don't dissolve the q **Red flags:** Category-level answers. "Healthcare enterprises." "SMBs." "Marketing teams." These are filters, not people. You can't email a category. +**Forcing exemplar:** + +SOFTENED (avoid): "Who's your target user, and what gets them to buy? Worth thinking about before marketing spend ramps." + +FORCING (aim for): "Name the actual human. Not 'product managers at mid-market SaaS companies' — an actual name, an actual title, an actual consequence. What's the real thing they're avoiding that your product solves? If this is a career problem, whose career? If this is a daily pain, whose day? If this is a creative unlock, whose weekend project becomes possible? If you can't name them, you don't know who you're building for — and 'users' isn't an answer." + +The pressure is in the stacking — don't collapse it into a single ask. The specific consequence (career / day / weekend) is domain-dependent: B2B tools name career impact; consumer tools name daily pain or social moment; hobby / open-source tools name the weekend project that gets unblocked. Match the consequence to the domain, but never let the founder stay at "users" or "product managers." + #### Q4: Narrowest Wedge **Ask:** "What's the smallest possible version of this that someone would pay real money for — this week, not after you build the platform?" @@ -257,6 +265,14 @@ Use this mode when the user is building for fun, learning, hacking on open sourc 3. **The best side projects solve your own problem.** If you're building it for yourself, trust that instinct. 4. **Explore before you optimize.** Try the weird idea first. Polish later. +**Wild exemplar:** + +STRUCTURED (avoid): "Consider adding a share feature. This would improve user retention by enabling virality." + +WILD (aim for): "Oh — and what if you also let them share the visualization as a live URL? Or pipe it into a Slack thread? Or animate the generation so viewers see it draw itself? Each one's a 30-minute unlock. Any of them turn this from 'a tool I used' into 'a thing I showed a friend.'" + +Both are outcome-framed. Only one has the 'whoa.' Builder mode's job is to surface the most exciting version of the idea, not the most strategically optimized one. Lead with the fun; let the user edit it down. + ### Response Posture - **Enthusiastic, opinionated collaborator.** You're here to help them build the coolest thing possible. Riff on their ideas. Get excited about what's exciting. diff --git a/open-gstack-browser/SKILL.md b/open-gstack-browser/SKILL.md index d5e521ee..2286bf2c 100644 --- a/open-gstack-browser/SKILL.md +++ b/open-gstack-browser/SKILL.md @@ -404,9 +404,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/package.json b/package.json index ac937347..8f6725a1 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "gstack", - "version": "1.1.2.0", + "version": "1.1.3.0", "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.", "license": "MIT", "type": "module", diff --git a/pair-agent/SKILL.md b/pair-agent/SKILL.md index 5d23c468..df666db7 100644 --- a/pair-agent/SKILL.md +++ b/pair-agent/SKILL.md @@ -405,9 +405,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md index 3777cf00..490c1d59 100644 --- a/plan-ceo-review/SKILL.md +++ b/plan-ceo-review/SKILL.md @@ -411,9 +411,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. @@ -1103,6 +1109,18 @@ Rules: - If only one approach exists, explain concretely why alternatives were eliminated. - Do NOT proceed to mode selection (0F) without user approval of the chosen approach. +### 0D-prelude. Expansion Framing (shared by EXPANSION and SELECTIVE EXPANSION) + +Every expansion proposal you generate in SCOPE EXPANSION or SELECTIVE EXPANSION mode follows this framing pattern: + +FLAT (avoid): "Add real-time notifications. Users would see workflow results faster — latency drops from ~30s polling to <500ms push. Effort: ~1 hour CC." + +EXPANSIVE (aim for): "Imagine the moment a workflow finishes — the user sees the result instantly, no tab-switching, no polling, no 'did it actually work?' anxiety. Real-time feedback turns a tool they check into a tool that talks to them. Concrete shape: WebSocket channel + optimistic UI + desktop notification fallback. Effort: human ~2 days / CC ~1 hour. Makes the product feel 10x more alive." + +Both are outcome-framed. Only one makes the user feel the cathedral. Lead with the felt experience, close with concrete effort and impact. + +**For SELECTIVE EXPANSION:** neutral recommendation posture ≠ flat prose. Present vivid options, then let the user decide. Do not over-sell — "Makes the product feel 10x more alive" is vivid; "This would 10x your revenue" is over-sell. Evocative, not promotional. + ### 0D. Mode-Specific Analysis **For SCOPE EXPANSION** — run all three, then the opt-in ceremony: 1. 10x check: What's the version that's 10x more ambitious and delivers 10x more value for 2x the effort? Describe it concretely. diff --git a/plan-ceo-review/SKILL.md.tmpl b/plan-ceo-review/SKILL.md.tmpl index 93d1af0a..f6dbc876 100644 --- a/plan-ceo-review/SKILL.md.tmpl +++ b/plan-ceo-review/SKILL.md.tmpl @@ -246,6 +246,18 @@ Rules: - If only one approach exists, explain concretely why alternatives were eliminated. - Do NOT proceed to mode selection (0F) without user approval of the chosen approach. +### 0D-prelude. Expansion Framing (shared by EXPANSION and SELECTIVE EXPANSION) + +Every expansion proposal you generate in SCOPE EXPANSION or SELECTIVE EXPANSION mode follows this framing pattern: + +FLAT (avoid): "Add real-time notifications. Users would see workflow results faster — latency drops from ~30s polling to <500ms push. Effort: ~1 hour CC." + +EXPANSIVE (aim for): "Imagine the moment a workflow finishes — the user sees the result instantly, no tab-switching, no polling, no 'did it actually work?' anxiety. Real-time feedback turns a tool they check into a tool that talks to them. Concrete shape: WebSocket channel + optimistic UI + desktop notification fallback. Effort: human ~2 days / CC ~1 hour. Makes the product feel 10x more alive." + +Both are outcome-framed. Only one makes the user feel the cathedral. Lead with the felt experience, close with concrete effort and impact. + +**For SELECTIVE EXPANSION:** neutral recommendation posture ≠ flat prose. Present vivid options, then let the user decide. Do not over-sell — "Makes the product feel 10x more alive" is vivid; "This would 10x your revenue" is over-sell. Evocative, not promotional. + ### 0D. Mode-Specific Analysis **For SCOPE EXPANSION** — run all three, then the opt-in ceremony: 1. 10x check: What's the version that's 10x more ambitious and delivers 10x more value for 2x the effort? Describe it concretely. diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md index 4e8a8c3b..128cc7b7 100644 --- a/plan-design-review/SKILL.md +++ b/plan-design-review/SKILL.md @@ -408,9 +408,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/plan-devex-review/SKILL.md b/plan-devex-review/SKILL.md index bfbca08b..83d21f34 100644 --- a/plan-devex-review/SKILL.md +++ b/plan-devex-review/SKILL.md @@ -412,9 +412,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md index c7c777d1..fe0340e4 100644 --- a/plan-eng-review/SKILL.md +++ b/plan-eng-review/SKILL.md @@ -410,9 +410,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/plan-tune/SKILL.md b/plan-tune/SKILL.md index 05f169c8..f0f54f7c 100644 --- a/plan-tune/SKILL.md +++ b/plan-tune/SKILL.md @@ -418,9 +418,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/qa-only/SKILL.md b/qa-only/SKILL.md index d9d19c2b..dbf6b46c 100644 --- a/qa-only/SKILL.md +++ b/qa-only/SKILL.md @@ -406,9 +406,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/qa/SKILL.md b/qa/SKILL.md index 0bf23fa2..d79ed321 100644 --- a/qa/SKILL.md +++ b/qa/SKILL.md @@ -412,9 +412,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/retro/SKILL.md b/retro/SKILL.md index f8433f72..d3ccd7bd 100644 --- a/retro/SKILL.md +++ b/retro/SKILL.md @@ -405,9 +405,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/review/SKILL.md b/review/SKILL.md index 247b6bf5..cbb48cf5 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -409,9 +409,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/scripts/resolvers/preamble.ts b/scripts/resolvers/preamble.ts index f3bd9718..e11b04f4 100644 --- a/scripts/resolvers/preamble.ts +++ b/scripts/resolvers/preamble.ts @@ -375,9 +375,15 @@ function generateWritingStyle(_ctx: TemplateContext): string { These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/setup-deploy/SKILL.md b/setup-deploy/SKILL.md index 94d2fc34..b3504bf0 100644 --- a/setup-deploy/SKILL.md +++ b/setup-deploy/SKILL.md @@ -408,9 +408,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/ship/SKILL.md b/ship/SKILL.md index dcc030a6..5cbe32c5 100644 --- a/ship/SKILL.md +++ b/ship/SKILL.md @@ -410,9 +410,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/test/context-save-hardening.test.ts b/test/context-save-hardening.test.ts index a38ce9e3..95f49aba 100644 --- a/test/context-save-hardening.test.ts +++ b/test/context-save-hardening.test.ts @@ -314,9 +314,9 @@ describe('context-restore: find + sort + head cap', () => { // ─── Migration HOME guard ────────────────────────────────────────────────── -describe('migration v1.1.2.0: HOME guard', () => { +describe('migration v1.1.3.0: HOME guard', () => { let tmp: string; - const MIGRATION = path.join(ROOT, 'gstack-upgrade', 'migrations', 'v1.1.2.0.sh'); + const MIGRATION = path.join(ROOT, 'gstack-upgrade', 'migrations', 'v1.1.3.0.sh'); beforeEach(() => { tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'ctx-home-')); }); afterEach(() => { try { fs.rmSync(tmp, { recursive: true, force: true }); } catch {} }); diff --git a/test/fixtures/golden/claude-ship-SKILL.md b/test/fixtures/golden/claude-ship-SKILL.md index dcc030a6..5cbe32c5 100644 --- a/test/fixtures/golden/claude-ship-SKILL.md +++ b/test/fixtures/golden/claude-ship-SKILL.md @@ -410,9 +410,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/test/fixtures/golden/codex-ship-SKILL.md b/test/fixtures/golden/codex-ship-SKILL.md index d5a7356d..df0b2da8 100644 --- a/test/fixtures/golden/codex-ship-SKILL.md +++ b/test/fixtures/golden/codex-ship-SKILL.md @@ -399,9 +399,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/test/fixtures/golden/factory-ship-SKILL.md b/test/fixtures/golden/factory-ship-SKILL.md index d749d3d9..963fa176 100644 --- a/test/fixtures/golden/factory-ship-SKILL.md +++ b/test/fixtures/golden/factory-ship-SKILL.md @@ -401,9 +401,15 @@ Per-skill instructions may add additional formatting rules on top of this baseli These rules apply to every AskUserQuestion, every response you write to the user, and every review finding. They compose with the AskUserQuestion Format section above: Format = *how* a question is structured; Writing Style = *the prose quality of the content inside it*. 1. **Jargon gets a one-sentence gloss on first use per skill invocation.** Even if the user's own prompt already contained the term — users often paste jargon from someone else's plan. Gloss unconditionally on first use. No cross-invocation memory: a new skill fire is a new first-use opportunity. Example: "race condition (two things happen at the same time and step on each other)". -2. **Frame questions in outcome terms, not implementation terms.** Bad: "Is this endpoint idempotent?" Good: "If someone double-clicks the button, is it OK for the action to run twice?" Ask the question the user would actually want to answer. -3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." -4. **Close every decision with user impact.** Connect the technical call back to who's affected. "If we skip this, your users will see a 3-second spinner on every page load." Make the user's user real. +2. **Frame questions in outcome terms, not implementation terms.** Ask the question the user would actually want to answer. Outcome framing covers three families — match the framing to the mode: + - **Pain reduction** (default for diagnostic / HOLD SCOPE / rigor review): "If someone double-clicks the button, is it OK for the action to run twice?" (instead of "Is this endpoint idempotent?") + - **Upside / delight** (for expansion / builder / vision contexts): "When the workflow finishes, does the user see the result instantly, or are they still refreshing a dashboard?" (instead of "Should we add webhook notifications?") + - **Interrogative pressure** (for forcing-question / founder-challenge contexts): "Can you name the actual person whose career gets better if this ships and whose career gets worse if it doesn't?" (instead of "Who's the target user?") +3. **Short sentences. Concrete nouns. Active voice.** Standard advice from any good writing guide. Prefer "the cache stores the result for 60s" over "results will have been cached for a period of 60s." *Exception:* stacked, multi-part questions are a legitimate forcing device — "Title? Gets them promoted? Gets them fired? Keeps them up at night?" is longer than one short sentence, and it should be, because the pressure IS in the stacking. Don't collapse a stack into a single neutral ask when the skill's posture is forcing. +4. **Close every decision with user impact.** Connect the technical call back to who's affected. Make the user's user real. Impact has three shapes — again, match the mode: + - **Pain avoided:** "If we skip this, your users will see a 3-second spinner on every page load." + - **Capability unlocked:** "If we ship this, users get instant feedback the moment a workflow finishes — no tabs to refresh, no polling." + - **Consequence named** (for forcing questions): "If you can't name the person whose career this helps, you don't know who you're building for — and 'users' isn't an answer." 5. **User-turn override.** If the user's current message says "be terse" / "no explanations" / "brutally honest, just the answer" / similar, skip this entire Writing Style block for your next response, regardless of config. User's in-turn request wins. 6. **Glossary boundary is the curated list.** Terms below get glossed. Terms not on the list are assumed plain-English enough. If you see a term that genuinely needs glossing but isn't listed, note it (once) in your response so it can be added via PR. diff --git a/test/fixtures/mode-posture/builder-idea.md b/test/fixtures/mode-posture/builder-idea.md new file mode 100644 index 00000000..c2df04c4 --- /dev/null +++ b/test/fixtures/mode-posture/builder-idea.md @@ -0,0 +1,15 @@ +# Weekend Project: Dependency Graph Visualizer + +I want to build a tool that takes a codebase and visualizes its dependency graph — modules, imports, which files depend on which. For fun, for learning. Maybe open-source it. + +## What I have so far + +- Rough idea: point it at a repo, get an interactive graph +- Stack I'm leaning toward: TypeScript + D3 or Cytoscape for rendering +- Potential: could work for JS/TS first, maybe Python later + +## What I don't know yet + +- How to make the visualization actually useful vs just pretty +- Whether this should be a CLI, a web tool, or a VS Code extension +- What would make someone else want to use it diff --git a/test/fixtures/mode-posture/expansion-plan.md b/test/fixtures/mode-posture/expansion-plan.md new file mode 100644 index 00000000..3042d28d --- /dev/null +++ b/test/fixtures/mode-posture/expansion-plan.md @@ -0,0 +1,23 @@ +# Plan: Team Velocity Dashboard + +## Context + +We're building a dashboard for engineering managers to track team code velocity — commits per engineer, PR cycle time, review latency, CI pass rate. The data already lives in GitHub; we're just aggregating it for a manager's single-pane view. + +## Changes + +1. New React component `TeamVelocityDashboard` in `src/dashboard/` +2. REST API endpoint `GET /api/team/velocity?days=30` returning aggregated metrics +3. Background job pulling GitHub data every 15 minutes into Postgres +4. Simple filter UI: team, date range, metric + +## Architecture + +- Frontend: React + shadcn/ui +- Backend: Express + PostgreSQL +- Data source: GitHub REST API (cached 15min) + +## Open questions + +- Should we support multiple repos per team? +- Do we show individual engineer names or aggregate only? diff --git a/test/fixtures/mode-posture/forcing-pitch.md b/test/fixtures/mode-posture/forcing-pitch.md new file mode 100644 index 00000000..7374ef97 --- /dev/null +++ b/test/fixtures/mode-posture/forcing-pitch.md @@ -0,0 +1,13 @@ +# Our Idea: AI Tools for Product Managers + +We're building AI tools for product managers at mid-market SaaS companies. The product combines a bunch of the things PMs already do — writing PRDs, gathering user feedback, analyzing usage data, drafting roadmaps — and uses LLMs to speed each of them up. + +## Who we're targeting + +Product managers at SaaS companies with 50-500 engineers. These PMs are stretched thin, juggle a lot of surface area, and would benefit from AI assistance. + +## What we've done so far + +- Talked to a few PMs we know from prior jobs +- Built a prototype that summarizes Zoom customer calls into a PRD stub +- Got on a waitlist of about 40 signups from LinkedIn posts diff --git a/test/helpers/llm-judge.ts b/test/helpers/llm-judge.ts index 7040cd6c..6ce4ca67 100644 --- a/test/helpers/llm-judge.ts +++ b/test/helpers/llm-judge.ts @@ -25,6 +25,14 @@ export interface OutcomeJudgeResult { reasoning: string; } +export interface PostureScore { + axis_a: number; // 1-5 — mode-specific primary rubric axis + axis_b: number; // 1-5 — mode-specific secondary rubric axis + reasoning: string; +} + +export type PostureMode = 'expansion' | 'forcing' | 'builder'; + /** * Call claude-sonnet-4-6 with a prompt, extract JSON response. * Retries once on 429 rate limit errors. @@ -128,3 +136,57 @@ Rules: - evidence_quality (1-5): Do detected bugs have screenshots, repro steps, or specific element references? 5 = excellent evidence for every bug, 1 = no evidence at all`); } + +/** + * Score mode-specific prose posture on two mode-dependent axes (1-5 each). + * + * Used by mode-posture regression tests to detect whether V1's Writing Style + * rules have flattened the distinctive energy of expansion / forcing / builder + * modes. See docs/designs/PLAN_TUNING_V1.md and the V1.1 mode-posture fix. + * + * The generator model is whatever the skill runs with (often Opus for + * plan-ceo-review). The judge is always Sonnet via callJudge() for cost. + */ +export async function judgePosture(mode: PostureMode, text: string): Promise { + const rubrics: Record = { + expansion: { + context: 'This text is expansion proposals emitted by /plan-ceo-review in SCOPE EXPANSION or SELECTIVE EXPANSION mode. The skill is supposed to lead with felt-experience vision, then close with concrete effort and impact.', + axis_a: 'surface_framing (1-5): Does each proposal lead with felt-experience framing ("imagine", "when the user sees", "the moment X happens", or equivalent) BEFORE closing with concrete metrics? Penalize pure feature bullets ("Add X. Improves Y by Z%").', + axis_b: 'decision_preservation (1-5): Does each proposal contain the elements a scope-expansion decision needs — what to build (concrete shape), effort (ideally both human and CC scales), risk or integration note? Penalize pure prose with no actionable content.', + }, + forcing: { + context: 'This text is the Q3 Desperate Specificity question emitted by /office-hours startup mode. The skill is supposed to force the founder to name a specific person and consequence, stacking multiple pressures.', + axis_a: 'stacking_preserved (1-5): Does the question include at least 3 distinct sub-pressures (e.g., title? promoted? fired? up at night? OR career? day? weekend?) rather than a single neutral ask? Penalize "Who is your target user?" style collapses.', + axis_b: 'domain_matched_consequence (1-5): Does the named consequence match the domain context in the input (B2B → career impact, consumer → daily pain, hobby/open-source → weekend project)? Penalize one-size-fits-all B2B career framing for non-B2B ideas.', + }, + builder: { + context: 'This text is builder-mode response from /office-hours. The skill is supposed to riff creatively — "what if you also..." adjacent unlocks, cross-domain combinations, the "whoa" moment — not emit a structured product roadmap.', + axis_a: 'unexpected_combinations (1-5): Does the output include at least 2 cross-domain or surprising adjacent unlocks ("what if you also...", "pipe it into X", etc.)? Penalize structured feature lists with no creative leaps.', + axis_b: 'excitement_over_optimization (1-5): Does the output read as a creative riff (enthusiastic, opinionated, evocative) or as a PRD / product roadmap (structured, metric-driven, conservative)? Penalize PRD-voice language like "improve retention", "enable virality", "consider adding".', + }, + }; + + const r = rubrics[mode]; + return callJudge(`You are evaluating prose quality for a mode-specific posture regression test. + +Context: ${r.context} + +Rate the following output on two dimensions (1-5 scale each): + +- **axis_a** — ${r.axis_a} +- **axis_b** — ${r.axis_b} + +Scoring guide: +- 5: Excellent — strong, unambiguous match for the posture +- 4: Good — matches posture with minor weakness +- 3: Adequate — partial match, noticeable flatness or structure +- 2: Poor — posture mostly flattened / collapsed +- 1: Fail — posture entirely missing, reads as the opposite mode + +Respond with ONLY valid JSON in this exact format: +{"axis_a": N, "axis_b": N, "reasoning": "brief explanation naming specific phrases that drove the score"} + +Here is the output to evaluate: + +${text}`); +} diff --git a/test/helpers/touchfiles.ts b/test/helpers/touchfiles.ts index 577ca306..334e0590 100644 --- a/test/helpers/touchfiles.ts +++ b/test/helpers/touchfiles.ts @@ -69,12 +69,15 @@ export const E2E_TOUCHFILES: Record = { 'review-army-consensus': ['review/**', 'scripts/resolvers/review-army.ts'], // Office Hours - 'office-hours-spec-review': ['office-hours/**', 'scripts/gen-skill-docs.ts'], + 'office-hours-spec-review': ['office-hours/**', 'scripts/gen-skill-docs.ts'], + 'office-hours-forcing-energy': ['office-hours/**', 'scripts/resolvers/preamble.ts', 'test/fixtures/mode-posture/**', 'test/helpers/llm-judge.ts'], + 'office-hours-builder-wildness': ['office-hours/**', 'scripts/resolvers/preamble.ts', 'test/fixtures/mode-posture/**', 'test/helpers/llm-judge.ts'], // Plan reviews - 'plan-ceo-review': ['plan-ceo-review/**'], - 'plan-ceo-review-selective': ['plan-ceo-review/**'], - 'plan-ceo-review-benefits': ['plan-ceo-review/**', 'scripts/gen-skill-docs.ts'], + 'plan-ceo-review': ['plan-ceo-review/**'], + 'plan-ceo-review-selective': ['plan-ceo-review/**'], + 'plan-ceo-review-benefits': ['plan-ceo-review/**', 'scripts/gen-skill-docs.ts'], + 'plan-ceo-review-expansion-energy': ['plan-ceo-review/**', 'scripts/resolvers/preamble.ts', 'test/fixtures/mode-posture/**', 'test/helpers/llm-judge.ts'], 'plan-eng-review': ['plan-eng-review/**'], 'plan-eng-review-artifact': ['plan-eng-review/**'], 'plan-review-report': ['plan-eng-review/**', 'scripts/gen-skill-docs.ts'], @@ -247,11 +250,14 @@ export const E2E_TIERS: Record = { // Office Hours 'office-hours-spec-review': 'gate', + 'office-hours-forcing-energy': 'gate', // V1.1 mode-posture regression gate (Sonnet generator) + 'office-hours-builder-wildness': 'gate', // V1.1 mode-posture regression gate (Sonnet generator) // Plan reviews — gate for cheap functional, periodic for Opus quality 'plan-ceo-review': 'periodic', 'plan-ceo-review-selective': 'periodic', 'plan-ceo-review-benefits': 'gate', + 'plan-ceo-review-expansion-energy': 'gate', // V1.1 mode-posture regression gate (Opus generator, Sonnet judge) 'plan-eng-review': 'periodic', 'plan-eng-review-artifact': 'periodic', 'plan-eng-coverage-audit': 'gate', diff --git a/test/migration-checkpoint-ownership.test.ts b/test/migration-checkpoint-ownership.test.ts index e33f0d3d..526b9498 100644 --- a/test/migration-checkpoint-ownership.test.ts +++ b/test/migration-checkpoint-ownership.test.ts @@ -5,7 +5,7 @@ import * as path from 'path'; import * as os from 'os'; const ROOT = path.resolve(import.meta.dir, '..'); -const MIGRATION = path.join(ROOT, 'gstack-upgrade', 'migrations', 'v1.1.2.0.sh'); +const MIGRATION = path.join(ROOT, 'gstack-upgrade', 'migrations', 'v1.1.3.0.sh'); function runMigration(tmpHome: string): { exitCode: number; stdout: string; stderr: string } { const result = spawnSync('bash', [MIGRATION], { @@ -28,7 +28,7 @@ function setupFakeGstackRoot(tmpHome: string): string { return gstackDir; } -describe('migration v1.1.2.0 — checkpoint ownership guard', () => { +describe('migration v1.1.3.0 — checkpoint ownership guard', () => { let tmpHome: string; beforeEach(() => { diff --git a/test/skill-e2e-office-hours.test.ts b/test/skill-e2e-office-hours.test.ts new file mode 100644 index 00000000..b5f4f6b1 --- /dev/null +++ b/test/skill-e2e-office-hours.test.ts @@ -0,0 +1,173 @@ +/** + * E2E tests for /office-hours mode-posture regression (V1.1 gate). + * + * Exercises startup mode Q3 (forcing energy) and builder mode (generative wildness). + * Both cases detect whether preamble Writing Style rules have flattened the + * skill's distinctive posture at runtime. + * + * Judge: Sonnet via judgePosture() — cheap per-call. + * Generator: whatever the skill runs with (Sonnet for office-hours). + */ + +import { describe, test, expect, beforeAll, afterAll } from 'bun:test'; +import { runSkillTest } from './helpers/session-runner'; +import { + ROOT, browseBin, runId, evalsEnabled, + describeIfSelected, testConcurrentIfSelected, + logCost, recordE2E, + createEvalCollector, finalizeEvalCollector, +} from './helpers/e2e-helpers'; +import { judgePosture } from './helpers/llm-judge'; +import { spawnSync } from 'child_process'; +import * as fs from 'fs'; +import * as path from 'path'; +import * as os from 'os'; + +const evalCollector = createEvalCollector('e2e-office-hours'); + +// --- Office Hours forcing-question energy (Q3 Desperate Specificity) --- + +describeIfSelected('Office Hours Forcing Energy E2E', ['office-hours-forcing-energy'], () => { + let workDir: string; + + beforeAll(() => { + workDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-office-hours-forcing-')); + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: workDir, stdio: 'pipe', timeout: 5000 }); + + run('git', ['init', '-b', 'main']); + run('git', ['config', 'user.email', 'test@test.com']); + run('git', ['config', 'user.name', 'Test']); + + const pitch = fs.readFileSync( + path.join(ROOT, 'test', 'fixtures', 'mode-posture', 'forcing-pitch.md'), + 'utf-8', + ); + fs.writeFileSync(path.join(workDir, 'pitch.md'), pitch); + + run('git', ['add', '.']); + run('git', ['commit', '-m', 'add pitch']); + + fs.mkdirSync(path.join(workDir, 'office-hours'), { recursive: true }); + fs.copyFileSync( + path.join(ROOT, 'office-hours', 'SKILL.md'), + path.join(workDir, 'office-hours', 'SKILL.md'), + ); + }); + + afterAll(() => { + try { fs.rmSync(workDir, { recursive: true, force: true }); } catch {} + }); + + testConcurrentIfSelected('office-hours-forcing-energy', async () => { + const result = await runSkillTest({ + prompt: `Read office-hours/SKILL.md for the workflow. + +Read pitch.md — that's the founder pitch the user is bringing to office hours. Select Startup Mode. Skip any AskUserQuestion — this is non-interactive. + +Assume the founder has already answered Q1 (strongest evidence = "got on a waitlist of about 40 signups from LinkedIn posts") and Q2 (status quo = "PMs use Notion docs + lots of Zoom summaries by hand"). Jump directly to Q3 Desperate Specificity. + +Write Q3 output — the forcing question you would ask this founder — to ${workDir}/q3.md. Write ONLY the question prose. No conversational wrapper, no meta-commentary, no Q1/Q2 recap.`, + workingDirectory: workDir, + maxTurns: 8, + timeout: 240_000, + testName: 'office-hours-forcing-energy', + runId, + model: 'claude-sonnet-4-6', + }); + + logCost('/office-hours (FORCING)', result); + recordE2E(evalCollector, '/office-hours-forcing-energy', 'Office Hours Forcing Energy E2E', result, { + passed: ['success', 'error_max_turns'].includes(result.exitReason), + }); + expect(['success', 'error_max_turns']).toContain(result.exitReason); + + const q3Path = path.join(workDir, 'q3.md'); + if (!fs.existsSync(q3Path)) { + throw new Error('Agent did not emit q3.md — forcing energy eval requires Q3 output'); + } + const q3Text = fs.readFileSync(q3Path, 'utf-8'); + expect(q3Text.length).toBeGreaterThan(80); + + const scores = await judgePosture('forcing', q3Text); + console.log('Forcing energy scores:', JSON.stringify(scores, null, 2)); + expect(scores.axis_a).toBeGreaterThanOrEqual(4); // stacking_preserved + expect(scores.axis_b).toBeGreaterThanOrEqual(4); // domain_matched_consequence + }, 360_000); +}); + +// --- Office Hours builder-mode wildness --- + +describeIfSelected('Office Hours Builder Wildness E2E', ['office-hours-builder-wildness'], () => { + let workDir: string; + + beforeAll(() => { + workDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-office-hours-builder-')); + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: workDir, stdio: 'pipe', timeout: 5000 }); + + run('git', ['init', '-b', 'main']); + run('git', ['config', 'user.email', 'test@test.com']); + run('git', ['config', 'user.name', 'Test']); + + const idea = fs.readFileSync( + path.join(ROOT, 'test', 'fixtures', 'mode-posture', 'builder-idea.md'), + 'utf-8', + ); + fs.writeFileSync(path.join(workDir, 'idea.md'), idea); + + run('git', ['add', '.']); + run('git', ['commit', '-m', 'add idea']); + + fs.mkdirSync(path.join(workDir, 'office-hours'), { recursive: true }); + fs.copyFileSync( + path.join(ROOT, 'office-hours', 'SKILL.md'), + path.join(workDir, 'office-hours', 'SKILL.md'), + ); + }); + + afterAll(() => { + try { fs.rmSync(workDir, { recursive: true, force: true }); } catch {} + }); + + testConcurrentIfSelected('office-hours-builder-wildness', async () => { + const result = await runSkillTest({ + prompt: `Read office-hours/SKILL.md for the workflow. + +Read idea.md — that's the user's weekend project idea. Select Builder Mode (Phase 2B). Skip any AskUserQuestion — this is non-interactive. + +The user has confirmed the basic idea is "TypeScript + D3 web tool, start with JS/TS dependency graphs." They are now asking: "What are three adjacent unlocks I haven't mentioned yet — things that would turn this from a tool I used into something I'd show a friend?" + +Write your response — the three adjacent unlocks — to ${workDir}/unlocks.md. Write ONLY the response prose. No meta-commentary, no mode recap. Lead with the fun; let me edit it down later.`, + workingDirectory: workDir, + maxTurns: 8, + timeout: 240_000, + testName: 'office-hours-builder-wildness', + runId, + model: 'claude-sonnet-4-6', + }); + + logCost('/office-hours (BUILDER)', result); + recordE2E(evalCollector, '/office-hours-builder-wildness', 'Office Hours Builder Wildness E2E', result, { + passed: ['success', 'error_max_turns'].includes(result.exitReason), + }); + expect(['success', 'error_max_turns']).toContain(result.exitReason); + + const unlocksPath = path.join(workDir, 'unlocks.md'); + if (!fs.existsSync(unlocksPath)) { + throw new Error('Agent did not emit unlocks.md — builder wildness eval requires output'); + } + const unlocksText = fs.readFileSync(unlocksPath, 'utf-8'); + expect(unlocksText.length).toBeGreaterThan(200); + + const scores = await judgePosture('builder', unlocksText); + console.log('Builder wildness scores:', JSON.stringify(scores, null, 2)); + expect(scores.axis_a).toBeGreaterThanOrEqual(4); // unexpected_combinations + expect(scores.axis_b).toBeGreaterThanOrEqual(4); // excitement_over_optimization + }, 360_000); +}); + +// Finalize eval collector for this file +if (evalsEnabled) { + finalizeEvalCollector(evalCollector); +} diff --git a/test/skill-e2e-plan.test.ts b/test/skill-e2e-plan.test.ts index 8953200b..269c889c 100644 --- a/test/skill-e2e-plan.test.ts +++ b/test/skill-e2e-plan.test.ts @@ -6,6 +6,7 @@ import { copyDirSync, setupBrowseShims, logCost, recordE2E, createEvalCollector, finalizeEvalCollector, } from './helpers/e2e-helpers'; +import { judgePosture } from './helpers/llm-judge'; import { spawnSync } from 'child_process'; import * as fs from 'fs'; import * as path from 'path'; @@ -183,6 +184,79 @@ Focus on reviewing the plan content: architecture, error handling, security, and }, 420_000); }); +// --- Plan CEO Review SCOPE EXPANSION energy (V1.1 mode-posture regression gate) --- + +describeIfSelected('Plan CEO Review Expansion Energy E2E', ['plan-ceo-review-expansion-energy'], () => { + let planDir: string; + + beforeAll(() => { + planDir = fs.mkdtempSync(path.join(os.tmpdir(), 'skill-e2e-plan-ceo-exp-')); + const run = (cmd: string, args: string[]) => + spawnSync(cmd, args, { cwd: planDir, stdio: 'pipe', timeout: 5000 }); + + run('git', ['init', '-b', 'main']); + run('git', ['config', 'user.email', 'test@test.com']); + run('git', ['config', 'user.name', 'Test']); + + // Use the shared fixture so expansion-energy regressions are reproducible. + const fixture = fs.readFileSync( + path.join(ROOT, 'test', 'fixtures', 'mode-posture', 'expansion-plan.md'), + 'utf-8', + ); + fs.writeFileSync(path.join(planDir, 'plan.md'), fixture); + + run('git', ['add', '.']); + run('git', ['commit', '-m', 'add plan']); + + fs.mkdirSync(path.join(planDir, 'plan-ceo-review'), { recursive: true }); + fs.copyFileSync( + path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), + path.join(planDir, 'plan-ceo-review', 'SKILL.md'), + ); + }); + + afterAll(() => { + try { fs.rmSync(planDir, { recursive: true, force: true }); } catch {} + }); + + testConcurrentIfSelected('plan-ceo-review-expansion-energy', async () => { + const result = await runSkillTest({ + prompt: `Read plan-ceo-review/SKILL.md for the review workflow. + +Read plan.md — that's the plan to review. This is a standalone plan document, not a codebase — skip any codebase exploration or system audit steps. + +Choose SCOPE EXPANSION mode. Skip any AskUserQuestion calls — this is non-interactive. Auto-approve the ideal-architecture approach in 0C-bis. For 0D, run all three analyses (10x check, platonic ideal, delight opportunities), then emit exactly 2 concrete expansion proposals in the opt-in ceremony. + +Write your expansion proposals to ${planDir}/proposals.md with ONLY the proposal text — no conversational wrapper, no review summary, no mode analysis. Each proposal separated by "---".`, + workingDirectory: planDir, + maxTurns: 15, + timeout: 360_000, + testName: 'plan-ceo-review-expansion-energy', + runId, + model: 'claude-opus-4-6', + }); + + logCost('/plan-ceo-review (EXPANSION ENERGY)', result); + recordE2E(evalCollector, '/plan-ceo-review-expansion-energy', 'Plan CEO Review Expansion Energy E2E', result, { + passed: ['success', 'error_max_turns'].includes(result.exitReason), + }); + expect(['success', 'error_max_turns']).toContain(result.exitReason); + + const proposalsPath = path.join(planDir, 'proposals.md'); + if (!fs.existsSync(proposalsPath)) { + throw new Error('Agent did not emit proposals.md — expansion energy eval requires proposal output'); + } + const proposalText = fs.readFileSync(proposalsPath, 'utf-8'); + expect(proposalText.length).toBeGreaterThan(200); + + const scores = await judgePosture('expansion', proposalText); + console.log('Expansion energy scores:', JSON.stringify(scores, null, 2)); + // Pass threshold: 4/5 on both axes (good — matches posture with minor weakness). + expect(scores.axis_a).toBeGreaterThanOrEqual(4); // surface_framing + expect(scores.axis_b).toBeGreaterThanOrEqual(4); // decision_preservation + }, 600_000); +}); + // --- Plan Eng Review E2E --- describeIfSelected('Plan Eng Review E2E', ['plan-eng-review'], () => { diff --git a/test/touchfiles.test.ts b/test/touchfiles.test.ts index d4aee202..4ee23a18 100644 --- a/test/touchfiles.test.ts +++ b/test/touchfiles.test.ts @@ -80,10 +80,11 @@ describe('selectTests', () => { expect(result.selected).toContain('plan-ceo-review'); expect(result.selected).toContain('plan-ceo-review-selective'); expect(result.selected).toContain('plan-ceo-review-benefits'); + expect(result.selected).toContain('plan-ceo-review-expansion-energy'); expect(result.selected).toContain('autoplan-core'); expect(result.selected).toContain('codex-offered-ceo-review'); - expect(result.selected.length).toBe(5); - expect(result.skipped.length).toBe(Object.keys(E2E_TOUCHFILES).length - 5); + expect(result.selected.length).toBe(6); + expect(result.skipped.length).toBe(Object.keys(E2E_TOUCHFILES).length - 6); }); test('global touchfile triggers ALL tests', () => {