From 08486bbf8bd5790bf5ed26fc6c7c90257619b811 Mon Sep 17 00:00:00 2001
From: Garry Tan <garrytan@gmail.com>
Date: Sun, 19 Apr 2026 08:56:45 +0800
Subject: [PATCH] docs: adopt gbrain's release-summary CHANGELOG format + apply
 to v1.3
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Ported the "release-summary format" rules from ~/git/gbrain/CLAUDE.md
(lines 291-354) into gstack's CLAUDE.md under the existing
"CHANGELOG + VERSION style" section. Every future `## [X.Y.Z]` entry
now needs a verdict-style release summary at the top:
1. Two-line bold headline (10-14 words)
2. Lead paragraph (3-5 sentences)
3. "Numbers that matter" with BEFORE / AFTER / Δ table
4. "What this means for [audience]" closer
5. `### Itemized changes` header
6. Existing itemized subsections below

Rewrote v1.3.0.0 entry to match. Preserved every existing bullet in
Added / Changed / Fixed / For contributors (no content clobbered per
the CLAUDE.md CHANGELOG rule).

Numbers in the v1.3 release summary are verifiable — every row of the
BEFORE / AFTER table has a reproducible command listed in the setup
paragraph (git log, bun test, grep for wiring status). No made-up
metrics.

Also added the gbrain "always credit community contributions" rule to
the itemized-changes section. `Contributed by @username` for every
community PR that lands in a CHANGELOG entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md | 26 +++++++++++++++++++++++++
 CLAUDE.md    | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7349a511..53a9e580 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,32 @@
 
 ## [1.3.0.0] - 2026-04-19
 
+## **Every new CLI wired to a slash command.**
+## **Zero orphan binaries ship in v1.3.**
+
+v1.3 ships three new binaries (`gstack-model-benchmark`, `gstack-publish`, `gstack-taste-update`), one new skill (`/benchmark-models`), and a `/ship` Step 19.5 that detects methodology-skill changes and offers to publish them. The delta from v1.2 isn't just "more features." It's that every new primitive is discoverable from a `/command`, not buried in a CHANGELOG bullet nobody reads. Before this cut, `gstack-model-benchmark` and `gstack-publish` existed but no skill called them, so most users would never find them. Now `/benchmark-models` walks you through a cross-model comparison, and `/ship` asks about publishing the moment you touch a methodology skill. First multi-provider benchmark in any agent framework, and it's one slash command away.
+
+### The numbers that matter
+
+Headline from this branch's review audit against `origin/main` (32 commits, commit `09466734`). Reproducible: `git log origin/main..HEAD --oneline` for the commit set, `bun test test/taste-engine.test.ts test/publish-dry-run.test.ts test/benchmark-cli.test.ts test/skill-e2e-benchmark-providers.test.ts` for the test count, `grep -rn "gstack-model-benchmark\|gstack-publish\|gstack-taste-update" --include="*.tmpl"` for wiring status.
+
+| Metric                                           | BEFORE (initial v1.2 scope) | AFTER (v1.3)         | Δ           |
+|--------------------------------------------------|------------------------------|----------------------|-------------|
+| **New CLIs wired to a /skill**                   | 1 of 3 (33%)                 | **3 of 3 (100%)**    | **+2**      |
+| **Deterministic tests for v1.3 CLIs**            | 0                            | **45**               | **+45**     |
+| **Live-API adapter E2E (gated on `EVALS=1`)**    | 0                            | **8**                | **+8**      |
+| **Real adapter bugs caught by new tests**        | 0                            | **1** (codex `--skip-git-repo-check`) | **+1**      |
+| **Preamble composition root**                    | 740 lines                    | **~100 lines**       | **-86%**    |
+| **Models benchmarkable in one command**          | 1 (Claude only)              | **3** (Claude, GPT, Gemini) | **+2** |
+
+The single most striking number: the new E2E suite caught a real codex adapter bug (`--skip-git-repo-check` missing) on its first run. That bug would have shipped silently, then surfaced later as a cryptic "Not inside a trusted directory" error to anyone running `gstack-model-benchmark` from a temp dir. One test, one regression caught, before a user ever hit it.
+
+### What this means for gstack users
+
+If you're a YC founder or solo builder shipping methodology skills from one laptop, `/benchmark-models` answers "is my skill better on Opus, GPT-5.4, or Gemini" with a real benchmark table, not vibes. When you tweak `/office-hours` or `/plan-ceo-review` on a feature branch, `/ship` asks whether to push to ClawHub + SkillsMP + Vercel Skills.sh too, so methodology updates don't die on your main branch. Continuous checkpoint mode (opt-in, local by default) means you can close your laptop mid-refactor and `/context-restore` picks you up from a WIP commit with decisions and remaining work intact, not a stale notes file. Run `/gstack-upgrade` and try `/benchmark-models` on the skill you use most this week.
+
+### Itemized changes
+
 ### Added
 
 - **Per-model behavioral overlays via `--model` flag.** Different LLMs need different nudges. Run `bun run gen:skill-docs --model gpt-5.4` and every generated skill picks up GPT-tuned behavioral patches. Five overlays ship in `model-overlays/`: claude (todo discipline), gpt (anti-termination), gpt-5.4 (anti-verbosity, inherits gpt), gemini (conciseness), o-series (structured output). Overlay files are plain markdown — edit in place, no code changes. `MODEL_OVERLAY: {model}` line in the preamble output tells you which one is active. Defaults to claude. Missing overlay file → empty string (graceful), no error.
diff --git a/CLAUDE.md b/CLAUDE.md
index b12d39dc..1939c67d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -397,6 +397,60 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n
 - No jargon: say "every question now tells you which project and branch you're in" not
   "AskUserQuestion format standardized across skill templates via preamble resolver."
 
+### Release-summary format (every `## [X.Y.Z]` entry)
+
+Every version entry in `CHANGELOG.md` MUST start with a release-summary section in
+the GStack/Garry voice, one viewport's worth of prose + tables that lands like a
+verdict, not marketing. The itemized changelog (subsections, bullets, files) goes
+BELOW that summary, separated by a `### Itemized changes` header.
+
+The release-summary section gets read by humans, by the auto-update agent, and by
+anyone deciding whether to upgrade. The itemized list is for agents that need to
+know exactly what changed.
+
+Structure for the top of every `## [X.Y.Z]` entry:
+
+1. **Two-line bold headline** (10-14 words total). Should land like a verdict, not
+   marketing. Sound like someone who shipped today and cares whether it works.
+2. **Lead paragraph** (3-5 sentences). What shipped, what changed for the user.
+   Specific, concrete, no AI vocabulary, no em dashes, no hype.
+3. **A "The X numbers that matter" section** with:
+   - One short setup paragraph naming the source of the numbers (real production
+     deployment OR a reproducible benchmark, name the file/command to run).
+   - A table of 3-6 key metrics with BEFORE / AFTER / Δ columns.
+   - A second optional table for per-category breakdown if relevant.
+   - 1-2 sentences interpreting the most striking number in concrete user terms.
+4. **A "What this means for [audience]" closing paragraph** (2-4 sentences) tying
+   the metrics to a real workflow shift. End with what to do.
+
+Voice rules for the release summary:
+- No em dashes (use commas, periods, "...").
+- No AI vocabulary (delve, robust, comprehensive, nuanced, fundamental, etc.) or
+  banned phrases ("here's the kicker", "the bottom line", etc.).
+- Real numbers, real file names, real commands. Not "fast" but "~30s on 30K pages."
+- Short paragraphs, mix one-sentence punches with 2-3 sentence runs.
+- Connect to user outcomes: "the agent does ~3x less reading" beats "improved precision."
+- Be direct about quality. "Well-designed" or "this is a mess." No dancing.
+
+Source material:
+- CHANGELOG previous entry for prior context.
+- Benchmark files or `/retro` output for headline numbers.
+- Recent commits (`git log <prev-version>..HEAD --oneline`) for what shipped.
+- Don't make up numbers. If a metric isn't in a benchmark or production data,
+  don't include it. Say "no measurement yet" if asked.
+
+Target length: ~250-350 words for the summary. Should render as one viewport.
+
+### Itemized changes (below the release summary)
+
+Write `### Itemized changes` and continue with the detailed subsections (Added,
+Changed, Fixed, For contributors). Same rules as the user-facing voice guidance
+above, plus:
+
+- **Always credit community contributions.** When an entry includes work from a
+  community PR, name the contributor with `Contributed by @username`. Contributors
+  did real work. Thank them publicly every time, no exceptions.
+
 ## AI effort compression
 
 When estimating or discussing effort, always show both human-team and CC+gstack time: