docs: adopt gbrain's release-summary CHANGELOG format + apply to v1.3

Ported the "release-summary format" rules from ~/git/gbrain/CLAUDE.md
(lines 291-354) into gstack's CLAUDE.md under the existing
"CHANGELOG + VERSION style" section. Every future `## [X.Y.Z]` entry
now needs a verdict-style release summary at the top:
1. Two-line bold headline (10-14 words)
2. Lead paragraph (3-5 sentences)
3. "Numbers that matter" with BEFORE / AFTER / Δ table
4. "What this means for [audience]" closer
5. `### Itemized changes` header
6. Existing itemized subsections below

Rewrote v1.3.0.0 entry to match. Preserved every existing bullet in
Added / Changed / Fixed / For contributors (no content clobbered per
the CLAUDE.md CHANGELOG rule).

Numbers in the v1.3 release summary are verifiable — every row of the
BEFORE / AFTER table has a reproducible command listed in the setup
paragraph (git log, bun test, grep for wiring status). No made-up
metrics.

Also added the gbrain "always credit community contributions" rule to
the itemized-changes section. `Contributed by @username` for every
community PR that lands in a CHANGELOG entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-04-19 08:56:45 +08:00
parent 0946673445
commit 08486bbf8b
2 changed files with 80 additions and 0 deletions
+26
View File
@@ -2,6 +2,32 @@
## [1.3.0.0] - 2026-04-19
## **Every new CLI wired to a slash command.**
## **Zero orphan binaries ship in v1.3.**
v1.3 ships three new binaries (`gstack-model-benchmark`, `gstack-publish`, `gstack-taste-update`), one new skill (`/benchmark-models`), and a `/ship` Step 19.5 that detects methodology-skill changes and offers to publish them. The delta from v1.2 isn't just "more features." It's that every new primitive is discoverable from a `/command`, not buried in a CHANGELOG bullet nobody reads. Before this cut, `gstack-model-benchmark` and `gstack-publish` existed but no skill called them, so most users would never find them. Now `/benchmark-models` walks you through a cross-model comparison, and `/ship` asks about publishing the moment you touch a methodology skill. First multi-provider benchmark in any agent framework, and it's one slash command away.
### The numbers that matter
Headline from this branch's review audit against `origin/main` (32 commits, commit `09466734`). Reproducible: `git log origin/main..HEAD --oneline` for the commit set, `bun test test/taste-engine.test.ts test/publish-dry-run.test.ts test/benchmark-cli.test.ts test/skill-e2e-benchmark-providers.test.ts` for the test count, `grep -rn "gstack-model-benchmark\|gstack-publish\|gstack-taste-update" --include="*.tmpl"` for wiring status.
| Metric | BEFORE (initial v1.2 scope) | AFTER (v1.3) | Δ |
|--------------------------------------------------|------------------------------|----------------------|-------------|
| **New CLIs wired to a /skill** | 1 of 3 (33%) | **3 of 3 (100%)** | **+2** |
| **Deterministic tests for v1.3 CLIs** | 0 | **45** | **+45** |
| **Live-API adapter E2E (gated on `EVALS=1`)** | 0 | **8** | **+8** |
| **Real adapter bugs caught by new tests** | 0 | **1** (codex `--skip-git-repo-check`) | **+1** |
| **Preamble composition root** | 740 lines | **~100 lines** | **-86%** |
| **Models benchmarkable in one command** | 1 (Claude only) | **3** (Claude, GPT, Gemini) | **+2** |
The single most striking number: the new E2E suite caught a real codex adapter bug (`--skip-git-repo-check` missing) on its first run. That bug would have shipped silently, then surfaced later as a cryptic "Not inside a trusted directory" error to anyone running `gstack-model-benchmark` from a temp dir. One test, one regression caught, before a user ever hit it.
### What this means for gstack users
If you're a YC founder or solo builder shipping methodology skills from one laptop, `/benchmark-models` answers "is my skill better on Opus, GPT-5.4, or Gemini" with a real benchmark table, not vibes. When you tweak `/office-hours` or `/plan-ceo-review` on a feature branch, `/ship` asks whether to push to ClawHub + SkillsMP + Vercel Skills.sh too, so methodology updates don't die on your main branch. Continuous checkpoint mode (opt-in, local by default) means you can close your laptop mid-refactor and `/context-restore` picks you up from a WIP commit with decisions and remaining work intact, not a stale notes file. Run `/gstack-upgrade` and try `/benchmark-models` on the skill you use most this week.
### Itemized changes
### Added
- **Per-model behavioral overlays via `--model` flag.** Different LLMs need different nudges. Run `bun run gen:skill-docs --model gpt-5.4` and every generated skill picks up GPT-tuned behavioral patches. Five overlays ship in `model-overlays/`: claude (todo discipline), gpt (anti-termination), gpt-5.4 (anti-verbosity, inherits gpt), gemini (conciseness), o-series (structured output). Overlay files are plain markdown — edit in place, no code changes. `MODEL_OVERLAY: {model}` line in the preamble output tells you which one is active. Defaults to claude. Missing overlay file → empty string (graceful), no error.
+54
View File
@@ -397,6 +397,60 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n
- No jargon: say "every question now tells you which project and branch you're in" not
"AskUserQuestion format standardized across skill templates via preamble resolver."
### Release-summary format (every `## [X.Y.Z]` entry)
Every version entry in `CHANGELOG.md` MUST start with a release-summary section in
the GStack/Garry voice, one viewport's worth of prose + tables that lands like a
verdict, not marketing. The itemized changelog (subsections, bullets, files) goes
BELOW that summary, separated by a `### Itemized changes` header.
The release-summary section gets read by humans, by the auto-update agent, and by
anyone deciding whether to upgrade. The itemized list is for agents that need to
know exactly what changed.
Structure for the top of every `## [X.Y.Z]` entry:
1. **Two-line bold headline** (10-14 words total). Should land like a verdict, not
marketing. Sound like someone who shipped today and cares whether it works.
2. **Lead paragraph** (3-5 sentences). What shipped, what changed for the user.
Specific, concrete, no AI vocabulary, no em dashes, no hype.
3. **A "The X numbers that matter" section** with:
- One short setup paragraph naming the source of the numbers (real production
deployment OR a reproducible benchmark, name the file/command to run).
- A table of 3-6 key metrics with BEFORE / AFTER / Δ columns.
- A second optional table for per-category breakdown if relevant.
- 1-2 sentences interpreting the most striking number in concrete user terms.
4. **A "What this means for [audience]" closing paragraph** (2-4 sentences) tying
the metrics to a real workflow shift. End with what to do.
Voice rules for the release summary:
- No em dashes (use commas, periods, "...").
- No AI vocabulary (delve, robust, comprehensive, nuanced, fundamental, etc.) or
banned phrases ("here's the kicker", "the bottom line", etc.).
- Real numbers, real file names, real commands. Not "fast" but "~30s on 30K pages."
- Short paragraphs, mix one-sentence punches with 2-3 sentence runs.
- Connect to user outcomes: "the agent does ~3x less reading" beats "improved precision."
- Be direct about quality. "Well-designed" or "this is a mess." No dancing.
Source material:
- CHANGELOG previous entry for prior context.
- Benchmark files or `/retro` output for headline numbers.
- Recent commits (`git log <prev-version>..HEAD --oneline`) for what shipped.
- Don't make up numbers. If a metric isn't in a benchmark or production data,
don't include it. Say "no measurement yet" if asked.
Target length: ~250-350 words for the summary. Should render as one viewport.
### Itemized changes (below the release summary)
Write `### Itemized changes` and continue with the detailed subsections (Added,
Changed, Fixed, For contributors). Same rules as the user-facing voice guidance
above, plus:
- **Always credit community contributions.** When an entry includes work from a
community PR, name the contributor with `Contributed by @username`. Contributors
did real work. Thank them publicly every time, no exceptions.
## AI effort compression
When estimating or discussing effort, always show both human-team and CC+gstack time: