mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
docs: adopt gbrain's release-summary CHANGELOG format + apply to v1.3
Ported the "release-summary format" rules from ~/git/gbrain/CLAUDE.md (lines 291-354) into gstack's CLAUDE.md under the existing "CHANGELOG + VERSION style" section. Every future `## [X.Y.Z]` entry now needs a verdict-style release summary at the top:

1. Two-line bold headline (10-14 words)
2. Lead paragraph (3-5 sentences)
3. "Numbers that matter" with BEFORE / AFTER / Δ table
4. "What this means for [audience]" closer
5. `### Itemized changes` header
6. Existing itemized subsections below

Rewrote the v1.3.0.0 entry to match. Preserved every existing bullet in Added / Changed / Fixed / For contributors (no content clobbered, per the CLAUDE.md CHANGELOG rule).

Numbers in the v1.3 release summary are verifiable: every row of the BEFORE / AFTER table has a reproducible command listed in the setup paragraph (git log, bun test, grep for wiring status). No made-up metrics.

Also added the gbrain "always credit community contributions" rule to the itemized-changes section: `Contributed by @username` for every community PR that lands in a CHANGELOG entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -2,6 +2,32 @@

## [1.3.0.0] - 2026-04-19

**Every new CLI wired to a slash command.**
**Zero orphan binaries ship in v1.3.**

v1.3 ships three new binaries (`gstack-model-benchmark`, `gstack-publish`, `gstack-taste-update`), one new skill (`/benchmark-models`), and a `/ship` Step 19.5 that detects methodology-skill changes and offers to publish them. The delta from v1.2 isn't just "more features." It's that every new primitive is discoverable from a `/command`, not buried in a CHANGELOG bullet nobody reads. Before this cut, `gstack-model-benchmark` and `gstack-publish` existed but no skill called them, so most users would never find them. Now `/benchmark-models` walks you through a cross-model comparison, and `/ship` asks about publishing the moment you touch a methodology skill. First multi-provider benchmark in any agent framework, and it's one slash command away.

### The numbers that matter

Headline from this branch's review audit against `origin/main` (32 commits, commit `09466734`). Reproducible: `git log origin/main..HEAD --oneline` for the commit set, `bun test test/taste-engine.test.ts test/publish-dry-run.test.ts test/benchmark-cli.test.ts test/skill-e2e-benchmark-providers.test.ts` for the test count, `grep -rn "gstack-model-benchmark\|gstack-publish\|gstack-taste-update" --include="*.tmpl"` for wiring status.

| Metric | BEFORE (initial v1.2 scope) | AFTER (v1.3) | Δ |
|---|---|---|---|
| **New CLIs wired to a /skill** | 1 of 3 (33%) | **3 of 3 (100%)** | **+2** |
| **Deterministic tests for v1.3 CLIs** | 0 | **45** | **+45** |
| **Live-API adapter E2E (gated on `EVALS=1`)** | 0 | **8** | **+8** |
| **Real adapter bugs caught by new tests** | 0 | **1** (codex `--skip-git-repo-check`) | **+1** |
| **Preamble composition root** | 740 lines | **~100 lines** | **-86%** |
| **Models benchmarkable in one command** | 1 (Claude only) | **3** (Claude, GPT, Gemini) | **+2** |

The single most striking number: the new E2E suite caught a real codex adapter bug (`--skip-git-repo-check` missing) on its first run. That bug would have shipped silently, then surfaced later as a cryptic "Not inside a trusted directory" error to anyone running `gstack-model-benchmark` from a temp dir. One test, one regression caught, before a user ever hit it.

### What this means for gstack users

If you're a YC founder or solo builder shipping methodology skills from one laptop, `/benchmark-models` answers "is my skill better on Opus, GPT-5.4, or Gemini?" with a real benchmark table, not vibes. When you tweak `/office-hours` or `/plan-ceo-review` on a feature branch, `/ship` asks whether to push to ClawHub + SkillsMP + Vercel Skills.sh too, so methodology updates don't die on your main branch. Continuous checkpoint mode (opt-in, local by default) means you can close your laptop mid-refactor and `/context-restore` picks you up from a WIP commit with decisions and remaining work intact, not a stale notes file. Run `/gstack-upgrade` and try `/benchmark-models` on the skill you use most this week.

### Itemized changes

### Added

- **Per-model behavioral overlays via `--model` flag.** Different LLMs need different nudges. Run `bun run gen:skill-docs --model gpt-5.4` and every generated skill picks up GPT-tuned behavioral patches. Five overlays ship in `model-overlays/`: claude (todo discipline), gpt (anti-termination), gpt-5.4 (anti-verbosity, inherits gpt), gemini (conciseness), o-series (structured output). Overlay files are plain markdown: edit in place, no code changes. `MODEL_OVERLAY: {model}` line in the preamble output tells you which one is active. Defaults to claude. Missing overlay file → empty string (graceful), no error.
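The graceful-fallback behavior described in the bullet above (a missing overlay file resolves to an empty string, not an error) could be sketched roughly like this. `loadOverlay` and the directory default are illustrative names, not gstack's actual implementation:

```typescript
import { existsSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical sketch: resolve model-overlays/<model>.md,
// fall back to "" when the file is missing (graceful, no error).
function loadOverlay(model: string, dir = "model-overlays"): string {
  const path = join(dir, `${model}.md`);
  if (!existsSync(path)) return ""; // missing overlay → empty string
  return readFileSync(path, "utf8");
}

// A model with no overlay file resolves to the empty string:
console.log(JSON.stringify(loadOverlay("no-such-model"))); // prints ""
```

The point of the fallback is that adding a new model name never breaks generation; at worst you get a preamble with no overlay applied.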
@@ -397,6 +397,60 @@ CHANGELOG.md is **for users**, not contributors. Write it like product release n

- No jargon: say "every question now tells you which project and branch you're in" not
  "AskUserQuestion format standardized across skill templates via preamble resolver."

### Release-summary format (every `## [X.Y.Z]` entry)

Every version entry in `CHANGELOG.md` MUST start with a release-summary section in
the GStack/Garry voice, one viewport's worth of prose + tables that lands like a
verdict, not marketing. The itemized changelog (subsections, bullets, files) goes
BELOW that summary, separated by a `### Itemized changes` header.

The release-summary section gets read by humans, by the auto-update agent, and by
anyone deciding whether to upgrade. The itemized list is for agents that need to
know exactly what changed.

Structure for the top of every `## [X.Y.Z]` entry:

1. **Two-line bold headline** (10-14 words total). Should land like a verdict, not
   marketing. Sound like someone who shipped today and cares whether it works.
2. **Lead paragraph** (3-5 sentences). What shipped, what changed for the user.
   Specific, concrete, no AI vocabulary, no em dashes, no hype.
3. **A "The X numbers that matter" section** with:
   - One short setup paragraph naming the source of the numbers (real production
     deployment OR a reproducible benchmark, name the file/command to run).
   - A table of 3-6 key metrics with BEFORE / AFTER / Δ columns.
   - A second optional table for per-category breakdown if relevant.
   - 1-2 sentences interpreting the most striking number in concrete user terms.
4. **A "What this means for [audience]" closing paragraph** (2-4 sentences) tying
   the metrics to a real workflow shift. End with what to do.
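Putting the four elements together, a minimal skeleton of a version entry (placeholder text, not a real release) might look like:

```markdown
## [X.Y.Z] - YYYY-MM-DD

**First headline line lands the verdict.**
**Second line says what shipped.**

Lead paragraph: what shipped, what changed for the user. 3-5 sentences,
specific and concrete.

### The N numbers that matter

Setup paragraph naming the benchmark file or command the numbers come from.

| Metric         | BEFORE | AFTER | Δ   |
|----------------|--------|-------|-----|
| Example metric | 0      | 45    | +45 |

One or two sentences interpreting the most striking number.

### What this means for [audience]

2-4 sentences tying the metrics to a workflow shift. End with what to do.

### Itemized changes

### Added
- ...
```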

Voice rules for the release summary:
- No em dashes (use commas, periods, "...").
- No AI vocabulary (delve, robust, comprehensive, nuanced, fundamental, etc.) or
  banned phrases ("here's the kicker", "the bottom line", etc.).
- Real numbers, real file names, real commands. Not "fast" but "~30s on 30K pages."
- Short paragraphs, mix one-sentence punches with 2-3 sentence runs.
- Connect to user outcomes: "the agent does ~3x less reading" beats "improved precision."
- Be direct about quality. "Well-designed" or "this is a mess." No dancing.

Source material:
- CHANGELOG previous entry for prior context.
- Benchmark files or `/retro` output for headline numbers.
- Recent commits (`git log <prev-version>..HEAD --oneline`) for what shipped.
- Don't make up numbers. If a metric isn't in a benchmark or production data,
  don't include it. Say "no measurement yet" if asked.
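The commit-gathering step can be exercised in any repo. A self-contained sketch (the tag `v1.2.0.0` and commit messages are placeholders, not gstack history):

```shell
#!/bin/sh
# Sketch: collect "what shipped" between the previous version tag and HEAD,
# the raw material for a release summary. Uses a throwaway repo for demo.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
git -c user.email=a@b -c user.name=demo commit -q --allow-empty -m "feat: old work"
git tag v1.2.0.0
git -c user.email=a@b -c user.name=demo commit -q --allow-empty -m "feat: new CLI"
# One line per commit since the previous version tag:
git log v1.2.0.0..HEAD --oneline
# Commit count, useful for the "N commits" claim in the setup paragraph:
git log v1.2.0.0..HEAD --oneline | wc -l | tr -d ' '   # prints 1
```

In a real release, substitute the actual previous version tag for `v1.2.0.0` and run the two `git log` commands in place.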

Target length: ~250-350 words for the summary. Should render as one viewport.

### Itemized changes (below the release summary)

Write `### Itemized changes` and continue with the detailed subsections (Added,
Changed, Fixed, For contributors). Same rules as the user-facing voice guidance
above, plus:

- **Always credit community contributions.** When an entry includes work from a
  community PR, name the contributor with `Contributed by @username`. Contributors
  did real work. Thank them publicly every time, no exceptions.

## AI effort compression

When estimating or discussing effort, always show both human-team and CC+gstack time: