From 09466734459437a6e4e13df3cb4a443488d8312d Mon Sep 17 00:00:00 2001
From: Garry Tan <garrytan@gmail.com>
Date: Sun, 19 Apr 2026 08:48:44 +0800
Subject: [PATCH] =?UTF-8?q?docs:=20v1.3.0.0=20=E2=80=94=20complete=20CHANG?=
 =?UTF-8?q?ELOG=20+=20bump=20for=20post-1.2=20scope=20additions?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

VERSION 1.2.0.0 → 1.3.0.0. The original 1.2 entry was written before I
added substantial new scope: the /benchmark-models skill, /ship Step 19.5
gstack-publish integration, --dry-run on gstack-model-benchmark, and the
lite E2E test coverage (4 new test files). A minor bump gives those
changes their own version line instead of silently folding them into
1.2's scope.

CHANGELOG additions under 1.3.0.0:
- /benchmark-models skill (new Added)
- /ship Step 19.5 publish check (new Added)
- gstack-model-benchmark --dry-run (new Added)
- Token ceiling 25K → 40K (moved to Changed)
- New Fixed section — codex adapter --skip-git-repo-check, --models
  dedupe, CI Dockerfile xz-utils + nodejs.org tarball
- 4 new test files documented under contributors (taste-engine,
  publish-dry-run, benchmark-cli, skill-e2e-benchmark-providers)
- Ship golden fixtures for claude/codex/factory hosts

Pre-existing 1.2 content preserved verbatim — no entries clobbered or
reordered. Sequence remains contiguous (1.3.0.0 → 1.1.3.0 → 1.1.2.0 →
1.1.1.0 → 1.1.0.0 → 1.0.0.0 → 0.19.0.0 → ...).

package.json and VERSION both at 1.3.0.0. No drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md | 18 +++++++++++++++---
 VERSION      |  2 +-
 package.json |  2 +-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 840572ec..7349a511 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,6 +1,6 @@
 # Changelog
 
-## [1.2.0.0] - 2026-04-18
+## [1.3.0.0] - 2026-04-19
 
 ### Added
 
@@ -15,11 +15,21 @@
 - **Anti-slop design constraints.** Design-consultation now asks "What's the one thing someone will remember?" as a forcing question. Phase 5 self-gate: "Would a human designer be embarrassed by this?" — discards and regenerates if yes. Anti-convergence directive in design-shotgun: each variant must use a different font, palette, and layout, or one of them failed. Space Grotesk added to the overused fonts list (it's the new "safe alternative to Inter" trap). system-ui-as-primary-font added to the AI slop blacklist.
 - **`gstack-config list` and `gstack-config defaults`** subcommands. `list` shows all config keys with their current value AND source (set/default). `defaults` shows just the defaults table. Fixes the prior gap where `get` returned empty for missing keys instead of falling back to the documented defaults. Telemetry default aligned: header and runtime both say `off` now (previously mismatched).
 - **`gstack-config checkpoint_mode` and `checkpoint_push` keys.** New config knobs for continuous checkpoint mode. Both default to safe values (`explicit` mode, no auto-push).
+- **New `/benchmark-models` skill.** Wraps `gstack-model-benchmark` in an interactive flow: pick a prompt (an existing SKILL.md, inline text, or file path), confirm providers (dry-run shows auth status per provider), decide on `--judge` (adds ~$0.05 for quality scoring), run, interpret. Trigger phrases: "compare models", "model shootout", "which model is best". Separate from `/benchmark` (which measures web page performance) — different surface, different domain.
+- **`/ship` Step 19.5 — methodology skill publish check.** When a PR touches any standalone methodology skill (`openclaw/skills/gstack-*/SKILL.md`) or the `skills.json` manifest, `/ship` runs `gstack-publish --dry-run` after PR creation and asks whether to actually publish. Conditional — silent no-op on PRs that don't touch methodology skills. Eliminates the "shipped to main but never pushed to the marketplace" failure mode.
+- **`gstack-model-benchmark --dry-run`.** Offline validation mode that matches `gstack-publish --dry-run` semantics. Validates the provider list, resolves per-adapter auth, echoes the resolved flag values, and exits without invoking any provider CLI. Zero-cost pre-flight for CI pipelines and for catching auth drift before starting a paid benchmark run.
 
 ### Changed
 
-- **Preamble split into submodules.** `scripts/resolvers/preamble.ts` was 740 lines with 18 generators inline. Now it's an 80-line composition root that imports each generator from `scripts/resolvers/preamble/*.ts`. Output is byte-identical (verified via `diff -r` on all 135 generated SKILL.md files across all hosts before and after the refactor). Maintenance gets easier: adding a new preamble section is now "create one file, add one import line" instead of "find a spot in the god-file."
+- **Preamble split into submodules.** `scripts/resolvers/preamble.ts` was 740 lines with 18 generators inline. Now it's a ~100-line composition root that imports each generator from `scripts/resolvers/preamble/*.ts`. Output is byte-identical (verified via `diff -r` on all 135 generated SKILL.md files across all hosts before and after the refactor). Maintenance gets easier: adding a new preamble section is now "create one file, add one import line" instead of "find a spot in the god-file." This also absorbs main's v1.1.2 mode-posture and v1.0 writing-style additions as submodules (`generate-writing-style.ts`, `generate-writing-style-migration.ts`).
 - **Anti-slop dead code removed.** `scripts/gen-skill-docs.ts` had a duplicate copy of `AI_SLOP_BLACKLIST`, `OPENAI_HARD_REJECTIONS`, and `OPENAI_LITMUS_CHECKS`. Deleted — `scripts/resolvers/constants.ts` is now the single source. No more drift risk.
+- **Token ceiling raised from 25K to 40K.** Skills legitimately packing a lot of behavior (`/ship`, `/plan-ceo-review`, `/office-hours`) were tripping warnings that no longer reflect real risk given today's 200K-1M context windows and prompt caching. CLAUDE.md's guidance reframes the ceiling as a "watch for runaway growth" signal rather than a forcing compression target.
+
+### Fixed
+
+- **Codex adapter works in temp working directories.** The GPT adapter (via `codex exec`) now passes `--skip-git-repo-check` so benchmarks running in non-git temp dirs stop hitting "Not inside a trusted directory" errors. `-s read-only` stays the safety boundary; the flag only skips the interactive trust prompt.
+- **`--models` list deduplication.** Passing `--models claude,claude,gpt` no longer runs Claude twice and double-bills. The flag parser dedupes via Set while preserving first-occurrence order.
+- **CI Docker build on Ubicloud runners.** Two fixes merged during the branch's life: (1) switched the Node.js install from NodeSource apt to direct download of the official nodejs.org tarball, since Ubicloud runners regularly couldn't reach archive.ubuntu.com / security.ubuntu.com; (2) added `xz-utils` to the system deps so `tar -xJ` on the `.tar.xz` tarball actually works.
 
 ### For contributors
 
@@ -27,7 +37,9 @@
 - **Model taxonomy in neutral `scripts/models.ts`.** Avoids an import cycle through `hosts/index.ts` that would have happened if `Model` lived in `scripts/resolvers/types.ts`. `resolveModel()` handles family heuristics: `gpt-5.4-mini` → `gpt-5.4`, `o3` → `o-series`, `claude-opus-4-7` → `claude`.
 - **`scripts/resolvers/preamble/`** — 18 single-purpose generators, 16-160 lines each. The composition root in `scripts/resolvers/preamble.ts` imports them and wires them into the tier-gated section list.
 - **Plan and reviews persisted.** Implementation followed `~/.claude/plans/declarative-riding-cook.md` which went through CEO review (SCOPE EXPANSION, 6 expansions accepted), DX review (POLISH, 5 gaps fixed), Eng review (4 architecture issues), and Codex review (11 brutal findings, all integrated and 2 prior decisions reversed).
-- **Mode-posture energy in Writing Style rules 2-4** (ported from main's v1.1.2.0). Rule 2 and rule 4 now cover three framings — pain reduction, capability unlocked, forcing-question pressure — so expansion, builder, and forcing-question skills keep their edge instead of collapsing into diagnostic-pain framing. Rule 3 adds an explicit exception for stacked forcing questions. Came in via the merge; sits on top of the submodule refactor already shipped in v1.2.
+- **Mode-posture energy in Writing Style rules 2-4** (ported from main's v1.1.2.0). Rule 2 and rule 4 now cover three framings — pain reduction, capability unlocked, forcing-question pressure — so expansion, builder, and forcing-question skills keep their edge instead of collapsing into diagnostic-pain framing. Rule 3 adds an explicit exception for stacked forcing questions. Came in via the merge; sits on top of the submodule refactor already shipped in v1.3.
+- **Lite E2E coverage for v1.3 primitives.** Four new test files fill the real coverage gaps flagged in initial review: `test/taste-engine.test.ts` (24 tests — schema shape, Laplace-smoothed confidence, 5%/week decay clamped at 0, multi-dimension extraction, case-insensitive first-casing-wins policy, session cap via seed-then-one-call, legacy profile migration, taste-drift conflict warning, malformed-JSON recovery), `test/publish-dry-run.test.ts` (13 tests — manifest parsing, missing source file detection, slug filter, per-marketplace auth isolation via fake marketplaces), `test/benchmark-cli.test.ts` (12 tests — CLI flag wiring, provider defaults, unknown-provider WARN path, NOT-READY branch regression catcher that strips auth env vars), `test/skill-e2e-benchmark-providers.test.ts` (8 periodic-tier live-API tests — trivial "echo ok" prompt through claude/codex/gemini adapters, assertions on parsed output + tokens + cost + timeout error codes + Promise.allSettled parallel isolation).
+- **Ship golden fixtures for three hosts.** `test/fixtures/golden/{claude,codex,factory}-ship-SKILL.md` — byte-exact regression pins on the `/ship` generated output. The adversarial subagent pass during /review caught two real bugs before merge: Geist/GEIST casing policy in the taste engine was unpinned, and the live-E2E workdir was created at module load and never cleaned up.
 
 ## [1.1.3.0] - 2026-04-19
 
diff --git a/VERSION b/VERSION
index db700576..67505518 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.2.0.0
+1.3.0.0
diff --git a/package.json b/package.json
index db1f265e..ddf5c776 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "gstack",
-  "version": "1.2.0.0",
+  "version": "1.3.0.0",
   "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
   "license": "MIT",
   "type": "module",