mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
chore: bump version and changelog (v1.6.2.0)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -1,5 +1,47 @@
|
||||
# Changelog
|
||||
|
||||
## [1.6.2.0] - 2026-04-22
|
||||
|
||||
## **Plan reviews give you the recommendation again. And we finally admitted a 10/10 score on a mode pick means nothing.**
|
||||
|
||||
A user on Opus 4.7 reported `/plan-ceo-review` and `/plan-eng-review` stopped showing the `RECOMMENDATION: Choose X` line and the per-option `Completeness: N/10` score that used to make decisions quick. The fix ships both signals back, but with a sharper distinction: coverage-differentiated options get real scores (10 = all edges, 7 = happy path, 3 = shortcut), and kind-differentiated options (mode selection, A-vs-B architecture calls, cherry-pick Add/Defer/Skip) get the RECOMMENDATION plus an explicit `Note: options differ in kind, not coverage — no completeness score.` line instead of fabricated 10/10 filler.
|
||||
|
||||
### The numbers that matter
|
||||
|
||||
Source: `test/skill-e2e-plan-format.test.ts`, four cases pinned to `claude-opus-4-7`, ~$2 per full run. Periodic tier (non-deterministic Opus behavior gets weekly cron, not per-PR gate).
|
||||
|
||||
| Question type | Before (v1.6.1.0) | After (v1.6.2.0) |
|
||||
|---|---|---|
|
||||
| Mode selection (kind-differentiated) | `Completeness: 10/10` fabricated on all 4 modes | RECOMMENDATION + "options differ in kind" note |
|
||||
| Approach menu (coverage-differentiated) | `**RECOMMENDATION:**` markdown-bolded but regex missed it | RECOMMENDATION + `Completeness: 5/7/10` per option |
|
||||
| Per-issue coverage decision | Present, working | Present, working (unchanged) |
|
||||
| Per-issue architectural choice (kind-differentiated) | `Completeness: 9/9/5` fabricated on kind question | RECOMMENDATION + "options differ in kind" note |
|
||||
|
||||
| Eval pass | Result | Cost |
|
||||
|---|---|---|
|
||||
| Phase 1 baseline (pre-fix) | 1/4 assertions pass (evidence of regression) | $2.19 |
|
||||
| Phase 3 post-fix | 4/4 assertions pass | $1.84 |
|
||||
| Phase 3b neighbor regression (`skill-e2e-plan.test.ts`) | 12/12 pass, no drift | $5.19 |
|
||||
|
||||
### Itemized changes
|
||||
|
||||
#### Fixed
|
||||
|
||||
- `RECOMMENDATION: Choose X` now appears consistently on every AskUserQuestion in `/plan-ceo-review` and `/plan-eng-review` regardless of question type.
|
||||
- `Completeness: N/10` is only emitted on coverage-differentiated options. Kind-differentiated questions (mode picks, architectural choices between different systems, cherry-pick A/B/C) emit a one-line note explaining why the score doesn't apply, instead of fabricating 10/10 filler.
|
||||
|
||||
#### Changed
|
||||
|
||||
- The `AskUserQuestion Format` section in the T2 preamble splits the old run-on paragraph into two ALWAYS-framed rules: step 3 "Recommend (ALWAYS)" and step 4 "Score completeness (when meaningful)". This affects every T2 skill (~15 files regenerated).
|
||||
- The `Completeness Principle — Boil the Lake` preamble section now states the coverage-vs-kind distinction explicitly, matching step 4. Without this edit the two preamble locations would disagree — which is how the regression started.
|
||||
- Section 0C-bis (approach menu) and Section 0F (mode selection) in `plan-ceo-review/SKILL.md.tmpl` now carry short anchor lines that remind the model which question type applies. `plan-eng-review/SKILL.md.tmpl` gets an equivalent anchor inside the CRITICAL RULE section for per-issue AskUserQuestion decisions.
|
||||
|
||||
#### For contributors
|
||||
|
||||
- New test file `test/skill-e2e-plan-format.test.ts` captures verbatim AskUserQuestion output from the two plan skills and asserts the coverage-vs-kind format. Instructs the agent to write would-be AskUserQuestion text to `$OUT_FILE` rather than calling an MCP tool (since MCP isn't wired inside `claude -p`).
|
||||
- Classified `periodic` tier because behavior depends on Opus 4.7 non-determinism — `gate` tier would flake and block merges.
|
||||
- Golden fixtures (`test/fixtures/golden/claude-ship-SKILL.md`, `codex-ship-SKILL.md`, `factory-ship-SKILL.md`) refreshed to reflect the new format rule.
|
||||
|
||||
## [1.6.1.0] - 2026-04-22
|
||||
|
||||
## **Opus 4.7 migration, reviewed. Overlay actually split per model. Routing verified, fanout is still on the list.**
|
||||
|
||||
+1
-1
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "gstack",
|
||||
"version": "1.6.1.0",
|
||||
"version": "1.6.2.0",
|
||||
"description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
|
||||
"license": "MIT",
|
||||
"type": "module",
|
||||
|
||||
Reference in New Issue
Block a user