Merge origin/main into /spec branch — retag v1.45.0.0 → v1.47.0.0

main moved to v1.46.0.0 (gstack v2 foundation, eval-first floor across 51 skills) while this branch was at v1.45.0.0. v1.46 also reserved v1.45.0.0 for the design daemon feature. Retag this branch's release v1.45.0.0 → v1.47.0.0 so it lands cleanly on top of main.

Conflict resolutions:
- VERSION: 1.47.0.0 (MINOR continues on top of main's 1.46.0.0; this branch is also a MINOR per scale-aware rules: new skill capability).
- CHANGELOG: rewrite this branch's release header v1.45.0.0 → v1.47.0.0. Keep both main entries above main's older history.

Adapts to main's eval-first floor (v1.46.0.0 test/skill-coverage-matrix.ts + test/skill-coverage-floor.test.ts):
- Register /spec in SKILL_COVERAGE with 3 gate entries + 2 periodic.
- Skill catalog grows 51 → 52. Floor 6/6 structural checks pass.
- Catalog tokens: 4045 → 4116 (+71 for /spec, within v1.46's ≤7000 budget).
- Trim spec frontmatter description to single-paragraph block form to respect v1.46's catalog-trim intent (was 14 lines / ~900 chars, now 5 lines / ~350 chars; routing prose stays in body sections).
- 363/363 gate-tier tests pass across skill-coverage-floor (309) + skill-coverage-matrix (10) + skill-size-budget (3) + parity-suite (4) + spec-template-invariants (35) + spec-template-sync (2).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-29 13:10:05 +02:00 · 2026-05-26 18:54:21 -07:00
parent 33fdb2c9b2 22f8c7f4e1
commit 3d77d1edd6
132 changed files with 10945 additions and 4270 deletions
@@ -2,14 +2,7 @@
 name: benchmark-models
 preamble-tier: 1
 version: 1.0.0
-description: |
-  Cross-model benchmark for gstack skills. Runs the same prompt through Claude,
-  GPT (via Codex CLI), and Gemini side-by-side — compares latency, tokens, cost,
-  and optionally quality via LLM judge. Answers "which model is actually best
-  for this skill?" with data instead of vibes. Separate from /benchmark, which
-  measures web page performance. Use when: "benchmark models", "compare models",
-  "which model is best for X", "cross-model comparison", "model shootout". (gstack)
-  Voice triggers (speech-to-text aliases): "compare models", "model shootout", "which model is best".
+description: Cross-model benchmark for gstack skills. (gstack)
 triggers:
  - cross model benchmark
  - compare claude gpt gemini
@@ -23,6 +16,18 @@ allowed-tools:
 <!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
 <!-- Regenerate: bun run gen:skill-docs -->

+
+## When to invoke this skill
+
+Runs the same prompt through Claude,
+GPT (via Codex CLI), and Gemini side-by-side — compares latency, tokens, cost,
+and optionally quality via LLM judge. Answers "which model is actually best
+for this skill?" with data instead of vibes. Separate from /benchmark, which
+measures web page performance. Use when: "benchmark models", "compare models",
+"which model is best for X", "cross-model comparison", "model shootout".
+
+Voice triggers (speech-to-text aliases): "compare models", "model shootout", "which model is best".
+
 ## Preamble (run first)

 ```bash