mirror of
https://github.com/garrytan/gstack.git
synced 2026-06-29 13:10:05 +02:00
Merge origin/main into /spec branch — retag v1.45.0.0 → v1.47.0.0
main moved to v1.46.0.0 (gstack v2 foundation, eval-first floor across 51 skills) while this branch was at v1.45.0.0. v1.46 also reserved v1.45.0.0 for the design daemon feature. Retag this branch's release v1.45.0.0 → v1.47.0.0 so it lands cleanly on top of main. Conflict resolutions: - VERSION: 1.47.0.0 (MINOR continues on top of main's 1.46.0.0; this branch is also a MINOR per scale-aware rules — new skill capability). - CHANGELOG: rewrite this branch's release header v1.45.0.0 → v1.47.0.0. Keep both main entries above main's older history. Adapts to main's eval-first floor (v1.46.0.0 test/skill-coverage-matrix.ts + test/skill-coverage-floor.test.ts): - Register /spec in SKILL_COVERAGE with 3 gate entries + 2 periodic. - Skill catalog grows 51 → 52. Floor 6/6 structural checks pass. - Catalog tokens: 4045 → 4116 (+71 for /spec, within v1.46's ≤7000 budget). - Trim spec frontmatter description to single-paragraph block form to respect v1.46's catalog-trim intent (was 14 lines / ~900 chars, now 5 lines / ~350 chars; routing prose stays in body sections). - 363/363 gate-tier tests pass across skill-coverage-floor (309) + skill-coverage-matrix (10) + skill-size-budget (3) + parity-suite (4) + spec-template-invariants (35) + spec-template-sync (2). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -2,14 +2,7 @@
|
||||
name: benchmark-models
|
||||
preamble-tier: 1
|
||||
version: 1.0.0
|
||||
description: |
|
||||
Cross-model benchmark for gstack skills. Runs the same prompt through Claude,
|
||||
GPT (via Codex CLI), and Gemini side-by-side — compares latency, tokens, cost,
|
||||
and optionally quality via LLM judge. Answers "which model is actually best
|
||||
for this skill?" with data instead of vibes. Separate from /benchmark, which
|
||||
measures web page performance. Use when: "benchmark models", "compare models",
|
||||
"which model is best for X", "cross-model comparison", "model shootout". (gstack)
|
||||
Voice triggers (speech-to-text aliases): "compare models", "model shootout", "which model is best".
|
||||
description: Cross-model benchmark for gstack skills. (gstack)
|
||||
triggers:
|
||||
- cross model benchmark
|
||||
- compare claude gpt gemini
|
||||
@@ -23,6 +16,18 @@ allowed-tools:
|
||||
<!-- AUTO-GENERATED from SKILL.md.tmpl — do not edit directly -->
|
||||
<!-- Regenerate: bun run gen:skill-docs -->
|
||||
|
||||
|
||||
## When to invoke this skill
|
||||
|
||||
Runs the same prompt through Claude,
|
||||
GPT (via Codex CLI), and Gemini side-by-side — compares latency, tokens, cost,
|
||||
and optionally quality via LLM judge. Answers "which model is actually best
|
||||
for this skill?" with data instead of vibes. Separate from /benchmark, which
|
||||
measures web page performance. Use when: "benchmark models", "compare models",
|
||||
"which model is best for X", "cross-model comparison", "model shootout".
|
||||
|
||||
Voice triggers (speech-to-text aliases): "compare models", "model shootout", "which model is best".
|
||||
|
||||
## Preamble (run first)
|
||||
|
||||
```bash
|
||||
|
||||
Reference in New Issue
Block a user