gstack

CalvinBackup/gstack

Fork 0

mirror of https://github.com/garrytan/gstack.git synced 2026-08-02 04:18:37 +02:00

Files

T

History

a468374272 fix: enrich SKILL.md docs to pass LLM evals, upgrade judge to Sonnet 4.6 (#43 )

* fix: enrich command descriptions and snapshot flags for LLM eval quality

14 command descriptions enriched with specific arg formats, valid values,
error behavior, and return types. Fixed header usage from <name> <value>
to <name>:<value>. Added cookie usage syntax. Snapshot flags now show
long names, ref numbering, and output format examples.

* refactor: auto-generate server.ts help text from COMMAND_DESCRIPTIONS

Replace hand-maintained help block with generateHelpText() that reads
from COMMAND_DESCRIPTIONS and SNAPSHOT_FLAGS. Eliminates help text
drift from source of truth.

* test: add usage consistency and pipe guard tests

Usage consistency test cross-checks Usage: patterns in implementation
against COMMAND_DESCRIPTIONS using structural skeleton comparison.
Pipe guard test ensures descriptions don't contain | which would break
markdown table rendering.

* chore: upgrade eval judge to Sonnet 4.6, update changelog

Switch LLM-as-judge evals from Haiku to Sonnet 4.6 for more stable,
nuanced scoring. Add changelog entry for all eval improvements.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-13 22:14:14 -07:00

dev-skill.ts

feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41 )

2026-03-13 21:08:12 -07:00

gen-skill-docs.ts

fix: enrich SKILL.md docs to pass LLM evals, upgrade judge to Sonnet 4.6 (#43 )

2026-03-13 22:14:14 -07:00

skill-check.ts

feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41 )

2026-03-13 21:08:12 -07:00