mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
fix: enrich SKILL.md docs to pass LLM evals, upgrade judge to Sonnet 4.6 (#43)
* fix: enrich command descriptions and snapshot flags for LLM eval quality

  14 command descriptions enriched with specific arg formats, valid values,
  error behavior, and return types. Fixed header usage from <name> <value>
  to <name>:<value>. Added cookie usage syntax. Snapshot flags now show long
  names, ref numbering, and output format examples.

* refactor: auto-generate server.ts help text from COMMAND_DESCRIPTIONS

  Replace hand-maintained help block with generateHelpText() that reads from
  COMMAND_DESCRIPTIONS and SNAPSHOT_FLAGS. Eliminates help text drift from
  source of truth.

* test: add usage consistency and pipe guard tests

  Usage consistency test cross-checks Usage: patterns in implementation
  against COMMAND_DESCRIPTIONS using structural skeleton comparison. Pipe
  guard test ensures descriptions don't contain | which would break markdown
  table rendering.

* chore: upgrade eval judge to Sonnet 4.6, update changelog

  Switch LLM-as-judge evals from Haiku to Sonnet 4.6 for more stable, nuanced
  scoring. Add changelog entry for all eval improvements.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
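The pipe guard described in the test commit can be sketched roughly as below. This is only an illustration of the idea: the commit does not show the shape of COMMAND_DESCRIPTIONS, so the `CommandDescription` interface, the sample entries, and both helper functions are hypothetical names chosen for this sketch.

```typescript
// Hypothetical shape: the real COMMAND_DESCRIPTIONS in the repo may differ.
interface CommandDescription {
  description: string;
  usage?: string;
}

// Illustrative sample entries, not copied from the repository.
const COMMAND_DESCRIPTIONS_SAMPLE: Record<string, CommandDescription> = {
  click: { description: 'Click the element matching a selector or @ref', usage: 'click <selector>' },
  header: { description: 'Set a request header', usage: 'header <name>:<value>' },
};

// Returns the names of commands whose description contains "|".
// A literal pipe would be read as a cell separator in a markdown table.
function findPipeViolations(table: Record<string, CommandDescription>): string[] {
  return Object.entries(table)
    .filter(([, meta]) => meta.description.includes('|'))
    .map(([name]) => name);
}

// Renders one markdown table row per command: the output the guard protects.
function renderCommandTable(table: Record<string, CommandDescription>): string {
  const rows = Object.entries(table).map(
    ([name, meta]) => `| \`${meta.usage ?? name}\` | ${meta.description} |`,
  );
  return ['| Command | Description |', '|---------|-------------|', ...rows].join('\n');
}

const violations = findPipeViolations(COMMAND_DESCRIPTIONS_SAMPLE);
if (violations.length > 0) {
  throw new Error(`Descriptions contain "|": ${violations.join(', ')}`);
}
console.log(renderCommandTable(COMMAND_DESCRIPTIONS_SAMPLE));
```

Checking the source table rather than the rendered markdown keeps the failure message tied to the command name that needs fixing.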
@@ -64,20 +64,31 @@ function generateSnapshotFlags(): string {

   for (const flag of SNAPSHOT_FLAGS) {
     const label = flag.valueHint ? `${flag.short} ${flag.valueHint}` : flag.short;
-    lines.push(`${label.padEnd(10)}${flag.description}`);
+    lines.push(`${label.padEnd(10)}${flag.long.padEnd(24)}${flag.description}`);
   }

   lines.push('```');
   lines.push('');
-  lines.push('Combine flags: `$B snapshot -i -a -C -o /tmp/annotated.png`');
+  lines.push('All flags can be combined freely. `-o` only applies when `-a` is also used.');
+  lines.push('Example: `$B snapshot -i -a -C -o /tmp/annotated.png`');
   lines.push('');
-  lines.push('After snapshot, use @refs everywhere:');
+  lines.push('**Ref numbering:** @e refs are assigned sequentially (@e1, @e2, ...) in tree order.');
+  lines.push('@c refs from `-C` are numbered separately (@c1, @c2, ...).');
+  lines.push('');
+  lines.push('After snapshot, use @refs as selectors in any command:');
   lines.push('```bash');
   lines.push('$B click @e3       $B fill @e4 "value"     $B hover @e1');
   lines.push('$B html @e2        $B css @e5 "color"      $B attrs @e6');
   lines.push('$B click @c1       # cursor-interactive ref (from -C)');
   lines.push('```');
   lines.push('');
+  lines.push('**Output format:** indented accessibility tree with @ref IDs, one element per line.');
+  lines.push('```');
+  lines.push(' @e1 [heading] "Welcome" [level=1]');
+  lines.push(' @e2 [textbox] "Email"');
+  lines.push(' @e3 [button] "Submit"');
+  lines.push('```');
+  lines.push('');
   lines.push('Refs are invalidated on navigation — run `snapshot` again after `goto`.');

   return lines.join('\n');
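To see what the changed `lines.push` call produces, here is a small self-contained sketch of the three-column flag rendering. The `SnapshotFlag` field names (`short`, `long`, `valueHint`, `description`) come from the hunk above; the sample entries, including the long flag names and description text, are assumptions made for illustration, and `renderFlagLines` is a hypothetical helper rather than a function from the repo.

```typescript
interface SnapshotFlag {
  short: string;
  long: string;
  description: string;
  valueHint?: string;
}

// Illustrative entries only: long names and descriptions are assumed here.
const SAMPLE_FLAGS: SnapshotFlag[] = [
  { short: '-i', long: '--interactive', description: 'Interactive elements only' },
  { short: '-o', long: '--output', valueHint: '<path>', description: 'Output path (used with -a)' },
];

// Mirrors the new template literal: a 10-character label column,
// a 24-character long-name column, then the description.
function renderFlagLines(flags: SnapshotFlag[]): string[] {
  return flags.map((flag) => {
    const label = flag.valueHint ? `${flag.short} ${flag.valueHint}` : flag.short;
    return `${label.padEnd(10)}${flag.long.padEnd(24)}${flag.description}`;
  });
}

for (const line of renderFlagLines(SAMPLE_FLAGS)) {
  console.log(line);
}
```

Note that `padEnd` pads but never truncates, so a label longer than 10 characters (or a long name longer than 24) would push the later columns out of alignment.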