fix: enrich SKILL.md docs to pass LLM evals, upgrade judge to Sonnet 4.6 (#43)

* fix: enrich command descriptions and snapshot flags for LLM eval quality 14 command descriptions enriched with specific arg formats, valid values, error behavior, and return types. Fixed header usage from <name> <value> to <name>:<value>. Added cookie usage syntax. Snapshot flags now show long names, ref numbering, and output format examples. * refactor: auto-generate server.ts help text from COMMAND_DESCRIPTIONS Replace hand-maintained help block with generateHelpText() that reads from COMMAND_DESCRIPTIONS and SNAPSHOT_FLAGS. Eliminates help text drift from source of truth. * test: add usage consistency and pipe guard tests Usage consistency test cross-checks Usage: patterns in implementation against COMMAND_DESCRIPTIONS using structural skeleton comparison. Pipe guard test ensures descriptions don't contain | which would break markdown table rendering. * chore: upgrade eval judge to Sonnet 4.6, update changelog Switch LLM-as-judge evals from Haiku to Sonnet 4.6 for more stable, nuanced scoring. Add changelog entry for all eval improvements. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-25 11:10:00 +02:00 · 2026-03-13 22:14:14 -07:00
parent 5205070299
commit a468374272
10 changed files with 233 additions and 112 deletions
@@ -64,20 +64,31 @@ function generateSnapshotFlags(): string {

  for (const flag of SNAPSHOT_FLAGS) {
    const label = flag.valueHint ? `${flag.short} ${flag.valueHint}` : flag.short;
-    lines.push(`${label.padEnd(10)}${flag.description}`);
+    lines.push(`${label.padEnd(10)}${flag.long.padEnd(24)}${flag.description}`);
  }

  lines.push('```');
  lines.push('');
-  lines.push('Combine flags: `$B snapshot -i -a -C -o /tmp/annotated.png`');
+  lines.push('All flags can be combined freely. `-o` only applies when `-a` is also used.');
+  lines.push('Example: `$B snapshot -i -a -C -o /tmp/annotated.png`');
  lines.push('');
-  lines.push('After snapshot, use @refs everywhere:');
+  lines.push('**Ref numbering:** @e refs are assigned sequentially (@e1, @e2, ...) in tree order.');
+  lines.push('@c refs from `-C` are numbered separately (@c1, @c2, ...).');
+  lines.push('');
+  lines.push('After snapshot, use @refs as selectors in any command:');
  lines.push('```bash');
  lines.push('$B click @e3       $B fill @e4 "value"     $B hover @e1');
  lines.push('$B html @e2        $B css @e5 "color"      $B attrs @e6');
  lines.push('$B click @c1       # cursor-interactive ref (from -C)');
  lines.push('```');
  lines.push('');
+  lines.push('**Output format:** indented accessibility tree with @ref IDs, one element per line.');
+  lines.push('```');
+  lines.push('  @e1 [heading] "Welcome" [level=1]');
+  lines.push('  @e2 [textbox] "Email"');
+  lines.push('  @e3 [button] "Submit"');
+  lines.push('```');
+  lines.push('');
  lines.push('Refs are invalidated on navigation — run `snapshot` again after `goto`.');

  return lines.join('\n');