Big merge. Main shipped three releases while this branch was in flight:
- v0.19.0.0 /plan-tune skill (observational layer; dual-track dev profile)
- v1.0.0.0 V1 prompts (simpler, outcome-framed, jargon-glossed) + LOC receipts
- v1.1.0.0 browse Puppeteer parity (load-html, file://, --selector, --scale)
This branch bumps to v1.2.0.0 (above main's v1.1.0.0) per the
branch-scoped-version rule in CLAUDE.md. My "0.19.0.0" CHANGELOG entry
is renamed to "1.2.0.0" and dated 2026-04-18 to land above main's trail.
Conflicts resolved:
- VERSION / package.json: 1.2.0.0
- CHANGELOG.md: preserved my entry at top (renamed), kept main's 1.1.0.0
/ 1.0.0.0 / 0.19.0.0 / 0.18.4.0 trail below in correct order
- .github/docker/Dockerfile.ci: kept my xz-utils + nodejs.org tarball
fix (a real CI bug fix main didn't have); absorbed main's retry loop
structure for both apt and the tarball curl (shape sketched after this list)
- bin/gstack-config: kept both my checkpoint_mode/push section and
main's explain_level writing-style section
- scripts/resolvers/preamble.ts: kept my submodule refactor as the base
shape of the file; extracted main's new generateWritingStyle and
generateWritingStyleMigration into scripts/resolvers/preamble/
submodules; absorbed main's generateQuestionTuning import
- All generated SKILL.md files: resolved by regen via
bun run gen:skill-docs --host all (per CLAUDE.md: never hand-merge
generated files — resolve templates and regen)
- Ship golden fixtures (claude/codex/factory): refreshed. Tier 2
preamble composition now includes all 8 sections: context recovery,
ask-user-format, writing-style, completeness, confusion, continuous
checkpoint, context health, question tuning.
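For reference, the retry shape absorbed into Dockerfile.ci is roughly the
sketch below (illustrative only: the retry count, sleep, and NODE_VERSION
placeholder are assumptions, not the exact values in the file):

    # apt step with retries
    for i in 1 2 3; do
      apt-get update && apt-get install -y xz-utils && break
      sleep 5
    done
    # nodejs.org tarball step with retries (NODE_VERSION is a placeholder)
    for i in 1 2 3; do
      curl -fsSL "https://nodejs.org/dist/v${NODE_VERSION}/node-v${NODE_VERSION}-linux-x64.tar.xz" \
        -o /tmp/node.tar.xz && break
      sleep 5
    done
    tar -xJf /tmp/node.tar.xz -C /usr/local --strip-components=1  # needs xz-utils for -J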
Main also brought new test files from /plan-tune: skill-e2e-plan-tune,
upgrade-migration-v1, v0-dormancy, writing-style-resolver. All absorbed.
468 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the orphaned gstack-model-benchmark binary into a dedicated skill
so users can discover cross-model benchmarking via /benchmark-models or
voice triggers ("compare models", "which model is best").
Deliberately separate from /benchmark (page performance) because the
two surfaces test completely different things — confusing them would
muddy both.
Flow:
1. Pick a prompt (an existing SKILL.md file, inline text, or file path)
2. Confirm providers (dry-run shows auth status per provider)
3. Decide on --judge (adds ~$0.05, scores output quality 0-10)
4. Run the benchmark — table output
5. Interpret results (fastest / cheapest / highest quality)
6. Offer to save to ~/.gstack/benchmarks/<date>.json for trend tracking
Uses gstack-model-benchmark --dry-run as a safety gate — auth status is
visible BEFORE the user spends API calls. If zero providers are authed,
the skill stops cleanly rather than attempting a run that produces no
useful output.
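For the curious, the gate-then-run shape is roughly (illustrative only;
prompt and provider arguments are elided because they depend on what the
user picked in steps 1-2):

    # safety gate: per-provider auth status, no API spend
    gstack-model-benchmark --dry-run
    # if at least one provider is authed, run for real
    # (prompt/provider args elided; --judge adds the 0-10 quality score, ~$0.05)
    gstack-model-benchmark ... --judge
    # step 6 then offers to save the results to
    # ~/.gstack/benchmarks/<date>.json for trend tracking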
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>