* refactor: extract gen-skill-docs into modular resolver architecture
Break the 3000-line monolith into 10 domain modules under scripts/resolvers/:
types, constants, preamble, utility, browse, design, testing, review,
codex-helpers, and index. Each module owns one domain of template generation.
The preamble module introduces a 4-tier composition system (T1-T4) so skills
only pay for the preamble sections they actually need, reducing token usage
for lightweight skills by ~40%.
Adds a token budget dashboard that prints after every generation run showing
per-skill and total token counts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: tiered preamble — skills only pay for what they use
Tag all 23 templates with preamble-tier (T1-T4). Lightweight skills
like /browse and /benchmark get a minimal preamble (~40% fewer tokens),
while review skills get the full stack. Regenerate all SKILL.md files.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: migrate eval storage to project-scoped paths
Move eval results and E2E run artifacts from ~/.gstack-dev/evals/ to
~/.gstack/projects/$SLUG/evals/ so each project's eval history lives
alongside its other gstack data. Falls back to legacy path if slug
detection fails.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: sync package.json version with VERSION after merge
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add WorktreeManager for isolated test environments
Reusable platform module (lib/worktree.ts) that creates git worktrees
for test isolation and harvests useful changes as patches. Includes
SHA-256 dedup, original SHA tracking for committed change detection,
and automatic gitignored artifact copying (.agents/, browse/dist/).
12 unit tests covering lifecycle, harvest, dedup, and error handling.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: integrate worktree isolation into E2E test infrastructure
Add createTestWorktree(), harvestAndCleanup(), and describeWithWorktree()
helpers to e2e-helpers.ts. Add harvest field to EvalTestEntry for
eval-store integration. Register lib/worktree.ts as a global touchfile.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: run Gemini and Codex E2E tests in worktrees
Switch both test suites from cwd: ROOT to worktree isolation.
Gemini (--yolo) no longer pollutes the working tree. Codex
(read-only) gets worktree for consistency. Useful changes are
harvested as patches for cherry-picking.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: skip symlinks in copyDirSync to prevent infinite recursion
Adversarial review caught that .claude/skills/gstack may be a symlink
back to the repo root, causing copyDirSync to recurse infinitely
when copying gitignored artifacts into worktrees.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.11.12.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: relax session-awareness assertion to accept structured options
The LLM consistently presents well-formatted A/B choices with pros/cons
but doesn't always use the exact string "RECOMMENDATION". Accept
case-insensitive "recommend", "option a", "which do you want", or
"which approach" as equivalent signals of a structured recommendation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: browser ref staleness detection via async count() validation
resolveRef() now checks element count to detect stale refs after page
mutations (e.g. SPA navigation). RefEntry stores role+name metadata
for better diagnostics. 3 new snapshot tests for staleness detection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: qa-only skill, qa fix loop, plan-to-QA artifact flow
Add /qa-only (report-only, Edit tool blocked), restructure /qa with
find-fix-verify cycle, add {{QA_METHODOLOGY}} DRY placeholder for
shared methodology. /plan-eng-review now writes test-plan artifacts
to ~/.gstack/projects/<slug>/ for QA consumption.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: eval efficiency metrics — turns, duration, commentary across all surfaces
Add generateCommentary() for natural-language delta interpretation,
per-test turns/duration in comparison and summary output, judgePassed
unit tests, 3 new E2E tests (qa-only, qa fix loop, plan artifact).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump version and changelog (v0.4.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update ARCHITECTURE, BROWSER, CONTRIBUTING, README for v0.4.0
- ARCHITECTURE: add ref staleness detection section, update RefEntry type
- BROWSER: add ref staleness paragraph to snapshot system docs
- CONTRIBUTING: update eval tool descriptions with commentary feature
- README: fix missing qa-only in project-local uninstall command
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add user-facing benefit descriptions to v0.4.0 changelog
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>