mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-06 21:46:40 +02:00
efa519b8811db22df7e8d396d2b2dd68312fafc0
18 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
efa519b881 |
chore: rename /brainstorm → /office-hours across references
Update CHANGELOG, CLAUDE.md, TODOS, design-consultation, plan-ceo-review, and gen-skill-docs to reference the new office-hours skill name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
95e4430e63 |
fix: resolve merge conflicts with origin/main (v0.5.0 + v0.4.5)
Merge brings in design review skills (plan-design-review, qa-design-review, design-consultation), fix-first review mode (v0.4.5), and browse JS/click fixes. Conflicts resolved in VERSION (keep 0.6.0), CHANGELOG (keep all entries), TODOS (update design review to shipped), and test arrays (combine all skill registrations). Regenerated all SKILL.md files so design skills get the improved preamble (ELI16, escalation protocol, branch detection). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
4a77cc2c34 |
feat: /plan-design-review + /qa-design-review skills (v0.5.0) (#102)
* feat: add {{DESIGN_METHODOLOGY}} resolver and register design review skills
Add generateDesignMethodology() to gen-skill-docs.ts with 10-category, 80-item
design audit checklist. Register plan-design-review and qa-design-review templates
in findTemplates(). Add both skills to skill-check.ts SKILL_FILES. Add command
and snapshot flag validation tests for both skills in skill-validation.test.ts.
* feat: add /plan-design-review and /qa-design-review skills
/plan-design-review: report-only designer audit with letter grades, AI slop
scoring, structured first impression, design system extraction, DESIGN.md
inference and export offer. Never modifies code.
/qa-design-review: same audit, then iterative fix loop with style(design):
commits, CSS-safe WTF heuristic, before/after screenshots, final re-audit.
* chore: bump version and changelog (v0.5.0)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: update README, ARCHITECTURE for design review skills (v0.5.0)
- Update skill count to 11, add /plan-design-review and /qa-design-review
to skill table, install/uninstall commands, and demo walkthrough
- Add narrative sections: "senior designer mode" and "designer who codes mode"
with compelling examples showing AI Slop detection and design system inference
- Add {{DESIGN_METHODOLOGY}} to ARCHITECTURE.md placeholder table
- Extend demo to show full plan→eng→review→ship→qa→design-review pipeline
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: regenerate design review SKILL.md files after merge from main
Picks up BASE_BRANCH_DETECT resolver and updated contributor mode from main.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add /design-consultation skill — design consultant that creates DESIGN.md
6-phase consultant flow: product context → competitive research (WebSearch) →
complete coherent proposal → drill-downs on demand → font+color preview page →
write DESIGN.md + update CLAUDE.md. Opinionated recommendations grounded in
product context, not menu-driven forms.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add E2E tests for design skill family (7 tests + LLM quality judge)
Tests 1-4: /design-consultation (core flow, research integration, existing
DESIGN.md handling, font+color preview generation).
Tests 5-6: /plan-design-review (audit report, DESIGN.md export).
Test 7: /qa-design-review (audit + fix loop).
LLM judge validates font blacklist compliance, coherence, and AI slop avoidance.
Also adds plan-design-review + qa-design-review to ALL_SKILLS test array.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: mark /design-consultation as shipped in TODOS.md
Renamed from /setup-design-md to reflect the consultant approach.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
9a55efe7c7 |
fix: resolve merge conflicts with origin/main (v0.4.3 + v0.4.4)
Merge origin/main which added: - v0.4.3: /document-release skill, simplified AskUserQuestion (always ELI16), runtime branch detection via _BRANCH preamble variable - v0.4.4: 60-min update check TTL, --force flag for gstack-update-check Resolved conflicts: - VERSION: keep 0.6.0 (our version, above 0.4.4) - CHANGELOG: all entries preserved (0.6.0 > 0.4.4 > 0.4.3 > 0.4.2) - TODOS.md: keep both sections (Brainstorm/Design + Document-Release) - test/skill-validation.test.ts: include all three new skills in arrays (brainstorm, debug, document-release) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
a68244ab57 |
feat: /document-release skill — post-ship doc updates (v0.4.3) (#109)
* docs: update project documentation for v0.4.2 - README: skill count 9→10, added /document-release to skills table, install/uninstall sections, and dedicated section with example - CHANGELOG: added /document-release bullet to v0.4.2 entry Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add /document-release skill with smart VERSION handling New skill runs after /ship but before PR merge. Reads every doc file, cross-references the diff, auto-updates factual changes, asks about risky edits. CHANGELOG clobber protection: never uses Write tool on CHANGELOG.md, only Edit with exact old_string matches. Smart VERSION logic: instead of silently skipping already-bumped versions, compares CHANGELOG entry scope against full diff and asks if significant uncovered changes exist. Also fixes gstack-upgrade/SKILL.md missing from skill-check.ts SKILL_FILES array (existing inconsistency with gen-skill-docs.ts). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /review Step 5.6 — documentation staleness check Review skill now cross-references code changes against doc files. If a doc describes a feature that changed but the doc wasn't updated, flags it as INFORMATIONAL with a pointer to /document-release. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: /document-release E2E with CHANGELOG clobber guard E2E test creates a repo with existing CHANGELOG entries, runs /document-release, and asserts original entries survive. Critical guardrail against the incident where an agent replaced CHANGELOG entries during conflict resolution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump to v0.4.3 — /document-release skill Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files after merge * chore: regenerate SKILL.md files after merge --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
f5b981fcc0 |
fix: resolve merge conflicts with origin/main (v0.4.2 base branch detection)
Merge origin/main which added: - BASE_BRANCH_DETECT placeholder + dynamic branch detection in all skills - Updated contributor mode (reflection-based, 0-10 rating) - Async await wrapping in browse js/eval commands - Hardcoded-main regression test Resolved conflicts: - VERSION: keep 0.6.0 (our version, above 0.4.2) - CHANGELOG: both entries preserved (0.6.0 above 0.4.2) - gen-skill-docs.ts: keep main's updated contributor mode, add our escalation protocol - review/SKILL.md.tmpl: fix hardcoded 'origin/main' in Step 1.5 to use origin/<base> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
78e519e3b7 |
feat: await support in browse js/eval + contributor mode v2 (#104)
* feat: support await in $B js and eval commands Auto-wrap await expressions in async IIFE context so $B js "await fetch(...)" works without SyntaxError. - hasAwait() strips comments before detection - js: expression wrapping (async()=>(expr))() - eval: smart wrapping — single-line=expression, multi-line=block - 6 new unit tests covering async, false-positive, and return semantics * feat: redesign contributor mode — periodic reflection with 0-10 rating Replace passive "report when things break" with active reflection: - Rate gstack experience 0-10 at workflow step boundaries - Historical calibration example (await bug) anchors the reporting bar - "What would make this a 10" field focuses on actionable improvements - Removed category lists in favor of judgment-based assessment * test: add deterministic contributor mode preamble validation 40 new skill-validation tests (4 checks × 10 skills) verify: - 0-10 rating scale present - Calibration example present - "What would make this a 10" field present - Periodic reflection (not per-command) Update existing E2E contributor eval for new report format. * chore: bump version and changelog (v0.4.2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: improve contributor mode + qa-quick E2E reliability Contributor mode: - Add "do not truncate" directive to template — agent was stopping after "My rating" without completing Steps/Raw output/What would make this a 10 sections - Restore assertions for Steps to reproduce and Date footer QA quick: - Make test server URL prominent: top of prompt, explicit "already running" and "do NOT discover ports" instructions - Bump session timeout 180s→240s and test timeout 240s→300s - Set B= at top of prompt (was buried in prose) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: use flexible assertions for contributor mode E2E Agent writes thorough reports with creative section names ("Repro Steps" vs "Steps to reproduce"). Match intent not formatting: - /repro|steps to reproduce/ for reproduction steps - /date.*2026/ for date footer presence Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add E2E eval failure blame protocol "Not related to our changes" is an extraordinary claim that requires extraordinary proof. When evals fail during /ship: 1. Run the same eval on main — prove it fails there too 2. If it passes on main, it IS your change — trace the blame 3. If you can't verify, say "unverified" not "pre-existing" Added to CLAUDE.md and as a comment in skill-e2e.test.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update CONTRIBUTING.md and BROWSER.md for v0.4.2 CONTRIBUTING.md: update contributor mode description — now describes periodic 0-10 reflection loop instead of passive friction detection. BROWSER.md: add js/eval async documentation — await expressions are auto-wrapped in async context, single-line eval returns values directly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: restore v0.4.2 changelog entries lost during cherry-pick conflict The base branch detection entries from main were dropped when resolving the CHANGELOG conflict — should have merged both sets, not replaced. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
1e06b6a5c6 |
fix: dynamic base branch detection across all SKILL templates (v0.3.10) (#81)
* feat: add {{BASE_BRANCH_DETECT}} resolver to gen-skill-docs
DRY placeholder for dynamic base branch detection across PR-targeting
skills. Detects via gh pr view (existing PR base) → gh repo view
(repo default) → fallback to main.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: ship skill detects base branch instead of hardcoding main
Replaces ~14 hardcoded 'main' references with dynamic detection via
{{BASE_BRANCH_DETECT}}. Fixes stacked branches and Conductor workspaces
targeting non-main branches. Adds --base <base> to gh pr create.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: review, qa, plan-ceo-review detect base branch dynamically
Same pattern as ship: replaces hardcoded 'main' with {{BASE_BRANCH_DETECT}}.
Also cleans up qa bash-isms (REPORT_DIR variable, port chaining).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: retro detects default branch instead of hardcoding origin/main
Retro queries commit history (not PR targets), so uses simpler detection:
gh repo view defaultBranchRef. Replaces ~11 origin/main refs with
origin/<default>.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add explicit cross-step references in gstack-upgrade template
Bash blocks are self-contained, but cross-block variable references
(INSTALL_DIR from Step 2) were implicit. Adds prose making them explicit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs+test: SKILL authoring guidance + regression tests
Adds "Writing SKILL templates" section to CLAUDE.md explaining that
templates are prompts, not scripts. Adds validation test catching
hardcoded 'main' in git commands, and resolver content test.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: update ARCHITECTURE + CONTRIBUTING for new placeholders
Add {{BASE_BRANCH_DETECT}} to ARCHITECTURE.md placeholder list.
Cross-reference CLAUDE.md template authoring guidance from CONTRIBUTING.md.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.3.10)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add missing blank line between resolver functions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add 3 E2E smoke tests for base branch detection
- /review: verifies Step 0 detection + git diff against detected base
- /ship: truncated dry-run (Steps 0-1 only, no push/PR), asserts no
destructive actions
- /retro: verifies default branch detection for git log queries
Covers the {{BASE_BRANCH_DETECT}} resolver path (review), the ship
template's dual abort check, and retro's inline detection pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version and changelog (v0.4.2)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
||
|
|
ba3c2dee2e |
test: structural tests for new skills + escalation protocol assertions
Add brainstorm + debug to skillsWithUpdateCheck and skillsWithPreamble arrays. Add structural tests: brainstorm (Phase 1-6, Design Doc, Supersedes, Smart-skip), debug (Iron Law, Root Cause, Pattern Analysis, Hypothesis, DEBUG REPORT, 3-strike). Add escalation protocol tests (DONE_WITH_CONCERNS, BLOCKED, NEEDS_CONTEXT) for all preamble skills. Also: 2 new TODOs (design docs → Supabase sync, /plan-design-review skill), update CLAUDE.md project structure with new skill directories. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
3e3843c4a9 |
feat: contributor mode, session awareness, recommendation format (#90)
* feat: contributor mode, session awareness, universal RECOMMENDATION format
- Rename {{UPDATE_CHECK}} → {{PREAMBLE}} across all 10 skill templates
- Add session tracking (touch ~/.gstack/sessions/$PPID, count active sessions)
- ELI16 mode when 3+ concurrent sessions detected (re-ground user on context)
- Contributor mode: auto-file field reports to ~/.gstack/contributor-logs/
- Universal AskUserQuestion format: context → question → RECOMMENDATION → options
- Update plan-ceo-review and plan-eng-review to reference preamble baseline
- Add vendored symlink awareness section to CLAUDE.md
- Rewrite CONTRIBUTING.md with contributor workflow and cross-project testing
- Add tests for contributor mode and session awareness in generated output
- Add E2E eval for contributor mode report filing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add Enum & Value Completeness to /review critical checklist
New CRITICAL review category that traces new enum values, status strings,
and type constants through every consumer outside the diff. Catches the
class of bugs where a new value is added but not handled in all switch/case
chains, allowlists, or frontend-backend contracts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: bump v0.4.1, user-facing changelog, update qa-only template and architecture docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add CHANGELOG style guide — user-facing, sell the feature
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: rewrite v0.4.1 changelog to be user-facing and sell the features
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add evals for RECOMMENDATION format, session awareness, and enum completeness
Free tests (Tier 1): RECOMMENDATION format + session awareness in all
preamble SKILL.md files, enum completeness checklist structure and CRITICAL
classification.
E2E eval: /review catches missed enum handlers when a new status value
is added but not handled in case/switch and notify methods.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat: add E2E eval for session awareness ELI16 mode
Stubs _SESSIONS=4, gives agent a decision point on feature/add-payments
branch, verifies the output re-grounds the user with project, branch,
context, and RECOMMENDATION — the ELI16 mode behavior for 3+ sessions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: contributor mode eval marked FAIL due to expected browse error
The test intentionally runs a nonexistent binary to trigger contributor
mode. The session runner's browse error detection catches "no such file
or directory...browse" and sets browseErrors, causing recordE2E to mark
passed=false. Override passed to check only exitReason since the browse
error is the expected scenario.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
f3ee0ee28a |
feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83)
* feat: browser ref staleness detection via async count() validation resolveRef() now checks element count to detect stale refs after page mutations (e.g. SPA navigation). RefEntry stores role+name metadata for better diagnostics. 3 new snapshot tests for staleness detection. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: qa-only skill, qa fix loop, plan-to-QA artifact flow Add /qa-only (report-only, Edit tool blocked), restructure /qa with find-fix-verify cycle, add {{QA_METHODOLOGY}} DRY placeholder for shared methodology. /plan-eng-review now writes test-plan artifacts to ~/.gstack/projects/<slug>/ for QA consumption. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: eval efficiency metrics — turns, duration, commentary across all surfaces Add generateCommentary() for natural-language delta interpretation, per-test turns/duration in comparison and summary output, judgePassed unit tests, 3 new E2E tests (qa-only, qa fix loop, plan artifact). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.4.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update ARCHITECTURE, BROWSER, CONTRIBUTING, README for v0.4.0 - ARCHITECTURE: add ref staleness detection section, update RefEntry type - BROWSER: add ref staleness paragraph to snapshot system docs - CONTRIBUTING: update eval tool descriptions with commentary feature - README: fix missing qa-only in project-local uninstall command Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add user-facing benefit descriptions to v0.4.0 changelog Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
ed802d0c7f |
feat: eval CLI tools + docs cleanup
Add eval:list, eval:compare, eval:summary CLI scripts for exploring eval history from ~/.gstack-dev/evals/. eval:compare reuses the shared comparison functions from eval-store.ts. - eval:list: sorted table with branch/tier/cost filters - eval:compare: thin wrapper around compareEvalResults + formatComparison - eval:summary: aggregate stats, flaky test detection, branch rankings - Remove unused @anthropic-ai/claude-agent-sdk from devDependencies - Update CLAUDE.md: streaming docs, eval CLI commands, remove Agent SDK refs - Add GH Actions eval upload (P2) and web dashboard (P3) to TODOS.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
c35e933c7d |
fix: rewrite session-runner to claude -p subprocess, lower flaky baselines
Session runner now spawns `claude -p` as a subprocess instead of using Agent SDK query(), which fixes E2E tests hanging inside Claude Code. Also lowers command_reference completeness baseline to 3 (flaky oscillation), adds test:e2e script, and updates CLAUDE.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
942df42161 |
simplify: one command for evals — bun run test:evals
Remove test:eval, test:e2e, test:all. Just two commands: - bun test (free) - bun run test:evals (everything that costs money) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
76803d789a |
feat: 3-tier eval suite with planted-bug outcome testing (EVALS=1)
Adds comprehensive eval infrastructure: - Tier 1 (free): 13 new static tests — cross-skill path consistency, QA structure validation, greptile format, planted-bug fixture validation - Tier 2 (Agent SDK E2E): /qa quick, /review with pre-built git repo, 3 planted-bug outcome evals (static, SPA, checkout — each with 5 bugs) - Tier 3 (LLM judge): QA workflow quality, health rubric clarity, cross-skill consistency, baseline score pinning New fixtures: 3 HTML pages with 15 total planted bugs, ground truth JSON, review-eval-vuln.rb, eval-baselines.json. Shared llm-judge.ts helper (DRY). Unified EVALS=1 flag replaces SKILL_E2E + ANTHROPIC_API_KEY checks. `bun run test:evals` runs everything that costs money (~$4/run). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
5205070299 |
feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41)
* refactor: extract command registry to commands.ts, add SNAPSHOT_FLAGS metadata
- NEW: browse/src/commands.ts — command sets + COMMAND_DESCRIPTIONS + load-time validation (zero side effects)
- server.ts imports from commands.ts instead of declaring sets inline
- snapshot.ts: SNAPSHOT_FLAGS array drives parseSnapshotArgs (metadata-driven, no duplication)
- All 186 existing tests pass
* feat: SKILL.md template system with auto-generated command references
- SKILL.md.tmpl + browse/SKILL.md.tmpl with {{COMMAND_REFERENCE}} and {{SNAPSHOT_FLAGS}} placeholders
- scripts/gen-skill-docs.ts generates SKILL.md from templates (supports --dry-run)
- Build pipeline runs gen:skill-docs before binary compilation
- Generated files have AUTO-GENERATED header, committed to git
* test: Tier 1 static validation — 34 tests for SKILL.md command correctness
- test/helpers/skill-parser.ts: extracts $B commands from code blocks, validates against registry
- test/skill-parser.test.ts: 13 parser/validator unit tests
- test/skill-validation.test.ts: 13 tests validating all SKILL.md files + registry consistency
- test/gen-skill-docs.test.ts: 8 generator tests (categories, sorting, freshness)
* feat: DX tools (skill:check, dev:skill) + Tier 2 E2E test scaffolding
- scripts/skill-check.ts: health summary for all SKILL.md files (commands, templates, freshness)
- scripts/dev-skill.ts: watch mode for template development
- test/helpers/session-runner.ts: Agent SDK wrapper for E2E skill tests
- test/skill-e2e.test.ts: 2 E2E tests + 3 stubs (auto-skip inside Claude Code sessions)
- E2E tests must run from plain terminal: SKILL_E2E=1 bun test test/skill-e2e.test.ts
* ci: SKILL.md freshness check on push/PR + TODO updates
- .github/workflows/skill-docs.yml: fails if generated SKILL.md files are stale
- TODO.md: add E2E cost tracking and model pinning to future ideas
* fix: restore rich descriptions lost in auto-generation
- Snapshot flags: add back value hints (-d <N>, -s <sel>, -o <path>)
- Snapshot flags: restore parenthetical context (@e refs, @c refs, etc.)
- Commands: is → includes valid states enum
- Commands: console → notes --errors filter behavior
- Commands: press → lists common keys (Enter, Tab, Escape)
- Commands: cookie-import-browser → describes picker UI
- Commands: dialog-accept → specifies alert/confirm/prompt
- Tips: restore → arrow (was downgraded to ->)
* test: quality evals for generated SKILL.md descriptions
Catches the exact regressions we shipped and caught in review:
- Snapshot flags must include value hints (-d <N>, -s <sel>, -o <path>)
- is command must list all valid states (visible/hidden/enabled/...)
- press command must list example keys (Enter, Tab, Escape)
- console command must describe --errors behavior
- Snapshot -i must mention @e refs, -C must mention @c refs
- All descriptions must be >= 8 chars (no empty stubs)
- Tips section must use → not ->
* feat: LLM-as-judge evals for SKILL.md documentation quality
4 eval tests using Anthropic API (claude-haiku, ~$0.01-0.03/run):
- Command reference table: clarity/completeness/actionability >= 4/5
- Snapshot flags section: same thresholds
- browse/SKILL.md overall quality
- Regression: generated version must score >= hand-maintained baseline
Requires ANTHROPIC_API_KEY. Auto-skips without it.
Run: bun run test:eval (or ANTHROPIC_API_KEY=sk-... bun test test/skill-llm-eval.test.ts)
* chore: bump version to 0.3.3, update changelog
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add ARCHITECTURE.md, update CLAUDE.md and CONTRIBUTING.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: conductor.json lifecycle hooks + .env propagation across worktrees
bin/dev-setup now copies .env from main worktree so API keys carry
over to Conductor workspaces automatically. conductor.json wires up
setup and archive hooks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: complete CHANGELOG for v0.3.3 (architecture, conductor, .env)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
||
|
|
07b4e15b34 |
feat: v0.3.2 — project-local state, diff-aware QA, Greptile integration (#36)
* fix: cookie import picker returns JSON instead of HTML jsonResponse() was defined at module scope but referenced `url` which only existed as a parameter of handleCookiePickerRoute(). Every API call crashed, the catch block also crashed, and Bun returned a default HTML page that the frontend couldn't parse as JSON. Thread port via corsOrigin() helper and options objects. Add route-level tests to prevent this class of bug from shipping again. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add help command to browse server Agents that don't have SKILL.md loaded (or misread flags) had no way to self-discover the CLI. The help command returns a formatted reference of all commands and snapshot flags. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: version-aware find-browse with META signal protocol Agents in other workspaces found stale browse binaries that were missing newer flags. find-browse now compares the local binary's git SHA against origin/main via git ls-remote (4hr cache), and emits META:UPDATE_AVAILABLE when behind. SKILL.md setup checks parse META signals and prompt the user to update. - New compiled binary: browse/dist/find-browse (TypeScript, testable) - Bash shim at browse/bin/find-browse delegates to compiled binary - .version file written at build time with git commit SHA - Build script compiles both browse and find-browse binaries - Graceful degradation: offline, missing .version, corrupt cache all skip check Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: clean up .bun-build temp files after compile bun build --compile leaves ~58MB temp files in the working directory. Add rm -f .*.bun-build to the build script to clean up after each build. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: make help command reachable by removing it from META_COMMANDS help was in META_COMMANDS, so it dispatched to handleMetaCommand() which threw "Unknown meta command: help". Removing it from the set lets the dedicated else-if handler in handleCommand() execute correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.3.2) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add shared Greptile comment triage reference doc Shared reference for fetching, filtering, and classifying Greptile review comments on GitHub PRs. Used by both /review and /ship skills. Includes parallel API fetching, suppressions check, classification logic, reply APIs, and history file writes. * feat: make /review and /ship Greptile-aware /review: Step 2.5 fetches and classifies Greptile comments, Step 5 resolves them with AskUserQuestion for valid issues and false positives. /ship: Step 3.75 triages Greptile comments between pre-landing review and version bump. Adds Greptile Review section to PR body in Step 8. Re-runs tests if any Greptile fixes are applied. * feat: add Greptile batting average to /retro Reads ~/.gstack/greptile-history.md, computes signal ratio (valid catches vs false positives), includes in metrics table, JSON snapshot, and Code Quality Signals narrative. * docs: add Greptile integration section to README Personal endorsement, two-layer review narrative, full UX walkthrough transcript, skills table updates. Add Greptile training feedback loop to TODO.md future ideas. * feat: add local dev mode for testing skills from within the repo bin/dev-setup creates .claude/skills/gstack symlink to the working tree so Claude Code discovers skills locally. bin/dev-teardown cleans up. DEVELOPING_GSTACK.md documents the workflow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: narrow gitignore to .claude/skills/ instead of all .claude/ Avoids ignoring legitimate Claude Code config like settings.json or CLAUDE.md. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: rename DEVELOPING_GSTACK.md to CONTRIBUTING.md Rewritten as a contributor-friendly guide instead of a dry plan doc. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: explain why dev-setup is needed in CONTRIBUTING.md quick start Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add browser interaction guidance to CLAUDE.md Prevents Claude from using mcp__claude-in-chrome__* tools instead of /browse. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add shared config module for project-local browse state Centralizes path resolution (git root detection, state dir, log paths) into config.ts. Both cli.ts and server.ts import from it, eliminating duplicated PORT_OFFSET/BROWSE_PORT/STATE_FILE logic. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: rewrite port selection to use random ports Replace CONDUCTOR_PORT magic offset and 9400-9409 scan with random port 10000-60000. Atomic state file writes, log paths from config module, binaryVersion field for auto-restart on update. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: move browse state from /tmp to project-local .gstack/ CLI now uses config module for state paths, passes BROWSE_STATE_FILE to spawned server. Adds version mismatch auto-restart, legacy /tmp cleanup with PID verification, and removes stale global install fallback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: update crash log path reference to .gstack/ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: add config tests and update CLI lifecycle test 14 new tests for config resolution, ensureStateDir, readVersionHash, resolveServerScript, and version mismatch detection. Remove obsolete CONDUCTOR_PORT/BROWSE_PORT filtering from commands.test.ts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update BROWSER.md and TODO.md for project-local state Replace /tmp paths with .gstack/, remove CONDUCTOR_PORT docs, document random port selection and per-project isolation. Add server bundling TODO. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update README, CHANGELOG, and CONTRIBUTING for v0.3.2 - README: replace Conductor-aware language with project-local isolation, add Greptile setup note - CHANGELOG: comprehensive v0.3.2 entry with all state management changes - CONTRIBUTING: add instructions for testing branches in other repos Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add diff-aware mode to /qa — auto-tests affected pages from branch diff When on a feature branch, /qa now reads git diff main, identifies affected pages/routes from changed files, and tests them automatically. No URL required. The most natural flow: write code, /ship, /qa. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: update CHANGELOG for complete v0.3.2 coverage Add missing entries: diff-aware QA mode, Greptile integration, local dev mode, crash log path fix, README/SKILL.md updates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
3d901066cd |
Initial release — gstack v0.0.1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |