Commit Graph

  • 7d26666164 Merge pull request #55 from garrytan/v0.3.6-qa-upgrades Garry Tan 2026-03-14 11:24:24 -07:00
  • baf8acd55c fix: update check ignores stale UP_TO_DATE cache after version change [v0.3.6-qa-upgrades] Garry Tan 2026-03-14 13:23:25 -05:00
  • 4e31acbd47 fix: auto-clear stale heartbeat when process is dead Garry Tan 2026-03-14 12:55:40 -05:00
  • 43fbe165a4 docs: update README, CONTRIBUTING, ARCHITECTURE for v0.3.6 Garry Tan 2026-03-14 12:47:00 -05:00
  • 4ace0c2f6f chore: bump version and changelog (v0.3.6) Garry Tan 2026-03-14 12:44:41 -05:00
  • 9f5aa32e67 fix: fail fast on API connectivity — pre-check before E2E suite Garry Tan 2026-03-14 12:37:44 -05:00
  • 5aae3ce117 fix: never clean up observability artifacts — partial file persists after finalize Garry Tan 2026-03-14 12:37:38 -05:00
  • 336dbaa50d fix: detect is_error from claude -p result line (ConnectionRefused was PASS) Garry Tan 2026-03-14 12:35:43 -05:00
  • 029a7c2a37 feat: eval-watch dashboard + observability unit tests (15 tests, 11 codepaths) Garry Tan 2026-03-14 11:04:40 -05:00
  • 510a8d8dda feat: wire runId + testName + diagnostics through all E2E tests Garry Tan 2026-03-14 11:04:28 -05:00
  • f9cfabeda8 feat: add E2E observability — heartbeat, progress.log, NDJSON persistence, savePartial() Garry Tan 2026-03-14 11:04:16 -05:00
  • eb9a9193c9 fix: plan-ceo-review timeout — init git repo, skip codebase exploration, bump to 420s Garry Tan 2026-03-14 08:39:26 -05:00
  • 7d5036db1a fix: increase timeouts for plan-review and retro E2E tests Garry Tan 2026-03-14 07:54:48 -05:00
  • f1ee3d924e feat: template-ify all skills + E2E tests for plan-ceo-review, plan-eng-review, retro Garry Tan 2026-03-14 07:28:02 -05:00
  • 2d88f5f02a test: add update-check exit code regression tests Garry Tan 2026-03-14 07:19:11 -05:00
  • c6c3294ee9 fix: 100% E2E pass — isolate test dirs, restart server, relax FP thresholds Garry Tan 2026-03-14 07:17:17 -05:00
  • cddf8ee3bd fix: simplify planted-bug eval prompts for reliable 25-turn completion Garry Tan 2026-03-14 05:51:48 -05:00
  • 4a56b882ab fix: make planted-bug evals resilient to max_turns and browse error flakes Garry Tan 2026-03-14 05:29:40 -05:00
  • 2e75c33714 fix: lower planted-bug detection baselines and LLM judge thresholds for reliability Garry Tan 2026-03-14 05:16:17 -05:00
  • 4063104126 fix: remove false-positive Exit code 1 pattern, fix NEEDS_SETUP test, update QA tests Garry Tan 2026-03-14 04:48:35 -05:00
  • a67dae5f84 fix: update check preamble exits 1 when up to date — convert all skills to .tmpl Garry Tan 2026-03-14 04:40:46 -05:00
  • ed802d0c7f feat: eval CLI tools + docs cleanup Garry Tan 2026-03-14 03:49:57 -05:00
  • 84f52f3bad feat: eval persistence with auto-compare against previous run Garry Tan 2026-03-14 03:49:47 -05:00
  • e7347c2f8f feat: stream-json NDJSON parser for real-time E2E progress Garry Tan 2026-03-14 03:49:36 -05:00
  • 3d750d89af Merge remote-tracking branch 'origin/main' into v0.3.6-qa-upgrades Garry Tan 2026-03-14 02:35:48 -05:00
  • c35e933c7d fix: rewrite session-runner to claude -p subprocess, lower flaky baselines Garry Tan 2026-03-14 02:34:10 -05:00
  • 1717ed2891 fix: browse binary discovery broken for agents (v0.3.5) (#44) Garry Tan 2026-03-14 00:24:06 -07:00
  • 5ac76b8153 chore: bump version and changelog (v0.3.5) [investigate-brokenness] Garry Tan 2026-03-14 02:02:33 -05:00
  • 60fce976cb test: add e2e and LLM eval tests for SKILL.md setup block Garry Tan 2026-03-14 02:02:28 -05:00
  • 8e1feb7fa2 refactor: convert qa/ and setup-browser-cookies/ to .tmpl templates Garry Tan 2026-03-14 02:02:22 -05:00
  • f4a298551a fix: replace find-browse with direct path in SKILL.md setup blocks Garry Tan 2026-03-14 02:02:17 -05:00
  • 942df42161 simplify: one command for evals — bun run test:evals [v0.3.5-qa-upgrades] Garry Tan 2026-03-14 01:27:42 -05:00
  • b5b2a15ad2 fix: pass all LLM evals — severity defs, rubric edge cases, EVALS=1 flag Garry Tan 2026-03-14 01:27:06 -05:00
  • 76803d789a feat: 3-tier eval suite with planted-bug outcome testing (EVALS=1) Garry Tan 2026-03-14 01:17:36 -05:00
  • 6b69c46a27 feat: daily update check + /gstack-upgrade skill (v0.3.4) (#42) Garry Tan 2026-03-13 22:17:25 -07:00
  • d6fdee681a merge: resolve CHANGELOG conflict with origin/main [v0.3.4-upgrading-ease] Garry Tan 2026-03-14 00:15:23 -05:00
  • 5155fe3a28 Merge remote-tracking branch 'origin/main' into v0.3.5-qa-upgrades Garry Tan 2026-03-14 00:15:06 -05:00
  • 650680443e fix: remove unused import + add corrupt cache test Garry Tan 2026-03-14 00:14:46 -05:00
  • a468374272 fix: enrich SKILL.md docs to pass LLM evals, upgrade judge to Sonnet 4.6 (#43) Garry Tan 2026-03-13 22:14:14 -07:00
  • f1581e6ff7 chore: upgrade eval judge to Sonnet 4.6, update changelog [garrytan/check-evals] Garry Tan 2026-03-14 00:12:48 -05:00
  • 4f5757a1f2 test: add usage consistency and pipe guard tests Garry Tan 2026-03-14 00:12:42 -05:00
  • b87a33aec5 refactor: auto-generate server.ts help text from COMMAND_DESCRIPTIONS Garry Tan 2026-03-14 00:12:38 -05:00
  • ae53970499 fix: enrich command descriptions and snapshot flags for LLM eval quality Garry Tan 2026-03-14 00:12:33 -05:00
  • e377ba295d feat: dual greptile-history paths (per-project + global) Garry Tan 2026-03-14 00:10:14 -05:00
  • e04ad1bea0 feat: QA test plan tiers with per-page risk scoring Garry Tan 2026-03-14 00:10:07 -05:00
  • ff5cbbbfef feat: add remote slug helper and auto-gitignore for .gstack/ Garry Tan 2026-03-14 00:10:00 -05:00
  • 02f0ca6938 chore: regenerate SKILL.md from template Garry Tan 2026-03-14 00:09:53 -05:00
  • f43c962b50 chore: bump version and changelog (v0.3.4) Garry Tan 2026-03-13 23:57:24 -05:00
  • a99162db66 feat: add update check preamble to all 9 skills Garry Tan 2026-03-13 23:57:19 -05:00
  • cb11783253 refactor: remove version check from find-browse, simplify to binary locator Garry Tan 2026-03-13 23:57:13 -05:00
  • ff55e532f7 feat: add daily update check script + /gstack-upgrade skill Garry Tan 2026-03-13 23:57:05 -05:00
  • 5205070299 feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41) Garry Tan 2026-03-13 21:08:12 -07:00
  • b08b187f55 docs: complete CHANGELOG for v0.3.3 (architecture, conductor, .env) [v0.3.3-e2e-tests-templates] Garry Tan 2026-03-13 23:07:40 -05:00
  • 9099010859 Merge remote-tracking branch 'origin/main' into v0.3.3-e2e-tests-templates Garry Tan 2026-03-13 23:06:15 -05:00
  • 7e62f4bd0f feat: conductor.json lifecycle hooks + .env propagation across worktrees Garry Tan 2026-03-13 22:56:49 -05:00
  • ea0c0dad5e Add .env to gitignore Garry Tan 2026-03-13 22:49:13 -05:00
  • c5f40465a8 docs: add ARCHITECTURE.md, update CLAUDE.md and CONTRIBUTING.md Garry Tan 2026-03-13 22:35:35 -05:00
  • e6c7348006 chore: bump version to 0.3.3, update changelog Garry Tan 2026-03-13 22:35:31 -05:00
  • 9dffb1ed16 feat: LLM-as-judge evals for SKILL.md documentation quality [v0.3.3] Garry Tan 2026-03-13 15:59:11 -07:00
  • a0f28de22f test: quality evals for generated SKILL.md descriptions Garry Tan 2026-03-13 15:56:24 -07:00
  • a6153e3cc6 fix: restore rich descriptions lost in auto-generation Garry Tan 2026-03-13 15:52:39 -07:00
  • 65455a582e ci: SKILL.md freshness check on push/PR + TODO updates Garry Tan 2026-03-13 15:43:47 -07:00
  • 173bad8e82 feat: DX tools (skill:check, dev:skill) + Tier 2 E2E test scaffolding Garry Tan 2026-03-13 15:43:41 -07:00
  • 3d9b8e8e21 test: Tier 1 static validation — 34 tests for SKILL.md command correctness Garry Tan 2026-03-13 15:43:34 -07:00
  • 9aec68142d feat: SKILL.md template system with auto-generated command references Garry Tan 2026-03-13 15:43:21 -07:00
  • f565a06912 refactor: extract command registry to commands.ts, add SNAPSHOT_FLAGS metadata Garry Tan 2026-03-13 15:43:13 -07:00
  • 07b4e15b34 feat: v0.3.2 — project-local state, diff-aware QA, Greptile integration (#36) Garry Tan 2026-03-13 18:10:56 -07:00
  • 1e2bffd213 chore: update CHANGELOG for complete v0.3.2 coverage [garrytan/v0.3.2] Garry Tan 2026-03-13 20:10:13 -05:00
  • 22c24b9092 feat: add diff-aware mode to /qa — auto-tests affected pages from branch diff Garry Tan 2026-03-13 20:03:04 -05:00
  • 675aaa6651 docs: update README, CHANGELOG, and CONTRIBUTING for v0.3.2 Garry Tan 2026-03-13 20:00:54 -05:00
  • 5337ce8635 docs: update BROWSER.md and TODO.md for project-local state Garry Tan 2026-03-13 19:46:00 -05:00
  • b9125fd394 test: add config tests and update CLI lifecycle test Garry Tan 2026-03-13 19:45:58 -05:00
  • 63c6910fb9 fix: update crash log path reference to .gstack/ Garry Tan 2026-03-13 19:45:56 -05:00
  • 594bf32085 feat: move browse state from /tmp to project-local .gstack/ Garry Tan 2026-03-13 19:45:54 -05:00
  • 17bd61ee72 feat: rewrite port selection to use random ports Garry Tan 2026-03-13 19:45:52 -05:00
  • 15db8c86ab feat: add shared config module for project-local browse state Garry Tan 2026-03-13 19:45:49 -05:00
  • 6345bb0aa2 feat: add browser interaction guidance to CLAUDE.md Garry Tan 2026-03-13 19:45:42 -05:00
  • 06f25ca387 feat: LLM-as-judge evals for SKILL.md documentation quality [garrytan/cmd-tooling] Garry Tan 2026-03-13 15:59:11 -07:00
  • 2cd50101fc test: quality evals for generated SKILL.md descriptions Garry Tan 2026-03-13 15:56:24 -07:00
  • b6c450fc8c fix: restore rich descriptions lost in auto-generation Garry Tan 2026-03-13 15:52:39 -07:00
  • 5393739862 ci: SKILL.md freshness check on push/PR + TODO updates Garry Tan 2026-03-13 15:43:47 -07:00
  • 18ce1129f3 feat: DX tools (skill:check, dev:skill) + Tier 2 E2E test scaffolding Garry Tan 2026-03-13 15:43:41 -07:00
  • cc023045ab test: Tier 1 static validation — 34 tests for SKILL.md command correctness Garry Tan 2026-03-13 15:43:34 -07:00
  • 3d38394fae feat: SKILL.md template system with auto-generated command references Garry Tan 2026-03-13 15:43:21 -07:00
  • 46b20fe01e refactor: extract command registry to commands.ts, add SNAPSHOT_FLAGS metadata Garry Tan 2026-03-13 15:43:13 -07:00
  • 04ce0a613f docs: explain why dev-setup is needed in CONTRIBUTING.md quick start Garry Tan 2026-03-13 15:41:18 -07:00
  • f83fa3417c docs: rename DEVELOPING_GSTACK.md to CONTRIBUTING.md Garry Tan 2026-03-13 15:39:55 -07:00
  • b76f4e56c0 fix: narrow gitignore to .claude/skills/ instead of all .claude/ Garry Tan 2026-03-13 15:37:10 -07:00
  • 4899b71e19 feat: add local dev mode for testing skills from within the repo Garry Tan 2026-03-13 15:36:27 -07:00
  • d30d5eb56e docs: add Greptile integration section to README Garry Tan 2026-03-13 15:27:37 -07:00
  • f2da4346dc feat: add Greptile batting average to /retro Garry Tan 2026-03-13 15:27:22 -07:00
  • 5126806125 feat: make /review and /ship Greptile-aware Garry Tan 2026-03-13 15:27:05 -07:00
  • 0ba14acf3d feat: add shared Greptile comment triage reference doc Garry Tan 2026-03-13 15:26:49 -07:00
  • 259517b3d3 chore: bump version and changelog (v0.3.2) Garry Tan 2026-03-13 11:18:44 -07:00
  • 645afc83ab fix: make help command reachable by removing it from META_COMMANDS Garry Tan 2026-03-13 11:18:06 -07:00
  • 7a8cc2290c chore: clean up .bun-build temp files after compile Garry Tan 2026-03-13 10:15:22 -07:00
  • c0153f1fe9 feat: version-aware find-browse with META signal protocol Garry Tan 2026-03-13 09:49:55 -07:00
  • 9b103871b9 feat: add help command to browse server Garry Tan 2026-03-13 09:49:46 -07:00
  • ee0b11452d fix: cookie import picker returns JSON instead of HTML Garry Tan 2026-03-13 09:49:41 -07:00
  • f7b95329c1 feat: Phase 3.5 — cookie import, QA testing, team retro (v0.3.1) (#29) Garry Tan 2026-03-13 00:31:41 -07:00