gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-06-17 23:30:09 +02:00

Author	SHA1	Message	Date
Garry Tan	cce407b218	Merge remote-tracking branch 'origin/garrytan/team-supabase-store' into garrytan/dev-mode	2026-03-16 00:22:05 -05:00
Garry Tan	6e14689f0e	docs: add team sync TODOs — streaming parser, effectiveness scoring, weekly digest Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-16 00:15:40 -05:00
Garry Tan	3a57a3f59e	feat: add /setup-team-sync skill, auto-push transcript hooks in skills - setup-team-sync/SKILL.md.tmpl: idempotent guided setup (create config, OAuth, verify connectivity, configure settings, summary) - ship/retro/qa SKILL.md.tmpl: add push-transcript hook after existing push-ship/push-retro/push-qa hooks (silent, non-fatal) - scripts/gen-skill-docs.ts: add setup-team-sync to template list - Regenerated all SKILL.md files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-16 00:15:36 -05:00
Garry Tan	a104471272	feat: add push-transcript CLI, show sessions, interactive setup, 36 tests - cli-sync.ts: push-transcript command, show sessions with formatSessionTable(), upgrade cmdSetup() to interactively create .gstack-sync.json if missing - bin/gstack-sync: add push-transcript case and help text - test/lib-llm-summarize.test.ts: 10 tests with mocked fetch (429 retry, 5xx backoff, malformed response, no API key, cache) - test/lib-transcript-sync.test.ts: 22 tests for parsing, grouping, session file extraction, marker management, slug resolution - test/lib-sync-show.test.ts: 4 tests for formatSessionTable Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-16 00:15:26 -05:00
Garry Tan	0e29d7d1a3	feat: add enriched transcript sync — Haiku summaries, session file enrichment Add session intelligence pipeline for team transcript sync: - lib/transcript-sync.ts: parse history.jsonl, enrich with Claude session file data (tools_used, full turn count), sync marker management, 10-concurrent push with 5-concurrent Haiku summarization - lib/llm-summarize.ts: raw fetch() to Anthropic Messages API (no SDK dep), retry-after on 429, exponential backoff on 5xx, SHA-based eval-cache - lib/sync.ts: pushTranscript() and pullTranscripts() following existing patterns - 006_transcript_sync.sql: unique index on (team_id, session_id) for idempotent upsert, RLS changed from admin-only to team-wide read Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-16 00:15:19 -05:00
Garry Tan	2d42e15b5c	chore: bump version and changelog (v0.3.11) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 20:46:10 -05:00
Garry Tan	b07e842f13	Merge remote-tracking branch 'origin/garrytan/team-supabase-store' into garrytan/dev-mode	2026-03-15 20:41:33 -05:00
Garry Tan	5e641bdf76	feat: add Enum & Value Completeness to /review critical checklist New CRITICAL review category that traces new enum values, status strings, and type constants through every consumer outside the diff. Catches the class of bugs where a new value is added but not handled in all switch/case chains, allowlists, or frontend-backend contracts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 20:41:17 -05:00
Garry Tan	87cb769c35	feat: sync heartbeats, eval:trend --team, setup guide, 10 new tests - 005_sync_heartbeats.sql migration for connectivity testing - eval:trend --team flag pulls team eval data (graceful fallback) - docs/TEAM_SYNC_SETUP.md step-by-step setup guide - Design doc status updated to Phase 2 complete - 10 new tests for sync show formatting functions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 19:43:03 -05:00
Garry Tan	06f2da2019	feat: wire team sync push into ship, retro, qa, and greptile skills Add non-fatal sync steps to all 4 skill templates: - /ship Step 8.5: write ship log JSON + push after PR creation - /retro Step 13: push snapshot after JSON save - /qa Phase 6.7: write qa-sync.json + push after health score - greptile-triage: push each triage entry after history file writes All calls use || true for zero disruption. Silent when sync not configured. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 19:42:54 -05:00
Garry Tan	dc3fcc8611	feat: DRY push functions, add push-greptile + sync test/show commands Extract pushWithSync() helper to eliminate boilerplate across 6 push functions. Add pushHeartbeat() for connectivity testing. Add push-greptile to CLI. New commands: gstack-sync test (validates full push/pull flow via sync_heartbeats table), gstack-sync show (terminal team data dashboard with summary/evals/ships/retros views). Guard main block with import.meta.main. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 19:42:45 -05:00
Garry Tan	c11cb708a5	Merge remote-tracking branch 'origin/garrytan/team-supabase-store' into garrytan/dev-mode	2026-03-15 17:29:37 -05:00
Garry Tan	e97108ae10	feat: contributor mode, session awareness, universal RECOMMENDATION format - Rename {{UPDATE_CHECK}} → {{PREAMBLE}} across all 10 skill templates - Add session tracking (touch ~/.gstack/sessions/$PPID, count active sessions) - ELI16 mode when 3+ concurrent sessions detected (re-ground user on context) - Contributor mode: auto-file field reports to ~/.gstack/contributor-logs/ - Universal AskUserQuestion format: context → question → RECOMMENDATION → options - Update plan-ceo-review and plan-eng-review to reference preamble baseline - Add vendored symlink awareness section to CLAUDE.md - Rewrite CONTRIBUTING.md with contributor workflow and cross-project testing - Add tests for contributor mode and session awareness in generated output - Add E2E eval for contributor mode report filing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 17:29:34 -05:00
Garry Tan	704fe34e98	docs: clean up sync example, add team sync section to README Remove _comment hacks from JSON example file. Add short team sync section to README explaining what it is, that it's optional, and how to set it up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 17:06:51 -05:00
Garry Tan	14320469b0	docs: CHANGELOG covers full branch scope including team sync Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 17:05:45 -05:00
Garry Tan	eb7ef2153b	docs: add setup comments to .gstack-sync.json.example Explain what team sync gives you, that it's optional, and how to set it up. Points to TEAM_COORDINATION_STORE.md for full guide. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 17:04:49 -05:00
Garry Tan	e28033353d	chore: bump v0.3.10, update CHANGELOG and docs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 16:55:34 -05:00
Garry Tan	33c9552870	chore: update gitignore Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 16:47:46 -05:00
Garry Tan	daea165333	feat: add eval:trend CLI for per-test pass rate tracking computeTrends() classifies tests as stable-pass/stable-fail/flaky/ improving/degrading based on pass rate, flip count, and recent streak. gstack eval trend shows sparkline table with --limit, --tier, --test filters. Guard CLI main block with import.meta.main to prevent execution on import. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 16:47:41 -05:00
Garry Tan	59752fc510	feat: wire eval-cache + eval-tier into LLM judge, pin E2E model callJudge/judge now return {result, meta} with SHA-based caching (~$0.18/run savings when SKILL.md unchanged) and dynamic model selection via EVAL_JUDGE_TIER env var. E2E tests pass --model from EVAL_TIER to claude -p. outcomeJudge retains simple return type. All 8 LLM eval test sites updated with real costs and costs[]. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 16:47:35 -05:00
Garry Tan	02925cfc7a	feat: wire costs[] from modelUsage into eval results Extract per-model token usage from resultLine.modelUsage (including cache tokens and exact API cost), flow CostEntry[] through EvalCollector, aggregate in finalize(). Extend CostEntry with cache_read_input_tokens, cache_creation_input_tokens, cost_usd. computeCosts() prefers exact cost_usd over MODEL_PRICING when available (~4x more accurate with prompt caching). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 16:47:27 -05:00
Garry Tan	4ad73f7362	feat: unified gstack eval CLI with list, compare, push, cache, cost - lib/cli-eval.ts: routes to list/compare/summary/push/cost/cache/watch subcommands. Ports logic from 4 separate scripts into unified entry. Adds ANSI color for TTY (respects NO_COLOR), --limit flag for list. - bin/gstack-eval: bash wrapper matching bin/gstack-sync pattern - package.json: eval:* scripts now point to lib/cli-eval.ts - supabase/migrations/004_eval_costs.sql: per-model cost tracking + RLS - docs/eval-result-format.md: public format spec for any language - test/lib-eval-cli.test.ts: integration tests (spawn CLI subprocess) including 3 push failure modes (file-not-found, invalid schema, sync unavailable) 215 tests passing across 13 files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 09:39:36 -05:00
Garry Tan	1f5b7882e6	feat: add SHA-based eval caching with EVAL_CACHE=0 bypass Cache at ~/.gstack/eval-cache/{suite}/{sha}.json. Compute cache keys from source file contents + test input via Bun.CryptoHasher SHA256. Supports read/write/stats/clear/verify operations. EVAL_CACHE=0 skips reads for force-rerun. 16 tests including corrupt JSON handling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 09:39:26 -05:00
Garry Tan	9bc6c9416f	feat: add eval format validation, tier selection, cost tracking - lib/eval-format.ts: StandardEvalResult interfaces, validateEvalResult(), normalizeFromLegacy/normalizeToLegacy round-trip converters - lib/eval-tier.ts: EvalTier type, resolveTier/resolveJudgeTier from env, tierToModel mapping, TIER_ALIASES (haiku→fast, sonnet→standard, opus→full) - lib/eval-cost.ts: MODEL_PRICING (last verified 2025-05-01), computeCosts(), formatCostDashboard(), aggregateCosts(), fallback for unknown models - 42 tests across 3 test files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 09:39:18 -05:00
Garry Tan	7f7035f55a	feat: add listEvalFiles, loadEvalResults, formatTimestamp to lib/util.ts DRY up eval I/O duplicated across scripts/eval-list.ts, eval-compare.ts, and eval-summary.ts. Adds EVAL_DIR constant, formatTimestamp(), listEvalFiles(), loadEvalResults() with --limit support. 13 new tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 09:39:09 -05:00
Garry Tan	82e204179b	feat: hook eval-store sync, use shared utils, add 30 lib tests - eval-store.ts: import shared getGitInfo/getVersion, add pushEvalRun() hook in finalize() (non-blocking, non-fatal) - session-runner.ts: import shared atomicWriteSync/sanitizeForFilename - eval-store.test.ts: fix pre-existing bug in double-finalize test (was counting _partial file) - 30 new tests for lib/util, lib/sync-config, lib/sync Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 02:02:54 -05:00
Garry Tan	f7ae465415	feat: add Supabase migration SQL for team data store - 001_teams.sql: teams + team_members + RLS - 002_eval_runs.sql: eval results with universal format, indexes, upsert key - 003_data_tables.sql: retro, QA, ship, greptile, transcripts + RLS All tables use RLS: team members read/insert, admins delete. Transcript table has tighter policy (admin-only read). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 02:02:47 -05:00
Garry Tan	3713c3b9b9	feat: add team sync infrastructure (config, auth, push/pull, CLI) - lib/sync-config.ts: reads .gstack-sync.json + ~/.gstack/auth.json - lib/auth.ts: device auth flow (browser OAuth, local HTTP callback) - lib/sync.ts: Supabase push/pull via raw fetch(), offline queue, cache - lib/cli-sync.ts: CLI handler for gstack-sync commands - bin/gstack-sync: bash wrapper (setup, status, push-*, pull, drain) - .gstack-sync.json.example: template for team setup Zero new dependencies — uses raw fetch() against PostgREST API. All sync is non-fatal with 5s timeout and offline queue fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 02:02:40 -05:00
Garry Tan	caed287496	feat: extract shared utilities into lib/util.ts DRY up atomicWriteSync, readJSON, getGitInfo, getVersion, getRemoteSlug, and sanitizeForFilename from eval-store.ts, session-runner.ts, and eval-watch.ts into a shared module. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 02:02:32 -05:00
Garry Tan	5c1ea088d8	docs: scrub proprietary refs, close eval format gaps, integrate gstack-config - Replace project-specific references with generic language - Add missing fields to eval result format: prompt_sha, by_category, timestamp, response_preview - Enrich failure format with details array, scores dict, expectation_type - Add EVAL_JUDGE_CACHE, EVAL_VERBOSE, multiprocess worker support, dedup on push, run scopes, model aliases, judge profiles - Restructure credential storage to 4 layers with gstack-config (v0.3.9) for user preferences (sync_enabled, sync_transcripts) - Update integration points, observability, and reuse map Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 01:47:30 -05:00
Garry Tan	89311653df	Merge remote-tracking branch 'origin/main' into garrytan/team-supabase-store	2026-03-15 01:32:11 -05:00
Garry Tan	f87bc21865	docs: add team coordination store design doc Design doc for Supabase-backed team data store and universal eval infrastructure. Covers architecture, credential storage, eval formats, YAML test case spec, Supabase schema, phased rollout, and security model. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-15 01:32:06 -05:00
Garry Tan	bb46ca6b21	feat: smart update check with auto-upgrade, snooze backoff, config CLI (v0.3.9) (#62) * feat: add bin/gstack-config CLI for reading/writing ~/.gstack/config.yaml Simple get/set/list interface for persistent gstack configuration. Used by update-check and upgrade skill for auto_upgrade and update_check settings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: smart update check with 12h cache, snooze backoff, config disable - Reduce cache TTL from 24h to 12h for faster update detection - Add exponential snooze backoff: 24h → 48h → 1 week (resets on new version) - Add update_check: false config option to disable checks entirely - Clear snooze file on just-upgraded - 14 new tests covering snooze levels, expiry, corruption, and config paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: upgrade skill with auto-upgrade, 4-option prompt, vendored sync - Auto-upgrade mode via config or GSTACK_AUTO_UPGRADE=1 env var - 4-option AskUserQuestion: upgrade once, always, not now, never - Step 4.5: sync local vendored copy after upgrading primary install - Snooze write with escalating backoff on "Not now" - Update preamble text in gen-skill-docs for new upgrade flow - Regenerate all SKILL.md files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: simplify upgrade instructions, move auto-upgrade to completed README now points to /gstack-upgrade instead of long paste commands. Auto-upgrade TODO moved to Completed section (v0.3.8). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: bump version and changelog (v0.3.9) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 23:28:02 -07:00
Garry Tan	41141007c1	feat: TODOS-aware skills, 2-tier Greptile replies, gitignore fix (#61) * fix: log non-ENOENT errors in ensureStateDir() instead of silently swallowing Replace bare catch {} with ENOENT-only silence. Non-ENOENT errors (EACCES, ENOSPC) are now logged to .gstack/browse-server.log. Includes test for permission-denied scenario with chmod 444. * feat: merge TODO.md + TODOS.md into unified backlog with shared format reference Merge TODO.md (roadmap) and TODOS.md (near-term) into one file organized by skill/component with P0-P4 priority ordering and Completed section. Add shared review/TODOS-format.md for canonical format. Add static validation tests. * feat: add 2-tier Greptile reply system with escalation detection Add reply templates (Tier 1 friendly, Tier 2 firm), explicit escalation detection algorithm, and severity re-ranking guidance to greptile-triage.md. * feat: cross-skill TODOS awareness + Greptile template refs in all skills /ship Step 5.5: auto-detect completed TODOs, offer reorganization. /review Step 5.5: cross-reference PR against open TODOs. /plan-ceo-review, /plan-eng-review: TODOS context in planning. /retro: Backlog Health metric. /qa: bug TODO context in diff-aware mode. All Greptile-aware skills now reference reply templates and escalation detection. * chore: bump version and changelog (v0.3.8) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: update CONTRIBUTING.md for v0.3.8 changes Clarify test tier cost table (Tier 3 standalone vs combined), add TODOS.md to "Things to know", mention Greptile triage in ship workflow description. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 20:15:11 -07:00
Garry Tan	2aa745cb0e	feat: screenshot element/region clipping (v0.3.7) (#56) * feat: screenshot element/region clipping (--clip, --viewport, CSS/@ref) Add element crop (CSS selector or @ref), region clip (--clip x,y,w,h), and viewport-only (--viewport) modes to the screenshot command. Uses Playwright's native locator.screenshot() and page.screenshot({ clip }). Full page remains the default. Includes 10 new tests covering all modes and error paths. * chore: bump version and changelog (v0.3.7) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add screenshot modes to BROWSER.md command reference Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 12:47:42 -07:00
Garry Tan	0ac7ef4e81	fix: harden planted-bug eval prompt for reliable form testing Phase 3 was too vague ("click every nav link") causing the agent to wander instead of systematically testing form fields. Now explicitly directs: fill every input, clear it, try invalid values, submit and check console. Added Phase 4 finalize step to ensure report is updated with all findings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 13:28:18 -05:00
Garry Tan	7d26666164	Merge pull request #55 from garrytan/v0.3.6-qa-upgrades feat: E2E observability + eval infrastructure + all skills templated	2026-03-14 11:24:24 -07:00
Garry Tan	baf8acd55c	fix: update check ignores stale UP_TO_DATE cache after version change The UP_TO_DATE cache path exited immediately without checking if the cached version still matched the local VERSION. After upgrading (e.g. 0.3.3 → 0.3.4), the cache still said "UP_TO_DATE 0.3.3" and the script never re-checked against remote — so updates were invisible until the 24h cache expired. Now both UP_TO_DATE and UPGRADE_AVAILABLE verify cached version vs local before trusting the cache. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 13:23:25 -05:00
Garry Tan	4e31acbd47	fix: auto-clear stale heartbeat when process is dead Add PID to heartbeat file. eval-watch checks process.kill(pid, 0) and auto-deletes the heartbeat when the PID is no longer alive — no manual cleanup needed after crashed/killed E2E runs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 12:55:40 -05:00
Garry Tan	43fbe165a4	docs: update README, CONTRIBUTING, ARCHITECTURE for v0.3.6 Update test tier costs and commands (Agent SDK → claude -p, SKILL_E2E → EVALS), add E2E observability section to CONTRIBUTING and ARCHITECTURE, add testing quick-start to README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 12:47:00 -05:00
Garry Tan	4ace0c2f6f	chore: bump version and changelog (v0.3.6) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 12:44:41 -05:00
Garry Tan	9f5aa32e67	fix: fail fast on API connectivity — pre-check before E2E suite Spawn a quick claude -p ping before running 13 tests. If the Anthropic API is unreachable (ConnectionRefused), throw immediately instead of burning through the entire suite with silent false passes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 12:37:44 -05:00
Garry Tan	5aae3ce117	fix: never clean up observability artifacts — partial file persists after finalize Removing the _partial-e2e.json deletion from finalize(). These are small files on a local disk and their persistence is the whole point of observability. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 12:37:38 -05:00
Garry Tan	336dbaa50d	fix: detect is_error from claude -p result line (ConnectionRefused was PASS) claude -p can return subtype="success" with is_error=true when the API is unreachable. Previously we only checked subtype, so API failures silently passed. Now check is_error first and report as 'error_api'. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 12:35:43 -05:00
Garry Tan	029a7c2a37	feat: eval-watch dashboard + observability unit tests (15 tests, 11 codepaths) eval-watch: live terminal dashboard reads heartbeat + partial file every 1s, shows completed/running tests, stale detection (>10min), --tail flag for progress.log tail. Pure renderDashboard() function for testability. observability.test.ts: unit tests for sanitizeTestName, heartbeat schema, progress.log format, NDJSON file naming, savePartial() with _partial flag, finalize() cleanup, diagnostic fields, watcher rendering, stale detection, and non-fatal I/O guarantees. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 11:04:40 -05:00
Garry Tan	510a8d8dda	feat: wire runId + testName + diagnostics through all E2E tests Generate per-session runId, pass testName + runId to every runSkillTest() call, wire exit_reason/timeout_at_turn/last_tool_call through recordE2E(). Add eval:watch script entry to package.json. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 11:04:28 -05:00
Garry Tan	f9cfabeda8	feat: add E2E observability — heartbeat, progress.log, NDJSON persistence, savePartial() session-runner: atomic heartbeat file (e2e-live.json), per-run log directory (~/.gstack-dev/e2e-runs/{runId}/), progress.log + per-test NDJSON persistence, failure transcripts to persistent run dir instead of tmpdir. eval-store: 3 new diagnostic fields (exit_reason, timeout_at_turn, last_tool_call), savePartial() writes _partial-e2e.json after each addTest() for crash resilience, finalize() cleans up partial file. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 11:04:16 -05:00
Garry Tan	eb9a9193c9	fix: plan-ceo-review timeout — init git repo, skip codebase exploration, bump to 420s The CEO review SKILL.md has a "System Audit" step that runs git commands. In an empty tmpdir without a git repo, the agent wastes turns exploring. Fix: init minimal git repo, tell agent to skip codebase exploration, bump test timeouts to 420s for all review/retro tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 08:39:26 -05:00
Garry Tan	7d5036db1a	fix: increase timeouts for plan-review and retro E2E tests plan-ceo-review takes ~300s (thorough 10-section review), retro takes ~220s (many git commands for history analysis). Bumped runSkillTest timeout to 300s and test timeout to 360s. Also accept error_max_turns for these verbose skills. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 07:54:48 -05:00
Garry Tan	f1ee3d924e	feat: template-ify all skills + E2E tests for plan-ceo-review, plan-eng-review, retro - Convert gstack-upgrade to SKILL.md.tmpl template system - All 10 skills now use templates (consistent auto-generated headers) - Add comprehensive template validation tests (22 tests): every skill has .tmpl, generated SKILL.md has header, valid frontmatter, --dry-run reports FRESH, no unresolved placeholders - Add E2E tests for /plan-ceo-review, /plan-eng-review, /retro - Mark /ship, /setup-browser-cookies, /gstack-upgrade as test.todo (destructive/interactive) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 07:28:02 -05:00


81 Commits