gstack

mirror of https://github.com/garrytan/gstack.git synced 2026-05-02 03:35:09 +02:00

Author	SHA1	Message	Date
Garry Tan	264c1ca234	feat: plan files always show review status (v0.11.1.1) (#345 ) * feat: plan files always show review status via preamble footer Add Plan Status Footer to generateCompletionStatus() in the preamble. When in plan mode before ExitPlanMode, Claude writes a GSTACK REVIEW REPORT section to the plan file — either populated from review logs or a "NO REVIEWS YET" placeholder. Skips if a review skill already wrote a richer report. * chore: bump version and changelog (v0.11.1.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-22 18:42:56 -07:00
Garry Tan	7ff0f84b1e	feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 ) * refactor: extract {{TEST_COVERAGE_AUDIT}} shared resolver DRY extraction of the test coverage audit methodology into a shared generator function with three explicit placeholders: - TEST_COVERAGE_AUDIT_PLAN (plan-eng-review) - TEST_COVERAGE_AUDIT_SHIP (ship) - TEST_COVERAGE_AUDIT_REVIEW (review) Shared across all modes: codepath tracing, ASCII diagram format, quality scoring rubric, E2E test decision matrix, regression rule, and test framework detection via CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: plan-eng-review uses shared test coverage audit Replace the thin 6-line Section 3 test review with the full shared methodology via {{TEST_COVERAGE_AUDIT_PLAN}}. Plan mode now: - Traces every codepath with full ASCII diagrams - Adds missing tests to the plan (not just "check for tests") - Writes test plan artifact for /qa consumption - Includes E2E/eval recommendations and regression detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: ship uses shared test coverage audit Replace 135 lines of inline Step 3.4 methodology with {{TEST_COVERAGE_AUDIT_SHIP}}. Functionally identical output plus: - E2E test decision matrix (marks paths needing E2E vs unit) - Eval recommendations for LLM prompt changes - Regression detection iron rule - Test framework detection via CLAUDE.md first - Test plan artifact for /qa consumption Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /review Step 4.75 test coverage diagram Add codepath tracing to the pre-landing review via {{TEST_COVERAGE_AUDIT_REVIEW}}. Review mode: - Produces ASCII coverage diagram (same methodology as plan/ship) - Generates tests for gaps via Fix-First (ASK user) - Subsumes Pass 2 "Test Gaps" checklist category - Gaps are INFORMATIONAL findings Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: mode differentiation + regression guard for coverage audit 10 new tests verifying the three TEST_COVERAGE_AUDIT placeholders: - All modes share: codepath tracing, E2E matrix, regression rule - Plan mode: adds to plan + artifact, no ship-specific content - Ship mode: auto-generates + before/after count + coverage summary - Review mode: Fix-First ASK + INFORMATIONAL, no artifact - Regression guard: ship SKILL.md preserves all key phrases Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: extract shared coverage audit fixture + review E2E - Extract billing.ts fixture into coverage-audit-fixture.ts (DRY) - Refactor ship-coverage-audit E2E to use shared fixture - Add review-coverage-audit E2E for Step 4.75 - Update touchfiles: both E2Es depend on shared fixture Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: strengthen E2E assertions for coverage audit tests The coverage audit E2E tests (ship + review) were only asserting exitReason === 'success' and readCalls > 0 — they passed even if the agent produced no coverage diagram. Add assertion that the output contains either GAP or TESTED markers. Found during /review. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: plan mode traces the plan, not the git diff Codex adversarial review caught that plan-eng-review was inheriting "git diff origin/<base>...HEAD" from the shared resolver, but plan mode reviews a plan document, not a code diff. Plan mode now says: "Trace every codepath in the plan" and "Read the plan document." Ship and review modes keep the git diff instruction. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.9.5.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: test coverage catalog + failure triage (merged branches) (#285) * feat: add bin/gstack-repo-mode — solo vs collaborative detection with caching Detects whether a repo is solo-dev (one person does 80%+ of recent commits) or collaborative. Uses 90-day git shortlog window with 7-day cache in ~/.gstack/projects/{SLUG}/repo-mode.json. Config override via `gstack-config set repo_mode solo\|collaborative` takes precedence over the heuristic. Minimum 5 commits required to classify (otherwise unknown). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: test failure ownership triage — see something say something Adds two new preamble sections to all gstack skills: - Repo Ownership Mode: explains solo vs collaborative behavior - See Something, Say Something: proactive issue flagging principle Adds {{TEST_FAILURE_TRIAGE}} template variable (opt-in, used by /ship): - Classifies test failures as in-branch vs pre-existing - Solo mode defaults to "investigate and fix now" - Collaborative mode offers "blame + assign GitHub issue" option - Also offers P0 TODO and skip options /ship Step 3 now triages test failures instead of hard-stopping on all failures. In-branch failures still block shipping. Pre-existing failures get user-directed triage based on repo mode. Adds P2 TODO for gstack notes system (deferred lightweight reminder). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files for Claude and Codex hosts All 22 Claude skills and 21 Codex skills regenerated with new preamble sections (Repo Ownership Mode, See Something Say Something) and {{TEST_FAILURE_TRIAGE}} resolved in ship/SKILL.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: validate repo mode values to prevent shell injection Codex adversarial review found that unvalidated config/cache values could be injected into shell via source <(gstack-repo-mode). Added validate_mode() that only allows solo\|collaborative\|unknown — anything else becomes "unknown". Prevents persistent code execution through malicious config.yaml or tampered cache JSON. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: shell injection via branch names + feature-branch sampling bias Codex code review found two issues: P1: eval $(gstack-slug) in gstack-repo-mode executes branch names as shell. Branch names like foo$(touch${IFS}pwned) are valid git refs and would execute arbitrary commands. Fix: compute SLUG directly with sed instead of eval'ing gstack-slug output. P2: git shortlog HEAD only sees current branch history. On feature branches that haven't merged main recently, other contributors disappear from the sample. Fix: use git shortlog on the default branch (origin/main) instead of HEAD. Also improved blame lookup in collaborative triage to check both the test file and the production code it covers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: broaden codex-host stripping test to accommodate triage section "Investigate and fix" now appears in TEST_FAILURE_TRIAGE (not just the Codex review step). Use CODEX_REVIEWS config string as a more specific marker for detecting the Codex review step in Codex-hosted skills. * fix: replace template placeholder in TODOS.md with readable text {{TEST_FAILURE_TRIAGE}} is template syntax but TODOS.md is not processed by gen-skill-docs — replaced with human-readable reference. * chore: bump version and changelog (v0.9.5.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add bin/ directory to project structure in CLAUDE.md * test: add triage resolver unit tests, plan-eng coverage audit E2E, and triage E2E - TEST_FAILURE_TRIAGE resolver: 6 unit tests verifying all triage steps (T1-T4), REPO_MODE branching, and safety default for ambiguous failures - plan-eng-coverage-audit E2E: tests /plan-eng-review coverage audit codepath (gap identified during eng review — existed on neither branch) - ship-triage E2E: planted-bug fixture with in-branch (truncate null) and pre-existing (divide-by-zero) failures; verifies correct classification - Touchfile entries for diff-based test selection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate stale Codex SKILL.md for retro Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: gstack-repo-mode handles repos without origin remote Split `git remote get-url origin` into a separate variable with `\|\| true` so the script doesn't crash under `set -euo pipefail` in local-only repos. Falls back to REPO_MODE=unknown gracefully. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: REPO_MODE defaults to unknown when helper emits nothing Changed preamble from `source <(...) \|\| REPO_MODE=unknown` (which doesn't catch empty output) to `source <(...) \|\| true` followed by `REPO_MODE=${REPO_MODE:-unknown}`. Regenerated all SKILL.md files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: triage E2E runs both test files in subprocesses math.test.js called process.exit(1) which killed the runner before string.test.js could execute. Changed test runner to use child_process so each test runs independently and both failure classes are exercised. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: gstack-repo-mode handles repos without origin remote Fall back through origin/main → origin/master → HEAD when git symbolic-ref refs/remotes/origin/HEAD is not set. Prevents shortlog crash in repos where origin/HEAD isn't configured. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: triage E2E runs both test files in subprocesses Add assertions verifying both math.test.js (pre-existing failure) and string.test.js (in-branch failure) actually executed during triage. Prevents false passes where only one failure class is exercised. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: REPO_MODE defaults to unknown when helper emits nothing - Remove head -20 truncation that biased solo classification by dropping low-volume contributors from the denominator - Use atomic write (mktemp + mv) for cache to prevent concurrent preamble reads from seeing partial JSON Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add test coverage catalog to CHANGELOG + update project structure - CHANGELOG: add 6 entries for coverage audit, review Step 4.75, E2E recommendations, regression iron rule, failure triage, repo-mode fix - CLAUDE.md: add missing skill directories (autoplan, benchmark, canary, codex, land-and-deploy, setup-deploy) to project structure Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.10.1.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: CHANGELOG rules — branch-scoped versions, never fold into old entries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-22 11:28:16 -07:00
Garry Tan	f075cb757f	feat: Search Before Building — builder ethos + skill integrations (v0.9.5.0) (#298 ) * feat: ETHOS.md — gstack builder philosophy Standalone document capturing the four principles: The Golden Age, Boil the Lake, Search Before Building, and Build for Yourself. Introduces the three-layer knowledge framework (tried-and-true, new-and-popular, first-principles) and the Eureka Moment concept — when first-principles reasoning reveals conventional wisdom is wrong. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: Search Before Building preamble section + CLAUDE.md Add generateSearchBeforeBuildingSection(ctx) to gen-skill-docs.ts. Every workflow skill now gets a compact router section covering: - Three layers of knowledge (tried-and-true, new-and-popular, first-principles) - Eureka moment format and jq-based JSONL logging - WebSearch fallback clause - ETHOS.md reference via ctx.paths.skillRoot resolver Also adds compact "Search before building" section to CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: skill-specific Search Before Building integrations 8 template changes: - /office-hours: Phase 2.75 Landscape Awareness (WebSearch + three-layer synthesis) - /plan-eng-review: Step 0 search check with layer provenance annotations - /investigate: external pattern search + search escalation on hypothesis failure - /plan-ceo-review: Landscape Check before scope challenge - /review: search-before-recommending for fix patterns - /qa-only: WebSearch in allowed-tools - /design-consultation: three-layer synthesis backport in Phase 2 Step 3 - /retro: eureka moment tracking from ~/.gstack/analytics/eureka.jsonl All search steps include WebSearch fallback clause. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: v0.9.5.0 — Builder Ethos (CHANGELOG + VERSION + TODOS) ETHOS.md + Search Before Building across all workflow skills. Deferred: first-time intro flow (blocked on blog post). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address Codex review — sanitize search, privacy gate, ETHOS.md sidecar Three fixes from adversarial Codex review: - /investigate: sanitize error messages before searching (strip hostnames, IPs, file paths, SQL, customer data). Skip search if unsanitizable. - /office-hours: add privacy gate before landscape search. Use generalized category terms, never the user's specific product name or stealth idea. - setup: link ETHOS.md into .agents/skills/gstack/ sidecar so workspace- local Codex sessions can find the builder philosophy. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sanitize Phase 2 external pattern search in /investigate The Phase 2 external search also sent raw error messages to WebSearch. Apply same sanitization rule as Phase 3 search escalation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: sync documentation with shipped changes - ARCHITECTURE.md: preamble now handles 5 things (add Search Before Building) - CLAUDE.md: add ETHOS.md to project structure tree - README.md: add ETHOS.md to docs table Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-21 11:46:06 -07:00
Garry Tan	91bea06675	fix: plan mode exception for review log + telemetry writes (v0.9.0.1) (#234 ) * fix: plan mode exception for review log + telemetry writes Add explicit plan-mode exception notes to review log sections in all 3 plan review skill templates and the telemetry section in gen-skill-docs.ts. When Claude runs in plan mode, it self-censors bash writes — but review logging and telemetry write to ~/.gstack/ (user metadata, not project files). The preamble already writes to the same directory successfully. The exception note gives Claude a reasoning chain: safety argument, precedent, and consequence of skipping. * chore: regenerate Codex/agents SKILL.md files with plan-mode exception * chore: bump version and changelog (v0.9.0.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: community-first telemetry opt-in with anonymous fallback Default opt-in is now "Help gstack get better!" (community mode with stable device ID). If declined, offers anonymous mode as a softer alternative before fully off. * chore: regenerate SKILL.md files with community-first telemetry prompt --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 23:10:26 -07:00
Garry Tan	3b22fc39e6	feat: opt-in usage telemetry + community intelligence platform (v0.8.6) (#210 ) * feat: add gstack-telemetry-log and gstack-analytics scripts Local telemetry infrastructure for gstack usage tracking. gstack-telemetry-log appends JSONL events with skill name, duration, outcome, session ID, and platform info. Supports off/anonymous/community privacy tiers. gstack-analytics renders a personal usage dashboard from local data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add telemetry preamble injection + opt-in prompt + epilogue Extends generatePreamble() with telemetry start block (config read, timer, session ID, .pending marker), opt-in prompt (gated by .telemetry-prompted), and epilogue instructions for Claude to log events after skill completion. Adds 5 telemetry tests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate all SKILL.md files with telemetry blocks Automated regeneration from gen-skill-docs.ts changes. All skills now include telemetry start block, opt-in prompt, and epilogue. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add Supabase schema, edge functions, and SQL views Telemetry backend infrastructure: telemetry_events table with RLS (insert-only), installations table for retention tracking, update_checks for install pings. Edge functions for update-check (version + ping), telemetry-ingest (batch insert), and community-pulse (weekly active count). SQL views for crash clustering and skill co-occurrence sequences. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add telemetry-sync, community-dashboard, and integration tests gstack-telemetry-sync: fire-and-forget JSONL → Supabase sync with privacy tier field stripping, batch limits, and cursor tracking. gstack-community-dashboard: CLI tool querying Supabase for skill popularity, crash clusters, and version distribution. 19 integration tests covering all telemetry scripts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: session-specific .pending markers + crash_clusters view fix Addresses Codex review findings: - .pending race condition: use .pending-$SESSION_ID instead of shared .pending file to prevent concurrent session interference - crash_clusters view: add total_occurrences and anonymous_occurrences columns since anonymous tier has no installation_id - Added test: own session pending marker is not finalized Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: dual-attempt update check with Supabase install ping Fires a parallel background curl to Supabase during the slow-path version fetch. Logs upgrade_prompted event only on fresh fetches (not cached replays) to avoid overcounting. GitHub remains the primary version source — Supabase ping is fire-and-forget. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: integrate telemetry usage stats into /retro output Retro now reads ~/.gstack/analytics/skill-usage.jsonl and includes gstack usage metrics (skill run counts, top skills, success rate) in the weekly retrospective output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: move 'Skill usage telemetry' to Completed in TODOS.md Implemented in this branch: local JSONL logging, opt-in prompt, privacy tiers, Supabase backend, community dashboard, /retro integration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: wire Supabase credentials and expose tables via Data API Add supabase/config.sh with project URL and publishable key (safe to commit — RLS restricts to INSERT only). Update telemetry-sync, community-dashboard, and update-check to source the config and include proper auth headers for the Supabase REST API. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add SELECT RLS policies to migration for community dashboard reads All telemetry data is anonymous (no PII), so public reads via the publishable key are safe. Needed for the community dashboard to query skill popularity, crash clusters, and version distribution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.8.6) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: analytics backward-compatible with old JSONL format Handle old-format events (no event_type field) alongside new format. Skip hook_fire events. Fix grep -c whitespace issues and unbound variable errors. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: map JSONL field names to Postgres columns in telemetry-sync Local JSONL uses short names (v, ts, sessions) but the Supabase table expects full names (schema_version, event_timestamp, concurrent_sessions). Add sed mapping during field stripping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address Codex adversarial findings — cursor, opt-out, queries - Sync cursor now advances on HTTP 2xx (not grep for "inserted") - Update-check respects telemetry opt-out before pinging Supabase - Dashboard queries use correct view column names (total_occurrences) - Sync strips old-format "repo" field to prevent privacy leak Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Privacy & Telemetry section to README Transparent disclosure of what telemetry collects, what it never sends, how to opt out, and a link to the schema so users can verify. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-19 17:21:05 -07:00
Garry Tan	d85233017b	feat: /codex skill — multi-AI second opinion + proactive suggestions (#197 ) * feat: /codex skill — multi-AI second opinion (review, challenge, consult) Three modes: code review with pass/fail gate, adversarial challenge mode, and conversational consult with session continuity. First multi-AI skill in gstack, wrapping OpenAI's Codex CLI. * feat: integrate /codex into /review, /ship, /plan-eng-review + dashboard /review offers Codex second opinion after completing its own review. /ship offers Codex review as optional gate before pushing. /plan-eng-review offers Codex plan critique after scope challenge. Review Readiness Dashboard shows Codex Review as optional row. * chore: bump version and changelog (v0.8.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: codex skill validation (12 stub tests) + E2E eval test Stub tests (free tier): verify template content — three modes, gate verdict, session continuity, cost tracking, cross-model comparison, binary discovery, error handling, mktemp usage, and integrations into /review, /ship, /plan-eng-review. E2E test (paid tier): runs /codex review on vulnerable fixture repo via session-runner, verifies output contains findings and GATE verdict. * fix: codex auth error message — use codex login, not OPENAI_API_KEY Codex authenticates via ChatGPT OAuth (codex login), not an env var. * feat: codex uses high reasoning effort by default gpt-5.2-codex is the only model available with ChatGPT login. All commands now use model_reasoning_effort="high" for maximum depth — the whole point is a thorough second opinion. * feat: crank codex reasoning to xhigh (maximum) * feat: per-mode reasoning (high for review/consult, xhigh for challenge) + web search Review and consult use high reasoning — thorough but not slow. Challenge (adversarial) uses xhigh — maximum depth for breaking code. All modes enable web_search_cached so Codex can look up docs/APIs. * refactor: don't hardcode model — use codex default (always latest) * feat: JSONL output for codex challenge + consult modes Use --json flag to parse codex's JSONL events, extracting reasoning traces ([codex thinking]), tool calls ([codex ran]), and token counts. This gives richer output than the -o flag alone — you can see what codex thought through before its answer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: only persist codex-review log when code review actually ran Don't write a codex-review entry to reviews.jsonl when only the adversarial challenge (option B) was selected — there's no gate verdict to record, and a false entry misleads the Review Readiness Dashboard into thinking a code review happened. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add codex plan review option to /plan-eng-review After scope challenge (Step 0), offer to have Codex independently review the plan with a brutally honest tech reviewer persona. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: update e2e test for codex skill Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: codex integration bugs — plan content, review persistence, quoting, stderr - plan-eng-review: Codex now reads the plan file itself instead of inlining content as a CLI arg (avoids ARG_MAX for large plans) - review: add missing echo to persist codex-review results to reviews.jsonl - codex: consult mode uses $TMPERR (mktemp) instead of hardcoded stderr path - codex + review: quote $SLUG/$BRANCH_SLUG in review log paths - codex: scope plan lookup to current project, warn on cross-project fallback Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add .context/ to .gitignore to prevent session ID leaks Codex consult mode stores session IDs in .context/codex-session-id. Without this ignore rule, session IDs could leak into commits. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: proactive skill suggestions + opt-out + trigger phrase tests - Preamble reads proactive config via gstack-config - Root SKILL.md.tmpl has lifecycle map (stage → skill suggestion) - Users can opt out ("stop suggesting") / opt in ("be proactive again") - Restored trigger phrase validation tests (16 skills × "Use when" check) - Added missing "Use when" trigger phrases to /debug and /office-hours Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update changelog for v0.8.0 — add proactive suggestions note Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-19 00:22:52 -05:00
Garry Tan	c4f679d829	feat: safety hook skills + skill usage telemetry (v0.7.1) (#189 ) * feat: add /careful, /freeze, /guard, /unfreeze safety hook skills Four new on-demand skills using Claude Code's PreToolUse hooks: - /careful: warns before destructive commands (rm -rf, DROP TABLE, force-push, etc.) - /freeze: blocks file edits outside a specified directory - /guard: composes both into one command - /unfreeze: clears freeze boundary without ending session Pure bash hook scripts with Python fallback for JSON edge cases. Safe exceptions for build artifacts (node_modules, dist, .next, etc.). Hook fire telemetry logs pattern name only (never command content). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add skill usage telemetry to preamble TemplateContext system passes skill name through resolver pipeline so each generated SKILL.md gets its own name baked into the telemetry line. Appends to ~/.gstack/analytics/skill-usage.jsonl on every invocation. Covers 14 preamble-using skills + 4 hook skills (inline telemetry). JSONL format: {"skill":"ship","ts":"...","repo":"my-project"} Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add analytics CLI for skill usage stats bun run analytics reads ~/.gstack/analytics/skill-usage.jsonl and shows top skills, per-repo breakdown, hook fire stats, and daily timeline. Supports --period 7d/30d/all. Handles missing/empty/malformed data. 22 unit tests cover parsing, filtering, formatting, and edge cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add skills-used-this-week to /retro Retro Step 2 now reads skill-usage.jsonl and shows which gstack skills were used during the retro window. Follows the same pattern as the Greptile signal and Backlog Health metrics — read file, filter by date, aggregate, present. Skips silently if no analytics data exists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add hook script and telemetry tests 32 unit tests for check-careful.sh covering all 8 destructive patterns, safe exceptions, Python fallback, and malformed input handling. 7 unit tests for check-freeze.sh covering boundary enforcement, trailing slash edge case, and missing state file. Telemetry tests verify per-skill name correctness in generated output. Adds careful/freeze/guard/unfreeze/document-release to ALL_SKILLS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version to 0.6.5 + changelog + mark TODOs shipped Safety hook skills and skill usage telemetry shipped. Analytics CLI and /retro integration included. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /debug auto-freezes edits to the module being debugged Add PreToolUse hooks (Edit/Write) to debug/SKILL.md.tmpl that reference the existing freeze/bin/check-freeze.sh. After Phase 1 investigation, /debug locks edits to the narrowest affected directory. Graceful degradation: if freeze script is unavailable, scope lock is skipped. Users can run /unfreeze to remove the restriction. Deferred 6 enhancements to TODOS.md, gated on telemetry showing the freeze hook actually fires in real debugging sessions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 23:57:59 -05:00
Garry Tan	4fe0ce9cba	feat: natural language skill routing + proactive suggestions (v0.7.1) (#195 ) * feat: add trigger phrases to /debug and /office-hours These two skills had zero "Use when asked to..." phrases, making them completely invisible to natural language. Users saying "debug this" or "brainstorm an idea" would get no skill invocation. * feat: add proactive triggers to all workflow skills Every skill now has "Proactively suggest when..." language so Claude surfaces skills at natural moments — not just when the user says specific trigger phrases. * feat: lifecycle map + proactive preference system Root gstack description now includes a developer workflow guide mapping 12 stages to skills. Preamble reads proactive preference via gstack-config. Users can opt out with "stop suggesting things" and re-enable with "be proactive again" — natural language toggle, no CLI needed. * test: 11 journey-stage E2E routing tests + trigger phrase validation Each test simulates a real development stage (ideation, plan review, debug, QA, ship, retro...) with realistic project context and verifies the right skill fires from natural language alone. 11/11 pass. * chore: bump version and changelog (v0.7.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-18 23:08:04 -05:00
Garry Tan	6000af4589	feat: founder discovery engine + /debug skill — v0.7.0 (#185 ) * feat: add escalation protocol to preamble — all skills get DONE/BLOCKED/NEEDS_CONTEXT Every skill now reports completion status (DONE, DONE_WITH_CONCERNS, BLOCKED, NEEDS_CONTEXT) and has escalation rules: 3 failed attempts → STOP, security uncertainty → STOP, scope exceeds verification → STOP. "It is always OK to stop and say 'this is too hard for me.'" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add verification gate to /ship (Step 6.5) — no push without fresh evidence Before pushing, re-verify tests if code changed during review fixes. Rationalization prevention: "Should work now" → RUN IT. "I'm confident" → Confidence is not evidence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add scope drift detection + verification of claims to /review Step 1.5: Before reviewing code quality, check if the diff matches stated intent. Flags scope creep and missing requirements (INFORMATIONAL). Step 5 addition: Every review claim must cite evidence — "this pattern is safe" needs a line reference, "tests cover this" needs a test name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: mandatory implementation alternatives + design doc lookup in /plan-ceo-review Step 0C-bis: Every plan must consider 2-3 approaches (minimal viable vs ideal architecture) before mode selection. RECOMMENDATION required. Pre-Review System Audit now checks ~/.gstack/projects/ for /brainstorm design docs (branch-filtered with fallback). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: design doc lookup in /plan-eng-review + fix branch name sanitization Step 0 now checks ~/.gstack/projects/ for /brainstorm design docs (branch-filtered with fallback, reads Supersedes: for revision context). Fix: branch names with '/' (e.g. garrytan/better-process) now get sanitized via tr '/' '-' in test plan artifact filenames. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: new /brainstorm and /debug skills /brainstorm: Socratic design exploration before planning. Context gathering, clarifying questions (smart-skip), related design discovery (keyword grep), premise challenge, forced alternatives, design doc artifact with lineage tracking (Supersedes: field). Writes to ~/.gstack/projects/$SLUG/. /debug: Systematic root-cause debugging. Iron Law: no fixes without root cause investigation. Pattern analysis, hypothesis testing with 3-strike escalation, structured DEBUG REPORT output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: structural tests for new skills + escalation protocol assertions Add brainstorm + debug to skillsWithUpdateCheck and skillsWithPreamble arrays. Add structural tests: brainstorm (Phase 1-6, Design Doc, Supersedes, Smart-skip), debug (Iron Law, Root Cause, Pattern Analysis, Hypothesis, DEBUG REPORT, 3-strike). Add escalation protocol tests (DONE_WITH_CONCERNS, BLOCKED, NEEDS_CONTEXT) for all preamble skills. Also: 2 new TODOs (design docs → Supabase sync, /plan-design-review skill), update CLAUDE.md project structure with new skill directories. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.6.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: rename /brainstorm → /office-hours across references Update CHANGELOG, CLAUDE.md, TODOS, design-consultation, plan-ceo-review, and gen-skill-docs to reference the new office-hours skill name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: YC Office Hours — dual-mode product diagnostic + builder brainstorm Rewrite /office-hours with two modes: Startup mode: six forcing questions (Demand Reality, Status Quo, Desperate Specificity, Narrowest Wedge, Observation & Surprise, Future-Fit) that push founders toward radical honesty about demand, users, and product decisions. Includes smart routing by product stage, intrapreneurship adaptation, and YC apply CTA for strong-signal founders. Builder mode: generative brainstorming for side projects, hackathons, learning, and open source. Enthusiastic collaborator tone, design thinking questions, no business interrogation. Mode is determined by an explicit question in Phase 1 — no guessing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add 14 assertions for YC Office Hours content coverage Validates dual-mode structure (Startup/Builder), all six forcing questions, builder brainstorming content, intrapreneurship adaptation, YC apply CTA, and operating principles for both modes. 192 tests total, all passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.6.1 - README.md: added /office-hours and /debug to skills table, updated skill count from 13 to 15, added both to install instructions - docs/skills.md: added /office-hours and /debug deep dive sections - CLAUDE.md: updated office-hours description to reflect dual-mode - CONTRIBUTING.md: updated skill count from 13 to 15 - CHANGELOG.md: added YC Office Hours and /debug entries to 0.6.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: founder discovery engine in /office-hours (v0.7.0) Turn /office-hours into a YC founder discovery engine. Every session now ends with three beats: signal reflection (specific callbacks to what the user said), "One more thing." transition, and a personal plea from Garry Tan with three tiers based on founder signal strength. Top tier uses AskUserQuestion to ask directly and opens ycombinator.com/apply?ref=gstack. Adds Phase 4.5 (Founder Signal Synthesis), "What I noticed about how you think" section to both design doc templates, anti-slop GOOD/BAD examples, and emotional targets per tier. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add validation assertions for founder discovery engine 8 new assertions covering: YC apply CTA with ref=gstack tracking, "What I noticed" design doc section, golden age framing, Garry Tan personal plea, founder signal synthesis phase, three-tier decision rubric, anti-slop GOOD/BAD examples, "One more thing" transition beat. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.7.0 VERSION: 0.6.4.1 → 0.7.0 CHANGELOG: new entry — Office Hours Gets Personal README: updated /office-hours and /plan-design-review descriptions docs/skills.md: updated /office-hours table + deep dive section TODOS.md: added /yc-prep skill TODO (P2) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove duplicate Install section, fix stale skills lists, deduplicate CHANGELOG entries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 11:19:04 -05:00
Garry Tan	bc86a665b7	feat: add trigger phrases to skill descriptions for better model matching (v0.6.4.1) (#169 ) * feat: add trigger phrases to skill descriptions for better model matching Anthropic's skill best practices: "the description field is not a summary — it's when to trigger." Add explicit "Use when asked to..." phrases to 12 skill descriptions so Claude's auto-discovery works with natural language requests like "deploy this" or "check my diff", not just explicit /slash-commands. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add on-demand hooks and telemetry to TODOS.md Captures two ideas from Anthropic's skill best practices post: - /careful, /freeze, /guard on-demand hook skills (P3) - Skill usage telemetry via preamble JSONL append (P3) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.6.4.1) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: exclude internal details from CHANGELOG style guide Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 08:06:46 -05:00
Garry Tan	9d47619e4c	feat: Completeness Principle — Boil the Lake (v0.6.1) (#140 ) * feat: Completeness Principle — Boil the Lake (WIP, pre-merge) Add Completeness Principle to all skill preambles, dual-time estimates, compression table, anti-pattern gallery, Lake Score, and completeness gaps review category. VERSION/CHANGELOG will be rebased after merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: update stale version reference in TODOS.md (v0.5.3 → v0.6.1) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update CHANGELOG date + README for v0.6.1 features - Add date to CHANGELOG 0.6.1 entry - Add Completeness Principle to README intro - Add SELECTIVE EXPANSION mode to CEO review section - Add test bootstrap mention to /ship section - Fix uninstall command missing design-consultation in project uninstall - Add "recommends shortcuts" and "no tests" to Without gstack list Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: split README into lean intro + docs/ directory (gh CLI pattern) README: 875 → 243 lines. Keeps intro, skill table, demo, install, and troubleshooting. All per-skill deep dives, Greptile integration guide, and contributor mode docs moved to docs/ directory. - docs/skills.md — full philosophy and examples for all 13 skills - docs/greptile.md — Greptile setup and triage workflow - docs/contributor-mode.md — how to enable and use contributor mode - README now links to docs/ via Documentation table - Updated skill table entries with latest features (fix-first, regression tests, test health, completeness gaps) - Updated demo transcript with AUTO-FIXED, coverage audit, regression test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: remove "competitor" language, rewrite README in Garry's voice Replace "browses competitors" with "knows the landscape" / "what's out there" throughout all user-facing copy. Trim README from 243 to 167 lines — tighter, more opinionated, less listicle energy. Remove Completeness Principle from README top (it lives in CLAUDE.md and the skill preambles where Claude actually reads it). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: rewrite README in Garry's raw voice — AGI era, L8 factory, real stories The README now sounds like Garry, not a product page. Leads with the live experiment, the 16k LOC/day reality, the real-life coding stories (Austin, hospital bedside). Highlights the newest unlocks (design at the heart, /qa parallelism, smart review routing, test bootstrap). Closes with an open invitation — free MIT, fork it, let's all ride the wave together. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Garry's bonafides to README intro — Palantir, Posterous, YC, 600k LOC Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add real /retro numbers — 140k lines, 362 commits across 3 projects Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add "in the last 60 days" timeframe to 600k LOC claim Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add GitHub contribution graphs — 2026 vs 2013 side by side Same person, different era. 2013: 772 contributions building Bookface. 2026: 1,237 contributions and accelerating. The difference is the tooling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: clarify /retro stats are from last 7 days Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add designer/PM/eng manager roles to intro Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: remove Josh/L8 reference from README Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move demo up, make it dramatically more impressive Show the actual architecture diagram, auto-fixed issues, 100% coverage, regression test generation. Punch line: "That is not a copilot. That is a team." Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: remove "My journey" section — intro already covers it Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: prefix all skill commands with You: in demo transcript Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: collapse You/Claude lines in demo — no gap between command and response Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: clarify plan mode flow in demo — approve, exit, Claude implements Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move /ship to end of demo — review → QA → ship is the real flow Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add /plan-design-review to demo, tighten CEO response Shorter CEO reply, compressed eng diagram, added design audit with AI Slop score. Seven commands now: plan → eng → build → design → review → QA → ship. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: move design review before implementation — it's part of planning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: reorder demo — design before eng, after CEO Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: remove URL from /plan-design-review in demo Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add [...] annotations showing what actually happens at each step Each step now shows what the agent does under the hood: 8 expansion proposals cherry-picked, 80-item design audit, ASCII diagrams for every flow, 2400 lines written in 8 minutes, real browser QA, bug found and fixed. Makes the demo feel real, not abstract. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: rename Contributor Mode to How to Contribute in docs table Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Coinbase, Instacart, Rippling to YC bonafides Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add "one or two people in a garage" to founder story Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add skill table to top of skills.md with anchor links Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: consolidate — roll contributor-mode into CONTRIBUTING, greptile into skills - docs/contributor-mode.md → merged into CONTRIBUTING.md (session awareness section) - docs/greptile.md → merged into docs/skills.md (Greptile integration section) - Reordered docs table: Skills > Architecture > Browser > Contributing > Changelog Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 16:34:08 -05:00
Garry Tan	a68244ab57	feat: /document-release skill — post-ship doc updates (v0.4.3) (#109 ) * docs: update project documentation for v0.4.2 - README: skill count 9→10, added /document-release to skills table, install/uninstall sections, and dedicated section with example - CHANGELOG: added /document-release bullet to v0.4.2 entry Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add /document-release skill with smart VERSION handling New skill runs after /ship but before PR merge. Reads every doc file, cross-references the diff, auto-updates factual changes, asks about risky edits. CHANGELOG clobber protection: never uses Write tool on CHANGELOG.md, only Edit with exact old_string matches. Smart VERSION logic: instead of silently skipping already-bumped versions, compares CHANGELOG entry scope against full diff and asks if significant uncovered changes exist. Also fixes gstack-upgrade/SKILL.md missing from skill-check.ts SKILL_FILES array (existing inconsistency with gen-skill-docs.ts). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /review Step 5.6 — documentation staleness check Review skill now cross-references code changes against doc files. If a doc describes a feature that changed but the doc wasn't updated, flags it as INFORMATIONAL with a pointer to /document-release. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: /document-release E2E with CHANGELOG clobber guard E2E test creates a repo with existing CHANGELOG entries, runs /document-release, and asserts original entries survive. Critical guardrail against the incident where an agent replaced CHANGELOG entries during conflict resolution. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump to v0.4.3 — /document-release skill Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files after merge * chore: regenerate SKILL.md files after merge --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-16 12:30:22 -05:00

12 Commits