mirror of https://github.com/garrytan/gstack.git synced 2026-05-01 19:25:10 +02:00

T

Garry Tan 7ff0f84b1e feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

* refactor: extract {{TEST_COVERAGE_AUDIT}} shared resolver

DRY extraction of the test coverage audit methodology into a shared
generator function with three explicit placeholders:
- TEST_COVERAGE_AUDIT_PLAN (plan-eng-review)
- TEST_COVERAGE_AUDIT_SHIP (ship)
- TEST_COVERAGE_AUDIT_REVIEW (review)

Shared across all modes: codepath tracing, ASCII diagram format,
quality scoring rubric, E2E test decision matrix, regression rule,
and test framework detection via CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: plan-eng-review uses shared test coverage audit

Replace the thin 6-line Section 3 test review with the full shared
methodology via {{TEST_COVERAGE_AUDIT_PLAN}}. Plan mode now:
- Traces every codepath with full ASCII diagrams
- Adds missing tests to the plan (not just "check for tests")
- Writes test plan artifact for /qa consumption
- Includes E2E/eval recommendations and regression detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: ship uses shared test coverage audit

Replace 135 lines of inline Step 3.4 methodology with
{{TEST_COVERAGE_AUDIT_SHIP}}. Functionally identical output plus:
- E2E test decision matrix (marks paths needing E2E vs unit)
- Eval recommendations for LLM prompt changes
- Regression detection iron rule
- Test framework detection via CLAUDE.md first
- Test plan artifact for /qa consumption

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /review Step 4.75 test coverage diagram

Add codepath tracing to the pre-landing review via
{{TEST_COVERAGE_AUDIT_REVIEW}}. Review mode:
- Produces ASCII coverage diagram (same methodology as plan/ship)
- Generates tests for gaps via Fix-First (ASK user)
- Subsumes Pass 2 "Test Gaps" checklist category
- Gaps are INFORMATIONAL findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: mode differentiation + regression guard for coverage audit

10 new tests verifying the three TEST_COVERAGE_AUDIT placeholders:
- All modes share: codepath tracing, E2E matrix, regression rule
- Plan mode: adds to plan + artifact, no ship-specific content
- Ship mode: auto-generates + before/after count + coverage summary
- Review mode: Fix-First ASK + INFORMATIONAL, no artifact
- Regression guard: ship SKILL.md preserves all key phrases

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: extract shared coverage audit fixture + review E2E

- Extract billing.ts fixture into coverage-audit-fixture.ts (DRY)
- Refactor ship-coverage-audit E2E to use shared fixture
- Add review-coverage-audit E2E for Step 4.75
- Update touchfiles: both E2Es depend on shared fixture

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: strengthen E2E assertions for coverage audit tests

The coverage audit E2E tests (ship + review) were only asserting
exitReason === 'success' and readCalls > 0 — they passed even
if the agent produced no coverage diagram. Add assertion that
the output contains either GAP or TESTED markers.

Found during /review.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: plan mode traces the plan, not the git diff

Codex adversarial review caught that plan-eng-review was inheriting
"git diff origin/<base>...HEAD" from the shared resolver, but plan mode
reviews a plan document, not a code diff. Plan mode now says:
"Trace every codepath in the plan" and "Read the plan document."

Ship and review modes keep the git diff instruction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.9.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: test coverage catalog + failure triage (merged branches) (#285)

* feat: add bin/gstack-repo-mode — solo vs collaborative detection with caching

Detects whether a repo is solo-dev (one person does 80%+ of recent commits)
or collaborative. Uses 90-day git shortlog window with 7-day cache in
~/.gstack/projects/{SLUG}/repo-mode.json. Config override via
`gstack-config set repo_mode solo|collaborative` takes precedence over
the heuristic. Minimum 5 commits required to classify (otherwise unknown).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: test failure ownership triage — see something say something

Adds two new preamble sections to all gstack skills:
- Repo Ownership Mode: explains solo vs collaborative behavior
- See Something, Say Something: proactive issue flagging principle

Adds {{TEST_FAILURE_TRIAGE}} template variable (opt-in, used by /ship):
- Classifies test failures as in-branch vs pre-existing
- Solo mode defaults to "investigate and fix now"
- Collaborative mode offers "blame + assign GitHub issue" option
- Also offers P0 TODO and skip options

/ship Step 3 now triages test failures instead of hard-stopping on all
failures. In-branch failures still block shipping. Pre-existing failures
get user-directed triage based on repo mode.

Adds P2 TODO for gstack notes system (deferred lightweight reminder).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files for Claude and Codex hosts

All 22 Claude skills and 21 Codex skills regenerated with new preamble
sections (Repo Ownership Mode, See Something Say Something) and
{{TEST_FAILURE_TRIAGE}} resolved in ship/SKILL.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: validate repo mode values to prevent shell injection

Codex adversarial review found that unvalidated config/cache values
could be injected into shell via source <(gstack-repo-mode). Added
validate_mode() that only allows solo|collaborative|unknown — anything
else becomes "unknown". Prevents persistent code execution through
malicious config.yaml or tampered cache JSON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: shell injection via branch names + feature-branch sampling bias

Codex code review found two issues:

P1: eval $(gstack-slug) in gstack-repo-mode executes branch names as
shell. Branch names like foo$(touch${IFS}pwned) are valid git refs and
would execute arbitrary commands. Fix: compute SLUG directly with sed
instead of eval'ing gstack-slug output.

P2: git shortlog HEAD only sees current branch history. On feature
branches that haven't merged main recently, other contributors disappear
from the sample. Fix: use git shortlog on the default branch
(origin/main) instead of HEAD.

Also improved blame lookup in collaborative triage to check both the
test file and the production code it covers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: broaden codex-host stripping test to accommodate triage section

"Investigate and fix" now appears in TEST_FAILURE_TRIAGE (not just the
Codex review step). Use CODEX_REVIEWS config string as a more specific
marker for detecting the Codex review step in Codex-hosted skills.

* fix: replace template placeholder in TODOS.md with readable text

{{TEST_FAILURE_TRIAGE}} is template syntax but TODOS.md is not processed
by gen-skill-docs — replaced with human-readable reference.

* chore: bump version and changelog (v0.9.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add bin/ directory to project structure in CLAUDE.md

* test: add triage resolver unit tests, plan-eng coverage audit E2E, and triage E2E

- TEST_FAILURE_TRIAGE resolver: 6 unit tests verifying all triage steps (T1-T4),
  REPO_MODE branching, and safety default for ambiguous failures
- plan-eng-coverage-audit E2E: tests /plan-eng-review coverage audit codepath
  (gap identified during eng review — existed on neither branch)
- ship-triage E2E: planted-bug fixture with in-branch (truncate null) and
  pre-existing (divide-by-zero) failures; verifies correct classification
- Touchfile entries for diff-based test selection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate stale Codex SKILL.md for retro

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gstack-repo-mode handles repos without origin remote

Split `git remote get-url origin` into a separate variable with `|| true`
so the script doesn't crash under `set -euo pipefail` in local-only repos.
Falls back to REPO_MODE=unknown gracefully.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: REPO_MODE defaults to unknown when helper emits nothing

Changed preamble from `source <(...) || REPO_MODE=unknown` (which doesn't
catch empty output) to `source <(...) || true` followed by
`REPO_MODE=${REPO_MODE:-unknown}`. Regenerated all SKILL.md files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: triage E2E runs both test files in subprocesses

math.test.js called process.exit(1) which killed the runner before
string.test.js could execute. Changed test runner to use child_process
so each test runs independently and both failure classes are exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: gstack-repo-mode handles repos without origin remote

Fall back through origin/main → origin/master → HEAD when
git symbolic-ref refs/remotes/origin/HEAD is not set. Prevents
shortlog crash in repos where origin/HEAD isn't configured.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: triage E2E runs both test files in subprocesses

Add assertions verifying both math.test.js (pre-existing failure) and
string.test.js (in-branch failure) actually executed during triage.
Prevents false passes where only one failure class is exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: REPO_MODE defaults to unknown when helper emits nothing

- Remove head -20 truncation that biased solo classification by
  dropping low-volume contributors from the denominator
- Use atomic write (mktemp + mv) for cache to prevent concurrent
  preamble reads from seeing partial JSON

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add test coverage catalog to CHANGELOG + update project structure

- CHANGELOG: add 6 entries for coverage audit, review Step 4.75, E2E
  recommendations, regression iron rule, failure triage, repo-mode fix
- CLAUDE.md: add missing skill directories (autoplan, benchmark, canary,
  codex, land-and-deploy, setup-deploy) to project structure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.10.1.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: CHANGELOG rules — branch-scoped versions, never fold into old entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-22 11:28:16 -07:00

.agents/skills

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

.github/workflows

feat: multi-agent support — gstack works on Codex, Gemini CLI, and Cursor (v0.9.0) (#226 )

2026-03-19 18:20:50 -07:00

autoplan

feat: /autoplan — auto-review pipeline (v0.10.0.0) (#327 )

2026-03-22 10:21:52 -07:00

benchmark

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

bin

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

browse

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

canary

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

careful

feat: safety hook skills + skill usage telemetry (v0.7.1) (#189 )

2026-03-18 23:57:59 -05:00

codex

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

design-consultation

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

design-review

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

docs

fix: security hardening + issue triage (v0.8.3) (#205 )

2026-03-19 01:58:43 -05:00

document-release

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

freeze

feat: safety hook skills + skill usage telemetry (v0.7.1) (#189 )

2026-03-18 23:57:59 -05:00

gstack-upgrade

feat: add trigger phrases to skill descriptions for better model matching (v0.6.4.1) (#169 )

2026-03-18 08:06:46 -05:00

guard

feat: safety hook skills + skill usage telemetry (v0.7.1) (#189 )

2026-03-18 23:57:59 -05:00

investigate

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

land-and-deploy

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

office-hours

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

plan-ceo-review

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

plan-design-review

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

plan-eng-review

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

qa-only

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

retro

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

review

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

scripts

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

setup-browser-cookies

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

setup-deploy

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

ship

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

supabase

feat: opt-in usage telemetry + community intelligence platform (v0.8.6) (#210 )

2026-03-19 17:21:05 -07:00

test

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

unfreeze

feat: safety hook skills + skill usage telemetry (v0.7.1) (#189 )

2026-03-18 23:57:59 -05:00

.env.example

feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41 )

2026-03-13 21:08:12 -07:00

.gitignore

feat: /codex skill — multi-AI second opinion + proactive suggestions (#197 )

2026-03-19 00:22:52 -05:00

AGENTS.md

feat: multi-agent support — gstack works on Codex, Gemini CLI, and Cursor (v0.9.0) (#226 )

2026-03-19 18:20:50 -07:00

ARCHITECTURE.md

feat: /land-and-deploy, /canary, /benchmark + perf review (v0.7.0) (#183 )

2026-03-21 14:31:36 -07:00

BROWSER.md

docs: rewrite README + skills docs, auto-invoke /document-release (v0.8.4) (#207 )

2026-03-19 01:38:54 -05:00

CHANGELOG.md

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

CLAUDE.md

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

conductor.json

feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41 )

2026-03-13 21:08:12 -07:00

CONTRIBUTING.md

feat: /land-and-deploy, /canary, /benchmark + perf review (v0.7.0) (#183 )

2026-03-21 14:31:36 -07:00

ETHOS.md

feat: Search Before Building — builder ethos + skill integrations (v0.9.5.0) (#298 )

2026-03-21 11:46:06 -07:00

LICENSE

Initial release — gstack v0.0.1

2026-03-12 01:32:16 -07:00

package.json

docs: v0.9.8.0 — deploy pipeline docs + pre-merge readiness gate (#306 )

2026-03-21 14:54:26 -07:00

README.md

docs: v0.9.8.0 — deploy pipeline docs + pre-merge readiness gate (#306 )

2026-03-21 14:54:26 -07:00

setup

feat: Search Before Building — builder ethos + skill integrations (v0.9.5.0) (#298 )

2026-03-21 11:46:06 -07:00

SKILL.md

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

SKILL.md.tmpl

feat: /autoplan — auto-review pipeline (v0.10.0.0) (#327 )

2026-03-22 10:21:52 -07:00

TODOS.md

docs: v0.9.8.0 — deploy pipeline docs + pre-merge readiness gate (#306 )

2026-03-21 14:54:26 -07:00

VERSION

feat: test coverage catalog — shared audit across plan/ship/review (v0.10.1.0) (#259 )

2026-03-22 11:28:16 -07:00

README.md

gstack

Hi, I'm Garry Tan. I'm President & CEO of Y Combinator, where I've worked with thousands of startups including Coinbase, Instacart, and Rippling when the founders were just one or two people in a garage — companies now worth tens of billions of dollars. Before YC, I designed the Palantir logo and was one of the first eng manager/PM/designers there. I cofounded Posterous, a blog platform we sold to Twitter. I built Bookface, YC's internal social network, back in 2013. I've been building products as a designer, PM, and eng manager for a long time.

And right now I am in the middle of something that feels like a new era entirely.

In the last 60 days I have written over 600,000 lines of production code — 35% tests — and I am doing 10,000 to 20,000 usable lines of code per day as a part-time part of my day while doing all my duties as CEO of YC. That is not a typo. My last /retro (developer stats from the last 7 days) across 3 projects: 140,751 lines added, 362 commits, ~115k net LOC. The models are getting dramatically better every week. We are at the dawn of something real — one person shipping at a scale that used to require a team of twenty.

2026 — 1,237 contributions and counting:

2013 — when I built Bookface at YC (772 contributions):

Same person. Different era. The difference is the tooling.

gstack is how I do it. It is my open source software factory. It turns Claude Code into a virtual engineering team you actually manage — a CEO who rethinks the product, an eng manager who locks the architecture, a designer who catches AI slop, a paranoid reviewer who finds production bugs, a QA lead who opens a real browser and clicks through your app, and a release engineer who ships the PR. Eighteen specialists and seven power tools, all as slash commands, all Markdown, all free, MIT license, available right now.

I am learning how to get to the edge of what agentic systems can do as of March 2026, and this is my live experiment. I am sharing it because I want the whole world on this journey with me.

Fork it. Improve it. Make it yours. Don't player hate, appreciate.

Who this is for:

Founders and CEOs — especially technical ones who still want to ship. This is how you build like a team of twenty.
First-time Claude Code users — gstack is the best way to start. Structured roles instead of a blank prompt.
Tech leads and staff engineers — bring rigorous review, QA, and release automation to every PR

Quick start: your first 10 minutes

Install gstack (30 seconds — see below)
Run /office-hours — describe what you're building. It will reframe the problem before you write a line of code.
Run /plan-ceo-review on any feature idea
Run /review on any branch with changes
Run /qa on your staging URL
Stop there. You'll know if this is for you.

Expect first useful run in under 5 minutes on any repo with tests already set up.

If you only read one more section, read this one.

Install — takes 30 seconds

Requirements: Claude Code, Git, Bun v1.0+, Node.js (Windows only)

Step 1: Install on your machine

Open Claude Code and paste this. Claude does the rest.

Install gstack: run git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp__claude-in-chrome__* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade. Then ask the user if they also want to add gstack to the current project so teammates get it.

Step 2: Add to your repo so teammates get it (optional)

Add gstack to this project: run cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp__claude-in-chrome__* tools, lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, and tells Claude that if gstack skills aren't working, run cd .claude/skills/gstack && ./setup to build the binary and register skills.

Real files get committed to your repo (not a submodule), so git clone just works. Everything lives inside .claude/. Nothing touches your PATH or runs in the background.

Codex, Gemini CLI, or Cursor

gstack works on any agent that supports the SKILL.md standard. Skills live in .agents/skills/ and are discovered automatically.

git clone https://github.com/garrytan/gstack.git ~/.codex/skills/gstack
cd ~/.codex/skills/gstack && ./setup --host codex

Or let setup auto-detect which agents you have installed:

git clone https://github.com/garrytan/gstack.git ~/gstack
cd ~/gstack && ./setup --host auto

This installs to ~/.claude/skills/gstack and/or ~/.codex/skills/gstack depending on what's available. All 25 skills work across all supported agents. Hook-based safety skills (careful, freeze, guard) use inline safety advisory prose on non-Claude hosts.

See it work

You:    I want to build a daily briefing app for my calendar.
You:    /office-hours
Claude: [asks about the pain — specific examples, not hypotheticals]

You:    Multiple Google calendars, events with stale info, wrong locations.
        Prep takes forever and the results aren't good enough...

Claude: I'm going to push back on the framing. You said "daily briefing
        app." But what you actually described is a personal chief of
        staff AI.
        [extracts 5 capabilities you didn't realize you were describing]
        [challenges 4 premises — you agree, disagree, or adjust]
        [generates 3 implementation approaches with effort estimates]
        RECOMMENDATION: Ship the narrowest wedge tomorrow, learn from
        real usage. The full vision is a 3-month project — start with
        the daily briefing that actually works.
        [writes design doc → feeds into downstream skills automatically]

You:    /plan-ceo-review
        [reads the design doc, challenges scope, runs 10-section review]

You:    /plan-eng-review
        [ASCII diagrams for data flow, state machines, error paths]
        [test matrix, failure modes, security concerns]

You:    Approve plan. Exit plan mode.
        [writes 2,400 lines across 11 files. ~8 minutes.]

You:    /review
        [AUTO-FIXED] 2 issues. [ASK] Race condition → you approve fix.

You:    /qa https://staging.myapp.com
        [opens real browser, clicks through flows, finds and fixes a bug]

You:    /ship
        Tests: 42 → 51 (+9 new). PR: github.com/you/app/pull/42

You said "daily briefing app." The agent said "you're building a chief of staff AI" — because it listened to your pain, not your feature request. Then it challenged your premises, generated three approaches, recommended the narrowest wedge, and wrote a design doc that fed into every downstream skill. Eight commands. That is not a copilot. That is a team.

The sprint

gstack is a process, not a collection of tools. The skills are ordered the way a sprint runs:

Think → Plan → Build → Review → Test → Ship → Reflect

Each skill feeds into the next. /office-hours writes a design doc that /plan-ceo-review reads. /plan-eng-review writes a test plan that /qa picks up. /review catches bugs that /ship verifies are fixed. Nothing falls through the cracks because every step knows what came before it.

One sprint, one person, one feature — that takes about 30 minutes with gstack. But here's what changes everything: you can run 10-15 of these sprints in parallel. Different features, different branches, different agents — all at the same time. That is how I ship 10,000+ lines of production code per day while doing my actual job.

Skill	Your specialist	What they do
`/office-hours`	YC Office Hours	Start here. Six forcing questions that reframe your product before you write code. Pushes back on your framing, challenges premises, generates implementation alternatives. Design doc feeds into every downstream skill.
`/plan-ceo-review`	CEO / Founder	Rethink the problem. Find the 10-star product hiding inside the request. Four modes: Expansion, Selective Expansion, Hold Scope, Reduction.
`/plan-eng-review`	Eng Manager	Lock in architecture, data flow, diagrams, edge cases, and tests. Forces hidden assumptions into the open.
`/plan-design-review`	Senior Designer	Rates each design dimension 0-10, explains what a 10 looks like, then edits the plan to get there. AI Slop detection. Interactive — one AskUserQuestion per design choice.
`/design-consultation`	Design Partner	Build a complete design system from scratch. Knows the landscape, proposes creative risks, generates realistic product mockups. Design at the heart of all other phases.
`/review`	Staff Engineer	Find the bugs that pass CI but blow up in production. Auto-fixes the obvious ones. Flags completeness gaps.
`/investigate`	Debugger	Systematic root-cause debugging. Iron Law: no fixes without investigation. Traces data flow, tests hypotheses, stops after 3 failed fixes.
`/design-review`	Designer Who Codes	Same audit as /plan-design-review, then fixes what it finds. Atomic commits, before/after screenshots.
`/qa`	QA Lead	Test your app, find bugs, fix them with atomic commits, re-verify. Auto-generates regression tests for every fix.
`/qa-only`	QA Reporter	Same methodology as /qa but report only. Use when you want a pure bug report without code changes.
`/ship`	Release Engineer	Sync main, run tests, audit coverage, push, open PR. Bootstraps test frameworks if you don't have one. One command.
`/land-and-deploy`	Release Engineer	Merge the PR, wait for CI and deploy, verify production health. Takes over after `/ship`. One command from "approved" to "verified in production."
`/canary`	SRE	Post-deploy monitoring loop. Watches for console errors, performance regressions, and page failures. Periodic screenshots and anomaly detection.
`/benchmark`	Performance Engineer	Baseline page load times, Core Web Vitals, and resource sizes. Compare before/after on every PR. Catch bundle size regressions before they ship.
`/document-release`	Technical Writer	Update all project docs to match what you just shipped. Catches stale READMEs automatically.
`/retro`	Eng Manager	Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities.
`/browse`	QA Engineer	Give the agent eyes. Real Chromium browser, real clicks, real screenshots. ~100ms per command.
`/setup-browser-cookies`	Session Manager	Import cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages.

Power tools

Skill	What it does
`/codex`	Second Opinion — independent code review from OpenAI Codex CLI. Three modes: review (pass/fail gate), adversarial challenge, and open consultation. Cross-model analysis when both `/review` and `/codex` have run.
`/careful`	Safety Guardrails — warns before destructive commands (rm -rf, DROP TABLE, force-push). Say "be careful" to activate. Override any warning.
`/freeze`	Edit Lock — restrict file edits to one directory. Prevents accidental changes outside scope while debugging.
`/guard`	Full Safety — `/careful` + `/freeze` in one command. Maximum safety for prod work.
`/unfreeze`	Unlock — remove the `/freeze` boundary.
`/setup-deploy`	Deploy Configurator — one-time setup for `/land-and-deploy`. Detects your platform, production URL, and deploy commands.
`/gstack-upgrade`	Self-Updater — upgrade gstack to latest. Detects global vs vendored install, syncs both, shows what changed.

Deep dives with examples and philosophy for every skill →

What's new and why it matters

/office-hours reframes your product before you write code. You say "daily briefing app." It listens to your actual pain, pushes back on the framing, tells you you're really building a personal chief of staff AI, challenges your premises, and generates three implementation approaches with effort estimates. The design doc it writes feeds directly into /plan-ceo-review and /plan-eng-review — so every downstream skill starts with real clarity instead of a vague feature request.

Design is at the heart. /design-consultation doesn't just pick fonts. It researches what's out there in your space, proposes safe choices AND creative risks, generates realistic mockups of your actual product, and writes DESIGN.md — and then /design-review and /plan-eng-review read what you chose. Design decisions flow through the whole system.

/qa was a massive unlock. It let me go from 6 to 12 parallel workers. Claude Code saying "I SEE THE ISSUE" and then actually fixing it, generating a regression test, and verifying the fix — that changed how I work. The agent has eyes now.

Smart review routing. Just like at a well-run startup: CEO doesn't have to look at infra bug fixes, design review isn't needed for backend changes. gstack tracks what reviews are run, figures out what's appropriate, and just does the smart thing. The Review Readiness Dashboard tells you where you stand before you ship.

Test everything. /ship bootstraps test frameworks from scratch if your project doesn't have one. Every /ship run produces a coverage audit. Every /qa bug fix generates a regression test. 100% test coverage is the goal — tests make vibe coding safe instead of yolo coding.

Ship to production in one command. /land-and-deploy picks up where /ship left off — merges your PR, waits for CI and deploy, then runs canary verification on your production URL. Auto-detects Fly.io, Render, Vercel, Netlify, Heroku, or GitHub Actions. If something breaks, it offers a revert. Pair with /canary for extended post-deploy monitoring and /benchmark to catch performance regressions before they ship.

/document-release is the engineer you never had. It reads every doc file in your project, cross-references the diff, and updates everything that drifted. README, ARCHITECTURE, CONTRIBUTING, CLAUDE.md, TODOS — all kept current automatically. And now /ship auto-invokes it — docs stay current without an extra command.

Browser handoff when the AI gets stuck. Hit a CAPTCHA, auth wall, or MFA prompt? $B handoff opens a visible Chrome at the exact same page with all your cookies and tabs intact. Solve the problem, tell Claude you're done, $B resume picks up right where it left off. The agent even suggests it automatically after 3 consecutive failures.

Multi-AI second opinion. /codex gets an independent review from OpenAI's Codex CLI — a completely different AI looking at the same diff. Three modes: code review with a pass/fail gate, adversarial challenge that actively tries to break your code, and open consultation with session continuity. When both /review (Claude) and /codex (OpenAI) have reviewed the same branch, you get a cross-model analysis showing which findings overlap and which are unique to each.

Safety guardrails on demand. Say "be careful" and /careful warns before any destructive command — rm -rf, DROP TABLE, force-push, git reset --hard. /freeze locks edits to one directory while debugging so Claude can't accidentally "fix" unrelated code. /guard activates both. /investigate auto-freezes to the module being investigated.

Proactive skill suggestions. gstack notices what stage you're in — brainstorming, reviewing, debugging, testing — and suggests the right skill. Don't like it? Say "stop suggesting" and it remembers across sessions.

10-15 parallel sprints

gstack is powerful with one sprint. It is transformative with ten running at once.

Conductor runs multiple Claude Code sessions in parallel — each in its own isolated workspace. One session running /office-hours on a new idea, another doing /review on a PR, a third implementing a feature, a fourth running /qa on staging, and six more on other branches. All at the same time. I regularly run 10-15 parallel sprints — that's the practical max right now.

The sprint structure is what makes parallelism work. Without a process, ten agents is ten sources of chaos. With a process — think, plan, build, review, test, ship — each agent knows exactly what to do and when to stop. You manage them the way a CEO manages a team: check in on the decisions that matter, let the rest run.

Come ride the wave

This is free, MIT licensed, open source, available now. No premium tier. No waitlist. No strings.

I open sourced how I do development and I am actively upgrading my own software factory here. You can fork it and make it your own. That's the whole point. I want everyone on this journey.

Same tools, different outcome — because gstack gives you structured roles and review gates, not generic agent chaos. That governance is the difference between shipping fast and shipping reckless.

The models are getting better fast. The people who figure out how to work with them now — really work with them, not just dabble — are going to have a massive advantage. This is that window. Let's go.

Eighteen specialists and seven power tools. All slash commands. All Markdown. All free. github.com/garrytan/gstack — MIT License

We're hiring. Want to ship 10K+ LOC/day and help harden gstack? Come work at YC — ycombinator.com/software Extremely competitive salary and equity. San Francisco, Dogpatch District.

Docs

Doc	What it covers
Skill Deep Dives	Philosophy, examples, and workflow for every skill (includes Greptile integration)
Builder Ethos	Builder philosophy: Boil the Lake, Search Before Building, three layers of knowledge
Architecture	Design decisions and system internals
Browser Reference	Full command reference for `/browse`
Contributing	Dev setup, testing, contributor mode, and dev mode
Changelog	What's new in every version

Privacy & Telemetry

gstack includes opt-in usage telemetry to help improve the project. Here's exactly what happens:

Default is off. Nothing is sent anywhere unless you explicitly say yes.
On first run, gstack asks if you want to share anonymous usage data. You can say no.
What's sent (if you opt in): skill name, duration, success/fail, gstack version, OS. That's it.
What's never sent: code, file paths, repo names, branch names, prompts, or any user-generated content.
Change anytime: gstack-config set telemetry off disables everything instantly.

Data is stored in Supabase (open source Firebase alternative). The schema is in supabase/migrations/001_telemetry.sql — you can verify exactly what's collected. The Supabase publishable key in the repo is a public key (like a Firebase API key) — row-level security policies restrict it to insert-only access.

Local analytics are always available. Run gstack-analytics to see your personal usage dashboard from the local JSONL file — no remote data needed.

Troubleshooting

Skill not showing up? cd ~/.claude/skills/gstack && ./setup

/browse fails? cd ~/.claude/skills/gstack && bun install && bun run build

Stale install? Run /gstack-upgrade — or set auto_upgrade: true in ~/.gstack/config.yaml

Windows users: gstack works on Windows 11 via Git Bash or WSL. Node.js is required in addition to Bun — Bun has a known bug with Playwright's pipe transport on Windows (bun#4253). The browse server automatically falls back to Node.js. Make sure both bun and node are on your PATH.

Claude says it can't see the skills? Make sure your project's CLAUDE.md has a gstack section. Add this:

## gstack
Use /browse from gstack for all web browsing. Never use mcp__claude-in-chrome__* tools.
Available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review,
/design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review,
/setup-browser-cookies, /retro, /investigate, /document-release, /codex, /careful,
/freeze, /guard, /unfreeze, /gstack-upgrade.

License

MIT. Free forever. Go build something.

Languages

TypeScript 75.7%

Go Template 14.5%

Shell 6.4%

JavaScript 2.1%

CSS 0.8%

Other 0.5%