mirror of https://github.com/garrytan/gstack.git synced 2026-05-01 19:25:10 +02:00


Garry Tan 6000af4589 feat: founder discovery engine + /debug skill — v0.7.0 (#185)

* feat: add escalation protocol to preamble — all skills get DONE/BLOCKED/NEEDS_CONTEXT

Every skill now reports completion status (DONE, DONE_WITH_CONCERNS, BLOCKED,
NEEDS_CONTEXT) and has escalation rules: 3 failed attempts → STOP, security
uncertainty → STOP, scope exceeds verification → STOP.
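
The escalation rules above can be sketched as a small decision function. This is only an illustration: the `RunState` fields and the choice to report every STOP rule as `BLOCKED` are assumptions, not taken from the gstack preamble.

```typescript
// The four completion statuses named in the commit message.
type Status = "DONE" | "DONE_WITH_CONCERNS" | "BLOCKED" | "NEEDS_CONTEXT";

// Hypothetical snapshot of a skill run; field names are illustrative only.
interface RunState {
  failedAttempts: number;            // failed attempts at the task so far
  securityUncertain: boolean;        // unsure whether a change is safe
  scopeExceedsVerification: boolean; // change is larger than what can be verified
  missingContext: boolean;           // needs information only the user has
  concerns: string[];                // caveats worth reporting on success
}

const MAX_FAILED_ATTEMPTS = 3;

function resolveStatus(state: RunState): Status {
  // Any of the three STOP rules halts the run (assumed to surface as BLOCKED).
  if (
    state.failedAttempts >= MAX_FAILED_ATTEMPTS ||
    state.securityUncertain ||
    state.scopeExceedsVerification
  ) {
    return "BLOCKED";
  }
  if (state.missingContext) return "NEEDS_CONTEXT";
  return state.concerns.length > 0 ? "DONE_WITH_CONCERNS" : "DONE";
}
```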

"It is always OK to stop and say 'this is too hard for me.'"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add verification gate to /ship (Step 6.5) — no push without fresh evidence

Before pushing, re-verify tests if code changed during review fixes.
Rationalization prevention: "Should work now" → RUN IT.
"I'm confident" → Confidence is not evidence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
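
One way to picture the /ship verification gate described above: test evidence only counts if it was produced against the commit being pushed. The sketch below assumes a hypothetical `TestEvidence` record keyed by commit SHA; the actual skill is written as Markdown instructions and may check freshness differently.

```typescript
// Hypothetical record of the most recent test run; not gstack's data model.
interface TestEvidence {
  commitSha: string; // commit the tests ran against
  passed: boolean;   // whether that run passed
}

// Returns true only when passing evidence exists for the exact commit at HEAD.
function canPush(headSha: string, evidence: TestEvidence | null): boolean {
  if (evidence === null) return false; // "should work now" is not evidence
  return evidence.passed && evidence.commitSha === headSha;
}
```

If review fixes add a commit after the last test run, `headSha` no longer matches and the gate stays closed until the tests are run again.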

* feat: add scope drift detection + verification of claims to /review

Step 1.5: Before reviewing code quality, check if the diff matches stated
intent. Flags scope creep and missing requirements (INFORMATIONAL).

Step 5 addition: Every review claim must cite evidence — "this pattern is
safe" needs a line reference, "tests cover this" needs a test name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: mandatory implementation alternatives + design doc lookup in /plan-ceo-review

Step 0C-bis: Every plan must consider 2-3 approaches (minimal viable vs ideal
architecture) before mode selection. RECOMMENDATION required.

Pre-Review System Audit now checks ~/.gstack/projects/ for /brainstorm design
docs (branch-filtered with fallback).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: design doc lookup in /plan-eng-review + fix branch name sanitization

Step 0 now checks ~/.gstack/projects/ for /brainstorm design docs
(branch-filtered with fallback, reads Supersedes: for revision context).
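
"Branch-filtered with fallback" could look roughly like the following. The sketch assumes design docs are `.md` files whose names contain the sanitized branch name; that naming scheme and the function name are assumptions for illustration.

```typescript
// Prefer design docs for the current branch; fall back to all docs if none match.
// `branchSlug` is assumed to be the branch name with '/' already replaced.
function selectDesignDocs(fileNames: string[], branchSlug: string): string[] {
  const docs = fileNames.filter((name) => name.endsWith(".md"));
  const forBranch = docs.filter((name) => name.includes(branchSlug));
  return forBranch.length > 0 ? forBranch : docs;
}
```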

Fix: branch names with '/' (e.g. garrytan/better-process) now get
sanitized via tr '/' '-' in test plan artifact filenames.
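
For reference, the same substitution that `tr '/' '-'` performs, written in TypeScript (the function name is hypothetical):

```typescript
// Replace every '/' in a branch name so it is safe to use inside a filename.
function sanitizeBranchName(branch: string): string {
  return branch.replace(/\//g, "-");
}

// sanitizeBranchName("garrytan/better-process") returns "garrytan-better-process"
```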

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: new /brainstorm and /debug skills

/brainstorm: Socratic design exploration before planning. Context gathering,
clarifying questions (smart-skip), related design discovery (keyword grep),
premise challenge, forced alternatives, design doc artifact with lineage
tracking (Supersedes: field). Writes to ~/.gstack/projects/$SLUG/.

/debug: Systematic root-cause debugging. Iron Law: no fixes without root
cause investigation. Pattern analysis, hypothesis testing with 3-strike
escalation, structured DEBUG REPORT output.
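
A minimal sketch of hypothesis testing with 3-strike escalation, assuming each hypothesis can be checked by a function that returns true when it explains the bug. The types and names are illustrative and are not the /debug skill's actual structure.

```typescript
// A candidate root cause plus a check that returns true if it is confirmed.
interface Hypothesis {
  description: string;
  test: () => boolean;
}

type DebugOutcome =
  | { kind: "root_cause"; hypothesis: string; attempts: number }
  | { kind: "escalate"; tried: string[] };

// Test hypotheses in order; stop and escalate once maxStrikes have failed.
function investigate(hypotheses: Hypothesis[], maxStrikes = 3): DebugOutcome {
  const tried: string[] = [];
  for (const h of hypotheses) {
    if (tried.length >= maxStrikes) break; // strike limit reached
    tried.push(h.description);
    if (h.test()) {
      return { kind: "root_cause", hypothesis: h.description, attempts: tried.length };
    }
  }
  // No hypothesis was confirmed: hand back what was tried instead of guessing.
  return { kind: "escalate", tried };
}
```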

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: structural tests for new skills + escalation protocol assertions

Add brainstorm + debug to skillsWithUpdateCheck and skillsWithPreamble arrays.
Add structural tests: brainstorm (Phase 1-6, Design Doc, Supersedes, Smart-skip),
debug (Iron Law, Root Cause, Pattern Analysis, Hypothesis, DEBUG REPORT, 3-strike).
Add escalation protocol tests (DONE_WITH_CONCERNS, BLOCKED, NEEDS_CONTEXT) for
all preamble skills.

Also: 2 new TODOs (design docs → Supabase sync, /plan-design-review skill),
update CLAUDE.md project structure with new skill directories.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.6.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: rename /brainstorm → /office-hours across references

Update CHANGELOG, CLAUDE.md, TODOS, design-consultation, plan-ceo-review,
and gen-skill-docs to reference the new office-hours skill name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: YC Office Hours — dual-mode product diagnostic + builder brainstorm

Rewrite /office-hours with two modes:

Startup mode: six forcing questions (Demand Reality, Status Quo, Desperate
Specificity, Narrowest Wedge, Observation & Surprise, Future-Fit) that push
founders toward radical honesty about demand, users, and product decisions.
Includes smart routing by product stage, intrapreneurship adaptation, and
YC apply CTA for strong-signal founders.

Builder mode: generative brainstorming for side projects, hackathons,
learning, and open source. Enthusiastic collaborator tone, design thinking
questions, no business interrogation.

Mode is determined by an explicit question in Phase 1 — no guessing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add 14 assertions for YC Office Hours content coverage

Validates dual-mode structure (Startup/Builder), all six forcing questions,
builder brainstorming content, intrapreneurship adaptation, YC apply CTA,
and operating principles for both modes. 192 tests total, all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.6.1

- README.md: added /office-hours and /debug to skills table, updated
  skill count from 13 to 15, added both to install instructions
- docs/skills.md: added /office-hours and /debug deep dive sections
- CLAUDE.md: updated office-hours description to reflect dual-mode
- CONTRIBUTING.md: updated skill count from 13 to 15
- CHANGELOG.md: added YC Office Hours and /debug entries to 0.6.0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: founder discovery engine in /office-hours (v0.7.0)

Turn /office-hours into a YC founder discovery engine. Every session now
ends with three beats: signal reflection (specific callbacks to what the
user said), "One more thing." transition, and a personal plea from Garry
Tan with three tiers based on founder signal strength. Top tier uses
AskUserQuestion to ask directly and opens ycombinator.com/apply?ref=gstack.

Adds Phase 4.5 (Founder Signal Synthesis), "What I noticed about how you
think" section to both design doc templates, anti-slop GOOD/BAD examples,
and emotional targets per tier.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add validation assertions for founder discovery engine

8 new assertions covering: YC apply CTA with ref=gstack tracking,
"What I noticed" design doc section, golden age framing, Garry Tan
personal plea, founder signal synthesis phase, three-tier decision
rubric, anti-slop GOOD/BAD examples, "One more thing" transition beat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update project documentation for v0.7.0

VERSION: 0.6.4.1 → 0.7.0
CHANGELOG: new entry — Office Hours Gets Personal
README: updated /office-hours and /plan-design-review descriptions
docs/skills.md: updated /office-hours table + deep dive section
TODOS.md: added /yc-prep skill TODO (P2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove duplicate Install section, fix stale skills lists, deduplicate CHANGELOG entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-18 11:19:04 -05:00

Name	Last commit	Date
.github/workflows	feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41)	2026-03-13 21:08:12 -07:00
bin	feat: design review lite in /review and /ship + gstack-diff-scope (v0.6.3) (#142)	2026-03-17 20:12:55 -05:00
browse	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
debug	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
design-consultation	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
design-review	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
docs	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
document-release	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
gstack-upgrade	feat: add trigger phrases to skill descriptions for better model matching (v0.6.4.1) (#169)	2026-03-18 08:06:46 -05:00
office-hours	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
plan-ceo-review	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
plan-design-review	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
plan-eng-review	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
qa-only	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
retro	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
review	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
scripts	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
setup-browser-cookies	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
ship	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
test	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
.env.example	feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41)	2026-03-13 21:08:12 -07:00
.gitignore	feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41)	2026-03-13 21:08:12 -07:00
ARCHITECTURE.md	feat: interactive /plan-design-review + CEO invokes designer + 100% coverage (v0.6.4) (#149)	2026-03-17 22:48:48 -05:00
BROWSER.md	feat: await support in browse js/eval + contributor mode v2 (#104)	2026-03-16 11:28:58 -05:00
CHANGELOG.md	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
CLAUDE.md	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
conductor.json	feat: SKILL.md template system, 3-tier testing, DX tools (v0.3.3) (#41)	2026-03-13 21:08:12 -07:00
CONTRIBUTING.md	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
LICENSE	Initial release — gstack v0.0.1	2026-03-12 01:32:16 -07:00
package.json	Merge pull request #93 from lucasbraud/fix/windows-build-glob	2026-03-17 23:55:35 -05:00
README.md	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
setup	Merge remote-tracking branch 'origin/main' into v0.3.6-qa-upgrades	2026-03-14 02:35:48 -05:00
SKILL.md	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
SKILL.md.tmpl	feat: show screenshots to user during QA and browse sessions (v0.5.0.1) (#129)	2026-03-17 10:30:19 -05:00
TODOS.md	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00
VERSION	feat: founder discovery engine + /debug skill — v0.7.0 (#185)	2026-03-18 11:19:04 -05:00

README.md

gstack

Hi, I'm Garry Tan. I'm President & CEO of Y Combinator, where I've worked with thousands of startups, including Coinbase, Instacart, and Rippling, back when the founders were just one or two people in a garage. Those companies are now worth tens of billions of dollars. Before YC, I designed the Palantir logo and was one of the first eng manager/PM/designers there. I cofounded Posterous, a blog platform we sold to Twitter. I built Bookface, YC's internal social network, back in 2013. I've been building products as a designer, PM, and eng manager for a long time.

And right now I am in the middle of something that feels like a new era entirely.

In the last 60 days I have written over 600,000 lines of production code (35% tests), and I am shipping 10,000 to 20,000 usable lines of code per day, part-time, while doing all my duties as CEO of YC. That is not a typo. My last /retro (developer stats from the last 7 days) across 3 projects: 140,751 lines added, 362 commits, ~115k net LOC. The models are getting dramatically better every week. We are at the dawn of something real: one person shipping at a scale that used to require a team of twenty.

[Image: 2026, 1,237 contributions and counting]

[Image: 2013, when I built Bookface at YC (772 contributions)]

Same person. Different era. The difference is the tooling.

gstack is how I do it. It is my open source software factory. It turns Claude Code into a virtual engineering team you actually manage — a CEO who rethinks the product, an eng manager who locks the architecture, a designer who catches AI slop, a paranoid reviewer who finds production bugs, a QA lead who opens a real browser and clicks through your app, and a release engineer who ships the PR. Fifteen specialists, all as slash commands, all Markdown, all free, MIT license, available right now.

I am learning how to get to the edge of what agentic systems can do as of March 2026, and this is my live experiment. I am sharing it because I want the whole world on this journey with me.

Fork it. Improve it. Make it yours. Don't player hate, appreciate.

Who this is for:

- Founders and CEOs, especially technical ones who still want to ship. This is how you build like a team of twenty.
- First-time Claude Code users. gstack is the best way to start: structured roles instead of a blank prompt.
- Tech leads and staff engineers. Bring rigorous review, QA, and release automation to every PR.

Quick start: your first 10 minutes

1. Install gstack (30 seconds; see below)
2. Run /plan-ceo-review on any feature idea
3. Run /review on any branch with changes
4. Run /qa on your staging URL
5. Stop there. You'll know if this is for you.

Expect your first useful run in under 5 minutes on any repo that already has tests set up.

If you only read one more section, read this one.

Install — takes 30 seconds

Requirements: Claude Code, Git, Bun v1.0+

Step 1: Install on your machine

Open Claude Code and paste this. Claude does the rest.

Install gstack: run git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp__claude-in-chrome__* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /retro, /debug, /document-release. Then ask the user if they also want to add gstack to the current project so teammates get it.

Step 2: Add to your repo so teammates get it (optional)

Add gstack to this project: run cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp__claude-in-chrome__* tools, lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review, /setup-browser-cookies, /retro, /debug, /document-release, and tells Claude that if gstack skills aren't working, run cd .claude/skills/gstack && ./setup to build the binary and register skills.

Real files get committed to your repo (not a submodule), so git clone just works. Everything lives inside .claude/. Nothing touches your PATH or runs in the background.

See it work

You:    I want to add photo upload for sellers.
You:    /plan-ceo-review
Claude: "Photo upload" is not the feature. The real job is helping
        sellers create listings that actually sell. What if we
        auto-identify the product, pull specs and comps from the
        web, and draft the listing automatically? That's 10 stars.
        "Upload a photo" is 3 stars. Which are we building?
        [8 expansion proposals, you cherry-pick 5, defer 3 to backlog]

You:    /plan-design-review
Claude: Design Score: B  |  AI Slop Score: C
        "Upload flow looks like a default Bootstrap form."
        [80-item audit, infers your design system, exports DESIGN.md]
        [flags 3 AI slop patterns: gradient hero, icon grid, uniform radius]

You:    /plan-eng-review
Claude: ┌─────────┐    ┌──────────┐    ┌──────────┐    ┌─────────┐
        │ Upload  │───▶│ Classify │───▶│ Enrich   │───▶│ Draft   │
        │ (sync)  │    │ (async)  │    │ (async)  │    │ (async) │
        └─────────┘    └──────────┘    └──────────┘    └─────────┘
        [ASCII diagrams for every data flow, state machine, error path]
        [14-case test matrix, 6 failure modes mapped, 3 security concerns]

You:    Approve plan. Exit plan mode.
        [Claude writes 2,400 lines across 11 files — models, services,
         controllers, views, migrations, and tests. ~8 minutes.]

You:    /review
Claude: [AUTO-FIXED] Orphan S3 cleanup on failed upload
        [AUTO-FIXED] Missing index on listings.status
        [ASK] Race condition on hero image selection → You: yes
        [traces every new enum value through all switch statements]
        3 issues — 2 auto-fixed, 1 fixed.

You:    /qa https://staging.myapp.com
Claude: [opens real browser, logs in, uploads photos, clicks through flows]
        Upload → classify → enrich → draft: end to end ✓
        Mobile: ✓  |  Slow connection: ✓  |  Bad image: ✓
        [finds bug: preview doesn't clear on second upload — fixes it]
        Regression test generated.

You:    /ship
Claude: Tests: 42 → 51 (+9 new)
        Coverage: 14/14 code paths (100%)
        PR: github.com/you/app/pull/42

One feature. Seven commands. The agent reframed the product, ran an 80-item design audit, drew the architecture, wrote 2,400 lines of code, found a race condition I would have missed, auto-fixed two issues, opened a real browser to QA test, found and fixed a bug I didn't know about, wrote 9 tests, and generated a regression test. That is not a copilot. That is a team.

The team

Skill	Your specialist	What they do
`/plan-ceo-review`	CEO / Founder	Rethink the problem. Find the 10-star product hiding inside the request. Four modes: Expansion, Selective Expansion, Hold Scope, Reduction.
`/plan-eng-review`	Eng Manager	Lock in architecture, data flow, diagrams, edge cases, and tests. Forces hidden assumptions into the open.
`/plan-design-review`	Senior Designer	Rates each design dimension 0-10, explains what a 10 looks like, then edits the plan to get there. AI Slop detection. Interactive — one AskUserQuestion per design choice.
`/design-consultation`	Design Partner	Build a complete design system from scratch. Knows the landscape, proposes creative risks, generates realistic product mockups. Design at the heart of all other phases.
`/review`	Staff Engineer	Find the bugs that pass CI but blow up in production. Auto-fixes the obvious ones. Flags completeness gaps.
`/ship`	Release Engineer	Sync main, run tests, audit coverage, push, open PR. Bootstraps test frameworks if you don't have one. One command.
`/browse`	QA Engineer	Give the agent eyes. Real Chromium browser, real clicks, real screenshots. ~100ms per command.
`/qa`	QA Lead	Test your app, find bugs, fix them with atomic commits, re-verify. Auto-generates regression tests for every fix.
`/qa-only`	QA Reporter	Same methodology as /qa but report only. Use when you want a pure bug report without code changes.
`/design-review`	Designer Who Codes	Same audit as /plan-design-review, then fixes what it finds. Atomic commits, before/after screenshots.
`/setup-browser-cookies`	Session Manager	Import cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages.
`/retro`	Eng Manager	Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities.
`/office-hours`	YC Office Hours	Two modes. Startup: six forcing questions on demand, users, and product. Builder: brainstorming for side projects, hackathons, and learning. Writes a design doc with personal observations about how you think.
`/debug`	Debugger	Systematic root-cause debugging. Iron Law: no fixes without investigation. Traces data flow, tests hypotheses, stops after 3 failed fixes.
`/document-release`	Technical Writer	Update all project docs to match what you just shipped. Catches stale READMEs automatically.

Deep dives with examples and philosophy for every skill →

What's new and why it matters

Design is at the heart. /design-consultation doesn't just pick fonts. It researches what's out there in your space, proposes safe choices AND creative risks, generates realistic mockups of your actual product, and writes DESIGN.md — and then /design-review and /plan-eng-review read what you chose. Design decisions flow through the whole system.

/qa was a massive unlock. It let me go from 6 to 12 parallel workers. Claude Code saying "I SEE THE ISSUE" and then actually fixing it, generating a regression test, and verifying the fix — that changed how I work. The agent has eyes now.

Smart review routing. Just like at a well-run startup: the CEO doesn't have to look at infra bug fixes, and design review isn't needed for backend changes. gstack tracks which reviews have run, figures out what's appropriate, and just does the smart thing. The Review Readiness Dashboard tells you where you stand before you ship.
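
To make the routing idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: the `Change` shape, the file-extension heuristic for frontend work, and the rule that eng review always applies. gstack's real routing logic may differ.

```typescript
type Review = "eng" | "ceo" | "design";

// Hypothetical description of a change; not gstack's actual input format.
interface Change {
  kind: "feature" | "bugfix";
  paths: string[]; // files touched by the change
}

// Assumed heuristic: these extensions indicate user-facing frontend work.
const FRONTEND = /\.(css|scss|html|tsx|jsx|vue)$/;

function requiredReviews(change: Change): Review[] {
  const reviews: Review[] = ["eng"]; // assumption: eng review always applies
  if (change.kind === "feature") reviews.push("ceo"); // bug fixes skip the CEO
  if (change.paths.some((p) => FRONTEND.test(p))) reviews.push("design"); // backend-only skips design
  return reviews;
}
```

Under these assumptions, an infra bug fix touching only `infra/deploy.ts` needs just the eng review, while a feature touching `app/upload.tsx` picks up all three.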

Test everything. /ship bootstraps test frameworks from scratch if your project doesn't have one. Every /ship run produces a coverage audit. Every /qa bug fix generates a regression test. 100% test coverage is the goal — tests make vibe coding safe instead of yolo coding.

/document-release is the engineer you never had. It reads every doc file in your project, cross-references the diff, and updates everything that drifted. README, ARCHITECTURE, CONTRIBUTING, CLAUDE.md, TODOS — all kept current automatically.

10 sessions at once

gstack is powerful with one session. It is transformative with ten.

Conductor runs multiple Claude Code sessions in parallel — each in its own isolated workspace. One session running /qa on staging, another doing /review on a PR, a third implementing a feature, and seven more on other branches. All at the same time.

One person, ten parallel agents, each with the right cognitive mode. That is a different way of building software.

Come ride the wave

This is free, MIT licensed, open source, available now. No premium tier. No waitlist. No strings.

I open sourced how I do development and I am actively upgrading my own software factory here. You can fork it and make it your own. That's the whole point. I want everyone on this journey.

Same tools, different outcome — because gstack gives you structured roles and review gates, not generic agent chaos. That governance is the difference between shipping fast and shipping reckless.

The models are getting better fast. The people who figure out how to work with them now — really work with them, not just dabble — are going to have a massive advantage. This is that window. Let's go.

Fifteen specialists. All slash commands. All Markdown. All free. github.com/garrytan/gstack — MIT License

We're hiring. Want to ship 10K+ LOC/day and help harden gstack? Come work at YC: ycombinator.com/software. Extremely competitive salary and equity. San Francisco, Dogpatch District.

Docs

Doc	What it covers
Skill Deep Dives	Philosophy, examples, and workflow for every skill (includes Greptile integration)
Architecture	Design decisions and system internals
Browser Reference	Full command reference for `/browse`
Contributing	Dev setup, testing, contributor mode, and dev mode
Changelog	What's new in every version

Troubleshooting

Skill not showing up? cd ~/.claude/skills/gstack && ./setup

/browse fails? cd ~/.claude/skills/gstack && bun install && bun run build

Stale install? Run /gstack-upgrade — or set auto_upgrade: true in ~/.gstack/config.yaml

Claude says it can't see the skills? Make sure your project's CLAUDE.md has a gstack section. Add this:

## gstack
Use /browse from gstack for all web browsing. Never use mcp__claude-in-chrome__* tools.
Available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review,
/design-consultation, /review, /ship, /browse, /qa, /qa-only, /design-review,
/setup-browser-cookies, /retro, /debug, /document-release.

License

MIT. Free forever. Go build something.

Languages

TypeScript 75.7%, Go Template 14.5%, Shell 6.4%, JavaScript 2.1%, CSS 0.8%, Other 0.5%