mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 11:45:20 +02:00
a1a933614c
* feat: CDP inspector module — persistent sessions, CSS cascade, style modification New browse/src/cdp-inspector.ts with full CDP inspection engine: - inspectElement() via CSS.getMatchedStylesForNode + DOM.getBoxModel - modifyStyle() via CSS.setStyleTexts with headless page.evaluate fallback - Persistent CDP session lifecycle (create, reuse, detach on nav, re-create) - Specificity sorting, overridden property detection, UA rule filtering - Modification history with undo support - formatInspectorResult() for CLI output Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: browse server inspector endpoints + inspect/style/cleanup/prettyscreenshot CLI Server endpoints: POST /inspector/pick, GET /inspector, POST /inspector/apply, POST /inspector/reset, GET /inspector/history, GET /inspector/events (SSE). CLI commands: inspect (CDP cascade), style (live CSS mod), cleanup (page clutter removal), prettyscreenshot (clean screenshot pipeline). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar CSS inspector — element picker, box model, rule cascade, quick edit Extension changes for the visual CSS inspector: - inspector.js: element picker with hover highlight, CSS selector generation, basic mode fallback (getComputedStyle + CSSOM), page alteration handlers - inspector.css: picker overlay styles (blue highlight + tooltip) - background.js: inspector message routing (picker <-> server <-> sidepanel) - sidepanel: Inspector tab with box model viz (gstack palette), matched rules with specificity badges, computed styles, click-to-edit quick edit, Send to Agent/Code button, empty/loading/error states Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: document inspect, style, cleanup, prettyscreenshot browse commands Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: auto-track user-created tabs and handle tab close browser-manager.ts changes: - context.on('page') listener: automatically tracks tabs opened by the user (Cmd+T, right-click open in new tab, window.open). Previously only programmatic newTab() was tracked, so user tabs were invisible. - page.on('close') handler in wirePageEvents: removes closed tabs from the pages map and switches activeTabId to the last remaining tab. - syncActiveTabByUrl: match Chrome extension's active tab URL to the correct Playwright page for accurate tab identity. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: per-tab agent isolation via BROWSE_TAB environment variable Prevents parallel sidebar agents from interfering with each other's tab context. Three-layer fix: - sidebar-agent.ts: passes BROWSE_TAB=<tabId> env var to each claude process, per-tab processing set allows concurrent agents across tabs - cli.ts: reads process.env.BROWSE_TAB and includes tabId in command request body - server.ts: handleCommand() temporarily switches activeTabId when tabId is present, restores after command completes (safe: Bun event loop is single-threaded) Also: per-tab agent state (TabAgentState map), per-tab message queuing, per-tab chat buffers, verbose streaming narration, stop button endpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar per-tab chat context, tab bar sync, stop button, UX polish Extension changes: - sidepanel.js: per-tab chat history (tabChatHistories map), switchChatTab() swaps entire chat view, browserTabActivated handler for instant tab sync, stop button wired to /sidebar-agent/stop, pollTabs renders tab bar - sidepanel.html: updated banner text ("Browser co-pilot"), stop button markup, input placeholder "Ask about this page..." - sidepanel.css: tab bar styles, stop button styles, loading state fixes - background.js: chrome.tabs.onActivated sends browserTabActivated to sidepanel with tab URL for instant tab switch detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: per-tab isolation, BROWSE_TAB pinning, tab tracking, sidebar UX sidebar-agent.test.ts (new tests): - BROWSE_TAB env var passed to claude process - CLI reads BROWSE_TAB and sends tabId in body - handleCommand accepts tabId, saves/restores activeTabId - Tab pinning only activates when tabId provided - Per-tab agent state, queue, concurrency - processingTabs set for parallel agents sidebar-ux.test.ts (new tests): - context.on('page') tracks user-created tabs - page.on('close') removes tabs from pages map - Tab isolation uses BROWSE_TAB not system prompt hack - Per-tab chat context in sidepanel - Tab bar rendering, stop button, banner text Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve merge conflicts — keep security defenses + per-tab isolation Merged main's security improvements (XML escaping, prompt injection defense, allowed commands whitelist, --model opus, Write tool, stderr capture) with our branch's per-tab isolation (BROWSE_TAB env var, processingTabs set, no --resume). Updated test expectations for expanded system prompt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.13.9.0) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: add inspector message types to background.js allowlist Pre-existing bug found by Codex: ALLOWED_TYPES in background.js was missing all inspector message types (startInspector, stopInspector, elementPicked, pickerCancelled, applyStyle, toggleClass, injectCSS, resetAll, inspectResult). Messages were silently rejected, making the inspector broken on ALL pages. Also: separate executeScript and insertCSS into individual try blocks in injectInspector(), store inspectorMode for routing, and add content.js fallback when script injection fails (CSP, chrome:// pages). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: basic element picker in content.js for CSP-restricted pages When inspector.js can't be injected (CSP, chrome:// pages), content.js provides a basic picker using getComputedStyle + CSSOM: - startBasicPicker/stopBasicPicker message handlers - captureBasicData() with ~30 key CSS properties, box model, matched rules - Hover highlight with outline save/restore (never leaves artifacts) - Click uses e.target directly (no re-querying by selector) - Sends inspectResult with mode:'basic' for sidebar rendering - Escape key cancels picker and restores outlines Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cleanup + screenshot buttons in sidebar inspector toolbar Two action buttons in the inspector toolbar: - Cleanup (🧹): POSTs cleanup --all to server, shows spinner, chat notification on success, resets inspector state (element may be removed) - Screenshot (📸): POSTs screenshot to server, shows spinner, chat notification with saved file path Shared infrastructure: - .inspector-action-btn CSS with loading spinner via ::after pseudo-element - chat-notification type in addChatEntry() for system messages - package.json version bump to 0.13.9.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: inspector allowlist, CSP fallback, cleanup/screenshot buttons 16 new tests in sidebar-ux.test.ts: - Inspector message allowlist includes all inspector types - content.js basic picker (startBasicPicker, captureBasicData, CSSOM, outline save/restore, inspectResult with mode basic, Escape cleanup) - background.js CSP fallback (separate try blocks, inspectorMode, fallback) - Cleanup button (POST /command, inspector reset after success) - Screenshot button (POST /command, notification rendering) - Chat notification type and CSS styles Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.13.9.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: cleanup + screenshot buttons in chat toolbar (not just inspector) Quick actions toolbar (🧹 Cleanup, 📸 Screenshot) now appears above the chat input, always visible. Both inspector and chat buttons share runCleanup() and runScreenshot() helper functions. Clicking either set shows loading state on both simultaneously. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: chat toolbar buttons, shared helpers, quick-action-btn styles Tests that chat toolbar exists (chat-cleanup-btn, chat-screenshot-btn, quick-actions container), CSS styles (.quick-action-btn, .quick-action-btn.loading), shared runCleanup/runScreenshot helper functions, and cleanup inspector reset. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: aggressive cleanup heuristics — overlays, scroll unlock, blur removal Massively expanded CLEANUP_SELECTORS with patterns from uBlock Origin and Readability.js research: - ads: 30+ selectors (Google, Amazon, Outbrain, Taboola, Criteo, etc.) - cookies: OneTrust, Cookiebot, TrustArc, Quantcast + generic patterns - overlays (NEW): paywalls, newsletter popups, interstitials, push prompts, app download banners, survey modals - social: follow prompts, share tools - Cleanup now defaults to --all when no args (sidebar button fix) - Uses !important on all display:none (overrides inline styles) - Unlocks body/html scroll (overflow:hidden from modal lockout) - Removes blur/filter effects (paywall content blur) - Removes max-height truncation (article teaser truncation) - Collapses empty ad placeholder whitespace (empty divs after ad removal) - Skips gstack-ctrl indicator in sticky removal Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: disable action buttons when disconnected, no error spam - setActionButtonsEnabled() toggles .disabled class on all cleanup/screenshot buttons (both chat toolbar and inspector toolbar) - Called with false in updateConnection when server URL is null - Called with true when connection established - runCleanup/runScreenshot silently return when disconnected instead of showing 'Not connected' error notifications - CSS .disabled style: pointer-events:none, opacity:0.3, cursor:not-allowed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: cleanup heuristics, button disabled state, overlay selectors 17 new tests: - cleanup defaults to --all on empty args - CLEANUP_SELECTORS overlays category (paywall, newsletter, interstitial) - Major ad networks in selectors (doubleclick, taboola, criteo, etc.) - Major consent frameworks (OneTrust, Cookiebot, TrustArc, Quantcast) - !important override for inline styles - Scroll unlock (body overflow:hidden) - Blur removal (paywall content blur) - Article truncation removal (max-height) - Empty placeholder collapse - gstack-ctrl indicator skip in sticky cleanup - setActionButtonsEnabled function - Buttons disabled when disconnected - No error spam from cleanup/screenshot when disconnected - CSS disabled styles for action buttons Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: LLM-based page cleanup — agent analyzes page semantically Instead of brittle CSS selectors, the cleanup button now sends a prompt to the sidebar agent (which IS an LLM). The agent: 1. Runs deterministic $B cleanup --all as a quick first pass 2. Takes a snapshot to see what's left 3. Analyzes the page semantically to identify remaining clutter 4. Removes elements intelligently, preserving site branding This means cleanup works correctly on any site without site-specific selectors. The LLM understands that "Your Daily Puzzles" is clutter, "ADVERTISEMENT" is junk, but the SF Chronicle masthead should stay. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: aggressive cleanup heuristics + preserve top nav bar Deterministic cleanup improvements (used as first pass before LLM analysis): - New 'clutter' category: audio players, podcast widgets, sidebar puzzles/games, recirculation widgets (taboola, outbrain, nativo), cross-promotion banners - Text-content detection: removes "ADVERTISEMENT", "Article continues below", "Sponsored", "Paid content" labels and their parent wrappers - Sticky fix: preserves the topmost full-width element near viewport top (site nav bar) instead of hiding all sticky/fixed elements. Sorts by vertical position, preserves the first one that spans >80% viewport width. Tests: clutter category, ad label removal, nav bar preservation logic. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: LLM-based cleanup architecture, deterministic heuristics, sticky nav 22 new tests covering: - Cleanup button uses /sidebar-command (agent) not /command (deterministic) - Cleanup prompt includes deterministic first pass + agent snapshot analysis - Cleanup prompt lists specific clutter categories for agent guidance - Cleanup prompt preserves site identity (masthead, headline, body, byline) - Cleanup prompt instructs scroll unlock and $B eval removal - Loading state management (async agent, setTimeout) - Deterministic clutter: audio/podcast, games/puzzles, recirculation - Ad label text patterns (ADVERTISEMENT, Sponsored, Article continues) - Ad label parent wrapper hiding for small containers - Sticky nav preservation (sort by position, first full-width near top) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: prevent repeat chat message rendering on reconnect/replay Root cause: server persists chat to disk (chat.jsonl) and replays on restart. Client had no dedup, so every reconnect re-rendered the entire history. Messages from an old HN session would repeat endlessly on the SF Chronicle tab. Fix: renderedEntryIds Set tracks which entry IDs have been rendered. addChatEntry skips entries already in the set. Entries without an id (local notifications) bypass the check. Clear chat resets the set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: agent stops when done, no focus stealing, opus for prompt injection safety Three fixes for sidebar agent UX: - System prompt: "Be CONCISE. STOP as soon as the task is done. Do NOT keep exploring or doing bonus work." Prevents agent from endlessly taking screenshots and highlighting elements after answering the question. - switchTab(id, opts): new bringToFront option. Internal tab pinning (BROWSE_TAB) uses bringToFront: false so agent commands never steal window focus from the user's active app. - Keep opus model (not sonnet) for prompt injection resistance on untrusted web pages. Remove Write from allowedTools (agent only needs Bash for $B). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: agent conciseness, focus stealing, opus model, switchTab opts Tests for the three UX fixes: - System prompt contains STOP/CONCISE/Do NOT keep exploring - sidebar agent uses opus (not sonnet) for prompt injection resistance - switchTab has bringToFront option, defaults to true (opt-out) - handleCommand tab pinning uses bringToFront: false (no focus steal) - Updated stale tests: switchTab signature, allowedTools excludes Write, narration -> conciseness, tab pinning restore calls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: sidebar CSS interaction E2E — HN comment highlight round-trip New E2E test (periodic tier, ~$2/run) that exercises the full sidebar agent pipeline with CSS interaction: 1. Agent navigates to Hacker News 2. Clicks into the top story's comments 3. Reads comments and identifies the most insightful one 4. Highlights it with a 4px solid orange outline via style injection Tests: navigation, snapshot, text reading, LLM judgment, CSS modification. Requires real browser + real Claude (ANTHROPIC_API_KEY). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sidebar CSS E2E test — correct idle timeout (ms not s), pipe stdio Root cause of test failure: BROWSE_IDLE_TIMEOUT is in milliseconds, not seconds. '600' = 0.6 seconds, server died immediately after health check. Fixed to '600000' (10 minutes). Also: use 'pipe' stdio instead of file descriptors (closing fds kills child on macOS/bun), catch ConnectionRefused on poll retry, 4 min poll timeout for the multi-step opus task. Test passes: agent navigates to HN, reads comments, identifies most insightful one, highlights it with orange CSS, stops. 114s, $0.00. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
300 lines
25 KiB
Markdown
300 lines
25 KiB
Markdown
# gstack
|
|
|
|
> "I don't think I've typed like a line of code probably since December, basically, which is an extremely large change." — [Andrej Karpathy](https://fortune.com/2026/03/21/andrej-karpathy-openai-cofounder-ai-agents-coding-state-of-psychosis-openclaw/), No Priors podcast, March 2026
|
|
|
|
When I heard Karpathy say this, I wanted to find out how. How does one person ship like a team of twenty? Peter Steinberger built [OpenClaw](https://github.com/openclaw/openclaw) — 247K GitHub stars — essentially solo with AI agents. The revolution is here. A single builder with the right tooling can move faster than a traditional team.
|
|
|
|
I'm [Garry Tan](https://x.com/garrytan), President & CEO of [Y Combinator](https://www.ycombinator.com/). I've worked with thousands of startups — Coinbase, Instacart, Rippling — when they were one or two people in a garage. Before YC, I was one of the first eng/PM/designers at Palantir, cofounded Posterous (sold to Twitter), and built Bookface, YC's internal social network.
|
|
|
|
**gstack is my answer.** I've been building products for twenty years, and right now I'm shipping more code than I ever have. In the last 60 days: **600,000+ lines of production code** (35% tests), **10,000-20,000 lines per day**, part-time, while running YC full-time. Here's my last `/retro` across 3 projects: **140,751 lines added, 362 commits, ~115k net LOC** in one week.
|
|
|
|
**2026 — 1,237 contributions and counting:**
|
|
|
|

|
|
|
|
**2013 — when I built Bookface at YC (772 contributions):**
|
|
|
|

|
|
|
|
Same person. Different era. The difference is the tooling.
|
|
|
|
**gstack is how I do it.** It turns Claude Code into a virtual engineering team — a CEO who rethinks the product, an eng manager who locks architecture, a designer who catches AI slop, a reviewer who finds production bugs, a QA lead who opens a real browser, a security officer who runs OWASP + STRIDE audits, and a release engineer who ships the PR. Twenty-three specialists and eight power tools, all slash commands, all Markdown, all free, MIT license.
|
|
|
|
This is my open source software factory. I use it every day. I'm sharing it because these tools should be available to everyone.
|
|
|
|
Fork it. Improve it. Make it yours. And if you want to hate on free open source software — you're welcome to, but I'd rather you just try it first.
|
|
|
|
**Who this is for:**
|
|
- **Founders and CEOs** — especially technical ones who still want to ship
|
|
- **First-time Claude Code users** — structured roles instead of a blank prompt
|
|
- **Tech leads and staff engineers** — rigorous review, QA, and release automation on every PR
|
|
|
|
## Quick start
|
|
|
|
1. Install gstack (30 seconds — see below)
|
|
2. Run `/office-hours` — describe what you're building
|
|
3. Run `/plan-ceo-review` on any feature idea
|
|
4. Run `/review` on any branch with changes
|
|
5. Run `/qa` on your staging URL
|
|
6. Stop there. You'll know if this is for you.
|
|
|
|
## Install — 30 seconds
|
|
|
|
**Requirements:** [Claude Code](https://docs.anthropic.com/en/docs/claude-code), [Git](https://git-scm.com/), [Bun](https://bun.sh/) v1.0+, [Node.js](https://nodejs.org/) (Windows only)
|
|
|
|
### Step 1: Install on your machine
|
|
|
|
Open Claude Code and paste this. Claude does the rest.
|
|
|
|
> Install gstack: run **`git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup`** then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /design-shotgun, /design-html, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /connect-chrome, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /cso, /autoplan, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, /learn. Then ask the user if they also want to add gstack to the current project so teammates get it.
|
|
|
|
### Step 2: Add to your repo so teammates get it (optional)
|
|
|
|
> Add gstack to this project: run **`cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`** then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review, /design-consultation, /design-shotgun, /design-html, /review, /ship, /land-and-deploy, /canary, /benchmark, /browse, /connect-chrome, /qa, /qa-only, /design-review, /setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex, /cso, /autoplan, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, /learn, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
|
|
|
|
Real files get committed to your repo (not a submodule), so `git clone` just works. Everything lives inside `.claude/`. Nothing touches your PATH or runs in the background.
|
|
|
|
> **Contributing or need full history?** The commands above use `--depth 1` for a fast install. If you plan to contribute or need full git history, do a full clone instead:
|
|
> ```bash
|
|
> git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack
|
|
> ```
|
|
|
|
### Codex, Gemini CLI, or Cursor
|
|
|
|
gstack works on any agent that supports the [SKILL.md standard](https://github.com/anthropics/claude-code). Skills live in `.agents/skills/` and are discovered automatically.
|
|
|
|
Install to one repo:
|
|
|
|
```bash
|
|
git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git .agents/skills/gstack
|
|
cd .agents/skills/gstack && ./setup --host codex
|
|
```
|
|
|
|
When setup runs from `.agents/skills/gstack`, it installs the generated Codex skills next to it in the same repo and does not write to `~/.codex/skills`.
|
|
|
|
Install once for your user account:
|
|
|
|
```bash
|
|
git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/gstack
|
|
cd ~/gstack && ./setup --host codex
|
|
```
|
|
|
|
`setup --host codex` creates the runtime root at `~/.codex/skills/gstack` and
|
|
links the generated Codex skills at the top level. This avoids duplicate skill
|
|
discovery from the source repo checkout.
|
|
|
|
Or let setup auto-detect which agents you have installed:
|
|
|
|
```bash
|
|
git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/gstack
|
|
cd ~/gstack && ./setup --host auto
|
|
```
|
|
|
|
For Codex-compatible hosts, setup now supports both repo-local installs from `.agents/skills/gstack` and user-global installs from `~/.codex/skills/gstack`. All 31 skills work across all supported agents. Hook-based safety skills (careful, freeze, guard) use inline safety advisory prose on non-Claude hosts.
|
|
|
|
### Factory Droid
|
|
|
|
gstack works with [Factory Droid](https://factory.ai). Skills install to `.factory/skills/` and are discovered automatically. Sensitive skills (ship, land-and-deploy, guard) use `disable-model-invocation: true` so Droids don't auto-invoke them.
|
|
|
|
```bash
|
|
git clone --single-branch --depth 1 https://github.com/garrytan/gstack.git ~/gstack
|
|
cd ~/gstack && ./setup --host factory
|
|
```
|
|
|
|
Skills install to `~/.factory/skills/gstack-*/`. Restart `droid` to rescan skills, then type `/qa` to get started.
|
|
|
|
## See it work
|
|
|
|
```
|
|
You: I want to build a daily briefing app for my calendar.
|
|
You: /office-hours
|
|
Claude: [asks about the pain — specific examples, not hypotheticals]
|
|
|
|
You: Multiple Google calendars, events with stale info, wrong locations.
|
|
Prep takes forever and the results aren't good enough...
|
|
|
|
Claude: I'm going to push back on the framing. You said "daily briefing
|
|
app." But what you actually described is a personal chief of
|
|
staff AI.
|
|
[extracts 5 capabilities you didn't realize you were describing]
|
|
[challenges 4 premises — you agree, disagree, or adjust]
|
|
[generates 3 implementation approaches with effort estimates]
|
|
RECOMMENDATION: Ship the narrowest wedge tomorrow, learn from
|
|
real usage. The full vision is a 3-month project — start with
|
|
the daily briefing that actually works.
|
|
[writes design doc → feeds into downstream skills automatically]
|
|
|
|
You: /plan-ceo-review
|
|
[reads the design doc, challenges scope, runs 10-section review]
|
|
|
|
You: /plan-eng-review
|
|
[ASCII diagrams for data flow, state machines, error paths]
|
|
[test matrix, failure modes, security concerns]
|
|
|
|
You: Approve plan. Exit plan mode.
|
|
[writes 2,400 lines across 11 files. ~8 minutes.]
|
|
|
|
You: /review
|
|
[AUTO-FIXED] 2 issues. [ASK] Race condition → you approve fix.
|
|
|
|
You: /qa https://staging.myapp.com
|
|
[opens real browser, clicks through flows, finds and fixes a bug]
|
|
|
|
You: /ship
|
|
Tests: 42 → 51 (+9 new). PR: github.com/you/app/pull/42
|
|
```
|
|
|
|
You said "daily briefing app." The agent said "you're building a chief of staff AI" — because it listened to your pain, not your feature request. Eight commands, end to end. That is not a copilot. That is a team.
|
|
|
|
## The sprint
|
|
|
|
gstack is a process, not a collection of tools. The skills run in the order a sprint runs:
|
|
|
|
**Think → Plan → Build → Review → Test → Ship → Reflect**
|
|
|
|
Each skill feeds into the next. `/office-hours` writes a design doc that `/plan-ceo-review` reads. `/plan-eng-review` writes a test plan that `/qa` picks up. `/review` catches bugs that `/ship` verifies are fixed. Nothing falls through the cracks because every step knows what came before it.
|
|
|
|
| Skill | Your specialist | What they do |
|
|
|-------|----------------|--------------|
|
|
| `/office-hours` | **YC Office Hours** | Start here. Six forcing questions that reframe your product before you write code. Pushes back on your framing, challenges premises, generates implementation alternatives. Design doc feeds into every downstream skill. |
|
|
| `/plan-ceo-review` | **CEO / Founder** | Rethink the problem. Find the 10-star product hiding inside the request. Four modes: Expansion, Selective Expansion, Hold Scope, Reduction. |
|
|
| `/plan-eng-review` | **Eng Manager** | Lock in architecture, data flow, diagrams, edge cases, and tests. Forces hidden assumptions into the open. |
|
|
| `/plan-design-review` | **Senior Designer** | Rates each design dimension 0-10, explains what a 10 looks like, then edits the plan to get there. AI Slop detection. Interactive — one AskUserQuestion per design choice. |
|
|
| `/design-consultation` | **Design Partner** | Build a complete design system from scratch. Researches the landscape, proposes creative risks, generates realistic product mockups. |
|
|
| `/review` | **Staff Engineer** | Find the bugs that pass CI but blow up in production. Auto-fixes the obvious ones. Flags completeness gaps. |
|
|
| `/investigate` | **Debugger** | Systematic root-cause debugging. Iron Law: no fixes without investigation. Traces data flow, tests hypotheses, stops after 3 failed fixes. |
|
|
| `/design-review` | **Designer Who Codes** | Same audit as /plan-design-review, then fixes what it finds. Atomic commits, before/after screenshots. |
|
|
| `/design-shotgun` | **Design Explorer** | Generate multiple AI design variants, open a comparison board in your browser, and iterate until you approve a direction. Taste memory biases toward your preferences. |
|
|
| `/design-html` | **Design Engineer** | Takes an approved mockup from `/design-shotgun` and generates production-quality HTML with Pretext for computed text layout. Text reflows on resize, heights adjust to content. Smart API routing picks the right Pretext patterns per design type. Framework detection for React/Svelte/Vue. |
|
|
| `/qa` | **QA Lead** | Test your app, find bugs, fix them with atomic commits, re-verify. Auto-generates regression tests for every fix. |
|
|
| `/qa-only` | **QA Reporter** | Same methodology as /qa but report only. Pure bug report without code changes. |
|
|
| `/cso` | **Chief Security Officer** | OWASP Top 10 + STRIDE threat model. Zero-noise: 17 false positive exclusions, 8/10+ confidence gate, independent finding verification. Each finding includes a concrete exploit scenario. |
|
|
| `/ship` | **Release Engineer** | Sync main, run tests, audit coverage, push, open PR. Bootstraps test frameworks if you don't have one. |
|
|
| `/land-and-deploy` | **Release Engineer** | Merge the PR, wait for CI and deploy, verify production health. One command from "approved" to "verified in production." |
|
|
| `/canary` | **SRE** | Post-deploy monitoring loop. Watches for console errors, performance regressions, and page failures. |
|
|
| `/benchmark` | **Performance Engineer** | Baseline page load times, Core Web Vitals, and resource sizes. Compare before/after on every PR. |
|
|
| `/document-release` | **Technical Writer** | Update all project docs to match what you just shipped. Catches stale READMEs automatically. |
|
|
| `/retro` | **Eng Manager** | Team-aware weekly retro. Per-person breakdowns, shipping streaks, test health trends, growth opportunities. `/retro global` runs across all your projects and AI tools (Claude Code, Codex, Gemini). |
|
|
| `/browse` | **QA Engineer** | Give the agent eyes. Real Chromium browser, real clicks, real screenshots. ~100ms per command. `$B connect` launches your real Chrome as a headed window — watch every action live. |
|
|
| `/setup-browser-cookies` | **Session Manager** | Import cookies from your real browser (Chrome, Arc, Brave, Edge) into the headless session. Test authenticated pages. |
|
|
| `/autoplan` | **Review Pipeline** | One command, fully reviewed plan. Runs CEO → design → eng review automatically with encoded decision principles. Surfaces only taste decisions for your approval. |
|
|
| `/learn` | **Memory** | Manage what gstack learned across sessions. Review, search, prune, and export project-specific patterns, pitfalls, and preferences. Learnings compound across sessions so gstack gets smarter on your codebase over time. |
|
|
|
|
### Power tools
|
|
|
|
| Skill | What it does |
|
|
|-------|-------------|
|
|
| `/codex` | **Second Opinion** — independent code review from OpenAI Codex CLI. Three modes: review (pass/fail gate), adversarial challenge, and open consultation. Cross-model analysis when both `/review` and `/codex` have run. |
|
|
| `/careful` | **Safety Guardrails** — warns before destructive commands (rm -rf, DROP TABLE, force-push). Say "be careful" to activate. Override any warning. |
|
|
| `/freeze` | **Edit Lock** — restrict file edits to one directory. Prevents accidental changes outside scope while debugging. |
|
|
| `/guard` | **Full Safety** — `/careful` + `/freeze` in one command. Maximum safety for prod work. |
|
|
| `/unfreeze` | **Unlock** — remove the `/freeze` boundary. |
|
|
| `/connect-chrome` | **Chrome Controller** — launch Chrome with the Side Panel extension. Watch every action live, inspect CSS on any element, clean up pages, and take screenshots. Each tab gets its own agent. |
|
|
| `/setup-deploy` | **Deploy Configurator** — one-time setup for `/land-and-deploy`. Detects your platform, production URL, and deploy commands. |
|
|
| `/gstack-upgrade` | **Self-Updater** — upgrade gstack to latest. Detects global vs vendored install, syncs both, shows what changed. |
|
|
|
|
**[Deep dives with examples and philosophy for every skill →](docs/skills.md)**
|
|
|
|
## Parallel sprints
|
|
|
|
gstack works well with one sprint. It gets interesting with ten running at once.
|
|
|
|
**Design is at the heart.** `/design-consultation` builds your design system from scratch, researches the space, proposes creative risks, and writes `DESIGN.md`. `/design-shotgun` generates multiple visual variants and opens a comparison board so you can pick a direction. `/design-html` takes that approved mockup and generates production-quality HTML with Pretext, where text actually reflows on resize instead of breaking with hardcoded heights. Then `/design-review` and `/plan-eng-review` read what you chose. Design decisions flow through the whole system.
|
|
|
|
**`/qa` was a massive unlock.** It let me go from 6 to 12 parallel workers. Claude Code saying *"I SEE THE ISSUE"* and then actually fixing it, generating a regression test, and verifying the fix — that changed how I work. The agent has eyes now.
|
|
|
|
**Smart review routing.** Just like at a well-run startup: CEO doesn't have to look at infra bug fixes, design review isn't needed for backend changes. gstack tracks what reviews are run, figures out what's appropriate, and just does the smart thing. The Review Readiness Dashboard tells you where you stand before you ship.
|
|
|
|
**Test everything.** `/ship` bootstraps test frameworks from scratch if your project doesn't have one. Every `/ship` run produces a coverage audit. Every `/qa` bug fix generates a regression test. 100% test coverage is the goal — tests make vibe coding safe instead of yolo coding.
|
|
|
|
**`/document-release` is the engineer you never had.** It reads every doc file in your project, cross-references the diff, and updates everything that drifted. README, ARCHITECTURE, CONTRIBUTING, CLAUDE.md, TODOS — all kept current automatically. And now `/ship` auto-invokes it — docs stay current without an extra command.
|
|
|
|
**Real browser mode.** `$B connect` launches your actual Chrome as a headed window controlled by Playwright. You watch Claude click, fill, and navigate in real time — same window, same screen. A subtle green shimmer at the top edge tells you which Chrome window gstack controls. All existing browse commands work unchanged. `$B disconnect` returns to headless. A Chrome extension Side Panel shows a live activity feed of every command and a chat sidebar where you can direct Claude. This is co-presence — Claude isn't remote-controlling a hidden browser, it's sitting next to you in the same cockpit.
|
|
|
|
**Sidebar agent — your AI browser assistant.** Type natural language instructions in the Chrome side panel and a child Claude instance executes them. "Navigate to the settings page and screenshot it." "Fill out this form with test data." "Go through every item in this list and extract the prices." Each task gets up to 5 minutes. The sidebar agent runs in an isolated session, so it won't interfere with your main Claude Code window. It's like having a second pair of hands in the browser.
|
|
|
|
**Personal automation.** The sidebar agent isn't just for dev workflows. Example: "Browse my kid's school parent portal and add all the other parents' names, phone numbers, and photos to my Google Contacts." Two ways to get authenticated: (1) log in once in the headed browser — your session persists, or (2) run `/setup-browser-cookies` to import cookies from your real Chrome. Once authenticated, Claude navigates the directory, extracts the data, and creates the contacts.
|
|
|
|
**Browser handoff when the AI gets stuck.** Hit a CAPTCHA, auth wall, or MFA prompt? `$B handoff` opens a visible Chrome at the exact same page with all your cookies and tabs intact. Solve the problem, tell Claude you're done, `$B resume` picks up right where it left off. The agent even suggests it automatically after 3 consecutive failures.
|
|
|
|
**Multi-AI second opinion.** `/codex` gets an independent review from OpenAI's Codex CLI — a completely different AI looking at the same diff. Three modes: code review with a pass/fail gate, adversarial challenge that actively tries to break your code, and open consultation with session continuity. When both `/review` (Claude) and `/codex` (OpenAI) have reviewed the same branch, you get a cross-model analysis showing which findings overlap and which are unique to each.
|
|
|
|
**Safety guardrails on demand.** Say "be careful" and `/careful` warns before any destructive command — rm -rf, DROP TABLE, force-push, git reset --hard. `/freeze` locks edits to one directory while debugging so Claude can't accidentally "fix" unrelated code. `/guard` activates both. `/investigate` auto-freezes to the module being investigated.
|
|
|
|
**Proactive skill suggestions.** gstack notices what stage you're in — brainstorming, reviewing, debugging, testing — and suggests the right skill. Don't like it? Say "stop suggesting" and it remembers across sessions.
|
|
|
|
## 10-15 parallel sprints
|
|
|
|
gstack is powerful with one sprint. It is transformative with ten running at once.
|
|
|
|
[Conductor](https://conductor.build) runs multiple Claude Code sessions in parallel — each in its own isolated workspace. One session running `/office-hours` on a new idea, another doing `/review` on a PR, a third implementing a feature, a fourth running `/qa` on staging, and six more on other branches. All at the same time. I regularly run 10-15 parallel sprints — that's the practical max right now.
|
|
|
|
The sprint structure is what makes parallelism work. Without a process, ten agents is ten sources of chaos. With a process — think, plan, build, review, test, ship — each agent knows exactly what to do and when to stop. You manage them the way a CEO manages a team: check in on the decisions that matter, let the rest run.
|
|
|
|
---
|
|
|
|
Free, MIT licensed, open source. No premium tier, no waitlist.
|
|
|
|
I open sourced how I build software. You can fork it and make it your own.
|
|
|
|
> **We're hiring.** Want to ship 10K+ LOC/day and help harden gstack?
|
|
> Come work at YC — [ycombinator.com/software](https://ycombinator.com/software)
|
|
> Extremely competitive salary and equity. San Francisco, Dogpatch District.
|
|
|
|
## Docs
|
|
|
|
| Doc | What it covers |
|
|
|-----|---------------|
|
|
| [Skill Deep Dives](docs/skills.md) | Philosophy, examples, and workflow for every skill (includes Greptile integration) |
|
|
| [Builder Ethos](ETHOS.md) | Builder philosophy: Boil the Lake, Search Before Building, three layers of knowledge |
|
|
| [Architecture](ARCHITECTURE.md) | Design decisions and system internals |
|
|
| [Browser Reference](BROWSER.md) | Full command reference for `/browse` |
|
|
| [Contributing](CONTRIBUTING.md) | Dev setup, testing, contributor mode, and dev mode |
|
|
| [Changelog](CHANGELOG.md) | What's new in every version |
|
|
|
|
## Privacy & Telemetry
|
|
|
|
gstack includes **opt-in** usage telemetry to help improve the project. Here's exactly what happens:
|
|
|
|
- **Default is off.** Nothing is sent anywhere unless you explicitly say yes.
|
|
- **On first run,** gstack asks if you want to share anonymous usage data. You can say no.
|
|
- **What's sent (if you opt in):** skill name, duration, success/fail, gstack version, OS. That's it.
|
|
- **What's never sent:** code, file paths, repo names, branch names, prompts, or any user-generated content.
|
|
- **Change anytime:** `gstack-config set telemetry off` disables everything instantly.
|
|
|
|
Data is stored in [Supabase](https://supabase.com) (open source Firebase alternative). The schema is in [`supabase/migrations/`](supabase/migrations/) — you can verify exactly what's collected. The Supabase publishable key in the repo is a public key (like a Firebase API key) — row-level security policies deny all direct access. Telemetry flows through validated edge functions that enforce schema checks, event type allowlists, and field length limits.
|
|
|
|
**Local analytics are always available.** Run `gstack-analytics` to see your personal usage dashboard from the local JSONL file — no remote data needed.
|
|
|
|
## Troubleshooting
|
|
|
|
**Skill not showing up?** `cd ~/.claude/skills/gstack && ./setup`
|
|
|
|
**`/browse` fails?** `cd ~/.claude/skills/gstack && bun install && bun run build`
|
|
|
|
**Stale install?** Run `/gstack-upgrade` — or set `auto_upgrade: true` in `~/.gstack/config.yaml`
|
|
|
|
**Want shorter commands?** `cd ~/.claude/skills/gstack && ./setup --no-prefix` — switches from `/gstack-qa` to `/qa`. Your choice is remembered for future upgrades.
|
|
|
|
**Want namespaced commands?** `cd ~/.claude/skills/gstack && ./setup --prefix` — switches from `/qa` to `/gstack-qa`. Useful if you run other skill packs alongside gstack.
|
|
|
|
**Codex says "Skipped loading skill(s) due to invalid SKILL.md"?** Your Codex skill descriptions are stale. Fix: `cd ~/.codex/skills/gstack && git pull && ./setup --host codex` — or for repo-local installs: `cd "$(readlink -f .agents/skills/gstack)" && git pull && ./setup --host codex`
|
|
|
|
**Windows users:** gstack works on Windows 11 via Git Bash or WSL. Node.js is required in addition to Bun — Bun has a known bug with Playwright's pipe transport on Windows ([bun#4253](https://github.com/oven-sh/bun/issues/4253)). The browse server automatically falls back to Node.js. Make sure both `bun` and `node` are on your PATH.
|
|
|
|
**Claude says it can't see the skills?** Make sure your project's `CLAUDE.md` has a gstack section. Add this:
|
|
|
|
```
|
|
## gstack
|
|
Use /browse from gstack for all web browsing. Never use mcp__claude-in-chrome__* tools.
|
|
Available skills: /office-hours, /plan-ceo-review, /plan-eng-review, /plan-design-review,
|
|
/design-consultation, /design-shotgun, /design-html, /review, /ship, /land-and-deploy,
|
|
/canary, /benchmark, /browse, /connect-chrome, /qa, /qa-only, /design-review,
|
|
/setup-browser-cookies, /setup-deploy, /retro, /investigate, /document-release, /codex,
|
|
/cso, /autoplan, /careful, /freeze, /guard, /unfreeze, /gstack-upgrade, /learn.
|
|
```
|
|
|
|
## License
|
|
|
|
MIT. Free forever. Go build something.
|