docs: self-learning roadmap — 5-release design doc

Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony
(v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound
Engineering, adapted to GStack's architecture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Garry Tan
2026-03-28 22:56:22 -07:00
parent 7f98931e28
commit 65f4264667
# Design: GStack Self-Learning Infrastructure
Generated by /office-hours + /plan-ceo-review + /plan-eng-review on 2026-03-28

Branch: garrytan/ce-features

Repo: gstack

Status: ACTIVE

Mode: Open Source / Community
## Problem Statement
GStack runs 30+ skills across sessions but learns nothing between them. A /review
session catches an N+1 query pattern, and the next /review on the same codebase
starts from scratch. A /ship run discovers the test command, and every future /ship
re-discovers it. A /investigate finds a tricky race condition, and no future session
knows about it.
Every AI coding tool has this problem. Cursor has per-user memory. Claude Code has
CLAUDE.md. Windsurf has persistent context. But none of them compound. None of them
structure what they learn. None of them share knowledge across skills.
## What We're Building
Per-project institutional knowledge that compounds across sessions and skills.
Structured, typed, confidence-scored learnings that every gstack skill can read and
write. The goal: after 20 sessions on the same codebase, gstack knows every
architectural decision, every past bug pattern, and every time it was wrong.
## North Star
/autoship (Release 4). A full engineering team in one command. Describe a feature,
approve the plan, everything else is automatic. /autoship can't work without
learnings, because without memory it repeats the same mistakes. Releases 1-3 are
the infrastructure that makes /autoship actually work.
## Audience
YC founders building with AI. The people who run gstack on real codebases 20+ times
a week and notice when it asks the same question twice.
## Differentiation
| Tool | Memory model | Scope | Structure |
|------|-------------|-------|-----------|
| Cursor | Per-user chat memory | Per-session | Unstructured |
| Claude Code (CLAUDE.md) | Static file | Per-project | Manual |
| Windsurf | Persistent context | Per-session | Unstructured |
| **GStack** | **Per-project JSONL** | **Cross-session, cross-skill** | **Typed, scored, decaying** |
---
## Release Roadmap
### Release 1: "GStack Learns" (v0.14)
**Headline:** Every session makes the next one smarter.

What ships:
- Learnings persistence at `~/.gstack/projects/{slug}/learnings.jsonl`
- `/learn` skill for manual review, search, prune, export
- Confidence calibration on all review findings (1-10 scores with display rules)
- Confidence decay for observed/inferred learnings (1 point per 30 days)
- Cross-project learnings discovery (opt-in, with consent requested via AskUserQuestion)
- "Learning applied" callouts when reviews match past learnings
- Integration into /review, /ship, /plan-*, /office-hours, /investigate, /retro
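
The decay rule above (1 point per 30 days, applied only to `observed` and `inferred` learnings) can be computed at read time without rewriting the file. This is a minimal sketch under stated assumptions: `effectiveConfidence` is a hypothetical name, decay is assumed to count whole 30-day periods since the entry's `ts`, and the score is assumed to bottom out at 1, the low end of the 1-10 scale.

```typescript
// Hypothetical sketch: read-time confidence decay for a single learning.
type Source = "observed" | "user-stated" | "inferred" | "cross-model";

const DAY_MS = 24 * 60 * 60 * 1000;
const DECAY_PERIOD_DAYS = 30; // 1 point lost per full 30-day period
const MIN_CONFIDENCE = 1; // assumption: decay never drops below the 1-10 scale

// Only observed/inferred learnings decay; user-stated and cross-model keep their score.
function effectiveConfidence(
  confidence: number,
  source: Source,
  ts: string,
  now: Date = new Date(),
): number {
  if (source !== "observed" && source !== "inferred") return confidence;
  const ageDays = (now.getTime() - Date.parse(ts)) / DAY_MS;
  // Whole periods elapsed; clamp at 0 so a future timestamp never adds points.
  const periods = Math.max(0, Math.floor(ageDays / DECAY_PERIOD_DAYS));
  return Math.max(MIN_CONFIDENCE, confidence - periods);
}
```

For example, an `observed` learning scored 8 on 2026-01-01 reads as 6 on 2026-03-28 (86 days, two full periods), while the same entry marked `user-stated` stays at 8.
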

Schema (Supabase-compatible):
```json
{
  "ts": "2026-03-28T12:00:00Z",
  "skill": "review",
  "type": "pitfall",
  "key": "n-plus-one-activerecord",
  "insight": "Always check includes() for has_many in list endpoints",
  "confidence": 8,
  "source": "observed",
  "branch": "feature-x",
  "commit": "abc1234",
  "files": ["app/models/user.rb"]
}
```
Types: `pattern` | `pitfall` | `preference` | `architecture` | `tool`

Sources: `observed` | `user-stated` | `inferred` | `cross-model`
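
The schema and the two enumerations above map naturally onto TypeScript types. The sketch below is illustrative only: the `Learning` interface and the `parseLearning` validator are hypothetical names, and the checks encode assumptions about what a well-formed line looks like (`confidence` an integer from 1 to 10, `type` and `source` drawn from the lists above).

```typescript
// Hypothetical TypeScript view of one learnings.jsonl line (names are illustrative).
const LEARNING_TYPES = ["pattern", "pitfall", "preference", "architecture", "tool"] as const;
const LEARNING_SOURCES = ["observed", "user-stated", "inferred", "cross-model"] as const;

type LearningType = (typeof LEARNING_TYPES)[number];
type LearningSource = (typeof LEARNING_SOURCES)[number];

interface Learning {
  ts: string; // ISO-8601 timestamp
  skill: string; // skill that wrote the entry, e.g. "review"
  type: LearningType;
  key: string; // stable slug for the learning
  insight: string;
  confidence: number; // integer on the 1-10 scale
  source: LearningSource;
  branch: string;
  commit: string;
  files: string[];
}

const isString = (v: unknown): v is string => typeof v === "string";

// Parse one JSONL line. Blank or malformed lines return null instead of throwing,
// so a single bad line does not make the whole file unreadable.
function parseLearning(line: string): Learning | null {
  if (line.trim() === "") return null;
  let raw: unknown;
  try {
    raw = JSON.parse(line);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) return null;
  const r = raw as Record<string, unknown>;
  const stringFields = ["ts", "skill", "key", "insight", "branch", "commit"];
  if (!stringFields.every((name) => isString(r[name]))) return null;
  if (LEARNING_TYPES.indexOf(r.type as LearningType) === -1) return null;
  if (LEARNING_SOURCES.indexOf(r.source as LearningSource) === -1) return null;
  const confidence = r.confidence;
  if (typeof confidence !== "number" || Math.floor(confidence) !== confidence) return null;
  if (confidence < 1 || confidence > 10) return null;
  const files = r.files;
  if (!Array.isArray(files) || !files.every(isString)) return null;
  return r as unknown as Learning;
}
```

Returning `null` rather than throwing is a design choice for an append-only log: a torn final line from an interrupted write should be skipped, not fatal.
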

Architecture: append-only JSONL. Duplicates are resolved at read time (the latest
entry wins per key+type). Nothing is mutated at write time, so concurrent sessions
never race on a read-modify-write. This follows the existing gstack-review-log pattern.
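
Because the file is append-only, "updating" a learning means appending a newer line with the same `key` and `type`, and readers collapse the duplicates. A minimal sketch of both halves, working on JSONL text in memory so it stays self-contained: `serializeLearning` and `resolveLatest` are hypothetical names, the entry type is trimmed to a few schema fields, and "latest" is assumed to mean the last matching line in file order. In practice the serialized line would be appended to `~/.gstack/projects/{slug}/learnings.jsonl`.

```typescript
// Minimal entry shape for this hypothetical sketch; the full schema has more fields.
interface LearningEntry {
  ts: string;
  type: string;
  key: string;
  insight: string;
  confidence: number;
}

// One JSON object per line; the trailing newline keeps the next append on its own line.
function serializeLearning(entry: LearningEntry): string {
  return JSON.stringify(entry) + "\n";
}

// Read-time dedup: the last line for a given (key, type) wins.
// Assumption: file order is chronological, since writers only ever append.
function resolveLatest(jsonl: string): LearningEntry[] {
  const latest = new Map<string, LearningEntry>();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip a torn or malformed line rather than failing the whole read
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const entry = parsed as LearningEntry;
    const id = `${entry.type}\u0000${entry.key}`;
    latest.delete(id); // re-insert so Map order reflects the most recent write
    latest.set(id, entry);
  }
  return Array.from(latest.values());
}
```

Writers never read before writing, which is what removes the read-modify-write race; the cost is that readers do the dedup work on every load.
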
### Release 2: "Review Army" (v0.15)
**Headline:** 10 specialist reviewers on every PR.

What ships:
- Parallel review agents: always-on (correctness, testing, maintainability) +
conditional (security, performance, API, data-migrations, reliability) +
stack-specific (Rails, TypeScript, Python, frontend-races)
- Red team reviewer activated for large diffs and high-risk domains
- Structured findings with confidence scores + merge/dedup across agents
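
Several specialists can flag the same problem, so their outputs need merging before they are shown. A minimal sketch, assuming a hypothetical `Finding` shape in which two findings are duplicates when they share a file, line, and category; the merged finding keeps the highest confidence and records every agent that raised it. This illustrates one possible dedup rule, not the rule Review Army ships with.

```typescript
// Hypothetical finding shape emitted by each specialist reviewer.
interface Finding {
  agent: string; // e.g. "security", "performance"
  file: string;
  line: number;
  category: string; // assumption: agents share a category vocabulary
  message: string;
  confidence: number; // 1-10
}

interface MergedFinding extends Omit<Finding, "agent"> {
  agents: string[]; // every reviewer that reported this issue
}

// Merge findings that point at the same file, line, and category.
function mergeFindings(findings: Finding[]): MergedFinding[] {
  const byLocation = new Map<string, MergedFinding>();
  for (const f of findings) {
    const id = `${f.file}:${f.line}:${f.category}`;
    const existing = byLocation.get(id);
    if (!existing) {
      const { agent, ...rest } = f;
      byLocation.set(id, { ...rest, agents: [agent] });
      continue;
    }
    if (existing.agents.indexOf(f.agent) === -1) existing.agents.push(f.agent);
    // Keep the most confident agent's wording and score.
    if (f.confidence > existing.confidence) {
      existing.confidence = f.confidence;
      existing.message = f.message;
    }
  }
  // Highest-confidence findings first.
  return Array.from(byLocation.values()).sort((a, b) => b.confidence - a.confidence);
}
```

Recording all contributing agents keeps agreement visible: a finding raised independently by two specialists is a stronger signal than either score alone.
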
### Release 3: "Smart Ceremony" (v0.16)
**Headline:** GStack respects your time.

What ships:
- Scope assessment (TINY/SMALL/MEDIUM/LARGE) in /review, /ship, /autoplan
- Ceremony skipping based on diff size and scope category
- File-based todo lifecycle (/triage for interactive approval, /resolve for batch
resolution via parallel agents)
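
Scope assessment needs a deterministic mapping from a diff to one of the four categories, and from the category to the steps that still run. The thresholds and step names below are invented for illustration, since the design does not specify them; the sketch only shows the shape of the decision: classify first, then look up which ceremony applies.

```typescript
type Scope = "TINY" | "SMALL" | "MEDIUM" | "LARGE";

interface DiffStats {
  filesChanged: number;
  linesChanged: number; // additions + deletions
}

// Hypothetical thresholds; real values would need tuning against actual diffs.
function assessScope({ filesChanged, linesChanged }: DiffStats): Scope {
  if (filesChanged <= 1 && linesChanged <= 20) return "TINY";
  if (filesChanged <= 5 && linesChanged <= 200) return "SMALL";
  if (filesChanged <= 20 && linesChanged <= 1000) return "MEDIUM";
  return "LARGE";
}

// Hypothetical ceremony steps; each scope runs only the steps listed for it.
const CEREMONY: Record<Scope, readonly string[]> = {
  TINY: ["tests"],
  SMALL: ["tests", "review"],
  MEDIUM: ["tests", "review", "qa"],
  LARGE: ["tests", "review", "qa", "plan-review"],
};

function stepsFor(stats: DiffStats): readonly string[] {
  return CEREMONY[assessScope(stats)];
}
```

Requiring both limits to hold means a one-file change with hundreds of lines is not treated as TINY.
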
### Release 4: "/autoship — One Command, Full Feature" (v0.17)
**Headline:** Describe a feature. Approve the plan. Everything else is automatic.

What ships:
- /autoship autonomous pipeline: office-hours → autoplan → build → review → qa →
ship → learn. 7 phases, 1 approval gate (the plan).
- /ideate brainstorming skill (parallel divergent agents + adversarial filtering)
- Research agents in /plan-eng-review (codebase analyst, history analyst,
best practices researcher, learnings researcher)
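
The /autoship pipeline is a fixed sequence with exactly one human checkpoint, after planning. A minimal sketch of that control flow, assuming hypothetical `runPhase` and `approve` callbacks supplied by the caller (the real phases are gstack skills, not functions): phases run in order, the pipeline stops if the plan is rejected, and everything after the gate proceeds without further prompts.

```typescript
// The seven phases, in order, as listed for /autoship.
const PHASES = ["office-hours", "autoplan", "build", "review", "qa", "ship", "learn"] as const;
type Phase = (typeof PHASES)[number];

// The single approval gate sits right after planning.
const GATE_AFTER: Phase = "autoplan";

interface PipelineHooks {
  runPhase: (phase: Phase) => Promise<void>; // hypothetical: invokes the matching skill
  approve: () => Promise<boolean>; // hypothetical: asks the user to approve the plan
}

type PipelineResult = { status: "shipped" } | { status: "rejected"; stoppedAfter: Phase };

async function autoship(hooks: PipelineHooks): Promise<PipelineResult> {
  for (const phase of PHASES) {
    await hooks.runPhase(phase);
    if (phase === GATE_AFTER && !(await hooks.approve())) {
      return { status: "rejected", stoppedAfter: phase };
    }
  }
  return { status: "shipped" };
}
```

Placing the gate after `autoplan` means a rejected plan costs only the office-hours and planning phases, before any code is built.
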
### Release 5: "Studio" (v0.18)
**Headline:** The full-stack AI engineering studio.

What ships:
- Figma design sync (pixel-matching iteration loop)
- Feature video recording (auto-generated PR demos)
- PR feedback resolution (parallel comment resolver)
- Swarm orchestration (multi-worktree parallel builds)
- /onboard (auto-generated contributor guide)
- /triage-prs (batch PR triage for maintainers)
- Codex build delegation (delegate implementation to Codex CLI)
- Cross-platform portability (Copilot, Kiro, Windsurf output)
---
## Acknowledged Inspiration
The self-learning roadmap was inspired by ideas from the [Compound Engineering](https://github.com/nicobailon/compound-engineering) project by Nico Bailon. Their exploration of learnings persistence, parallel review agents, and autonomous pipelines catalyzed the design of GStack's approach. We adapted every concept to fit GStack's template system, voice, and architecture rather than porting directly.