gstack/docs/designs/SELF_LEARNING_V0.md
Garry Tan ae0a9ad195 feat: GStack Learns — per-project self-learning infrastructure (v0.13.4.0) (#622)
* feat: learnings + confidence resolvers — cross-skill memory infrastructure

Three new resolvers for the self-learning system:
- LEARNINGS_SEARCH: tells skills to load prior learnings before analysis
- LEARNINGS_LOG: tells skills to capture discoveries after completing work
- CONFIDENCE_CALIBRATION: adds 1-10 confidence scoring to all review findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: learnings bin scripts — append-only JSONL read/write

gstack-learnings-log: validates JSON, auto-injects timestamp, appends to
~/.gstack/projects/$SLUG/learnings.jsonl. Append-only (no mutation).

gstack-learnings-search: reads/filters/dedupes learnings with confidence
decay (observed/inferred lose 1pt/30d), cross-project discovery, and
"latest winner" resolution per key+type.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: learnings count in preamble output

Every skill now prints "LEARNINGS: N entries loaded" during preamble,
making the compounding loop visible to the user.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: integrate learnings + confidence into 9 skill templates

Add {{LEARNINGS_SEARCH}}, {{LEARNINGS_LOG}}, and {{CONFIDENCE_CALIBRATION}}
placeholders to review, ship, plan-eng-review, plan-ceo-review, office-hours,
investigate, retro, and cso templates. Regenerated all SKILL.md files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: /learn skill — manage project learnings

New skill for reviewing, searching, pruning, and exporting what gstack
has learned across sessions. Commands: /learn, /learn search, /learn prune,
/learn export, /learn stats, /learn add.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: self-learning roadmap — 5-release design doc

Covers: R1 GStack Learns (v0.14), R2 Review Army (v0.15), R3 Smart Ceremony
(v0.16), R4 /autoship (v0.17), R5 Studio (v0.18). Inspired by Compound
Engineering, adapted to GStack's architecture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: learnings bin script unit tests — 13 tests, free

Tests gstack-learnings-log (valid/invalid JSON, timestamp injection,
append-only) and gstack-learnings-search (dedup, type/query/limit filters,
confidence decay, user-stated no-decay, malformed JSONL skip).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.4.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: learnings resolver + bin script edge case tests — 21 new tests, free

Adds gen-skill-docs coverage for LEARNINGS_SEARCH, LEARNINGS_LOG, and
CONFIDENCE_CALIBRATION resolvers. Adds bin script edge cases: timestamp
preservation, special characters, files array, sort order, type grouping,
combined filtering, missing fields, confidence floor at 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: sync package.json version with VERSION file (0.13.4.0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: gitignore .factory/ — generated output, not source

Same pattern as .claude/skills/ and .agents/. These SKILL.md files are
generated from .tmpl templates by gen:skill-docs --host factory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: /learn E2E — seed 3 learnings, verify agent surfaces them

Seeds N+1 query pattern, stale cache pitfall, and rubocop preference
into learnings.jsonl, then runs /learn and checks that at least 2/3
appear in the agent's output. Gate tier, ~$0.25/run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 17:02:01 -06:00


Design: GStack Self-Learning Infrastructure

Generated by /office-hours + /plan-ceo-review + /plan-eng-review on 2026-03-28
Branch: garrytan/ce-features
Repo: gstack
Status: ACTIVE
Mode: Open Source / Community

Problem Statement

GStack runs 30+ skills across sessions but learns nothing between them. A /review session catches an N+1 query pattern, and the next /review on the same codebase starts from scratch. A /ship run discovers the test command, and every future /ship re-discovers it. An /investigate run finds a tricky race condition, and no future session knows about it.

Every AI coding tool has this problem. Cursor has per-user memory. Claude Code has CLAUDE.md. Windsurf has persistent context. But none of them compound. None of them structure what they learn. None of them share knowledge across skills.

What We're Building

Per-project institutional knowledge that compounds across sessions and skills. Structured, typed, confidence-scored learnings that every gstack skill can read and write. The goal: after 20 sessions on the same codebase, gstack knows every architectural decision, every past bug pattern, and every time it was wrong.

North Star

/autoship (Release 4). A full engineering team in one command. Describe a feature, approve the plan, everything else is automatic. /autoship can't work without learnings, because without memory it repeats the same mistakes. Releases 1-3 are the infrastructure that makes /autoship actually work.

Audience

YC founders building with AI. The people who run gstack on real codebases 20+ times a week and notice when it asks the same question twice.

Differentiation

Tool      | Memory model         | Scope                      | Structure
----------|----------------------|----------------------------|------------------------
Cursor    | Per-user chat memory | Per-session                | Unstructured
CLAUDE.md | Static file          | Per-project                | Manual
Windsurf  | Persistent context   | Per-session                | Unstructured
GStack    | Per-project JSONL    | Cross-session, cross-skill | Typed, scored, decaying

Release Roadmap

Release 1: "GStack Learns" (v0.14)

Headline: Every session makes the next one smarter.

What ships:

  • Learnings persistence at ~/.gstack/projects/{slug}/learnings.jsonl
  • /learn skill for manual review, search, prune, export
  • Confidence calibration on all review findings (1-10 scores with display rules)
  • Confidence decay for observed/inferred learnings (1pt/30d)
  • Cross-project learnings discovery (opt-in, AskUserQuestion consent)
  • "Learning applied" callouts when reviews match past learnings
  • Integration into /review, /ship, /plan-*, /office-hours, /investigate, /retro
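The decay rule in the list above can be sketched as follows. This is an illustration, not the shipped gstack-learnings-search logic: the function name is hypothetical, and it assumes decay is applied in whole 30-day steps, floors at 0, and leaves every source other than observed/inferred untouched.

```python
from datetime import datetime, timezone

# Sources that lose confidence over time, per the roadmap bullet.
# Assumption: all other sources (e.g. user-stated) keep their score.
DECAYING_SOURCES = {"observed", "inferred"}


def effective_confidence(confidence: int, source: str, ts: str, now: datetime) -> int:
    """Return the confidence after decay: 1 point per full 30 days, never below 0."""
    if source not in DECAYING_SOURCES:
        return confidence
    # Timestamps in the schema end in "Z"; normalise so fromisoformat accepts them.
    logged = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    age_days = max(0, (now - logged).days)
    return max(0, confidence - age_days // 30)


if __name__ == "__main__":
    now = datetime(2026, 6, 26, 12, 0, tzinfo=timezone.utc)  # 90 days after the entry
    print(effective_confidence(8, "observed", "2026-03-28T12:00:00Z", now))
    print(effective_confidence(8, "user-stated", "2026-03-28T12:00:00Z", now))
```

Computing decay at read time keeps the log itself untouched, which fits the append-only design.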

Schema (Supabase-compatible):

{
  "ts": "2026-03-28T12:00:00Z",
  "skill": "review",
  "type": "pitfall",
  "key": "n-plus-one-activerecord",
  "insight": "Always check includes() for has_many in list endpoints",
  "confidence": 8,
  "source": "observed",
  "branch": "feature-x",
  "commit": "abc1234",
  "files": ["app/models/user.rb"]
}

Types: pattern | pitfall | preference | architecture | tool

Sources: observed | user-stated | inferred | cross-model
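A minimal sketch of a writer for this schema, assuming the type and source enumerations above. It is not the actual gstack-learnings-log script. The function names, the choice of required fields, and the 1-10 range check on confidence are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

TYPES = {"pattern", "pitfall", "preference", "architecture", "tool"}
SOURCES = {"observed", "user-stated", "inferred", "cross-model"}


def validate_learning(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry is acceptable."""
    errors = []
    if entry.get("type") not in TYPES:
        errors.append(f"unknown type: {entry.get('type')!r}")
    if entry.get("source") not in SOURCES:
        errors.append(f"unknown source: {entry.get('source')!r}")
    conf = entry.get("confidence")
    # Assumption: learnings use the same 1-10 scale as review findings.
    if not isinstance(conf, int) or isinstance(conf, bool) or not 1 <= conf <= 10:
        errors.append("confidence must be an integer from 1 to 10")
    for name in ("skill", "key", "insight"):  # assumed to be required
        if not isinstance(entry.get(name), str) or not entry[name].strip():
            errors.append(f"missing field: {name}")
    return errors


def append_learning(path: Path, entry: dict) -> dict:
    """Validate, inject a UTC timestamp if absent, and append one JSONL line."""
    errors = validate_learning(entry)
    if errors:
        raise ValueError("; ".join(errors))
    record = dict(entry)
    record.setdefault("ts", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"))
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:  # append mode: never rewrites old lines
        fh.write(json.dumps(record) + "\n")
    return record
```

Rejecting bad entries before the write matters more than usual here, because an append-only file has no later step that cleans them up.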

Architecture: append-only JSONL. Duplicates are resolved at read time ("latest winner" per key+type), so no writer ever mutates an existing entry and there are no read-modify-write races. Follows the existing gstack-review-log pattern.
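Read-time "latest winner" resolution can be sketched like this. The function name is hypothetical. The sketch assumes "latest" means the greatest ts value (ISO 8601 UTC strings in one format sort lexicographically), with ties going to the later line in the file, and it skips lines that fail to parse.

```python
import json
from typing import Iterable


def load_latest(lines: Iterable[str]) -> list[dict]:
    """Collapse an append-only JSONL stream to one entry per (key, type)."""
    latest: dict[tuple[str, str], dict] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # a malformed line should not poison the whole file
        if not isinstance(entry, dict) or "key" not in entry or "type" not in entry:
            continue
        slot = (entry["key"], entry["type"])
        current = latest.get(slot)
        # ">=" lets a later line win when timestamps are equal.
        if current is None or str(entry.get("ts", "")) >= str(current.get("ts", "")):
            latest[slot] = entry
    return list(latest.values())
```

Correcting a learning is therefore just another append with the same key and type. The old line stays in the file as history, and readers ignore it.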

Release 2: "Review Army" (v0.15)

Headline: 10 specialist reviewers on every PR.

What ships:

  • Parallel review agents: always-on (correctness, testing, maintainability) + conditional (security, performance, API, data-migrations, reliability) + stack-specific (Rails, TypeScript, Python, frontend-races)
  • Red team reviewer activated for large diffs and high-risk domains
  • Structured findings with confidence scores + merge/dedup across agents
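One way the merge/dedup step could work, as a sketch only. Release 2 is not specified at this level of detail, so the Finding fields, the (file, line, category) identity, and the keep-the-highest-confidence rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical shape for a structured finding; not a gstack schema.
    agent: str
    file: str
    line: int
    category: str
    message: str
    confidence: int  # 1-10, matching the confidence calibration scale


def merge_findings(findings: list[Finding]) -> list[tuple[Finding, list[str]]]:
    """Dedupe findings that point at the same spot, keeping the most confident one.

    Returns (finding, reporting_agents) pairs sorted by descending confidence.
    """
    best: dict[tuple[str, int, str], Finding] = {}
    reporters: dict[tuple[str, int, str], set[str]] = {}
    for f in findings:
        slot = (f.file, f.line, f.category)
        reporters.setdefault(slot, set()).add(f.agent)
        if slot not in best or f.confidence > best[slot].confidence:
            best[slot] = f
    ranked = sorted(best.values(), key=lambda f: -f.confidence)
    return [(f, sorted(reporters[(f.file, f.line, f.category)])) for f in ranked]
```

Keeping the list of reporting agents preserves a useful signal: a finding flagged independently by several reviewers is more likely to be real.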

Release 3: "Smart Ceremony" (v0.16)

Headline: GStack respects your time.

What ships:

  • Scope assessment (TINY/SMALL/MEDIUM/LARGE) in /review, /ship, /autoplan
  • Ceremony skipping based on diff size and scope category
  • File-based todo lifecycle (/triage for interactive approval, /resolve for batch resolution via parallel agents)

Release 4: "/autoship — One Command, Full Feature" (v0.17)

Headline: Describe a feature. Approve the plan. Everything else is automatic.

What ships:

  • /autoship autonomous pipeline: office-hours → autoplan → build → review → qa → ship → learn. 7 phases, 1 approval gate (the plan).
  • /ideate brainstorming skill (parallel divergent agents + adversarial filtering)
  • Research agents in /plan-eng-review (codebase analyst, history analyst, best practices researcher, learnings researcher)

Release 5: "Studio" (v0.18)

Headline: The full-stack AI engineering studio.

What ships:

  • Figma design sync (pixel-matching iteration loop)
  • Feature video recording (auto-generated PR demos)
  • PR feedback resolution (parallel comment resolver)
  • Swarm orchestration (multi-worktree parallel builds)
  • /onboard (auto-generated contributor guide)
  • /triage-prs (batch PR triage for maintainers)
  • Codex build delegation (delegate implementation to Codex CLI)
  • Cross-platform portability (Copilot, Kiro, Windsurf output)

Acknowledged Inspiration

The self-learning roadmap was inspired by ideas from the Compound Engineering project by Nico Bailon. Their exploration of learnings persistence, parallel review agents, and autonomous pipelines catalyzed the design of GStack's approach. We adapted every concept to fit GStack's template system, voice, and architecture rather than porting directly.