mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
0a803f9e81
* docs: add design doc for /plan-tune v1 (observational substrate) Canonical record of the /plan-tune v1 design: typed question registry, per-question explicit preferences, inline tune: feedback with user-origin gate, dual-track profile (declared + inferred separately), and plain-English inspection skill. Captures every decision with pros/cons, what's deferred to v2 with explicit acceptance criteria, and what was rejected entirely. Codex review drove a substantial scope rollback from the initial CEO EXPANSION plan. 15+ legitimate findings (substrate claim was false without a typed registry; E4/E6/clamp logical contradiction; profile poisoning attack surface; LANDED preamble side effect; implementation order) shaped the final shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: typed question registry for /plan-tune v1 foundation scripts/question-registry.ts declares 53 recurring AskUserQuestion categories across 15 skills (ship, review, office-hours, plan-ceo-review, plan-eng-review, plan-design-review, plan-devex-review, qa, investigate, land-and-deploy, cso, gstack-upgrade, preamble, plan-tune, autoplan). Each entry has: stable kebab-case id, skill owner, category (approval | clarification | routing | cherry-pick | feedback-loop), door_type (one-way | two-way), optional stable option keys, optional psychographic signal_key, and a one-line description. 12 of 53 are one-way doors (destructive ops, architecture/data forks, security/compliance). These are ALWAYS asked regardless of user preference. Helpers: getQuestion(id), getOneWayDoorIds(), getAllRegisteredIds(), getRegistryStats(). No binary or resolver wiring yet — this is the schema substrate the rest of /plan-tune builds on. Ad-hoc question_ids (not registered) still log but skip psychographic signal attribution. Future /plan-tune skill surfaces frequently-firing ad-hoc ids as candidates for registry promotion. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: registry schema + safety + coverage tests (gate tier) 20 tests validating the question registry: Schema (7 tests): - Every entry has required fields - All ids are kebab-case and start with their skill name - No duplicate ids - Categories are from the allowed set - door_type is one-way | two-way - Options arrays are well-formed - Descriptions are short and single-line Helpers (5 tests): - getQuestion returns entry for known id, undefined for unknown - getOneWayDoorIds includes destructive questions, excludes two-way - getAllRegisteredIds count matches QUESTIONS keys - getRegistryStats totals are internally consistent One-way door safety (2 tests): - Every critical question (test failure, SQL safety, LLM trust boundary, security scan, merge confirm, rollback, fix apply, premise revise, arch finding, privacy gate, user challenge) is declared one-way - At least 10 one-way doors exist (catches regression if declarations are accidentally dropped) Registry breadth (3 tests): - 11 high-volume skills each have >= 1 registered question - Preamble one-time prompts are registered - /plan-tune's own questions are registered Signal map references (1 test): - signal_key values are typed kebab-case strings Template coverage (2 tests, informational): - AskUserQuestion usage across templates is non-trivial (>20) - Registry spans >= 10 skills 20 pass, 0 fail. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: one-way door classifier (belt-and-suspenders safety fallback) scripts/one-way-doors.ts — secondary keyword-pattern classifier that catches destructive questions even when the registry doesn't have an entry for them. The registry's door_type field (from scripts/question-registry.ts) is the PRIMARY safety gate. This classifier is the fallback for ad-hoc question_ids that agents generate at runtime. Classification priority: 1. Registry lookup by question_id → use declared door_type 2. Skill:category fallback (cso:approval, land-and-deploy:approval) 3. Keyword pattern match against question_summary 4. Default: treat as two-way (safer to log the miss than auto-decide unsafely) Covers 21 destructive patterns across: - File system (rm -rf, delete, wipe, purge, truncate) - Database (drop table/database/schema, delete from) - Git/VCS (force-push, reset --hard, checkout --, branch -D) - Deploy/infra (kubectl delete, terraform destroy, rollback) - Credentials (revoke/reset/rotate API key|token|secret|password) - Architecture (breaking change, schema migration, data model change) 7 new tests in test/plan-tune.test.ts covering: registry-first lookup, unknown-id fallthrough, keyword matching on destructive phrasings including embedded filler words ("rotate the API key"), skill-category fallback, benign questions defaulting to two-way, pattern-list non-empty. 27 pass, 0 fail. 1270 expect() calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: psychographic signal map + builder archetypes scripts/psychographic-signals.ts — hand-crafted {signal_key, user_choice} → {dimension, delta} map. Version 0.1.0. Conservative deltas (±0.03 to ±0.06 per event). Covers 9 signal keys: scope-appetite, architecture-care, code-quality-care, test-discipline, detail-preference, design-care, devex-care, distribution-care, session-mode. Helpers: applySignal() mutates running totals, newDimensionTotals() creates empty starting state, normalizeToDimensionValue() sigmoid-clamps accumulated delta to [0,1] (0 → 0.5 neutral), validateRegistrySignalKeys() checks that every signal_key in the registry has a SIGNAL_MAP entry. In v1 the signal map is used ONLY to compute inferred dimension values for /plan-tune inspection output. No skill behavior adapts to these signals until v2. scripts/archetypes.ts — 8 named archetypes + Polymath fallback: - Cathedral Builder (boil-the-ocean + architecture-first) - Ship-It Pragmatist (small scope + fast) - Deep Craft (detail-verbose + principled) - Taste Maker (intuitive, overrides recommendations) - Solo Operator (high-autonomy, delegates) - Consultant (hands-on, consulted on everything) - Wedge Hunter (narrow scope aggressively) - Builder-Coach (balanced steering) - Polymath (fallback when no archetype matches) matchArchetype() uses L2 distance scaled by tightness, with a 0.55 threshold below which we return Polymath. v1 ships the model stable; v2 narrative/vibe commands wire it into user-facing output. 14 new tests: signal map consistency vs registry, applySignal behavior for known/unknown keys, normalization bounds, archetype schema validity, name uniqueness, matchArchetype correctness for each reference profile, Polymath fallback for outliers. 41 pass, 0 fail total in test/plan-tune.test.ts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: bin/gstack-question-log — append validated AskUserQuestion events Append-only JSONL log at ~/.gstack/projects/{SLUG}/question-log.jsonl. Schema: {skill, question_id, question_summary, category?, door_type?, options_count?, user_choice, recommended?, followed_recommendation?, session_id?, ts} Validates: - skill is kebab-case - question_id is kebab-case, <= 64 chars - question_summary non-empty, <= 200 chars, newlines flattened - category is one of approval/clarification/routing/cherry-pick/feedback-loop - door_type is one-way or two-way - options_count is integer in [1, 26] - user_choice non-empty string, <= 64 chars Injection defense on question_summary rejects the same patterns as gstack-learnings-log (ignore previous instructions, system:, override:, do not report, etc). followed_recommendation is auto-computed when both user_choice and recommended are present. ts auto-injected as ISO 8601 if missing. 21 tests covering: valid payloads, full field preservation, auto-followed computation, appending, long-summary truncation, newline flattening, invalid JSON, missing fields, bad case, oversized ids, invalid enum values, out-of-range options_count, and 6 injection attack patterns. 21 pass, 0 fail, 43 expect() calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: bin/gstack-developer-profile — unified profile with migration bin/gstack-developer-profile supersedes bin/gstack-builder-profile. The old binary becomes a one-line legacy shim delegating to --read for /office-hours backward compat. Subcommands: --read legacy KEY:VALUE output (tier, session_count, etc) --migrate folds ~/.gstack/builder-profile.jsonl into ~/.gstack/developer-profile.json. Atomic (temp + rename), idempotent (no-op when target exists or source absent), archives source as .migrated-YYYY-MM-DD-HHMMSS --derive recomputes inferred dimensions from question-log.jsonl using the signal map in scripts/psychographic-signals.ts --profile full profile JSON --gap declared vs inferred diff JSON --trace <dim> event-level trace of what contributed to a dimension --check-mismatch flags dimensions where declared and inferred disagree by > 0.3 (requires >= 10 events first) --vibe archetype name + description from scripts/archetypes.ts --narrative (v2 stub) Auto-migration on first read: if legacy file exists and new file doesn't, migrate before reading. Creates a neutral (all-0.5) stub if nothing exists. Unified schema (see docs/designs/PLAN_TUNING_V0.md §Architecture): {identity, declared, inferred: {values, sample_size, diversity}, gap, overrides, sessions, signals_accumulated, schema_version} 25 new tests across subcommand behaviors: - --read defaults + stub creation - --migrate: 3 sessions preserved with signal tallies, idempotency, archival - Tier calculation: welcome_back / regular / inner_circle boundaries - --derive: neutral-when-empty, upward nudge on 'expand', downward on 'reduce', recomputable (same input → same output), ad-hoc unregistered ids ignored - --trace: contributing events, empty for untouched dims, error without arg - --gap: empty when no declared, correctly computed otherwise - --vibe: returns archetype name + description - --check-mismatch: threshold behavior, 10+ sample requirement - Unknown subcommand errors 25 pass, 0 fail, 60 expect() calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: bin/gstack-question-preference — explicit preferences + user-origin gate Subcommands: --check <id> → ASK_NORMALLY | AUTO_DECIDE (decides if a registered question should be auto-decided by the agent) --write '{…}' → set a preference (requires user-origin source) --read → dump preferences JSON --clear [id] → clear one or all --stats → short counts summary Preference values: always-ask | never-ask | ask-only-for-one-way. Stored at ~/.gstack/projects/{SLUG}/question-preferences.json. Safety contract (the core of Codex finding #16, profile-poisoning defense from docs/designs/PLAN_TUNING_V0.md §Security model): 1. One-way doors ALWAYS return ASK_NORMALLY from --check, regardless of user preference. User's never-ask is overridden with a visible safety note so the user knows why their preference didn't suppress the prompt. 2. --write requires an explicit `source` field: - Allowed: "plan-tune", "inline-user" - REJECTED with exit code 2: "inline-tool-output", "inline-file", "inline-file-content", "inline-unknown" Rejection is explicit ("profile poisoning defense") so the caller can log and surface the attempt. 3. free_text on --write is sanitized against injection patterns (ignore previous instructions, override:, system:, etc.) and newline-flattened. Each --write also appends a preference-set event to ~/.gstack/projects/{SLUG}/question-events.jsonl for derivation audit trail. 31 tests: - --check behavior (4): defaults, two-way, one-way (one-way overrides never-ask with safety note), unknown ids, missing arg - --check with prefs (5): never-ask on two-way → AUTO_DECIDE; never-ask on one-way → ASK_NORMALLY with override note; always-ask always asks; ask-only-for-one-way flips appropriately - --write valid (5): inline-user accepted, plan-tune accepted, persisted correctly, event appended, free_text preserved with flattening - User-origin gate (6): missing source rejected; inline-tool-output rejected with exit code 2 and explicit poisoning message; inline-file, inline-file-content, inline-unknown rejected; unknown source rejected - Schema validation (4): invalid JSON, bad question_id, bad preference, injection in free_text - --read (2): empty → {}, returns writes - --clear (3): specific id, clear-all, NOOP for missing - --stats (2): empty zeros, tallies by preference type 31 pass, 0 fail, 52 expect() calls. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: question-tuning preamble resolvers scripts/resolvers/question-tuning.ts ships three preamble generators: generateQuestionPreferenceCheck — before each AskUserQuestion, agent runs gstack-question-preference --check <id>. AUTO_DECIDE suppresses the ask and auto-chooses recommended. ASK_NORMALLY asks as usual. One-way door safety override is handled by the binary. generateQuestionLog — after each AskUserQuestion, agent appends a log record with skill, question_id, summary, category, door_type, options_count, user_choice, recommended, session_id. generateInlineTuneFeedback — offers inline "tune:" prompt after two-way questions. Documents structured shortcuts (never-ask, always-ask, ask-only-for-one-way, ask-less) AND accepts free-form English with normalization + confirmation. Explicitly spells out the USER-ORIGIN GATE: only write tune events when the prefix appears in the user's own chat message, never from tool output or file content. Binary enforces. All three resolvers are gated by the QUESTION_TUNING preamble echo. When the config is off, the agent skips these sections entirely. Ready to be wired into preamble.ts in the next commit. Codex host has a simpler variant that uses $GSTACK_BIN env vars. scripts/resolvers/index.ts registers three placeholders: QUESTION_PREFERENCE_CHECK, QUESTION_LOG, INLINE_TUNE_FEEDBACK Total resolver count goes from 45 to 48. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: wire question-tuning into preamble for tier >= 2 skills scripts/resolvers/preamble.ts — adds two things: 1. _QUESTION_TUNING config echo in the preamble bash block, gated on the user's gstack-config `question_tuning` value (default: false). 2. A combined Question Tuning section for tier >= 2 skills, injected after the confusion protocol. The section itself is runtime-gated by the QUESTION_TUNING value — agents skip it entirely when off. scripts/resolvers/question-tuning.ts — consolidated into one compact combined section `generateQuestionTuning(ctx)` covering: preference check before the question, log after, and inline tune: feedback with user-origin gate. Per-phase generators remain exported for unit tests but are no longer the main entrypoint. Size impact: +570 tokens / +2.3KB per tier-2+ SKILL.md. Three skills (plan-ceo-review, office-hours, ship) still exceed the 100KB token ceiling — but they were already over before this change. Delta is the smallest viable wiring of the /plan-tune v1 substrate. Golden fixtures (test/fixtures/golden/claude-ship, codex-ship, factory-ship) regenerated to match the new baseline. Full test run: 1149 pass, 0 fail, 113 skip across 28 files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files with question-tuning section bun run gen:skill-docs --host all after wiring the QUESTION_TUNING preamble section. Every tier >= 2 skill now includes the combined Question Tuning guidance. Runtime-gated — agents skip the section when question_tuning is off in gstack-config (default). Golden fixtures (claude-ship, codex-ship, factory-ship) updated to the new baseline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: /plan-tune skill — conversational inspection + preferences plan-tune/SKILL.md.tmpl: the user-facing skill for /plan-tune v1. Routes plain-English intent to one of 8 flows: - Enable + setup (first-time): 5 declaration questions mapping to the 5 psychographic dimensions (scope_appetite, risk_tolerance, detail_preference, autonomy, architecture_care). Writes to developer-profile.json declared.*. - Inspect profile: plain-English rendering of declared + inferred + gap. Uses word bands (low/balanced/high) not raw floats. Shows vibe archetype when calibration gate is met. - Review question log: top-20 question frequencies with follow/override counts. Highlights override-heavy questions as candidates for never-ask. - Set a preference: normalizes "stop asking me about X" → never-ask, etc. Confirms ambiguous phrasings before writing via gstack-question-preference. - Edit declared profile: interprets free-form ("more boil-the-ocean") and CONFIRMS before mutating declared.* (trust boundary per Codex #15). - Show gap: declared vs inferred diff with plain-English severity bands (close / drift / mismatch). Never auto-updates declared from the gap. - Stats: preference counts + diversity/calibration status. - Enable / disable: gstack-config set question_tuning true|false. Design constraints enforced: - Plain English everywhere. No CLI subcommand syntax required. Shortcuts (`profile`, `vibe`, `stats`, `setup`) exist but optional. - user-origin gate on tune: writes. source: "plan-tune" for user-invoked /plan-tune; source: "inline-user" for inline tune: from other skills. - One-way doors override never-ask (safety, surfaced to user). - No behavior adaptation in v1 — this skill inspects and configures only. Generates plan-tune/SKILL.md at ~11.6k tokens, well under the 100KB ceiling. Generated for all hosts via `bun run gen:skill-docs --host all`. Full free test suite: 1149 pass, 0 fail, 113 skip across 28 files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: end-to-end pipeline + preamble injection coverage Added 6 tests to test/plan-tune.test.ts: Preamble injection (3 tests): - tier 2+ includes Question Tuning section with preference check, log, and user-origin gate language ('profile-poisoning defense', 'inline-user') - tier 1 does NOT include the prose section (QUESTION_TUNING bash echo still fires since it's in the bash block all tiers share) - codex host swaps binDir references to $GSTACK_BIN End-to-end pipeline (3 tests) — real binaries working together, not mocks: - Log 5 expand choices → --derive → profile shows scope_appetite > 0.5 (full log → registry lookup → signal map → normalization round-trip) - --write source: inline-tool-output rejected; --read confirms no pref was persisted (the profile-poisoning defense actually works end-to-end) - Migrate a 3-session legacy file; confirm legacy gstack-builder-profile shim still returns SESSION_COUNT: 3, TIER: welcome_back, CROSS_PROJECT: true test/plan-tune.test.ts now has 47 tests total. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: E2E test for /plan-tune plain-English inspection flow (gate tier) test/skill-e2e-plan-tune.test.ts — verifies /plan-tune correctly routes plain-English intent ("review the questions I've been asked") to the Review question log section without requiring CLI subcommand syntax. Seeds a synthetic question-log.jsonl with 3 entries exercising: - override behavior (user chose expand over recommended selective) - one-way door respect (user followed ship-test-failure-triage recommendation) - two-way override (user skipped recommended changelog polish) Invokes the skill via `claude -p` and asserts: - Agent surfaces >= 2 of 3 logged question_ids in output - Agent notices override/skip behavior from the log - Exit reason is success or error_max_turns (not agent-crash) Gate-tier because the core v1 DX promise is plain-English intent routing. If it requires memorized subcommands or breaks on natural language, that's a regression of the defining feature. Registered in test/helpers/touchfiles.ts with dependencies: - plan-tune/** (skill template + generated md) - scripts/question-registry.ts (required for log lookup) - scripts/psychographic-signals.ts, scripts/one-way-doors.ts (derive path) - bin/gstack-question-log, gstack-question-preference, gstack-developer-profile Skipped when EVALS_ENABLED is not set; runs on `bun run test:evals`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.19.0.0) — /plan-tune v1 Ships /plan-tune as observational substrate: typed question registry, dual-track developer profile (declared + inferred), explicit per-question preferences with user-origin gate, inline tune: feedback across every tier >= 2 skill, unified developer-profile.json with migration from builder-profile.jsonl. Scope rolled back from initial CEO EXPANSION plan after outside-voice review (Codex). 6 deferrals tracked as P0 TODOs with explicit acceptance criteria: E1 substrate wiring, E3 narrative/vibe, E4 blind-spot coach, E5 LANDED celebration, E6 auto-adjustment, E7 psychographic auto-decide. See docs/designs/PLAN_TUNING_V0.md for the full design record. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): harden Dockerfile.ci against transient Ubuntu mirror failures The CI image build failed with: E: Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/... Connection failed [IP: 91.189.92.22 80] ERROR: process "/bin/sh -c apt-get update && apt-get install ..." did not complete successfully: exit code: 100 archive.ubuntu.com periodically returns "connection refused" on individual regional mirrors. Without retry logic a single failed fetch nukes the whole Docker build. Three defenses, layered: 1. /etc/apt/apt.conf.d/80-retries — apt fetches each package up to 5 times with a 30s timeout. Handles per-package flakes. 2. Shell-loop retry around the whole apt-get step (x3, 10s sleep) — handles the case where apt-get update itself can't reach any mirror. 3. --retry 5 --retry-delay 5 --retry-connrefused on all curl fetches (bun install script, GitHub CLI keyring, NodeSource setup script). Applied to every apt-get and curl call in the Dockerfile. No behavior change on happy path — only kicks in when mirrors blip. Fixes the build-image job that was blocking CI on the /plan-tune PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add PLAN_TUNING_V1 + PACING_UPDATES_V0 design docs Captures the V1 design (ELI10 writing + LOC reframe) in docs/designs/PLAN_TUNING_V1.md and the extracted V1.1 pacing-overhaul plan in docs/designs/PACING_UPDATES_V0.md. V1 scope was reduced from the original bundled pacing + writing-style plan after three engineering-review passes revealed structural gaps in the pacing workstream that couldn't be closed via plan-text editing. TODOS.md P0 entry links to V1.1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: curated jargon list for V1 writing-style glossing Repo-owned list of ~50 high-frequency technical terms (idempotent, race condition, N+1, backpressure, etc.) that gstack glosses on first use in tier-≥2 skill output. Baked into generated SKILL.md prose at gen-skill-docs time. Terms not on this list are assumed plain-English enough. Contributions via PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(preamble): V1 Writing Style section + EXPLAIN_LEVEL echo + migration prompt Adds a new Writing Style section to tier-≥2 preamble output composing with the existing AskUserQuestion Format section. Six rules: jargon glossed on first use per skill invocation (from scripts/jargon-list.json), outcome- framed questions, short sentences, decisions close with user impact, gloss-on-first-use even if user pasted term, user-turn override for "be terse" requests. Baked conditionally (skip if EXPLAIN_LEVEL: terse). Adds EXPLAIN_LEVEL preamble echo using \${binDir} (host-portable matching V0 QUESTION_TUNING pattern). Adds WRITING_STYLE_PENDING echo reading a flag file written by the V0→V1 upgrade migration; on first post-upgrade skill run, the agent fires a one-time AskUserQuestion offering terse mode. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(gstack-config): validate explain_level + document in header Adds explain_level: default|terse to the annotated config header with a one-line description. Whitelists valid values; on set of an unknown value, prints a specific warning ("explain_level '\$VALUE' not recognized. Valid values: default, terse. Using default.") and writes the default value. Matches V1 preamble's EXPLAIN_LEVEL echo expectation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: V1 upgrade migration — writing-style opt-out prompt New migration script following existing v0.15.2.0.sh / v0.16.2.0.sh pattern. Writes a .writing-style-prompt-pending flag file on first run post-upgrade. The preamble's migration-prompt block reads the flag and fires a one-time AskUserQuestion offering the user a choice between the new default writing style and restoring V0 prose via \`gstack-config set explain_level terse\`. Idempotent via flag files; if the user has already set explain_level explicitly, counts as answered and skips. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: LOC reframe tooling — throughput comparison + README updater + scc installer Three new scripts: - scripts/garry-output-comparison.ts — enumerates Garry-authored commits in 2013 + 2026 on public repos, extracts ADDED lines from git diff, classifies as logical SLOC via scc --stdin (regex fallback if scc missing). Writes docs/throughput-2013-vs-2026.json with per-language breakdown + explicit caveats (public repos only, commit-style drift, private-work exclusion). - scripts/update-readme-throughput.ts — reads the JSON if present, replaces the README's <!-- GSTACK-THROUGHPUT-PLACEHOLDER --> anchor with the computed multiple (preserving the anchor for future runs). If JSON missing, writes GSTACK-THROUGHPUT-PENDING marker that CI rejects — forcing the build to run before commit. - scripts/setup-scc.sh — standalone OS-detecting installer for scc. Not a package.json dependency (95% of users never run throughput). Brew on macOS, apt on Linux, GitHub releases link on Windows. Two-string anchor pattern (PLACEHOLDER vs PENDING) prevents the pipeline from destroying its own update path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(retro): surface logical SLOC + weighted commits above raw LOC V1 reorders the /retro summary table to lead with features shipped, then commits + weighted commits (commits × files-touched capped at 20), then PRs merged, then logical SLOC added as the primary code-volume metric. Raw LOC stays present but is demoted to context. Rationale inline in the template: ten lines of a good fix is not less shipping than ten thousand lines of scaffold. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(v1): README hero reframe + writing-style + CHANGELOG + version bump to 1.0.0.0 README.md: - Hero removes "600,000+ lines of production code" framing; replaces with the computed 2013-vs-2026 pro-rata multiple (via <!-- GSTACK-THROUGHPUT-PLACEHOLDER --> anchor, filled by the update-readme-throughput build step). - Hiring callout: "ship real products at AI-coding speed" instead of "10K+ LOC/day." - New Writing Style section (~80 words) between Quick start and Install: "v1 prompts = simpler" framing, outcome-language example, terse-mode opt-out, pointer to /plan-tune. CLAUDE.md: one-paragraph Writing style (V1) note under project conventions, linking to preamble resolver + V1 design docs. CHANGELOG.md: V1 entry on top of v0.19.0.0 with user-facing narrative (what changes, how to opt out, for-contributors notes). Mentions scope reduction — pacing overhaul ships in V1.1. CONTRIBUTING.md: one-paragraph note on jargon-list.json maintenance (PR to add/remove terms; regenerate via gen:skill-docs). VERSION + package.json: bump to 1.0.0.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files + golden fixtures for V1 Mechanical regeneration from the updated templates in prior commits: - Writing Style section now appears in tier-≥2 skill output. - EXPLAIN_LEVEL + WRITING_STYLE_PENDING echoes in preamble bash. - V1 migration-prompt block fires conditionally on first upgrade. - Jargon list inlined into preamble prose at gen time. - Retro template's logical SLOC + weighted commits order applied. Regenerated for all 8 hosts via bun run gen:skill-docs --host all. Golden ship-skill fixtures refreshed from regenerated outputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: V1 gate coverage — writing-style resolver + config + jargon + migration + dormancy Six new gate-tier test files: - test/writing-style-resolver.test.ts — asserts Writing Style section is injected into tier-≥2 preamble, all 6 rules present, jargon list inlined, terse-mode gate condition present, Codex output uses \$GSTACK_BIN (not ~/.claude/), tier-1 does NOT get the section, migration-prompt block present. - test/explain-level-config.test.ts — gstack-config set/get round-trip for default + terse, unknown-value warns + defaults to default, header documents the key, round-trip across set→set→get. - test/jargon-list.test.ts — shape + ~50 terms + no duplicates (case-insensitive) + includes canonical high-signal terms. - test/v0-dormancy.test.ts — 5D dimension names + archetype names forbidden in default-mode tier-≥2 SKILL.md output, except for plan-tune and office-hours where they're load-bearing. - test/readme-throughput.test.ts — script replaces anchor with number on happy path, writes PENDING marker when JSON missing, CI gate asserts committed README contains no PENDING string. - test/upgrade-migration-v1.test.ts — fresh run writes pending flag, idempotent after user-answered, pre-existing explain_level counts as answered. All 95 V1 test-expect() calls pass. Full suite: 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: compute real 2013-vs-2026 throughput multiple (130.2×) Ran scripts/garry-output-comparison.ts across all 15 public garrytan/* repos. Aggregated results into docs/throughput-2013-vs-2026.json and ran scripts/update-readme-throughput.ts to replace the README placeholder. 2013 public activity: 2 commits, 2,384 logical lines added across 1 week, in 1 repo (zurb-foundation-wysihtml5 upstream contribution). 2026 public activity: 279 commits, 310,484 logical lines added across 17 active weeks, in 3 repos (gbrain, gstack, resend_robot). Multiples (public repos only, apples-to-apples): - Logical SLOC: 130.2× - Commits per active week: 8.2× - Raw lines added: 134.4× Private work at both eras (2013 Bookface at YC, Posterous-era code, 2026 internal tools) is excluded from this comparison. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat: 207× throughput multiple (with private repos + Bookface) Re-ran scripts/garry-output-comparison.ts across all 41 repos under garrytan/* (15 public + 26 private), including Bookface (YC's internal social network, 2013-era work). 2013 activity: 71 commits, 5,143 logical lines, 4 active repos (bookface, delicounter, tandong, zurb-foundation-wysihtml5) 2026 activity: 350 commits, 1,064,818 logical lines, 15 active repos (gbrain, gstack, gbrowser, tax-app, kumo, tenjin, autoemail, kitsune, easy-chromium-compiles, conductor-playground, garryslist-agent, baku, gstack-website, resend_robot, garryslist-brain) Multiples: - Logical SLOC: 207× (up from 130.2× when including private work) - Raw lines: 223× - Commits/active-week: 3.4× Stopped committing docs/throughput-2013-vs-2026.json — analysis is a local artifact, not repo state. Added docs/throughput-*.json to .gitignore. Full markdown analysis at ~/throughput-analysis-2026-04-18.md (local-only). README multiple is now hardcoded; re-run the script and edit manually when you want to refresh it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: run rate vs year-to-date throughput comparison Two separate numbers in the README hero: - Run rate: ~700× (9,859 logical lines/day in 2026 vs 14/day in 2013) - Year-to-date: 207× (2026 through April 18 already exceeds 2013 full year by 207×) Previous "207× pro-rata" framing mixed full-year 2013 vs partial-year 2026. Run rate is the apples-to-apples normalization; YTD is the "already produced" total. Both are honest; both are compelling; they measure different things. Analysis at ~/throughput-analysis-2026-04-18.md (local-only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(throughput): script natively computes to-date + run-rate multiples Enhanced scripts/garry-output-comparison.ts so both calculations come out of a single run instead of being reassembled ad-hoc in bash: PerYearResult now includes: - days_elapsed — 365 for past years, day-of-year for current - is_partial — flags the current (in-progress) year - per_day_rate — logical/raw/commits normalized by calendar day - annualized_projection — per_day_rate × 365 Output JSON's `multiples` now has two sibling blocks: - multiples.to_date — raw volume ratios (2026-YTD / 2013-full-year) - multiples.run_rate — per-day pace ratios (apples-to-apples) Back-compat: multiples.logical_lines_added still aliases to_date for older consumers reading the JSON. Updated README hero to cite both (picking up brain/* repo that was missed in the earlier aggregation pass): 2026 run rate: ~880× my 2013 pace (12,382 vs 14 logical lines/day) 2026 YTD: 260× the entire 2013 year Stderr summary now prints both multiples at the end of each run. Full analysis at ~/throughput-analysis-2026-04-18.md (local-only). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: ON_THE_LOC_CONTROVERSY methodology post + README link Long-form response to the "LOC is a meaningless vanity metric" critique. Covers: - The three branches of the LOC critique and which are right - Why logical SLOC (NCLOC) beats raw LOC as the honest measurement - Full method: author-scoped git diff, regex-classified added lines, aggregated across 41 public + private garrytan/* repos - Both calculations: to-date (260x) and run-rate (879x) - Steelman of the critics (greenfield-vs-maintenance, survivorship bias, quality-adjusted productivity, time-to-first-user) - Reproduction instructions Linked from README hero via a blockquote directly below the number. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * exclude: tax-app from throughput analysis (import-dominated history) tax-app's history is one commit of 104K logical lines — an initial import of a codebase, not authored work. Removing it to keep the comparison honest. Changes: - scripts/garry-output-comparison.ts: added EXCLUDED_REPOS constant with tax-app + a one-line rationale. The script now skips excluded repos with a stderr note and deletes any stale output JSON so aggregation loops don't pick up pre-exclusion numbers. - README hero: updated to 810× run rate + 240× YTD (were 880×/260×). Wording updated to "40 public + private repos ... after excluding repos dominated by imported code." - docs/ON_THE_LOC_CONTROVERSY.md: updated all numbers, added an "Exclusions" paragraph explaining tax-app, removed tax-app from the "shipped not WIP" example list. New numbers (2026 through day 108, without tax-app): - To-date: 240× logical SLOC (1,233,062 vs 5,143) - Run rate: 810× per-day pace (11,417 vs 14 logical/day) - Annualized: ~4.2M logical lines projected Future re-runs automatically skip tax-app. Add more exclusions to EXCLUDED_REPOS at the top of the script with a one-line rationale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: correct tax-app exclusion rationale tax-app is a demo app I built for an upcoming YC channel video, not an "import-dominated history" as the previous commit claimed. Excluded because it's not production shipping work, not because of an import commit. Updated rationale in scripts/garry-output-comparison.ts's EXCLUDED_REPOS constant, in docs/ON_THE_LOC_CONTROVERSY.md's method section + conclusion, and in the README hero wording ("one demo repo" vs the earlier "repos dominated by imported code"). Numbers unchanged — the exclusion itself is the same, just the reason. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: harden ON_THE_LOC_CONTROVERSY against Cramer + neckbeard critiques Reframes the thesis as "engineers can fly now" (amplification, not replacement) and fortifies the soft spots critics will attack. Added: - Flight-thesis opener: pilot vs walker, leverage not replacement. - Second deflation layer for AI verbosity (on top of NCLOC). Headline moves from 810x to 408x after generous 2x AI-boilerplate cut, with explicit sensitivity analysis showing the number is still large under pessimistic priors (5x → 162x, 10x → 81x, 100x impossible). - Weekly distribution check (kills "you had one burst week" attack). - Revert rate (2.0%) and post-merge fix rate (6.3%) with OSS comparables (K8s/Rails/Django band). Addresses "where are your error rates" directly. - Named production adoption signals (gstack 1000+ installs, gbrain beta, resend_robot paying API) with explicit concession that "shipped != used at scale" for most of the corpus. - Harder steelman: 5 specific concessions with quantified pivot points (e.g., "if 2013 baseline was 3.5x higher, 810x → 228x, still high"). Removed factual error: Posterous acquisition paragraph (Garry had already left Posterous by 2011, so the "Twitter bought our private repos" excuse for the 2013 corpus gap doesn't apply). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: update gstack/gbrain adoption numbers in LOC controversy post gstack: "1,000+ distinct project installations" → "tens of thousands of daily active users" (telemetry-reported, community tier, opt-in). gbrain: "small set of beta testers" → "hundreds of beta testers running it live." Both are the accurate current numbers. The concession paragraph below (about shipped != adopted at scale for the long-tail repos) still reads correctly since it's about the corpus as a whole, not gstack/gbrain specifically. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: reframe reproducibility note as OSS breakout flex "You'd need access to my private repos" → "Bookface and Posthaven are private, but gstack and gbrain are open-sourced with tens of thousands of GitHub stars and tens of thousands of confirmed regular users, among the most-used OSS projects in the world that didn't exist three months ago." Keeps the `gh repo list` command at the end for the actual reproducibility instruction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Rewrite LOC controversy post - Lead with concession (LOC is garbage, do the math anyway) - Preempt 14 lines/day meme with historical baselines (Brooks, Jones, McConnell) - Remove 'neckbeard' language throughout - Add slop-scan story (Ben Vinegar, 5.24 → 1.96, 62% cut) - David Cramer GUnit joke - Add testing philosophy section (the real unlock) - ASCII weekly distribution chart - gstack telemetry section with real numbers (15K installs, 305K invocations, 95.2% success) - Top skills usage chart - Pick-your-priors paragraph moved earlier (the killer) - Sharper close: run the script, show me your numbers * docs: four precision fixes on LOC controversy post 1. Citation fix. Kernighan didn't say anything about LOC-as-metric (that's the famous "aircraft building by weight" quote, commonly misattributed but actually Bill Gates). Replaced "Kernighan implied it before that" with the real Dijkstra quote ("lines produced" vs "lines spent" from EWD1036, with direct link) + the Gates quote. Verified via web search. 2. Slop-scan direction clarified. "(highest on his benchmark)" was ambiguous — could read as a brag. Now: "Higher score = more slop. He ran it on gstack and we scored 5.24, the worst he'd measured at the time." Then the 62% cut lands as an actual win. 3. Prose/chart skill-usage ordering now matches. Added /plan-eng-review (28,014) to the prose list so it doesn't conflict with the chart below it. 4. Cut the "David — I owe you one / GUnit" insider joke. Most readers won't connect Cramer → Sentry → GUnit naming. Ends the slop-scan paragraph on the stronger line: "Run `bun test` and watch 2,000+ tests pass." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: tighten four LOC post citations to match primary sources 1. Bill Gates quote: flagged as folklore-grade. Was "Bill Gates put it more memorably" (firm attribution). Now "The old line (widely attributed to Bill Gates, sourcing murky) puts it more memorably." The quote stands; honesty about attribution avoids the same misattribution trap we just fixed for Kernighan. 2. Capers Jones: "15-50 across thousands of projects" → "roughly 16-38 LOC/day across thousands of projects" — matches his actual published measurements (which also report as 325-750 LOC/month). 3. Steve McConnell: "10-50 for finished, tested, delivered code" was folklore. Replaced with his actual project-size-dependent range from Code Complete: "20-125 LOC/day for small projects (10K LOC) down to 1.5-25 for large projects (10M LOC) — it's size-dependent, not a single number." 4. Revert rate comparison: "Kubernetes, Rails, and Django historically run 1.5-3%" was unsourced. Replaced with "mature OSS codebases typically run 1-3%" + "run the same command on whatever you consider the bar and compare." No false specificity about which repos. Net: every quantitative citation in the post now matches primary-source figures or is explicitly flagged as folklore. Neckbeards can verify. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: drop Writing style section from README Was sitting in prime real estate between Quick start and Install — internal implementation detail, not something users need up-front. Existing coverage is enough: - Upgrade migration prompt notifies users on first post-upgrade run - CLAUDE.md has the contributor note - docs/designs/PLAN_TUNING_V1.md has the full design Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: collapse team-mode setup into one paste-and-go command Step 2 was three separate code blocks: setup --team, then team-init, then git add/commit. Mirrors Step 1's style now — one shell one-liner that does all three. Subshell (cd && ./setup --team) keeps the user in their repo pwd so team-init + git commit land in the right place. "Swap required for optional" moved to a one-liner below. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: move full-clone footnote from README to CONTRIBUTING The "Contributing or need full history?" note is for contributors, not for someone following the README install flow. Moved into CONTRIBUTING's Quick start section where it fits next to the existing clone command, with a tip to upgrade an existing shallow clone via \`git fetch --unshallow\`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: root <root@localhost>
885 lines
37 KiB
Cheetah
885 lines
37 KiB
Cheetah
---
|
||
name: retro
|
||
preamble-tier: 2
|
||
version: 2.0.0
|
||
description: |
|
||
Weekly engineering retrospective. Analyzes commit history, work patterns,
|
||
and code quality metrics with persistent history and trend tracking.
|
||
Team-aware: breaks down per-person contributions with praise and growth areas.
|
||
Use when asked to "weekly retro", "what did we ship", or "engineering retrospective".
|
||
Proactively suggest at the end of a work week or sprint. (gstack)
|
||
allowed-tools:
|
||
- Bash
|
||
- Read
|
||
- Write
|
||
- Glob
|
||
- AskUserQuestion
|
||
triggers:
|
||
- weekly retro
|
||
- what did we ship
|
||
- engineering retrospective
|
||
---
|
||
|
||
{{PREAMBLE}}
|
||
|
||
{{BASE_BRANCH_DETECT}}
|
||
|
||
# /retro — Weekly Engineering Retrospective
|
||
|
||
Generates a comprehensive engineering retrospective analyzing commit history, work patterns, and code quality metrics. Team-aware: identifies the user running the command, then analyzes every contributor with per-person praise and growth opportunities. Designed for a senior IC/CTO-level builder using Claude Code as a force multiplier.
|
||
|
||
## User-invocable
|
||
When the user types `/retro`, run this skill.
|
||
|
||
## Arguments
|
||
- `/retro` — default: last 7 days
|
||
- `/retro 24h` — last 24 hours
|
||
- `/retro 14d` — last 14 days
|
||
- `/retro 30d` — last 30 days
|
||
- `/retro compare` — compare current window vs prior same-length window
|
||
- `/retro compare 14d` — compare with explicit window
|
||
- `/retro global` — cross-project retro across all AI coding tools (7d default)
|
||
- `/retro global 14d` — cross-project retro with explicit window
|
||
|
||
{{GBRAIN_CONTEXT_LOAD}}
|
||
|
||
## Instructions
|
||
|
||
Parse the argument to determine the time window. Default to 7 days if no argument given. All times should be reported in the user's **local timezone** (use the system default — do NOT set `TZ`).
|
||
|
||
**Midnight-aligned windows:** For day (`d`) and week (`w`) units, compute an absolute start date at local midnight, not a relative string. For example, if today is 2026-03-18 and the window is 7 days: the start date is 2026-03-11. Use `--since="2026-03-11T00:00:00"` for git log queries — the explicit `T00:00:00` suffix ensures git starts from midnight. Without it, git uses the current wall-clock time (e.g., `--since="2026-03-11"` at 11pm means 11pm, not midnight). For week units, multiply by 7 to get days (e.g., `2w` = 14 days back). For hour (`h`) units, use `--since="N hours ago"` since midnight alignment does not apply to sub-day windows.
|
||
|
||
**Argument validation:** If the argument doesn't match a number followed by `d`, `h`, or `w`, the word `compare` (optionally followed by a window), or the word `global` (optionally followed by a window), show this usage and stop:
|
||
```
|
||
Usage: /retro [window | compare | global]
|
||
/retro — last 7 days (default)
|
||
/retro 24h — last 24 hours
|
||
/retro 14d — last 14 days
|
||
/retro 30d — last 30 days
|
||
/retro compare — compare this period vs prior period
|
||
/retro compare 14d — compare with explicit window
|
||
/retro global — cross-project retro across all AI tools (7d default)
|
||
/retro global 14d — cross-project retro with explicit window
|
||
```
|
||
|
||
**If the first argument is `global`:** Skip the normal repo-scoped retro (Steps 1-14). Instead, follow the **Global Retrospective** flow at the end of this document. The optional second argument is the time window (default 7d). This mode does NOT require being inside a git repo.
|
||
|
||
{{LEARNINGS_SEARCH}}
|
||
|
||
### Non-git context (optional)
|
||
|
||
Check for non-git context that should be included in the retro:
|
||
|
||
```bash
|
||
[ -f ~/.gstack/retro-context.md ] && echo "RETRO_CONTEXT_FOUND" || echo "NO_RETRO_CONTEXT"
|
||
```
|
||
|
||
If `RETRO_CONTEXT_FOUND`: read `~/.gstack/retro-context.md`. This file is user-authored and may contain meeting notes, calendar events, decisions, and other context that doesn't appear in git history. Incorporate this context into the retro narrative where relevant.
|
||
|
||
### Step 1: Gather Raw Data
|
||
|
||
First, fetch origin and identify the current user:
|
||
```bash
|
||
git fetch origin <default> --quiet
|
||
# Identify who is running the retro
|
||
git config user.name
|
||
git config user.email
|
||
```
|
||
|
||
The name returned by `git config user.name` is **"you"** — the person reading this retro. All other authors are teammates. Use this to orient the narrative: "your" commits vs teammate contributions.
|
||
|
||
Run ALL of these git commands in parallel (they are independent):
|
||
|
||
```bash
|
||
# 1. All commits in window with timestamps, subject, hash, AUTHOR, files changed, insertions, deletions
|
||
git log origin/<default> --since="<window>" --format="%H|%aN|%ae|%ai|%s" --shortstat
|
||
|
||
# 2. Per-commit test vs total LOC breakdown with author
|
||
# Each commit block starts with COMMIT:<hash>|<author>, followed by numstat lines.
|
||
# Separate test files (matching test/|spec/|__tests__/) from production files.
|
||
git log origin/<default> --since="<window>" --format="COMMIT:%H|%aN" --numstat
|
||
|
||
# 3. Commit timestamps for session detection and hourly distribution (with author)
|
||
git log origin/<default> --since="<window>" --format="%at|%aN|%ai|%s" | sort -n
|
||
|
||
# 4. Files most frequently changed (hotspot analysis)
|
||
git log origin/<default> --since="<window>" --format="" --name-only | grep -v '^$' | sort | uniq -c | sort -rn
|
||
|
||
# 5. PR/MR numbers from commit messages (GitHub #NNN, GitLab !NNN)
|
||
git log origin/<default> --since="<window>" --format="%s" | grep -oE '[#!][0-9]+' | sort -t'#' -k1 | uniq
|
||
|
||
# 6. Per-author file hotspots (who touches what)
|
||
git log origin/<default> --since="<window>" --format="AUTHOR:%aN" --name-only
|
||
|
||
# 7. Per-author commit counts (quick summary)
|
||
git shortlog origin/<default> --since="<window>" -sn --no-merges
|
||
|
||
# 8. Greptile triage history (if available)
|
||
cat ~/.gstack/greptile-history.md 2>/dev/null || true
|
||
|
||
# 9. TODOS.md backlog (if available)
|
||
cat TODOS.md 2>/dev/null || true
|
||
|
||
# 10. Test file count
|
||
find . -name '*.test.*' -o -name '*.spec.*' -o -name '*_test.*' -o -name '*_spec.*' 2>/dev/null | grep -v node_modules | wc -l
|
||
|
||
# 11. Regression test commits in window
|
||
git log origin/<default> --since="<window>" --oneline --grep="test(qa):" --grep="test(design):" --grep="test: coverage"
|
||
|
||
# 12. gstack skill usage telemetry (if available)
|
||
cat ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
|
||
|
||
# 12. Test files changed in window
|
||
git log origin/<default> --since="<window>" --format="" --name-only | grep -E '\.(test|spec)\.' | sort -u | wc -l
|
||
```
|
||
|
||
### Step 2: Compute Metrics
|
||
|
||
Calculate and present these metrics in a summary table:
|
||
|
||
| Metric | Value |
|
||
|--------|-------|
|
||
| **Features shipped** (from CHANGELOG + merged PR titles) | N |
|
||
| Commits to main | N |
|
||
| Weighted commits (commits × avg files-touched, capped at 20 per commit) | N |
|
||
| Contributors | N |
|
||
| PRs merged | N |
|
||
| **Logical SLOC added** (non-blank, non-comment — primary code-volume metric) | N |
|
||
| Raw LOC: insertions | N |
|
||
| Raw LOC: deletions | N |
|
||
| Raw LOC: net | N |
|
||
| Test LOC (insertions) | N |
|
||
| Test LOC ratio | N% |
|
||
| Version range | vX.Y.Z.W → vX.Y.Z.W |
|
||
| Active days | N |
|
||
| Detected sessions | N |
|
||
| Avg raw LOC/session-hour | N |
|
||
| Greptile signal | N% (Y catches, Z FPs) |
|
||
| Test Health | N total tests · M added this period · K regression tests |
|
||
|
||
**Metric order rationale (V1):** features shipped leads — what users got. Commits
|
||
and weighted commits reflect intent-to-ship. Logical SLOC added reflects real
|
||
new functionality. Raw LOC is demoted to context because AI inflates it; ten
|
||
lines of a good fix is not less shipping than ten thousand lines of scaffold.
|
||
See docs/designs/PLAN_TUNING_V1.md §Workstream C.
|
||
|
||
Then show a **per-author leaderboard** immediately below:
|
||
|
||
```
|
||
Contributor Commits +/- Top area
|
||
You (garry) 32 +2400/-300 browse/
|
||
alice 12 +800/-150 app/services/
|
||
bob 3 +120/-40 tests/
|
||
```
|
||
|
||
Sort by commits descending. The current user (from `git config user.name`) always appears first, labeled "You (name)".
|
||
|
||
**Greptile signal (if history exists):** Read `~/.gstack/greptile-history.md` (fetched in Step 1, command 8). Filter entries within the retro time window by date. Count entries by type: `fix`, `fp`, `already-fixed`. Compute signal ratio: `(fix + already-fixed) / (fix + already-fixed + fp)`. If no entries exist in the window or the file doesn't exist, skip the Greptile metric row. Skip unparseable lines silently.
|
||
|
||
**Backlog Health (if TODOS.md exists):** Read `TODOS.md` (fetched in Step 1, command 9). Compute:
|
||
- Total open TODOs (exclude items in `## Completed` section)
|
||
- P0/P1 count (critical/urgent items)
|
||
- P2 count (important items)
|
||
- Items completed this period (items in Completed section with dates within the retro window)
|
||
- Items added this period (cross-reference git log for commits that modified TODOS.md within the window)
|
||
|
||
Include in the metrics table:
|
||
```
|
||
| Backlog Health | N open (X P0/P1, Y P2) · Z completed this period |
|
||
```
|
||
|
||
If TODOS.md doesn't exist, skip the Backlog Health row.
|
||
|
||
**Skill Usage (if analytics exist):** Read `~/.gstack/analytics/skill-usage.jsonl` if it exists. Filter entries within the retro time window by `ts` field. Separate skill activations (no `event` field) from hook fires (`event: "hook_fire"`). Aggregate by skill name. Present as:
|
||
|
||
```
|
||
| Skill Usage | /ship(12) /qa(8) /review(5) · 3 safety hook fires |
|
||
```
|
||
|
||
If the JSONL file doesn't exist or has no entries in the window, skip the Skill Usage row.
|
||
|
||
**Eureka Moments (if logged):** Read `~/.gstack/analytics/eureka.jsonl` if it exists. Filter entries within the retro time window by `ts` field. For each eureka moment, show the skill that flagged it, the branch, and a one-line summary of the insight. Present as:
|
||
|
||
```
|
||
| Eureka Moments | 2 this period |
|
||
```
|
||
|
||
If moments exist, list them:
|
||
```
|
||
EUREKA /office-hours (branch: garrytan/auth-rethink): "Session tokens don't need server storage — browser crypto API makes client-side JWT validation viable"
|
||
EUREKA /plan-eng-review (branch: garrytan/cache-layer): "Redis isn't needed here — Bun's built-in LRU cache handles this workload"
|
||
```
|
||
|
||
If the JSONL file doesn't exist or has no entries in the window, skip the Eureka Moments row.
|
||
|
||
### Step 3: Commit Time Distribution
|
||
|
||
Show hourly histogram in local time using bar chart:
|
||
|
||
```
|
||
Hour Commits ████████████████
|
||
00: 4 ████
|
||
07: 5 █████
|
||
...
|
||
```
|
||
|
||
Identify and call out:
|
||
- Peak hours
|
||
- Dead zones
|
||
- Whether pattern is bimodal (morning/evening) or continuous
|
||
- Late-night coding clusters (after 10pm)
|
||
|
||
### Step 4: Work Session Detection
|
||
|
||
Detect sessions using **45-minute gap** threshold between consecutive commits. For each session report:
|
||
- Start/end time (Pacific)
|
||
- Number of commits
|
||
- Duration in minutes
|
||
|
||
Classify sessions:
|
||
- **Deep sessions** (50+ min)
|
||
- **Medium sessions** (20-50 min)
|
||
- **Micro sessions** (<20 min, typically single-commit fire-and-forget)
|
||
|
||
Calculate:
|
||
- Total active coding time (sum of session durations)
|
||
- Average session length
|
||
- LOC per hour of active time
|
||
|
||
### Step 5: Commit Type Breakdown
|
||
|
||
Categorize by conventional commit prefix (feat/fix/refactor/test/chore/docs). Show as percentage bar:
|
||
|
||
```
|
||
feat: 20 (40%) ████████████████████
|
||
fix: 27 (54%) ███████████████████████████
|
||
refactor: 2 ( 4%) ██
|
||
```
|
||
|
||
Flag if fix ratio exceeds 50% — this signals a "ship fast, fix fast" pattern that may indicate review gaps.
|
||
|
||
### Step 6: Hotspot Analysis
|
||
|
||
Show top 10 most-changed files. Flag:
|
||
- Files changed 5+ times (churn hotspots)
|
||
- Test files vs production files in the hotspot list
|
||
- VERSION/CHANGELOG frequency (version discipline indicator)
|
||
|
||
### Step 7: PR Size Distribution
|
||
|
||
From commit diffs, estimate PR sizes and bucket them:
|
||
- **Small** (<100 LOC)
|
||
- **Medium** (100-500 LOC)
|
||
- **Large** (500-1500 LOC)
|
||
- **XL** (1500+ LOC)
|
||
|
||
### Step 8: Focus Score + Ship of the Week
|
||
|
||
**Focus score:** Calculate the percentage of commits touching the single most-changed top-level directory (e.g., `app/services/`, `app/views/`). Higher score = deeper focused work. Lower score = scattered context-switching. Report as: "Focus score: 62% (app/services/)"
|
||
|
||
**Ship of the week:** Auto-identify the single highest-LOC PR in the window. Highlight it:
|
||
- PR number and title
|
||
- LOC changed
|
||
- Why it matters (infer from commit messages and files touched)
|
||
|
||
### Step 9: Team Member Analysis
|
||
|
||
For each contributor (including the current user), compute:
|
||
|
||
1. **Commits and LOC** — total commits, insertions, deletions, net LOC
|
||
2. **Areas of focus** — which directories/files they touched most (top 3)
|
||
3. **Commit type mix** — their personal feat/fix/refactor/test breakdown
|
||
4. **Session patterns** — when they code (their peak hours), session count
|
||
5. **Test discipline** — their personal test LOC ratio
|
||
6. **Biggest ship** — their single highest-impact commit or PR in the window
|
||
|
||
**For the current user ("You"):** This section gets the deepest treatment. Include all the detail from the solo retro — session analysis, time patterns, focus score. Frame it in first person: "Your peak hours...", "Your biggest ship..."
|
||
|
||
**For each teammate:** Write 2-3 sentences covering what they worked on and their pattern. Then:
|
||
|
||
- **Praise** (1-2 specific things): Anchor in actual commits. Not "great work" — say exactly what was good. Examples: "Shipped the entire auth middleware rewrite in 3 focused sessions with 45% test coverage", "Every PR under 200 LOC — disciplined decomposition."
|
||
- **Opportunity for growth** (1 specific thing): Frame as a leveling-up suggestion, not criticism. Anchor in actual data. Examples: "Test ratio was 12% this week — adding test coverage to the payment module before it gets more complex would pay off", "5 fix commits on the same file suggest the original PR could have used a review pass."
|
||
|
||
**If only one contributor (solo repo):** Skip the team breakdown and proceed as before — the retro is personal.
|
||
|
||
**If there are Co-Authored-By trailers:** Parse `Co-Authored-By:` lines in commit messages. Credit those authors for the commit alongside the primary author. Note AI co-authors (e.g., `noreply@anthropic.com`) but do not include them as team members — instead, track "AI-assisted commits" as a separate metric.
|
||
|
||
{{LEARNINGS_LOG}}
|
||
|
||
{{GBRAIN_SAVE_RESULTS}}
|
||
|
||
### Step 10: Week-over-Week Trends (if window >= 14d)
|
||
|
||
If the time window is 14 days or more, split into weekly buckets and show trends:
|
||
- Commits per week (total and per-author)
|
||
- LOC per week
|
||
- Test ratio per week
|
||
- Fix ratio per week
|
||
- Session count per week
|
||
|
||
### Step 11: Streak Tracking
|
||
|
||
Count consecutive days with at least 1 commit to origin/<default>, going back from today. Track both team streak and personal streak:
|
||
|
||
```bash
|
||
# Team streak: all unique commit dates (local time) — no hard cutoff
|
||
git log origin/<default> --format="%ad" --date=format:"%Y-%m-%d" | sort -u
|
||
|
||
# Personal streak: only the current user's commits
|
||
git log origin/<default> --author="<user_name>" --format="%ad" --date=format:"%Y-%m-%d" | sort -u
|
||
```
|
||
|
||
Count backward from today — how many consecutive days have at least one commit? This queries the full history so streaks of any length are reported accurately. Display both:
|
||
- "Team shipping streak: 47 consecutive days"
|
||
- "Your shipping streak: 32 consecutive days"
|
||
|
||
### Step 12: Load History & Compare
|
||
|
||
Before saving the new snapshot, check for prior retro history:
|
||
|
||
```bash
|
||
setopt +o nomatch 2>/dev/null || true # zsh compat
|
||
ls -t .context/retros/*.json 2>/dev/null
|
||
```
|
||
|
||
**If prior retros exist:** Load the most recent one using the Read tool. Calculate deltas for key metrics and include a **Trends vs Last Retro** section:
|
||
```
|
||
Last Now Delta
|
||
Test ratio: 22% → 41% ↑19pp
|
||
Sessions: 10 → 14 ↑4
|
||
LOC/hour: 200 → 350 ↑75%
|
||
Fix ratio: 54% → 30% ↓24pp (improving)
|
||
Commits: 32 → 47 ↑47%
|
||
Deep sessions: 3 → 5 ↑2
|
||
```
|
||
|
||
**If no prior retros exist:** Skip the comparison section and append: "First retro recorded — run again next week to see trends."
|
||
|
||
### Step 13: Save Retro History
|
||
|
||
After computing all metrics (including streak) and loading any prior history for comparison, save a JSON snapshot:
|
||
|
||
```bash
|
||
mkdir -p .context/retros
|
||
```
|
||
|
||
Determine the next sequence number for today (substitute the actual date for `$(date +%Y-%m-%d)`):
|
||
```bash
|
||
setopt +o nomatch 2>/dev/null || true # zsh compat
|
||
# Count existing retros for today to get next sequence number
|
||
today=$(date +%Y-%m-%d)
|
||
existing=$(ls .context/retros/${today}-*.json 2>/dev/null | wc -l | tr -d ' ')
|
||
next=$((existing + 1))
|
||
# Save as .context/retros/${today}-${next}.json
|
||
```
|
||
|
||
Use the Write tool to save the JSON file with this schema:
|
||
```json
|
||
{
|
||
"date": "2026-03-08",
|
||
"window": "7d",
|
||
"metrics": {
|
||
"commits": 47,
|
||
"contributors": 3,
|
||
"prs_merged": 12,
|
||
"insertions": 3200,
|
||
"deletions": 800,
|
||
"net_loc": 2400,
|
||
"test_loc": 1300,
|
||
"test_ratio": 0.41,
|
||
"active_days": 6,
|
||
"sessions": 14,
|
||
"deep_sessions": 5,
|
||
"avg_session_minutes": 42,
|
||
"loc_per_session_hour": 350,
|
||
"feat_pct": 0.40,
|
||
"fix_pct": 0.30,
|
||
"peak_hour": 22,
|
||
"ai_assisted_commits": 32
|
||
},
|
||
"authors": {
|
||
"Garry Tan": { "commits": 32, "insertions": 2400, "deletions": 300, "test_ratio": 0.41, "top_area": "browse/" },
|
||
"Alice": { "commits": 12, "insertions": 800, "deletions": 150, "test_ratio": 0.35, "top_area": "app/services/" }
|
||
},
|
||
"version_range": ["1.16.0.0", "1.16.1.0"],
|
||
"streak_days": 47,
|
||
"tweetable": "Week of Mar 1: 47 commits (3 contributors), 3.2k LOC, 38% tests, 12 PRs, peak: 10pm",
|
||
"greptile": {
|
||
"fixes": 3,
|
||
"fps": 1,
|
||
"already_fixed": 2,
|
||
"signal_pct": 83
|
||
}
|
||
}
|
||
```
|
||
|
||
**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. Only include the `test_health` field if test files were found (command 10 returns > 0). If any has no data, omit the field entirely.
|
||
|
||
Include test health data in the JSON when test files exist:
|
||
```json
|
||
"test_health": {
|
||
"total_test_files": 47,
|
||
"tests_added_this_period": 5,
|
||
"regression_test_commits": 3,
|
||
"test_files_changed": 8
|
||
}
|
||
```
|
||
|
||
Include backlog data in the JSON when TODOS.md exists:
|
||
```json
|
||
"backlog": {
|
||
"total_open": 28,
|
||
"p0_p1": 2,
|
||
"p2": 8,
|
||
"completed_this_period": 3,
|
||
"added_this_period": 1
|
||
}
|
||
```
|
||
|
||
### Step 14: Write the Narrative
|
||
|
||
Structure the output as:
|
||
|
||
---
|
||
|
||
**Tweetable summary** (first line, before everything else):
|
||
```
|
||
Week of Mar 1: 47 commits (3 contributors), 3.2k LOC, 38% tests, 12 PRs, peak: 10pm | Streak: 47d
|
||
```
|
||
|
||
## Engineering Retro: [date range]
|
||
|
||
### Summary Table
|
||
(from Step 2)
|
||
|
||
### Trends vs Last Retro
|
||
(from Step 11, loaded before save — skip if first retro)
|
||
|
||
### Time & Session Patterns
|
||
(from Steps 3-4)
|
||
|
||
Narrative interpreting what the team-wide patterns mean:
|
||
- When the most productive hours are and what drives them
|
||
- Whether sessions are getting longer or shorter over time
|
||
- Estimated hours per day of active coding (team aggregate)
|
||
- Notable patterns: do team members code at the same time or in shifts?
|
||
|
||
### Shipping Velocity
|
||
(from Steps 5-7)
|
||
|
||
Narrative covering:
|
||
- Commit type mix and what it reveals
|
||
- PR size distribution and what it reveals about shipping cadence
|
||
- Fix-chain detection (sequences of fix commits on the same subsystem)
|
||
- Version bump discipline
|
||
|
||
### Code Quality Signals
|
||
- Test LOC ratio trend
|
||
- Hotspot analysis (are the same files churning?)
|
||
- Greptile signal ratio and trend (if history exists): "Greptile: X% signal (Y valid catches, Z false positives)"
|
||
|
||
### Test Health
|
||
- Total test files: N (from command 10)
|
||
- Tests added this period: M (from command 12 — test files changed)
|
||
- Regression test commits: list `test(qa):` and `test(design):` and `test: coverage` commits from command 11
|
||
- If prior retro exists and has `test_health`: show delta "Test count: {last} → {now} (+{delta})"
|
||
- If test ratio < 20%: flag as growth area — "100% test coverage is the goal. Tests make vibe coding safe."
|
||
|
||
### Plan Completion
|
||
Check review JSONL logs for plan completion data from /ship runs this period:
|
||
|
||
```bash
|
||
setopt +o nomatch 2>/dev/null || true # zsh compat
|
||
eval "$(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)"
|
||
cat ~/.gstack/projects/$SLUG/*-reviews.jsonl 2>/dev/null | grep '"skill":"ship"' | grep '"plan_items_total"' || echo "NO_PLAN_DATA"
|
||
```
|
||
|
||
If plan completion data exists within the retro time window:
|
||
- Count branches shipped with plans (entries that have `plan_items_total` > 0)
|
||
- Compute average completion: sum of `plan_items_done` / sum of `plan_items_total`
|
||
- Identify most-skipped item category if data supports it
|
||
|
||
Output:
|
||
```
|
||
Plan Completion This Period:
|
||
{N} branches shipped with plans
|
||
Average completion: {X}% ({done}/{total} items)
|
||
```
|
||
|
||
If no plan data exists, skip this section silently.
|
||
|
||
### Focus & Highlights
|
||
(from Step 8)
|
||
- Focus score with interpretation
|
||
- Ship of the week callout
|
||
|
||
### Your Week (personal deep-dive)
|
||
(from Step 9, for the current user only)
|
||
|
||
This is the section the user cares most about. Include:
|
||
- Their personal commit count, LOC, test ratio
|
||
- Their session patterns and peak hours
|
||
- Their focus areas
|
||
- Their biggest ship
|
||
- **What you did well** (2-3 specific things anchored in commits)
|
||
- **Where to level up** (1-2 specific, actionable suggestions)
|
||
|
||
### Team Breakdown
|
||
(from Step 9, for each teammate — skip if solo repo)
|
||
|
||
For each teammate (sorted by commits descending), write a section:
|
||
|
||
#### [Name]
|
||
- **What they shipped**: 2-3 sentences on their contributions, areas of focus, and commit patterns
|
||
- **Praise**: 1-2 specific things they did well, anchored in actual commits. Be genuine — what would you actually say in a 1:1? Examples:
|
||
- "Cleaned up the entire auth module in 3 small, reviewable PRs — textbook decomposition"
|
||
- "Added integration tests for every new endpoint, not just happy paths"
|
||
- "Fixed the N+1 query that was causing 2s load times on the dashboard"
|
||
- **Opportunity for growth**: 1 specific, constructive suggestion. Frame as investment, not criticism. Examples:
|
||
- "Test coverage on the payment module is at 8% — worth investing in before the next feature lands on top of it"
|
||
- "Most commits land in a single burst — spacing work across the day could reduce context-switching fatigue"
|
||
- "All commits land between 1-4am — sustainable pace matters for code quality long-term"
|
||
|
||
**AI collaboration note:** If many commits have `Co-Authored-By` AI trailers (e.g., Claude, Copilot), note the AI-assisted commit percentage as a team metric. Frame it neutrally — "N% of commits were AI-assisted" — without judgment.
|
||
|
||
### Top 3 Team Wins
|
||
Identify the 3 highest-impact things shipped in the window across the whole team. For each:
|
||
- What it was
|
||
- Who shipped it
|
||
- Why it matters (product/architecture impact)
|
||
|
||
### 3 Things to Improve
|
||
Specific, actionable, anchored in actual commits. Mix personal and team-level suggestions. Phrase as "to get even better, the team could..."
|
||
|
||
### 3 Habits for Next Week
|
||
Small, practical, realistic. Each must be something that takes <5 minutes to adopt. At least one should be team-oriented (e.g., "review each other's PRs same-day").
|
||
|
||
### Week-over-Week Trends
|
||
(if applicable, from Step 10)
|
||
|
||
---
|
||
|
||
## Global Retrospective Mode
|
||
|
||
When the user runs `/retro global` (or `/retro global 14d`), follow this flow instead of the repo-scoped Steps 1-14. This mode works from any directory — it does NOT require being inside a git repo.
|
||
|
||
### Global Step 1: Compute time window
|
||
|
||
Same midnight-aligned logic as the regular retro. Default 7d. The second argument after `global` is the window (e.g., `14d`, `30d`, `24h`).
|
||
|
||
### Global Step 2: Run discovery
|
||
|
||
Locate and run the discovery script using this fallback chain:
|
||
|
||
```bash
|
||
DISCOVER_BIN=""
|
||
[ -x ~/.claude/skills/gstack/bin/gstack-global-discover ] && DISCOVER_BIN=~/.claude/skills/gstack/bin/gstack-global-discover
|
||
[ -z "$DISCOVER_BIN" ] && [ -x .claude/skills/gstack/bin/gstack-global-discover ] && DISCOVER_BIN=.claude/skills/gstack/bin/gstack-global-discover
|
||
[ -z "$DISCOVER_BIN" ] && which gstack-global-discover >/dev/null 2>&1 && DISCOVER_BIN=$(which gstack-global-discover)
|
||
[ -z "$DISCOVER_BIN" ] && [ -f bin/gstack-global-discover.ts ] && DISCOVER_BIN="bun run bin/gstack-global-discover.ts"
|
||
echo "DISCOVER_BIN: $DISCOVER_BIN"
|
||
```
|
||
|
||
If no binary is found, tell the user: "Discovery script not found. Run `bun run build` in the gstack directory to compile it." and stop.
|
||
|
||
Run the discovery:
|
||
```bash
|
||
$DISCOVER_BIN --since "<window>" --format json 2>/tmp/gstack-discover-stderr
|
||
```
|
||
|
||
Read the stderr output from `/tmp/gstack-discover-stderr` for diagnostic info. Parse the JSON output from stdout.
|
||
|
||
If `total_sessions` is 0, say: "No AI coding sessions found in the last <window>. Try a longer window: `/retro global 30d`" and stop.
|
||
|
||
### Global Step 3: Run git log on each discovered repo
|
||
|
||
For each repo in the discovery JSON's `repos` array, find the first valid path in `paths[]` (directory exists with `.git/`). If no valid path exists, skip the repo and note it.
|
||
|
||
**For local-only repos** (where `remote` starts with `local:`): skip `git fetch` and use the local default branch. Use `git log HEAD` instead of `git log origin/$DEFAULT`.
|
||
|
||
**For repos with remotes:**
|
||
|
||
```bash
|
||
git -C <path> fetch origin --quiet 2>/dev/null
|
||
```
|
||
|
||
Detect the default branch for each repo: first try `git symbolic-ref refs/remotes/origin/HEAD`, then check common branch names (`main`, `master`), then fall back to `git rev-parse --abbrev-ref HEAD`. Use the detected branch as `<default>` in the commands below.
|
||
|
||
```bash
|
||
# Commits with stats
|
||
git -C <path> log origin/$DEFAULT --since="<start_date>T00:00:00" --format="%H|%aN|%ai|%s" --shortstat
|
||
|
||
# Commit timestamps for session detection, streak, and context switching
|
||
git -C <path> log origin/$DEFAULT --since="<start_date>T00:00:00" --format="%at|%aN|%ai|%s" | sort -n
|
||
|
||
# Per-author commit counts
|
||
git -C <path> shortlog origin/$DEFAULT --since="<start_date>T00:00:00" -sn --no-merges
|
||
|
||
# PR/MR numbers from commit messages (GitHub #NNN, GitLab !NNN)
|
||
git -C <path> log origin/$DEFAULT --since="<start_date>T00:00:00" --format="%s" | grep -oE '[#!][0-9]+' | sort -t'#' -k1 | uniq
|
||
```
|
||
|
||
For repos that fail (deleted paths, network errors): skip and note "N repos could not be reached."
|
||
|
||
### Global Step 4: Compute global shipping streak
|
||
|
||
For each repo, get commit dates (capped at 365 days):
|
||
|
||
```bash
|
||
git -C <path> log origin/$DEFAULT --since="365 days ago" --format="%ad" --date=format:"%Y-%m-%d" | sort -u
|
||
```
|
||
|
||
Union all dates across all repos. Count backward from today — how many consecutive days have at least one commit to ANY repo? If the streak hits 365 days, display as "365+ days".
|
||
|
||
### Global Step 5: Compute context switching metric
|
||
|
||
From the commit timestamps gathered in Step 3, group by date. For each date, count how many distinct repos had commits that day. Report:
|
||
- Average repos/day
|
||
- Maximum repos/day
|
||
- Which days were focused (1 repo) vs. fragmented (3+ repos)
|
||
|
||
### Global Step 6: Per-tool productivity patterns
|
||
|
||
From the discovery JSON, analyze tool usage patterns:
|
||
- Which AI tool is used for which repos (exclusive vs. shared)
|
||
- Session count per tool
|
||
- Behavioral patterns (e.g., "Codex used exclusively for myapp, Claude Code for everything else")
|
||
|
||
### Global Step 7: Aggregate and generate narrative
|
||
|
||
Structure the output with the **shareable personal card first**, then the full
|
||
team/project breakdown below. The personal card is designed to be screenshot-friendly
|
||
— everything someone would want to share on X/Twitter in one clean block.
|
||
|
||
---
|
||
|
||
**Tweetable summary** (first line, before everything else):
|
||
```
|
||
Week of Mar 14: 5 projects, 138 commits, 250k LOC across 5 repos | 48 AI sessions | Streak: 52d 🔥
|
||
```
|
||
|
||
## 🚀 Your Week: [user name] — [date range]
|
||
|
||
This section is the **shareable personal card**. It contains ONLY the current user's
|
||
stats — no team data, no project breakdowns. Designed to screenshot and post.
|
||
|
||
Use the user identity from `git config user.name` to filter all per-repo git data.
|
||
Aggregate across all repos to compute personal totals.
|
||
|
||
Render as a single visually clean block. Left border only — no right border (LLMs
|
||
can't align right borders reliably). Pad repo names to the longest name so columns
|
||
align cleanly. Never truncate project names.
|
||
|
||
```
|
||
╔═══════════════════════════════════════════════════════════════
|
||
║ [USER NAME] — Week of [date]
|
||
╠═══════════════════════════════════════════════════════════════
|
||
║
|
||
║ [N] commits across [M] projects
|
||
║ +[X]k LOC added · [Y]k LOC deleted · [Z]k net
|
||
║ [N] AI coding sessions (CC: X, Codex: Y, Gemini: Z)
|
||
║ [N]-day shipping streak 🔥
|
||
║
|
||
║ PROJECTS
|
||
║ ─────────────────────────────────────────────────────────
|
||
║ [repo_name_full] [N] commits +[X]k LOC [solo/team]
|
||
║ [repo_name_full] [N] commits +[X]k LOC [solo/team]
|
||
║ [repo_name_full] [N] commits +[X]k LOC [solo/team]
|
||
║
|
||
║ SHIP OF THE WEEK
|
||
║ [PR title] — [LOC] lines across [N] files
|
||
║
|
||
║ TOP WORK
|
||
║ • [1-line description of biggest theme]
|
||
║ • [1-line description of second theme]
|
||
║ • [1-line description of third theme]
|
||
║
|
||
║ Powered by gstack
|
||
╚═══════════════════════════════════════════════════════════════
|
||
```
|
||
|
||
**Rules for the personal card:**
|
||
- Only show repos where the user has commits. Skip repos with 0 commits.
|
||
- Sort repos by user's commit count descending.
|
||
- **Never truncate repo names.** Use the full repo name (e.g., `analyze_transcripts`
|
||
not `analyze_trans`). Pad the name column to the longest repo name so all columns
|
||
align. If names are long, widen the box — the box width adapts to content.
|
||
- For LOC, use "k" formatting for thousands (e.g., "+64.0k" not "+64010").
|
||
- Role: "solo" if user is the only contributor, "team" if others contributed.
|
||
- Ship of the Week: the user's single highest-LOC PR across ALL repos.
|
||
- Top Work: 3 bullet points summarizing the user's major themes, inferred from
|
||
commit messages. Not individual commits — synthesize into themes.
|
||
E.g., "Built /retro global — cross-project retrospective with AI session discovery"
|
||
not "feat: gstack-global-discover" + "feat: /retro global template".
|
||
- The card must be self-contained. Someone seeing ONLY this block should understand
|
||
the user's week without any surrounding context.
|
||
- Do NOT include team members, project totals, or context switching data here.
|
||
|
||
**Personal streak:** Use the user's own commits across all repos (filtered by
|
||
`--author`) to compute a personal streak, separate from the team streak.
|
||
|
||
---
|
||
|
||
## Global Engineering Retro: [date range]
|
||
|
||
Everything below is the full analysis — team data, project breakdowns, patterns.
|
||
This is the "deep dive" that follows the shareable card.
|
||
|
||
### All Projects Overview
|
||
| Metric | Value |
|
||
|--------|-------|
|
||
| Projects active | N |
|
||
| Total commits (all repos, all contributors) | N |
|
||
| Total LOC | +N / -N |
|
||
| AI coding sessions | N (CC: X, Codex: Y, Gemini: Z) |
|
||
| Active days | N |
|
||
| Global shipping streak (any contributor, any repo) | N consecutive days |
|
||
| Context switches/day | N avg (max: M) |
|
||
|
||
### Per-Project Breakdown
|
||
For each repo (sorted by commits descending):
|
||
- Repo name (with % of total commits)
|
||
- Commits, LOC, PRs merged, top contributor
|
||
- Key work (inferred from commit messages)
|
||
- AI sessions by tool
|
||
|
||
**Your Contributions** (sub-section within each project):
|
||
For each project, add a "Your contributions" block showing the current user's
|
||
personal stats within that repo. Use the user identity from `git config user.name`
|
||
to filter. Include:
|
||
- Your commits / total commits (with %)
|
||
- Your LOC (+insertions / -deletions)
|
||
- Your key work (inferred from YOUR commit messages only)
|
||
- Your commit type mix (feat/fix/refactor/chore/docs breakdown)
|
||
- Your biggest ship in this repo (highest-LOC commit or PR)
|
||
|
||
If the user is the only contributor, say "Solo project — all commits are yours."
|
||
If the user has 0 commits in a repo (team project they didn't touch this period),
|
||
say "No commits this period — [N] AI sessions only." and skip the breakdown.
|
||
|
||
Format:
|
||
```
|
||
**Your contributions:** 47/244 commits (19%), +4.2k/-0.3k LOC
|
||
Key work: Writer Chat, email blocking, security hardening
|
||
Biggest ship: PR #605 — Writer Chat eats the admin bar (2,457 ins, 46 files)
|
||
Mix: feat(3) fix(2) chore(1)
|
||
```
|
||
|
||
### Cross-Project Patterns
|
||
- Time allocation across projects (% breakdown, use YOUR commits not total)
|
||
- Peak productivity hours aggregated across all repos
|
||
- Focused vs. fragmented days
|
||
- Context switching trends
|
||
|
||
### Tool Usage Analysis
|
||
Per-tool breakdown with behavioral patterns:
|
||
- Claude Code: N sessions across M repos — patterns observed
|
||
- Codex: N sessions across M repos — patterns observed
|
||
- Gemini: N sessions across M repos — patterns observed
|
||
|
||
### Ship of the Week (Global)
|
||
Highest-impact PR across ALL projects. Identify by LOC and commit messages.
|
||
|
||
### 3 Cross-Project Insights
|
||
What the global view reveals that no single-repo retro could show.
|
||
|
||
### 3 Habits for Next Week
|
||
Considering the full cross-project picture.
|
||
|
||
---
|
||
|
||
### Global Step 8: Load history & compare
|
||
|
||
```bash
|
||
setopt +o nomatch 2>/dev/null || true # zsh compat
|
||
ls -t ~/.gstack/retros/global-*.json 2>/dev/null | head -5
|
||
```
|
||
|
||
**Only compare against a prior retro with the same `window` value** (e.g., 7d vs 7d). If the most recent prior retro has a different window, skip comparison and note: "Prior global retro used a different window — skipping comparison."
|
||
|
||
If a matching prior retro exists, load it with the Read tool. Show a **Trends vs Last Global Retro** table with deltas for key metrics: total commits, LOC, sessions, streak, context switches/day.
|
||
|
||
If no prior global retros exist, append: "First global retro recorded — run again next week to see trends."
|
||
|
||
### Global Step 9: Save snapshot
|
||
|
||
```bash
|
||
mkdir -p ~/.gstack/retros
|
||
```
|
||
|
||
Determine the next sequence number for today:
|
||
```bash
|
||
setopt +o nomatch 2>/dev/null || true # zsh compat
|
||
today=$(date +%Y-%m-%d)
|
||
existing=$(ls ~/.gstack/retros/global-${today}-*.json 2>/dev/null | wc -l | tr -d ' ')
|
||
next=$((existing + 1))
|
||
```
|
||
|
||
Use the Write tool to save JSON to `~/.gstack/retros/global-${today}-${next}.json`:
|
||
|
||
```json
|
||
{
|
||
"type": "global",
|
||
"date": "2026-03-21",
|
||
"window": "7d",
|
||
"projects": [
|
||
{
|
||
"name": "gstack",
|
||
"remote": "<detected from git remote get-url origin, normalized to HTTPS>",
|
||
"commits": 47,
|
||
"insertions": 3200,
|
||
"deletions": 800,
|
||
"sessions": { "claude_code": 15, "codex": 3, "gemini": 0 }
|
||
}
|
||
],
|
||
"totals": {
|
||
"commits": 182,
|
||
"insertions": 15300,
|
||
"deletions": 4200,
|
||
"projects": 5,
|
||
"active_days": 6,
|
||
"sessions": { "claude_code": 48, "codex": 8, "gemini": 3 },
|
||
"global_streak_days": 52,
|
||
"avg_context_switches_per_day": 2.1
|
||
},
|
||
"tweetable": "Week of Mar 14: 5 projects, 182 commits, 15.3k LOC | CC: 48, Codex: 8, Gemini: 3 | Focus: gstack (58%) | Streak: 52d"
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Compare Mode
|
||
|
||
When the user runs `/retro compare` (or `/retro compare 14d`):
|
||
|
||
1. Compute metrics for the current window (default 7d) using the midnight-aligned start date (same logic as the main retro — e.g., if today is 2026-03-18 and window is 7d, use `--since="2026-03-11T00:00:00"`)
|
||
2. Compute metrics for the immediately prior same-length window using both `--since` and `--until` with midnight-aligned dates to avoid overlap (e.g., for a 7d window starting 2026-03-11: prior window is `--since="2026-03-04T00:00:00" --until="2026-03-11T00:00:00"`)
|
||
3. Show a side-by-side comparison table with deltas and arrows
|
||
4. Write a brief narrative highlighting the biggest improvements and regressions
|
||
5. Save only the current-window snapshot to `.context/retros/` (same as a normal retro run); do **not** persist the prior-window metrics.
|
||
|
||
## Tone
|
||
|
||
- Encouraging but candid, no coddling
|
||
- Specific and concrete — always anchor in actual commits/code
|
||
- Skip generic praise ("great job!") — say exactly what was good and why
|
||
- Frame improvements as leveling up, not criticism
|
||
- **Praise should feel like something you'd actually say in a 1:1** — specific, earned, genuine
|
||
- **Growth suggestions should feel like investment advice** — "this is worth your time because..." not "you failed at..."
|
||
- Never compare teammates against each other negatively. Each person's section stands on its own.
|
||
- Keep total output around 3000-4500 words (slightly longer to accommodate team sections)
|
||
- Use markdown tables and code blocks for data, prose for narrative
|
||
- Output directly to the conversation — do NOT write to filesystem (except the `.context/retros/` JSON snapshot)
|
||
|
||
## Important Rules
|
||
|
||
- ALL narrative output goes directly to the user in the conversation. The ONLY file written is the `.context/retros/` JSON snapshot.
|
||
- Use `origin/<default>` for all git queries (not local main which may be stale)
|
||
- Display all timestamps in the user's local timezone (do not override `TZ`)
|
||
- If the window has zero commits, say so and suggest a different window
|
||
- Round LOC/hour to nearest 50
|
||
- Treat merge commits as PR boundaries
|
||
- Do not read CLAUDE.md or other docs — this skill is self-contained
|
||
- On first run (no prior retros), skip comparison sections gracefully
|
||
- **Global mode:** Does NOT require being inside a git repo. Saves snapshots to `~/.gstack/retros/` (not `.context/retros/`). Gracefully skip AI tools that aren't installed. Only compare against prior global retros with the same window value. If streak hits 365d cap, display as "365+ days".
|