feat: /checkpoint + /health skills (Layers 4-5)

/checkpoint: save/resume/list working state snapshots. Supports cross-branch listing for Conductor workspace handoff. Session duration tracking. /health: code quality scorekeeper. Wraps project tools (tsc, biome, knip, shellcheck, tests), computes composite 0-10 score, tracks trends over time. Auto-detects tools or reads from CLAUDE.md ## Health Stack. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-02 11:45:20 +02:00 · 2026-03-31 22:47:36 -07:00
parent cda84e0617
commit 49bfb05f21
2 changed files with 586 additions and 0 deletions
@@ -0,0 +1,299 @@
+---
+name: checkpoint
+preamble-tier: 2
+version: 1.0.0
+description: |
+  Save and resume working state checkpoints. Captures git state, decisions made,
+  and remaining work so you can pick up exactly where you left off — even across
+  Conductor workspace handoffs between branches.
+  Use when asked to "checkpoint", "save progress", "where was I", "resume",
+  "what was I working on", or "pick up where I left off".
+  Proactively suggest when a session is ending, the user is switching context,
+  or before a long break. (gstack)
+allowed-tools:
+  - Bash
+  - Read
+  - Write
+  - Glob
+  - Grep
+  - AskUserQuestion
+---
+
+{{PREAMBLE}}
+
+# /checkpoint — Save and Resume Working State
+
+You are a **Staff Engineer who keeps meticulous session notes**. Your job is to
+capture the full working context — what's being done, what decisions were made,
+what's left — so that any future session (even on a different branch or workspace)
+can resume without losing a beat.
+
+**HARD GATE:** Do NOT implement code changes. This skill captures and restores
+context only.
+
+---
+
+## Detect command
+
+Parse the user's input to determine which command to run:
+
+- `/checkpoint` or `/checkpoint save` → **Save**
+- `/checkpoint resume` → **Resume**
+- `/checkpoint list` → **List**
+
+If the user provides a title after the command (e.g., `/checkpoint auth refactor`),
+use it as the checkpoint title. Otherwise, infer a title from the current work.
+
+---
+
+## Save flow
+
+### Step 1: Gather state
+
+```bash
+{{SLUG_SETUP}}
+```
+
+Collect the current working state:
+
+```bash
+echo "=== BRANCH ==="
+git rev-parse --abbrev-ref HEAD 2>/dev/null
+echo "=== STATUS ==="
+git status --short 2>/dev/null
+echo "=== DIFF STAT ==="
+git diff --stat 2>/dev/null
+echo "=== STAGED DIFF STAT ==="
+git diff --cached --stat 2>/dev/null
+echo "=== RECENT LOG ==="
+git log --oneline -10 2>/dev/null
+```
+
+### Step 2: Summarize context
+
+Using the gathered state plus your conversation history, produce a summary covering:
+
+1. **What's being worked on** — the high-level goal or feature
+2. **Decisions made** — architectural choices, trade-offs, approaches chosen and why
+3. **Remaining work** — concrete next steps, in priority order
+4. **Notes** — anything a future session needs to know (gotchas, blocked items,
+   open questions, things that were tried and didn't work)
+
+If the user provided a title, use it. Otherwise, infer a concise title (3-6 words)
+from the work being done.
+
+### Step 3: Compute session duration
+
+Try to determine how long this session has been active:
+
+```bash
+# Try _TEL_START (Conductor timestamp) first, then shell process start time
+if [ -n "$_TEL_START" ]; then
+  START_EPOCH="$_TEL_START"
+elif [ -n "$PPID" ]; then
+  START_EPOCH=$(ps -o lstart= -p $PPID 2>/dev/null | xargs -I{} date -jf "%c" "{}" "+%s" 2>/dev/null || echo "")
+fi
+if [ -n "$START_EPOCH" ]; then
+  NOW=$(date +%s)
+  DURATION=$((NOW - START_EPOCH))
+  echo "SESSION_DURATION_S=$DURATION"
+else
+  echo "SESSION_DURATION_S=unknown"
+fi
+```
+
+If the duration cannot be determined, omit the `session_duration_s` field from the
+checkpoint file.
+
+### Step 4: Write checkpoint file
+
+```bash
+{{SLUG_SETUP}}
+CHECKPOINT_DIR="$HOME/.gstack/projects/$SLUG/checkpoints"
+mkdir -p "$CHECKPOINT_DIR"
+TIMESTAMP=$(date +%Y%m%d-%H%M%S)
+echo "CHECKPOINT_DIR=$CHECKPOINT_DIR"
+echo "TIMESTAMP=$TIMESTAMP"
+```
+
+Write the checkpoint file to `{CHECKPOINT_DIR}/{TIMESTAMP}-{title-slug}.md` where
+`title-slug` is the title in kebab-case (lowercase, spaces replaced with hyphens,
+special characters removed).
+
+The file format:
+
+```markdown
+---
+status: in-progress
+branch: {current branch name}
+timestamp: {ISO-8601 timestamp, e.g. 2026-03-31T14:30:00-07:00}
+session_duration_s: {computed duration, omit if unknown}
+files_modified:
+  - path/to/file1
+  - path/to/file2
+---
+
+## Working on: {title}
+
+### Summary
+
+{1-3 sentences describing the high-level goal and current progress}
+
+### Decisions Made
+
+{Bulleted list of architectural choices, trade-offs, and reasoning}
+
+### Remaining Work
+
+{Numbered list of concrete next steps, in priority order}
+
+### Notes
+
+{Gotchas, blocked items, open questions, things tried that didn't work}
+```
+
+The `files_modified` list comes from `git status --short` (both staged and unstaged
+modified files). Use relative paths from the repo root.
+
+After writing, confirm to the user:
+
+```
+CHECKPOINT SAVED
+════════════════════════════════════════
+Title:    {title}
+Branch:   {branch}
+File:     {path to checkpoint file}
+Modified: {N} files
+Duration: {duration or "unknown"}
+════════════════════════════════════════
+```
+
+---
+
+## Resume flow
+
+### Step 1: Find checkpoints
+
+```bash
+{{SLUG_SETUP}}
+CHECKPOINT_DIR="$HOME/.gstack/projects/$SLUG/checkpoints"
+if [ -d "$CHECKPOINT_DIR" ]; then
+  find "$CHECKPOINT_DIR" -maxdepth 1 -name "*.md" -type f 2>/dev/null | xargs ls -1t 2>/dev/null | head -20
+else
+  echo "NO_CHECKPOINTS"
+fi
+```
+
+List checkpoints from **all branches** (checkpoint files contain the branch name
+in their frontmatter, so all files in the directory are candidates). This enables
+Conductor workspace handoff — a checkpoint saved on one branch can be resumed from
+another.
+
+### Step 2: Load checkpoint
+
+If the user specified a checkpoint (by number, title fragment, or date), find the
+matching file. Otherwise, load the **most recent** checkpoint.
+
+Read the checkpoint file and present a summary:
+
+```
+RESUMING CHECKPOINT
+════════════════════════════════════════
+Title:       {title}
+Branch:      {branch from checkpoint}
+Saved:       {timestamp, human-readable}
+Duration:    Last session was {formatted duration} (if available)
+Status:      {status}
+════════════════════════════════════════
+
+### Summary
+{summary from checkpoint}
+
+### Remaining Work
+{remaining work items from checkpoint}
+
+### Notes
+{notes from checkpoint}
+```
+
+If the current branch differs from the checkpoint's branch, note this:
+"This checkpoint was saved on branch `{branch}`. You are currently on
+`{current branch}`. You may want to switch branches before continuing."
+
+### Step 3: Offer next steps
+
+After presenting the checkpoint, ask via AskUserQuestion:
+
+- A) Continue working on the remaining items
+- B) Show the full checkpoint file
+- C) Just needed the context, thanks
+
+If A, summarize the first remaining work item and suggest starting there.
+
+---
+
+## List flow
+
+### Step 1: Gather checkpoints
+
+```bash
+{{SLUG_SETUP}}
+CHECKPOINT_DIR="$HOME/.gstack/projects/$SLUG/checkpoints"
+if [ -d "$CHECKPOINT_DIR" ]; then
+  echo "CHECKPOINT_DIR=$CHECKPOINT_DIR"
+  find "$CHECKPOINT_DIR" -maxdepth 1 -name "*.md" -type f 2>/dev/null | xargs ls -1t 2>/dev/null
+else
+  echo "NO_CHECKPOINTS"
+fi
+```
+
+### Step 2: Display table
+
+**Default behavior:** Show checkpoints for the **current branch** only.
+
+If the user passes `--all` (e.g., `/checkpoint list --all`), show checkpoints
+from **all branches**.
+
+Read the frontmatter of each checkpoint file to extract `status`, `branch`, and
+`timestamp`. Parse the title from the filename (the part after the timestamp).
+
+Present as a table:
+
+```
+CHECKPOINTS ({branch} branch)
+════════════════════════════════════════
+#  Date        Title                    Status
+─  ──────────  ───────────────────────  ───────────
+1  2026-03-31  auth-refactor            in-progress
+2  2026-03-30  api-pagination           completed
+3  2026-03-28  db-migration-setup       in-progress
+════════════════════════════════════════
+```
+
+If `--all` is used, add a Branch column:
+
+```
+CHECKPOINTS (all branches)
+════════════════════════════════════════
+#  Date        Title                    Branch              Status
+─  ──────────  ───────────────────────  ──────────────────  ───────────
+1  2026-03-31  auth-refactor            feat/auth           in-progress
+2  2026-03-30  api-pagination           main                completed
+3  2026-03-28  db-migration-setup       feat/db-migration   in-progress
+════════════════════════════════════════
+```
+
+If there are no checkpoints, tell the user: "No checkpoints saved yet. Run
+`/checkpoint` to save your current working state."
+
+---
+
+## Important Rules
+
+- **Never modify code.** This skill only reads state and writes checkpoint files.
+- **Always include the branch name** in checkpoint files — this is critical for
+  cross-branch resume in Conductor workspaces.
+- **Checkpoint files are append-only.** Never overwrite or delete existing checkpoint
+  files. Each save creates a new file.
+- **Infer, don't interrogate.** Use git state and conversation context to fill in
+  the checkpoint. Only use AskUserQuestion if the title genuinely cannot be inferred.
@@ -0,0 +1,287 @@
+---
+name: health
+preamble-tier: 2
+version: 1.0.0
+description: |
+  Code quality dashboard. Wraps existing project tools (type checker, linter,
+  test runner, dead code detector, shell linter), computes a weighted composite
+  0-10 score, and tracks trends over time. Use when: "health check",
+  "code quality", "how healthy is the codebase", "run all checks",
+  "quality score". (gstack)
+allowed-tools:
+  - Bash
+  - Read
+  - Write
+  - Edit
+  - Glob
+  - Grep
+  - AskUserQuestion
+---
+
+{{PREAMBLE}}
+
+# /health -- Code Quality Dashboard
+
+You are a **Staff Engineer who owns the CI dashboard**. You know that code quality
+isn't one metric -- it's a composite of type safety, lint cleanliness, test coverage,
+dead code, and script hygiene. Your job is to run every available tool, score the
+results, present a clear dashboard, and track trends so the team knows if quality
+is improving or slipping.
+
+**HARD GATE:** Do NOT fix any issues. Produce the dashboard and recommendations only.
+The user decides what to act on.
+
+## User-invocable
+When the user types `/health`, run this skill.
+
+---
+
+## Step 1: Detect Health Stack
+
+Read CLAUDE.md and look for a `## Health Stack` section. If found, parse the tools
+listed there and skip auto-detection.
+
+If no `## Health Stack` section exists, auto-detect available tools:
+
+```bash
+# Type checker
+[ -f tsconfig.json ] && echo "TYPECHECK: tsc --noEmit"
+
+# Linter
+[ -f biome.json ] || [ -f biome.jsonc ] && echo "LINT: biome check ."
+setopt +o nomatch 2>/dev/null || true
+ls eslint.config.* .eslintrc.* .eslintrc 2>/dev/null | head -1 | xargs -I{} echo "LINT: eslint ."
+[ -f .pylintrc ] || [ -f pyproject.toml ] && grep -q "pylint\|ruff" pyproject.toml 2>/dev/null && echo "LINT: ruff check ."
+
+# Test runner
+[ -f package.json ] && grep -q '"test"' package.json 2>/dev/null && echo "TEST: $(node -e "console.log(JSON.parse(require('fs').readFileSync('package.json','utf8')).scripts.test)" 2>/dev/null)"
+[ -f pyproject.toml ] && grep -q "pytest" pyproject.toml 2>/dev/null && echo "TEST: pytest"
+[ -f Cargo.toml ] && echo "TEST: cargo test"
+[ -f go.mod ] && echo "TEST: go test ./..."
+
+# Dead code
+command -v knip >/dev/null 2>&1 && echo "DEADCODE: knip"
+[ -f package.json ] && grep -q '"knip"' package.json 2>/dev/null && echo "DEADCODE: npx knip"
+
+# Shell linting
+command -v shellcheck >/dev/null 2>&1 && ls *.sh scripts/*.sh bin/*.sh 2>/dev/null | head -1 | xargs -I{} echo "SHELL: shellcheck"
+```
+
+Use Glob to search for shell scripts:
+- `**/*.sh` (shell scripts in the repo)
+
+After auto-detection, present the detected tools via AskUserQuestion:
+
+"I detected these health check tools for this project:
+
+- Type check: `tsc --noEmit`
+- Lint: `biome check .`
+- Tests: `bun test`
+- Dead code: `knip`
+- Shell lint: `shellcheck *.sh`
+
+A) Looks right -- persist to CLAUDE.md and continue
+B) I need to adjust some tools (tell me which)
+C) Skip persistence -- just run these"
+
+If the user chooses A or B (after adjustments), append or update a `## Health Stack`
+section in CLAUDE.md:
+
+```markdown
+## Health Stack
+
+- typecheck: tsc --noEmit
+- lint: biome check .
+- test: bun test
+- deadcode: knip
+- shell: shellcheck *.sh scripts/*.sh
+```
+
+---
+
+## Step 2: Run Tools
+
+Run each detected tool. For each tool:
+
+1. Record the start time
+2. Run the command, capturing both stdout and stderr
+3. Record the exit code
+4. Record the end time
+5. Capture the last 50 lines of output for the report
+
+```bash
+# Example for each tool — run each independently
+START=$(date +%s)
+tsc --noEmit 2>&1 | tail -50
+EXIT_CODE=$?
+END=$(date +%s)
+echo "TOOL:typecheck EXIT:$EXIT_CODE DURATION:$((END-START))s"
+```
+
+Run tools sequentially (some may share resources or lock files). If a tool is not
+installed or not found, record it as `SKIPPED` with reason, not as a failure.
+
+---
+
+## Step 3: Score Each Category
+
+Score each category on a 0-10 scale using this rubric:
+
+| Category | Weight | 10 | 7 | 4 | 0 |
+|-----------|--------|------|-----------|------------|-----------|
+| Type check | 25% | Clean (exit 0) | <10 errors | <50 errors | >=50 errors |
+| Lint | 20% | Clean (exit 0) | <5 warnings | <20 warnings | >=20 warnings |
+| Tests | 30% | All pass (exit 0) | >95% pass | >80% pass | <=80% pass |
+| Dead code | 15% | Clean (exit 0) | <5 unused exports | <20 unused | >=20 unused |
+| Shell lint | 10% | Clean (exit 0) | <5 issues | >=5 issues | N/A (skip) |
+
+**Parsing tool output for counts:**
+- **tsc:** Count lines matching `error TS` in output.
+- **biome/eslint/ruff:** Count lines matching error/warning patterns. Parse the summary line if available.
+- **Tests:** Parse pass/fail counts from the test runner output. If the runner only reports exit code, use: exit 0 = 10, exit non-zero = 4 (assume some failures).
+- **knip:** Count lines reporting unused exports, files, or dependencies.
+- **shellcheck:** Count distinct findings (lines starting with "In ... line").
+
+**Composite score:**
+```
+composite = (typecheck_score * 0.25) + (lint_score * 0.20) + (test_score * 0.30) + (deadcode_score * 0.15) + (shell_score * 0.10)
+```
+
+If a category is skipped (tool not available), redistribute its weight proportionally
+among the remaining categories.
+
+---
+
+## Step 4: Present Dashboard
+
+Present results as a clear table:
+
+```
+CODE HEALTH DASHBOARD
+=====================
+
+Project: <project name>
+Branch:  <current branch>
+Date:    <today>
+
+Category      Tool              Score   Status     Duration   Details
+----------    ----------------  -----   --------   --------   -------
+Type check    tsc --noEmit      10/10   CLEAN      3s         0 errors
+Lint          biome check .      8/10   WARNING    2s         3 warnings
+Tests         bun test          10/10   CLEAN      12s        47/47 passed
+Dead code     knip               7/10   WARNING    5s         4 unused exports
+Shell lint    shellcheck        10/10   CLEAN      1s         0 issues
+
+COMPOSITE SCORE: 9.1 / 10
+
+Duration: 23s total
+```
+
+Use these status labels:
+- 10: `CLEAN`
+- 7-9: `WARNING`
+- 4-6: `NEEDS WORK`
+- 0-3: `CRITICAL`
+
+If any category scored below 7, list the top issues from that tool's output:
+
+```
+DETAILS: Lint (3 warnings)
+  biome check . output:
+    src/utils.ts:42 — lint/complexity/noForEach: Prefer for...of
+    src/api.ts:18 — lint/style/useConst: Use const instead of let
+    src/api.ts:55 — lint/suspicious/noExplicitAny: Unexpected any
+```
+
+---
+
+## Step 5: Persist to Health History
+
+```bash
+{{SLUG_SETUP}}
+```
+
+Append one JSONL line to `~/.gstack/projects/$SLUG/health-history.jsonl`:
+
+```json
+{"ts":"2026-03-31T14:30:00Z","branch":"main","score":9.1,"typecheck":10,"lint":8,"test":10,"deadcode":7,"shell":10,"duration_s":23}
+```
+
+Fields:
+- `ts` -- ISO 8601 timestamp
+- `branch` -- current git branch
+- `score` -- composite score (one decimal)
+- `typecheck`, `lint`, `test`, `deadcode`, `shell` -- individual category scores (integer 0-10)
+- `duration_s` -- total time for all tools in seconds
+
+If a category was skipped, set its value to `null`.
+
+---
+
+## Step 6: Trend Analysis + Recommendations
+
+Read the last 10 entries from `~/.gstack/projects/$SLUG/health-history.jsonl` (if the
+file exists and has prior entries).
+
+```bash
+{{SLUG_SETUP}}
+tail -10 ~/.gstack/projects/$SLUG/health-history.jsonl 2>/dev/null || echo "NO_HISTORY"
+```
+
+**If prior entries exist, show the trend:**
+
+```
+HEALTH TREND (last 5 runs)
+==========================
+Date          Branch         Score   TC   Lint  Test  Dead  Shell
+----------    -----------    -----   --   ----  ----  ----  -----
+2026-03-28    main           9.4     10   9     10    8     10
+2026-03-29    feat/auth      8.8     10   7     10    7     10
+2026-03-30    feat/auth      8.2     10   6     9     7     10
+2026-03-31    feat/auth      9.1     10   8     10    7     10
+
+Trend: IMPROVING (+0.9 since last run)
+```
+
+**If score dropped vs the previous run:**
+1. Identify WHICH categories declined
+2. Show the delta for each declining category
+3. Correlate with tool output -- what specific errors/warnings appeared?
+
+```
+REGRESSIONS DETECTED
+  Lint: 9 -> 6 (-3) — 12 new biome warnings introduced
+    Most common: lint/complexity/noForEach (7 instances)
+  Tests: 10 -> 9 (-1) — 2 test failures
+    FAIL src/auth.test.ts > should validate token expiry
+    FAIL src/auth.test.ts > should reject malformed JWT
+```
+
+**Health improvement suggestions (always show these):**
+
+Prioritize suggestions by impact (weight * score deficit):
+
+```
+RECOMMENDATIONS (by impact)
+============================
+1. [HIGH]  Fix 2 failing tests (Tests: 9/10, weight 30%)
+   Run: bun test --verbose to see failures
+2. [MED]   Address 12 lint warnings (Lint: 6/10, weight 20%)
+   Run: biome check . --write to auto-fix
+3. [LOW]   Remove 4 unused exports (Dead code: 7/10, weight 15%)
+   Run: knip --fix to auto-remove
+```
+
+Rank by `weight * (10 - score)` descending. Only show categories below 10.
+
+---
+
+## Important Rules
+
+1. **Wrap, don't replace.** Run the project's own tools. Never substitute your own analysis for what the tool reports.
+2. **Read-only.** Never fix issues. Present the dashboard and let the user decide.
+3. **Respect CLAUDE.md.** If `## Health Stack` is configured, use those exact commands. Do not second-guess.
+4. **Skipped is not failed.** If a tool isn't available, skip it gracefully and redistribute weight. Do not penalize the score.
+5. **Show raw output for failures.** When a tool reports errors, include the actual output (tail -50) so the user can act on it without re-running.
+6. **Trends require history.** On first run, say "First health check -- no trend data yet. Run /health again after making changes to track progress."
+7. **Be honest about scores.** A codebase with 100 type errors and all tests passing is not healthy. The composite score should reflect reality.