feat: cross-skill TODOS awareness + Greptile template refs in all skills

/ship Step 5.5: auto-detect completed TODOs, offer reorganization. /review Step 5.5: cross-reference PR against open TODOs. /plan-ceo-review, /plan-eng-review: TODOS context in planning. /retro: Backlog Health metric. /qa: bug TODO context in diff-aware mode. All Greptile-aware skills now reference reply templates and escalation detection.
2026-05-02 03:35:09 +02:00 · 2026-03-14 22:04:44 -05:00
parent 9b82ae8973
commit cec7ee9980
12 changed files with 278 additions and 38 deletions
@@ -74,7 +74,13 @@ git stash list                                 # Any stashed work
 grep -r "TODO\|FIXME\|HACK\|XXX" --include="*.rb" --include="*.js" -l
 find . -name "*.rb" -newer Gemfile.lock | head -20  # Recently touched files
 ```
-Then read CLAUDE.md, TODOS.md, and any existing architecture docs. Map:
+Then read CLAUDE.md, TODOS.md, and any existing architecture docs. When reading TODOS.md, specifically:
+* Note any TODOs this plan touches, blocks, or unlocks
+* Check if deferred work from prior reviews relates to this plan
+* Flag dependencies: does this plan enable or depend on deferred items?
+* Map known pain points (from TODOS) to this plan's scope
+
+Map:
 * What is the current system state?
 * What is already in flight (other open PRs, branches, stashed changes)?
 * What are the existing known pain points most relevant to this plan?
@@ -393,7 +399,7 @@ Complete table of every method that can fail, every exception class, rescued sta
 Any row with RESCUED=N, TEST=N, USER SEES=Silent → **CRITICAL GAP**.

 ### TODOS.md updates
-Present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
+Present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step. Follow the format in `.claude/skills/review/TODOS-format.md`.

 For each TODO, describe:
 * **What:** One-line description of the work.
@@ -65,7 +65,13 @@ git stash list                                 # Any stashed work
 grep -r "TODO\|FIXME\|HACK\|XXX" --include="*.rb" --include="*.js" -l
 find . -name "*.rb" -newer Gemfile.lock | head -20  # Recently touched files
 ```
-Then read CLAUDE.md, TODOS.md, and any existing architecture docs. Map:
+Then read CLAUDE.md, TODOS.md, and any existing architecture docs. When reading TODOS.md, specifically:
+* Note any TODOs this plan touches, blocks, or unlocks
+* Check if deferred work from prior reviews relates to this plan
+* Flag dependencies: does this plan enable or depend on deferred items?
+* Map known pain points (from TODOS) to this plan's scope
+
+Map:
 * What is the current system state?
 * What is already in flight (other open PRs, branches, stashed changes)?
 * What are the existing known pain points most relevant to this plan?
@@ -384,7 +390,7 @@ Complete table of every method that can fail, every exception class, rescued sta
 Any row with RESCUED=N, TEST=N, USER SEES=Silent → **CRITICAL GAP**.

 ### TODOS.md updates
-Present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
+Present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step. Follow the format in `.claude/skills/review/TODOS-format.md`.

 For each TODO, describe:
 * **What:** One-line description of the work.
@@ -51,6 +51,7 @@ Before reviewing anything, answer these questions:
 1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
 2. **What is the minimum set of changes that achieves the stated goal?** Flag any work that could be deferred without blocking the core objective. Be ruthless about scope creep.
 3. **Complexity check:** If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts.
+4. **TODOS cross-reference:** Read `TODOS.md` if it exists. Are any deferred items blocking this plan? Can any deferred items be bundled into this PR without expanding scope? Does this plan create new work that should be captured as a TODO?

 Then ask if I want one of three options:
 1. **SCOPE REDUCTION:** The plan is overbuilt. Propose a minimal version that achieves the core goal, then review that.
@@ -123,7 +124,7 @@ Every plan review MUST produce a "NOT in scope" section listing work that was co
 List existing code/flows that already partially solve sub-problems in this plan, and whether the plan reuses them or unnecessarily rebuilds them.

 ### TODOS.md updates
-After all review sections are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
+After all review sections are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step. Follow the format in `.claude/skills/review/TODOS-format.md`.

 For each TODO, describe:
 * **What:** One-line description of the work.
@@ -42,6 +42,7 @@ Before reviewing anything, answer these questions:
 1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
 2. **What is the minimum set of changes that achieves the stated goal?** Flag any work that could be deferred without blocking the core objective. Be ruthless about scope creep.
 3. **Complexity check:** If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts.
+4. **TODOS cross-reference:** Read `TODOS.md` if it exists. Are any deferred items blocking this plan? Can any deferred items be bundled into this PR without expanding scope? Does this plan create new work that should be captured as a TODO?

 Then ask if I want one of three options:
 1. **SCOPE REDUCTION:** The plan is overbuilt. Propose a minimal version that achieves the core goal, then review that.
@@ -114,7 +115,7 @@ Every plan review MUST produce a "NOT in scope" section listing work that was co
 List existing code/flows that already partially solve sub-problems in this plan, and whether the plan reuses them or unnecessarily rebuilds them.

 ### TODOS.md updates
-After all review sections are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
+After all review sections are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step. Follow the format in `.claude/skills/review/TODOS-format.md`.

 For each TODO, describe:
 * **What:** One-line description of the work.
@@ -110,7 +110,9 @@ This is the **primary mode** for developers verifying their work. When the user

 5. **Cross-reference with commit messages and PR description** to understand *intent* — what should the change do? Verify it actually does that.

-6. **Report findings** scoped to the branch changes:
+6. **Check TODOS.md** (if it exists) for known bugs or issues related to the changed files. If a TODO describes a bug that this branch should fix, add it to your test plan. If you find a new bug during QA that isn't in TODOS.md, note it in the report.
+
+7. **Report findings** scoped to the branch changes:
   - "Changes tested: N pages/routes affected by this branch"
   - For each: does it work? Screenshot evidence.
   - Any regressions on adjacent pages?
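A literal search is a quick way to start the TODOS.md lookup in step 6. The sketch below uses a hypothetical helper, `todos_for_changed_files`, which is not part of gstack. It assumes the base branch is `main` and that TODO entries mention files by their repo-relative path. It only surfaces candidate lines. You still have to read each TODO to decide whether it is actually related.

```bash
# Hypothetical helper: print TODOS.md lines that mention a changed file.
# Reads one repo-relative path per line on stdin.
todos_for_changed_files() {
  local todos="${1:-TODOS.md}"
  [ -f "$todos" ] || return 0            # no TODOS.md: nothing to report
  while IFS= read -r file; do
    [ -n "$file" ] || continue           # an empty pattern would match every line
    # -n: include line numbers; -F: match the path as a fixed string
    grep -nF -- "$file" "$todos" | while IFS= read -r hit; do
      printf '%s: %s\n' "$file" "$hit"
    done
  done
}

# Usage:
#   git diff --name-only main...HEAD | todos_for_changed_files TODOS.md
```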
@@ -84,7 +84,9 @@ This is the **primary mode** for developers verifying their work. When the user

 5. **Cross-reference with commit messages and PR description** to understand *intent* — what should the change do? Verify it actually does that.

-6. **Report findings** scoped to the branch changes:
+6. **Check TODOS.md** (if it exists) for known bugs or issues related to the changed files. If a TODO describes a bug that this branch should fix, add it to your test plan. If you find a new bug during QA that isn't in TODOS.md, note it in the report.
+
+7. **Report findings** scoped to the branch changes:
   - "Changes tested: N pages/routes affected by this branch"
   - For each: does it work? Screenshot evidence.
   - Any regressions on adjacent pages?
@@ -95,6 +95,9 @@ git shortlog origin/main --since="<window>" -sn --no-merges

 # 8. Greptile triage history (if available)
 cat ~/.gstack/greptile-history.md 2>/dev/null || true
+
+# 9. TODOS.md backlog (if available)
+cat TODOS.md 2>/dev/null || true
 ```

 ### Step 2: Compute Metrics
@@ -130,6 +133,20 @@ Sort by commits descending. The current user (from `git config user.name`) alway

 **Greptile signal (if history exists):** Read `~/.gstack/greptile-history.md` (fetched in Step 1, command 8). Filter entries within the retro time window by date. Count entries by type: `fix`, `fp`, `already-fixed`. Compute signal ratio: `(fix + already-fixed) / (fix + already-fixed + fp)`. If no entries exist in the window or the file doesn't exist, skip the Greptile metric row. Skip unparseable lines silently.

+**Backlog Health (if TODOS.md exists):** Read `TODOS.md` (fetched in Step 1, command 9). Compute:
+- Total open TODOs (exclude items in the `## Completed` section)
+- P0/P1 count (critical/urgent items)
+- P2 count (important items)
+- Items completed this period (items in the `## Completed` section whose `**Completed:**` date falls within the retro window)
+- Items added this period (items added to TODOS.md by commits within the window; see `git log --since="<window>" -p -- TODOS.md`)
+
+Include in the metrics table:
+```
+| Backlog Health | N open (X P0/P1, Y P2) · Z completed this period |
+```
+
+If TODOS.md doesn't exist, skip the Backlog Health row.
+
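The Backlog Health counts above can be approximated mechanically. This minimal sketch uses a hypothetical `backlog_health` function and rests on three assumptions:

- Each item carries one line that starts with `**Priority:** P<n>`.
- `## Completed` is the last `## ` section.
- Completed items carry the `**Completed:** vX.Y.Z (YYYY-MM-DD)` line that /ship writes.

The window bounds are ISO dates, which compare correctly as plain strings.

```bash
# Hypothetical sketch: print the Backlog Health table row from TODOS.md.
# Args: path to TODOS.md, window start, window end (both YYYY-MM-DD).
backlog_health() {
  local todos="$1" win_start="$2" win_end="$3"
  [ -f "$todos" ] || return 0   # no TODOS.md: skip the Backlog Health row
  awk -v s="$win_start" -v e="$win_end" '
    /^## / { in_completed = ($0 ~ /^## Completed/) }
    !in_completed && /^\*\*Priority:\*\* P[01]/ { open++; p01++; next }
    !in_completed && /^\*\*Priority:\*\* P2/    { open++; p2++;  next }
    !in_completed && /^\*\*Priority:\*\* P[34]/ { open++;        next }
    in_completed && /^\*\*Completed:\*\*/ {
      # Pull the date out of "(YYYY-MM-DD)".
      d = $0; sub(/.*\(/, "", d); sub(/\).*/, "", d)
      if (d >= s && d <= e) completed++
    }
    END {
      printf "| Backlog Health | %d open (%d P0/P1, %d P2) · %d completed this period |\n", open, p01, p2, completed
    }
  ' "$todos"
}
```

For a one-week retro, call `backlog_health TODOS.md 2026-03-07 2026-03-14`. P3 and P4 items count toward the open total but get no bucket of their own, which matches the table row.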
 ### Step 3: Commit Time Distribution

 Show hourly histogram in Pacific time using bar chart:
@@ -325,7 +342,18 @@ Use the Write tool to save the JSON file with this schema:
 }
 ```

-**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. If no history data is available, omit the field entirely.
+**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. When a source is unavailable, omit its field entirely.
+
+Include backlog data in the JSON when TODOS.md exists:
+```json
+  "backlog": {
+    "total_open": 28,
+    "p0_p1": 2,
+    "p2": 8,
+    "completed_this_period": 3,
+    "added_this_period": 1
+  }
+```

 ### Step 14: Write the Narrative

@@ -86,6 +86,9 @@ git shortlog origin/main --since="<window>" -sn --no-merges

 # 8. Greptile triage history (if available)
 cat ~/.gstack/greptile-history.md 2>/dev/null || true
+
+# 9. TODOS.md backlog (if available)
+cat TODOS.md 2>/dev/null || true
 ```

 ### Step 2: Compute Metrics
@@ -121,6 +124,20 @@ Sort by commits descending. The current user (from `git config user.name`) alway

 **Greptile signal (if history exists):** Read `~/.gstack/greptile-history.md` (fetched in Step 1, command 8). Filter entries within the retro time window by date. Count entries by type: `fix`, `fp`, `already-fixed`. Compute signal ratio: `(fix + already-fixed) / (fix + already-fixed + fp)`. If no entries exist in the window or the file doesn't exist, skip the Greptile metric row. Skip unparseable lines silently.
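Once entries are counted, the signal ratio above is simple arithmetic. This sketch assumes each history line looks like `YYYY-MM-DD | owner/repo | <type> | ...`. That layout is an assumption for illustration, so check greptile-triage.md for the real one. `greptile_signal` is a hypothetical name. Lines that do not parse are skipped silently. The function prints nothing when the window has no entries, which is the cue to skip the row.

```bash
# Hypothetical sketch: signal = (fix + already-fixed) / (fix + already-fixed + fp).
# ASSUMED line layout: "YYYY-MM-DD | owner/repo | <type> | ..."
greptile_signal() {
  local history="$1" win_start="$2" win_end="$3"
  [ -f "$history" ] || return 0
  awk -F' *[|] *' -v s="$win_start" -v e="$win_end" '
    $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ && $1 >= s && $1 <= e {
      if ($3 == "fix") fix++
      else if ($3 == "already-fixed") af++
      else if ($3 == "fp") fp++
    }
    END {
      total = fix + af + fp
      # Percentage is truncated to an integer; no output means skip the row.
      if (total > 0)
        printf "%d%% signal (%d fix, %d already-fixed, %d fp)\n", 100 * (fix + af) / total, fix, af, fp
    }
  ' "$history"
}
```

Suppose the window holds 2 `fix`, 1 `already-fixed`, and 1 `fp` entry. The ratio is then (2 + 1) / (2 + 1 + 1) = 75%.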

+**Backlog Health (if TODOS.md exists):** Read `TODOS.md` (fetched in Step 1, command 9). Compute:
+- Total open TODOs (exclude items in the `## Completed` section)
+- P0/P1 count (critical/urgent items)
+- P2 count (important items)
+- Items completed this period (items in the `## Completed` section whose `**Completed:**` date falls within the retro window)
+- Items added this period (items added to TODOS.md by commits within the window; see `git log --since="<window>" -p -- TODOS.md`)
+
+Include in the metrics table:
+```
+| Backlog Health | N open (X P0/P1, Y P2) · Z completed this period |
+```
+
+If TODOS.md doesn't exist, skip the Backlog Health row.
+
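"Items added this period" can be estimated from the patch history of TODOS.md. The hypothetical `count_added_todos` below takes the `**Priority:**` lines that commits added and subtracts the ones they removed. It assumes one such line per item. Because removals are netted out, moving an item into `## Completed` or re-prioritizing it counts as zero. A deleted item offsets an addition, so treat the result as an estimate.

```bash
# Hypothetical sketch: net number of "**Priority:**" lines added in a patch
# stream. Diff file headers ("+++ b/..." / "--- a/...") never match.
count_added_todos() {
  awk '
    /^\+\*\*Priority:\*\*/ { added++ }
    /^-\*\*Priority:\*\*/  { removed++ }
    END { n = added - removed; print (n > 0 ? n : 0) }
  '
}

# Usage:
#   git log --since="7 days ago" -p -- TODOS.md | count_added_todos
```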
 ### Step 3: Commit Time Distribution

 Show hourly histogram in Pacific time using bar chart:
@@ -316,7 +333,18 @@ Use the Write tool to save the JSON file with this schema:
 }
 ```

-**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. If no history data is available, omit the field entirely.
+**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. When a source is unavailable, omit its field entirely.
+
+Include backlog data in the JSON when TODOS.md exists:
+```json
+  "backlog": {
+    "total_open": 28,
+    "p0_p1": 2,
+    "p2": 8,
+    "completed_this_period": 3,
+    "added_this_period": 1
+  }
+```

 ### Step 14: Write the Narrative

@@ -49,7 +49,7 @@ Read `.claude/skills/review/checklist.md`.

 ## Step 2.5: Check for Greptile review comments

-Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, and classify steps.
+Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps.

 **If no PR exists, `gh` fails, API returns an error, or there are zero Greptile comments:** Skip this step silently. Greptile integration is additive — the review works without it.

@@ -95,7 +95,9 @@ After outputting your own findings, if Greptile comments were classified in Step

 **Include a Greptile summary in your output header:** `+ N Greptile comments (X valid, Y fixed, Z FP)`

-1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses C (false positive), post a reply using the appropriate API from the triage doc and save the pattern to both per-project and global greptile-history (see greptile-triage.md for write details).
+Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
+
+1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings and follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank) and save the pattern to both per-project and global greptile-history.

 2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
   - Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
@@ -105,19 +107,32 @@ After outputting your own findings, if Greptile comments were classified in Step
     - B) Fix it anyway (if low-effort and harmless)
     - C) Ignore — don't reply, don't fix

-   If the user chooses A, post a reply using the appropriate API from the triage doc and save the pattern to both per-project and global greptile-history (see greptile-triage.md for write details).
+   If the user chooses A, reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank) and save the pattern to both per-project and global greptile-history.

-3. **VALID BUT ALREADY FIXED comments:** Reply acknowledging the catch — no AskUserQuestion needed:
-   - Post reply: `"Good catch — already fixed in <commit-sha>."`
-   - Save to both per-project and global greptile-history (see greptile-triage.md for write details)
+3. **VALID BUT ALREADY FIXED comments:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
+   - Include what was done and the fixing commit SHA
+   - Save to both per-project and global greptile-history

 4. **SUPPRESSED comments:** Skip silently — these are known false positives from previous triage.

 ---

+## Step 5.5: TODOS cross-reference
+
+Read `TODOS.md` in the repository root (if it exists). Cross-reference the PR against open TODOs:
+
+- **Does this PR close any open TODOs?** If yes, note which items in your output: "This PR addresses TODO: <title>"
+- **Does this PR create work that should become a TODO?** If yes, flag it as an informational finding.
+- **Are there related TODOs that provide context for this review?** If yes, reference them when discussing related findings.
+
+If TODOS.md doesn't exist, skip this step silently.
+
+---
+
 ## Important Rules

 - **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
 - **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
 - **Be terse.** One line problem, one line fix. No preamble.
 - **Only flag real problems.** Skip anything that's fine.
+- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
@@ -40,7 +40,7 @@ Read `.claude/skills/review/checklist.md`.

 ## Step 2.5: Check for Greptile review comments

-Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, and classify steps.
+Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps.

 **If no PR exists, `gh` fails, API returns an error, or there are zero Greptile comments:** Skip this step silently. Greptile integration is additive — the review works without it.

@@ -86,7 +86,9 @@ After outputting your own findings, if Greptile comments were classified in Step

 **Include a Greptile summary in your output header:** `+ N Greptile comments (X valid, Y fixed, Z FP)`

-1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses C (false positive), post a reply using the appropriate API from the triage doc and save the pattern to both per-project and global greptile-history (see greptile-triage.md for write details).
+Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
+
+1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings and follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank) and save the pattern to both per-project and global greptile-history.

 2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
   - Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
@@ -96,19 +98,32 @@ After outputting your own findings, if Greptile comments were classified in Step
     - B) Fix it anyway (if low-effort and harmless)
     - C) Ignore — don't reply, don't fix

-   If the user chooses A, post a reply using the appropriate API from the triage doc and save the pattern to both per-project and global greptile-history (see greptile-triage.md for write details).
+   If the user chooses A, reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank) and save the pattern to both per-project and global greptile-history.

-3. **VALID BUT ALREADY FIXED comments:** Reply acknowledging the catch — no AskUserQuestion needed:
-   - Post reply: `"Good catch — already fixed in <commit-sha>."`
-   - Save to both per-project and global greptile-history (see greptile-triage.md for write details)
+3. **VALID BUT ALREADY FIXED comments:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
+   - Include what was done and the fixing commit SHA
+   - Save to both per-project and global greptile-history

 4. **SUPPRESSED comments:** Skip silently — these are known false positives from previous triage.

 ---

+## Step 5.5: TODOS cross-reference
+
+Read `TODOS.md` in the repository root (if it exists). Cross-reference the PR against open TODOs:
+
+- **Does this PR close any open TODOs?** If yes, note which items in your output: "This PR addresses TODO: <title>"
+- **Does this PR create work that should become a TODO?** If yes, flag it as an informational finding.
+- **Are there related TODOs that provide context for this review?** If yes, reference them when discussing related findings.
+
+If TODOS.md doesn't exist, skip this step silently.
+
+---
+
 ## Important Rules

 - **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
 - **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
 - **Be terse.** One line problem, one line fix. No preamble.
 - **Only flag real problems.** Skip anything that's fine.
+- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
@@ -35,6 +35,8 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
 - Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
 - MINOR or MAJOR version bump needed (ask — see Step 4)
 - Greptile review comments that need user decision (complex fixes, false positives)
+- TODOS.md is missing (ask whether to create one; see Step 5.5)
+- TODOS.md is disorganized (ask whether to reorganize it; see Step 5.5)

 **Never stop for:**
 - Uncommitted changes (always include them)
@@ -42,6 +44,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
 - CHANGELOG content (auto-generate from diff)
 - Commit message approval (auto-commit)
 - Multi-file changesets (auto-split into bisectable commits)
+- TODOS.md completed-item detection (auto-mark)

 ---

@@ -185,7 +188,7 @@ Save the review output — it goes into the PR body in Step 8.

 ## Step 3.75: Address Greptile review comments (if PR exists)

-Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, and classify steps.
+Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps.

 **If no PR exists, `gh` fails, API returns an error, or there are zero Greptile comments:** Skip this step silently. Continue to Step 4.

@@ -193,18 +196,20 @@ Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, an

 Include a Greptile summary in your output: `+ N Greptile comments (X valid, Y fixed, Z FP)`

+Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
+
 For each classified comment:

 **VALID & ACTIONABLE:** Use AskUserQuestion with:
 - The comment (file:line or [top-level] + body summary + permalink URL)
 - Your recommended fix
 - Options: A) Fix now (recommended), B) Acknowledge and ship anyway, C) It's a false positive
- If user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply to the comment (`"Fixed in <commit-sha>."`), and save to both per-project and global greptile-history (see greptile-triage.md for write details, type: fix).
- If user chooses C: reply explaining the false positive, save to both per-project and global greptile-history (type: fp).
+- If user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation), and save to both per-project and global greptile-history (type: fix).
+- If user chooses C: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank) and save to both per-project and global greptile-history (type: fp).

-**VALID BUT ALREADY FIXED:** Reply acknowledging the catch — no AskUserQuestion needed:
- Post reply: `"Good catch — already fixed in <commit-sha>."`
- Save to both per-project and global greptile-history (see greptile-triage.md for write details, type: already-fixed)
+**VALID BUT ALREADY FIXED:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
+- Include what was done and the fixing commit SHA
+- Save to both per-project and global greptile-history (type: already-fixed)

 **FALSE POSITIVE:** Use AskUserQuestion:
 - Show the comment and why you think it's wrong (file:line or [top-level] + body summary + permalink URL)
@@ -212,7 +217,7 @@ For each classified comment:
  - A) Reply to Greptile explaining the false positive (recommended if clearly wrong)
  - B) Fix it anyway (if trivial)
  - C) Ignore silently
- If user chooses A: post reply using the appropriate API from the triage doc, save to both per-project and global greptile-history (type: fp)
+- If user chooses A: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank) and save to both per-project and global greptile-history (type: fp)

 **SUPPRESSED:** Skip silently — these are known false positives from previous triage.

@@ -261,6 +266,61 @@ For each classified comment:

 ---

+## Step 5.5: TODOS.md (auto-update)
+
+Cross-reference the project's TODOS.md against the changes being shipped. Mark completed items automatically; prompt only if the file is missing or disorganized.
+
+Read `.claude/skills/review/TODOS-format.md` for the canonical format reference.
+
+**1. Check if TODOS.md exists** in the repository root.
+
+**If TODOS.md does not exist:** Use AskUserQuestion:
+- Message: "GStack recommends maintaining a TODOS.md organized by skill/component, then priority (P0 at top through P4, then Completed at bottom). See TODOS-format.md for the full format. Would you like to create one?"
+- Options: A) Create it now, B) Skip for now
+- If A: Create `TODOS.md` with a skeleton (a `# TODOS` heading and an empty `## Completed` section). Skip the structure check and continue to step 3.
+- If B: Skip the rest of Step 5.5. Continue to Step 6.
+
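Here is a minimal sketch of the "Create it now" branch, using a hypothetical `create_todos_skeleton` helper. It writes only the `# TODOS` heading and an empty `## Completed` section. It leaves any file that already exists untouched.

```bash
# Hypothetical helper: create the skeleton only when no TODOS.md exists yet,
# so an existing file is never overwritten.
create_todos_skeleton() {
  local todos="${1:-TODOS.md}"
  [ -e "$todos" ] && return 0
  printf '# TODOS\n\n## Completed\n' > "$todos"
}
```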
+**2. Check structure and organization:**
+
+Read TODOS.md and verify it follows the recommended structure:
+- Items grouped under `## <Skill/Component>` headings
+- Each item has a `**Priority:**` field with a P0-P4 value
+- A `## Completed` section at the bottom
+
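The three structural checks can be scripted as a first pass. This sketch, a hypothetical `todos_structure_problems`, assumes items are `### <title>` headings. That marker is an assumption here, and TODOS-format.md is the authority on the real one. The function prints one line per problem and nothing for a well-formed file. It does not check items under `## Completed` for a priority.

```bash
# Hypothetical sketch: report structure problems in TODOS.md.
# ASSUMPTION: items are "### <title>" headings under "## <Component>" groups.
todos_structure_problems() {
  local todos="${1:-TODOS.md}"
  awk '
    # Close out the previous item: flag it if no Priority line was seen.
    function end_item() { if (item != "" && !has_prio) print "missing priority: " item; item = "" }
    /^## / { end_item(); last_h2 = $0; if ($0 !~ /^## Completed/) groups++ }
    /^### / && last_h2 !~ /^## Completed/ { end_item(); item = substr($0, 5); has_prio = 0 }
    /^\*\*Priority:\*\* P[0-4]([^0-9]|$)/ { has_prio = 1 }
    END {
      end_item()
      if (groups == 0) print "no component groupings"
      if (last_h2 !~ /^## Completed/) print "no Completed section at the bottom"
    }
  ' "$todos"
}
```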
+**If disorganized** (missing priority fields, no component groupings, no Completed section): Use AskUserQuestion:
+- Message: "TODOS.md doesn't follow the recommended structure (skill/component groupings, P0-P4 priority, Completed section). Would you like to reorganize it?"
+- Options: A) Reorganize now (recommended), B) Leave as-is
+- If A: Reorganize in-place following TODOS-format.md. Preserve all content — only restructure, never delete items.
+- If B: Continue to step 3 without restructuring.
+
+**3. Detect completed TODOs:**
+
+This step is fully automatic — no user interaction.
+
+Use the diff and commit history already gathered in earlier steps:
+- `git diff main...HEAD` (full diff against main)
+- `git log main..HEAD --oneline` (all commits being shipped)
+
+For each TODO item, check if the changes in this PR complete it by:
+- Matching commit messages against the TODO title and description
+- Checking if files referenced in the TODO appear in the diff
+- Checking if the TODO's described work matches the functional changes
+
+**Be conservative:** Only mark a TODO as completed if there is clear evidence in the diff. If uncertain, leave it alone.
+
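One of the signals above can be checked mechanically: commit messages that match a TODO title. The hypothetical `completion_candidates` sketch assumes `### <title>` item headings. It requires the whole title to appear in a commit subject (case-insensitive), which keeps false positives low. A hit is only a candidate. Under the conservative rule above, the diff still has to show that the work is done.

```bash
# Hypothetical sketch: print open TODO titles found verbatim in commit subjects.
# Args: path to TODOS.md, newline-separated commit subjects.
completion_candidates() {
  local todos="$1" subjects="$2"
  awk '
    /^## / { in_completed = ($0 ~ /^## Completed/) }
    !in_completed && /^### / { print substr($0, 5) }
  ' "$todos" | while IFS= read -r title; do
    [ -n "$title" ] || continue          # an empty title would match everything
    if grep -qiF -- "$title" <<<"$subjects"; then
      printf '%s\n' "$title"
    fi
  done
}

# Usage:
#   completion_candidates TODOS.md "$(git log main..HEAD --format=%s)"
```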
+**4. Move completed items** to the `## Completed` section at the bottom, and append `**Completed:** vX.Y.Z (YYYY-MM-DD)` to each moved item, using the version being shipped and today's date.
+
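The stamp line from step 4 can be generated from the VERSION file. This sketch, a hypothetical `completed_stamp`, assumes VERSION holds the version being shipped and uses today's local date. /ship uses the 4-digit format from VERSION, so the stamp may read like `v0.3.2.0` rather than matching the three-part `vX.Y.Z` placeholder.

```bash
# Hypothetical sketch: build the "**Completed:**" line for a moved item.
completed_stamp() {
  local version
  # Strip the trailing newline (and any stray whitespace) from VERSION.
  version=$(tr -d '[:space:]' < "${1:-VERSION}") || return 1
  printf '**Completed:** v%s (%s)\n' "$version" "$(date +%Y-%m-%d)"
}
```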
+**5. Output summary:**
+- `TODOS.md: N items marked complete (item1, item2, ...). M items remaining.`
+- Or: `TODOS.md: No completed items detected. M items remaining.`
+- Or: `TODOS.md: Created.` / `TODOS.md: Reorganized.`
+
+**6. Defensive:** If TODOS.md cannot be written (permission error, disk full), warn the user and continue. Never stop the ship workflow for a TODOS failure.
+
+Save this summary — it goes into the PR body in Step 8.
+
+---
+
 ## Step 6: Commit (bisectable chunks)

 **Goal:** Create small, logical commits that work well with `git bisect` and help LLMs understand what changed.
@@ -271,7 +331,7 @@ For each classified comment:
   - **Infrastructure:** migrations, config changes, route additions
   - **Models & services:** new models, services, concerns (with their tests)
   - **Controllers & views:** controllers, views, JS/React components (with their tests)
-   - **VERSION + CHANGELOG:** always in the final commit
+   - **VERSION + CHANGELOG + TODOS.md:** always in the final commit

 3. **Rules for splitting:**
   - A model and its test file go in the same commit
@@ -329,6 +389,12 @@ gh pr create --title "<type>: <summary>" --body "$(cat <<'EOF'
 <If no Greptile comments found: "No Greptile comments.">
 <If no PR existed during Step 3.75: omit this section entirely>

+## TODOS
+<If items marked complete: bullet list of completed items with version>
+<If no items completed: "No TODO items completed in this PR.">
+<If TODOS.md created or reorganized: note that>
+<If TODOS.md doesn't exist and user skipped: omit this section>
+
 ## Test plan
 - [x] All Rails tests pass (N runs, 0 failures)
 - [x] All Vitest tests pass (N tests)
@@ -351,4 +417,6 @@ EOF
 - **Always use the 4-digit version format** from the VERSION file.
 - **Date format in CHANGELOG:** `YYYY-MM-DD`
 - **Split commits for bisectability** — each commit = one logical change.
+- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.
+- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies.
 - **The goal is: user says `/ship`, next thing they see is the review + PR URL.**
@@ -26,6 +26,8 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
 - Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
 - MINOR or MAJOR version bump needed (ask — see Step 4)
 - Greptile review comments that need user decision (complex fixes, false positives)
+- TODOS.md is missing (ask whether to create one; see Step 5.5)
+- TODOS.md is disorganized (ask whether to reorganize it; see Step 5.5)

 **Never stop for:**
 - Uncommitted changes (always include them)
@@ -33,6 +35,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
 - CHANGELOG content (auto-generate from diff)
 - Commit message approval (auto-commit)
 - Multi-file changesets (auto-split into bisectable commits)
+- TODOS.md completed-item detection (auto-mark)

 ---

@@ -176,7 +179,7 @@ Save the review output — it goes into the PR body in Step 8.

 ## Step 3.75: Address Greptile review comments (if PR exists)

-Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, and classify steps.
+Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps.

 **If no PR exists, `gh` fails, API returns an error, or there are zero Greptile comments:** Skip this step silently. Continue to Step 4.

@@ -184,18 +187,20 @@ Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, an

 Include a Greptile summary in your output: `+ N Greptile comments (X valid, Y fixed, Z FP)`

+Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
+
 For each classified comment:

 **VALID & ACTIONABLE:** Use AskUserQuestion with:
 - The comment (file:line or [top-level] + body summary + permalink URL)
 - Your recommended fix
 - Options: A) Fix now (recommended), B) Acknowledge and ship anyway, C) It's a false positive
- If user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply to the comment (`"Fixed in <commit-sha>."`), and save to both per-project and global greptile-history (see greptile-triage.md for write details, type: fix).
- If user chooses C: reply explaining the false positive, save to both per-project and global greptile-history (type: fp).
+- If user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation), and save to both per-project and global greptile-history (type: fix).
+- If user chooses C: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp).

-**VALID BUT ALREADY FIXED:** Reply acknowledging the catch — no AskUserQuestion needed:
- Post reply: `"Good catch — already fixed in <commit-sha>."`
- Save to both per-project and global greptile-history (see greptile-triage.md for write details, type: already-fixed)
+**VALID BUT ALREADY FIXED:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
+- Include what was done and the fixing commit SHA
+- Save to both per-project and global greptile-history (type: already-fixed)

 **FALSE POSITIVE:** Use AskUserQuestion:
 - Show the comment and why you think it's wrong (file:line or [top-level] + body summary + permalink URL)
@@ -203,7 +208,7 @@ For each classified comment:
  - A) Reply to Greptile explaining the false positive (recommended if clearly wrong)
  - B) Fix it anyway (if trivial)
  - C) Ignore silently
- If user chooses A: post reply using the appropriate API from the triage doc, save to both per-project and global greptile-history (type: fp)
+- If user chooses A: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp)

 **SUPPRESSED:** Skip silently — these are known false positives from previous triage.

@@ -252,6 +257,61 @@ For each classified comment:

 ---

+## Step 5.5: TODOS.md (auto-update)
+
+Cross-reference the project's TODOS.md against the changes being shipped. Mark completed items automatically; prompt only if the file is missing or disorganized.
+
+Read `.claude/skills/review/TODOS-format.md` for the canonical format reference.
+
+**1. Check if TODOS.md exists** in the repository root.
+
+**If TODOS.md does not exist:** Use AskUserQuestion:
+- Message: "GStack recommends maintaining a TODOS.md organized by skill/component, then priority (P0 at top through P4, then Completed at bottom). See TODOS-format.md for the full format. Would you like to create one?"
+- Options: A) Create it now, B) Skip for now
+- If A: Create `TODOS.md` with a skeleton (# TODOS heading + ## Completed section). Continue to step 3.
+- If B: Skip the rest of Step 5.5. Continue to Step 6.
+
+**2. Check structure and organization:**
+
+Read TODOS.md and verify it follows the recommended structure:
+- Items grouped under `## <Skill/Component>` headings
+- Each item has a `**Priority:**` field with a P0-P4 value
+- A `## Completed` section at the bottom
+
+**If disorganized** (missing priority fields, no component groupings, no Completed section): Use AskUserQuestion:
+- Message: "TODOS.md doesn't follow the recommended structure (skill/component groupings, P0-P4 priority, Completed section). Would you like to reorganize it?"
+- Options: A) Reorganize now (recommended), B) Leave as-is
+- If A: Reorganize in-place following TODOS-format.md. Preserve all content — only restructure, never delete items.
+- If B: Continue to step 3 without restructuring.
+
+**3. Detect completed TODOs:**
+
+This step is fully automatic — no user interaction.
+
+Use the diff and commit history already gathered in earlier steps:
+- `git diff main...HEAD` (full diff against main)
+- `git log main..HEAD --oneline` (all commits being shipped)
+
+For each TODO item, check if the changes in this PR complete it by:
+- Matching commit messages against the TODO title and description
+- Checking if files referenced in the TODO appear in the diff
+- Checking if the TODO's described work matches the functional changes
+
+**Be conservative:** Only mark a TODO as completed if there is clear evidence in the diff. If uncertain, leave it alone.
+
+**4. Move completed items** to the `## Completed` section at the bottom. Append `**Completed:** vX.Y.Z.W (YYYY-MM-DD)`, using the 4-digit version from the VERSION file.
+
+**5. Output summary:**
+- `TODOS.md: N items marked complete (item1, item2, ...). M items remaining.`
+- Or: `TODOS.md: No completed items detected. M items remaining.`
+- Or: `TODOS.md: Created.` / `TODOS.md: Reorganized.`
+
+**6. Defensive:** If TODOS.md cannot be written (permission error, disk full), warn the user and continue. Never stop the ship workflow for a TODOS failure.
+
+Save this summary — it goes into the PR body in Step 8.
+
+---
+
 ## Step 6: Commit (bisectable chunks)

 **Goal:** Create small, logical commits that work well with `git bisect` and help LLMs understand what changed.
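The structure check in Step 5.5 can be sketched in shell. This is a minimal illustration, not gstack code: `todos_is_organized` is a hypothetical helper name, and it assumes components are `## ` headings and each item carries a `**Priority:** Pn` line (the canonical layout lives in TODOS-format.md, which is not shown here). Because the item delimiter is not specified, it can only check that every priority line present is valid, not that every item has one.

```shell
# Hypothetical helper (not part of gstack): todos_is_organized FILE
# Exit 0 if FILE matches the three Step 5.5 structure signals, 1 otherwise.
todos_is_organized() {
  local file="$1"
  [ -f "$file" ] || return 1

  # (1) At least one "## <Skill/Component>" heading other than "## Completed".
  grep -E '^## ' "$file" | grep -vqx '## Completed' || return 1

  # (2) At least one **Priority:** field, and every one holds P0-P4.
  grep -q '\*\*Priority:\*\*' "$file" || return 1
  if grep '\*\*Priority:\*\*' "$file" \
      | grep -vqE '\*\*Priority:\*\* *P[0-4]([^0-9]|$)'; then
    return 1
  fi

  # (3) "## Completed" is the last level-2 heading in the file.
  [ "$(grep -E '^## ' "$file" | tail -n 1)" = "## Completed" ] || return 1
}
```

Usage: `todos_is_organized TODOS.md || echo "TODOS.md looks disorganized"`. A non-zero status is only the cue to ask the reorganize question; the skill never restructures without the user choosing option A.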
@@ -262,7 +322,7 @@ For each classified comment:
   - **Infrastructure:** migrations, config changes, route additions
   - **Models & services:** new models, services, concerns (with their tests)
   - **Controllers & views:** controllers, views, JS/React components (with their tests)
-   - **VERSION + CHANGELOG:** always in the final commit
+   - **VERSION + CHANGELOG + TODOS.md:** always in the final commit

 3. **Rules for splitting:**
   - A model and its test file go in the same commit
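One of the Step 5.5 completion signals is file overlap between a TODO and the diff. A hedged sketch of that single signal, assuming the changed paths come from `git diff --name-only main...HEAD` and that TODOs mention file paths verbatim; `todo_touches_diff` is a hypothetical name, not a gstack function. A match only makes an item a candidate: under the conservative rule it still needs matching commit messages or functional changes before it is marked complete.

```shell
# Hypothetical helper (not part of gstack): todo_touches_diff "<todo text>"
# Reads changed file paths on stdin (one per line), prints each path the
# TODO text mentions verbatim, and exits 0 if at least one path matched.
todo_touches_diff() {
  local todo="$1" path matched=1
  while IFS= read -r path; do
    [ -n "$path" ] || continue          # skip blank lines
    case "$todo" in
      *"$path"*) printf '%s\n' "$path"; matched=0 ;;
    esac
  done
  return "$matched"
}
```

Usage: `git diff --name-only main...HEAD | todo_touches_diff "$todo_text"`. Plain substring matching can over-match short paths (`a.rb` also matches inside `data.rb`), which is one more reason to treat a hit as a candidate rather than as evidence.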
@@ -320,6 +380,12 @@ gh pr create --title "<type>: <summary>" --body "$(cat <<'EOF'
 <If no Greptile comments found: "No Greptile comments.">
 <If no PR existed during Step 3.75: omit this section entirely>

+## TODOS
+<If items marked complete: bullet list of completed items with version>
+<If no items completed: "No TODO items completed in this PR.">
+<If TODOS.md created or reorganized: note that>
+<If TODOS.md doesn't exist and user skipped: omit this section>
+
 ## Test plan
 - [x] All Rails tests pass (N runs, 0 failures)
 - [x] All Vitest tests pass (N tests)
@@ -342,4 +408,6 @@ EOF
 - **Always use the 4-digit version format** from the VERSION file.
 - **Date format in CHANGELOG:** `YYYY-MM-DD`
 - **Split commits for bisectability** — each commit = one logical change.
+- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.
+- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies.
 - **The goal is: user says `/ship`, next thing they see is the review + PR URL.**