mirror of https://github.com/garrytan/gstack.git synced 2026-05-07 05:56:41 +02:00

Garry Tan b4c33261c7 feat: add scope drift detection + verification of claims to /review

Step 1.5: Before reviewing code quality, check if the diff matches stated
intent. Flags scope creep and missing requirements (INFORMATIONAL).

Step 5 addition: Every review claim must cite evidence — "this pattern is
safe" needs a line reference, "tests cover this" needs a test name.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-16 10:10:00 -05:00

name: review
version: 1.0.0
description: Pre-landing PR review. Analyzes diff against main for SQL safety, LLM trust boundary violations, conditional side effects, and other structural issues.
allowed-tools: Bash, Read, Edit, Write, Grep, Glob, AskUserQuestion

Preamble (run first)

_UPD=$(~/.claude/skills/gstack/bin/gstack-update-check 2>/dev/null || .claude/skills/gstack/bin/gstack-update-check 2>/dev/null || true)
[ -n "$_UPD" ] && echo "$_UPD" || true
mkdir -p ~/.gstack/sessions
touch ~/.gstack/sessions/"$PPID"
_SESSIONS=$(find ~/.gstack/sessions -mmin -120 -type f 2>/dev/null | wc -l | tr -d ' ')
find ~/.gstack/sessions -mmin +120 -type f -delete 2>/dev/null || true
_CONTRIB=$(~/.claude/skills/gstack/bin/gstack-config get gstack_contributor 2>/dev/null || true)

If output shows UPGRADE_AVAILABLE <old> <new>: read ~/.claude/skills/gstack/gstack-upgrade/SKILL.md and follow the "Inline upgrade flow" (auto-upgrade if configured, otherwise AskUserQuestion with 4 options, write snooze state if declined). If JUST_UPGRADED <from> <to>: tell user "Running gstack v{to} (just updated!)" and continue.
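As a rough illustration of the branching described above, the sketch below classifies one line of update-check output. The function name `classify_update` and its output strings are hypothetical. Only the two markers `UPGRADE_AVAILABLE` and `JUST_UPGRADED` come from the text, and the sketch assumes each marker is followed by two space-separated version fields.

```shell
# Hypothetical helper, not part of gstack: classify one line of
# gstack-update-check output. Assumes "<MARKER> <version> <version>".
classify_update() {
  # Word-split the line into positional parameters.
  set -- $1
  case "${1:-}" in
    UPGRADE_AVAILABLE) echo "upgrade old=${2:-} new=${3:-}" ;;
    JUST_UPGRADED)     echo "upgraded from=${2:-} to=${3:-}" ;;
    *)                 echo "none" ;;
  esac
}
```

With the preamble's variable this could be called as `classify_update "$_UPD"`. An empty `_UPD` falls through to `none`, so the workflow continues without any upgrade handling.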

AskUserQuestion Format

ALWAYS follow this structure for every AskUserQuestion call:

Context: project name, current branch, what we're working on (1-2 sentences)
The specific question or decision point
RECOMMENDATION: Choose [X] because [one-line reason]
Lettered options: A) ... B) ... C) ...

If _SESSIONS is 3 or more: the user is juggling multiple gstack sessions and context-switching heavily. ELI16 mode — they may not remember what this conversation is about. Every AskUserQuestion MUST re-ground them: state the project, the branch, the current plan/task, then the specific problem, THEN the recommendation and options. Be extra clear and self-contained — assume they haven't looked at this window in 20 minutes.

Per-skill instructions may add additional formatting rules on top of this baseline.

Contributor Mode

If _CONTRIB is true: you are in contributor mode. When you hit friction with gstack itself (not the user's app), file a field report. Think: "hey, I was trying to do X with gstack and it didn't work / was confusing / was annoying. Here's what happened."

gstack issues: browse command fails/wrong output, snapshot missing elements, skill instructions unclear or misleading, binary crash/hang, unhelpful error message, any rough edge or annoyance — even minor stuff. NOT gstack issues: user's app bugs, network errors to user's URL, auth failures on user's site.

To file: write ~/.gstack/contributor-logs/{slug}.md with this structure:

# {Title}

Hey gstack team — ran into this while using /{skill-name}:

**What I was trying to do:** {what the user/agent was attempting}
**What happened instead:** {what actually happened}
**How annoying (1-5):** {1=meh, 3=friction, 5=blocker}

## Steps to reproduce
1. {step}

## Raw output
(wrap any error messages or unexpected output in a markdown code block)

**Date:** {YYYY-MM-DD} | **Version:** {gstack version} | **Skill:** /{skill}

Then run: mkdir -p ~/.gstack/contributor-logs && open ~/.gstack/contributor-logs/{slug}.md

Slug: lowercase, hyphens, max 60 chars (e.g. browse-snapshot-ref-gap). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
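The slug and skip-if-exists rules above could be sketched as follows. Both function names (`slugify`, `write_report`) are hypothetical and this is one possible reading of the rules, not gstack's own implementation. The sketch does not enforce the limit of three reports per session.

```shell
# Hypothetical helpers illustrating the slug rules; not part of gstack.

# Lowercase the title, turn every run of non-alphanumerics into one hyphen,
# cap the result at 60 characters, then trim a leading or trailing hyphen.
slugify() {
  printf '%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed 's/[^a-z0-9]\{1,\}/-/g' \
    | cut -c1-60 \
    | sed -e 's/^-//' -e 's/-$//'
}

# Write a report body to DIR/SLUG.md unless that file already exists.
# $1 = directory, $2 = slug, $3 = report body
write_report() {
  mkdir -p "$1"
  if [ -e "$1/$2.md" ]; then
    return 0  # skip: a report with this slug was already filed
  fi
  printf '%s\n' "$3" > "$1/$2.md"
}
```

For example, `write_report ~/.gstack/contributor-logs "$(slugify "$title")" "$body"` would leave an existing report untouched, where `title` and `body` are placeholders for the report's title and text.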

Completion Status Protocol

When completing a skill workflow, report status using one of:

DONE — All steps completed successfully. Evidence provided for each claim.
DONE_WITH_CONCERNS — Completed, but with issues the user should know about. List each concern.
BLOCKED — Cannot proceed. State what is blocking and what was tried.
NEEDS_CONTEXT — Missing information required to continue. State exactly what you need.

Escalation

It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."

Bad work is worse than no work. You will not be penalized for escalating.

If you have attempted a task 3 times without success, STOP and escalate.
If you are uncertain about a security-sensitive change, STOP and escalate.
If the scope of work exceeds what you can verify, STOP and escalate.

Escalation format:

STATUS: BLOCKED | NEEDS_CONTEXT
REASON: [1-2 sentences]
ATTEMPTED: [what you tried]
RECOMMENDATION: [what the user should do next]

Pre-Landing PR Review

You are running the /review workflow. Analyze the current branch's diff against main for structural issues that tests don't catch.

Step 1: Check branch

Run git branch --show-current to get the current branch.
If on main, output: "Nothing to review — you're on main or have no changes against main." and stop.
Run git fetch origin main --quiet && git diff origin/main --stat to check if there's a diff. If no diff, output the same message and stop.
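The stop conditions in Step 1 reduce to a small decision, sketched below as a pure function so the logic is easy to see. The name `review_gate` and the `stop`/`proceed` outputs are hypothetical. The function takes the branch name and the `--stat` output as arguments and does not call git itself.

```shell
# Hypothetical helper, not part of gstack: decide whether there is anything
# to review. $1 = current branch name, $2 = output of the diff --stat command.
review_gate() {
  if [ "$1" = "main" ] || [ -z "$2" ]; then
    echo "stop"      # on main, or no diff against origin/main
  else
    echo "proceed"
  fi
}
```

Using the commands named in Step 1, a call might look like `review_gate "$(git branch --show-current)" "$(git diff origin/main --stat)"`, run after `git fetch origin main --quiet`.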

Step 1.5: Scope Drift Detection

Before reviewing code quality, check: did they build what was requested — nothing more, nothing less?

Read TODOS.md (if it exists). Read PR description (gh pr view --json body --jq .body 2>/dev/null || true). Read commit messages (git log origin/main..HEAD --oneline). If no PR exists: rely on commit messages and TODOS.md for stated intent — this is the common case since /review runs before /ship creates the PR.
Identify the stated intent — what was this branch supposed to accomplish?
Run git diff origin/main --stat and compare the files changed against the stated intent.
Evaluate with skepticism:

SCOPE CREEP detection:
- Files changed that are unrelated to the stated intent
- New features or refactors not mentioned in the plan
- "While I was in there..." changes that expand blast radius
MISSING REQUIREMENTS detection:
- Requirements from TODOS.md/PR description not addressed in the diff
- Test coverage gaps for stated requirements
- Partial implementations (started but not finished)

Output (before the main review begins):

Scope Check: [CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING]
Intent: <1-line summary of what was requested>
Delivered: <1-line summary of what the diff actually does>
[If drift: list each out-of-scope change]
[If missing: list each unaddressed requirement]

This is INFORMATIONAL — does not block the review. Proceed to Step 2.
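Scope drift is a judgment call, but a first mechanical pass can help surface candidates. The sketch below is a rough heuristic under stated assumptions, not the method the skill prescribes: it assumes the stated intent can be approximated by a space-separated list of expected path prefixes, and it prints every changed file that falls outside them. The name `out_of_scope` is hypothetical.

```shell
# Hypothetical helper, not part of gstack: read changed file paths on stdin
# and print those that match none of the expected prefixes.
# $1 = space-separated list of path prefixes considered in scope
out_of_scope() {
  while IFS= read -r f; do
    match=0
    for p in $1; do
      case "$f" in
        "$p"*) match=1 ;;
      esac
    done
    if [ "$match" -eq 0 ]; then
      printf '%s\n' "$f"
    fi
  done
  return 0
}
```

A possible use is `git diff origin/main --name-only | out_of_scope "app/models/ spec/models/"`, where the prefixes are illustrative. Any file it prints is only a candidate for the "out-of-scope change" list and still needs to be judged against the stated intent.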

Step 2: Read the checklist

Read .claude/skills/review/checklist.md.

If the file cannot be read, STOP and report the error. Do not proceed without the checklist.

Step 2.5: Check for Greptile review comments

Read .claude/skills/review/greptile-triage.md and follow the fetch, filter, classify, and escalation detection steps.

If no PR exists, gh fails, API returns an error, or there are zero Greptile comments: Skip this step silently. Greptile integration is additive — the review works without it.

If Greptile comments are found: Store the classifications (VALID & ACTIONABLE, VALID BUT ALREADY FIXED, FALSE POSITIVE, SUPPRESSED) — you will need them in Step 5.

Step 3: Get the diff

Fetch the latest main to avoid false positives from a stale local main:

git fetch origin main --quiet

Run git diff origin/main to get the full diff. This compares the working tree against the latest main, so it includes both committed changes and uncommitted changes to tracked files; untracked files do not appear.

Step 4: Two-pass review

Apply the checklist against the diff in two passes:

Pass 1 (CRITICAL): SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary, Enum & Value Completeness
Pass 2 (INFORMATIONAL): Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, View/Frontend

Enum & Value Completeness requires reading code OUTSIDE the diff. When the diff introduces a new enum value, status, tier, or type constant, use Grep to find all files that reference sibling values, then Read those files to check if the new value is handled. This is the one category where within-diff review is insufficient.
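The outside-the-diff search described above can be approximated from the command line. The sketch below is a purely textual heuristic with a hypothetical function name: it lists files that mention a sibling value but never mention the new one. Those files are candidates to Read, not confirmed bugs, since a file may legitimately handle only some values.

```shell
# Hypothetical helper, not part of gstack: list files under a directory that
# reference a sibling value but not the newly introduced value.
# $1 = directory to search, $2 = existing sibling value, $3 = new value
missing_new_value() {
  grep -rlF -- "$2" "$1" | while IFS= read -r f; do
    if ! grep -qF -- "$3" "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
```

For instance, if a diff adds a status `archived` next to an existing `pending` (both names illustrative), `missing_new_value app pending archived` would print the files under `app` worth opening.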

Follow the output format specified in the checklist. Respect the suppressions — do NOT flag items listed in the "DO NOT flag" section.

Step 5: Output findings

Always output ALL findings — both critical and informational. The user must see every issue.

If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, then RECOMMENDATION: Choose A because [one-line reason], then options (A: Fix it now, B: Acknowledge, C: False positive — skip). After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
If only non-critical issues found: output findings. No further action needed.
If no issues found: output Pre-Landing Review: No issues found.

Verification of claims

Before producing the final review output:

If you claim "this pattern is safe" → cite the specific line proving safety
If you claim "this is handled elsewhere" → read and cite the handling code
If you claim "tests cover this" → name the test file and method
Never say "likely handled" or "probably tested" — verify or flag as unknown

Rationalization prevention: "This looks fine" is not a finding. Either cite evidence it IS fine, or flag it as unverified.

Greptile comment resolution

After outputting your own findings, if Greptile comments were classified in Step 2.5:

Include a Greptile summary in your output header: + N Greptile comments (X valid, Y fixed, Z FP)

Before replying to any comment, run the Escalation Detection algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.

VALID & ACTIONABLE comments: These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the Fix reply template from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the False Positive reply template (include evidence + suggested re-rank), save to both per-project and global greptile-history.
FALSE POSITIVE comments: Present each one via AskUserQuestion:
- Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
- Explain concisely why it's a false positive
- Options:
  - A) Reply to Greptile explaining why this is incorrect (recommended if clearly wrong)
  - B) Fix it anyway (if low-effort and harmless)
  - C) Ignore — don't reply, don't fix
If the user chooses A, reply using the False Positive reply template from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history.
VALID BUT ALREADY FIXED comments: Reply using the Already Fixed reply template from greptile-triage.md — no AskUserQuestion needed:
- Include what was done and the fixing commit SHA
- Save to both per-project and global greptile-history
SUPPRESSED comments: Skip silently — these are known false positives from previous triage.

Step 5.5: TODOS cross-reference

Read TODOS.md in the repository root (if it exists). Cross-reference the PR against open TODOs:

Does this PR close any open TODOs? If yes, note which items in your output: "This PR addresses TODO:
