From 19fbf5b0b4739603c3e5e234acd08578aebe8452 Mon Sep 17 00:00:00 2001 From: Garry Tan Date: Wed, 1 Apr 2026 11:08:17 -0700 Subject: [PATCH] feat: cross-review finding dedup + test stub override + enriched review-log Step 5.0 suppresses findings previously skipped by the user when the relevant code hasn't changed. Test stub findings force ASK classification so users approve test creation. Review-log now includes quality_score, per-specialist stats, and per-finding action records. Co-Authored-By: Claude Opus 4.6 (1M context) --- review/SKILL.md | 90 ++++++++++++++++++++++++++++++++++++++++++-- review/SKILL.md.tmpl | 45 +++++++++++++++++++++- 2 files changed, 130 insertions(+), 5 deletions(-) diff --git a/review/SKILL.md b/review/SKILL.md index eeb3c2ec..9e5c1806 100644 --- a/review/SKILL.md +++ b/review/SKILL.md @@ -868,6 +868,20 @@ STACK="" echo "STACK: ${STACK:-unknown}" DIFF_LINES=$(git diff origin/ --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0") echo "DIFF_LINES: $DIFF_LINES" +# Detect test framework for specialist test stub generation +TEST_FW="" +[ -f jest.config.ts ] || [ -f jest.config.js ] && TEST_FW="jest" +[ -f vitest.config.ts ] && TEST_FW="vitest" +[ -f spec/spec_helper.rb ] || [ -f .rspec ] && TEST_FW="rspec" +[ -f pytest.ini ] || [ -f conftest.py ] && TEST_FW="pytest" +[ -f go.mod ] && TEST_FW="go-test" +echo "TEST_FW: ${TEST_FW:-unknown}" +``` + +### Read specialist hit rates (adaptive gating) + +```bash +~/.claude/skills/gstack/bin/gstack-specialist-stats 2>/dev/null || true ``` ### Select specialists @@ -887,8 +901,18 @@ Based on the scope signals above, select which specialists to dispatch. 6. **API Contract** — if SCOPE_API=true. Read `~/.claude/skills/gstack/review/specialists/api-contract.md` 7. **Design** — if SCOPE_FRONTEND=true. Use the existing design review checklist at `~/.claude/skills/gstack/review/design-checklist.md` -Note which specialists were selected and which were skipped. Print the selection: -"Dispatching N specialists: [names]. Skipped: [names] (scope not detected)." +### Adaptive gating + +After scope-based selection, apply adaptive gating based on specialist hit rates: + +For each conditional specialist that passed scope gating, check the `gstack-specialist-stats` output above: +- If tagged `[GATE_CANDIDATE]` (0 findings in 10+ dispatches): skip it. Print: "[specialist] auto-gated (0 findings in N reviews)." +- If tagged `[NEVER_GATE]`: always dispatch regardless of hit rate. Security and data-migration are insurance policy specialists — they should run even when silent. + +**Force flags:** If the user's prompt includes `--security`, `--performance`, `--testing`, `--maintainability`, `--data-migration`, `--api-contract`, `--design`, or `--all-specialists`, force-include that specialist regardless of gating. + +Note which specialists were selected, gated, and skipped. Print the selection: +"Dispatching N specialists: [names]. Skipped: [names] (scope not detected). Gated: [names] (0 findings in N+ reviews)." --- @@ -921,7 +945,11 @@ For each finding, output a JSON object on its own line: {\"severity\":\"CRITICAL|INFORMATIONAL\",\"confidence\":N,\"path\":\"file\",\"line\":N,\"category\":\"category\",\"summary\":\"description\",\"fix\":\"recommended fix\",\"fingerprint\":\"path:line:category\",\"specialist\":\"name\"} Required fields: severity, confidence, path, category, summary, specialist. -Optional: line, fix, fingerprint, evidence. +Optional: line, fix, fingerprint, evidence, test_stub. + +If you can write a test that would catch this issue, include it in the `test_stub` field. +Use the detected test framework ({TEST_FW}). Write a minimal skeleton — describe/it/test +blocks with clear intent. Skip test_stub for architectural or design-only findings. If no findings: output `NO FINDINGS` and nothing else. Do not output anything else — no preamble, no summary, no commentary. @@ -988,6 +1016,17 @@ PR Quality Score: X/10 These findings flow into Step 5 Fix-First alongside the CRITICAL pass findings from Step 4. The Fix-First heuristic applies identically — specialist findings follow the same AUTO-FIX vs ASK classification. +**Compile per-specialist stats:** +After merging findings, compile a `specialists` object for the review-log entry in Step 5.8. +For each specialist (testing, maintainability, security, performance, data-migration, api-contract, design, red-team): +- If dispatched: `{"dispatched": true, "findings": N, "critical": N, "informational": N}` +- If skipped by scope: `{"dispatched": false, "reason": "scope"}` +- If skipped by gating: `{"dispatched": false, "reason": "gated"}` +- If not applicable (e.g., red-team not activated): omit from the object + +Include the Design specialist even though it uses `design-checklist.md` instead of the specialist schema files. +Remember these stats — you will need them for the review-log entry in Step 5.8. + --- ### Red Team dispatch (conditional) @@ -1020,6 +1059,38 @@ If the Red Team subagent fails or times out, skip silently and continue. **Every finding gets action — not just critical ones.** +### Step 5.0: Cross-review finding dedup + +Before classifying findings, check if any were previously skipped by the user in a prior review on this branch. + +```bash +~/.claude/skills/gstack/bin/gstack-review-read +``` + +Parse the output: only lines BEFORE `---CONFIG---` are JSONL entries (the output also contains `---CONFIG---` and `---HEAD---` footer sections that are not JSONL — ignore those). + +For each JSONL entry that has a `findings` array: +1. Collect all fingerprints where `action: "skipped"` +2. Note the `commit` field from that entry + +If skipped fingerprints exist, get the list of files changed since that review: + +```bash +git diff --name-only HEAD +``` + +For each current finding (from both Step 4 critical pass and Step 4.5-4.6 specialists), check: +- Does its fingerprint match a previously skipped finding? +- Is the finding's file path NOT in the changed-files set? + +If both conditions are true: suppress the finding. It was intentionally skipped and the relevant code hasn't changed. + +Print: "Suppressed N findings from prior reviews (previously skipped by user)" + +**Only suppress `skipped` findings — never `fixed` or `auto-fixed`** (those might regress and should be re-checked). + +If no prior reviews exist or none have a `findings` array, skip this step silently. + Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)` ### Step 5a: Classify each finding @@ -1028,6 +1099,14 @@ For each finding, classify as AUTO-FIX or ASK per the Fix-First Heuristic in checklist.md. Critical findings lean toward ASK; informational findings lean toward AUTO-FIX. +**Test stub override:** Any finding that has a `test_stub` field (generated by a specialist) +is reclassified as ASK regardless of its original classification. When presenting the ASK +item, show the proposed test file path and the test code. The user approves or skips the +test creation. If approved, write the fix + test file. Derive the test file path from +the finding's `path` using project conventions (`spec/` for RSpec, `__tests__/` for +Jest/Vitest, `test_` prefix for pytest, `_test.go` suffix for Go). If the test file +already exists, append the new test. Output: `[FIXED + TEST] [file:line] Problem -> fix + test at [test_path]` + ### Step 5b: Auto-fix all AUTO-FIX items Apply each fix directly. For each one, output a one-line summary: @@ -1261,7 +1340,7 @@ recognize that Eng Review was run on this branch. Run: ```bash -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"commit":"COMMIT"}' +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"quality_score":SCORE,"specialists":SPECIALISTS_JSON,"findings":FINDINGS_JSON,"commit":"COMMIT"}' ``` Substitute: @@ -1270,6 +1349,9 @@ Substitute: - `issues_found` = total remaining unresolved findings - `critical` = remaining unresolved critical findings - `informational` = remaining unresolved informational findings +- `quality_score` = the PR Quality Score computed in Step 4.6 (e.g., 7.5). If specialists were skipped (small diff), use `10.0` +- `specialists` = the per-specialist stats object compiled in Step 4.6. Each specialist that was considered gets an entry: `{"dispatched":true/false,"findings":N,"critical":N,"informational":N}` if dispatched, or `{"dispatched":false,"reason":"scope|gated"}` if skipped. Include Design specialist. Example: `{"testing":{"dispatched":true,"findings":2,"critical":0,"informational":2},"security":{"dispatched":false,"reason":"scope"}}` +- `findings` = array of per-finding records from Step 5. For each finding (from critical pass and specialists), include: `{"fingerprint":"path:line:category","severity":"CRITICAL|INFORMATIONAL","action":"ACTION"}`. ACTION is `"auto-fixed"` (Step 5b), `"fixed"` (user approved in Step 5d), or `"skipped"` (user chose Skip in Step 5c). Suppressed findings from Step 5.0 are NOT included (they were already recorded in a prior review entry). - `COMMIT` = output of `git rev-parse --short HEAD` ## Capture Learnings diff --git a/review/SKILL.md.tmpl b/review/SKILL.md.tmpl index fec5b568..e09d7154 100644 --- a/review/SKILL.md.tmpl +++ b/review/SKILL.md.tmpl @@ -103,6 +103,38 @@ Follow the output format specified in the checklist. Respect the suppressions **Every finding gets action — not just critical ones.** +### Step 5.0: Cross-review finding dedup + +Before classifying findings, check if any were previously skipped by the user in a prior review on this branch. + +```bash +~/.claude/skills/gstack/bin/gstack-review-read +``` + +Parse the output: only lines BEFORE `---CONFIG---` are JSONL entries (the output also contains `---CONFIG---` and `---HEAD---` footer sections that are not JSONL — ignore those). + +For each JSONL entry that has a `findings` array: +1. Collect all fingerprints where `action: "skipped"` +2. Note the `commit` field from that entry + +If skipped fingerprints exist, get the list of files changed since that review: + +```bash +git diff --name-only HEAD +``` + +For each current finding (from both Step 4 critical pass and Step 4.5-4.6 specialists), check: +- Does its fingerprint match a previously skipped finding? +- Is the finding's file path NOT in the changed-files set? + +If both conditions are true: suppress the finding. It was intentionally skipped and the relevant code hasn't changed. + +Print: "Suppressed N findings from prior reviews (previously skipped by user)" + +**Only suppress `skipped` findings — never `fixed` or `auto-fixed`** (those might regress and should be re-checked). + +If no prior reviews exist or none have a `findings` array, skip this step silently. + Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)` ### Step 5a: Classify each finding @@ -111,6 +143,14 @@ For each finding, classify as AUTO-FIX or ASK per the Fix-First Heuristic in checklist.md. Critical findings lean toward ASK; informational findings lean toward AUTO-FIX. +**Test stub override:** Any finding that has a `test_stub` field (generated by a specialist) +is reclassified as ASK regardless of its original classification. When presenting the ASK +item, show the proposed test file path and the test code. The user approves or skips the +test creation. If approved, write the fix + test file. Derive the test file path from +the finding's `path` using project conventions (`spec/` for RSpec, `__tests__/` for +Jest/Vitest, `test_` prefix for pytest, `_test.go` suffix for Go). If the test file +already exists, append the new test. Output: `[FIXED + TEST] [file:line] Problem -> fix + test at [test_path]` + ### Step 5b: Auto-fix all AUTO-FIX items Apply each fix directly. For each one, output a one-line summary: @@ -221,7 +261,7 @@ recognize that Eng Review was run on this branch. Run: ```bash -~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"commit":"COMMIT"}' +~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"review","timestamp":"TIMESTAMP","status":"STATUS","issues_found":N,"critical":N,"informational":N,"quality_score":SCORE,"specialists":SPECIALISTS_JSON,"findings":FINDINGS_JSON,"commit":"COMMIT"}' ``` Substitute: @@ -230,6 +270,9 @@ Substitute: - `issues_found` = total remaining unresolved findings - `critical` = remaining unresolved critical findings - `informational` = remaining unresolved informational findings +- `quality_score` = the PR Quality Score computed in Step 4.6 (e.g., 7.5). If specialists were skipped (small diff), use `10.0` +- `specialists` = the per-specialist stats object compiled in Step 4.6. Each specialist that was considered gets an entry: `{"dispatched":true/false,"findings":N,"critical":N,"informational":N}` if dispatched, or `{"dispatched":false,"reason":"scope|gated"}` if skipped. Include Design specialist. Example: `{"testing":{"dispatched":true,"findings":2,"critical":0,"informational":2},"security":{"dispatched":false,"reason":"scope"}}` +- `findings` = array of per-finding records from Step 5. For each finding (from critical pass and specialists), include: `{"fingerprint":"path:line:category","severity":"CRITICAL|INFORMATIONAL","action":"ACTION"}`. ACTION is `"auto-fixed"` (Step 5b), `"fixed"` (user approved in Step 5d), or `"skipped"` (user chose Skip in Step 5c). Suppressed findings from Step 5.0 are NOT included (they were already recorded in a prior review entry). - `COMMIT` = output of `git rev-parse --short HEAD` {{LEARNINGS_LOG}}