feat: test coverage catalog + failure triage (merged branches) (#285)

* feat: add bin/gstack-repo-mode — solo vs collaborative detection with caching

Detects whether a repo is solo-dev (one person does 80%+ of recent commits)
or collaborative. Uses 90-day git shortlog window with 7-day cache in
~/.gstack/projects/{SLUG}/repo-mode.json. Config override via
`gstack-config set repo_mode solo|collaborative` takes precedence over
the heuristic. Minimum 5 commits required to classify (otherwise unknown).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: test failure ownership triage — see something say something

Adds two new preamble sections to all gstack skills:
- Repo Ownership Mode: explains solo vs collaborative behavior
- See Something, Say Something: proactive issue flagging principle

Adds {{TEST_FAILURE_TRIAGE}} template variable (opt-in, used by /ship):
- Classifies test failures as in-branch vs pre-existing
- Solo mode defaults to "investigate and fix now"
- Collaborative mode offers "blame + assign GitHub issue" option
- Also offers P0 TODO and skip options

/ship Step 3 now triages test failures instead of hard-stopping on all
failures. In-branch failures still block shipping. Pre-existing failures
get user-directed triage based on repo mode.

Adds P2 TODO for gstack notes system (deferred lightweight reminder).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate SKILL.md files for Claude and Codex hosts

All 22 Claude skills and 21 Codex skills regenerated with new preamble
sections (Repo Ownership Mode, See Something Say Something) and
{{TEST_FAILURE_TRIAGE}} resolved in ship/SKILL.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: validate repo mode values to prevent shell injection

Codex adversarial review found that unvalidated config/cache values
could be injected into shell via source <(gstack-repo-mode). Added
validate_mode() that only allows solo|collaborative|unknown — anything
else becomes "unknown". Prevents persistent code execution through
malicious config.yaml or tampered cache JSON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: shell injection via branch names + feature-branch sampling bias

Codex code review found two issues:

P1: eval $(gstack-slug) in gstack-repo-mode executes branch names as
shell. Branch names like foo$(touch${IFS}pwned) are valid git refs and
would execute arbitrary commands. Fix: compute SLUG directly with sed
instead of eval'ing gstack-slug output.

P2: git shortlog HEAD only sees current branch history. On feature
branches that haven't merged main recently, other contributors disappear
from the sample. Fix: use git shortlog on the default branch
(origin/main) instead of HEAD.

Also improved blame lookup in collaborative triage to check both the
test file and the production code it covers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: broaden codex-host stripping test to accommodate triage section

"Investigate and fix" now appears in TEST_FAILURE_TRIAGE (not just the
Codex review step). Use CODEX_REVIEWS config string as a more specific
marker for detecting the Codex review step in Codex-hosted skills.

* fix: replace template placeholder in TODOS.md with readable text

{{TEST_FAILURE_TRIAGE}} is template syntax but TODOS.md is not processed
by gen-skill-docs — replaced with human-readable reference.

* chore: bump version and changelog (v0.9.5.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add bin/ directory to project structure in CLAUDE.md

* test: add triage resolver unit tests, plan-eng coverage audit E2E, and triage E2E

- TEST_FAILURE_TRIAGE resolver: 6 unit tests verifying all triage steps (T1-T4),
  REPO_MODE branching, and safety default for ambiguous failures
- plan-eng-coverage-audit E2E: tests /plan-eng-review coverage audit codepath
  (gap identified during eng review — existed on neither branch)
- ship-triage E2E: planted-bug fixture with in-branch (truncate null) and
  pre-existing (divide-by-zero) failures; verifies correct classification
- Touchfile entries for diff-based test selection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate stale Codex SKILL.md for retro

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Garry Tan
2026-03-21 09:36:17 -07:00
committed by GitHub
parent 624c7793de
commit f954f3449a
43 changed files with 1232 additions and 18 deletions
+116
View File
@@ -152,6 +152,8 @@ _PROACTIVE=$(${ctx.paths.binDir}/gstack-config get proactive 2>/dev/null || echo
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
echo "BRANCH: $_BRANCH"
echo "PROACTIVE: $_PROACTIVE"
source <(${ctx.paths.binDir}/gstack-repo-mode 2>/dev/null) || REPO_MODE=unknown
echo "REPO_MODE: $REPO_MODE"
_LAKE_SEEN=$([ -f ~/.gstack/.completeness-intro-seen ] && echo "yes" || echo "no")
echo "LAKE_INTRO: $_LAKE_SEEN"
_TEL=$(~/.claude/skills/gstack/bin/gstack-config get telemetry 2>/dev/null || true)
@@ -263,6 +265,118 @@ AI-assisted coding makes the marginal cost of completeness near-zero. When you p
- BAD: Quoting only human-team effort: "This would take 2 weeks." (Say: "2 weeks human / ~1 hour CC.")`;
}
function generateRepoModeSection(): string {
return `## Repo Ownership Mode — See Something, Say Something
\`REPO_MODE\` from the preamble tells you who owns issues in this repo:
- **\`solo\`** — One person does 80%+ of the work. They own everything. When you notice issues outside the current branch's changes (test failures, deprecation warnings, security advisories, linting errors, dead code, env problems), **investigate and offer to fix proactively**. The solo dev is the only person who will fix it. Default to action.
- **\`collaborative\`** — Multiple active contributors. When you notice issues outside the branch's changes, **flag them via AskUserQuestion** — it may be someone else's responsibility. Default to asking, not fixing.
- **\`unknown\`** — Treat as collaborative (safer default — ask before fixing).
**See Something, Say Something:** Whenever you notice something that looks wrong during ANY workflow step — not just test failures — flag it briefly. One sentence: what you noticed and its impact. In solo mode, follow up with "Want me to fix it?" In collaborative mode, just flag it and move on.
Never let a noticed issue silently pass. The whole point is proactive communication.`;
}
function generateTestFailureTriage(): string {
return `## Test Failure Ownership Triage
When tests fail, do NOT immediately stop. First, determine ownership:
### Step T1: Classify each failure
For each failing test:
1. **Get the files changed on this branch:**
\`\`\`bash
git diff origin/<base>...HEAD --name-only
\`\`\`
2. **Classify the failure:**
- **In-branch** if: the failing test file itself was modified on this branch, OR the test output references code that was changed on this branch, OR you can trace the failure to a change in the branch diff.
- **Likely pre-existing** if: neither the test file nor the code it tests was modified on this branch, AND the failure is unrelated to any branch change you can identify.
- **When ambiguous, default to in-branch.** It is safer to stop the developer than to let a broken test ship. Only classify as pre-existing when you are confident.
This classification is heuristic — use your judgment reading the diff and the test output. You do not have a programmatic dependency graph.
### Step T2: Handle in-branch failures
**STOP.** These are your failures. Show them and do not proceed. The developer must fix their own broken tests before shipping.
### Step T3: Handle pre-existing failures
Check \`REPO_MODE\` from the preamble output.
**If REPO_MODE is \`solo\`:**
Use AskUserQuestion:
> These test failures appear pre-existing (not caused by your branch changes):
>
> [list each failure with file:line and brief error description]
>
> Since this is a solo repo, you're the only one who will fix these.
>
> RECOMMENDATION: Choose A — fix now while the context is fresh. Completeness: 9/10.
> A) Investigate and fix now (human: ~2-4h / CC: ~15min) — Completeness: 10/10
> B) Add as P0 TODO — fix after this branch lands — Completeness: 7/10
> C) Skip — I know about this, ship anyway — Completeness: 3/10
**If REPO_MODE is \`collaborative\` or \`unknown\`:**
Use AskUserQuestion:
> These test failures appear pre-existing (not caused by your branch changes):
>
> [list each failure with file:line and brief error description]
>
> This is a collaborative repo — these may be someone else's responsibility.
>
> RECOMMENDATION: Choose B — assign it to whoever broke it so the right person fixes it. Completeness: 9/10.
> A) Investigate and fix now anyway — Completeness: 10/10
> B) Blame + assign GitHub issue to the author — Completeness: 9/10
> C) Add as P0 TODO — Completeness: 7/10
> D) Skip — ship anyway — Completeness: 3/10
### Step T4: Execute the chosen action
**If "Investigate and fix now":**
- Switch to /investigate mindset: root cause first, then minimal fix.
- Fix the pre-existing failure.
- Commit the fix separately from the branch's changes: \`git commit -m "fix: pre-existing test failure in <test-file>"\`
- Continue with the workflow.
**If "Add as P0 TODO":**
- If \`TODOS.md\` exists, add the entry following the format in \`review/TODOS-format.md\` (or \`.claude/skills/review/TODOS-format.md\`).
- If \`TODOS.md\` does not exist, create it with the standard header and add the entry.
- Entry should include: title, the error output, which branch it was noticed on, and priority P0.
- Continue with the workflow — treat the pre-existing failure as non-blocking.
**If "Blame + assign GitHub issue" (collaborative only):**
- Find who likely broke it. Check BOTH the test file AND the production code it tests:
\`\`\`bash
# Who last touched the failing test?
git log --format="%an (%ae)" -1 -- <failing-test-file>
# Who last touched the production code the test covers? (often the actual breaker)
git log --format="%an (%ae)" -1 -- <source-file-under-test>
\`\`\`
If these are different people, prefer the production code author — they likely introduced the regression.
- Create a GitHub issue assigned to that person:
\`\`\`bash
gh issue create \\
--title "Pre-existing test failure: <test-name>" \\
--body "Found failing on branch <current-branch>. Failure is pre-existing.\\n\\n**Error:**\\n\`\`\`\\n<first 10 lines>\\n\`\`\`\\n\\n**Last modified by:** <author>\\n**Noticed by:** gstack /ship on <date>" \\
--assignee "<github-username>"
\`\`\`
- If \`gh\` is not available or \`--assignee\` fails (user not in org, etc.), create the issue without assignee and note who should look at it in the body.
- Continue with the workflow.
**If "Skip":**
- Continue with the workflow.
- Note in output: "Pre-existing test failure skipped: <test-name>"`;
}
function generateContributorMode(): string {
return `## Contributor Mode
@@ -365,6 +479,7 @@ function generatePreamble(ctx: TemplateContext): string {
generateTelemetryPrompt(ctx),
generateAskUserFormat(ctx),
generateCompletenessSection(),
generateRepoModeSection(),
generateContributorMode(),
generateCompletionStatus(),
].join('\n\n');
@@ -1926,6 +2041,7 @@ const RESOLVERS: Record<string, (ctx: TemplateContext) => string> = {
TEST_COVERAGE_AUDIT_PLAN: generateTestCoverageAuditPlan,
TEST_COVERAGE_AUDIT_SHIP: generateTestCoverageAuditShip,
TEST_COVERAGE_AUDIT_REVIEW: generateTestCoverageAuditReview,
TEST_FAILURE_TRIAGE: generateTestFailureTriage,
SPEC_REVIEW_LOOP: generateSpecReviewLoop,
DESIGN_SKETCH: generateDesignSketch,
BENEFITS_FROM: generateBenefitsFrom,