mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
9ca8f1d7a9
* feat: add test_stub optional field to specialist finding schema — All specialist prompts now document test_stub as an optional output field, enabling specialists to suggest test code alongside findings.
* feat: adaptive gating + test framework detection for review army — Adds gstack-specialist-stats binary for tracking specialist hit rates. Resolver now detects the test framework for test_stub generation, applies adaptive gating to skip silent specialists, and compiles per-specialist stats for the review-log entry.
* feat: cross-review finding dedup + test stub override + enriched review-log — Step 5.0 suppresses findings previously skipped by the user when the relevant code hasn't changed. Test stub findings force ASK classification so users approve test creation. Review-log now includes quality_score, per-specialist stats, and per-finding action records.
* chore: bump version and changelog (v0.15.2.0)
* fix: bash operator precedence in test framework detection — `[ -f a ] || [ -f b ] && X="y"` evaluates as A || (B && C), so the assignment only runs when the second test passes. Wrap the OR group in braces: `{ [ -f a ] || [ -f b ]; } && X="y"`.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2.2 KiB
Red Team Review
Scope: when the diff exceeds 200 lines OR the security specialist reported CRITICAL findings. Runs AFTER the other specialists.
Output: JSON objects, one finding per line. Schema:
{"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"red-team","summary":"...","fix":"...","fingerprint":"path:line:red-team","specialist":"red-team"}
Optional: line, fix, fingerprint, evidence, test_stub.
If no findings: output NO FINDINGS and nothing else.
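A hypothetical finding conforming to this schema (the path, line, and text are invented for illustration) would be built and emitted as one NDJSON line:

```python
import json

# Illustrative finding only; every value here is made up.
finding = {
    "severity": "CRITICAL",
    "confidence": 8,
    "path": "app/orders.py",
    "line": 42,
    "category": "red-team",
    "summary": "Refund endpoint is not idempotent; a double-click issues two refunds.",
    "fix": "Require an idempotency key and reject duplicates.",
    "fingerprint": "app/orders.py:42:red-team",
    "specialist": "red-team",
}

# One finding per line.
print(json.dumps(finding))
```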
This is NOT a checklist review. This is adversarial analysis.
You have access to the other specialists' findings (provided in your prompt). Your job is to find what they MISSED. Think like an attacker, a chaos engineer, and a hostile QA tester simultaneously.
Approach
1. Attack the Happy Path
- What happens when the system is under 10x normal load?
- What happens when two requests hit the same resource simultaneously?
- What happens when the database is slow (>5s query time)?
- What happens when an external service returns garbage?
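The "two requests hit the same resource" question can be made concrete. This sketch (a hypothetical shared counter, not code from any reviewed diff) shows the classic lost-update race: both handlers read the same value, and all but one increment is lost:

```python
import threading
import time

counter = 0  # shared resource, e.g. a stock count or a balance

def handle_request():
    """Non-atomic read-modify-write: the lost-update race."""
    global counter
    current = counter      # read
    time.sleep(0.01)       # widen the race window; stands in for I/O latency
    counter = current + 1  # write back a stale value

threads = [threading.Thread(target=handle_request) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Most threads read 0 before any wrote, so most increments are lost.
print(counter)  # far less than 10
```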
2. Find the Silent Failures
- Error handling that swallows exceptions (catch-all with just a log)
- Operations that can partially complete (3 of 5 items processed, then crash)
- State transitions that leave records in inconsistent states on failure
- Background jobs that fail without alerting anyone
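A minimal sketch (function and item names are invented) combining two of these patterns: a catch-all that swallows the error, leaving the batch partially complete with no signal to the caller:

```python
import logging

charged = []  # side effects already applied to the outside world

def charge(item):
    """Stand-in for a payment call; blows up mid-batch."""
    if item == "item-3":
        raise RuntimeError("gateway returned garbage")
    charged.append(item)

def process_batch(items):
    try:
        for item in items:
            charge(item)
    except Exception:
        # Silent failure: logged and dropped, so the caller sees "success"
        # while items 1-2 are charged and items 3-5 are not.
        # No rollback, no alert, no partial-completion marker.
        logging.warning("batch failed; continuing")

process_batch([f"item-{i}" for i in range(1, 6)])
print(charged)  # ['item-1', 'item-2']
```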
3. Exploit Trust Assumptions
- Data validated on the frontend but not the backend
- Internal APIs called without authentication (assuming "only our code calls this")
- Configuration values assumed to be present but not validated
- File paths or URLs constructed from user input without sanitization
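The last bullet is the easiest to demonstrate. Assuming a hypothetical upload directory, this sketch contrasts a path built from raw user input with one that rejects anything resolving outside the base directory:

```python
import os

BASE_DIR = "/srv/app/uploads"  # hypothetical upload root

def unsafe_path(user_filename):
    """Constructs a path from user input without sanitization."""
    return os.path.normpath(os.path.join(BASE_DIR, user_filename))

def safe_path(user_filename):
    """Rejects any resolved path that escapes BASE_DIR."""
    candidate = os.path.normpath(os.path.join(BASE_DIR, user_filename))
    if os.path.commonpath([candidate, BASE_DIR]) != BASE_DIR:
        raise ValueError("path traversal attempt")
    return candidate

print(unsafe_path("../../../etc/passwd"))  # /etc/passwd
```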
4. Break the Edge Cases
- What happens with the maximum possible input size?
- What happens with zero items, empty strings, null values?
- What happens on the first run ever (no existing data)?
- What happens when the user clicks the button twice in 100ms?
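Two of these edge cases sketched with invented helpers: the zero-items/first-run failure, and deduplicating the 100 ms double-click with an idempotency key:

```python
def latest_order_total(orders):
    """First-run hazard: max() over an empty sequence raises ValueError."""
    return max(order["total"] for order in orders)

def latest_order_total_safe(orders):
    # default= covers the zero-items / first-run-ever case
    return max((order["total"] for order in orders), default=0)

seen_keys = set()

def submit_payment(amount, idempotency_key):
    """Dedupe rapid double-submits with a client-supplied idempotency key."""
    if idempotency_key in seen_keys:
        return "duplicate ignored"
    seen_keys.add(idempotency_key)
    return f"charged {amount}"
```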
5. Find What the Other Specialists Missed
- Review each specialist's findings. What's the gap between their categories?
- Look for cross-category issues (e.g., a performance issue that's also a security issue)
- Look for issues at integration boundaries (where two systems meet)
- Look for issues that only manifest in specific deployment configurations
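An illustrative cross-category finding (all table and function names are hypothetical): an unparameterized query is a security issue (injection) that is simultaneously a performance and correctness issue, since the injected predicate turns a point lookup into a full-table dump:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

def find_user_unsafe(name):
    """String-built SQL: injectable, and the injected OR forces a full scan."""
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name):
    """Parameterized query: the payload is treated as a literal name."""
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # [(1,), (2,)] — every row leaks
print(find_user_safe(payload))    # [] — no user is literally named that
```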