mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-06 05:35:46 +02:00
Merge remote-tracking branch 'origin/main' into garrytan/usage-telemetry
# Conflicts: # SKILL.md # TODOS.md # browse/SKILL.md # design-consultation/SKILL.md # design-review/SKILL.md # document-release/SKILL.md # plan-ceo-review/SKILL.md # plan-design-review/SKILL.md # plan-eng-review/SKILL.md # qa-only/SKILL.md # qa/SKILL.md # retro/SKILL.md # retro/SKILL.md.tmpl # review/SKILL.md # scripts/gen-skill-docs.ts # setup-browser-cookies/SKILL.md # ship/SKILL.md
This commit is contained in:
+128
-11
@@ -5,6 +5,7 @@ description: |
|
||||
Pre-landing PR review. Analyzes diff against the base branch for SQL safety, LLM trust
|
||||
boundary violations, conditional side effects, and other structural issues. Use when
|
||||
asked to "review this PR", "code review", "pre-landing review", or "check my diff".
|
||||
Proactively suggest when the user is about to merge or land code changes.
|
||||
allowed-tools:
|
||||
- Bash
|
||||
- Read
|
||||
@@ -40,7 +41,8 @@ _SESSION_ID="$$-$(date +%s)"
|
||||
echo "TELEMETRY: ${_TEL:-off}"
|
||||
echo "TEL_PROMPTED: $_TEL_PROMPTED"
|
||||
mkdir -p ~/.gstack/analytics
|
||||
for _PF in ~/.gstack/analytics/.pending-* 2>/dev/null; do [ -f "$_PF" ] && ~/.claude/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
|
||||
echo '{"skill":"review","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","repo":"'$(basename "$(git rev-parse --show-toplevel 2>/dev/null)" 2>/dev/null || echo "unknown")'"}' >> ~/.gstack/analytics/skill-usage.jsonl 2>/dev/null || true
|
||||
for _PF in ~/.gstack/analytics/.pending-*; do [ -f "$_PF" ] && ~/.claude/skills/gstack/bin/gstack-telemetry-log --event-type skill_run --skill _pending_finalize --outcome unknown --session-id "$_SESSION_ID" 2>/dev/null || true; break; done
|
||||
```
|
||||
|
||||
If `PROACTIVE` is `"false"`, do not proactively suggest gstack skills — only invoke
|
||||
@@ -155,13 +157,37 @@ Hey gstack team — ran into this while using /{skill-name}:
|
||||
|
||||
Slug: lowercase, hyphens, max 60 chars (e.g. `browse-js-no-await`). Skip if file already exists. Max 3 reports per session. File inline and continue — don't stop the workflow. Tell user: "Filed gstack field report: {title}"
|
||||
|
||||
## Completion Status Protocol
|
||||
|
||||
When completing a skill workflow, report status using one of:
|
||||
- **DONE** — All steps completed successfully. Evidence provided for each claim.
|
||||
- **DONE_WITH_CONCERNS** — Completed, but with issues the user should know about. List each concern.
|
||||
- **BLOCKED** — Cannot proceed. State what is blocking and what was tried.
|
||||
- **NEEDS_CONTEXT** — Missing information required to continue. State exactly what you need.
|
||||
|
||||
### Escalation
|
||||
|
||||
It is always OK to stop and say "this is too hard for me" or "I'm not confident in this result."
|
||||
|
||||
Bad work is worse than no work. You will not be penalized for escalating.
|
||||
- If you have attempted a task 3 times without success, STOP and escalate.
|
||||
- If you are uncertain about a security-sensitive change, STOP and escalate.
|
||||
- If the scope of work exceeds what you can verify, STOP and escalate.
|
||||
|
||||
Escalation format:
|
||||
```
|
||||
STATUS: BLOCKED | NEEDS_CONTEXT
|
||||
REASON: [1-2 sentences]
|
||||
ATTEMPTED: [what you tried]
|
||||
RECOMMENDATION: [what the user should do next]
|
||||
```
|
||||
|
||||
## Telemetry (run last)
|
||||
|
||||
After the skill workflow completes (success, error, or abort), write the .pending marker
|
||||
with the actual skill name, then log the telemetry event. Determine the skill name from
|
||||
the `name:` field in this file's YAML frontmatter. Determine the outcome from the
|
||||
workflow result (success if completed normally, error if it failed, abort if the user
|
||||
interrupted). Run this bash:
|
||||
After the skill workflow completes (success, error, or abort), log the telemetry event.
|
||||
Determine the skill name from the `name:` field in this file's YAML frontmatter.
|
||||
Determine the outcome from the workflow result (success if completed normally, error
|
||||
if it failed, abort if the user interrupted). Run this bash:
|
||||
|
||||
```bash
|
||||
_TEL_END=$(date +%s)
|
||||
@@ -210,6 +236,40 @@ You are running the `/review` workflow. Analyze the current branch's diff agains
|
||||
|
||||
---
|
||||
|
||||
## Step 1.5: Scope Drift Detection
|
||||
|
||||
Before reviewing code quality, check: **did they build what was requested — nothing more, nothing less?**
|
||||
|
||||
1. Read `TODOS.md` (if it exists). Read PR description (`gh pr view --json body --jq .body 2>/dev/null || true`).
|
||||
Read commit messages (`git log origin/<base>..HEAD --oneline`).
|
||||
**If no PR exists:** rely on commit messages and TODOS.md for stated intent — this is the common case since /review runs before /ship creates the PR.
|
||||
2. Identify the **stated intent** — what was this branch supposed to accomplish?
|
||||
3. Run `git diff origin/<base> --stat` and compare the files changed against the stated intent.
|
||||
4. Evaluate with skepticism:
|
||||
|
||||
**SCOPE CREEP detection:**
|
||||
- Files changed that are unrelated to the stated intent
|
||||
- New features or refactors not mentioned in the plan
|
||||
- "While I was in there..." changes that expand blast radius
|
||||
|
||||
**MISSING REQUIREMENTS detection:**
|
||||
- Requirements from TODOS.md/PR description not addressed in the diff
|
||||
- Test coverage gaps for stated requirements
|
||||
- Partial implementations (started but not finished)
|
||||
|
||||
5. Output (before the main review begins):
|
||||
```
|
||||
Scope Check: [CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING]
|
||||
Intent: <1-line summary of what was requested>
|
||||
Delivered: <1-line summary of what the diff actually does>
|
||||
[If drift: list each out-of-scope change]
|
||||
[If missing: list each unaddressed requirement]
|
||||
```
|
||||
|
||||
6. This is **INFORMATIONAL** — does not block the review. Proceed to Step 2.
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Read the checklist
|
||||
|
||||
Read `.claude/skills/review/checklist.md`.
|
||||
@@ -260,7 +320,7 @@ Follow the output format specified in the checklist. Respect the suppressions
|
||||
Check if the diff touches frontend files using `gstack-diff-scope`:
|
||||
|
||||
```bash
|
||||
eval $(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)
|
||||
source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)
|
||||
```
|
||||
|
||||
**If `SCOPE_FRONTEND=false`:** Skip design review silently. No output.
|
||||
@@ -283,12 +343,10 @@ eval $(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)
|
||||
6. **Log the result** for the Review Readiness Dashboard:
|
||||
|
||||
```bash
|
||||
eval $(~/.claude/skills/gstack/bin/gstack-slug 2>/dev/null)
|
||||
mkdir -p ~/.gstack/projects/$SLUG
|
||||
echo '{"skill":"design-review-lite","timestamp":"TIMESTAMP","status":"STATUS","findings":N,"auto_fixed":M}' >> ~/.gstack/projects/$SLUG/$BRANCH-reviews.jsonl
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"design-review-lite","timestamp":"TIMESTAMP","status":"STATUS","findings":N,"auto_fixed":M,"commit":"COMMIT"}'
|
||||
```
|
||||
|
||||
Substitute: TIMESTAMP = ISO 8601 datetime, STATUS = "clean" if 0 findings or "issues_found", N = total findings, M = auto-fixed count.
|
||||
Substitute: TIMESTAMP = ISO 8601 datetime, STATUS = "clean" if 0 findings or "issues_found", N = total findings, M = auto-fixed count, COMMIT = output of `git rev-parse --short HEAD`.
|
||||
|
||||
Include any design findings alongside the findings from Step 4. They follow the same Fix-First flow in Step 5 — AUTO-FIX for mechanical CSS fixes, ASK for everything else.
|
||||
|
||||
@@ -342,6 +400,16 @@ Apply fixes for items where the user chose "Fix." Output what was fixed.
|
||||
|
||||
If no ASK items exist (everything was AUTO-FIX), skip the question entirely.
|
||||
|
||||
### Verification of claims
|
||||
|
||||
Before producing the final review output:
|
||||
- If you claim "this pattern is safe" → cite the specific line proving safety
|
||||
- If you claim "this is handled elsewhere" → read and cite the handling code
|
||||
- If you claim "tests cover this" → name the test file and method
|
||||
- Never say "likely handled" or "probably tested" — verify or flag as unknown
|
||||
|
||||
**Rationalization prevention:** "This looks fine" is not a finding. Either cite evidence it IS fine, or flag it as unverified.
|
||||
|
||||
### Greptile comment resolution
|
||||
|
||||
After outputting your own findings, if Greptile comments were classified in Step 2.5:
|
||||
@@ -396,6 +464,55 @@ If no documentation files exist, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Step 5.7: Codex second opinion (optional)
|
||||
|
||||
After completing the review, check if the Codex CLI is available:
|
||||
|
||||
```bash
|
||||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||||
```
|
||||
|
||||
If Codex is available, use AskUserQuestion:
|
||||
|
||||
```
|
||||
Review complete. Want an independent second opinion from Codex (OpenAI)?
|
||||
|
||||
A) Run Codex code review — independent diff review with pass/fail gate
|
||||
B) Run Codex adversarial challenge — try to find ways this code will fail in production
|
||||
C) Both — review first, then adversarial challenge
|
||||
D) Skip — no Codex review needed
|
||||
```
|
||||
|
||||
If the user chooses A, B, or C:
|
||||
|
||||
**For code review (A or C):** Run `codex review --base <base>` with a 5-minute timeout.
|
||||
Present the full output verbatim under a `CODEX SAYS (code review):` header.
|
||||
Check the output for `[P1]` markers — if found, note `GATE: FAIL`, otherwise `GATE: PASS`.
|
||||
After presenting, compare Codex's findings with your own review findings from Steps 4-5
|
||||
and output a CROSS-MODEL ANALYSIS showing what both found, what only Codex found,
|
||||
and what only Claude found.
|
||||
|
||||
**For adversarial challenge (B or C):** Run:
|
||||
```bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, failure modes. Be adversarial." -s read-only
|
||||
```
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header.
|
||||
|
||||
**Only if a code review ran (user chose A or C):** Persist the Codex review result to the review log:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}'
|
||||
```
|
||||
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
**Do NOT persist a codex-review entry when only the adversarial challenge (B) ran** —
|
||||
there is no gate verdict to record, and a false entry would make the Review Readiness
|
||||
Dashboard believe a code review happened when it didn't.
|
||||
|
||||
If Codex is not available, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Important Rules
|
||||
|
||||
- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
|
||||
|
||||
@@ -5,6 +5,7 @@ description: |
|
||||
Pre-landing PR review. Analyzes diff against the base branch for SQL safety, LLM trust
|
||||
boundary violations, conditional side effects, and other structural issues. Use when
|
||||
asked to "review this PR", "code review", "pre-landing review", or "check my diff".
|
||||
Proactively suggest when the user is about to merge or land code changes.
|
||||
allowed-tools:
|
||||
- Bash
|
||||
- Read
|
||||
@@ -33,6 +34,40 @@ You are running the `/review` workflow. Analyze the current branch's diff agains
|
||||
|
||||
---
|
||||
|
||||
## Step 1.5: Scope Drift Detection
|
||||
|
||||
Before reviewing code quality, check: **did they build what was requested — nothing more, nothing less?**
|
||||
|
||||
1. Read `TODOS.md` (if it exists). Read PR description (`gh pr view --json body --jq .body 2>/dev/null || true`).
|
||||
Read commit messages (`git log origin/<base>..HEAD --oneline`).
|
||||
**If no PR exists:** rely on commit messages and TODOS.md for stated intent — this is the common case since /review runs before /ship creates the PR.
|
||||
2. Identify the **stated intent** — what was this branch supposed to accomplish?
|
||||
3. Run `git diff origin/<base> --stat` and compare the files changed against the stated intent.
|
||||
4. Evaluate with skepticism:
|
||||
|
||||
**SCOPE CREEP detection:**
|
||||
- Files changed that are unrelated to the stated intent
|
||||
- New features or refactors not mentioned in the plan
|
||||
- "While I was in there..." changes that expand blast radius
|
||||
|
||||
**MISSING REQUIREMENTS detection:**
|
||||
- Requirements from TODOS.md/PR description not addressed in the diff
|
||||
- Test coverage gaps for stated requirements
|
||||
- Partial implementations (started but not finished)
|
||||
|
||||
5. Output (before the main review begins):
|
||||
```
|
||||
Scope Check: [CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING]
|
||||
Intent: <1-line summary of what was requested>
|
||||
Delivered: <1-line summary of what the diff actually does>
|
||||
[If drift: list each out-of-scope change]
|
||||
[If missing: list each unaddressed requirement]
|
||||
```
|
||||
|
||||
6. This is **INFORMATIONAL** — does not block the review. Proceed to Step 2.
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Read the checklist
|
||||
|
||||
Read `.claude/skills/review/checklist.md`.
|
||||
@@ -132,6 +167,16 @@ Apply fixes for items where the user chose "Fix." Output what was fixed.
|
||||
|
||||
If no ASK items exist (everything was AUTO-FIX), skip the question entirely.
|
||||
|
||||
### Verification of claims
|
||||
|
||||
Before producing the final review output:
|
||||
- If you claim "this pattern is safe" → cite the specific line proving safety
|
||||
- If you claim "this is handled elsewhere" → read and cite the handling code
|
||||
- If you claim "tests cover this" → name the test file and method
|
||||
- Never say "likely handled" or "probably tested" — verify or flag as unknown
|
||||
|
||||
**Rationalization prevention:** "This looks fine" is not a finding. Either cite evidence it IS fine, or flag it as unverified.
|
||||
|
||||
### Greptile comment resolution
|
||||
|
||||
After outputting your own findings, if Greptile comments were classified in Step 2.5:
|
||||
@@ -186,6 +231,55 @@ If no documentation files exist, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Step 5.7: Codex second opinion (optional)
|
||||
|
||||
After completing the review, check if the Codex CLI is available:
|
||||
|
||||
```bash
|
||||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||||
```
|
||||
|
||||
If Codex is available, use AskUserQuestion:
|
||||
|
||||
```
|
||||
Review complete. Want an independent second opinion from Codex (OpenAI)?
|
||||
|
||||
A) Run Codex code review — independent diff review with pass/fail gate
|
||||
B) Run Codex adversarial challenge — try to find ways this code will fail in production
|
||||
C) Both — review first, then adversarial challenge
|
||||
D) Skip — no Codex review needed
|
||||
```
|
||||
|
||||
If the user chooses A, B, or C:
|
||||
|
||||
**For code review (A or C):** Run `codex review --base <base>` with a 5-minute timeout.
|
||||
Present the full output verbatim under a `CODEX SAYS (code review):` header.
|
||||
Check the output for `[P1]` markers — if found, note `GATE: FAIL`, otherwise `GATE: PASS`.
|
||||
After presenting, compare Codex's findings with your own review findings from Steps 4-5
|
||||
and output a CROSS-MODEL ANALYSIS showing what both found, what only Codex found,
|
||||
and what only Claude found.
|
||||
|
||||
**For adversarial challenge (B or C):** Run:
|
||||
```bash
|
||||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, failure modes. Be adversarial." -s read-only
|
||||
```
|
||||
Present the full output verbatim under a `CODEX SAYS (adversarial challenge):` header.
|
||||
|
||||
**Only if a code review ran (user chose A or C):** Persist the Codex review result to the review log:
|
||||
```bash
|
||||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","gate":"GATE"}'
|
||||
```
|
||||
|
||||
Substitute: STATUS ("clean" if PASS, "issues_found" if FAIL), GATE ("pass" or "fail").
|
||||
|
||||
**Do NOT persist a codex-review entry when only the adversarial challenge (B) ran** —
|
||||
there is no gate verdict to record, and a false entry would make the Review Readiness
|
||||
Dashboard believe a code review happened when it didn't.
|
||||
|
||||
If Codex is not available, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Important Rules
|
||||
|
||||
- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
|
||||
|
||||
+7
-7
@@ -35,16 +35,16 @@ Be terse. For each issue: one line describing the problem, one line with the fix
|
||||
### Pass 1 — CRITICAL
|
||||
|
||||
#### SQL & Data Safety
|
||||
- String interpolation in SQL (even if values are `.to_i`/`.to_f` — use `sanitize_sql_array` or Arel)
|
||||
- String interpolation in SQL (even if values are `.to_i`/`.to_f` — use parameterized queries (Rails: sanitize_sql_array/Arel; Node: prepared statements; Python: parameterized queries))
|
||||
- TOCTOU races: check-then-set patterns that should be atomic `WHERE` + `update_all`
|
||||
- `update_column`/`update_columns` bypassing validations on fields that have or should have constraints
|
||||
- N+1 queries: `.includes()` missing for associations used in loops/views (especially avatar, attachments)
|
||||
- Bypassing model validations for direct DB writes (Rails: update_column; Django: QuerySet.update(); Prisma: raw queries)
|
||||
- N+1 queries: Missing eager loading (Rails: .includes(); SQLAlchemy: joinedload(); Prisma: include) for associations used in loops/views
|
||||
|
||||
#### Race Conditions & Concurrency
|
||||
- Read-check-write without uniqueness constraint or `rescue RecordNotUnique; retry` (e.g., `where(hash:).first` then `save!` without handling concurrent insert)
|
||||
- `find_or_create_by` on columns without unique DB index — concurrent calls can create duplicates
|
||||
- Read-check-write without uniqueness constraint or catch duplicate key error and retry (e.g., `where(hash:).first` then `save!` without handling concurrent insert)
|
||||
- find-or-create without unique DB index — concurrent calls can create duplicates
|
||||
- Status transitions that don't use atomic `WHERE old_status = ? UPDATE SET new_status` — concurrent updates can skip or double-apply transitions
|
||||
- `html_safe` on user-controlled data (XSS) — check any `.html_safe`, `raw()`, or string interpolation into `html_safe` output
|
||||
- Unsafe HTML rendering (Rails: .html_safe/raw(); React: dangerouslySetInnerHTML; Vue: v-html; Django: |safe/mark_safe) on user-controlled data (XSS)
|
||||
|
||||
#### LLM Output Trust Boundary
|
||||
- LLM-generated values (emails, URLs, names) written to DB or passed to mailers without format validation. Add lightweight guards (`EMAIL_REGEXP`, `URI.parse`, `.strip`) before persisting.
|
||||
@@ -141,7 +141,7 @@ the agent auto-fixes a finding or asks the user.
|
||||
```
|
||||
AUTO-FIX (agent fixes without asking): ASK (needs human judgment):
|
||||
├─ Dead code / unused variables ├─ Security (auth, XSS, injection)
|
||||
├─ N+1 queries (missing .includes()) ├─ Race conditions
|
||||
├─ N+1 queries (missing eager loading) ├─ Race conditions
|
||||
├─ Stale comments contradicting code ├─ Design decisions
|
||||
├─ Magic numbers → named constants ├─ Large fixes (>20 lines)
|
||||
├─ Missing LLM output validation ├─ Enum completeness
|
||||
|
||||
@@ -9,7 +9,7 @@ This checklist applies to **source code in the diff** — not rendered output. R
|
||||
**Trigger:** Only run this checklist if the diff touches frontend files. Use `gstack-diff-scope` to detect:
|
||||
|
||||
```bash
|
||||
eval $(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)
|
||||
source <(~/.claude/skills/gstack/bin/gstack-diff-scope <base> 2>/dev/null)
|
||||
```
|
||||
|
||||
If `SCOPE_FRONTEND=false`, skip the entire design review silently.
|
||||
|
||||
Reference in New Issue
Block a user