mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-01 19:25:10 +02:00
feat: TODOS-aware skills, 2-tier Greptile replies, gitignore fix (#61)

* fix: log non-ENOENT errors in ensureStateDir() instead of silently swallowing

  Replace bare catch {} with ENOENT-only silence. Non-ENOENT errors (EACCES,
  ENOSPC) are now logged to .gstack/browse-server.log. Includes test for
  permission-denied scenario with chmod 444.

* feat: merge TODO.md + TODOS.md into unified backlog with shared format reference

  Merge TODO.md (roadmap) and TODOS.md (near-term) into one file organized by
  skill/component with P0-P4 priority ordering and Completed section. Add shared
  review/TODOS-format.md for canonical format. Add static validation tests.

* feat: add 2-tier Greptile reply system with escalation detection

  Add reply templates (Tier 1 friendly, Tier 2 firm), explicit escalation
  detection algorithm, and severity re-ranking guidance to greptile-triage.md.

* feat: cross-skill TODOS awareness + Greptile template refs in all skills

  /ship Step 5.5: auto-detect completed TODOs, offer reorganization.
  /review Step 5.5: cross-reference PR against open TODOs.
  /plan-ceo-review, /plan-eng-review: TODOS context in planning.
  /retro: Backlog Health metric. /qa: bug TODO context in diff-aware mode.
  All Greptile-aware skills now reference reply templates and escalation detection.

* chore: bump version and changelog (v0.3.8)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update CONTRIBUTING.md for v0.3.8 changes

  Clarify test tier cost table (Tier 3 standalone vs combined), add TODOS.md
  to "Things to know", mention Greptile triage in ship workflow description.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
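The ENOENT-only handling described in the first bullet of the commit message can be sketched in isolation. This is a minimal illustration, not the project's `ensureStateDir()`: the helper name `appendIgnoreEntry`, its return values, and the scratch directory are assumptions made for the example. Only the log line format and the `.gstack/` entry are taken from the commit.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical result type, used here only to make the outcome observable.
type IgnoreResult = "added" | "present" | "skipped" | "logged";

// Hypothetical helper: append `.gstack/` to a project's .gitignore.
// A missing .gitignore (ENOENT) is skipped silently; any other error is
// written to the given log file instead of being swallowed.
export function appendIgnoreEntry(projectDir: string, logPath: string): IgnoreResult {
  const gitignorePath = path.join(projectDir, ".gitignore");
  try {
    const content = fs.readFileSync(gitignorePath, "utf-8");
    if (content.split("\n").some((line) => line.trim() === ".gstack/")) {
      return "present";
    }
    const separator = content === "" || content.endsWith("\n") ? "" : "\n";
    fs.appendFileSync(gitignorePath, `${separator}.gstack/\n`);
    return "added";
  } catch (err) {
    const code = (err as { code?: string }).code;
    if (code === "ENOENT") return "skipped"; // no .gitignore: nothing to do
    try {
      fs.mkdirSync(path.dirname(logPath), { recursive: true });
      fs.appendFileSync(
        logPath,
        `[${new Date().toISOString()}] Warning: could not update .gitignore at ${gitignorePath}: ${(err as Error).message}\n`,
      );
    } catch {
      // The log could not be written either; there is nothing more to do.
    }
    return "logged";
  }
}

// Usage: a scratch project without a .gitignore is skipped silently.
const scratch = fs.mkdtempSync(path.join(os.tmpdir(), "gstack-sketch-"));
console.log(appendIgnoreEntry(scratch, path.join(scratch, ".gstack", "browse-server.log"))); // prints "skipped"
```

Returning a status instead of `void` is a choice made for this sketch so the branches can be checked without reading the log file.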
@@ -1,5 +1,27 @@
 # Changelog

+## 0.3.8 — 2026-03-14
+
+### Added
+- **TODOS.md as single source of truth** — merged `TODO.md` (roadmap) and `TODOS.md` (near-term) into one file organized by skill/component with P0-P4 priority ordering and a Completed section.
+- **`/ship` Step 5.5: TODOS.md management** — auto-detects completed items from the diff, marks them done with version annotations, offers to create/reorganize TODOS.md if missing or unstructured.
+- **Cross-skill TODOS awareness** — `/plan-ceo-review`, `/plan-eng-review`, `/retro`, `/review`, and `/qa` now read TODOS.md for project context. `/retro` adds Backlog Health metric (open counts, P0/P1 items, churn).
+- **Shared `review/TODOS-format.md`** — canonical TODO item format referenced by `/ship` and `/plan-ceo-review` to prevent format drift (DRY).
+- **Greptile 2-tier reply system** — Tier 1 (friendly, inline diff + explanation) for first responses; Tier 2 (firm, full evidence chain + re-rank request) when Greptile re-flags after a prior reply.
+- **Greptile reply templates** — structured templates in `greptile-triage.md` for fixes (inline diff), already-fixed (what was done), and false positives (evidence + suggested re-rank). Replaces vague one-line replies.
+- **Greptile escalation detection** — explicit algorithm to detect prior GStack replies on comment threads and auto-escalate to Tier 2.
+- **Greptile severity re-ranking** — replies now include `**Suggested re-rank:**` when Greptile miscategorizes issue severity.
+- Static validation tests for `TODOS-format.md` references across skills.
+
+### Fixed
+- **`.gitignore` append failures silently swallowed** — `ensureStateDir()` bare `catch {}` replaced with ENOENT-only silence; non-ENOENT errors (EACCES, ENOSPC) logged to `.gstack/browse-server.log`.
+
+### Changed
+- `TODO.md` deleted — all items merged into `TODOS.md`.
+- `/ship` Step 3.75 and `/review` Step 5 now reference reply templates and escalation detection from `greptile-triage.md`.
+- `/ship` Step 6 commit ordering includes TODOS.md in the final commit alongside VERSION + CHANGELOG.
+- `/ship` Step 8 PR body includes TODOS section.
+
 ## 0.3.7 — 2026-03-14

 ### Added
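The changelog entries on the 2-tier reply system and escalation detection describe a rule that can be sketched as code. The actual algorithm lives in `greptile-triage.md`, which is not part of this diff, so the following is only one plausible reading: reply at Tier 2 when the bot has commented again after an earlier reply of ours on the same thread. The `ThreadComment` shape, the `pickReplyTier` name, and the way "our" replies are recognized are all assumptions.

```typescript
// Hypothetical shape of one comment on a review thread.
interface ThreadComment {
  author: string;
  body: string;
  createdAt: string; // ISO 8601 timestamp
}

type ReplyTier = 1 | 2;

// Returns 2 when the bot commented again after one of our earlier replies
// on the same thread, otherwise 1. `isOurReply` is supplied by the caller
// because this diff does not show how prior replies are identified.
export function pickReplyTier(
  thread: ThreadComment[],
  botLogin: string,
  isOurReply: (comment: ThreadComment) => boolean,
): ReplyTier {
  const ordered = [...thread].sort(
    (a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt),
  );
  let repliedBefore = false;
  for (const comment of ordered) {
    if (isOurReply(comment)) {
      repliedBefore = true;
    } else if (repliedBefore && comment.author === botLogin) {
      return 2; // re-flagged after a prior reply: escalate
    }
  }
  return 1;
}
```

A caller would pass the thread's comments, the bot's login, and a predicate such as `(c) => c.author === "my-login"`; both values are placeholders here.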
+6
-4
@@ -79,13 +79,14 @@ Bun auto-loads `.env` — no extra config. Conductor workspaces inherit `.env` f

 | Tier | Command | Cost | What it tests |
 |------|---------|------|---------------|
-| 1 — Static | `bun test` | Free | Command validation, snapshot flags, SKILL.md correctness, observability unit tests |
+| 1 — Static | `bun test` | Free | Command validation, snapshot flags, SKILL.md correctness, TODOS-format.md refs, observability unit tests |
 | 2 — E2E | `bun run test:e2e` | ~$3.85 | Full skill execution via `claude -p` subprocess |
-| 3 — LLM eval | `bun run test:evals` | ~$4 | E2E + LLM-as-judge combined |
+| 3 — LLM eval | `bun run test:evals` | ~$0.15 standalone | LLM-as-judge scoring of generated SKILL.md docs |
+| 2+3 | `bun run test:evals` | ~$4 combined | E2E + LLM-as-judge (runs both) |

 ```bash
 bun test # Tier 1 only (runs on every commit, <5s)
-bun run test:e2e # Tier 2: E2E (needs EVALS=1, can't run inside Claude Code)
+bun run test:e2e # Tier 2: E2E only (needs EVALS=1, can't run inside Claude Code)
 bun run test:evals # Tier 2 + 3 combined (~$4/run)
 ```
@@ -197,6 +198,7 @@ When Conductor creates a new workspace, `bin/dev-setup` runs automatically. It d
 ## Things to know

 - **SKILL.md files are generated.** Edit the `.tmpl` template, not the `.md`. Run `bun run gen:skill-docs` to regenerate.
+- **TODOS.md is the unified backlog.** Organized by skill/component with P0-P4 priorities. `/ship` auto-detects completed items. All planning/review/retro skills read it for context.
 - **Browse source changes need a rebuild.** If you touch `browse/src/*.ts`, run `bun run build`.
 - **Dev mode shadows your global install.** Project-local skills take priority over `~/.claude/skills/gstack`. `bin/dev-teardown` restores the global one.
 - **Conductor workspaces are independent.** Each workspace is its own git worktree. `bin/dev-setup` runs automatically via `conductor.json`.
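The unified backlog described in the list above (sections per skill or component, items ranked P0-P4, a Completed section) lends itself to a small reader. The sketch below is hedged: the canonical item format is defined in `review/TODOS-format.md`, which this diff does not show, so the parser assumes `## Section` headings, `### Item` headings, and a `**Priority:** Pn` line per item. The names `parseTodos` and `backlogHealth` are hypothetical, and the summary covers only open counts and P0/P1 items, not churn.

```typescript
// Hypothetical in-memory form of one backlog item.
interface TodoItem {
  section: string;
  title: string;
  priority: string | null; // "P0".."P4", or null when no priority line was found
}

interface BacklogHealth {
  open: number;
  byPriority: Record<string, number>;
  urgent: string[]; // titles of open P0/P1 items
}

// Collects "### Title" items under "## Section" headings and records the
// priority from a "**Priority:** Pn" line when one follows the title.
export function parseTodos(markdown: string): TodoItem[] {
  const items: TodoItem[] = [];
  let section = "";
  let current: TodoItem | null = null;
  for (const line of markdown.split("\n")) {
    const h2 = /^## (.+)$/.exec(line);
    const h3 = /^### (.+)$/.exec(line);
    const prio = /^\*\*Priority:\*\*\s*(P[0-4])\b/.exec(line);
    if (h2) {
      section = h2[1].trim();
      current = null;
    } else if (h3) {
      current = { section, title: h3[1].trim(), priority: null };
      items.push(current);
    } else if (prio && current) {
      current.priority = prio[1];
    }
  }
  return items;
}

// Summarizes everything outside the "Completed" section.
export function backlogHealth(items: TodoItem[]): BacklogHealth {
  const open = items.filter((item) => item.section !== "Completed");
  const byPriority: Record<string, number> = {};
  for (const item of open) {
    const key = item.priority ?? "unranked";
    byPriority[key] = (byPriority[key] ?? 0) + 1;
  }
  const urgent = open
    .filter((item) => item.priority === "P0" || item.priority === "P1")
    .map((item) => item.title);
  return { open: open.length, byPriority, urgent };
}
```

Calling `backlogHealth(parseTodos(text))` on the file's contents yields the counts; items without a priority line are grouped under `unranked` rather than dropped, which is a design choice of this sketch.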
@@ -275,4 +277,4 @@ When you're happy with your skill edits:
 /ship
 ```

-This runs tests, reviews the diff, bumps the version, and opens a PR. See `ship/SKILL.md` for the full workflow.
+This runs tests, reviews the diff, triages Greptile comments (with 2-tier escalation), manages TODOS.md, bumps the version, and opens a PR. See `ship/SKILL.md` for the full workflow.
@@ -1,120 +0,0 @@
-# TODO — gstack roadmap
-
-## Phase 1: Foundations (v0.2.0)
-- [x] Rename to gstack
-- [x] Restructure to monorepo layout
-- [x] Setup script for skill symlinks
-- [x] Snapshot command with ref-based element selection
-- [x] Snapshot tests
-
-## Phase 2: Enhanced Browser (v0.2.0) ✅
-- [x] Annotated screenshots (--annotate flag, ref labels overlaid on screenshot)
-- [x] Snapshot diffing (--diff flag, unified diff against previous snapshot)
-- [x] Dialog handling (auto-accept/dismiss, dialog buffer, prevents browser lockup)
-- [x] File upload (upload <sel> <files>)
-- [x] Cursor-interactive elements (-C flag, cursor:pointer/onclick/tabindex scan)
-- [x] Element state checks (is visible/hidden/enabled/disabled/checked/editable/focused)
-- [x] CircularBuffer — O(1) ring buffer for console/network/dialog (was O(n) array+shift)
-- [x] Async buffer flush with Bun.write() (was appendFileSync)
-- [x] Health check with page.evaluate('1') + 2s timeout
-- [x] Playwright error wrapping — actionable messages for AI agents
-- [x] Fix useragent — context recreation preserves cookies/storage/URLs
-- [x] DRY: getCleanText exported, command sets in chain updated
-- [x] 148 integration tests (was ~63)
-
-## Phase 3: QA Testing Agent (v0.3.0)
-- [x] `/qa` SKILL.md — 6-phase workflow: Initialize → Authenticate → Orient → Explore → Document → Wrap up
-- [x] Issue taxonomy reference (7 categories: visual, functional, UX, content, performance, console, accessibility)
-- [x] Severity classification (critical/high/medium/low)
-- [x] Exploration checklist per page
-- [x] Report template (structured markdown with per-issue evidence)
-- [x] Repro-first philosophy: every issue gets evidence before moving on
-- [x] Two evidence tiers: interactive bugs (multi-step screenshots), static bugs (single annotated screenshot)
-- [x] Key guidance: 5-10 well-documented issues per session, depth over breadth, write incrementally
-- [x] Three modes: full (systematic), quick (30-second smoke test), regression (compare against baseline)
-- [x] Framework detection guidance (Next.js, Rails, WordPress, SPA)
-- [x] Health score rubric (7 categories, weighted average)
-- [x] `wait --networkidle` / `wait --load` / `wait --domcontentloaded`
-- [x] `console --errors` (filter to error/warning only)
-- [x] `cookie-import <json-file>` (bulk cookie import with auto-fill domain)
-- [x] `browse/bin/find-browse` (DRY binary discovery across skills)
-- [ ] Video recording (deferred to Phase 5 — recreateContext destroys page state)
-
-## Phase 3.5: Browser Cookie Import (v0.3.x)
-- [x] `cookie-import-browser` command (Chromium cookie DB decryption)
-- [x] Cookie picker web UI (served from browse server)
-- [x] `/setup-browser-cookies` skill
-- [x] Unit tests with encrypted cookie fixtures (18 tests)
-- [x] Browser registry (Comet, Chrome, Arc, Brave, Edge)
-
-## Phase 3.6: Visual PR Annotations + S3 Upload
-- [ ] `/setup-gstack-upload` skill (configure S3 bucket for image hosting)
-- [ ] `browse/bin/gstack-upload` helper (upload file to S3, return public URL)
-- [ ] `/ship` Step 7.5: visual verification with screenshots in PR body
-- [ ] `/review` Step 4.5: visual review with annotated screenshots in PR
-- [ ] WebM → GIF conversion (ffmpeg) for video evidence in PRs
-- [ ] README documentation for visual PR annotations
-
-## Phase 4: Skill + Browser Integration
-- [ ] ship + browse: post-deploy verification
-  - Browse staging/preview URL after push
-  - Screenshot key pages
-  - Check console for JS errors
-  - Compare staging vs prod via snapshot diff
-  - Include verification screenshots in PR body
-  - STOP if critical errors found
-- [ ] review + browse: visual diff review
-  - Browse PR's preview deploy
-  - Annotated screenshots of changed pages
-  - Compare against production visually
-  - Check responsive layouts (mobile/tablet/desktop)
-  - Verify accessibility tree hasn't regressed
-- [ ] deploy-verify skill: lightweight post-deploy smoke test
-  - Hit key URLs, verify 200s
-  - Screenshot critical pages
-  - Console error check
-  - Compare against baseline snapshots
-  - Pass/fail with evidence
-
-## Phase 5: State & Sessions
-- [ ] Bundle server.ts into compiled binary (eliminate resolveServerScript() fallback chain entirely) (P2, M)
-- [ ] v20 encryption format support (AES-256-GCM) — future Chromium versions may change from v10
-- [ ] Sessions (isolated browser instances with separate cookies/storage/history)
-- [ ] State persistence (save/load cookies + localStorage to JSON files)
-- [ ] Auth vault (encrypted credential storage, referenced by name, LLM never sees passwords)
-- [ ] Video recording (record start/stop — needs sessions for clean context lifecycle)
-- [ ] retro + browse: deployment health tracking
-  - Screenshot production state
-  - Check perf metrics (page load times)
-  - Count console errors across key pages
-  - Track trends over retro window
-
-## Phase 6: Advanced Browser
-- [ ] Iframe support (frame <sel>, frame main)
-- [ ] Semantic locators (find role/label/text/placeholder/testid with actions)
-- [ ] Device emulation presets (set device "iPhone 16 Pro")
-- [ ] Network mocking/routing (intercept, block, mock requests)
-- [ ] Download handling (click-to-download with path control)
-- [ ] Content safety (--max-output truncation, --allowed-domains)
-- [ ] Streaming (WebSocket live preview for pair browsing)
-- [ ] CDP mode (connect to already-running Chrome/Electron apps)
-
-## Future Ideas
-- [ ] Linux/Windows cookie decryption (GNOME Keyring / kwallet / DPAPI)
-- [ ] Trend tracking across QA runs — compare baseline.json over time, detect regressions (P2, S)
-- [ ] CI/CD integration — `/qa` as GitHub Action step, fail PR if health score drops (P2, M)
-- [ ] Accessibility audit mode — `--a11y` flag for focused accessibility testing (P3, S)
-- [ ] Greptile training feedback loop — export suppression patterns to Greptile team for model improvement (P3, S)
-- [x] E2E test cost tracking — track cumulative API spend, warn if over threshold (P3, S)
-- [ ] E2E model pinning — pin E2E tests to claude-sonnet-4-6 for cost efficiency, add retry:2 for flaky LLM (P2, XS)
-- [ ] Smart default QA tier — after a few runs, check index.md for user's usual tier pick, skip the question (P2, S)
-
-## Ideas & Notes
-- Browser is the nervous system — every skill should be able to see, interact with, and verify the web
-- Skills are the product; the browser enables them
-- One repo, one install, entire AI engineering workflow
-- Bun compiled binary matches Rust CLI performance for this use case (bottleneck is Chromium, not CLI parsing)
-- Accessibility tree snapshots use ~200-400 tokens vs ~3000-5000 for full DOM — critical for AI context efficiency
-- Locator map approach for refs: store Map<string, Locator> on BrowserManager, no DOM mutation, no CSP issues
-- Snapshot scoping (-i, -c, -d, -s flags) is critical for performance on large pages
-- All new commands follow existing pattern: add to command set, add switch case, return string
@@ -1,36 +1,397 @@
 # TODOS

-## Auto-upgrade mode (zero-prompt)
+## Browse

-**What:** Add a `GSTACK_AUTO_UPGRADE=1` env var or `~/.gstack/config` option that skips the AskUserQuestion prompt and upgrades automatically when a new version is detected.
+### Bundle server.ts into compiled binary

-**Why:** Power users and CI environments may want zero-friction upgrades without being asked every time.
+**What:** Eliminate `resolveServerScript()` fallback chain entirely — bundle server.ts into the compiled browse binary.

-**Context:** The current upgrade system (v0.3.4) always prompts via AskUserQuestion. This TODO adds an opt-in bypass. Implementation is ~10 lines in the preamble instructions: check for the env var/config before calling AskUserQuestion, and if set, go straight to the upgrade flow. Depends on the full upgrade system being stable first — wait for user feedback on the prompt-based flow before adding this.
+**Why:** The current fallback chain (check adjacent to cli.ts, check global install) is fragile and caused bugs in v0.3.2. A single compiled binary is simpler and more reliable.

-**Effort:** S (small)
-**Priority:** P3 (nice-to-have, revisit after adoption data)
+**Context:** Bun's `--compile` flag can bundle multiple entry points. The server is currently resolved at runtime via file path lookup. Bundling it removes the resolution step entirely.

-## GitHub Actions eval upload
+**Effort:** M
+**Priority:** P2
+**Depends on:** None

+### Sessions (isolated browser instances)
+
+**What:** Isolated browser instances with separate cookies/storage/history, addressable by name.
+
+**Why:** Enables parallel testing of different user roles, A/B test verification, and clean auth state management.
+
+**Context:** Requires Playwright browser context isolation. Each session gets its own context with independent cookies/localStorage. Prerequisite for video recording (clean context lifecycle) and auth vault.
+
+**Effort:** L
+**Priority:** P3
+
+### Video recording
+
+**What:** Record browser interactions as video (start/stop controls).
+
+**Why:** Video evidence in QA reports and PR bodies. Currently deferred because `recreateContext()` destroys page state.
+
+**Context:** Needs sessions for clean context lifecycle. Playwright supports video recording per context. Also needs WebM → GIF conversion for PR embedding.
+
+**Effort:** M
+**Priority:** P3
+**Depends on:** Sessions
+
+### v20 encryption format support
+
+**What:** AES-256-GCM support for future Chromium cookie DB versions (currently v10).
+
+**Why:** Future Chromium versions may change encryption format. Proactive support prevents breakage.
+
+**Effort:** S
+**Priority:** P3
+
+### State persistence
+
+**What:** Save/load cookies + localStorage to JSON files for reproducible test sessions.
+
+**Why:** Enables "resume where I left off" for QA sessions and repeatable auth states.
+
+**Effort:** M
+**Priority:** P3
+**Depends on:** Sessions
+
+### Auth vault
+
+**What:** Encrypted credential storage, referenced by name. LLM never sees passwords.
+
+**Why:** Security — currently auth credentials flow through the LLM context. Vault keeps secrets out of the AI's view.
+
+**Effort:** L
+**Priority:** P3
+**Depends on:** Sessions, state persistence
+
+### Iframe support
+
+**What:** `frame <sel>` and `frame main` commands for cross-frame interaction.
+
+**Why:** Many web apps use iframes (embeds, payment forms, ads). Currently invisible to browse.
+
+**Effort:** M
+**Priority:** P4
+
+### Semantic locators
+
+**What:** `find role/label/text/placeholder/testid` with attached actions.
+
+**Why:** More resilient element selection than CSS selectors or ref numbers.
+
+**Effort:** M
+**Priority:** P4
+
+### Device emulation presets
+
+**What:** `set device "iPhone 16 Pro"` for mobile/tablet testing.
+
+**Why:** Responsive layout testing without manual viewport resizing.
+
+**Effort:** S
+**Priority:** P4
+
+### Network mocking/routing
+
+**What:** Intercept, block, and mock network requests.
+
+**Why:** Test error states, loading states, and offline behavior.
+
+**Effort:** M
+**Priority:** P4
+
+### Download handling
+
+**What:** Click-to-download with path control.
+
+**Why:** Test file download flows end-to-end.
+
+**Effort:** S
+**Priority:** P4
+
+### Content safety
+
+**What:** `--max-output` truncation, `--allowed-domains` filtering.
+
+**Why:** Prevent context window overflow and restrict navigation to safe domains.
+
+**Effort:** S
+**Priority:** P4
+
+### Streaming (WebSocket live preview)
+
+**What:** WebSocket-based live preview for pair browsing sessions.
+
+**Why:** Enables real-time collaboration — human watches AI browse.
+
+**Effort:** L
+**Priority:** P4
+
+### CDP mode
+
+**What:** Connect to already-running Chrome/Electron apps via Chrome DevTools Protocol.
+
+**Why:** Test production apps, Electron apps, and existing browser sessions without launching new instances.
+
+**Effort:** M
+**Priority:** P4
+
+### Linux/Windows cookie decryption
+
+**What:** GNOME Keyring / kwallet / DPAPI support for non-macOS cookie import.
+
+**Why:** Cross-platform cookie import. Currently macOS-only (Keychain).
+
+**Effort:** L
+**Priority:** P4
+
+## Ship
+
+### Ship log — persistent record of /ship runs
+
+**What:** Append structured JSON entry to `.gstack/ship-log.json` at end of every /ship run (version, date, branch, PR URL, review findings, Greptile stats, todos completed, test results).
+
+**Why:** /retro has no structured data about shipping velocity. Ship log enables: PRs-per-week trending, review finding rates, Greptile signal over time, test suite growth.
+
+**Context:** /retro already reads greptile-history.md — same pattern. Eval persistence (eval-store.ts) shows the JSON append pattern exists in the codebase. ~15 lines in ship template.
+
+**Effort:** S
+**Priority:** P2
+**Depends on:** None
+
+### Post-deploy verification (ship + browse)
+
+**What:** After push, browse staging/preview URL, screenshot key pages, check console for JS errors, compare staging vs prod via snapshot diff. Include verification screenshots in PR body. STOP if critical errors found.
+
+**Why:** Catch deployment-time regressions (JS errors, broken layouts) before merge.
+
+**Context:** Requires S3 upload infrastructure for PR screenshots. Pairs with visual PR annotations.
+
+**Effort:** L
+**Priority:** P2
+**Depends on:** /setup-gstack-upload, visual PR annotations
+
+### Visual verification with screenshots in PR body
+
+**What:** /ship Step 7.5: screenshot key pages after push, embed in PR body.
+
+**Why:** Visual evidence in PRs. Reviewers see what changed without deploying locally.
+
+**Context:** Part of Phase 3.6. Needs S3 upload for image hosting.
+
+**Effort:** M
+**Priority:** P2
+**Depends on:** /setup-gstack-upload
+
+## Review
+
+### Inline PR annotations
+
+**What:** /ship and /review post inline review comments at specific file:line locations using `gh api` to create pull request review comments.
+
+**Why:** Line-level annotations are more actionable than top-level comments. The PR thread becomes a line-by-line conversation between Greptile, Claude, and human reviewers.
+
+**Context:** GitHub supports inline review comments via `gh api repos/$REPO/pulls/$PR/reviews`. Pairs naturally with Phase 3.6 visual annotations.
+
+**Effort:** S
+**Priority:** P2
+**Depends on:** None
+
+### Greptile training feedback export
+
+**What:** Aggregate greptile-history.md into machine-readable JSON summary of false positive patterns, exportable to the Greptile team for model improvement.
+
+**Why:** Closes the feedback loop — Greptile can use FP data to stop making the same mistakes on your codebase.
+
+**Context:** Was a P3 Future Idea. Upgraded to P2 now that greptile-history.md data infrastructure exists. The signal data is already being collected; this just makes it exportable. ~40 lines.
+
+**Effort:** S
+**Priority:** P2
+**Depends on:** Enough FP data accumulated (10+ entries)
+
+### Visual review with annotated screenshots
+
+**What:** /review Step 4.5: browse PR's preview deploy, annotated screenshots of changed pages, compare against production, check responsive layouts, verify accessibility tree.
+
+**Why:** Visual diff catches layout regressions that code review misses.
+
+**Context:** Part of Phase 3.6. Needs S3 upload for image hosting.
+
+**Effort:** M
+**Priority:** P2
+**Depends on:** /setup-gstack-upload
+
+## QA
+
+### QA trend tracking
+
+**What:** Compare baseline.json over time, detect regressions across QA runs.
+
+**Why:** Spot quality trends — is the app getting better or worse?
+
+**Context:** QA already writes structured reports. This adds cross-run comparison.
+
+**Effort:** S
+**Priority:** P2
+
+### CI/CD QA integration
+
+**What:** `/qa` as GitHub Action step, fail PR if health score drops.
+
+**Why:** Automated quality gate in CI. Catch regressions before merge.
+
+**Effort:** M
+**Priority:** P2
+
+### Smart default QA tier
+
+**What:** After a few runs, check index.md for user's usual tier pick, skip the AskUserQuestion.
+
+**Why:** Reduces friction for repeat users.
+
+**Effort:** S
+**Priority:** P2
+
+### Accessibility audit mode
+
+**What:** `--a11y` flag for focused accessibility testing.
+
+**Why:** Dedicated accessibility testing beyond the general QA checklist.
+
+**Effort:** S
+**Priority:** P3
+
+## Retro
+
+### Deployment health tracking (retro + browse)
+
+**What:** Screenshot production state, check perf metrics (page load times), count console errors across key pages, track trends over retro window.
+
+**Why:** Retro should include production health alongside code metrics.
+
+**Context:** Requires browse integration. Screenshots + metrics fed into retro output.
+
+**Effort:** L
+**Priority:** P3
+**Depends on:** Browse sessions
+
+## Infrastructure
+
+### /setup-gstack-upload skill (S3 bucket)
+
+**What:** Configure S3 bucket for image hosting. One-time setup for visual PR annotations.
+
+**Why:** Prerequisite for visual PR annotations in /ship and /review.
+
+**Effort:** M
+**Priority:** P2
+
+### gstack-upload helper
+
+**What:** `browse/bin/gstack-upload` — upload file to S3, return public URL.
+
+**Why:** Shared utility for all skills that need to embed images in PRs.
+
+**Effort:** S
+**Priority:** P2
+**Depends on:** /setup-gstack-upload
+
+### WebM to GIF conversion
+
+**What:** ffmpeg-based WebM → GIF conversion for video evidence in PRs.
+
+**Why:** GitHub PR bodies render GIFs but not WebM. Needed for video recording evidence.
+
+**Effort:** S
+**Priority:** P3
+**Depends on:** Video recording
+
+### Deploy-verify skill
+
+**What:** Lightweight post-deploy smoke test: hit key URLs, verify 200s, screenshot critical pages, console error check, compare against baseline snapshots. Pass/fail with evidence.
+
+**Why:** Fast post-deploy confidence check, separate from full QA.
+
+**Effort:** M
+**Priority:** P2
+
+### GitHub Actions eval upload
+
 **What:** Run eval suite in CI, upload result JSON as artifact, post summary comment on PR.

-**Why:** Currently evals only run locally. CI integration would catch quality regressions before merge and provide a persistent record of eval results per PR.
+**Why:** CI integration catches quality regressions before merge and provides persistent eval records per PR.

-**Context:** Requires `ANTHROPIC_API_KEY` in CI secrets. Cost is ~$4/run. The eval persistence system (v0.3.6) writes JSON to `~/.gstack-dev/evals/` — CI would upload these as GitHub Actions artifacts and use `eval:compare` to post a delta comment on the PR.
+**Context:** Requires `ANTHROPIC_API_KEY` in CI secrets. Cost is ~$4/run. Eval persistence system (v0.3.6) writes JSON to `~/.gstack-dev/evals/` — CI would upload as GitHub Actions artifacts and use `eval:compare` to post delta comment.

-**Depends on:** Eval persistence shipping (v0.3.6).
-**Effort:** M (medium)
+**Effort:** M
 **Priority:** P2
+**Depends on:** Eval persistence (shipped in v0.3.6)

+### E2E model pinning
+
+**What:** Pin E2E tests to claude-sonnet-4-6 for cost efficiency, add retry:2 for flaky LLM responses.
+
+**Why:** Reduce E2E test cost and flakiness.
+
+**Effort:** XS
+**Priority:** P2
+
-## Eval web dashboard
+### Auto-upgrade mode (zero-prompt)

-**What:** `bun run eval:dashboard` serves local HTML with charts: cost trending, detection rate over time, pass/fail history.
+**What:** `GSTACK_AUTO_UPGRADE=1` env var or `~/.gstack/config` option that skips the AskUserQuestion prompt and upgrades automatically.

-**Why:** The CLI tools (`eval:list`, `eval:compare`, `eval:summary`) are good for quick checks but visual charts are better for spotting trends over many runs.
+**Why:** Power users and CI environments want zero-friction upgrades.

-**Context:** Reads the same `~/.gstack-dev/evals/*.json` files. ~200 lines HTML + chart.js code served via a simple Bun HTTP server. No external dependencies beyond what's already installed.
+**Context:** Current upgrade system (v0.3.4) always prompts. This adds opt-in bypass. ~10 lines in preamble instructions.

-**Depends on:** Eval persistence + eval:list shipping (v0.3.6).
-**Effort:** M (medium)
-**Priority:** P3 (nice-to-have, revisit after eval system sees regular use)
+**Effort:** S
+**Priority:** P3
+
+### Eval web dashboard
+
+**What:** `bun run eval:dashboard` serves local HTML with charts: cost trending, detection rate, pass/fail history.
+
+**Why:** Visual charts better for spotting trends than CLI tools.
+
+**Context:** Reads `~/.gstack-dev/evals/*.json`. ~200 lines HTML + chart.js via Bun HTTP server.
+
+**Effort:** M
+**Priority:** P3
+**Depends on:** Eval persistence (shipped in v0.3.6)
+
+## Completed
+
+### Phase 1: Foundations (v0.2.0)
+- Rename to gstack
+- Restructure to monorepo layout
+- Setup script for skill symlinks
+- Snapshot command with ref-based element selection
+- Snapshot tests
+**Completed:** v0.2.0
+
+### Phase 2: Enhanced Browser (v0.2.0)
+- Annotated screenshots, snapshot diffing, dialog handling, file upload
+- Cursor-interactive elements, element state checks
+- CircularBuffer, async buffer flush, health check
+- Playwright error wrapping, useragent fix
+- 148 integration tests
+**Completed:** v0.2.0
+
+### Phase 3: QA Testing Agent (v0.3.0)
+- /qa SKILL.md with 6-phase workflow, 3 modes (full/quick/regression)
+- Issue taxonomy, severity classification, exploration checklist
+- Report template, health score rubric, framework detection
+- wait/console/cookie-import commands, find-browse binary
+**Completed:** v0.3.0
+
+### Phase 3.5: Browser Cookie Import (v0.3.x)
+- cookie-import-browser command (Chromium cookie DB decryption)
+- Cookie picker web UI, /setup-browser-cookies skill
+- 18 unit tests, browser registry (Comet, Chrome, Arc, Brave, Edge)
+**Completed:** v0.3.1
+
+### E2E test cost tracking
+- Track cumulative API spend, warn if over threshold
+**Completed:** v0.3.6
+11
-2
@@ -98,8 +98,17 @@ export function ensureStateDir(config: BrowseConfig): void {
       const separator = content.endsWith('\n') ? '' : '\n';
       fs.appendFileSync(gitignorePath, `${separator}.gstack/\n`);
     }
-  } catch {
-    // No .gitignore or unreadable — skip
+  } catch (err: any) {
+    if (err.code !== 'ENOENT') {
+      // Write warning to server log (visible even in daemon mode)
+      const logPath = path.join(config.stateDir, 'browse-server.log');
+      try {
+        fs.appendFileSync(logPath, `[${new Date().toISOString()}] Warning: could not update .gitignore at ${gitignorePath}: ${err.message}\n`);
+      } catch {
+        // stateDir write failed too — nothing more we can do
+      }
+    }
+    // ENOENT (no .gitignore) — skip silently
   }
 }
@@ -95,6 +95,27 @@ describe('config', () => {
       fs.rmSync(tmpDir, { recursive: true, force: true });
     });

+    test('logs warning to browse-server.log on non-ENOENT gitignore error', () => {
+      const tmpDir = path.join(os.tmpdir(), `browse-gitignore-test-${Date.now()}`);
+      fs.mkdirSync(tmpDir, { recursive: true });
+      // Create a read-only .gitignore (no .gstack/ entry → would try to append)
+      fs.writeFileSync(path.join(tmpDir, '.gitignore'), 'node_modules/\n');
+      fs.chmodSync(path.join(tmpDir, '.gitignore'), 0o444);
+      const config = resolveConfig({ BROWSE_STATE_FILE: path.join(tmpDir, '.gstack', 'browse.json') });
+      ensureStateDir(config); // should not throw
+      // Verify warning was written to server log
+      const logPath = path.join(config.stateDir, 'browse-server.log');
+      expect(fs.existsSync(logPath)).toBe(true);
+      const logContent = fs.readFileSync(logPath, 'utf-8');
+      expect(logContent).toContain('Warning: could not update .gitignore');
+      // .gitignore should remain unchanged
+      const gitignoreContent = fs.readFileSync(path.join(tmpDir, '.gitignore'), 'utf-8');
+      expect(gitignoreContent).toBe('node_modules/\n');
+      // Cleanup
+      fs.chmodSync(path.join(tmpDir, '.gitignore'), 0o644);
+      fs.rmSync(tmpDir, { recursive: true, force: true });
+    });
+
     test('skips if no .gitignore exists', () => {
       const tmpDir = path.join(os.tmpdir(), `browse-gitignore-test-${Date.now()}`);
       fs.mkdirSync(tmpDir, { recursive: true });
@@ -74,7 +74,13 @@ git stash list # Any stashed work
 grep -r "TODO\|FIXME\|HACK\|XXX" --include="*.rb" --include="*.js" -l
 find . -name "*.rb" -newer Gemfile.lock | head -20 # Recently touched files
 ```
-Then read CLAUDE.md, TODOS.md, and any existing architecture docs. Map:
+Then read CLAUDE.md, TODOS.md, and any existing architecture docs. When reading TODOS.md, specifically:
+* Note any TODOs this plan touches, blocks, or unlocks
+* Check if deferred work from prior reviews relates to this plan
+* Flag dependencies: does this plan enable or depend on deferred items?
+* Map known pain points (from TODOS) to this plan's scope
+
+Map:
 * What is the current system state?
 * What is already in flight (other open PRs, branches, stashed changes)?
 * What are the existing known pain points most relevant to this plan?
@@ -393,7 +399,7 @@ Complete table of every method that can fail, every exception class, rescued sta
 Any row with RESCUED=N, TEST=N, USER SEES=Silent → **CRITICAL GAP**.

 ### TODOS.md updates
-Present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
+Present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step. Follow the format in `.claude/skills/review/TODOS-format.md`.

 For each TODO, describe:
 * **What:** One-line description of the work.

@@ -65,7 +65,13 @@ git stash list # Any stashed work
 grep -r "TODO\|FIXME\|HACK\|XXX" --include="*.rb" --include="*.js" -l
 find . -name "*.rb" -newer Gemfile.lock | head -20 # Recently touched files
 ```
-Then read CLAUDE.md, TODOS.md, and any existing architecture docs. Map:
+Then read CLAUDE.md, TODOS.md, and any existing architecture docs. When reading TODOS.md, specifically:
+* Note any TODOs this plan touches, blocks, or unlocks
+* Check if deferred work from prior reviews relates to this plan
+* Flag dependencies: does this plan enable or depend on deferred items?
+* Map known pain points (from TODOS) to this plan's scope
+
+Map:
 * What is the current system state?
 * What is already in flight (other open PRs, branches, stashed changes)?
 * What are the existing known pain points most relevant to this plan?
@@ -384,7 +390,7 @@ Complete table of every method that can fail, every exception class, rescued sta
 Any row with RESCUED=N, TEST=N, USER SEES=Silent → **CRITICAL GAP**.

 ### TODOS.md updates
-Present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
+Present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step. Follow the format in `.claude/skills/review/TODOS-format.md`.

 For each TODO, describe:
 * **What:** One-line description of the work.

@@ -51,6 +51,7 @@ Before reviewing anything, answer these questions:
|
||||
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
|
||||
2. **What is the minimum set of changes that achieves the stated goal?** Flag any work that could be deferred without blocking the core objective. Be ruthless about scope creep.
|
||||
3. **Complexity check:** If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts.
|
||||
4. **TODOS cross-reference:** Read `TODOS.md` if it exists. Are any deferred items blocking this plan? Can any deferred items be bundled into this PR without expanding scope? Does this plan create new work that should be captured as a TODO?
|
||||
|
||||
Then ask if I want one of three options:
|
||||
1. **SCOPE REDUCTION:** The plan is overbuilt. Propose a minimal version that achieves the core goal, then review that.
|
||||
@@ -123,7 +124,7 @@ Every plan review MUST produce a "NOT in scope" section listing work that was co
|
||||
List existing code/flows that already partially solve sub-problems in this plan, and whether the plan reuses them or unnecessarily rebuilds them.
|
||||
|
||||
### TODOS.md updates
|
||||
After all review sections are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
|
||||
After all review sections are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step. Follow the format in `.claude/skills/review/TODOS-format.md`.
|
||||
|
||||
For each TODO, describe:
|
||||
* **What:** One-line description of the work.
|
||||
|
||||
@@ -42,6 +42,7 @@ Before reviewing anything, answer these questions:
|
||||
1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
|
||||
2. **What is the minimum set of changes that achieves the stated goal?** Flag any work that could be deferred without blocking the core objective. Be ruthless about scope creep.
|
||||
3. **Complexity check:** If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts.
|
||||
4. **TODOS cross-reference:** Read `TODOS.md` if it exists. Are any deferred items blocking this plan? Can any deferred items be bundled into this PR without expanding scope? Does this plan create new work that should be captured as a TODO?
|
||||
|
||||
Then ask if I want one of three options:
|
||||
1. **SCOPE REDUCTION:** The plan is overbuilt. Propose a minimal version that achieves the core goal, then review that.
|
||||
@@ -114,7 +115,7 @@ Every plan review MUST produce a "NOT in scope" section listing work that was co
|
||||
List existing code/flows that already partially solve sub-problems in this plan, and whether the plan reuses them or unnecessarily rebuilds them.
|
||||
|
||||
### TODOS.md updates
|
||||
After all review sections are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
|
||||
After all review sections are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step. Follow the format in `.claude/skills/review/TODOS-format.md`.
|
||||
|
||||
For each TODO, describe:
|
||||
* **What:** One-line description of the work.
|
||||
|
||||
+3
-1
@@ -110,7 +110,9 @@ This is the **primary mode** for developers verifying their work. When the user

 5. **Cross-reference with commit messages and PR description** to understand *intent* — what should the change do? Verify it actually does that.

-6. **Report findings** scoped to the branch changes:
+6. **Check TODOS.md** (if it exists) for known bugs or issues related to the changed files. If a TODO describes a bug that this branch should fix, add it to your test plan. If you find a new bug during QA that isn't in TODOS.md, note it in the report.
+
+7. **Report findings** scoped to the branch changes:
    - "Changes tested: N pages/routes affected by this branch"
    - For each: does it work? Screenshot evidence.
    - Any regressions on adjacent pages?

+3
-1
@@ -84,7 +84,9 @@ This is the **primary mode** for developers verifying their work. When the user

 5. **Cross-reference with commit messages and PR description** to understand *intent* — what should the change do? Verify it actually does that.

-6. **Report findings** scoped to the branch changes:
+6. **Check TODOS.md** (if it exists) for known bugs or issues related to the changed files. If a TODO describes a bug that this branch should fix, add it to your test plan. If you find a new bug during QA that isn't in TODOS.md, note it in the report.
+
+7. **Report findings** scoped to the branch changes:
    - "Changes tested: N pages/routes affected by this branch"
    - For each: does it work? Screenshot evidence.
    - Any regressions on adjacent pages?

+29
-1
@@ -95,6 +95,9 @@ git shortlog origin/main --since="<window>" -sn --no-merges
|
||||
|
||||
# 8. Greptile triage history (if available)
|
||||
cat ~/.gstack/greptile-history.md 2>/dev/null || true
|
||||
|
||||
# 9. TODOS.md backlog (if available)
|
||||
cat TODOS.md 2>/dev/null || true
|
||||
```
|
||||
|
||||
### Step 2: Compute Metrics
|
||||
@@ -130,6 +133,20 @@ Sort by commits descending. The current user (from `git config user.name`) alway
|
||||
|
||||
**Greptile signal (if history exists):** Read `~/.gstack/greptile-history.md` (fetched in Step 1, command 8). Filter entries within the retro time window by date. Count entries by type: `fix`, `fp`, `already-fixed`. Compute signal ratio: `(fix + already-fixed) / (fix + already-fixed + fp)`. If no entries exist in the window or the file doesn't exist, skip the Greptile metric row. Skip unparseable lines silently.
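
As a rough illustration of the signal ratio described above, here is a minimal TypeScript sketch. The `GreptileCounts` shape and the `signalRatio` name are assumptions made for this example and are not part of gstack. The line format of the history file is not shown here, so the sketch starts from counts that have already been tallied for the retro window.

```typescript
// Hypothetical shape: per-type counts of history entries inside the retro window.
interface GreptileCounts {
  fix: number;
  alreadyFixed: number;
  fp: number;
}

// Returns (fix + already-fixed) / (fix + already-fixed + fp),
// or null when there are no entries, meaning the metric row should be skipped.
function signalRatio(counts: GreptileCounts): number | null {
  const useful = counts.fix + counts.alreadyFixed;
  const total = useful + counts.fp;
  return total === 0 ? null : useful / total;
}
```

Returning `null` instead of `0` keeps "no data" distinct from "every comment was a false positive", which matches the instruction to skip the row when the window is empty.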
|
||||
|
||||
**Backlog Health (if TODOS.md exists):** Read `TODOS.md` (fetched in Step 1, command 9). Compute:
|
||||
- Total open TODOs (exclude items in `## Completed` section)
|
||||
- P0/P1 count (critical/urgent items)
|
||||
- P2 count (important items)
|
||||
- Items completed this period (items in Completed section with dates within the retro window)
|
||||
- Items added this period (cross-reference git log for commits that modified TODOS.md within the window)
|
||||
|
||||
Include in the metrics table:
|
||||
```
|
||||
| Backlog Health | N open (X P0/P1, Y P2) · Z completed this period |
|
||||
```
|
||||
|
||||
If TODOS.md doesn't exist, skip the Backlog Health row.
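
The open-item and priority counts above could be computed along these lines. This is a sketch only: `backlogHealth` and `BacklogHealth` are hypothetical names, and it assumes TODOS.md follows the shared format reference (one `###` heading per item, a `**Priority:** Pn` line per item, and a `## Completed` section). Completed-this-period and added-this-period need dates and git history, so they are left out.

```typescript
interface BacklogHealth {
  totalOpen: number;
  p0p1: number;
  p2: number;
}

// Counts open items and their priorities, ignoring everything under "## Completed".
function backlogHealth(todosMarkdown: string): BacklogHealth {
  const health: BacklogHealth = { totalOpen: 0, p0p1: 0, p2: 0 };
  let inCompleted = false;

  for (const rawLine of todosMarkdown.split("\n")) {
    const line = rawLine.trim();

    // An H2 heading starts a new section; only "## Completed" is excluded.
    if (line.startsWith("## ")) {
      inCompleted = line === "## Completed";
      continue;
    }
    if (inCompleted) continue;

    // Each H3 heading is one TODO item.
    if (line.startsWith("### ")) {
      health.totalOpen += 1;
      continue;
    }

    const match = /^\*\*Priority:\*\*\s*P([0-4])\b/.exec(line);
    if (match) {
      const level = Number(match[1]);
      if (level <= 1) health.p0p1 += 1;
      else if (level === 2) health.p2 += 1;
    }
  }
  return health;
}
```

The result maps directly onto the `N open (X P0/P1, Y P2)` part of the metrics row.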
|
||||
|
||||
### Step 3: Commit Time Distribution
|
||||
|
||||
Show hourly histogram in Pacific time using bar chart:
|
||||
@@ -325,7 +342,18 @@ Use the Write tool to save the JSON file with this schema:
|
||||
}
|
||||
```
|
||||
|
||||
**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. If no history data is available, omit the field entirely.
|
||||
**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. If either has no data, omit the field entirely.
|
||||
|
||||
Include backlog data in the JSON when TODOS.md exists:
|
||||
```json
|
||||
"backlog": {
|
||||
"total_open": 28,
|
||||
"p0_p1": 2,
|
||||
"p2": 8,
|
||||
"completed_this_period": 3,
|
||||
"added_this_period": 1
|
||||
}
|
||||
```
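
One way to honour the "omit the field entirely" rule is to attach `backlog` only when data exists. The sketch below assumes a simplified snapshot type: `RetroSnapshot`, its `date` field, and `buildSnapshot` are placeholders for illustration, since the full JSON schema is not reproduced in this hunk.

```typescript
interface BacklogSnapshot {
  total_open: number;
  p0_p1: number;
  p2: number;
  completed_this_period: number;
  added_this_period: number;
}

// Placeholder for the real snapshot schema; only the optional field matters here.
interface RetroSnapshot {
  date: string;
  backlog?: BacklogSnapshot;
}

// Pass null when TODOS.md does not exist so the key never appears in the JSON.
function buildSnapshot(date: string, backlog: BacklogSnapshot | null): RetroSnapshot {
  const snapshot: RetroSnapshot = { date };
  if (backlog !== null) {
    snapshot.backlog = backlog;
  }
  return snapshot;
}
```

Leaving the key unset, rather than setting it to `null`, means `JSON.stringify` emits no `backlog` entry at all.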
|
||||
|
||||
### Step 14: Write the Narrative
|
||||
|
||||
|
||||
+29
-1
@@ -86,6 +86,9 @@ git shortlog origin/main --since="<window>" -sn --no-merges
|
||||
|
||||
# 8. Greptile triage history (if available)
|
||||
cat ~/.gstack/greptile-history.md 2>/dev/null || true
|
||||
|
||||
# 9. TODOS.md backlog (if available)
|
||||
cat TODOS.md 2>/dev/null || true
|
||||
```
|
||||
|
||||
### Step 2: Compute Metrics
|
||||
@@ -121,6 +124,20 @@ Sort by commits descending. The current user (from `git config user.name`) alway
|
||||
|
||||
**Greptile signal (if history exists):** Read `~/.gstack/greptile-history.md` (fetched in Step 1, command 8). Filter entries within the retro time window by date. Count entries by type: `fix`, `fp`, `already-fixed`. Compute signal ratio: `(fix + already-fixed) / (fix + already-fixed + fp)`. If no entries exist in the window or the file doesn't exist, skip the Greptile metric row. Skip unparseable lines silently.
|
||||
|
||||
**Backlog Health (if TODOS.md exists):** Read `TODOS.md` (fetched in Step 1, command 9). Compute:
|
||||
- Total open TODOs (exclude items in `## Completed` section)
|
||||
- P0/P1 count (critical/urgent items)
|
||||
- P2 count (important items)
|
||||
- Items completed this period (items in Completed section with dates within the retro window)
|
||||
- Items added this period (cross-reference git log for commits that modified TODOS.md within the window)
|
||||
|
||||
Include in the metrics table:
|
||||
```
|
||||
| Backlog Health | N open (X P0/P1, Y P2) · Z completed this period |
|
||||
```
|
||||
|
||||
If TODOS.md doesn't exist, skip the Backlog Health row.
|
||||
|
||||
### Step 3: Commit Time Distribution
|
||||
|
||||
Show hourly histogram in Pacific time using bar chart:
|
||||
@@ -316,7 +333,18 @@ Use the Write tool to save the JSON file with this schema:
|
||||
}
|
||||
```
|
||||
|
||||
**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. If no history data is available, omit the field entirely.
|
||||
**Note:** Only include the `greptile` field if `~/.gstack/greptile-history.md` exists and has entries within the time window. Only include the `backlog` field if `TODOS.md` exists. If either has no data, omit the field entirely.
|
||||
|
||||
Include backlog data in the JSON when TODOS.md exists:
|
||||
```json
|
||||
"backlog": {
|
||||
"total_open": 28,
|
||||
"p0_p1": 2,
|
||||
"p2": 8,
|
||||
"completed_this_period": 3,
|
||||
"added_this_period": 1
|
||||
}
|
||||
```
|
||||
|
||||
### Step 14: Write the Narrative
|
||||
|
||||
|
||||
+21
-6
@@ -49,7 +49,7 @@ Read `.claude/skills/review/checklist.md`.
|
||||
|
||||
## Step 2.5: Check for Greptile review comments
|
||||
|
||||
Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, and classify steps.
|
||||
Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps.
|
||||
|
||||
**If no PR exists, `gh` fails, API returns an error, or there are zero Greptile comments:** Skip this step silently. Greptile integration is additive — the review works without it.
|
||||
|
||||
@@ -95,7 +95,9 @@ After outputting your own findings, if Greptile comments were classified in Step
|
||||
|
||||
**Include a Greptile summary in your output header:** `+ N Greptile comments (X valid, Y fixed, Z FP)`
|
||||
|
||||
1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses C (false positive), post a reply using the appropriate API from the triage doc and save the pattern to both per-project and global greptile-history (see greptile-triage.md for write details).
|
||||
Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
|
||||
|
||||
1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
|
||||
|
||||
2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
|
||||
- Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
|
||||
@@ -105,19 +107,32 @@ After outputting your own findings, if Greptile comments were classified in Step
|
||||
- B) Fix it anyway (if low-effort and harmless)
|
||||
- C) Ignore — don't reply, don't fix
|
||||
|
||||
If the user chooses A, post a reply using the appropriate API from the triage doc and save the pattern to both per-project and global greptile-history (see greptile-triage.md for write details).
|
||||
If the user chooses A, reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history.
|
||||
|
||||
3. **VALID BUT ALREADY FIXED comments:** Reply acknowledging the catch — no AskUserQuestion needed:
|
||||
- Post reply: `"Good catch — already fixed in <commit-sha>."`
|
||||
- Save to both per-project and global greptile-history (see greptile-triage.md for write details)
|
||||
3. **VALID BUT ALREADY FIXED comments:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
|
||||
- Include what was done and the fixing commit SHA
|
||||
- Save to both per-project and global greptile-history
|
||||
|
||||
4. **SUPPRESSED comments:** Skip silently — these are known false positives from previous triage.
|
||||
|
||||
---
|
||||
|
||||
## Step 5.5: TODOS cross-reference
|
||||
|
||||
Read `TODOS.md` in the repository root (if it exists). Cross-reference the PR against open TODOs:
|
||||
|
||||
- **Does this PR close any open TODOs?** If yes, note which items in your output: "This PR addresses TODO: <title>"
|
||||
- **Does this PR create work that should become a TODO?** If yes, flag it as an informational finding.
|
||||
- **Are there related TODOs that provide context for this review?** If yes, reference them when discussing related findings.
|
||||
|
||||
If TODOS.md doesn't exist, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Important Rules
|
||||
|
||||
- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
|
||||
- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
|
||||
- **Be terse.** One line problem, one line fix. No preamble.
|
||||
- **Only flag real problems.** Skip anything that's fine.
|
||||
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
|
||||
|
||||
+21
-6
@@ -40,7 +40,7 @@ Read `.claude/skills/review/checklist.md`.
|
||||
|
||||
## Step 2.5: Check for Greptile review comments
|
||||
|
||||
Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, and classify steps.
|
||||
Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps.
|
||||
|
||||
**If no PR exists, `gh` fails, API returns an error, or there are zero Greptile comments:** Skip this step silently. Greptile integration is additive — the review works without it.
|
||||
|
||||
@@ -86,7 +86,9 @@ After outputting your own findings, if Greptile comments were classified in Step
|
||||
|
||||
**Include a Greptile summary in your output header:** `+ N Greptile comments (X valid, Y fixed, Z FP)`
|
||||
|
||||
1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses C (false positive), post a reply using the appropriate API from the triage doc and save the pattern to both per-project and global greptile-history (see greptile-triage.md for write details).
|
||||
Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.
|
||||
|
||||
1. **VALID & ACTIONABLE comments:** These are already included in your CRITICAL findings — they follow the same AskUserQuestion flow (A: Fix it now, B: Acknowledge, C: False positive). If the user chooses A (fix), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation). If the user chooses C (false positive), reply using the **False Positive reply template** (include evidence + suggested re-rank), save to both per-project and global greptile-history.
|
||||
|
||||
2. **FALSE POSITIVE comments:** Present each one via AskUserQuestion:
|
||||
- Show the Greptile comment: file:line (or [top-level]) + body summary + permalink URL
|
||||
@@ -96,19 +98,32 @@ After outputting your own findings, if Greptile comments were classified in Step
|
||||
- B) Fix it anyway (if low-effort and harmless)
|
||||
- C) Ignore — don't reply, don't fix
|
||||
|
||||
If the user chooses A, post a reply using the appropriate API from the triage doc and save the pattern to both per-project and global greptile-history (see greptile-triage.md for write details).
|
||||
If the user chooses A, reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history.
|
||||
|
||||
3. **VALID BUT ALREADY FIXED comments:** Reply acknowledging the catch — no AskUserQuestion needed:
|
||||
- Post reply: `"Good catch — already fixed in <commit-sha>."`
|
||||
- Save to both per-project and global greptile-history (see greptile-triage.md for write details)
|
||||
3. **VALID BUT ALREADY FIXED comments:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
|
||||
- Include what was done and the fixing commit SHA
|
||||
- Save to both per-project and global greptile-history
|
||||
|
||||
4. **SUPPRESSED comments:** Skip silently — these are known false positives from previous triage.
|
||||
|
||||
---
|
||||
|
||||
## Step 5.5: TODOS cross-reference
|
||||
|
||||
Read `TODOS.md` in the repository root (if it exists). Cross-reference the PR against open TODOs:
|
||||
|
||||
- **Does this PR close any open TODOs?** If yes, note which items in your output: "This PR addresses TODO: <title>"
|
||||
- **Does this PR create work that should become a TODO?** If yes, flag it as an informational finding.
|
||||
- **Are there related TODOs that provide context for this review?** If yes, reference them when discussing related findings.
|
||||
|
||||
If TODOS.md doesn't exist, skip this step silently.
|
||||
|
||||
---
|
||||
|
||||
## Important Rules
|
||||
|
||||
- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
|
||||
- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
|
||||
- **Be terse.** One line problem, one line fix. No preamble.
|
||||
- **Only flag real problems.** Skip anything that's fine.
|
||||
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence. Never post vague replies.
|
||||
|
||||
@@ -0,0 +1,62 @@
|
||||
# TODOS.md Format Reference
|
||||
|
||||
Shared reference for the canonical TODOS.md format. Referenced by `/ship` (Step 5.5) and by `/plan-ceo-review` and `/plan-eng-review` (TODOS.md updates sections) to ensure consistent TODO item structure.
|
||||
|
||||
---
|
||||
|
||||
## File Structure
|
||||
|
||||
```markdown
|
||||
# TODOS
|
||||
|
||||
## <Skill/Component> ← e.g., ## Browse, ## Ship, ## Review, ## Infrastructure
|
||||
<items sorted P0 first, then P1, P2, P3, P4>
|
||||
|
||||
## Completed
|
||||
<finished items with completion annotation>
|
||||
```
|
||||
|
||||
**Sections:** Organize by skill or component (`## Browse`, `## Ship`, `## Review`, `## QA`, `## Retro`, `## Infrastructure`). Within each section, sort items by priority (P0 at top).
|
||||
|
||||
---
|
||||
|
||||
## TODO Item Format
|
||||
|
||||
Each item is an H3 under its section:
|
||||
|
||||
```markdown
|
||||
### <Title>
|
||||
|
||||
**What:** One-line description of the work.
|
||||
|
||||
**Why:** The concrete problem it solves or value it unlocks.
|
||||
|
||||
**Context:** Enough detail that someone picking this up in 3 months understands the motivation, the current state, and where to start.
|
||||
|
||||
**Effort:** S / M / L / XL
|
||||
**Priority:** P0 / P1 / P2 / P3 / P4
|
||||
**Depends on:** <prerequisites, or "None">
|
||||
```
|
||||
|
||||
**Required fields:** What, Why, Context, Effort, Priority
|
||||
**Optional fields:** Depends on, Blocked by
|
||||
|
||||
---
|
||||
|
||||
## Priority Definitions
|
||||
|
||||
- **P0** — Blocking: must be done before next release
|
||||
- **P1** — Critical: should be done this cycle
|
||||
- **P2** — Important: do when P0/P1 are clear
|
||||
- **P3** — Nice-to-have: revisit after adoption/usage data
|
||||
- **P4** — Someday: good idea, no urgency
|
||||
|
||||
---
|
||||
|
||||
## Completed Item Format
|
||||
|
||||
When an item is completed, move it to the `## Completed` section preserving its original content and appending:
|
||||
|
||||
```markdown
|
||||
**Completed:** vX.Y.Z (YYYY-MM-DD)
|
||||
```
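
For illustration, appending the completion annotation could be sketched as follows. `markCompleted` is a hypothetical helper, and the date is rendered in UTC, which is an assumption since the format reference only specifies `YYYY-MM-DD`.

```typescript
// Appends "**Completed:** vX.Y.Z (YYYY-MM-DD)" to an item, preserving its original content.
function markCompleted(itemMarkdown: string, version: string, completedOn: Date): string {
  const isoDate = completedOn.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `${itemMarkdown.trimEnd()}\n**Completed:** v${version} (${isoDate})\n`;
}
```

Moving the annotated item under `## Completed` is a separate step that this sketch does not cover.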
|
||||
@@ -93,6 +93,92 @@ gh api repos/$REPO/issues/$PR_NUMBER/comments \
|
||||
|
||||
---
|
||||
|
||||
## Reply Templates
|
||||
|
||||
Use these templates for every Greptile reply. Always include concrete evidence — never post vague replies.
|
||||
|
||||
### Tier 1 (First response) — Friendly, evidence-included
|
||||
|
||||
**For FIXES (user chose to fix the issue):**
|
||||
|
||||
```
|
||||
**Fixed** in `<commit-sha>`.
|
||||
|
||||
\`\`\`diff
|
||||
- <old problematic line(s)>
|
||||
+ <new fixed line(s)>
|
||||
\`\`\`
|
||||
|
||||
**Why:** <1-sentence explanation of what was wrong and how the fix addresses it>
|
||||
```
|
||||
|
||||
**For ALREADY FIXED (issue addressed in a prior commit on the branch):**
|
||||
|
||||
```
|
||||
**Already fixed** in `<commit-sha>`.
|
||||
|
||||
**What was done:** <1-2 sentences describing how the existing commit addresses this issue>
|
||||
```
|
||||
|
||||
**For FALSE POSITIVES (the comment is incorrect):**
|
||||
|
||||
```
|
||||
**Not a bug.** <1 sentence directly stating why this is incorrect>
|
||||
|
||||
**Evidence:**
|
||||
- <specific code reference showing the pattern is safe/correct>
|
||||
- <e.g., "The nil check is handled by `ActiveRecord::FinderMethods#find` which raises RecordNotFound, not nil">
|
||||
|
||||
**Suggested re-rank:** This appears to be a `<style|noise|misread>` issue, not a `<what Greptile called it>`. Consider lowering severity.
|
||||
```
|
||||
|
||||
### Tier 2 (Greptile re-flags after prior reply) — Firm, overwhelming evidence
|
||||
|
||||
Use Tier 2 when escalation detection (below) identifies a prior GStack reply on the same thread. Include maximum evidence to close the discussion.
|
||||
|
||||
```
|
||||
**This has been reviewed and confirmed as [intentional/already-fixed/not-a-bug].**
|
||||
|
||||
\`\`\`diff
|
||||
<full relevant diff showing the change or safe pattern>
|
||||
\`\`\`
|
||||
|
||||
**Evidence chain:**
|
||||
1. <file:line permalink showing the safe pattern or fix>
|
||||
2. <commit SHA where it was addressed, if applicable>
|
||||
3. <architecture rationale or design decision, if applicable>
|
||||
|
||||
**Suggested re-rank:** Please recalibrate — this is a `<actual category>` issue, not `<claimed category>`. [Link to specific file change permalink if helpful]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Escalation Detection
|
||||
|
||||
Before composing a reply, check if a prior GStack reply already exists on this comment thread:
|
||||
|
||||
1. **For line-level comments:** Fetch replies via `gh api repos/$REPO/pulls/$PR_NUMBER/comments/$COMMENT_ID/replies`. Check if any reply body contains GStack markers: `**Fixed**`, `**Not a bug.**`, `**Already fixed**`.
|
||||
|
||||
2. **For top-level comments:** Scan the fetched issue comments for replies posted after the Greptile comment that contain GStack markers.
|
||||
|
||||
3. **If a prior GStack reply exists AND Greptile posted again on the same file+category:** Use Tier 2 (firm) templates.
|
||||
|
||||
4. **If no prior GStack reply exists:** Use Tier 1 (friendly) templates.
|
||||
|
||||
If escalation detection fails (API error, ambiguous thread): default to Tier 1. Never escalate on ambiguity.
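
The decision logic above can be sketched as a pure function. The names `chooseReplyTier` and `hasPriorGstackReply` are assumptions for this example. Fetching the thread is left to the caller, which passes `null` when the fetch failed or the thread is ambiguous.

```typescript
type ReplyTier = 1 | 2;

// Markers that identify a prior GStack reply, taken from the reply templates.
const GSTACK_MARKERS = ["**Fixed**", "**Not a bug.**", "**Already fixed**"];

function hasPriorGstackReply(replyBodies: string[]): boolean {
  return replyBodies.some((body) => GSTACK_MARKERS.some((marker) => body.includes(marker)));
}

// replyBodies: bodies of replies on the thread, or null if they could not be determined.
// greptileReflagged: whether Greptile posted again on the same file and category.
function chooseReplyTier(replyBodies: string[] | null, greptileReflagged: boolean): ReplyTier {
  if (replyBodies === null) {
    return 1; // never escalate on ambiguity
  }
  return hasPriorGstackReply(replyBodies) && greptileReflagged ? 2 : 1;
}
```

Keeping the function free of API calls makes the "default to Tier 1" rule easy to test in isolation.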
|
||||
|
||||
---
|
||||
|
||||
## Severity Assessment & Re-ranking
|
||||
|
||||
When classifying comments, also assess whether Greptile's implied severity matches reality:
|
||||
|
||||
- If Greptile flags something as a **security/correctness/race-condition** issue but it's actually a **style/performance** nit: include `**Suggested re-rank:**` in the reply requesting the category be corrected.
|
||||
- If Greptile flags a low-severity style issue as if it were critical: push back in the reply.
|
||||
- Always be specific about why the re-ranking is warranted — cite code and line numbers, not opinions.
|
||||
|
||||
---
|
||||
|
||||
## History File Writes
|
||||
|
||||
Before writing, ensure both directories exist:
|
||||
|
||||
+76
-8
@@ -35,6 +35,8 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
|
||||
- Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
|
||||
- MINOR or MAJOR version bump needed (ask — see Step 4)
|
||||
- Greptile review comments that need user decision (complex fixes, false positives)
|
||||
- TODOS.md missing and user wants to create one (ask — see Step 5.5)
|
||||
- TODOS.md disorganized and user wants to reorganize (ask — see Step 5.5)
|
||||
|
||||
**Never stop for:**
|
||||
- Uncommitted changes (always include them)
|
||||
@@ -42,6 +44,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
|
||||
- CHANGELOG content (auto-generate from diff)
|
||||
- Commit message approval (auto-commit)
|
||||
- Multi-file changesets (auto-split into bisectable commits)
|
||||
- TODOS.md completed-item detection (auto-mark)
|
||||
|
||||
---
|
||||
|
||||
@@ -185,7 +188,7 @@ Save the review output — it goes into the PR body in Step 8.

## Step 3.75: Address Greptile review comments (if PR exists)

Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, and classify steps.
Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps.

**If no PR exists, `gh` fails, API returns an error, or there are zero Greptile comments:** Skip this step silently. Continue to Step 4.

@@ -193,18 +196,20 @@ Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, an

Include a Greptile summary in your output: `+ N Greptile comments (X valid, Y fixed, Z FP)`

Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.

For each classified comment:

**VALID & ACTIONABLE:** Use AskUserQuestion with:
- The comment (file:line or [top-level] + body summary + permalink URL)
- Your recommended fix
- Options: A) Fix now (recommended), B) Acknowledge and ship anyway, C) It's a false positive
- If user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply to the comment (`"Fixed in <commit-sha>."`), and save to both per-project and global greptile-history (see greptile-triage.md for write details, type: fix).
- If user chooses C: reply explaining the false positive, save to both per-project and global greptile-history (type: fp).
- If user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation), and save to both per-project and global greptile-history (type: fix).
- If user chooses C: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp).

**VALID BUT ALREADY FIXED:** Reply acknowledging the catch — no AskUserQuestion needed:
- Post reply: `"Good catch — already fixed in <commit-sha>."`
- Save to both per-project and global greptile-history (see greptile-triage.md for write details, type: already-fixed)
**VALID BUT ALREADY FIXED:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
- Include what was done and the fixing commit SHA
- Save to both per-project and global greptile-history (type: already-fixed)

**FALSE POSITIVE:** Use AskUserQuestion:
- Show the comment and why you think it's wrong (file:line or [top-level] + body summary + permalink URL)
@@ -212,7 +217,7 @@ For each classified comment:
- A) Reply to Greptile explaining the false positive (recommended if clearly wrong)
- B) Fix it anyway (if trivial)
- C) Ignore silently
- If user chooses A: post reply using the appropriate API from the triage doc, save to both per-project and global greptile-history (type: fp)
- If user chooses A: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp)

**SUPPRESSED:** Skip silently — these are known false positives from previous triage.

@@ -261,6 +266,61 @@ For each classified comment:

---

## Step 5.5: TODOS.md (auto-update)

Cross-reference the project's TODOS.md against the changes being shipped. Mark completed items automatically; prompt only if the file is missing or disorganized.

Read `.claude/skills/review/TODOS-format.md` for the canonical format reference.

**1. Check if TODOS.md exists** in the repository root.

**If TODOS.md does not exist:** Use AskUserQuestion:
- Message: "GStack recommends maintaining a TODOS.md organized by skill/component, then priority (P0 at top through P4, then Completed at bottom). See TODOS-format.md for the full format. Would you like to create one?"
- Options: A) Create it now, B) Skip for now
- If A: Create `TODOS.md` with a skeleton (# TODOS heading + ## Completed section). Continue to step 3.
- If B: Skip the rest of Step 5.5. Continue to Step 6.

**2. Check structure and organization:**

Read TODOS.md and verify it follows the recommended structure:
- Items grouped under `## <Skill/Component>` headings
- Each item has `**Priority:**` field with P0-P4 value
- A `## Completed` section at the bottom
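The three structural checks above could be approximated mechanically. The sketch below is an illustration under assumptions, not part of the skill: `checkTodosStructure` is a hypothetical helper, and it assumes each TODO item starts with a `### ` heading, which the text here does not specify.

```typescript
// Assumption: each TODO item starts with a "### " heading. The source only
// specifies "## <Skill/Component>" groupings, a Priority field, and Completed.
function checkTodosStructure(markdown: string): string[] {
  const problems: string[] = [];
  const sections = markdown
    .split("\n")
    .filter((line) => line.indexOf("## ") === 0)
    .map((line) => line.slice(3).trim());

  if (sections.filter((name) => name !== "Completed").length === 0) {
    problems.push("no component groupings");
  }
  if (sections.length === 0 || sections[sections.length - 1] !== "Completed") {
    problems.push("no Completed section at the bottom");
  }

  // Everything between one "### " heading and the next is treated as one item.
  const items = markdown.split(/^### /m).slice(1);
  const unprioritized = items.filter(
    (body) => !/\*\*Priority:\*\*\s*P[0-4]\b/.test(body)
  );
  if (unprioritized.length > 0) {
    problems.push(unprioritized.length + " item(s) missing a P0-P4 priority");
  }
  return problems;
}
```

An empty result would mean the file passes all three checks. Any entry in the list corresponds to one of the "disorganized" conditions that trigger the prompt.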

**If disorganized** (missing priority fields, no component groupings, no Completed section): Use AskUserQuestion:
- Message: "TODOS.md doesn't follow the recommended structure (skill/component groupings, P0-P4 priority, Completed section). Would you like to reorganize it?"
- Options: A) Reorganize now (recommended), B) Leave as-is
- If A: Reorganize in-place following TODOS-format.md. Preserve all content — only restructure, never delete items.
- If B: Continue to step 3 without restructuring.

**3. Detect completed TODOs:**

This step is fully automatic — no user interaction.

Use the diff and commit history already gathered in earlier steps:
- `git diff main...HEAD` (full diff against main)
- `git log main..HEAD --oneline` (all commits being shipped)

For each TODO item, check if the changes in this PR complete it by:
- Matching commit messages against the TODO title and description
- Checking if files referenced in the TODO appear in the diff
- Checking if the TODO's described work matches the functional changes

**Be conservative:** Only mark a TODO as completed if there is clear evidence in the diff. If uncertain, leave it alone.
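One way to picture the conservative rule is a check that requires several independent signals to agree. This is a minimal sketch under assumptions: `TodoItem`, `ShipContext`, and `isClearlyCompleted` are hypothetical names, and requiring both a commit-message match and all referenced files in the diff is an illustrative threshold, not one the skill defines. The third check (whether the described work matches the functional changes) needs judgment and is left out.

```typescript
// Hypothetical shapes; the real inputs come from `git log` and `git diff`.
interface TodoItem {
  title: string;
  files: string[]; // files the TODO text refers to
}

interface ShipContext {
  commitSubjects: string[]; // one entry per commit being shipped
  changedFiles: string[];   // paths touched by the diff against main
}

// Returns true only when both signals agree; anything less leaves the TODO open.
function isClearlyCompleted(todo: TodoItem, ctx: ShipContext): boolean {
  const title = todo.title.toLowerCase();
  const titleMentioned = ctx.commitSubjects.some(
    (subject) => subject.toLowerCase().indexOf(title) !== -1
  );
  const filesTouched =
    todo.files.length > 0 &&
    todo.files.every((file) => ctx.changedFiles.indexOf(file) !== -1);
  return titleMentioned && filesTouched;
}
```

A TODO that lists no files never passes this check, which errs toward leaving the item alone when the evidence is thin.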

**4. Move completed items** to the `## Completed` section at the bottom. Append: `**Completed:** vX.Y.Z (YYYY-MM-DD)`
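Moving an item could look like the following string transformation. It is a sketch under assumptions, not the skill's procedure: `moveToCompleted` is a hypothetical helper, each item is assumed to start with a `### ` heading, and `## Completed` is assumed to be the last section, so appending to the end of the file lands inside it.

```typescript
// Assumptions: items start with "### <title>" and "## Completed" is the
// last section of the file. Returns the input unchanged if either is missing.
function moveToCompleted(
  markdown: string,
  title: string,
  version: string,
  date: string
): string {
  const lines = markdown.split("\n");
  const start = lines.indexOf("### " + title);
  const completedAt = lines.indexOf("## Completed");
  if (start === -1 || completedAt === -1 || start > completedAt) {
    return markdown;
  }

  // The item runs until the next "## " or "### " heading.
  let end = start + 1;
  while (end < lines.length && !/^#{2,3} /.test(lines[end])) {
    end++;
  }

  const item = lines.slice(start, end);
  while (item.length > 0 && item[item.length - 1].trim() === "") {
    item.pop(); // drop trailing blank lines before appending the marker
  }
  item.push("**Completed:** v" + version + " (" + date + ")");

  const rest = lines.slice(0, start).concat(lines.slice(end));
  while (rest.length > 0 && rest[rest.length - 1].trim() === "") {
    rest.pop();
  }
  return rest.concat([""], item, [""]).join("\n");
}
```

The item's own fields are carried over untouched and only the completion marker is added, in line with the rule that reorganizing never deletes content.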

**5. Output summary:**
- `TODOS.md: N items marked complete (item1, item2, ...). M items remaining.`
- Or: `TODOS.md: No completed items detected. M items remaining.`
- Or: `TODOS.md: Created.` / `TODOS.md: Reorganized.`

**6. Defensive:** If TODOS.md cannot be written (permission error, disk full), warn the user and continue. Never stop the ship workflow for a TODOS failure.

Save this summary — it goes into the PR body in Step 8.
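The defensive rule in item 6 amounts to catching the write failure and turning it into a warning. A minimal sketch, assuming a hypothetical `Writer` callback that stands in for whatever actually persists the file (`writeTodosSafely` is likewise an illustrative name):

```typescript
// `Writer` is a stand-in for the real file write, injected so the
// failure path can be exercised without touching the disk.
type Writer = (path: string, content: string) => void;

// Returns null on success, or a warning string if the write failed.
function writeTodosSafely(write: Writer, content: string): string | null {
  try {
    write("TODOS.md", content);
    return null; // success: nothing to report
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // Report and carry on; a TODOS failure must not abort the workflow.
    return "Warning: could not update TODOS.md (" + reason + "). Continuing.";
  }
}
```

Returning the warning instead of rethrowing lets the caller surface it to the user and still proceed to Step 6.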

---

## Step 6: Commit (bisectable chunks)

**Goal:** Create small, logical commits that work well with `git bisect` and help LLMs understand what changed.
@@ -271,7 +331,7 @@ For each classified comment:
- **Infrastructure:** migrations, config changes, route additions
- **Models & services:** new models, services, concerns (with their tests)
- **Controllers & views:** controllers, views, JS/React components (with their tests)
- **VERSION + CHANGELOG:** always in the final commit
- **VERSION + CHANGELOG + TODOS.md:** always in the final commit

3. **Rules for splitting:**
- A model and its test file go in the same commit
@@ -329,6 +389,12 @@ gh pr create --title "<type>: <summary>" --body "$(cat <<'EOF'
<If no Greptile comments found: "No Greptile comments.">
<If no PR existed during Step 3.75: omit this section entirely>

## TODOS
<If items marked complete: bullet list of completed items with version>
<If no items completed: "No TODO items completed in this PR.">
<If TODOS.md created or reorganized: note that>
<If TODOS.md doesn't exist and user skipped: omit this section>

## Test plan
- [x] All Rails tests pass (N runs, 0 failures)
- [x] All Vitest tests pass (N tests)
@@ -351,4 +417,6 @@ EOF
- **Always use the 4-digit version format** from the VERSION file.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies.
- **The goal is: user says `/ship`, next thing they see is the review + PR URL.**

+76
-8
@@ -26,6 +26,8 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
- MINOR or MAJOR version bump needed (ask — see Step 4)
- Greptile review comments that need user decision (complex fixes, false positives)
- TODOS.md missing and user wants to create one (ask — see Step 5.5)
- TODOS.md disorganized and user wants to reorganize (ask — see Step 5.5)

**Never stop for:**
- Uncommitted changes (always include them)
@@ -33,6 +35,7 @@ You are running the `/ship` workflow. This is a **non-interactive, fully automat
- CHANGELOG content (auto-generate from diff)
- Commit message approval (auto-commit)
- Multi-file changesets (auto-split into bisectable commits)
- TODOS.md completed-item detection (auto-mark)

---

@@ -176,7 +179,7 @@ Save the review output — it goes into the PR body in Step 8.

## Step 3.75: Address Greptile review comments (if PR exists)

Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, and classify steps.
Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, classify, and **escalation detection** steps.

**If no PR exists, `gh` fails, API returns an error, or there are zero Greptile comments:** Skip this step silently. Continue to Step 4.

@@ -184,18 +187,20 @@ Read `.claude/skills/review/greptile-triage.md` and follow the fetch, filter, an

Include a Greptile summary in your output: `+ N Greptile comments (X valid, Y fixed, Z FP)`

Before replying to any comment, run the **Escalation Detection** algorithm from greptile-triage.md to determine whether to use Tier 1 (friendly) or Tier 2 (firm) reply templates.

For each classified comment:

**VALID & ACTIONABLE:** Use AskUserQuestion with:
- The comment (file:line or [top-level] + body summary + permalink URL)
- Your recommended fix
- Options: A) Fix now (recommended), B) Acknowledge and ship anyway, C) It's a false positive
- If user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply to the comment (`"Fixed in <commit-sha>."`), and save to both per-project and global greptile-history (see greptile-triage.md for write details, type: fix).
- If user chooses C: reply explaining the false positive, save to both per-project and global greptile-history (type: fp).
- If user chooses A: apply the fix, commit the fixed files (`git add <fixed-files> && git commit -m "fix: address Greptile review — <brief description>"`), reply using the **Fix reply template** from greptile-triage.md (include inline diff + explanation), and save to both per-project and global greptile-history (type: fix).
- If user chooses C: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp).

**VALID BUT ALREADY FIXED:** Reply acknowledging the catch — no AskUserQuestion needed:
- Post reply: `"Good catch — already fixed in <commit-sha>."`
- Save to both per-project and global greptile-history (see greptile-triage.md for write details, type: already-fixed)
**VALID BUT ALREADY FIXED:** Reply using the **Already Fixed reply template** from greptile-triage.md — no AskUserQuestion needed:
- Include what was done and the fixing commit SHA
- Save to both per-project and global greptile-history (type: already-fixed)

**FALSE POSITIVE:** Use AskUserQuestion:
- Show the comment and why you think it's wrong (file:line or [top-level] + body summary + permalink URL)
@@ -203,7 +208,7 @@ For each classified comment:
- A) Reply to Greptile explaining the false positive (recommended if clearly wrong)
- B) Fix it anyway (if trivial)
- C) Ignore silently
- If user chooses A: post reply using the appropriate API from the triage doc, save to both per-project and global greptile-history (type: fp)
- If user chooses A: reply using the **False Positive reply template** from greptile-triage.md (include evidence + suggested re-rank), save to both per-project and global greptile-history (type: fp)

**SUPPRESSED:** Skip silently — these are known false positives from previous triage.

@@ -252,6 +257,61 @@ For each classified comment:

---

## Step 5.5: TODOS.md (auto-update)

Cross-reference the project's TODOS.md against the changes being shipped. Mark completed items automatically; prompt only if the file is missing or disorganized.

Read `.claude/skills/review/TODOS-format.md` for the canonical format reference.

**1. Check if TODOS.md exists** in the repository root.

**If TODOS.md does not exist:** Use AskUserQuestion:
- Message: "GStack recommends maintaining a TODOS.md organized by skill/component, then priority (P0 at top through P4, then Completed at bottom). See TODOS-format.md for the full format. Would you like to create one?"
- Options: A) Create it now, B) Skip for now
- If A: Create `TODOS.md` with a skeleton (# TODOS heading + ## Completed section). Continue to step 3.
- If B: Skip the rest of Step 5.5. Continue to Step 6.

**2. Check structure and organization:**

Read TODOS.md and verify it follows the recommended structure:
- Items grouped under `## <Skill/Component>` headings
- Each item has `**Priority:**` field with P0-P4 value
- A `## Completed` section at the bottom

**If disorganized** (missing priority fields, no component groupings, no Completed section): Use AskUserQuestion:
- Message: "TODOS.md doesn't follow the recommended structure (skill/component groupings, P0-P4 priority, Completed section). Would you like to reorganize it?"
- Options: A) Reorganize now (recommended), B) Leave as-is
- If A: Reorganize in-place following TODOS-format.md. Preserve all content — only restructure, never delete items.
- If B: Continue to step 3 without restructuring.

**3. Detect completed TODOs:**

This step is fully automatic — no user interaction.

Use the diff and commit history already gathered in earlier steps:
- `git diff main...HEAD` (full diff against main)
- `git log main..HEAD --oneline` (all commits being shipped)

For each TODO item, check if the changes in this PR complete it by:
- Matching commit messages against the TODO title and description
- Checking if files referenced in the TODO appear in the diff
- Checking if the TODO's described work matches the functional changes

**Be conservative:** Only mark a TODO as completed if there is clear evidence in the diff. If uncertain, leave it alone.

**4. Move completed items** to the `## Completed` section at the bottom. Append: `**Completed:** vX.Y.Z (YYYY-MM-DD)`

**5. Output summary:**
- `TODOS.md: N items marked complete (item1, item2, ...). M items remaining.`
- Or: `TODOS.md: No completed items detected. M items remaining.`
- Or: `TODOS.md: Created.` / `TODOS.md: Reorganized.`

**6. Defensive:** If TODOS.md cannot be written (permission error, disk full), warn the user and continue. Never stop the ship workflow for a TODOS failure.

Save this summary — it goes into the PR body in Step 8.

---

## Step 6: Commit (bisectable chunks)

**Goal:** Create small, logical commits that work well with `git bisect` and help LLMs understand what changed.
@@ -262,7 +322,7 @@ For each classified comment:
- **Infrastructure:** migrations, config changes, route additions
- **Models & services:** new models, services, concerns (with their tests)
- **Controllers & views:** controllers, views, JS/React components (with their tests)
- **VERSION + CHANGELOG:** always in the final commit
- **VERSION + CHANGELOG + TODOS.md:** always in the final commit

3. **Rules for splitting:**
- A model and its test file go in the same commit
@@ -320,6 +380,12 @@ gh pr create --title "<type>: <summary>" --body "$(cat <<'EOF'
<If no Greptile comments found: "No Greptile comments.">
<If no PR existed during Step 3.75: omit this section entirely>

## TODOS
<If items marked complete: bullet list of completed items with version>
<If no items completed: "No TODO items completed in this PR.">
<If TODOS.md created or reorganized: note that>
<If TODOS.md doesn't exist and user skipped: omit this section>

## Test plan
- [x] All Rails tests pass (N runs, 0 failures)
- [x] All Vitest tests pass (N tests)
@@ -342,4 +408,6 @@ EOF
- **Always use the 4-digit version format** from the VERSION file.
- **Date format in CHANGELOG:** `YYYY-MM-DD`
- **Split commits for bisectability** — each commit = one logical change.
- **TODOS.md completion detection must be conservative.** Only mark items as completed when the diff clearly shows the work is done.
- **Use Greptile reply templates from greptile-triage.md.** Every reply includes evidence (inline diff, code references, re-rank suggestion). Never post vague replies.
- **The goal is: user says `/ship`, next thing they see is the review + PR URL.**

@@ -361,6 +361,29 @@ describe('Greptile history format consistency', () => {
  });
});

// --- Part 7b: TODOS-format.md reference consistency ---

describe('TODOS-format.md reference consistency', () => {
  test('review/TODOS-format.md exists and defines canonical format', () => {
    const content = fs.readFileSync(path.join(ROOT, 'review', 'TODOS-format.md'), 'utf-8');
    expect(content).toContain('**What:**');
    expect(content).toContain('**Why:**');
    expect(content).toContain('**Priority:**');
    expect(content).toContain('**Effort:**');
    expect(content).toContain('## Completed');
  });

  test('skills that write TODOs reference TODOS-format.md', () => {
    const shipContent = fs.readFileSync(path.join(ROOT, 'ship', 'SKILL.md'), 'utf-8');
    const ceoPlanContent = fs.readFileSync(path.join(ROOT, 'plan-ceo-review', 'SKILL.md'), 'utf-8');
    const engPlanContent = fs.readFileSync(path.join(ROOT, 'plan-eng-review', 'SKILL.md'), 'utf-8');

    expect(shipContent).toContain('TODOS-format.md');
    expect(ceoPlanContent).toContain('TODOS-format.md');
    expect(engPlanContent).toContain('TODOS-format.md');
  });
});

// --- Part 7: Planted-bug fixture validation (A4) ---

describe('Planted-bug fixture validation', () => {
