mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 03:35:09 +02:00
07b4e15b34
118 lines
6.7 KiB
Markdown
# TODO: gstack roadmap

## Phase 1: Foundations (v0.2.0)
- [x] Rename to gstack
- [x] Restructure to monorepo layout
- [x] Setup script for skill symlinks
- [x] Snapshot command with ref-based element selection
- [x] Snapshot tests

## Phase 2: Enhanced Browser (v0.2.0) ✅
- [x] Annotated screenshots (`--annotate` flag, ref labels overlaid on screenshot)
- [x] Snapshot diffing (`--diff` flag, unified diff against previous snapshot)
- [x] Dialog handling (auto-accept/dismiss, dialog buffer, prevents browser lockup)
- [x] File upload (`upload <sel> <files>`)
- [x] Cursor-interactive elements (`-C` flag, cursor:pointer/onclick/tabindex scan)
- [x] Element state checks (is visible/hidden/enabled/disabled/checked/editable/focused)
- [x] CircularBuffer: O(1) ring buffer for console/network/dialog (was O(n) array+shift)
- [x] Async buffer flush with Bun.write() (was appendFileSync)
- [x] Health check with page.evaluate('1') + 2s timeout
- [x] Playwright error wrapping: actionable messages for AI agents
- [x] Fix useragent: context recreation preserves cookies/storage/URLs
- [x] DRY: getCleanText exported, command sets in chain updated
- [x] 148 integration tests (was ~63)

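The CircularBuffer item above swaps an array plus `shift()` for a fixed-capacity ring. A minimal sketch of that idea follows; the `RingBuffer` class and its method names are hypothetical and are not taken from the browse source.

```typescript
// Hypothetical fixed-capacity ring buffer; all names here are illustrative.
class RingBuffer<T> {
  private readonly capacity: number;
  private readonly items: (T | undefined)[];
  private head = 0; // index of the oldest entry
  private size = 0; // number of stored entries

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.items = new Array<T | undefined>(capacity);
  }

  // O(1): write at the tail; once full, overwrite the oldest entry.
  push(item: T): void {
    const tail = (this.head + this.size) % this.capacity;
    this.items[tail] = item;
    if (this.size < this.capacity) {
      this.size++;
    } else {
      this.head = (this.head + 1) % this.capacity;
    }
  }

  // Return entries oldest-first without mutating the buffer.
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.size; i++) {
      out.push(this.items[(this.head + i) % this.capacity] as T);
    }
    return out;
  }

  get length(): number {
    return this.size;
  }
}
```

Pushing never moves existing entries, so the cost per log line stays constant no matter how many console, network, or dialog events have accumulated.
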
## Phase 3: QA Testing Agent (v0.3.0)
- [x] `/qa` SKILL.md, a 6-phase workflow: Initialize → Authenticate → Orient → Explore → Document → Wrap up
- [x] Issue taxonomy reference (7 categories: visual, functional, UX, content, performance, console, accessibility)
- [x] Severity classification (critical/high/medium/low)
- [x] Exploration checklist per page
- [x] Report template (structured markdown with per-issue evidence)
- [x] Repro-first philosophy: every issue gets evidence before moving on
- [x] Two evidence tiers: interactive bugs (multi-step screenshots), static bugs (single annotated screenshot)
- [x] Key guidance: 5-10 well-documented issues per session, depth over breadth, write incrementally
- [x] Three modes: full (systematic), quick (30-second smoke test), regression (compare against baseline)
- [x] Framework detection guidance (Next.js, Rails, WordPress, SPA)
- [x] Health score rubric (7 categories, weighted average)
- [x] `wait --networkidle` / `wait --load` / `wait --domcontentloaded`
- [x] `console --errors` (filter to error/warning only)
- [x] `cookie-import <json-file>` (bulk cookie import with auto-fill domain)
- [x] `browse/bin/find-browse` (DRY binary discovery across skills)
- [ ] Video recording (deferred to Phase 5: recreateContext destroys page state)

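The health score rubric above is described as a weighted average over the seven issue categories. The sketch below shows that arithmetic only; the weights and the 0-100 scale are assumptions chosen for illustration, not the values in the `/qa` rubric.

```typescript
// The seven categories come from the issue taxonomy listed above.
const CATEGORIES = [
  "visual",
  "functional",
  "ux",
  "content",
  "performance",
  "console",
  "accessibility",
] as const;
type Category = (typeof CATEGORIES)[number];

// Hypothetical weights: placeholders, not the real rubric values.
const WEIGHTS: Record<Category, number> = {
  visual: 1,
  functional: 3,
  ux: 2,
  content: 1,
  performance: 1,
  console: 1,
  accessibility: 1,
};

// Weighted average of per-category scores (each assumed to be 0-100).
function healthScore(scores: Record<Category, number>): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const category of CATEGORIES) {
    weighted += scores[category] * WEIGHTS[category];
    totalWeight += WEIGHTS[category];
  }
  return weighted / totalWeight;
}
```

With these placeholder weights, a site that scores 100 everywhere except 0 on `functional` lands at 70, which shows how a heavier weight lets one category dominate the result.
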
## Phase 3.5: Browser Cookie Import (v0.3.x)
- [x] `cookie-import-browser` command (Chromium cookie DB decryption)
- [x] Cookie picker web UI (served from browse server)
- [x] `/setup-browser-cookies` skill
- [x] Unit tests with encrypted cookie fixtures (18 tests)
- [x] Browser registry (Comet, Chrome, Arc, Brave, Edge)

## Phase 3.6: Visual PR Annotations + S3 Upload
- [ ] `/setup-gstack-upload` skill (configure S3 bucket for image hosting)
- [ ] `browse/bin/gstack-upload` helper (upload file to S3, return public URL)
- [ ] `/ship` Step 7.5: visual verification with screenshots in PR body
- [ ] `/review` Step 4.5: visual review with annotated screenshots in PR
- [ ] WebM → GIF conversion (ffmpeg) for video evidence in PRs
- [ ] README documentation for visual PR annotations

## Phase 4: Skill + Browser Integration
- [ ] ship + browse: post-deploy verification
  - Browse staging/preview URL after push
  - Screenshot key pages
  - Check console for JS errors
  - Compare staging vs prod via snapshot diff
  - Include verification screenshots in PR body
  - STOP if critical errors found
- [ ] review + browse: visual diff review
  - Browse PR's preview deploy
  - Annotated screenshots of changed pages
  - Compare against production visually
  - Check responsive layouts (mobile/tablet/desktop)
  - Verify accessibility tree hasn't regressed
- [ ] deploy-verify skill: lightweight post-deploy smoke test
  - Hit key URLs, verify 200s
  - Screenshot critical pages
  - Console error check
  - Compare against baseline snapshots
  - Pass/fail with evidence

## Phase 5: State & Sessions
- [ ] Bundle server.ts into compiled binary (eliminate resolveServerScript() fallback chain entirely) (P2, M)
- [ ] v20 encryption format support (AES-256-GCM): future Chromium versions may change from v10
- [ ] Sessions (isolated browser instances with separate cookies/storage/history)
- [ ] State persistence (save/load cookies + localStorage to JSON files)
- [ ] Auth vault (encrypted credential storage, referenced by name, LLM never sees passwords)
- [ ] Video recording (record start/stop; needs sessions for clean context lifecycle)
- [ ] retro + browse: deployment health tracking
  - Screenshot production state
  - Check perf metrics (page load times)
  - Count console errors across key pages
  - Track trends over retro window

## Phase 6: Advanced Browser
- [ ] Iframe support (`frame <sel>`, `frame main`)
- [ ] Semantic locators (find role/label/text/placeholder/testid with actions)
- [ ] Device emulation presets (`set device "iPhone 16 Pro"`)
- [ ] Network mocking/routing (intercept, block, mock requests)
- [ ] Download handling (click-to-download with path control)
- [ ] Content safety (`--max-output` truncation, `--allowed-domains`)
- [ ] Streaming (WebSocket live preview for pair browsing)
- [ ] CDP mode (connect to already-running Chrome/Electron apps)

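The content safety item above names two flags that are not yet implemented. One possible reading of their semantics is sketched below; the helper names `isAllowedDomain` and `truncateOutput`, the subdomain-matching rule, and the truncation marker are all assumptions, not a spec.

```typescript
// Hypothetical check for --allowed-domains: exact host match or any subdomain.
function isAllowedDomain(rawUrl: string, allowed: string[]): boolean {
  const host = new URL(rawUrl).hostname.toLowerCase();
  return allowed.some((entry) => {
    const domain = entry.toLowerCase();
    // Require a dot boundary so "evilexample.com" does not match "example.com".
    return host === domain || host.endsWith("." + domain);
  });
}

// Hypothetical --max-output handling: cut the text and say how much was dropped.
function truncateOutput(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const dropped = text.length - maxChars;
  return text.slice(0, maxChars) + `\n[truncated ${dropped} chars]`;
}
```

Reporting the dropped character count in the marker would let an agent decide whether to re-run the command with a narrower scope.
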
## Future Ideas
- [ ] Linux/Windows cookie decryption (GNOME Keyring / kwallet / DPAPI)
- [ ] Trend tracking across QA runs: compare baseline.json over time, detect regressions (P2, S)
- [ ] CI/CD integration: `/qa` as GitHub Action step, fail PR if health score drops (P2, M)
- [ ] Accessibility audit mode: `--a11y` flag for focused accessibility testing (P3, S)
- [ ] Greptile training feedback loop: export suppression patterns to Greptile team for model improvement (P3, S)

## Ideas & Notes
- Browser is the nervous system: every skill should be able to see, interact with, and verify the web
- Skills are the product; the browser enables them
- One repo, one install, entire AI engineering workflow
- Bun compiled binary matches Rust CLI performance for this use case (bottleneck is Chromium, not CLI parsing)
- Accessibility tree snapshots use ~200-400 tokens vs ~3000-5000 for full DOM, which is critical for AI context efficiency
- Locator map approach for refs: store `Map<string, Locator>` on BrowserManager, no DOM mutation, no CSP issues
- Snapshot scoping (`-i`, `-c`, `-d`, `-s` flags) is critical for performance on large pages
- All new commands follow existing pattern: add to command set, add switch case, return string

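The locator map note above keeps refs in memory instead of tagging DOM nodes. A rough sketch of that pattern follows; `LocatorLike` stands in for Playwright's `Locator` so the example runs on its own, and the `RefMap` class and the `@e1` ref format are hypothetical.

```typescript
// Minimal stand-in for Playwright's Locator so the sketch is self-contained.
interface LocatorLike {
  click(): Promise<void>;
}

// Hypothetical ref registry; in the note above this map lives on BrowserManager.
class RefMap {
  private readonly refs = new Map<string, LocatorLike>();
  private next = 1;

  // Called at the start of each snapshot: old refs become invalid.
  reset(): void {
    this.refs.clear();
    this.next = 1;
  }

  // Assign the next ref to a locator; the page itself is never touched.
  register(locator: LocatorLike): string {
    const ref = `@e${this.next++}`;
    this.refs.set(ref, locator);
    return ref;
  }

  // Look up a ref from a later command, failing with an actionable message.
  resolve(ref: string): LocatorLike {
    const locator = this.refs.get(ref);
    if (!locator) {
      throw new Error(`Unknown ref ${ref}; take a new snapshot first`);
    }
    return locator;
  }
}
```

Because the mapping lives entirely on the server side, nothing is injected into the page, which is why the approach avoids DOM mutation and CSP restrictions.
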