* feat: add escalation protocol to preamble — all skills get DONE/BLOCKED/NEEDS_CONTEXT Every skill now reports completion status (DONE, DONE_WITH_CONCERNS, BLOCKED, NEEDS_CONTEXT) and has escalation rules: 3 failed attempts → STOP, security uncertainty → STOP, scope exceeds verification → STOP. "It is always OK to stop and say 'this is too hard for me.'" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add verification gate to /ship (Step 6.5) — no push without fresh evidence Before pushing, re-verify tests if code changed during review fixes. Rationalization prevention: "Should work now" → RUN IT. "I'm confident" → Confidence is not evidence. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: add scope drift detection + verification of claims to /review Step 1.5: Before reviewing code quality, check if the diff matches stated intent. Flags scope creep and missing requirements (INFORMATIONAL). Step 5 addition: Every review claim must cite evidence — "this pattern is safe" needs a line reference, "tests cover this" needs a test name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: mandatory implementation alternatives + design doc lookup in /plan-ceo-review Step 0C-bis: Every plan must consider 2-3 approaches (minimal viable vs ideal architecture) before mode selection. RECOMMENDATION required. Pre-Review System Audit now checks ~/.gstack/projects/ for /brainstorm design docs (branch-filtered with fallback). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: design doc lookup in /plan-eng-review + fix branch name sanitization Step 0 now checks ~/.gstack/projects/ for /brainstorm design docs (branch-filtered with fallback, reads Supersedes: for revision context). Fix: branch names with '/' (e.g. garrytan/better-process) now get sanitized via tr '/' '-' in test plan artifact filenames. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: new /brainstorm and /debug skills /brainstorm: Socratic design exploration before planning. Context gathering, clarifying questions (smart-skip), related design discovery (keyword grep), premise challenge, forced alternatives, design doc artifact with lineage tracking (Supersedes: field). Writes to ~/.gstack/projects/$SLUG/. /debug: Systematic root-cause debugging. Iron Law: no fixes without root cause investigation. Pattern analysis, hypothesis testing with 3-strike escalation, structured DEBUG REPORT output. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: structural tests for new skills + escalation protocol assertions Add brainstorm + debug to skillsWithUpdateCheck and skillsWithPreamble arrays. Add structural tests: brainstorm (Phase 1-6, Design Doc, Supersedes, Smart-skip), debug (Iron Law, Root Cause, Pattern Analysis, Hypothesis, DEBUG REPORT, 3-strike). Add escalation protocol tests (DONE_WITH_CONCERNS, BLOCKED, NEEDS_CONTEXT) for all preamble skills. Also: 2 new TODOs (design docs → Supabase sync, /plan-design-review skill), update CLAUDE.md project structure with new skill directories. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.6.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: rename /brainstorm → /office-hours across references Update CHANGELOG, CLAUDE.md, TODOS, design-consultation, plan-ceo-review, and gen-skill-docs to reference the new office-hours skill name. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: YC Office Hours — dual-mode product diagnostic + builder brainstorm Rewrite /office-hours with two modes: Startup mode: six forcing questions (Demand Reality, Status Quo, Desperate Specificity, Narrowest Wedge, Observation & Surprise, Future-Fit) that push founders toward radical honesty about demand, users, and product decisions. Includes smart routing by product stage, intrapreneurship adaptation, and YC apply CTA for strong-signal founders. Builder mode: generative brainstorming for side projects, hackathons, learning, and open source. Enthusiastic collaborator tone, design thinking questions, no business interrogation. Mode is determined by an explicit question in Phase 1 — no guessing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add 14 assertions for YC Office Hours content coverage Validates dual-mode structure (Startup/Builder), all six forcing questions, builder brainstorming content, intrapreneurship adaptation, YC apply CTA, and operating principles for both modes. 192 tests total, all passing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.6.1 - README.md: added /office-hours and /debug to skills table, updated skill count from 13 to 15, added both to install instructions - docs/skills.md: added /office-hours and /debug deep dive sections - CLAUDE.md: updated office-hours description to reflect dual-mode - CONTRIBUTING.md: updated skill count from 13 to 15 - CHANGELOG.md: added YC Office Hours and /debug entries to 0.6.0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: founder discovery engine in /office-hours (v0.7.0) Turn /office-hours into a YC founder discovery engine. Every session now ends with three beats: signal reflection (specific callbacks to what the user said), "One more thing." transition, and a personal plea from Garry Tan with three tiers based on founder signal strength. Top tier uses AskUserQuestion to ask directly and opens ycombinator.com/apply?ref=gstack. Adds Phase 4.5 (Founder Signal Synthesis), "What I noticed about how you think" section to both design doc templates, anti-slop GOOD/BAD examples, and emotional targets per tier. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add validation assertions for founder discovery engine 8 new assertions covering: YC apply CTA with ref=gstack tracking, "What I noticed" design doc section, golden age framing, Garry Tan personal plea, founder signal synthesis phase, three-tier decision rubric, anti-slop GOOD/BAD examples, "One more thing" transition beat. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: update project documentation for v0.7.0 VERSION: 0.6.4.1 → 0.7.0 CHANGELOG: new entry — Office Hours Gets Personal README: updated /office-hours and /plan-design-review descriptions docs/skills.md: updated /office-hours table + deep dive section TODOS.md: added /yc-prep skill TODO (P2) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove duplicate Install section, fix stale skills lists, deduplicate CHANGELOG entries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
22 KiB
TODOS
Browse
Bundle server.ts into compiled binary
What: Eliminate resolveServerScript() fallback chain entirely — bundle server.ts into the compiled browse binary.
Why: The current fallback chain (check adjacent to cli.ts, check global install) is fragile and caused bugs in v0.3.2. A single compiled binary is simpler and more reliable.
Context: Bun's --compile flag can bundle multiple entry points. The server is currently resolved at runtime via file path lookup. Bundling it removes the resolution step entirely.
Effort: M Priority: P2 Depends on: None
Sessions (isolated browser instances)
What: Isolated browser instances with separate cookies/storage/history, addressable by name.
Why: Enables parallel testing of different user roles, A/B test verification, and clean auth state management.
Context: Requires Playwright browser context isolation. Each session gets its own context with independent cookies/localStorage. Prerequisite for video recording (clean context lifecycle) and auth vault.
Effort: L Priority: P3
Video recording
What: Record browser interactions as video (start/stop controls).
Why: Video evidence in QA reports and PR bodies. Currently deferred because recreateContext() destroys page state.
Context: Needs sessions for clean context lifecycle. Playwright supports video recording per context. Also needs WebM → GIF conversion for PR embedding.
Effort: M Priority: P3 Depends on: Sessions
v20 encryption format support
What: AES-256-GCM support for future Chromium cookie DB versions (currently v10).
Why: Future Chromium versions may change encryption format. Proactive support prevents breakage.
Effort: S Priority: P3
State persistence
What: Save/load cookies + localStorage to JSON files for reproducible test sessions.
Why: Enables "resume where I left off" for QA sessions and repeatable auth states.
Effort: M Priority: P3 Depends on: Sessions
Auth vault
What: Encrypted credential storage, referenced by name. LLM never sees passwords.
Why: Security — currently auth credentials flow through the LLM context. Vault keeps secrets out of the AI's view.
Effort: L Priority: P3 Depends on: Sessions, state persistence
Iframe support
What: frame <sel> and frame main commands for cross-frame interaction.
Why: Many web apps use iframes (embeds, payment forms, ads). Currently invisible to browse.
Effort: M Priority: P4
Semantic locators
What: find role/label/text/placeholder/testid with attached actions.
Why: More resilient element selection than CSS selectors or ref numbers.
Effort: M Priority: P4
Device emulation presets
What: set device "iPhone 16 Pro" for mobile/tablet testing.
Why: Responsive layout testing without manual viewport resizing.
Effort: S Priority: P4
Network mocking/routing
What: Intercept, block, and mock network requests.
Why: Test error states, loading states, and offline behavior.
Effort: M Priority: P4
Download handling
What: Click-to-download with path control.
Why: Test file download flows end-to-end.
Effort: S Priority: P4
Content safety
What: --max-output truncation, --allowed-domains filtering.
Why: Prevent context window overflow and restrict navigation to safe domains.
Effort: S Priority: P4
Streaming (WebSocket live preview)
What: WebSocket-based live preview for pair browsing sessions.
Why: Enables real-time collaboration — human watches AI browse.
Effort: L Priority: P4
CDP mode
What: Connect to already-running Chrome/Electron apps via Chrome DevTools Protocol.
Why: Test production apps, Electron apps, and existing browser sessions without launching new instances.
Effort: M Priority: P4
Linux/Windows cookie decryption
What: GNOME Keyring / kwallet / DPAPI support for non-macOS cookie import.
Why: Cross-platform cookie import. Currently macOS-only (Keychain).
Effort: L Priority: P4
Ship
Ship log — persistent record of /ship runs
What: Append structured JSON entry to .gstack/ship-log.json at end of every /ship run (version, date, branch, PR URL, review findings, Greptile stats, todos completed, test results).
Why: /retro has no structured data about shipping velocity. Ship log enables: PRs-per-week trending, review finding rates, Greptile signal over time, test suite growth.
Context: /retro already reads greptile-history.md — same pattern. Eval persistence (eval-store.ts) shows the JSON append pattern exists in the codebase. ~15 lines in ship template.
Effort: S Priority: P2 Depends on: None
Post-deploy verification (ship + browse)
What: After push, browse staging/preview URL, screenshot key pages, check console for JS errors, compare staging vs prod via snapshot diff. Include verification screenshots in PR body. STOP if critical errors found.
Why: Catch deployment-time regressions (JS errors, broken layouts) before merge.
Context: Requires S3 upload infrastructure for PR screenshots. Pairs with visual PR annotations.
Effort: L Priority: P2 Depends on: /setup-gstack-upload, visual PR annotations
Visual verification with screenshots in PR body
What: /ship Step 7.5: screenshot key pages after push, embed in PR body.
Why: Visual evidence in PRs. Reviewers see what changed without deploying locally.
Context: Part of Phase 3.6. Needs S3 upload for image hosting.
Effort: M Priority: P2 Depends on: /setup-gstack-upload
Review
Inline PR annotations
What: /ship and /review post inline review comments at specific file:line locations using gh api to create pull request review comments.
Why: Line-level annotations are more actionable than top-level comments. The PR thread becomes a line-by-line conversation between Greptile, Claude, and human reviewers.
Context: GitHub supports inline review comments via gh api repos/$REPO/pulls/$PR/reviews. Pairs naturally with Phase 3.6 visual annotations.
Effort: S Priority: P2 Depends on: None
Greptile training feedback export
What: Aggregate greptile-history.md into machine-readable JSON summary of false positive patterns, exportable to the Greptile team for model improvement.
Why: Closes the feedback loop — Greptile can use FP data to stop making the same mistakes on your codebase.
Context: Was a P3 Future Idea. Upgraded to P2 now that greptile-history.md data infrastructure exists. The signal data is already being collected; this just makes it exportable. ~40 lines.
Effort: S Priority: P2 Depends on: Enough FP data accumulated (10+ entries)
Visual review with annotated screenshots
What: /review Step 4.5: browse PR's preview deploy, annotated screenshots of changed pages, compare against production, check responsive layouts, verify accessibility tree.
Why: Visual diff catches layout regressions that code review misses.
Context: Part of Phase 3.6. Needs S3 upload for image hosting.
Effort: M Priority: P2 Depends on: /setup-gstack-upload
QA
QA trend tracking
What: Compare baseline.json over time, detect regressions across QA runs.
Why: Spot quality trends — is the app getting better or worse?
Context: QA already writes structured reports. This adds cross-run comparison.
Effort: S Priority: P2
CI/CD QA integration
What: /qa as GitHub Action step, fail PR if health score drops.
Why: Automated quality gate in CI. Catch regressions before merge.
Effort: M Priority: P2
Smart default QA tier
What: After a few runs, check index.md for user's usual tier pick, skip the AskUserQuestion.
Why: Reduces friction for repeat users.
Effort: S Priority: P2
Accessibility audit mode
What: --a11y flag for focused accessibility testing.
Why: Dedicated accessibility testing beyond the general QA checklist.
Effort: S Priority: P3
CI/CD generation for non-GitHub providers
What: Extend CI/CD bootstrap to generate GitLab CI (.gitlab-ci.yml), CircleCI (.circleci/config.yml), and Bitrise pipelines.
Why: Not all projects use GitHub Actions. Universal CI/CD bootstrap would make test bootstrap work for everyone.
Context: v1 ships with GitHub Actions only. Detection logic already checks for .gitlab-ci.yml, .circleci/, bitrise.yml and skips with an informational note. Each provider needs ~20 lines of template text in generateTestBootstrap().
Effort: M Priority: P3 Depends on: Test bootstrap (shipped)
Auto-upgrade weak tests (★) to strong tests (★★★)
What: When Step 3.4 coverage audit identifies existing ★-rated tests (smoke/trivial assertions), generate improved versions testing edge cases and error paths.
Why: Many codebases have tests that technically exist but don't catch real bugs — expect(component).toBeDefined() isn't testing behavior. Upgrading these closes the gap between "has tests" and "has good tests."
Context: Requires the quality scoring rubric from the test coverage audit. Modifying existing test files is riskier than creating new ones — needs careful diffing to ensure the upgraded test still passes. Consider creating a companion test file rather than modifying the original.
Effort: M Priority: P3 Depends on: Test quality scoring (shipped)
Retro
Deployment health tracking (retro + browse)
What: Screenshot production state, check perf metrics (page load times), count console errors across key pages, track trends over retro window.
Why: Retro should include production health alongside code metrics.
Context: Requires browse integration. Screenshots + metrics fed into retro output.
Effort: L Priority: P3 Depends on: Browse sessions
Infrastructure
/setup-gstack-upload skill (S3 bucket)
What: Configure S3 bucket for image hosting. One-time setup for visual PR annotations.
Why: Prerequisite for visual PR annotations in /ship and /review.
Effort: M Priority: P2
gstack-upload helper
What: browse/bin/gstack-upload — upload file to S3, return public URL.
Why: Shared utility for all skills that need to embed images in PRs.
Effort: S Priority: P2 Depends on: /setup-gstack-upload
WebM to GIF conversion
What: ffmpeg-based WebM → GIF conversion for video evidence in PRs.
Why: GitHub PR bodies render GIFs but not WebM. Needed for video recording evidence.
Effort: S Priority: P3 Depends on: Video recording
Deploy-verify skill
What: Lightweight post-deploy smoke test: hit key URLs, verify 200s, screenshot critical pages, console error check, compare against baseline snapshots. Pass/fail with evidence.
Why: Fast post-deploy confidence check, separate from full QA.
Effort: M Priority: P2
GitHub Actions eval upload
What: Run eval suite in CI, upload result JSON as artifact, post summary comment on PR.
Why: CI integration catches quality regressions before merge and provides persistent eval records per PR.
Context: Requires ANTHROPIC_API_KEY in CI secrets. Cost is ~$4/run. Eval persistence system (v0.3.6) writes JSON to ~/.gstack-dev/evals/ — CI would upload as GitHub Actions artifacts and use eval:compare to post delta comment.
Effort: M Priority: P2 Depends on: Eval persistence (shipped in v0.3.6)
E2E model pinning
What: Pin E2E tests to claude-sonnet-4-6 for cost efficiency, add retry:2 for flaky LLM responses.
Why: Reduce E2E test cost and flakiness.
Effort: XS Priority: P2
Eval web dashboard
What: bun run eval:dashboard serves local HTML with charts: cost trending, detection rate, pass/fail history.
Why: Visual charts better for spotting trends than CLI tools.
Context: Reads ~/.gstack-dev/evals/*.json. ~200 lines HTML + chart.js via Bun HTTP server.
Effort: M Priority: P3 Depends on: Eval persistence (shipped in v0.3.6)
CI/CD QA quality gate
What: Run /qa as a GitHub Action step, fail PR if health score drops below threshold.
Why: Automated quality gate catches regressions before merge. Currently QA is manual — CI integration makes it part of the standard workflow.
Context: Requires headless browse binary available in CI. The /qa skill already produces baseline.json with health scores — CI step would compare against the main branch baseline and fail if score drops. Would need ANTHROPIC_API_KEY in CI secrets since /qa uses Claude.
Effort: M Priority: P2 Depends on: None
Cross-platform URL open helper
What: gstack-open-url helper script — detect platform, use open (macOS) or xdg-open (Linux).
Why: The first-time Completeness Principle intro uses macOS open to launch the essay. If gstack ever supports Linux, this silently fails.
Effort: S (human: ~30 min / CC: ~2 min) Priority: P4 Depends on: Nothing
CDP-based DOM mutation detection for ref staleness
What: Use Chrome DevTools Protocol DOM.documentUpdated / MutationObserver events to proactively invalidate stale refs when the DOM changes, without requiring an explicit snapshot call.
Why: Current ref staleness detection (async count() check) only catches stale refs at action time. CDP mutation detection would proactively warn when refs become stale, preventing the 5-second timeout entirely for SPA re-renders.
Context: Parts 1+2 of ref staleness fix (RefEntry metadata + eager validation via count()) are shipped. This is Part 3 — the most ambitious piece. Requires CDP session alongside Playwright, MutationObserver bridge, and careful performance tuning to avoid overhead on every DOM change.
Effort: L Priority: P3 Depends on: Ref staleness Parts 1+2 (shipped)
Office Hours / Design
Design docs → Supabase team store sync
What: Add design docs (*-design-*.md) to the Supabase sync pipeline alongside test plans, retro snapshots, and QA reports.
Why: Cross-team design discovery at scale. Local ~/.gstack/projects/$SLUG/ keyword-grep discovery works for same-machine users now, but Supabase sync makes it work across the whole team. Duplicate ideas surface, everyone sees what's been explored.
Context: /office-hours writes design docs to ~/.gstack/projects/$SLUG/. The team store already syncs test plans, retro snapshots, QA reports. Design docs follow the same pattern — just add a sync adapter.
Effort: S
Priority: P2
Depends on: garrytan/team-supabase-store branch landing on main
/yc-prep skill
What: Skill that helps founders prepare their YC application after /office-hours identifies strong signal. Pulls from the design doc, structures answers to YC app questions, runs a mock interview.
Why: Closes the loop. /office-hours identifies the founder, /yc-prep helps them apply well. The design doc already contains most of the raw material for a YC application.
Effort: M (human: ~2 weeks / CC: ~2 hours) Priority: P2 Depends on: office-hours founder discovery engine shipping first
Design Review
/plan-design-review + /qa-design-review + /design-consultation — SHIPPED
Shipped as v0.5.0 on main. Includes /plan-design-review (report-only design audit), /qa-design-review (audit + fix loop), and /design-consultation (interactive DESIGN.md creation). {{DESIGN_METHODOLOGY}} resolver provides shared 80-item design audit checklist.
Document-Release
Auto-invoke /document-release from /ship
What: Add Step 8.5 to /ship that reads document-release/SKILL.md and executes the doc update workflow after creating the PR.
Why: Zero-friction doc updates — user runs /ship and docs are automatically current. No extra command to remember.
Context: /ship currently ends at Step 8 (PR URL output). Step 8.5 would continue into the document-release workflow. Same pattern as /ship calling /review's checklist in Step 3.5.
Effort: S Priority: P1 Depends on: /document-release shipped
{{DOC_VOICE}} shared resolver
What: Create a placeholder resolver in gen-skill-docs.ts encoding the gstack voice guide (friendly, user-forward, lead with benefits). Inject into /ship Step 5, /document-release Step 5, and reference from CLAUDE.md.
Why: DRY — voice rules currently live inline in 3 places (CLAUDE.md CHANGELOG style section, /ship Step 5, /document-release Step 5). When the voice evolves, all three drift.
Context: Same pattern as {{QA_METHODOLOGY}} — shared block injected into multiple templates to prevent drift. ~20 lines in gen-skill-docs.ts.
Effort: S Priority: P2 Depends on: None
Ship Confidence Dashboard
Smart review relevance detection — PARTIALLY SHIPPED
What: Auto-detect which of the 4 reviews are relevant based on branch changes (skip Design Review if no CSS/view changes, skip Code Review if plan-only).
bin/gstack-diff-scope shipped — categorizes diff into SCOPE_FRONTEND, SCOPE_BACKEND, SCOPE_PROMPTS, SCOPE_TESTS, SCOPE_DOCS, SCOPE_CONFIG. Used by design-review-lite to skip when no frontend files changed. Dashboard integration for conditional row display is a follow-up.
Remaining: Dashboard conditional row display (hide "Design Review: NOT YET RUN" when SCOPE_FRONTEND=false). Extend to Eng Review (skip for docs-only) and CEO Review (skip for config-only).
Effort: S Priority: P3 Depends on: gstack-diff-scope (shipped)
/merge skill — review-gated PR merge
What: Create a /merge skill that merges an approved PR, but first checks the Review Readiness Dashboard and runs /review (Fix-First) if code review hasn't been done. Separates "ship" (create PR) from "merge" (land it).
Why: Currently /review runs inside /ship Step 3.5 but isn't tracked as a gate. A /merge skill ensures code review always happens before landing, and enables workflows where someone else reviews the PR first.
Context: /ship creates the PR. /merge would: check dashboard → run /review if needed → gh pr merge. This is where code review tracking belongs — at merge time, not at plan time.
Effort: M Priority: P2 Depends on: Ship Confidence Dashboard (shipped)
Completeness
Completeness metrics dashboard
What: Track how often Claude chooses the complete option vs shortcut across gstack sessions. Aggregate into a dashboard showing completeness trend over time.
Why: Without measurement, we can't know if the Completeness Principle is working. Could surface patterns (e.g., certain skills still bias toward shortcuts).
Context: Would require logging choices (e.g., append to a JSONL file when AskUserQuestion resolves), parsing them, and displaying trends. Similar pattern to eval persistence.
Effort: M (human) / S (CC) Priority: P3 Depends on: Boil the Lake shipped (v0.6.1)
Safety & Observability
On-demand hook skills (/careful, /freeze, /guard)
What: Three new skills that use Claude Code's session-scoped PreToolUse hooks to add safety guardrails on demand.
Why: Anthropic's internal skill best practices recommend on-demand hooks for safety. Claude Code already handles destructive command permissions, but these add an explicit opt-in layer for high-risk sessions (touching prod, debugging live systems).
Skills:
/careful— PreToolUse hook on Bash tool. Warns (not blocks) before destructive commands:rm -rf,DROP TABLE,git push --force,git reset --hard,kubectl delete,docker system prune. UsespermissionDecision: "ask"so user can override./freeze— PreToolUse hook on Edit/Write tools. Restricts file edits to a user-specified directory. Great for debugging without accidentally "fixing" unrelated code./guard— meta-skill composing/careful+/freezeinto one command.
Implementation notes: Use ${CLAUDE_SKILL_DIR} (not ${SKILL_DIR}) for script paths in hook commands. Pure bash JSON parsing (no jq dependency). Freeze dir storage: ${CLAUDE_PLUGIN_DATA}/freeze-dir.txt with ~/.gstack/freeze-dir.txt fallback. Ensure trailing / on freeze dir paths to prevent /src matching /src-old.
Effort: M (human) / S (CC) Priority: P3 Depends on: None
Skill usage telemetry
What: Track which skills get invoked, how often, from which repo.
Why: Enables finding undertriggering skills and measuring adoption. Anthropic uses a PreToolUse hook for this; simpler approach is appending JSONL from the preamble.
Context: Add to generatePreamble() in scripts/gen-skill-docs.ts. Append to ~/.gstack/analytics/skill-usage.jsonl with skill name, timestamp, and repo name. mkdir -p ensures the directory exists.
Effort: S (human) / S (CC) Priority: P3 Depends on: None
Completed
Phase 1: Foundations (v0.2.0)
- Rename to gstack
- Restructure to monorepo layout
- Setup script for skill symlinks
- Snapshot command with ref-based element selection
- Snapshot tests Completed: v0.2.0
Phase 2: Enhanced Browser (v0.2.0)
- Annotated screenshots, snapshot diffing, dialog handling, file upload
- Cursor-interactive elements, element state checks
- CircularBuffer, async buffer flush, health check
- Playwright error wrapping, useragent fix
- 148 integration tests Completed: v0.2.0
Phase 3: QA Testing Agent (v0.3.0)
- /qa SKILL.md with 6-phase workflow, 3 modes (full/quick/regression)
- Issue taxonomy, severity classification, exploration checklist
- Report template, health score rubric, framework detection
- wait/console/cookie-import commands, find-browse binary Completed: v0.3.0
Phase 3.5: Browser Cookie Import (v0.3.x)
- cookie-import-browser command (Chromium cookie DB decryption)
- Cookie picker web UI, /setup-browser-cookies skill
- 18 unit tests, browser registry (Comet, Chrome, Arc, Brave, Edge) Completed: v0.3.1
E2E test cost tracking
- Track cumulative API spend, warn if over threshold Completed: v0.3.6
Auto-upgrade mode + smart update check
- Config CLI (
bin/gstack-config), auto-upgrade via~/.gstack/config.yaml, 12h cache TTL, exponential snooze backoff (24h→48h→1wk), "never ask again" option, vendored copy sync on upgrade Completed: v0.3.8