mirror of
https://github.com/garrytan/gstack.git
synced 2026-05-02 11:45:20 +02:00
7665adf4fe
* feat: CDP connect — control real Chrome/Comet via Playwright Add `connectCDP()` to BrowserManager: connects to a running browser via Chrome DevTools Protocol. All existing browse commands work unchanged through Playwright's abstraction layer. - chrome-launcher.ts: browser discovery, CDP probe, auto-relaunch with rollback - browser-manager.ts: connectCDP(), mode guards (close/closeTab/recreateContext/handoff), auto-reconnect on browser restart, getRefMap() for extension API - server.ts: CDP branch in start(), /health gains mode field, /refs endpoint, idle timer only resets on /command (not passive endpoints) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: browse connect/disconnect/focus CLI commands - connect: pre-server command that discovers browser, starts server in CDP mode - disconnect: drops CDP connection, restarts in headless mode - focus: brings browser window to foreground via osascript (macOS) - status: now shows Mode: cdp | launched | headed - startServer() accepts extra env vars for CDP URL/port passthrough Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: CDP-aware skill templates — skip cookie import in real browser mode Skills now check `$B status` for CDP mode and skip: - /qa: cookie import prompt, user-agent override, headless workarounds - /design-review: cookie import for authenticated pages - /setup-browser-cookies: returns "not needed" in CDP mode Regenerated SKILL.md files from updated templates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: activity streaming — SSE endpoint for Chrome extension Side Panel Real-time browse command feed via Server-Sent Events: - activity.ts: ActivityEntry type, CircularBuffer (capacity 1000), privacy filtering (redacts passwords, auth tokens, sensitive URL params), cursor-based gap detection, async subscriber notification - server.ts: /activity/stream SSE, /activity/history REST, handleCommand instrumented with command_start/command_end events - 18 unit tests for filterArgs privacy, emitActivity, subscribe lifecycle Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: Chrome extension Side Panel + Conductor API proposal Chrome extension (Manifest V3, sideload): - Side Panel with live activity feed, @ref overlays, dark terminal aesthetic - Background worker: health polling, SSE relay, ref fetching - Popup: port config, connection status, side panel launcher - Content script: floating ref panel with @ref badges Conductor API proposal (docs/designs/CONDUCTOR_SESSION_API.md): - SSE endpoint for full Claude Code session mirroring in Side Panel - Discovery via HTTP endpoint (not filesystem — extensions can't read files) TODOS.md: add $B watch, multi-agent tabs, cross-platform CDP, Web Store publishing. Mark CDP mode as shipped. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: detect Conductor runtime, skip osascript quit for sandboxed apps macOS App Management blocks Electron apps (Conductor) from quitting other apps via osascript. Now detects the runtime environment: - terminal/claude-code/codex: can manage apps freely - conductor: prints manual restart instructions + polls for 60s detectRuntime() checks env vars and parent process. When Chrome needs restart but we can't quit it, prints step-by-step instructions and waits for the user to restart Chrome with --remote-debugging-port. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: detect Conductor via actual env vars (CONDUCTOR_WORKSPACE_NAME) Previous detection checked CONDUCTOR_WORKSPACE_ID which doesn't exist. Conductor sets CONDUCTOR_WORKSPACE_NAME, CONDUCTOR_BIN_DIR, CONDUCTOR_PORT, and __CFBundleIdentifier=com.conductor.app. Check these FIRST because Conductor sessions also have ANTHROPIC_API_KEY (which was matching claude-code). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: connection status pill — floating indicator when gstack controls Chrome Small pill in bottom-right corner of every page: "● gstack · 3 refs" Shows when connected via CDP, fades to 30% opacity after 3s, full on hover. Disappears entirely when disconnected. Background worker now notifies content scripts on connect/disconnect state changes so the pill appears/disappears without polling. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Chrome requires --user-data-dir for remote debugging Chrome refuses --remote-debugging-port without an explicit --user-data-dir. Add userDataDir to BrowserBinary registry (macOS Application Support paths) and pass it in both auto-launch and manual restart instructions. Fix double-quoting in CLI manual restart instructions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: Chrome must be fully quit before launching with --remote-debugging-port Chrome refuses to enable CDP on its default profile when another instance is running (even with explicit --user-data-dir). The only reliable path: fully quit Chrome first, then relaunch with the flag. Updated instructions to emphasize this clearly with verification step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: bin/chrome-cdp — quit Chrome and relaunch with CDP in one command Quits Chrome gracefully, waits for full exit, relaunches with --remote-debugging-port, polls until CDP is ready. Usage: chrome-cdp [port] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use Playwright channel:chrome instead of broken connectOverCDP Playwright's connectOverCDP hangs with Chrome 146 due to CDP protocol version mismatch. Switch to channel:'chrome' which uses Playwright's native pipe protocol to launch the system Chrome binary directly. This is simpler and more reliable: - No CDP port discovery needed - No --remote-debugging-port or --user-data-dir hassles - $B connect just works — launches real Chrome headed window - All Playwright APIs (snapshot, click, fill) work unchanged bin/chrome-cdp updated with symlinked profile approach (kept for manual CDP use cases, but $B connect no longer needs it). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: green border + gstack label on controlled Chrome window Injects a 2px green border and small "gstack" label on every page loaded in the controlled Chrome window via context.addInitScript(). Users can instantly tell which Chrome window Claude controls. Also fixes close() for channel:chrome mode (uses browser.close() not browser.disconnect() which doesn't exist). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: cleanup chrome-launcher runtime detection, remove puppeteer-core dep Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style(design): redesign controlled Chrome indicator Replace crude green border + label with polished indicator: - 2px shimmer gradient at top edge (green→cyan→green, 3s loop) - Floating pill bottom-right with frosted glass bg, fades to 25% opacity after 4s so it doesn't compete with page content - prefers-reduced-motion disables shimmer animation - Much more subtle — looks like a developer tool, not broken CSS Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: document real browser mode + Chrome extension in BROWSER.md and README.md BROWSER.md: new sections for connect/disconnect/focus commands, Chrome extension Side Panel install, CDP-aware skills, activity streaming. Updated command reference table, key components, env vars, source map. README.md: updated /browse description, added "Real browser mode" to What's New section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: step-by-step Chrome extension install guide in BROWSER.md Replace terse bullet points with numbered walkthrough covering: developer mode toggle, load unpacked, macOS file picker tip (Cmd+Shift+G), pin extension, configure port, open side panel. Added troubleshooting section. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Cmd+Shift+. tip for hidden folders in macOS file picker macOS hides folders starting with . by default. Added both shortcuts: Cmd+Shift+G (paste path directly) and Cmd+Shift+. (show hidden files). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: integrate hidden folder tips into the install flow naturally Move Cmd+Shift+G and Cmd+Shift+. tips inline with the file picker step instead of as a separate tip block after it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: auto-load Chrome extension when $B connect launches Chrome Extension auto-loads via --load-extension flag — no manual chrome://extensions install needed. findExtensionPath() checks repo root, global install, and dev paths. Also adds bin/gstack-extension helper for manual install in regular Chrome, and rewrites BROWSER.md install docs with auto-load as primary path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: /connect-chrome skill — one command to launch Chrome with Side Panel New skill that runs $B connect, verifies the connection, guides the user to open the Side Panel, and demos the live activity feed. Extension auto-loads via --load-extension so no manual chrome://extensions install needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: use launchPersistentContext for Chrome extension loading Playwright's chromium.launch() silently ignores --load-extension. Switch to launchPersistentContext with ignoreDefaultArgs to remove --disable-extensions flag. Use bundled Chromium (real Chrome blocks unpacked extensions). Fixed port 34567 for CDP mode so the extension auto-connects. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sync extension to DESIGN.md — amber accent, zinc neutrals, grain texture Import design system from gstack-website. Update all extension colors: green (#4ade80) → amber (#F59E0B/#FBBF24), zinc gray neutrals, grain texture overlay. Regenerate icons as amber "G" monogram on dark background. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar chat with Claude Code — icon opens side panel directly Replace popup flyout with direct side panel open on icon click. Primary UI is now a chat interface that sends messages to Claude Code via file queue. Activity/Refs tabs moved behind a debug toggle in the footer. Command bar with history, auto-poll for responses, amber design system. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar agent — Claude-powered chat backend via file queue Add /sidebar-command, /sidebar-response, and /sidebar-chat endpoints to the browse server. sidebar-agent.ts watches the command queue file, spawns claude -p with browse context for each message, and streams responses back to the sidebar chat. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove duplicate gstack pill overlay, hide crash restore bubble The addInitScript indicator and the extension's content script were both injecting bottom-right pills, causing duplicates. Remove the pill from addInitScript (extension handles it). Replace --restore-last-session with --hide-crash-restore-bubble to suppress the "Chromium didn't shut down correctly" dialog. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: state file authority — CDP server cannot be silently replaced Hardens the connect/disconnect lifecycle: - ensureServer() refuses to auto-start headless when CDP server is alive - $B connect does full cleanup: SIGTERM → 2s → SIGKILL, profile locks, state - shutdown() cleans Chromium SingletonLock/Socket/Cookie files - uncaughtException/unhandledRejection handlers do emergency cleanup This prevents the bug where a headless server overwrites the CDP server's state file, causing $B commands to hit the wrong browser. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar agent streaming events + session state management Enhance sidebar-agent.ts with: - Live streaming of claude -p events (tool_use, text, result) to sidebar - Session state file for BROWSE_STATE_FILE propagation to claude subprocess - Improved logging (stderr, exit codes, event types) - stdin.end() to prevent claude waiting for input - summarizeToolInput() with path shortening for compact sidebar display Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: sidebar chat UI — streaming events, agent status, reconnect retry Sidebar panel improvements: - Chat tab renders streaming agent events (tool_use, text, result) - Thinking dots animation while agent processes - Agent error display with styled error blocks - tryConnect() with 2s retry loop for initial connection - Debug tabs (Activity/Refs) hidden behind gear toggle - Clear chat button - Compact tool call display with path shortening Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: server-integrated sidebar agent with sessions and message queue Move the sidebar agent from a separate bun process into server.ts: - Agent spawns claude -p directly when messages arrive via /sidebar-command - In-memory chat buffer backed by per-session chat.jsonl on disk - Session manager: create, load, persist, list sessions - Message queue (cap 5) with agent status tracking (idle/processing/hung) - Stop/kill endpoints with queue dismiss support - /health now returns agent status + session info - All sidebar endpoints require Bearer auth - Agent killed on server shutdown - 120s timeout detects hung claude processes Eliminates: file-queue polling, separate sidebar-agent.ts process, stale auth tokens, state file conflicts between processes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: extension auth + token flow for server-integrated agent Update Chrome extension to use Bearer auth on all sidebar endpoints: - background.js captures auth token from /health, exposes via getToken msg - background.js sets openPanelOnActionClick for direct side panel access - sidepanel.js gets token from background, sends in all fetch headers - Health broadcasts include token so sidebar auto-authenticates - Removes popup from manifest — icon click opens side panel directly Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: self-healing sidebar — reconnect banner, state machine, copy button Sidebar UI now handles disconnection gracefully: - Connection state machine: connected → reconnecting → dead - Amber pulsing banner during reconnect (2s retry, 30 attempts) - Red "Server offline" banner with Reconnect + Copy /connect-chrome buttons - Green "Reconnected" toast that fades after 3s on successful reconnect - Copy button lets user paste /connect-chrome into any Claude Code session Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: crash handling — save session, kill agent, distinct exit codes Hardened shutdown/crash behavior: - Browser disconnect exits with code 2 (distinct from crash code 1) - emergencyCleanup kills agent subprocess and saves session state - Clean shutdown saves session before exit (chat history persists) - Clear user message on browser disconnect: "Run $B connect to reconnect" Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: worktree-per-session isolation for sidebar agent Each sidebar session gets an isolated git worktree so the agent's file operations don't conflict with the user's working directory: - createWorktree() creates detached HEAD worktree in ~/.gstack/worktrees/ - Falls back to main cwd for non-git repos or on creation failure - Handles collision cleanup from prior crashes - removeWorktree() cleans up on session switch and shutdown - worktreePath persisted in session.json Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(qa): ISSUE-001 — disconnect blocked by CDP guard in ensureServer $B disconnect was routed through ensureServer() which refused to start a headless server when a CDP state file existed. Disconnect is now handled before ensureServer() (like connect), with force-kill + cleanup fallback when the CDP server is unresponsive. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve claude binary path for daemon-spawned agent The browse server runs as a daemon and may not inherit the user's shell PATH. Add findClaudeBin() that checks ~/.local/bin/claude (standard install location), which claude, and common system paths. Shows a clear error in the sidebar chat if claude CLI is not found. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve claude symlinks + check Conductor bundled binary posix_spawn fails on symlinks in compiled bun binaries. Now: - Checks Conductor app's bundled binary first (not a symlink) - Scans ~/.local/share/claude/versions/ for direct versioned binaries - Uses fs.realpathSync() to resolve symlinks before spawning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: compiled bun binary cannot posix_spawn — use external agent process Compiled bun binaries fail posix_spawn on ALL executables (even /bin/bash). The server now writes to an agent queue file, and a separate non-compiled bun process (sidebar-agent.ts) reads the queue, spawns claude, and POSTs events back via /sidebar-agent/event. Changes: - server.ts: spawnClaude writes to queue file instead of spawning directly - server.ts: new /sidebar-agent/event endpoint for agent → server relay - server.ts: fix result event field name (event.text vs event.result) - sidebar-agent.ts: rewritten to poll queue file, relay events via HTTP - cli.ts: $B connect auto-starts sidebar-agent as non-compiled bun process Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: loading spinner on sidebar open while connecting to server Shows an amber spinner with "Connecting..." when the sidebar first opens, replacing the empty state. After the first successful /sidebar-chat poll: - If chat history exists: renders it immediately - If no history: shows the welcome message Prevents the jarring empty-then-populated flash on sidebar open. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: zero-friction side panel — auto-open on install, pill is clickable Three changes to eliminate manual side panel setup: - Auto-open side panel on extension install/update (onInstalled listener) - gstack pill (bottom-right) is now clickable — opens the side panel - Pill has pointer-events: auto so clicks always register (was: none) User no longer needs to find the puzzle piece icon, pin the extension, or know the side panel exists. It opens automatically on first launch and can be re-opened by clicking the floating gstack pill. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor: kill CDP naming, delete chrome-launcher.ts dead code The connectCDP() method and connectionMode: 'cdp' naming was a legacy artifact — real Chrome was tried but failed (silently blocks --load-extension), so the implementation already used Playwright's bundled Chromium via launchPersistentContext(). The naming was misleading. Changes: - Delete chrome-launcher.ts (361 LOC) — only import was in unreachable attemptReconnect() method - Delete dead attemptReconnect() and reconnecting field - Delete preExistingTabIds (was for protecting real Chrome tabs we never connect to) - Rename connectCDP() → launchHeaded() - Rename connectionMode: 'cdp' → 'headed' across all files - Replace BROWSE_CDP_URL/BROWSE_CDP_PORT env vars with BROWSE_HEADED=1 - Regenerate SKILL.md files for updated command descriptions - Move BrowserManager unit tests to browser-manager-unit.test.ts Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: converge handoff into connect — extension loads on handoff Handoff now uses launchPersistentContext() with extension auto-loading, same as the connect/launchHeaded() path. This means when the agent gets stuck (2FA, CAPTCHA) and hands off to the user, the Chrome extension + side panel are available automatically. Before: handoff used chromium.launch() + newContext() — no extension After: handoff uses chromium.launchPersistentContext() — extension loads Also sets connectionMode to 'headed' and disables dialog auto-accept on handoff, matching connect behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: gate sidebar chat behind --chat flag $B connect (default): headed Chromium + extension with Activity + Refs tabs only. No separate agent spawned. Clean, no confusion. $B connect --chat: same + Chat tab with standalone claude -p agent. Shows experimental banner: "Standalone mode — this is a separate agent from your workspace." Implementation: - cli.ts: parse --chat, set BROWSE_SIDEBAR_CHAT env, conditionally spawn sidebar-agent - server.ts: gate /sidebar-* routes behind chatEnabled, return 403 when disabled, include chatEnabled in /health response - sidepanel.js: applyChatEnabled() hides/shows Chat tab + banner - background.js: forward chatEnabled from health response - sidepanel.html/css: experimental banner with amber styling Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: file drop relay + $B inbox command Sidebar agent now writes structured messages to .context/sidebar-inbox/ when processing user input. The workspace agent can read these via $B inbox to see what the user reported from the browser. File drop format: .context/sidebar-inbox/{timestamp}-observation.json { type, timestamp, page: {url}, userMessage, sidebarSessionId } Atomic writes (tmp + rename) prevent partial reads. $B inbox --clear removes messages after display. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: $B watch — passive observation mode Claude enters read-only mode and captures periodic snapshots (every 5s) while the user browses. Mutation commands (click, fill, etc.) are blocked during watch. $B watch stop exits and returns a summary with the last snapshot. Requires headed mode ($B connect). This is the inverse of the scout pattern — the workspace agent watches through the browser instead of the sidebar relaying to it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add coverage for sidebar-agent, file-drop, and watch mode 33 new tests covering: - Sidebar agent queue parsing (valid/malformed/empty JSONL) - writeToInbox file drop (directory creation, atomic writes, JSON format) - Inbox command (display, sorting, --clear, malformed file handling) - Watch mode state machine (start/stop cycles, snapshots, duration) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: TODOS cleanup + Chrome vs Chromium exploration doc - Update TODOS.md: mark CDP mode, $B watch, sidebar scout as SHIPPED - Delete dead "cross-platform CDP browser discovery" TODO - Rename dependencies from "CDP connect" to "headed mode" - Add docs/designs/CHROME_VS_CHROMIUM_EXPLORATION.md memorializing the architecture exploration and decision to use Playwright Chromium Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add Conductor Chrome sidebar integration design doc Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: sidebar-agent validates cwd before spawning claude The queue entry may reference a worktree that was cleaned up between sessions. Now falls back to process.cwd() if the path doesn't exist, preventing silent spawn failures. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: gen-skill-docs resolver merge + preamble tier gate + plan file discovery The local RESOLVERS record in gen-skill-docs.ts was shadowing the imported canonical resolvers, causing stale test coverage and preamble generators to be used instead of the authoritative versions in resolvers/. Changes: - Merge imported RESOLVERS with local overrides (spread + override pattern) - Fix preamble tier gate: tier 1 skills no longer get AskUserQuestion format - Make plan file discovery host-agnostic (search multiple plan dirs) - Add missing E2E tier entries for ship/review plan completion tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: ungate sidebar agent + raise timeout to 5 minutes (v0.12.0) Sidebar chat is now always available in headed mode — no --chat flag needed. Agent tasks get 5 minutes instead of 2, enabling multi-page workflows like navigating directories and filling forms across pages. Changes: - cli.ts: remove --chat flag, always set BROWSE_SIDEBAR_CHAT=1, always spawn agent - server.ts: remove chatEnabled gate (403 response), raise AGENT_TIMEOUT_MS to 300s - sidebar-agent.ts: raise child process timeout from 120s to 300s Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: headed mode + sidebar agent documentation (v0.12.0) - README: sidebar agent section, personal automation example (school parent portal), two auth paths (manual login + cookie import), DevTools MCP mention - BROWSER.md: sidebar agent section with usage, timeout, session isolation, authentication, and random delay documentation - connect-chrome template: add sidebar chat onboarding step - CHANGELOG: v0.12.0 entry covering headed mode, sidebar agent, extension - VERSION: bump to 0.12.0.0 - TODOS: Chrome DevTools MCP integration as P0 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files Generated from updated templates + resolver merge. Key changes: - Tier 1 skills no longer include AskUserQuestion format section - Ship/review skills now include coverage gate with thresholds - Connect-chrome skill includes sidebar chat onboarding step - Plan file discovery uses host-agnostic paths Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate Codex connect-chrome skill Updated preamble with proactive prompt and sidebar chat onboarding step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: network idle, state persistence, iframe support, chain pipe format (v0.12.1.0) (#516) * feat: network idle detection + chain pipe format - Upgrade click/fill/select from domcontentloaded to networkidle wait (2s timeout, best-effort). Catches XHR/fetch triggered by interactions. - Add pipe-delimited format to chain as JSON fallback: $B chain 'goto url | click @e5 | snapshot -ic' - Add post-loop networkidle wait in chain when last command was a write. - Frame-aware: commands use target (getActiveFrameOrPage) for locator ops, page-only ops (goto/back/forward/reload) guard against frame context. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat: $B state save/load + $B frame — new browse commands - state save/load: persist cookies + URLs to .gstack/browse-states/{name}.json File perms 0o600, name sanitized to [a-zA-Z0-9_-]. V1 skips localStorage (breaks on load-before-navigate). Load replaces session via closeAllPages(). - frame: switch command context to iframe via CSS selector, @ref, --name, or --url. 'frame main' returns to main frame. Execution target abstraction (getActiveFrameOrPage) across read-commands, snapshot, and write-commands. - Frame context cleared on tab switch, navigation, resume, and handoff. - Snapshot shows [Context: iframe src="..."] header when in frame. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test: add tests for network idle, chain pipe format, state, and frame - Network idle: click on fetch button waits for XHR, static click is fast - Chain pipe: pipe-delimited commands, quoted args, JSON still works - State: save/load round-trip, name sanitization, missing state error - Frame: switch to iframe + back, snapshot context header, fill in frame, goto-in-frame guard, usage error New fixtures: network-idle.html (fetch + static buttons), iframe.html (srcdoc) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: review fixes — iframe ref scoping, detached frame recovery, state validation - snapshot.ts: ref locators, cursor-interactive scan, and cursor locator now use target (frame-aware) instead of page — fixes @ref clicking in iframes - browser-manager.ts: getActiveFrameOrPage auto-recovers from detached frames via isDetached() check - meta-commands.ts: state load resets activeFrame, elementHandle disposed after contentFrame(), state file schema validation (cookies + pages arrays), filter empty pipe segments in chain tokenizer - write-commands.ts: upload command uses target.locator() for frame support Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate SKILL.md files + rebuild binary Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: bump version and changelog (v0.12.1.0) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
836 lines
44 KiB
TypeScript
836 lines
44 KiB
TypeScript
import type { TemplateContext } from './types';
|
||
|
||
export function generateReviewDashboard(_ctx: TemplateContext): string {
|
||
return `## Review Readiness Dashboard
|
||
|
||
After completing the review, read the review log and config to display the dashboard.
|
||
|
||
\`\`\`bash
|
||
~/.claude/skills/gstack/bin/gstack-review-read
|
||
\`\`\`
|
||
|
||
Parse the output. Find the most recent entry for each skill (plan-ceo-review, plan-eng-review, review, plan-design-review, design-review-lite, adversarial-review, codex-review, codex-plan-review). Ignore entries with timestamps older than 7 days. For the Eng Review row, show whichever is more recent between \`review\` (diff-scoped pre-landing review) and \`plan-eng-review\` (plan-stage architecture review). Append "(DIFF)" or "(PLAN)" to the status to distinguish. For the Adversarial row, show whichever is more recent between \`adversarial-review\` (new auto-scaled) and \`codex-review\` (legacy). For Design Review, show whichever is more recent between \`plan-design-review\` (full visual audit) and \`design-review-lite\` (code-level check). Append "(FULL)" or "(LITE)" to the status to distinguish. For the Outside Voice row, show the most recent \`codex-plan-review\` entry — this captures outside voices from both /plan-ceo-review and /plan-eng-review.
|
||
|
||
**Source attribution:** If the most recent entry for a skill has a \\\`"via"\\\` field, append it to the status label in parentheses. Examples: \`plan-eng-review\` with \`via:"autoplan"\` shows as "CLEAR (PLAN via /autoplan)". \`review\` with \`via:"ship"\` shows as "CLEAR (DIFF via /ship)". Entries without a \`via\` field show as "CLEAR (PLAN)" or "CLEAR (DIFF)" as before.
|
||
|
||
Note: \`autoplan-voices\` and \`design-outside-voices\` entries are audit-trail-only (forensic data for cross-model consensus analysis). They do not appear in the dashboard and are not checked by any consumer.
|
||
|
||
Display:
|
||
|
||
\`\`\`
|
||
+====================================================================+
|
||
| REVIEW READINESS DASHBOARD |
|
||
+====================================================================+
|
||
| Review | Runs | Last Run | Status | Required |
|
||
|-----------------|------|---------------------|-----------|----------|
|
||
| Eng Review | 1 | 2026-03-16 15:00 | CLEAR | YES |
|
||
| CEO Review | 0 | — | — | no |
|
||
| Design Review | 0 | — | — | no |
|
||
| Adversarial | 0 | — | — | no |
|
||
| Outside Voice | 0 | — | — | no |
|
||
+--------------------------------------------------------------------+
|
||
| VERDICT: CLEARED — Eng Review passed |
|
||
+====================================================================+
|
||
\`\`\`
|
||
|
||
**Review tiers:**
|
||
- **Eng Review (required by default):** The only review that gates shipping. Covers architecture, code quality, tests, performance. Can be disabled globally with \\\`gstack-config set skip_eng_review true\\\` (the "don't bother me" setting).
|
||
- **CEO Review (optional):** Use your judgment. Recommend it for big product/business changes, new user-facing features, or scope decisions. Skip for bug fixes, refactors, infra, and cleanup.
|
||
- **Design Review (optional):** Use your judgment. Recommend it for UI/UX changes. Skip for backend-only, infra, or prompt-only changes.
|
||
- **Adversarial Review (automatic):** Auto-scales by diff size. Small diffs (<50 lines) skip adversarial. Medium diffs (50–199) get cross-model adversarial. Large diffs (200+) get all 4 passes: Claude structured, Codex structured, Claude adversarial subagent, Codex adversarial. No configuration needed.
|
||
- **Outside Voice (optional):** Independent plan review from a different AI model. Offered after all review sections complete in /plan-ceo-review and /plan-eng-review. Falls back to Claude subagent if Codex is unavailable. Never gates shipping.
|
||
|
||
**Verdict logic:**
|
||
- **CLEARED**: Eng Review has >= 1 entry within 7 days from either \\\`review\\\` or \\\`plan-eng-review\\\` with status "clean" (or \\\`skip_eng_review\\\` is \\\`true\\\`)
|
||
- **NOT CLEARED**: Eng Review missing, stale (>7 days), or has open issues
|
||
- CEO, Design, and Codex reviews are shown for context but never block shipping
|
||
- If \\\`skip_eng_review\\\` config is \\\`true\\\`, Eng Review shows "SKIPPED (global)" and verdict is CLEARED
|
||
|
||
**Staleness detection:** After displaying the dashboard, check if any existing reviews may be stale:
|
||
- Parse the \\\`---HEAD---\\\` section from the bash output to get the current HEAD commit hash
|
||
- For each review entry that has a \\\`commit\\\` field: compare it against the current HEAD. If different, count elapsed commits: \\\`git rev-list --count STORED_COMMIT..HEAD\\\`. Display: "Note: {skill} review from {date} may be stale — {N} commits since review"
|
||
- For entries without a \\\`commit\\\` field (legacy entries): display "Note: {skill} review from {date} has no commit tracking — consider re-running for accurate staleness detection"
|
||
- If all reviews match the current HEAD, do not display any staleness notes`;
|
||
}
|
||
|
||
export function generatePlanFileReviewReport(_ctx: TemplateContext): string {
|
||
return `## Plan File Review Report
|
||
|
||
After displaying the Review Readiness Dashboard in conversation output, also update the
|
||
**plan file** itself so review status is visible to anyone reading the plan.
|
||
|
||
### Detect the plan file
|
||
|
||
1. Check if there is an active plan file in this conversation (the host provides plan file
|
||
paths in system messages — look for plan file references in the conversation context).
|
||
2. If not found, skip this section silently — not every review runs in plan mode.
|
||
|
||
### Generate the report
|
||
|
||
Read the review log output you already have from the Review Readiness Dashboard step above.
|
||
Parse each JSONL entry. Each skill logs different fields:
|
||
|
||
- **plan-ceo-review**: \\\`status\\\`, \\\`unresolved\\\`, \\\`critical_gaps\\\`, \\\`mode\\\`, \\\`scope_proposed\\\`, \\\`scope_accepted\\\`, \\\`scope_deferred\\\`, \\\`commit\\\`
|
||
→ Findings: "{scope_proposed} proposals, {scope_accepted} accepted, {scope_deferred} deferred"
|
||
→ If scope fields are 0 or missing (HOLD/REDUCTION mode): "mode: {mode}, {critical_gaps} critical gaps"
|
||
- **plan-eng-review**: \\\`status\\\`, \\\`unresolved\\\`, \\\`critical_gaps\\\`, \\\`issues_found\\\`, \\\`mode\\\`, \\\`commit\\\`
|
||
→ Findings: "{issues_found} issues, {critical_gaps} critical gaps"
|
||
- **plan-design-review**: \\\`status\\\`, \\\`initial_score\\\`, \\\`overall_score\\\`, \\\`unresolved\\\`, \\\`decisions_made\\\`, \\\`commit\\\`
|
||
→ Findings: "score: {initial_score}/10 → {overall_score}/10, {decisions_made} decisions"
|
||
- **codex-review**: \\\`status\\\`, \\\`gate\\\`, \\\`findings\\\`, \\\`findings_fixed\\\`
|
||
→ Findings: "{findings} findings, {findings_fixed}/{findings} fixed"
|
||
|
||
All fields needed for the Findings column are now present in the JSONL entries.
|
||
For the review you just completed, you may use richer details from your own Completion
|
||
Summary. For prior reviews, use the JSONL fields directly — they contain all required data.
|
||
|
||
Produce this markdown table:
|
||
|
||
\\\`\\\`\\\`markdown
|
||
## GSTACK REVIEW REPORT
|
||
|
||
| Review | Trigger | Why | Runs | Status | Findings |
|
||
|--------|---------|-----|------|--------|----------|
|
||
| CEO Review | \\\`/plan-ceo-review\\\` | Scope & strategy | {runs} | {status} | {findings} |
|
||
| Codex Review | \\\`/codex review\\\` | Independent 2nd opinion | {runs} | {status} | {findings} |
|
||
| Eng Review | \\\`/plan-eng-review\\\` | Architecture & tests (required) | {runs} | {status} | {findings} |
|
||
| Design Review | \\\`/plan-design-review\\\` | UI/UX gaps | {runs} | {status} | {findings} |
|
||
\\\`\\\`\\\`
|
||
|
||
Below the table, add these lines (omit any that are empty/not applicable):
|
||
|
||
- **CODEX:** (only if codex-review ran) — one-line summary of codex fixes
|
||
- **CROSS-MODEL:** (only if both Claude and Codex reviews exist) — overlap analysis
|
||
- **UNRESOLVED:** total unresolved decisions across all reviews
|
||
- **VERDICT:** list reviews that are CLEAR (e.g., "CEO + ENG CLEARED — ready to implement").
|
||
If Eng Review is not CLEAR and not skipped globally, append "eng review required".
|
||
|
||
### Write to the plan file
|
||
|
||
**PLAN MODE EXCEPTION — ALWAYS RUN:** This writes to the plan file, which is the one
|
||
file you are allowed to edit in plan mode. The plan file review report is part of the
|
||
plan's living status.
|
||
|
||
- Search the plan file for a \\\`## GSTACK REVIEW REPORT\\\` section **anywhere** in the file
|
||
(not just at the end — content may have been added after it).
|
||
- If found, **replace it** entirely using the Edit tool. Match from \\\`## GSTACK REVIEW REPORT\\\`
|
||
through either the next \\\`## \\\` heading or end of file, whichever comes first. This ensures
|
||
content added after the report section is preserved, not eaten. If the Edit fails
|
||
(e.g., concurrent edit changed the content), re-read the plan file and retry once.
|
||
- If no such section exists, **append it** to the end of the plan file.
|
||
- Always place it as the very last section in the plan file. If it was found mid-file,
|
||
move it: delete the old location and append at the end.`;
|
||
}
|
||
|
||
export function generateSpecReviewLoop(_ctx: TemplateContext): string {
|
||
return `## Spec Review Loop
|
||
|
||
Before presenting the document to the user for approval, run an adversarial review.
|
||
|
||
**Step 1: Dispatch reviewer subagent**
|
||
|
||
Use the Agent tool to dispatch an independent reviewer. The reviewer has fresh context
|
||
and cannot see the brainstorming conversation — only the document. This ensures genuine
|
||
adversarial independence.
|
||
|
||
Prompt the subagent with:
|
||
- The file path of the document just written
|
||
- "Read this document and review it on 5 dimensions. For each dimension, note PASS or
|
||
list specific issues with suggested fixes. At the end, output a quality score (1-10)
|
||
across all dimensions."
|
||
|
||
**Dimensions:**
|
||
1. **Completeness** — Are all requirements addressed? Missing edge cases?
|
||
2. **Consistency** — Do parts of the document agree with each other? Contradictions?
|
||
3. **Clarity** — Could an engineer implement this without asking questions? Ambiguous language?
|
||
4. **Scope** — Does the document creep beyond the original problem? YAGNI violations?
|
||
5. **Feasibility** — Can this actually be built with the stated approach? Hidden complexity?
|
||
|
||
The subagent should return:
|
||
- A quality score (1-10)
|
||
- PASS if no issues, or a numbered list of issues with dimension, description, and fix
|
||
|
||
**Step 2: Fix and re-dispatch**
|
||
|
||
If the reviewer returns issues:
|
||
1. Fix each issue in the document on disk (use Edit tool)
|
||
2. Re-dispatch the reviewer subagent with the updated document
|
||
3. Maximum 3 iterations total
|
||
|
||
**Convergence guard:** If the reviewer returns the same issues on consecutive iterations
|
||
(the fix didn't resolve them or the reviewer disagrees with the fix), stop the loop
|
||
and persist those issues as "Reviewer Concerns" in the document rather than looping
|
||
further.
|
||
|
||
If the subagent fails, times out, or is unavailable — skip the review loop entirely.
|
||
Tell the user: "Spec review unavailable — presenting unreviewed doc." The document is
|
||
already written to disk; the review is a quality bonus, not a gate.
|
||
|
||
**Step 3: Report and persist metrics**
|
||
|
||
After the loop completes (PASS, max iterations, or convergence guard):
|
||
|
||
1. Tell the user the result — summary by default:
|
||
"Your doc survived N rounds of adversarial review. M issues caught and fixed.
|
||
Quality score: X/10."
|
||
If they ask "what did the reviewer find?", show the full reviewer output.
|
||
|
||
2. If issues remain after max iterations or convergence, add a "## Reviewer Concerns"
|
||
section to the document listing each unresolved issue. Downstream skills will see this.
|
||
|
||
3. Append metrics:
|
||
\`\`\`bash
|
||
mkdir -p ~/.gstack/analytics
|
||
echo '{"skill":"${_ctx.skillName}","ts":"'$(date -u +%Y-%m-%dT%H:%M:%SZ)'","iterations":ITERATIONS,"issues_found":FOUND,"issues_fixed":FIXED,"remaining":REMAINING,"quality_score":SCORE}' >> ~/.gstack/analytics/spec-review.jsonl 2>/dev/null || true
|
||
\`\`\`
|
||
Replace ITERATIONS, FOUND, FIXED, REMAINING, SCORE with actual values from the review.`;
|
||
}
|
||
|
||
export function generateBenefitsFrom(ctx: TemplateContext): string {
|
||
if (!ctx.benefitsFrom || ctx.benefitsFrom.length === 0) return '';
|
||
|
||
const skillList = ctx.benefitsFrom.map(s => `\`/${s}\``).join(' or ');
|
||
const first = ctx.benefitsFrom[0];
|
||
|
||
return `## Prerequisite Skill Offer
|
||
|
||
When the design doc check above prints "No design doc found," offer the prerequisite
|
||
skill before proceeding.
|
||
|
||
Say to the user via AskUserQuestion:
|
||
|
||
> "No design doc found for this branch. ${skillList} produces a structured problem
|
||
> statement, premise challenge, and explored alternatives — it gives this review much
|
||
> sharper input to work with. Takes about 10 minutes. The design doc is per-feature,
|
||
> not per-product — it captures the thinking behind this specific change."
|
||
|
||
Options:
|
||
- A) Run /${first} now (we'll pick up the review right after)
|
||
- B) Skip — proceed with standard review
|
||
|
||
If they skip: "No worries — standard review. If you ever want sharper input, try
|
||
/${first} first next time." Then proceed normally. Do not re-offer later in the session.
|
||
|
||
If they choose A:
|
||
|
||
Say: "Running /${first} inline. Once the design doc is ready, I'll pick up
|
||
the review right where we left off."
|
||
|
||
Read the ${first} skill file from disk using the Read tool:
|
||
\`~/.claude/skills/gstack/${first}/SKILL.md\`
|
||
|
||
Follow it inline, **skipping these sections** (already handled by the parent skill):
|
||
- Preamble (run first)
|
||
- AskUserQuestion Format
|
||
- Completeness Principle — Boil the Lake
|
||
- Search Before Building
|
||
- Contributor Mode
|
||
- Completion Status Protocol
|
||
- Telemetry (run last)
|
||
|
||
If the Read fails (file not found), say:
|
||
"Could not load /${first} — proceeding with standard review."
|
||
|
||
After /${first} completes, re-run the design doc check:
|
||
\`\`\`bash
|
||
SLUG=$(~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
|
||
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '-' || echo 'no-branch')
|
||
DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-$BRANCH-design-*.md 2>/dev/null | head -1)
|
||
[ -z "$DESIGN" ] && DESIGN=$(ls -t ~/.gstack/projects/$SLUG/*-design-*.md 2>/dev/null | head -1)
|
||
[ -n "$DESIGN" ] && echo "Design doc found: $DESIGN" || echo "No design doc found"
|
||
\`\`\`
|
||
|
||
If a design doc is now found, read it and continue the review.
|
||
If none was produced (user may have cancelled), proceed with standard review.`;
|
||
}
|
||
|
||
export function generateCodexSecondOpinion(ctx: TemplateContext): string {
|
||
// Codex host: strip entirely — Codex should never invoke itself
|
||
if (ctx.host === 'codex') return '';
|
||
|
||
return `## Phase 3.5: Cross-Model Second Opinion (optional)
|
||
|
||
**Binary check first — no question if unavailable:**
|
||
|
||
\`\`\`bash
|
||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||
\`\`\`
|
||
|
||
If \`CODEX_NOT_AVAILABLE\`: skip Phase 3.5 entirely — no message, no AskUserQuestion. Proceed directly to Phase 4.
|
||
|
||
If \`CODEX_AVAILABLE\`: use AskUserQuestion:
|
||
|
||
> Want a second opinion from a different AI model? Codex will independently review your problem statement, key answers, premises, and any landscape findings from this session. It hasn't seen this conversation — it gets a structured summary. Usually takes 2-5 minutes.
|
||
> A) Yes, get a second opinion
|
||
> B) No, proceed to alternatives
|
||
|
||
If B: skip Phase 3.5 entirely. Remember that Codex did NOT run (affects design doc, founder signals, and Phase 4 below).
|
||
|
||
**If A: Run the Codex cold read.**
|
||
|
||
1. Assemble a structured context block from Phases 1-3:
|
||
- Mode (Startup or Builder)
|
||
- Problem statement (from Phase 1)
|
||
- Key answers from Phase 2A/2B (summarize each Q&A in 1-2 sentences, include verbatim user quotes)
|
||
- Landscape findings (from Phase 2.75, if search was run)
|
||
- Agreed premises (from Phase 3)
|
||
- Codebase context (project name, languages, recent activity)
|
||
|
||
2. **Write the assembled prompt to a temp file** (prevents shell injection from user-derived content):
|
||
|
||
\`\`\`bash
|
||
CODEX_PROMPT_FILE=$(mktemp /tmp/gstack-codex-oh-XXXXXXXX.txt)
|
||
\`\`\`
|
||
|
||
Write the full prompt (context block + instructions) to this file. Use the mode-appropriate variant:
|
||
|
||
**Startup mode instructions:** "You are an independent technical advisor reading a transcript of a startup brainstorming session. [CONTEXT BLOCK HERE]. Your job: 1) What is the STRONGEST version of what this person is trying to build? Steelman it in 2-3 sentences. 2) What is the ONE thing from their answers that reveals the most about what they should actually build? Quote it and explain why. 3) Name ONE agreed premise you think is wrong, and what evidence would prove you right. 4) If you had 48 hours and one engineer to build a prototype, what would you build? Be specific — tech stack, features, what you'd skip. Be direct. Be terse. No preamble."
|
||
|
||
**Builder mode instructions:** "You are an independent technical advisor reading a transcript of a builder brainstorming session. [CONTEXT BLOCK HERE]. Your job: 1) What is the COOLEST version of this they haven't considered? 2) What's the ONE thing from their answers that reveals what excites them most? Quote it. 3) What existing open source project or tool gets them 50% of the way there — and what's the 50% they'd need to build? 4) If you had a weekend to build this, what would you build first? Be specific. Be direct. No preamble."
|
||
|
||
3. Run Codex:
|
||
|
||
\`\`\`bash
|
||
TMPERR_OH=$(mktemp /tmp/codex-oh-err-XXXXXXXX)
|
||
codex exec "$(cat "$CODEX_PROMPT_FILE")" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_OH"
|
||
\`\`\`
|
||
|
||
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
|
||
\`\`\`bash
|
||
cat "$TMPERR_OH"
|
||
rm -f "$TMPERR_OH" "$CODEX_PROMPT_FILE"
|
||
\`\`\`
|
||
|
||
**Error handling:** All errors are non-blocking — Codex second opinion is a quality enhancement, not a prerequisite.
|
||
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key": "Codex authentication failed. Run \\\`codex login\\\` to authenticate. Skipping second opinion."
|
||
- **Timeout:** "Codex timed out after 5 minutes. Skipping second opinion."
|
||
- **Empty response:** "Codex returned no response. Stderr: <paste relevant error>. Skipping second opinion."
|
||
|
||
On any error, proceed to Phase 4 — do NOT fall back to a Claude subagent (this is brainstorming, not adversarial review).
|
||
|
||
4. **Presentation:**
|
||
|
||
\`\`\`
|
||
SECOND OPINION (Codex):
|
||
════════════════════════════════════════════════════════════
|
||
<full codex output, verbatim — do not truncate or summarize>
|
||
════════════════════════════════════════════════════════════
|
||
\`\`\`
|
||
|
||
5. **Cross-model synthesis:** After presenting Codex output, provide 3-5 bullet synthesis:
|
||
- Where Claude agrees with Codex
|
||
- Where Claude disagrees and why
|
||
- Whether Codex's challenged premise changes Claude's recommendation
|
||
|
||
6. **Premise revision check:** If Codex challenged an agreed premise, use AskUserQuestion:
|
||
|
||
> Codex challenged premise #{N}: "{premise text}". Their argument: "{reasoning}".
|
||
> A) Revise this premise based on Codex's input
|
||
> B) Keep the original premise — proceed to alternatives
|
||
|
||
If A: revise the premise and note the revision. If B: proceed (and note that the user defended this premise with reasoning — this is a founder signal if they articulate WHY they disagree, not just dismiss).`;
|
||
}
|
||
|
||
export function generateAdversarialStep(ctx: TemplateContext): string {
|
||
// Codex host: strip entirely — Codex should never invoke itself
|
||
if (ctx.host === 'codex') return '';
|
||
|
||
const isShip = ctx.skillName === 'ship';
|
||
const stepNum = isShip ? '3.8' : '5.7';
|
||
|
||
return `## Step ${stepNum}: Adversarial review (auto-scaled)
|
||
|
||
Adversarial review thoroughness scales automatically based on diff size. No configuration needed.
|
||
|
||
**Detect diff size and tool availability:**
|
||
|
||
\`\`\`bash
|
||
DIFF_INS=$(git diff origin/<base> --stat | tail -1 | grep -oE '[0-9]+ insertion' | grep -oE '[0-9]+' || echo "0")
|
||
DIFF_DEL=$(git diff origin/<base> --stat | tail -1 | grep -oE '[0-9]+ deletion' | grep -oE '[0-9]+' || echo "0")
|
||
DIFF_TOTAL=$((DIFF_INS + DIFF_DEL))
|
||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||
# Respect old opt-out
|
||
OLD_CFG=$(~/.claude/skills/gstack/bin/gstack-config get codex_reviews 2>/dev/null || true)
|
||
echo "DIFF_SIZE: $DIFF_TOTAL"
|
||
echo "OLD_CFG: \${OLD_CFG:-not_set}"
|
||
\`\`\`
|
||
|
||
If \`OLD_CFG\` is \`disabled\`: skip this step silently. Continue to the next step.
|
||
|
||
**User override:** If the user explicitly requested a specific tier (e.g., "run all passes", "paranoid review", "full adversarial", "do all 4 passes", "thorough review"), honor that request regardless of diff size. Jump to the matching tier section.
|
||
|
||
**Auto-select tier based on diff size:**
|
||
- **Small (< 50 lines changed):** Skip adversarial review entirely. Print: "Small diff ($DIFF_TOTAL lines) — adversarial review skipped." Continue to the next step.
|
||
- **Medium (50–199 lines changed):** Run Codex adversarial challenge (or Claude adversarial subagent if Codex unavailable). Jump to the "Medium tier" section.
|
||
- **Large (200+ lines changed):** Run all remaining passes — Codex structured review + Claude adversarial subagent + Codex adversarial. Jump to the "Large tier" section.
|
||
|
||
---
|
||
|
||
### Medium tier (50–199 lines)
|
||
|
||
Claude's structured review already ran. Now add a **cross-model adversarial challenge**.
|
||
|
||
**If Codex is available:** run the Codex adversarial challenge. **If Codex is NOT available:** fall back to the Claude adversarial subagent instead.
|
||
|
||
**Codex adversarial:**
|
||
|
||
\`\`\`bash
|
||
TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
|
||
codex exec "Review the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_ADV"
|
||
\`\`\`
|
||
|
||
Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. After the command completes, read stderr:
|
||
\`\`\`bash
|
||
cat "$TMPERR_ADV"
|
||
\`\`\`
|
||
|
||
Present the full output verbatim. This is informational — it never blocks shipping.
|
||
|
||
**Error handling:** All errors are non-blocking — adversarial review is a quality enhancement, not a prerequisite.
|
||
- **Auth failure:** If stderr contains "auth", "login", "unauthorized", or "API key": "Codex authentication failed. Run \\\`codex login\\\` to authenticate."
|
||
- **Timeout:** "Codex timed out after 5 minutes."
|
||
- **Empty response:** "Codex returned no response. Stderr: <paste relevant error>."
|
||
|
||
On any Codex error, fall back to the Claude adversarial subagent automatically.
|
||
|
||
**Claude adversarial subagent** (fallback when Codex unavailable or errored):
|
||
|
||
Dispatch via the Agent tool. The subagent has fresh context — no checklist bias from the structured review. This genuine independence catches things the primary reviewer is blind to.
|
||
|
||
Subagent prompt:
|
||
"Read the diff for this branch with \`git diff origin/<base>\`. Think like an attacker and a chaos engineer. Your job is to find ways this code will fail in production. Look for: edge cases, race conditions, security holes, resource leaks, failure modes, silent data corruption, logic errors that produce wrong results silently, error handling that swallows failures, and trust boundary violations. Be adversarial. Be thorough. No compliments — just the problems. For each finding, classify as FIXABLE (you know how to fix it) or INVESTIGATE (needs human judgment)."
|
||
|
||
Present findings under an \`ADVERSARIAL REVIEW (Claude subagent):\` header. **FIXABLE findings** flow into the same Fix-First pipeline as the structured review. **INVESTIGATE findings** are presented as informational.
|
||
|
||
If the subagent fails or times out: "Claude adversarial subagent unavailable. Continuing without adversarial review."
|
||
|
||
**Persist the review result:**
|
||
\`\`\`bash
|
||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"medium","commit":"'"$(git rev-parse --short HEAD)"'"}'
|
||
\`\`\`
|
||
Substitute STATUS: "clean" if no findings, "issues_found" if findings exist. SOURCE: "codex" if Codex ran, "claude" if subagent ran. If both failed, do NOT persist.
|
||
|
||
**Cleanup:** Run \`rm -f "$TMPERR_ADV"\` after processing (if Codex was used).
|
||
|
||
---
|
||
|
||
### Large tier (200+ lines)
|
||
|
||
Claude's structured review already ran. Now run **all three remaining passes** for maximum coverage:
|
||
|
||
**1. Codex structured review (if available):**
|
||
\`\`\`bash
|
||
TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
|
||
codex review --base <base> -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR"
|
||
\`\`\`
|
||
|
||
Set the Bash tool's \`timeout\` parameter to \`300000\` (5 minutes). Do NOT use the \`timeout\` shell command — it doesn't exist on macOS. Present output under \`CODEX SAYS (code review):\` header.
|
||
Check for \`[P1]\` markers: found → \`GATE: FAIL\`, not found → \`GATE: PASS\`.
|
||
|
||
If GATE is FAIL, use AskUserQuestion:
|
||
\`\`\`
|
||
Codex found N critical issues in the diff.
|
||
|
||
A) Investigate and fix now (recommended)
|
||
B) Continue — review will still complete
|
||
\`\`\`
|
||
|
||
If A: address the findings${isShip ? '. After fixing, re-run tests (Step 3) since code has changed' : ''}. Re-run \`codex review\` to verify.
|
||
|
||
Read stderr for errors (same error handling as medium tier).
|
||
|
||
After stderr: \`rm -f "$TMPERR"\`
|
||
|
||
**2. Claude adversarial subagent:** Dispatch a subagent with the adversarial prompt (same prompt as medium tier). This always runs regardless of Codex availability.
|
||
|
||
**3. Codex adversarial challenge (if available):** Run \`codex exec\` with the adversarial prompt (same as medium tier).
|
||
|
||
If Codex is not available for steps 1 and 3, note to the user: "Codex CLI not found — large-diff review ran Claude structured + Claude adversarial (2 of 4 passes). Install Codex for full 4-pass coverage: \`npm install -g @openai/codex\`"
|
||
|
||
**Persist the review result AFTER all passes complete** (not after each sub-step):
|
||
\`\`\`bash
|
||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"adversarial-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","tier":"large","gate":"GATE","commit":"'"$(git rev-parse --short HEAD)"'"}'
|
||
\`\`\`
|
||
Substitute: STATUS = "clean" if no findings across ALL passes, "issues_found" if any pass found issues. SOURCE = "both" if Codex ran, "claude" if only Claude subagent ran. GATE = the Codex structured review gate result ("pass"/"fail"), or "informational" if Codex was unavailable. If all passes failed, do NOT persist.
|
||
|
||
---
|
||
|
||
### Cross-model synthesis (medium and large tiers)
|
||
|
||
After all passes complete, synthesize findings across all sources:
|
||
|
||
\`\`\`
|
||
ADVERSARIAL REVIEW SYNTHESIS (auto: TIER, N lines):
|
||
════════════════════════════════════════════════════════════
|
||
High confidence (found by multiple sources): [findings agreed on by >1 pass]
|
||
Unique to Claude structured review: [from earlier step]
|
||
Unique to Claude adversarial: [from subagent, if ran]
|
||
Unique to Codex: [from codex adversarial or code review, if ran]
|
||
Models used: Claude structured ✓ Claude adversarial ✓/✗ Codex ✓/✗
|
||
════════════════════════════════════════════════════════════
|
||
\`\`\`
|
||
|
||
High-confidence findings (agreed on by multiple sources) should be prioritized for fixes.
|
||
|
||
---`;
|
||
}
|
||
|
||
export function generateCodexPlanReview(ctx: TemplateContext): string {
|
||
// Codex host: strip entirely — Codex should never invoke itself
|
||
if (ctx.host === 'codex') return '';
|
||
|
||
return `## Outside Voice — Independent Plan Challenge (optional, recommended)
|
||
|
||
After all review sections are complete, offer an independent second opinion from a
|
||
different AI system. Two models agreeing on a plan is stronger signal than one model's
|
||
thorough review.
|
||
|
||
**Check tool availability:**
|
||
|
||
\`\`\`bash
|
||
which codex 2>/dev/null && echo "CODEX_AVAILABLE" || echo "CODEX_NOT_AVAILABLE"
|
||
\`\`\`
|
||
|
||
Use AskUserQuestion:
|
||
|
||
> "All review sections are complete. Want an outside voice? A different AI system can
|
||
> give a brutally honest, independent challenge of this plan — logical gaps, feasibility
|
||
> risks, and blind spots that are hard to catch from inside the review. Takes about 2
|
||
> minutes."
|
||
>
|
||
> RECOMMENDATION: Choose A — an independent second opinion catches structural blind
|
||
> spots. Two different AI models agreeing on a plan is stronger signal than one model's
|
||
> thorough review. Completeness: A=9/10, B=7/10.
|
||
|
||
Options:
|
||
- A) Get the outside voice (recommended)
|
||
- B) Skip — proceed to outputs
|
||
|
||
**If B:** Print "Skipping outside voice." and continue to the next section.
|
||
|
||
**If A:** Construct the plan review prompt. Read the plan file being reviewed (the file
|
||
the user pointed this review at, or the branch diff scope). If a CEO plan document
|
||
was written in Step 0D-POST, read that too — it contains the scope decisions and vision.
|
||
|
||
Construct this prompt (substitute the actual plan content — if plan content exceeds 30KB,
|
||
truncate to the first 30KB and note "Plan truncated for size"):
|
||
|
||
"You are a brutally honest technical reviewer examining a development plan that has
|
||
already been through a multi-section review. Your job is NOT to repeat that review.
|
||
Instead, find what it missed. Look for: logical gaps and unstated assumptions that
|
||
survived the review scrutiny, overcomplexity (is there a fundamentally simpler
|
||
approach the review was too deep in the weeds to see?), feasibility risks the review
|
||
took for granted, missing dependencies or sequencing issues, and strategic
|
||
miscalibration (is this the right thing to build at all?). Be direct. Be terse. No
|
||
compliments. Just the problems.
|
||
|
||
THE PLAN:
|
||
<plan content>"
|
||
|
||
**If CODEX_AVAILABLE:**
|
||
|
||
\`\`\`bash
|
||
TMPERR_PV=$(mktemp /tmp/codex-planreview-XXXXXXXX)
|
||
codex exec "<prompt>" -C "$(git rev-parse --show-toplevel)" -s read-only -c 'model_reasoning_effort="xhigh"' --enable web_search_cached 2>"$TMPERR_PV"
|
||
\`\`\`
|
||
|
||
Use a 5-minute timeout (\`timeout: 300000\`). After the command completes, read stderr:
|
||
\`\`\`bash
|
||
cat "$TMPERR_PV"
|
||
\`\`\`
|
||
|
||
Present the full output verbatim:
|
||
|
||
\`\`\`
|
||
CODEX SAYS (plan review — outside voice):
|
||
════════════════════════════════════════════════════════════
|
||
<full codex output, verbatim — do not truncate or summarize>
|
||
════════════════════════════════════════════════════════════
|
||
\`\`\`
|
||
|
||
**Error handling:** All errors are non-blocking — the outside voice is informational.
|
||
- Auth failure (stderr contains "auth", "login", "unauthorized"): "Codex auth failed. Run \\\`codex login\\\` to authenticate."
|
||
- Timeout: "Codex timed out after 5 minutes."
|
||
- Empty response: "Codex returned no response."
|
||
|
||
On any Codex error, fall back to the Claude adversarial subagent.
|
||
|
||
**If CODEX_NOT_AVAILABLE (or Codex errored):**
|
||
|
||
Dispatch via the Agent tool. The subagent has fresh context — genuine independence.
|
||
|
||
Subagent prompt: same plan review prompt as above.
|
||
|
||
Present findings under an \`OUTSIDE VOICE (Claude subagent):\` header.
|
||
|
||
If the subagent fails or times out: "Outside voice unavailable. Continuing to outputs."
|
||
|
||
**Cross-model tension:**
|
||
|
||
After presenting the outside voice findings, note any points where the outside voice
|
||
disagrees with the review findings from earlier sections. Flag these as:
|
||
|
||
\`\`\`
|
||
CROSS-MODEL TENSION:
|
||
[Topic]: Review said X. Outside voice says Y. [Your assessment of who's right.]
|
||
\`\`\`
|
||
|
||
For each substantive tension point, auto-propose as a TODO via AskUserQuestion:
|
||
|
||
> "Cross-model disagreement on [topic]. The review found [X] but the outside voice
|
||
> argues [Y]. Worth investigating further?"
|
||
|
||
Options:
|
||
- A) Add to TODOS.md
|
||
- B) Skip — not substantive
|
||
|
||
If no tension points exist, note: "No cross-model tension — both reviewers agree."
|
||
|
||
**Persist the result:**
|
||
\`\`\`bash
|
||
~/.claude/skills/gstack/bin/gstack-review-log '{"skill":"codex-plan-review","timestamp":"'"$(date -u +%Y-%m-%dT%H:%M:%SZ)"'","status":"STATUS","source":"SOURCE","commit":"'"$(git rev-parse --short HEAD)"'"}'
|
||
\`\`\`
|
||
|
||
Substitute: STATUS = "clean" if no findings, "issues_found" if findings exist.
|
||
SOURCE = "codex" if Codex ran, "claude" if subagent ran.
|
||
|
||
**Cleanup:** Run \`rm -f "$TMPERR_PV"\` after processing (if Codex was used).
|
||
|
||
---`;
|
||
}
|
||
|
||
// ─── Plan File Discovery (shared helper) ──────────────────────────────
|
||
|
||
function generatePlanFileDiscovery(): string {
|
||
return `### Plan File Discovery
|
||
|
||
1. **Conversation context (primary):** Check if there is an active plan file in this conversation. The host agent's system messages include plan file paths when in plan mode. If found, use it directly — this is the most reliable signal.
|
||
|
||
2. **Content-based search (fallback):** If no plan file is referenced in conversation context, search by content:
|
||
|
||
\`\`\`bash
|
||
BRANCH=$(git branch --show-current 2>/dev/null | tr '/' '-')
|
||
REPO=$(basename "$(git rev-parse --show-toplevel 2>/dev/null)")
|
||
# Search common plan file locations
|
||
for PLAN_DIR in "$HOME/.claude/plans" "$HOME/.codex/plans" ".gstack/plans"; do
|
||
[ -d "$PLAN_DIR" ] || continue
|
||
PLAN=$(ls -t "$PLAN_DIR"/*.md 2>/dev/null | xargs grep -l "$BRANCH" 2>/dev/null | head -1)
|
||
[ -z "$PLAN" ] && PLAN=$(ls -t "$PLAN_DIR"/*.md 2>/dev/null | xargs grep -l "$REPO" 2>/dev/null | head -1)
|
||
[ -z "$PLAN" ] && PLAN=$(find "$PLAN_DIR" -name '*.md' -mmin -1440 -maxdepth 1 2>/dev/null | xargs ls -t 2>/dev/null | head -1)
|
||
[ -n "$PLAN" ] && break
|
||
done
|
||
[ -n "$PLAN" ] && echo "PLAN_FILE: $PLAN" || echo "NO_PLAN_FILE"
|
||
\`\`\`
|
||
|
||
3. **Validation:** If a plan file was found via content-based search (not conversation context), read the first 20 lines and verify it is relevant to the current branch's work. If it appears to be from a different project or feature, treat as "no plan file found."
|
||
|
||
**Error handling:**
|
||
- No plan file found → skip with "No plan file detected — skipping."
|
||
- Plan file found but unreadable (permissions, encoding) → skip with "Plan file found but unreadable — skipping."`;
|
||
}
|
||
|
||
// ─── Plan Completion Audit ────────────────────────────────────────────
|
||
|
||
type PlanCompletionMode = 'ship' | 'review';
|
||
|
||
function generatePlanCompletionAuditInner(mode: PlanCompletionMode): string {
|
||
const sections: string[] = [];
|
||
|
||
// ── Plan file discovery (shared) ──
|
||
sections.push(generatePlanFileDiscovery());
|
||
|
||
// ── Item extraction ──
|
||
sections.push(`
|
||
### Actionable Item Extraction
|
||
|
||
Read the plan file. Extract every actionable item — anything that describes work to be done. Look for:
|
||
|
||
- **Checkbox items:** \`- [ ] ...\` or \`- [x] ...\`
|
||
- **Numbered steps** under implementation headings: "1. Create ...", "2. Add ...", "3. Modify ..."
|
||
- **Imperative statements:** "Add X to Y", "Create a Z service", "Modify the W controller"
|
||
- **File-level specifications:** "New file: path/to/file.ts", "Modify path/to/existing.rb"
|
||
- **Test requirements:** "Test that X", "Add test for Y", "Verify Z"
|
||
- **Data model changes:** "Add column X to table Y", "Create migration for Z"
|
||
|
||
**Ignore:**
|
||
- Context/Background sections (\`## Context\`, \`## Background\`, \`## Problem\`)
|
||
- Questions and open items (marked with ?, "TBD", "TODO: decide")
|
||
- Review report sections (\`## GSTACK REVIEW REPORT\`)
|
||
- Explicitly deferred items ("Future:", "Out of scope:", "NOT in scope:", "P2:", "P3:", "P4:")
|
||
- CEO Review Decisions sections (these record choices, not work items)
|
||
|
||
**Cap:** Extract at most 50 items. If the plan has more, note: "Showing top 50 of N plan items — full list in plan file."
|
||
|
||
**No items found:** If the plan contains no extractable actionable items, skip with: "Plan file contains no actionable items — skipping completion audit."
|
||
|
||
For each item, note:
|
||
- The item text (verbatim or concise summary)
|
||
- Its category: CODE | TEST | MIGRATION | CONFIG | DOCS`);
|
||
|
||
// ── Cross-reference against diff ──
|
||
sections.push(`
|
||
### Cross-Reference Against Diff
|
||
|
||
Run \`git diff origin/<base>...HEAD\` and \`git log origin/<base>..HEAD --oneline\` to understand what was implemented.
|
||
|
||
For each extracted plan item, check the diff and classify:
|
||
|
||
- **DONE** — Clear evidence in the diff that this item was implemented. Cite the specific file(s) changed.
|
||
- **PARTIAL** — Some work toward this item exists in the diff but it's incomplete (e.g., model created but controller missing, function exists but edge cases not handled).
|
||
- **NOT DONE** — No evidence in the diff that this item was addressed.
|
||
- **CHANGED** — The item was implemented using a different approach than the plan described, but the same goal is achieved. Note the difference.
|
||
|
||
**Be conservative with DONE** — require clear evidence in the diff. A file being touched is not enough; the specific functionality described must be present.
|
||
**Be generous with CHANGED** — if the goal is met by different means, that counts as addressed.`);
|
||
|
||
// ── Output format ──
|
||
sections.push(`
|
||
### Output Format
|
||
|
||
\`\`\`
|
||
PLAN COMPLETION AUDIT
|
||
═══════════════════════════════
|
||
Plan: {plan file path}
|
||
|
||
## Implementation Items
|
||
[DONE] Create UserService — src/services/user_service.rb (+142 lines)
|
||
[PARTIAL] Add validation — model validates but missing controller checks
|
||
[NOT DONE] Add caching layer — no cache-related changes in diff
|
||
[CHANGED] "Redis queue" → implemented with Sidekiq instead
|
||
|
||
## Test Items
|
||
[DONE] Unit tests for UserService — test/services/user_service_test.rb
|
||
[NOT DONE] E2E test for signup flow
|
||
|
||
## Migration Items
|
||
[DONE] Create users table — db/migrate/20240315_create_users.rb
|
||
|
||
─────────────────────────────────
|
||
COMPLETION: 4/7 DONE, 1 PARTIAL, 1 NOT DONE, 1 CHANGED
|
||
─────────────────────────────────
|
||
\`\`\``);
|
||
|
||
// ── Gate logic (mode-specific) ──
|
||
if (mode === 'ship') {
|
||
sections.push(`
|
||
### Gate Logic
|
||
|
||
After producing the completion checklist:
|
||
|
||
- **All DONE or CHANGED:** Pass. "Plan completion: PASS — all items addressed." Continue.
|
||
- **Only PARTIAL items (no NOT DONE):** Continue with a note in the PR body. Not blocking.
|
||
- **Any NOT DONE items:** Use AskUserQuestion:
|
||
- Show the completion checklist above
|
||
- "{N} items from the plan are NOT DONE. These were part of the original plan but are missing from the implementation."
|
||
- RECOMMENDATION: depends on item count and severity. If 1-2 minor items (docs, config), recommend B. If core functionality is missing, recommend A.
|
||
- Options:
|
||
A) Stop — implement the missing items before shipping
|
||
B) Ship anyway — defer these to a follow-up (will create P1 TODOs in Step 5.5)
|
||
C) These items were intentionally dropped — remove from scope
|
||
- If A: STOP. List the missing items for the user to implement.
|
||
- If B: Continue. For each NOT DONE item, create a P1 TODO in Step 5.5 with "Deferred from plan: {plan file path}".
|
||
- If C: Continue. Note in PR body: "Plan items intentionally dropped: {list}."
|
||
|
||
**No plan file found:** Skip entirely. "No plan file detected — skipping plan completion audit."
|
||
|
||
**Include in PR body (Step 8):** Add a \`## Plan Completion\` section with the checklist summary.`);
|
||
} else {
|
||
// review mode
|
||
sections.push(`
|
||
### Integration with Scope Drift Detection
|
||
|
||
The plan completion results augment the existing Scope Drift Detection. If a plan file is found:
|
||
|
||
- **NOT DONE items** become additional evidence for **MISSING REQUIREMENTS** in the scope drift report.
|
||
- **Items in the diff that don't match any plan item** become evidence for **SCOPE CREEP** detection.
|
||
|
||
This is **INFORMATIONAL** — does not block the review (consistent with existing scope drift behavior).
|
||
|
||
Update the scope drift output to include plan file context:
|
||
|
||
\`\`\`
|
||
Scope Check: [CLEAN / DRIFT DETECTED / REQUIREMENTS MISSING]
|
||
Intent: <from plan file — 1-line summary>
|
||
Plan: <plan file path>
|
||
Delivered: <1-line summary of what the diff actually does>
|
||
Plan items: N DONE, M PARTIAL, K NOT DONE
|
||
[If NOT DONE: list each missing item]
|
||
[If scope creep: list each out-of-scope change not in the plan]
|
||
\`\`\`
|
||
|
||
**No plan file found:** Fall back to existing scope drift behavior (check TODOS.md and PR description only).`);
|
||
}
|
||
|
||
return sections.join('\n');
|
||
}
|
||
|
||
export function generatePlanCompletionAuditShip(_ctx: TemplateContext): string {
|
||
return generatePlanCompletionAuditInner('ship');
|
||
}
|
||
|
||
export function generatePlanCompletionAuditReview(_ctx: TemplateContext): string {
|
||
return generatePlanCompletionAuditInner('review');
|
||
}
|
||
|
||
// ─── Plan Verification Execution ──────────────────────────────────────
|
||
|
||
export function generatePlanVerificationExec(_ctx: TemplateContext): string {
|
||
return `## Step 3.47: Plan Verification
|
||
|
||
Automatically verify the plan's testing/verification steps using the \`/qa-only\` skill.
|
||
|
||
### 1. Check for verification section
|
||
|
||
Using the plan file already discovered in Step 3.45, look for a verification section. Match any of these headings: \`## Verification\`, \`## Test plan\`, \`## Testing\`, \`## How to test\`, \`## Manual testing\`, or any section with verification-flavored items (URLs to visit, things to check visually, interactions to test).
|
||
|
||
**If no verification section found:** Skip with "No verification steps found in plan — skipping auto-verification."
|
||
**If no plan file was found in Step 3.45:** Skip (already handled).
|
||
|
||
### 2. Check for running dev server
|
||
|
||
Before invoking browse-based verification, check if a dev server is reachable:
|
||
|
||
\`\`\`bash
|
||
curl -s -o /dev/null -w '%{http_code}' http://localhost:3000 2>/dev/null || \\
|
||
curl -s -o /dev/null -w '%{http_code}' http://localhost:8080 2>/dev/null || \\
|
||
curl -s -o /dev/null -w '%{http_code}' http://localhost:5173 2>/dev/null || \\
|
||
curl -s -o /dev/null -w '%{http_code}' http://localhost:4000 2>/dev/null || echo "NO_SERVER"
|
||
\`\`\`
|
||
|
||
**If NO_SERVER:** Skip with "No dev server detected — skipping plan verification. Run /qa separately after deploying."
|
||
|
||
### 3. Invoke /qa-only inline
|
||
|
||
Read the \`/qa-only\` skill from disk:
|
||
|
||
\`\`\`bash
|
||
cat \${CLAUDE_SKILL_DIR}/../qa-only/SKILL.md
|
||
\`\`\`
|
||
|
||
**If unreadable:** Skip with "Could not load /qa-only — skipping plan verification."
|
||
|
||
Follow the /qa-only workflow with these modifications:
|
||
- **Skip the preamble** (already handled by /ship)
|
||
- **Use the plan's verification section as the primary test input** — treat each verification item as a test case
|
||
- **Use the detected dev server URL** as the base URL
|
||
- **Skip the fix loop** — this is report-only verification during /ship
|
||
- **Cap at the verification items from the plan** — do not expand into general site QA
|
||
|
||
### 4. Gate logic
|
||
|
||
- **All verification items PASS:** Continue silently. "Plan verification: PASS."
|
||
- **Any FAIL:** Use AskUserQuestion:
|
||
- Show the failures with screenshot evidence
|
||
- RECOMMENDATION: Choose A if failures indicate broken functionality. Choose B if cosmetic only.
|
||
- Options:
|
||
A) Fix the failures before shipping (recommended for functional issues)
|
||
B) Ship anyway — known issues (acceptable for cosmetic issues)
|
||
- **No verification section / no server / unreadable skill:** Skip (non-blocking).
|
||
|
||
### 5. Include in PR body
|
||
|
||
Add a \`## Verification Results\` section to the PR body (Step 8):
|
||
- If verification ran: summary of results (N PASS, M FAIL, K SKIPPED)
|
||
- If skipped: reason for skipping (no plan, no server, no verification section)`;
|
||
}
|